Kaggle Competition Solution (5th)

NeurIPS 2024 - Predict New Medicines with BELKA

https://www.kaggle.com/competitions/leash-BELKA

For discussion, please refer to:
https://www.kaggle.com/competitions/leash-BELKA/discussion/456084

1. Hardware

  • GPU: 2x Nvidia Ada A6000, each with 48 GB of VRAM
  • CPU: Intel Xeon w7-3455 @ 2.5 GHz, 24 cores, 48 threads
  • Memory: 256 GB RAM

2. OS

  • Ubuntu 22.04.4 LTS

3. Set Up Environment

  • Install Python >= 3.10.9
  • Install the dependencies listed in requirements.txt into the Python environment (pip install -r requirements.txt)
  • Set up the directory structure as shown below.
└── <solution_dir>
    ├── src
    ├── result
    ├── data
    │   ├── processed
    │   │   └── all_buildingblock.csv
    │   └── kaggle
    │       └── leash-BELKA
    │           ├── sample_submission.csv
    │           ├── train.parquet
    │           └── test.parquet
    ├── LICENSE
    └── README.md
  • Modify the path settings by editing "/src/third_party/_current_dir_.py" (see the sanity-check sketch after these steps):
# please use full paths
KAGGLE_DATA_DIR = '<solution_dir>/data/kaggle'
PROCESSED_DATA_DIR = '<solution_dir>/data/processed'
RESULT_DIR = '<solution_dir>/result'
  • Run the data-processing script:
python "/src/process-data-01/run_make_data.py"

There are 98 million molecules in the training data, so processing can take a very long time.
Alternatively, you can download the processed data from the shared Google Drive folder
/leash-BELKA-solution/data/processed:
https://drive.google.com/drive/folders/1bEBGtTJrQlYc_MQRYceBp0Kb9zGYue9H?usp=drive_link
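
The row count and schema can be checked cheaply before committing to the full preprocessing run, since Parquet stores them in the file metadata. A minimal sketch using pyarrow (an assumed dependency, not necessarily pinned in requirements.txt):

```python
# Inspect train.parquet without loading ~98M rows into RAM.
import pyarrow.parquet as pq

pf = pq.ParquetFile('<solution_dir>/data/kaggle/leash-BELKA/train.parquet')
print('rows      :', pf.metadata.num_rows)        # ~98 million molecules
print('columns   :', pf.schema_arrow.names)
print('row groups:', pf.metadata.num_row_groups)  # useful for chunked processing
```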

4. Training the models

Warning: training output will overwrite the contents of the "/result" folder.

Please run the following Python scripts to train the models:

python "/src/cnn1d-nonshare-05-mean-layer5-bn/run_train.py"
Output models:
- /result/cnn1d-mean-pool-ly5-bn-01/fold-0/checkpoint/00400000.pth
- /result/cnn1d-mean-pool-ly5-bn-01/fold-1/checkpoint/00550000.pth
- /result/cnn1d-mean-pool-ly5-bn-01/fold-3/checkpoint/00415000.pth

python "/src/transformer-fa-03/run_train.py"
Output models:
- /result/transfomer-fa-03/fold-2/checkpoint/00264000.pth
- /result/transfomer-fa-03/fold-4/checkpoint/00264000.pth

python "/src/mamba-03/run_train.py"
Output model:
- /result/mamba-03/checkpoint/00255000.pth
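
Each run_train.py writes iteration-numbered checkpoints under "/result". To confirm a run produced a loadable file, you can inspect a checkpoint as below; this sketch assumes standard PyTorch .pth serialization, and the layout of the saved object is an assumption:

```python
# Inspect a saved checkpoint (sketch; the dict layout is an assumption).
import torch

ckpt = torch.load(
    '<solution_dir>/result/mamba-03/checkpoint/00255000.pth',
    map_location='cpu',
)
if isinstance(ckpt, dict):
    print('top-level keys:', list(ckpt.keys()))
else:
    print('checkpoint object type:', type(ckpt))
```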

If you want to do local validation, you can run the following scripts:

python "/src/cnn1d-nonshare-05-mean-layer5-bn/run_valid.py"
python "/src/transformer-fa-03/run_valid.py"
python "/src/mamba-03/run_valid.py"

5. Submission CSV

Please run the following scripts:

python "/src/cnn1d-nonshare-05-mean-layer5-bn/run_submit.py"
python "/src/transformer-fa-03/run_submit.py"
python "/src/mamba-03/run_submit.py"
python "/src/run_ensemble.py"
Output file:
- /result/final-3fold-tx2a-mamba-fix.submit.csv
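
run_ensemble.py blends the three models' test predictions into the final submission. The sketch below shows one plausible blend, a plain mean over the per-model binds columns; the input file names are hypothetical placeholders, and the id/binds columns follow sample_submission.csv:

```python
# Sketch of a mean ensemble over per-model submission files.
# The three input file names are hypothetical placeholders.
import pandas as pd

parts = [
    pd.read_csv('<solution_dir>/result/cnn1d.submit.csv'),
    pd.read_csv('<solution_dir>/result/transformer.submit.csv'),
    pd.read_csv('<solution_dir>/result/mamba.submit.csv'),
]
sub = parts[0][['id']].copy()
sub['binds'] = sum(p['binds'] for p in parts) / len(parts)  # simple average
sub.to_csv('<solution_dir>/result/ensemble.submit.csv', index=False)
```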



6. Reference trained models and validation results

Authors

License

  • This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgement

"We extend our thanks to HP for providing the Z8 Fury-G5 Data Science Workstation, which empowered our deep learning experiments. The high computational power and large GPU memory enabled us to design our models swiftly."