GitHub - ChunhanLi/dsb2019: https://www.kaggle.com/c/data-science-bowl-2019

Below you can find a outline of how to reproduce my solution for the 2019 Data Science Bowl competition. If you run into any trouble with the setup/code or have any questions please contact me at kfaceapi@gmail.com

Archive contents

3rd_solution.tgz : contains original code, trained models etc

3rd_solution/
├── input/
│   ├── processed/
│   └── data-science-bowl-2019/
├── models/ # 
└── code/

input/processed/ : will be created by the preprocessing step (by running python prepare_data.py)
input/data-science-bowl-2019/ : raw data dir of the competition (should contain 'train.csv', 'test.csv', etc)
models/ : contains trained models used to generate a 'submission.csv' file
code/ : contains codes for training and prediction

Requirements

Ubuntu 18.04.4 LTS
Python 3.7.5
pytorch 1.3
transformers 2.3.0 (or pytorch-transformers 1.2.0)

You can use the pip install -r requirements.txt to install the necessary packages.

Hardware

CPU: 3 x AMD® Ryzen 9 3900X (3 PCs)
GPU: 5 x NVIDIA RTX2080Ti 11G (2 GPUs in 1 PC)
RAM: 64G

The above is just my PC spec. In fact, GTX 1080 is enough for training.

Prepare Data

You can generate bowl.pt, bowl_info.pt files by running prepare_data.py

.../3rd_solution/code$ python prepare_data.py

Train Model (GPU needed)

You can reproduce models already in the model/ directory by running the train.sh script. The train.sh reproduce 5 models (5-fold)

.../3rd_solution/code$ bash train.sh

Predict

You can reproduce the submissions/submission.csv by running the predict.py

.../3rd_solution/code$ python predict.py

Option - Validate (GPU needed)

You can reproduce the local CV score and coefficients by running the validate.py

VALID KAPPA_SCORE:0.5684061832361282
coefficients=[0.53060865, 1.66266655, 2.31145611]

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
code		code
models		models
README.md		README.md
directory_structure.txt		directory_structure.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Archive contents

Requirements

Hardware

Prepare Data

Train Model (GPU needed)

Predict

Option - Validate (GPU needed)

About

Releases

Packages

Languages

ChunhanLi/dsb2019

Folders and files

Latest commit

History

Repository files navigation

Archive contents

Requirements

Hardware

Prepare Data

Train Model (GPU needed)

Predict

Option - Validate (GPU needed)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages