This is my code for Kaggle's Product Classification Challenge. I wrote up the solution on my blog. The final model is a two-level ensemble built by stacking. 30 runs can reach 0.4192 on the private LB (top 5%). The training data was split roughly 1:1 between level 1 and level 2; tuning this ratio may give a better result.
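As a rough illustration of the two-level stacking idea (not the actual code in `ensembler.py`), here is a minimal sketch. It assumes a recent Python 3 and scikit-learn rather than the versions this repo was built on, uses synthetic data, and picks a random forest and logistic regression purely as stand-in models. The helper names `fit_two_level_stack` and `predict_proba_stack` and the `level1_fraction` parameter are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def fit_two_level_stack(X, y, level1_models, level2_model,
                        level1_fraction=0.5, seed=0):
    """Fit level-1 models on one part of the data and the level-2 model
    on their predictions for the other part (hypothetical helper)."""
    # level1_fraction=0.5 corresponds to the roughly 1:1 split.
    X1, X2, y1, y2 = train_test_split(
        X, y, train_size=level1_fraction, stratify=y, random_state=seed)
    for model in level1_models:
        model.fit(X1, y1)
    # Level-2 features: class probabilities from every level-1 model,
    # computed on data the level-1 models have not seen.
    meta = np.hstack([m.predict_proba(X2) for m in level1_models])
    level2_model.fit(meta, y2)
    return level1_models, level2_model


def predict_proba_stack(X, level1_models, level2_model):
    """Run X through level 1, then level 2 (hypothetical helper)."""
    meta = np.hstack([m.predict_proba(X) for m in level1_models])
    return level2_model.predict_proba(meta)


# Synthetic stand-in for the competition data (an assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
level1 = [RandomForestClassifier(n_estimators=50, random_state=0),
          LogisticRegression(max_iter=1000)]
level2 = LogisticRegression(max_iter=1000)

fit_two_level_stack(X, y, level1, level2)
proba = predict_proba_stack(X[:5], level1, level2)
print(proba.shape)
```

Changing `level1_fraction` is one way to experiment with the level 1 / level 2 split ratio mentioned above.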
This competition attracted me because:
- Lots of competitors and lots of solutions in the forum to learn from.
- Feature engineering was quite limited, thus I could focus on trying different models.
- A good chance to use model ensembling.
Dependencies:
- CUDA Toolkit 7.0
- Python 2.7.6
- Lasagne 0.1.dev
- nolearn 0.5
- numpy 1.8.2
- pandas 0.13.1
- scikit-learn 0.16.1
- scipy 0.13.3
- Theano 0.7.0
- xgboost 0.4
Usage:
- Put the data into the `data` dir
- Run `python ensembler.py`