Update tutorial with lightgbm to catboost conversion

Note: mandatory check (NEED_CHECK) was skipped ref:b1a5299a8e5e892627442f727d41de53d4929953
stasvmk · Sep 25, 2019 · befadef
1 parent f6b05eb
commit befadef

Showing 2 changed files with 34 additions and 12 deletions.
diff --git a/README.md b/README.md
@@ -73,6 +73,10 @@ It's better to start CatBoost exploring from this basic tutorials.
 * [Apply CatBoost model from Rust](apply_model/rust/train_model.ipynb)
     * Explore how to apply CatBoost model from Rust application. If you just want to look at code snippets you can go directly to [main.rs](apply_model/rust/src/main.rs)
 
+* [Convert LightGBM to CatBoost to use CatBoost fast appliers](apply_model/fast_light_gbm_applier.ipynb)
+    * Convert a LightGBM model to CatBoost, save the resulting CatBoost model, and use the CatBoost C++, Python, C# or other applier, which for non-symmetric trees is around 7-10 times faster than the native LightGBM one.
+    * Note that the CatBoost applier is even faster with native CatBoost models, because they use symmetric trees, which are fast to evaluate.
+
 ## Tools
 
 * [Gradient Boosting: CPU vs GPU](tools/google_colaboratory_cpu_vs_gpu_tutorial.ipynb)
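
The README note above attributes the extra speed of native CatBoost models to symmetric trees. As a rough illustration of why such trees are cheap to evaluate, here is a minimal sketch in plain Python. It is not CatBoost's actual applier code: the function name `apply_symmetric_tree`, the `(feature_index, threshold)` split layout, and the toy values are all assumptions made for this example. The idea is that every level of a symmetric tree shares one split, so the leaf index can be assembled one bit per level instead of following node pointers.

```python
from typing import List, Sequence, Tuple


def apply_symmetric_tree(
    features: Sequence[float],
    splits: List[Tuple[int, float]],
    leaf_values: Sequence[float],
) -> float:
    """Evaluate one symmetric (oblivious) tree.

    Hypothetical layout, for illustration only:
    - splits[level] is the single (feature_index, threshold) pair
      shared by every node on that level;
    - leaf_values holds 2 ** len(splits) entries.
    """
    leaf_index = 0
    for level, (feature_index, threshold) in enumerate(splits):
        # Each level contributes exactly one bit to the leaf index.
        if features[feature_index] > threshold:
            leaf_index |= 1 << level
    return leaf_values[leaf_index]


# Toy tree of depth 2 with made-up splits and leaf values.
splits = [(0, 0.5), (1, 2.0)]
leaf_values = [10.0, 20.0, 30.0, 40.0]

prediction = apply_symmetric_tree([0.7, 1.0], splits, leaf_values)
print(prediction)
```

A non-symmetric tree, such as one converted from LightGBM, needs a different comparison at each node along the path, which is why the two cases are described separately above.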

diff --git a/..._model/tutorial_convert_onnx_models.ipynb → apply_model/fast_light_gbm_applier.ipynb b/..._model/tutorial_convert_onnx_models.ipynb → apply_model/fast_light_gbm_applier.ipynb
@@ -4,12 +4,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Using ONNX models in CatBoost\n",
+    "# Save LightGBM model in CatBoost format to use fast CatBoost appliers\n",
     "\n",
-    "It is easy to apply ONNX models using CatBoost.\n",
-    "+ Save your model in the ONNX format\n",
+    "To save a LightGBM model in CatBoost format, first convert it to ONNX, then convert the ONNX model to CatBoost.\n",
+    "+ Save the LightGBM model in the ONNX format\n",
     "+ Load the ONNX model into CatBoost using the load_model() method\n",
-    "+ Apply your model in CatBoost using the predict() method\n",
+    "+ Apply your model in CatBoost using the predict() method, or save it to a file and use it with other appliers\n",
+    "\n",
+    "Note that this tutorial will only work for Python 3.* since onnxmltools \n",
     "\n",
     "Let us follow this scenario step-by-step for a LightGBM model."
    ]
@@ -24,7 +26,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -50,7 +52,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 3,
    "metadata": {},
    "outputs": [
     {
@@ -78,7 +80,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 4,
    "metadata": {
     "scrolled": false
    },
@@ -103,12 +105,12 @@
    "metadata": {},
    "source": [
     "\n",
-    "Load the ONNX model into CatBoost and compare the CatBoost and LightGBM predictions:"
+    "Load the ONNX model into CatBoost and print predictions to make sure they are correct."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 5,
    "metadata": {},
    "outputs": [
     {
@@ -125,13 +127,29 @@
     "catboost_predict = catboost_model.predict(X)\n",
     "print(catboost_predict)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Save the model in CatBoost format for later use in C++, C#, Java or Python code with the fast CatBoost applier."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "catboost_model.save_model('model.bin')"
+   ]
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "mypy37",
    "language": "python",
-   "name": "python3"
+   "name": "mypy37"
   },
   "language_info": {
    "codemirror_mode": {
@@ -143,7 +161,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.3"
+   "version": "3.6.7"
   }
  },
  "nbformat": 4,