Added text to ELMo Embeddings notebook
yvespeirsman committed Oct 2, 2018
1 parent fcd6133 commit 7010c23
Showing 2 changed files with 92 additions and 25 deletions.
115 changes: 91 additions & 24 deletions Elmo Embeddings.ipynb
@@ -1,5 +1,23 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Keras sentiment analysis with Elmo Embeddings\n",
"\n",
"One of the recent trends in Natural Language Processing is transfer learning. Transfer learning allows NLP models to learn more from fewer examples. In this notebook, we experiment with so-called [ELMo Embeddings](https://allennlp.org/elmo), a new approach to word embeddings that relies on a large unlabelled text corpus to understand word meaning in context. ELMo Embeddings are available from [Tensorflow Hub](https://alpha.tfhub.dev/google/elmo/2)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparation\n",
"\n",
"Let's first install and import all the required libraries."
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -124,6 +142,15 @@
"sess.run(tf.tables_initializer())"
]
},
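The body of this setup cell is collapsed in the diff; a minimal sketch of what it likely contains, assuming the standard TensorFlow Hub ELMo module and conventional variable names (not the author's exact code):

```python
# Sketch only: the actual setup cell is collapsed in the diff above.
import tensorflow as tf
import tensorflow_hub as hub

# Load the ELMo module from TensorFlow Hub. trainable=True exposes the
# weighting of the internal biLM layers as trainable parameters.
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)

# hub.Module adds variables and lookup tables that must be initialized
# before the module can be called.
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(tf.tables_initializer())
```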
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ELMo Embeddings\n",
"\n",
"A quick example will illustrate how ELMo Embeddings work. When we pass to our model a list of sentences (either as strings or as lists of tokens), we get back a list of 1024-dimensional embeddings for every sentence. These are the ELMo embeddings of the tokens in the sentence. "
]
},
{
"cell_type": "code",
"execution_count": 5,
@@ -136,6 +163,15 @@
" as_dict=True)[\"elmo\"]"
]
},
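The quoted cell ends with `as_dict=True)["elmo"]`, so the call presumably looks roughly like the sketch below. The example sentences and variable names are assumptions; the module URL and signature follow the TensorFlow Hub ELMo documentation.

```python
# Sketch only: calling the ELMo module on a list of raw sentences.
sentences = ["The cat sat on the mat .",
             "The bank raised its interest rates ."]

# With the "default" signature, the module splits the strings on spaces
# and returns one 1024-dimensional embedding per token.
embeddings_op = elmo(sentences, signature="default", as_dict=True)["elmo"]

embeddings = sess.run(embeddings_op)
print(embeddings.shape)  # (number of sentences, max number of tokens, 1024)
```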
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sentiment analysis\n",
"\n",
"In this experiment, we're going to build a simple neural network for sentiment analysis. As our training and test data, we use the IMDB movie reviews that come pre-packaged with Keras. We shuffle the reviews and pad all texts to a maximum length of 500."
]
},
{
"cell_type": "code",
"execution_count": 6,
@@ -181,6 +217,15 @@
"X_test = sequence.pad_sequences(X_test, maxlen=SEQ_LENGTH)"
]
},
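The data-preparation cell is largely collapsed; a hedged sketch of the steps described above, assuming conventional names and values for the constants (VOCABULARY_SIZE, INDEX_FROM, SEQ_LENGTH):

```python
# Sketch only: load, shuffle and pad the IMDB reviews. Constant values
# are assumptions, not necessarily the author's.
import numpy as np
from keras.datasets import imdb
from keras.preprocessing import sequence

VOCABULARY_SIZE = 20000
INDEX_FROM = 3
SEQ_LENGTH = 500

(X_train, y_train), (X_test, y_test) = imdb.load_data(
    num_words=VOCABULARY_SIZE, index_from=INDEX_FROM)

# Shuffle the training reviews.
permutation = np.random.permutation(len(X_train))
X_train, y_train = X_train[permutation], y_train[permutation]

# Pad (and truncate) every review to exactly SEQ_LENGTH token indices.
X_train = sequence.pad_sequences(X_train, maxlen=SEQ_LENGTH)
X_test = sequence.pad_sequences(X_test, maxlen=SEQ_LENGTH)
```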
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simple embeddings\n",
"\n",
"For our baseline, we're going to work with standard word embeddings. These map every token to a 300-dimensional embedding, irrespective of the context in which the token occurs. We'll use the English word embeddings from Facebook Research's [MUSE project](https://github.com/facebookresearch/MUSE)."
]
},
{
"cell_type": "code",
"execution_count": 7,
@@ -208,6 +253,13 @@
"!wget https://s3.amazonaws.com/arrival/embeddings/wiki.multi.en.vec -O /tmp/wiki.multi.en.vec"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We load these embeddings and put the ones we need in an embedding matrix, where their row indices correspond to the token indices that Keras has assigned to the tokens in the IMDB corpus."
]
},
{
"cell_type": "code",
"execution_count": 8,
@@ -277,6 +329,15 @@
" embeddings_en, VOCABULARY_SIZE+INDEX_FROM-1, EMBEDDING_DIM)"
]
},
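The cell that builds the embedding matrix is collapsed apart from its final call. One plausible way to load the MUSE vectors and fill such a matrix is sketched below; the function names, the header-skipping logic and the exact index offset are assumptions (the constants come from the data-preparation sketch above).

```python
# Sketch only: read the MUSE .vec file and build an embedding matrix whose
# rows line up with the Keras token indices of the IMDB corpus.
import io
import numpy as np
from keras.datasets import imdb

EMBEDDING_DIM = 300

def load_vectors(path):
    vectors = {}
    with io.open(path, "r", encoding="utf-8", errors="ignore") as f:
        next(f)  # the first line of a .vec file holds "<count> <dimension>"
        for line in f:
            tokens = line.rstrip().split(" ")
            vectors[tokens[0]] = np.asarray(tokens[1:], dtype="float32")
    return vectors

def create_embedding_matrix(vectors, vocab_size, dim):
    # Keras shifts the raw IMDB word indices up by INDEX_FROM to reserve
    # slots for padding, start and out-of-vocabulary markers.
    word_index = imdb.get_word_index()
    matrix = np.zeros((vocab_size + 1, dim))
    for word, index in word_index.items():
        keras_index = index + INDEX_FROM
        if keras_index <= vocab_size and word in vectors:
            matrix[keras_index] = vectors[word]
    return matrix

embeddings_en = load_vectors("/tmp/wiki.multi.en.vec")
embedding_matrix = create_embedding_matrix(
    embeddings_en, VOCABULARY_SIZE + INDEX_FROM - 1, EMBEDDING_DIM)
```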
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Models\n",
"\n",
"We build two models for text classification, which are identical apart from the first layer. Our basic model has a simple embedding layer, were tokens are mapped to their static embeddings. Our ELMo model has a more complex first layer, where the static embedding for each token is concatenated to the ELMo embedding for that token in the relevant context. This results in a 1,324-dimensional embedding for each token in context. In both models, this embedding layer is followed by a simple convolution with kernel size 3, a maximum pooling operation, a dense layer, and finally a final layer that predicts the sentiment of each text as a number between 0 and 1. "
]
},
{
"cell_type": "code",
"execution_count": 42,
@@ -344,6 +405,15 @@
" return model"
]
},
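The model-building cell is collapsed. As an illustration of the architecture described above, here is a hedged sketch of the baseline model; the ELMo variant would replace the embedding layer with one that concatenates these static embeddings with the 1024-dimensional ELMo embeddings obtained from the hub module (omitted here). Layer sizes and names are assumptions, not the author's exact values.

```python
# Sketch only: baseline model matching the description above
# (static embedding -> Conv1D(kernel_size=3) -> max pooling -> dense -> sigmoid).
from keras.models import Model
from keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Dense

def build_basic_model(embedding_matrix, seq_length):
    inputs = Input(shape=(seq_length,), dtype="int32")
    x = Embedding(input_dim=embedding_matrix.shape[0],
                  output_dim=embedding_matrix.shape[1],
                  weights=[embedding_matrix],
                  input_length=seq_length,
                  trainable=False)(inputs)
    x = Conv1D(filters=100, kernel_size=3, activation="relu")(x)
    x = GlobalMaxPooling1D()(x)
    x = Dense(16, activation="relu")(x)
    outputs = Dense(1, activation="sigmoid")(x)
    return Model(inputs, outputs)
```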
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training\n",
"\n",
"We train both models for a maximum of 100 epochs, but we stop earlier when the validation loss hasn't improved for two epochs. We save and evaluate the model with the lowest validation loss. Although we didn't make a big effort to tune the learning rate, we did find that the ELMo model benefits from having a much smaller initial learning rate than the basic model. We use the same decay rate for both models. "
]
},
{
"cell_type": "code",
"execution_count": 43,
@@ -378,6 +448,13 @@
" return scores[1]*100"
]
},
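The quoted training cell ends with `return scores[1]*100`; a minimal sketch of a training routine that matches the description above (early stopping after two epochs without improvement, checkpointing the best model) could look like this. Learning rates, decay value, batch size and file paths are assumptions.

```python
# Sketch only: train with early stopping, restore the best checkpoint,
# and return the test accuracy as a percentage.
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint

def train_and_evaluate(model, X_train, y_train, X_val, y_val, X_test, y_test,
                       learning_rate, weights_path="/tmp/best_model.h5"):
    model.compile(optimizer=Adam(lr=learning_rate, decay=1e-6),
                  loss="binary_crossentropy", metrics=["accuracy"])
    callbacks = [
        EarlyStopping(monitor="val_loss", patience=2),
        ModelCheckpoint(weights_path, monitor="val_loss", save_best_only=True),
    ]
    model.fit(X_train, y_train, validation_data=(X_val, y_val),
              epochs=100, batch_size=32, callbacks=callbacks)
    # Evaluate the weights with the lowest validation loss.
    model.load_weights(weights_path)
    scores = model.evaluate(X_test, y_test, verbose=0)
    return scores[1] * 100
```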
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the ELMo model is quite slow, we chose to work with relatively small datasets. We train on just 200 training examples, validate the model on another 200 examples after each epoch, and test its final performance on 500 test examples. We repeat this process 10 times, and choose the training and validation examples randomly from the larger IMDB training set on each iteration. The ELMo model is trained, validated and tested on exactly the same examples as the basic model. "
]
},
{
"cell_type": "code",
"execution_count": 45,
@@ -517,13 +594,7 @@
"200/200 [==============================] - 119s 596ms/step - loss: 0.0348 - acc: 1.0000 - val_loss: 0.5247 - val_acc: 0.7200\n",
"Epoch 6/100\n",
"200/200 [==============================] - 119s 595ms/step - loss: 0.0129 - acc: 1.0000 - val_loss: 0.4654 - val_acc: 0.7600\n",
"Epoch 7/100\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 7/100\n",
"200/200 [==============================] - 119s 595ms/step - loss: 0.0063 - acc: 1.0000 - val_loss: 0.5201 - val_acc: 0.7350\n",
"Epoch 8/100\n",
"200/200 [==============================] - 119s 595ms/step - loss: 0.0041 - acc: 1.0000 - val_loss: 0.4939 - val_acc: 0.7400\n",
@@ -789,13 +860,7 @@
"200/200 [==============================] - 139s 696ms/step - loss: 0.7269 - acc: 0.5750 - val_loss: 0.6576 - val_acc: 0.5750\n",
"Epoch 2/100\n",
"200/200 [==============================] - 119s 595ms/step - loss: 0.4709 - acc: 0.8800 - val_loss: 0.6180 - val_acc: 0.6500\n",
"Epoch 3/100\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 3/100\n",
"200/200 [==============================] - 119s 596ms/step - loss: 0.2449 - acc: 0.9650 - val_loss: 0.6223 - val_acc: 0.6700\n",
"Epoch 4/100\n",
"200/200 [==============================] - 119s 596ms/step - loss: 0.0785 - acc: 1.0000 - val_loss: 0.5302 - val_acc: 0.7650\n",
@@ -908,6 +973,15 @@
"print(np.mean(elmo_accuracies))\n"
]
},
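The experiment cell itself is mostly collapsed (only its training logs and the final `print` calls are quoted). A hedged sketch of the repeated-sampling loop described above, reusing the helper functions from the earlier sketches: the sample sizes (200 train, 200 validation, 500 test, 10 runs) follow the text, everything else is an assumption, and the ELMo model would be trained on exactly the same indices inside the same loop.

```python
# Sketch only: repeat the small-data experiment ten times with fresh
# random samples, reusing build_basic_model and train_and_evaluate above.
import numpy as np

basic_accuracies = []

for run in range(10):
    # Draw disjoint random train/validation samples from the IMDB training
    # set and a random test sample from the IMDB test set.
    train_val_idx = np.random.permutation(len(X_train))
    train_idx, val_idx = train_val_idx[:200], train_val_idx[200:400]
    test_idx = np.random.permutation(len(X_test))[:500]

    model = build_basic_model(embedding_matrix, SEQ_LENGTH)
    accuracy = train_and_evaluate(model,
                                  X_train[train_idx], y_train[train_idx],
                                  X_train[val_idx], y_train[val_idx],
                                  X_test[test_idx], y_test[test_idx],
                                  learning_rate=1e-3)
    basic_accuracies.append(accuracy)

print(np.mean(basic_accuracies))
```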
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results\n",
"\n",
"When we train a basic sentiment analysis model on just 200 training examples, the results are hit and miss: the accuracies on unseen texts range from just 48% to 78%. When we replace the simple embedding layer by an ELMo embedding layer, however, the model performs about 10% better on average. Its accuracies were also much more consistent, between 73% and 78%. "
]
},
{
"cell_type": "code",
"execution_count": 48,
@@ -973,20 +1047,13 @@
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_tensorflow_p36",
"display_name": "Python 3",
"language": "python",
"name": "conda_tensorflow_p36"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -998,7 +1065,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.6.3"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion README.md
@@ -13,4 +13,4 @@ A collection of notebooks for Natural Language Processing from NLP Town

## Transfer Learning

1. [Keras text classification with Elmo Embeddings](https://github.com/nlptown/nlp-notebooks/blob/master/Elmo%20Embeddings.ipynb)
1. [Keras sentiment analysis with Elmo Embeddings](https://github.com/nlptown/nlp-notebooks/blob/master/Elmo%20Embeddings.ipynb)
