Cuixiaom/transformer mlperf int8 pr (intel#133)
* Added transformer_mlperf int8 inference model into model zoo

* Added missing transformer_mlperf int8 launch_benchmark api files

* Updated the README files and moved the instructions for running transformer_mlperf inference into the README files in the benchmarks directory

* Added README files for the transformer_mlperf inference model with different data types

* Minor typo fixes

* Minor README changes
cuixiaom committed Sep 20, 2021
1 parent 469a073 commit 24e4444
Showing 33 changed files with 5,386 additions and 240 deletions.
1 change: 1 addition & 0 deletions benchmarks/README.md
@@ -45,6 +45,7 @@ For information on running more advanced use cases using the workload containers
| Language Translation | [BERT](https://arxiv.org/pdf/1810.04805.pdf) | Inference | | [FP32](language_translation/tensorflow/bert/README.md#fp32-inference-instructions) |
| Language Translation | [GNMT*](https://arxiv.org/pdf/1609.08144.pdf) | Inference | Model Containers: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/gnmt-fp32-inference-tensorflow-container.html) <br> Model Packages: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/gnmt-fp32-inference-tensorflow-model.html) | [FP32](language_translation/tensorflow/mlperf_gnmt/inference/fp32/README.md) |
| Language Translation | [Transformer_LT_mlperf](https://arxiv.org/pdf/1706.03762.pdf) | Training | Model Containers: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/transformer-lt-mlperf-fp32-training-tensorflow-container.html) [BFloat16**](https://software.intel.com/content/www/us/en/develop/articles/containers/transformer-lt-mlperf-bfloat16-training-tensorflow-container.html) <br> Model Packages: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/transformer-lt-mlperf-fp32-training-tensorflow-model.html) [BFloat16**](https://software.intel.com/content/www/us/en/develop/articles/containers/transformer-lt-mlperf-bfloat16-training-tensorflow-model.html) | [FP32](language_translation/tensorflow/transformer_mlperf/training/fp32/README.md) [BFloat16**](language_translation/tensorflow/transformer_mlperf/training/bfloat16/README.md) |
| Language Translation | [Transformer_LT_mlperf](https://arxiv.org/pdf/1706.03762.pdf) | Inference | | [FP32](language_translation/tensorflow/transformer_mlperf/inference/fp32/README.md) [BFloat16**](language_translation/tensorflow/transformer_mlperf/inference/bfloat16/README.md) [INT8](language_translation/tensorflow/transformer_mlperf/inference/int8/README.md) |
| Language Translation | [Transformer_LT_Official](https://arxiv.org/pdf/1706.03762.pdf) | Inference | Model Containers: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/transformer-lt-official-fp32-inference-tensorflow-container.html) <br> Model Packages: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/transformer-lt-official-fp32-inference-tensorflow-model.html) | [FP32](language_translation/tensorflow/transformer_lt_official/inference/fp32/README.md) |
| Object Detection | [Faster R-CNN](https://arxiv.org/pdf/1506.01497.pdf) | Inference | Model Containers: [Int8](https://software.intel.com/content/www/us/en/develop/articles/containers/faster-rcnn-int8-inference-tensorflow-container.html) [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/faster-rcnn-fp32-inference-tensorflow-container.html) <br> Model Packages: [Int8](https://software.intel.com/content/www/us/en/develop/articles/containers/faster-rcnn-int8-inference-tensorflow-model.html) [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/faster-rcnn-fp32-inference-tensorflow-model.html) | [Int8](object_detection/tensorflow/faster_rcnn/inference/int8/README.md) [FP32](object_detection/tensorflow/faster_rcnn/inference/fp32/README.md) |
| Object Detection | [R-FCN](https://arxiv.org/pdf/1605.06409.pdf) | Inference | Model Containers: [Int8](https://software.intel.com/content/www/us/en/develop/articles/containers/rfcn-int8-inference-tensorflow-container.html) [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/rfcn-fp32-inference-tensorflow-container.html) <br> Model Packages: [Int8](https://software.intel.com/content/www/us/en/develop/articles/containers/rfcn-int8-inference-tensorflow-model.html) [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/rfcn-fp32-inference-tensorflow-model.html) | [Int8](object_detection/tensorflow/rfcn/inference/int8/README.md) [FP32](object_detection/tensorflow/rfcn/inference/fp32/README.md) |
2 changes: 1 addition & 1 deletion benchmarks/common/tensorflow/start.sh
@@ -1344,7 +1344,7 @@ function transformer_mlperf() {
fi

if [[ ${MODE} == "inference" ]]; then
if [[ (${PRECISION} == "bfloat16") || ( ${PRECISION} == "fp32") ]]; then
if [[ (${PRECISION} == "bfloat16") || ( ${PRECISION} == "fp32") || ( ${PRECISION} == "int8") ]]; then

if [[ -z "${params}" ]]; then
echo "transformer-language requires --params arg to be defined"
@@ -4,118 +4,8 @@ The following documents have instructions for how to run Transformer Language used in mlperf
Benchmark suites for the following modes/platforms:
* [FP32 training](/benchmarks/language_translation/tensorflow/transformer_mlperf/training/fp32/README.md)
* [Bfloat16 training](/benchmarks/language_translation/tensorflow/transformer_mlperf/training/bfloat16/README.md)
* [FP32 inference](#fp32-inference-instructions)
* [Bfloat16 inference](#bfloat16-inference-instructions)
* [FP32 inference](/benchmarks/language_translation/tensorflow/transformer_mlperf/inference/fp32/README.md)
* [Bfloat16 inference](/benchmarks/language_translation/tensorflow/transformer_mlperf/inference/bfloat16/README.md)
* [Int8 inference](/benchmarks/language_translation/tensorflow/transformer_mlperf/inference/int8/README.md)

Detailed information on the benchmark can be found in [mlperf/training](https://github.com/mlperf/training/tree/master/translation/tensorflow/transformer)

# <a name="fp32-inference-instructions"></a> FP32 Inference Instructions

1. Clone this [intelai/models](https://github.com/IntelAI/models)
repository:

```
git clone https://github.com/IntelAI/models.git
```

2. Obtain the dataset.
Decide which problem you want to run in order to get the appropriate dataset.
As an example, we will use the English-German data:

Download the dataset used for computing the BLEU score reported in the paper:
```
export DATA_DIR=/home/<user>/transformer_data
mkdir $DATA_DIR && cd $DATA_DIR
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.en
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.de
```

3. Next, navigate to the `benchmarks` directory in your local clone of
the [intelai/models](https://github.com/IntelAI/models) repo (from step 1).
The `launch_benchmark.py` script in the `benchmarks` directory is
used for starting a model run in an optimized TensorFlow docker
container. It has arguments to specify which model, framework, mode,
precision, and docker image to use, along with the path to your dataset location (from step 2).


Before running inference, users should have the model fully trained and have saved checkpoints ready at the path $CHECKPOINT_DIR:

```
python launch_benchmark.py \
--framework tensorflow \
--precision fp32 \
--mode inference \
--model-name transformer_mlperf \
--batch-size 64 \
-i 0 --data-location $DATA_DIR \
--checkpoint $CHECKPOINT_DIR \
--docker-image intel/intel-optimized-tensorflow:latest \
--verbose \
-- file=newstest2014.en file_out=translate.txt reference=newstest2014.de
```
4. The log file is saved to the location specified by `--output-dir`. If no value is specified, the log will be written to `models/benchmarks/common/tensorflow/logs` in the workspace.
The performance and accuracy reported in the log output when benchmarking completes should look
something like this (the actual throughput and inference time will vary):
```
Total inferencing time: xxx
Throughput: xxx sentences/second
Case-insensitive results: 26.694846153259277
Case-sensitive results: 26.182371377944946
```


## <a name="bfloat16-inference-instructions"></a> Bfloat16 Inference Instructions

1. Clone this [intelai/models](https://github.com/IntelAI/models)
repository:

```
git clone https://github.com/IntelAI/models.git
```

2. Obtain the dataset.
Decide which problem you want to run in order to get the appropriate dataset.
As an example, we will use the English-German data:

Download the dataset used for computing the BLEU score reported in the paper:
```
export DATA_DIR=/home/<user>/transformer_data
mkdir $DATA_DIR && cd $DATA_DIR
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.en
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.de
```

3. Next, navigate to the `benchmarks` directory in your local clone of
the [intelai/models](https://github.com/IntelAI/models) repo (from step 1).
The `launch_benchmark.py` script in the `benchmarks` directory is
used for starting a model run in an optimized TensorFlow docker
container. It has arguments to specify which model, framework, mode,
precision, and docker image to use, along with the path to your dataset location (from step 2).


Before running inference, users should have the model fully trained and have saved checkpoints ready at the path $CHECKPOINT_DIR:

```
python launch_benchmark.py \
--framework tensorflow \
--precision bfloat16 \
--mode inference \
--model-name transformer_mlperf \
--batch-size 64 \
-i 0 --data-location $DATA_DIR \
--checkpoint $CHECKPOINT_DIR \
--docker-image intel/intel-optimized-tensorflow:latest \
--verbose \
-- file=newstest2014.en file_out=translate.txt reference=newstest2014.de
```
The log file is saved to the location specified by `--output-dir`. If no value is specified, the log will be written to `models/benchmarks/common/tensorflow/logs` in the workspace.
The performance and accuracy reported in the log output when benchmarking completes should look
something like this (the actual throughput and inference time will vary):
```
Total inferencing time: xxx
Throughput: xxx sentences/second
Case-insensitive results: 27.636119723320007
Case-sensitive results: 27.127626538276672
```


@@ -0,0 +1,116 @@
<!--- 0. Title -->
# Transformer Language BFLOAT16 Inference

<!-- 10. Description -->
## Description

This document has instructions for running Transformer Language BFloat16 inference in the mlperf
benchmark suite using Intel-optimized TensorFlow.

Detailed information on the mlperf benchmark can be found in [mlperf/training](https://github.com/mlperf/training/tree/master/translation/tensorflow/transformer)

The inference code is based on the transformer mlperf evaluation code, but Intel has optimized the inference model by modifying the model code so that it achieves better performance on Intel CPUs.
The bfloat16 model was created by manually casting FP32 tensors to bfloat16 in the model, and we trained the modified model to reach the same or higher accuracy. The inference is based on the bfloat16 model we trained.

<!--- 30. Datasets -->
## Datasets

Decide which problem you want to run in order to get the appropriate dataset.
As an example, we will download and generate the necessary files from the training data:

Download the dataset used for computing the BLEU score:
```
export DATASET_DIR=/home/<user>/transformer_data
mkdir $DATASET_DIR && cd $DATASET_DIR
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.en
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.de
```
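
As a quick sanity check (optional; these are parallel source and reference files, so the two line counts should match):
```
cd $DATASET_DIR
wc -l newstest2014.en newstest2014.de   # both files should report the same number of lines (one sentence per line)
```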

For the training dataset, run the `data_download.py` script from the Model Zoo directory.
The Model Zoo directory comes with [AI Kit](/docs/general/tensorflow/AIKit.md). If
you are not using AI Kit, you will need a clone of the Model Zoo repo.
```
export PYTHONPATH=$PYTHONPATH:<model zoo dir>/models/common/tensorflow
export DATASET_DIR=/home/<user>/transformer_data
cd <model zoo dir>/models/language_translation/tensorflow/transformer_mlperf/training/fp32/transformer
python data_download.py --data_dir=$DATASET_DIR
```

Running `python data_download.py --data_dir=$DATASET_DIR` assumes you have a Python environment similar to what the `intel/intel-optimized-tensorflow:ubuntu-18.04` container provides. One option is to run the above within the `intel/intel-optimized-tensorflow:ubuntu-18.04` container, e.g.: `docker run -u $(id -u):$(id -g) --privileged --entrypoint /bin/bash -v /home/<user>:/home/<user> -it intel/intel-optimized-tensorflow:ubuntu-18.04`
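
For example (a sketch that mirrors the note above; the mounted path and Model Zoo location are illustrative), the download script can be run from inside that container:
```
# Start the container with your home directory mounted (illustrative mount path)
docker run -u $(id -u):$(id -g) --privileged --entrypoint /bin/bash \
    -v /home/<user>:/home/<user> -it intel/intel-optimized-tensorflow:ubuntu-18.04

# Then, inside the container, run the same steps as above
export PYTHONPATH=$PYTHONPATH:<model zoo dir>/models/common/tensorflow
export DATASET_DIR=/home/<user>/transformer_data
cd <model zoo dir>/models/language_translation/tensorflow/transformer_mlperf/training/fp32/transformer
python data_download.py --data_dir=$DATASET_DIR
```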

<!--- 40. Quick Start Scripts -->
## Quick Start Scripts

Transformer Language in the mlperf benchmark can run with full training or
with fewer training steps. During training we can control whether or not it will
do the evaluation.

## Run the model

Before running inference, users should have the model fully trained and have saved checkpoints ready at the path $CHECKPOINT_DIR.
To improve performance, we added a new script that generates a frozen model from a fully trained model checkpoint.
To generate the frozen model, run the following command in the transformer model directory where `export_transformer.py` is located:

```
export PYTHONPATH=$PYTHONPATH:<PATH_TO_MODEL_ZOO_ROOT>/models/common/tensorflow
python export_transformer.py --model_dir=<$CHECKPOINT_DIR> --pb_path=<frozen_graph_full_path>
```
The translation can be run in accuracy mode or benchmark mode. Benchmark mode runs for best performance, with warmup steps and the total number of steps set by the user. Accuracy mode simply tests accuracy, without setting warmup steps or steps.

#### Benchmark mode run:
```
python3 ./benchmarks/launch_benchmark.py \
--benchmark-only --framework tensorflow \
--in-graph=$PB_FILE \
--model-name transformer_mlperf \
--mode inference --precision bfloat16 \
--batch-size $BATCH_SIZE \
--num-intra-threads $NUM_CORES --num-inter-threads $NUM_SOCKETS \
--verbose \
--data-location $DATA_DIR \
--docker-image intel/intel-optimized-tensorflow:latest \
-- params=big \
file=newstest2014.en \
vocab_file=vocab.ende.32768 \
file_out=translation.en \
reference=newstest2014.de \
warmup_steps=3 \
steps=100
```
#### Accuracy mode run:
```
python3 ./benchmarks/launch_benchmark.py \
--accuracy-only --framework tensorflow \
--in-graph=$PB_FILE \
--model-name transformer_mlperf \
--mode inference --precision bfloat16 \
--batch-size $BATCH_SIZE \
--num-intra-threads $NUM_CORES --num-inter-threads $NUM_SOCKETS \
--verbose \
--data-location $DATA_DIR \
--docker-image intel/intel-optimized-tensorflow:latest \
-- params=big \
file=newstest2014.en \
vocab_file=vocab.ende.32768 \
file_out=translation.en \
reference=newstest2014.de \
steps=100
```
where (see the example exports below):
* $DATA_DIR -- the input data directory, which should include newstest2014.en, newstest2014.de, and vocab.ende.32768
* $PB_FILE -- the path of the frozen model generated with the script above
* steps -- the number of batches of data to feed into the model for inference; if the number is greater than the available batches in the input data, only the available batches will be run
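
The variables referenced above can be exported before launching; a minimal sketch with illustrative values (adjust for your own paths and hardware, e.g. check `lscpu` for core and socket counts):
```
# Illustrative values only -- adjust to your dataset location, frozen graph, and CPU topology.
export DATA_DIR=/home/<user>/transformer_data               # contains newstest2014.en, newstest2014.de, vocab.ende.32768
export PB_FILE=/home/<user>/transformer_mlperf_bfloat16.pb  # frozen graph produced by export_transformer.py
export BATCH_SIZE=64

# Thread settings passed to --num-intra-threads / --num-inter-threads (see `lscpu` for your system).
export NUM_CORES=28       # physical cores to use for intra-op parallelism
export NUM_SOCKETS=2      # sockets to use for inter-op parallelism
```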

The log file is saved to the location specified by `--output-dir`. If no value is specified, the log will be written to `models/benchmarks/common/tensorflow/logs` in the workspace (an example of setting `--output-dir` is shown after the sample output below).
With accuracy mode, the official BLEU score will be printed.

The performance and accuracy reported in the log output when benchmarking completes should look
something like this (the actual throughput and inference time will vary):
```
Total inferencing time: xxx
Throughput: xxx sentences/second
Case-insensitive results: 27.636119723320007
Case-sensitive results: 27.127626538276672
```
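
To keep the logs in a directory of your choosing rather than the default location, `--output-dir` can be added to either command above; a sketch of the benchmark-mode run with an illustrative output path:
```
export OUTPUT_DIR=/home/<user>/transformer_mlperf_logs
mkdir -p $OUTPUT_DIR

python3 ./benchmarks/launch_benchmark.py \
    --benchmark-only --framework tensorflow \
    --in-graph=$PB_FILE \
    --model-name transformer_mlperf \
    --mode inference --precision bfloat16 \
    --batch-size $BATCH_SIZE \
    --num-intra-threads $NUM_CORES --num-inter-threads $NUM_SOCKETS \
    --verbose \
    --data-location $DATA_DIR \
    --docker-image intel/intel-optimized-tensorflow:latest \
    --output-dir $OUTPUT_DIR \
    -- params=big \
    file=newstest2014.en \
    vocab_file=vocab.ende.32768 \
    file_out=translation.en \
    reference=newstest2014.de \
    warmup_steps=3 \
    steps=100
```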
