Cuixiaom/transformer mlperf int8 pr (intel#133)
* Added transformer_mlperf int8 inference model into model zoo

* Added missing transformer_mlperf int8 launch_benchmark api files

* Updated the README files and moved the instructions for running transformer_mlperf inference into the README files in the benchmarks directory

* Added README files for the transformer_mlperf inference model with different data types

* Minor typo fixes

* Minor README changes
cuixiaom committed Sep 20, 2021
1 parent 469a073 commit 24e4444
Showing 33 changed files with 5,386 additions and 240 deletions.
1 change: 1 addition & 0 deletions benchmarks/README.md
@@ -45,6 +45,7 @@ For information on running more advanced use cases using the workload containers
| Language Translation | [BERT](https://arxiv.org/pdf/1810.04805.pdf) | Inference | | [FP32](language_translation/tensorflow/bert/README.md#fp32-inference-instructions) |
| Language Translation | [GNMT*](https://arxiv.org/pdf/1609.08144.pdf) | Inference | Model Containers: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/gnmt-fp32-inference-tensorflow-container.html) <br> Model Packages: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/gnmt-fp32-inference-tensorflow-model.html) | [FP32](language_translation/tensorflow/mlperf_gnmt/inference/fp32/README.md) |
| Language Translation | [Transformer_LT_mlperf](https://arxiv.org/pdf/1706.03762.pdf) | Training | Model Containers: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/transformer-lt-mlperf-fp32-training-tensorflow-container.html) [BFloat16**](https://software.intel.com/content/www/us/en/develop/articles/containers/transformer-lt-mlperf-bfloat16-training-tensorflow-container.html) <br> Model Packages: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/transformer-lt-mlperf-fp32-training-tensorflow-model.html) [BFloat16**](https://software.intel.com/content/www/us/en/develop/articles/containers/transformer-lt-mlperf-bfloat16-training-tensorflow-model.html) | [FP32](language_translation/tensorflow/transformer_mlperf/training/fp32/README.md) [BFloat16**](language_translation/tensorflow/transformer_mlperf/training/bfloat16/README.md) |
| Language Translation | [Transformer_LT_mlperf](https://arxiv.org/pdf/1706.03762.pdf) | Inference | | [FP32](language_translation/tensorflow/transformer_mlperf/inference/fp32/README.md) [BFloat16**](language_translation/tensorflow/transformer_mlperf/inference/bfloat16/README.md) [INT8](language_translation/tensorflow/transformer_mlperf/inference/int8/README.md) |
| Language Translation | [Transformer_LT_Official](https://arxiv.org/pdf/1706.03762.pdf) | Inference | Model Containers: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/transformer-lt-official-fp32-inference-tensorflow-container.html) <br> Model Packages: [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/transformer-lt-official-fp32-inference-tensorflow-model.html) | [FP32](language_translation/tensorflow/transformer_lt_official/inference/fp32/README.md) |
| Object Detection | [Faster R-CNN](https://arxiv.org/pdf/1506.01497.pdf) | Inference | Model Containers: [Int8](https://software.intel.com/content/www/us/en/develop/articles/containers/faster-rcnn-int8-inference-tensorflow-container.html) [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/faster-rcnn-fp32-inference-tensorflow-container.html) <br> Model Packages: [Int8](https://software.intel.com/content/www/us/en/develop/articles/containers/faster-rcnn-int8-inference-tensorflow-model.html) [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/faster-rcnn-fp32-inference-tensorflow-model.html) | [Int8](object_detection/tensorflow/faster_rcnn/inference/int8/README.md) [FP32](object_detection/tensorflow/faster_rcnn/inference/fp32/README.md) |
| Object Detection | [R-FCN](https://arxiv.org/pdf/1605.06409.pdf) | Inference | Model Containers: [Int8](https://software.intel.com/content/www/us/en/develop/articles/containers/rfcn-int8-inference-tensorflow-container.html) [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/rfcn-fp32-inference-tensorflow-container.html) <br> Model Packages: [Int8](https://software.intel.com/content/www/us/en/develop/articles/containers/rfcn-int8-inference-tensorflow-model.html) [FP32](https://software.intel.com/content/www/us/en/develop/articles/containers/rfcn-fp32-inference-tensorflow-model.html) | [Int8](object_detection/tensorflow/rfcn/inference/int8/README.md) [FP32](object_detection/tensorflow/rfcn/inference/fp32/README.md) |
2 changes: 1 addition & 1 deletion benchmarks/common/tensorflow/start.sh
@@ -1344,7 +1344,7 @@ function transformer_mlperf() {
fi

if [[ ${MODE} == "inference" ]]; then
if [[ (${PRECISION} == "bfloat16") || ( ${PRECISION} == "fp32") ]]; then
if [[ (${PRECISION} == "bfloat16") || ( ${PRECISION} == "fp32") || ( ${PRECISION} == "int8") ]]; then

if [[ -z "${params}" ]]; then
echo "transformer-language requires --params arg to be defined"
@@ -4,118 +4,8 @@ The following documents have instructions for how to run Transformer Language used in mlperf
Benchmark suites for the following modes/platforms:
* [FP32 training](/benchmarks/language_translation/tensorflow/transformer_mlperf/training/fp32/README.md)
* [Bfloat16 training](/benchmarks/language_translation/tensorflow/transformer_mlperf/training/bfloat16/README.md)
* [FP32 inference](#fp32-inference-instructions)
* [Bfloat16 inference](#bfloat16-inference-instructions)
* [FP32 inference](/benchmarks/language_translation/tensorflow/transformer_mlperf/inference/fp32/README.md)
* [Bfloat16 inference](/benchmarks/language_translation/tensorflow/transformer_mlperf/inference/bfloat16/README.md)
* [Int8 inference](/benchmarks/language_translation/tensorflow/transformer_mlperf/inference/int8/README.md)

Detailed information on the benchmark can be found in [mlperf/training](https://github.com/mlperf/training/tree/master/translation/tensorflow/transformer)

# <a name="fp32-inference-instructions"></a> FP32 Inference Instructions

1. Clone this [intelai/models](https://github.com/IntelAI/models)
repository:

```
git clone https://github.com/IntelAI/models.git
```

2. Obtain the dataset.
Decide which problem you want to run in order to get the appropriate dataset.
As an example, we will use the English-German data:

Download the dataset used for computing the BLEU score reported in the paper:
```
export DATA_DIR=/home/<user>/transformer_data
mkdir $DATA_DIR && cd $DATA_DIR
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.en
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.de
```

3. Next, navigate to the `benchmarks` directory in your local clone of
the [intelai/models](https://github.com/IntelAI/models) repo (from step 1).
The `launch_benchmark.py` script in the `benchmarks` directory is
used for starting a model run in an optimized TensorFlow docker
container. It has arguments to specify which model, framework, mode,
precision, and docker image to use, along with the path to your dataset location (from step 2).


Before running inference, users should have the model fully trained and have saved checkpoints ready at the path $CHECKPOINT_DIR:

```
python launch_benchmark.py \
--framework tensorflow \
--precision fp32 \
--mode inference \
--model-name transformer_mlperf \
--batch-size 64 \
-i 0 --data-location $DATA_DIR \
--checkpoint $CHECKPOINT_DIR \
--docker-image intel/intel-optimized-tensorflow:latest \
--verbose \
-- file=newstest2014.en file_out=translate.txt reference=newstest2014.de
```
4. The log file is saved to the location specified by `--output-dir`. If no value is specified, the log will be written to `models/benchmarks/common/tensorflow/logs` in the workspace.
The performance and accuracy reported in the log output when benchmarking completes should look
something like this (the actual throughput and inference time will vary):
```
Total inferencing time: xxx
Throughput: xxx sentences/second
Case-insensitive results: 26.694846153259277
Case-sensitive results: 26.182371377944946
```


## <a name="bfloat16-inference-instructions"></a> Bfloat16 Inference Instructions

1. Clone this [intelai/models](https://github.com/IntelAI/models)
repository:

```
git clone https://github.com/IntelAI/models.git
```

2. Obtain the dataset.
Decide which problem you want to run in order to get the appropriate dataset.
As an example, we will use the English-German data:

Download the dataset used for computing the BLEU score reported in the paper:
```
export DATA_DIR=/home/<user>/transformer_data
mkdir $DATA_DIR && cd $DATA_DIR
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.en
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.de
```

3. Next, navigate to the `benchmarks` directory in your local clone of
the [intelai/models](https://github.com/IntelAI/models) repo (from step 1).
The `launch_benchmark.py` script in the `benchmarks` directory is
used for starting a model run in an optimized TensorFlow docker
container. It has arguments to specify which model, framework, mode,
precision, and docker image to use, along with the path to your dataset location (from step 2).


Before running inference, users should have the model fully trained and have saved checkpoints ready at the path $CHECKPOINT_DIR:

```
python launch_benchmark.py \
--framework tensorflow \
--precision bfloat16 \
--mode inference \
--model-name transformer_mlperf \
--batch-size 64 \
-i 0 --data-location $DATA_DIR \
--checkpoint $CHECKPOINT_DIR \
--docker-image intel/intel-optimized-tensorflow:latest \
--verbose \
-- file=newstest2014.en file_out=translate.txt reference=newstest2014.de
```
The log file is saved to the location specified by `--output-dir`. If no value is specified, the log will be written to `models/benchmarks/common/tensorflow/logs` in the workspace.
The performance and accuracy reported in the log output when benchmarking completes should look
something like this (the actual throughput and inference time will vary):
```
Total inferencing time: xxx
Throughput: xxx sentences/second
Case-insensitive results: 27.636119723320007
Case-sensitive results: 27.127626538276672
```


@@ -0,0 +1,116 @@
<!--- 0. Title -->
# Transformer Language BFLOAT16 Inference

<!-- 10. Description -->
## Description

This document has instructions for running Transformer Language BFloat16 inference in the mlperf
benchmark suite using Intel-optimized TensorFlow.

Detailed information on the mlperf benchmark can be found in [mlperf/training](https://github.com/mlperf/training/tree/master/translation/tensorflow/transformer)

The inference code is based on the transformer mlperf evaluation code, but Intel has optimized the inference model by modifying the model code so that it achieves better performance on Intel CPUs.
The bfloat16 model was created by manually casting FP32 tensors to bfloat16 in the model, and we trained the modified model to reach the same or higher accuracy. The inference is based on the bfloat16 model we trained.

<!--- 30. Datasets -->
## Datasets

Decide which problem you want to run in order to get the appropriate dataset.
As an example, we will download and generate the necessary files from the training data:

Download the dataset used for computing the BLEU score:
```
export DATASET_DIR=/home/<user>/transformer_data
mkdir $DATASET_DIR && cd $DATASET_DIR
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.en
wget https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/newstest2014.de
```
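
As a quick sanity check (optional; these are parallel source and reference files, so the two line counts should match):
```
cd $DATASET_DIR
wc -l newstest2014.en newstest2014.de   # both files should report the same number of lines (one sentence per line)
```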

For the training dataset, run the `data_download.py` script from the Model Zoo directory.
The Model Zoo directory comes with [AI Kit](/docs/general/tensorflow/AIKit.md). If
you are not using AI Kit, you will need a clone of the Model Zoo repo.
```
export PYTHONPATH=$PYTHONPATH:<model zoo dir>/models/common/tensorflow
export DATASET_DIR=/home/<user>/transformer_data
cd <model zoo dir>/models/language_translation/tensorflow/transformer_mlperf/training/fp32/transformer
python data_download.py --data_dir=$DATASET_DIR
```

Running `python data_download.py --data_dir=$DATASET_DIR` assumes you have a Python environment similar to what the `intel/intel-optimized-tensorflow:ubuntu-18.04` container provides. One option is to run the above within the `intel/intel-optimized-tensorflow:ubuntu-18.04` container, e.g.: `docker run -u $(id -u):$(id -g) --privileged --entrypoint /bin/bash -v /home/<user>:/home/<user> -it intel/intel-optimized-tensorflow:ubuntu-18.04`
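
For example (a sketch that mirrors the note above; the mounted path and Model Zoo location are illustrative), the download script can be run from inside that container:
```
# Start the container with your home directory mounted (illustrative mount path)
docker run -u $(id -u):$(id -g) --privileged --entrypoint /bin/bash \
    -v /home/<user>:/home/<user> -it intel/intel-optimized-tensorflow:ubuntu-18.04

# Then, inside the container, run the same steps as above
export PYTHONPATH=$PYTHONPATH:<model zoo dir>/models/common/tensorflow
export DATASET_DIR=/home/<user>/transformer_data
cd <model zoo dir>/models/language_translation/tensorflow/transformer_mlperf/training/fp32/transformer
python data_download.py --data_dir=$DATASET_DIR
```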

<!--- 40. Quick Start Scripts -->
## Quick Start Scripts

Transformer Language in the mlperf benchmark can run with full training or
with fewer training steps. During training we can control whether or not it will
do the evaluation.

## Run the model

Before running inference, users should have the model fully trained and have saved checkpoints ready at the path $CHECKPOINT_DIR.
To improve performance, we added a new script that generates a frozen model from a fully trained model checkpoint.
To generate the frozen model, run the following command in the transformer model directory where `export_transformer.py` is located:

```
export PYTHONPATH=$PYTHONPATH:<PATH_TO_MODEL_ZOO_ROOT>/models/common/tensorflow
python export_transformer.py --model_dir=<$CHECKPOINT_DIR> --pb_path=<frozen_graph_full_path>
```
The translation can be run in accuracy mode or benchmark mode. Benchmark mode runs for best performance, with warmup steps and the total number of steps set by the user. Accuracy mode simply tests accuracy, without setting warmup steps or steps.

#### Benchmark mode run:
```
python3 ./benchmarks/launch_benchmark.py \
--benchmark-only --framework tensorflow \
--in-graph=$PB_FILE \
--model-name transformer_mlperf \
--mode inference --precision bfloat16 \
--batch-size $BATCH_SIZE \
--num-intra-threads $NUM_CORES --num-inter-threads $NUM_SOCKETS \
--verbose \
--data-location $DATA_DIR \
--docker-image intel/intel-optimized-tensorflow:latest \
-- params=big \
file=newstest2014.en \
vocab_file=vocab.ende.32768 \
file_out=translation.en \
reference=newstest2014.de \
warmup_steps=3 \
steps=100
```
#### Accuracy mode run:
```
python3 ./benchmarks/launch_benchmark.py \
--accuracy-only --framework tensorflow \
--in-graph=$PB_FILE \
--model-name transformer_mlperf \
--mode inference --precision bfloat16 \
--batch-size $BATCH_SIZE \
--num-intra-threads $NUM_CORES --num-inter-threads $NUM_SOCKETS \
--verbose \
--data-location $DATA_DIR \
--docker-image intel/intel-optimized-tensorflow:latest \
-- params=big \
file=newstest2014.en \
vocab_file=vocab.ende.32768 \
file_out=translation.en \
reference=newstest2014.de \
steps=100
```
where (see the example exports below):
* $DATA_DIR -- the input data directory, which should include newstest2014.en, newstest2014.de, and vocab.ende.32768
* $PB_FILE -- the path of the frozen model generated with the script above
* steps -- the number of batches of data to feed into the model for inference; if the number is greater than the available batches in the input data, only the available batches will be run
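
The variables referenced above can be exported before launching; a minimal sketch with illustrative values (adjust for your own paths and hardware, e.g. check `lscpu` for core and socket counts):
```
# Illustrative values only -- adjust to your dataset location, frozen graph, and CPU topology.
export DATA_DIR=/home/<user>/transformer_data               # contains newstest2014.en, newstest2014.de, vocab.ende.32768
export PB_FILE=/home/<user>/transformer_mlperf_bfloat16.pb  # frozen graph produced by export_transformer.py
export BATCH_SIZE=64

# Thread settings passed to --num-intra-threads / --num-inter-threads (see `lscpu` for your system).
export NUM_CORES=28       # physical cores to use for intra-op parallelism
export NUM_SOCKETS=2      # sockets to use for inter-op parallelism
```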

The log file is saved to the location specified by `--output-dir`. If no value is specified, the log will be written to `models/benchmarks/common/tensorflow/logs` in the workspace (an example of setting `--output-dir` is shown after the sample output below).
With accuracy mode, the official BLEU score will be printed.

The performance and accuracy reported in the log output when benchmarking completes should look
something like this (the actual throughput and inference time will vary):
```
Total inferencing time: xxx
Throughput: xxx sentences/second
Case-insensitive results: 27.636119723320007
Case-sensitive results: 27.127626538276672
```
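
To keep the logs in a directory of your choosing rather than the default location, `--output-dir` can be added to either command above; a sketch of the benchmark-mode run with an illustrative output path:
```
export OUTPUT_DIR=/home/<user>/transformer_mlperf_logs
mkdir -p $OUTPUT_DIR

python3 ./benchmarks/launch_benchmark.py \
    --benchmark-only --framework tensorflow \
    --in-graph=$PB_FILE \
    --model-name transformer_mlperf \
    --mode inference --precision bfloat16 \
    --batch-size $BATCH_SIZE \
    --num-intra-threads $NUM_CORES --num-inter-threads $NUM_SOCKETS \
    --verbose \
    --data-location $DATA_DIR \
    --docker-image intel/intel-optimized-tensorflow:latest \
    --output-dir $OUTPUT_DIR \
    -- params=big \
    file=newstest2014.en \
    vocab_file=vocab.ende.32768 \
    file_out=translation.en \
    reference=newstest2014.de \
    warmup_steps=3 \
    steps=100
```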
