
Enable mixed precision training for Transformer models #211

Merged 1 commit into OpenNMT:master on Oct 3, 2018

Conversation

guillaumekln (Contributor)

Closes #57.

guillaumekln merged commit 87f6f3c into OpenNMT:master on Oct 3, 2018
guillaumekln deleted the mixed-precision branch on Oct 3, 2018
wanghm92 added a commit to wanghm92/OpenNMT-tf that referenced this pull request on Jan 5, 2019
@mehmedes

Hi @guillaumekln,
Would you mind giving us some input on the speed gains you observed with your mixed precision implementation vs. FP32:
tensorflow/tensor2tensor#1221

@guillaumekln (Contributor, Author) commented Jan 18, 2019

Hi,

I gathered some fresh values on a P3 instance (1× V100) using the tensorflow/tensorflow:nightly-gpu-py3 Docker image. The same configuration is used in each run to highlight the raw gain:

  • Model type: TransformerBase (without shared weights)
  • Batch size: 8192
                    vocab size   step/s   source tokens/s   target tokens/s
FP32                32,001       2.64     18.1k             20.4k
FP16                32,001       3.56     24.5k             27.6k
FP16                32,000       4.03     27.8k             31.4k
FP16 (with #309)    32,000       4.68     32.8k             37.1k
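
As a minimal sketch of the general technique being benchmarked (variables kept in FP32, compute in FP16, dynamic loss scaling), here is how mixed precision can be enabled with TensorFlow's standard Keras API; this is not the OpenNMT-tf implementation from this PR, which pre-dates that API, and the toy model and layer sizes below are purely illustrative.

```python
# Minimal sketch of mixed precision with the standard Keras API (TF >= 2.4),
# not the OpenNMT-tf implementation from this PR; the toy model is illustrative.
import tensorflow as tf

# Variables are stored in float32; most ops run in float16.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(32000),                             # vocab-sized projection
    tf.keras.layers.Activation("softmax", dtype="float32"),   # keep the softmax in FP32
])

# Dynamic loss scaling prevents small FP16 gradients from underflowing to zero.
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())
model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy")
```

On Volta-class GPUs such as the V100 used above, the FP16 matrix multiplications are what map onto Tensor Cores and produce the speedups reported in the table.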

@mehmedes

Thank you for the feedback. Looks like we share the same fate :_(

@guillaumekln (Contributor, Author)

@mehmedes Please note that it's important to make the vocabulary size a multiple of 8. In my initial experiment it was actually 32,000 + 1 (the <unk> token). Changing it to 31,999 + 1 makes a difference; see the table above.
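
To illustrate the alignment point, here is a small hypothetical helper (not part of OpenNMT-tf) that rounds a dimension up to a multiple of 8 so the corresponding FP16 matrix multiplications map cleanly onto Tensor Cores; trimming the vocabulary, as done above, achieves the same alignment.

```python
# Hypothetical helper, not from OpenNMT-tf: round a dimension (vocabulary size,
# batch size, hidden size, ...) up to a multiple of 8 for Tensor Core efficiency.
def round_up_to_multiple(size: int, multiple: int = 8) -> int:
    return ((size + multiple - 1) // multiple) * multiple

assert round_up_to_multiple(32001) == 32008  # 32,000 + 1 would be padded up
assert round_up_to_multiple(32000) == 32000  # already aligned, as in the table above
```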

@guillaumekln (Contributor, Author)

Similarly, the batch size should ideally be a multiple of 8. With #309, additional gains are observed (see the updated table above).

@guillaumekln (Contributor, Author)

@mehmedes Here are additional data for a big Transformer model with a batch size of 4096 and the latest updates:

       step/s   source tokens/s   target tokens/s
FP32   1.92     6.6k              7.4k
FP16   4.27     15.3k             17.3k

So to summarize, here are the current gains (with equal batch size):

  • base Transformer: x1.77 (4.68 vs. 2.64 steps/s)
  • big Transformer: x2.22 (4.27 vs. 1.92 steps/s)

These are in line with the expected FP16 gains, but generally lower than what one can achieve in PyTorch, for example.

leod pushed a commit to leod/OpenNMT-tf that referenced this pull request on Jun 3, 2023