Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Previously, the prefetch GPU -> top GPU and prefetch CPU -> prefetch GPU copies were launched concurrently in separate streams, allowing the next batch to be copied in before the current one is read. This patch explicitly synchronizes the prefetch -> top copy wrt the host, preventing the CPU -> GPU from being launched until its completion.
- Loading branch information