fix GPU data race

Previously, the prefetch GPU -> top GPU and prefetch CPU -> prefetch GPU copies were launched concurrently in separate streams, so the next batch could be copied into the prefetch buffer before the current one had been read out of it. This patch explicitly synchronizes the prefetch -> top copy with respect to the host, so the CPU -> GPU copy of the next batch cannot be launched until the prefetch -> top copy has completed.
fangzheng354 · Aug 30, 2015 · 846f2c3
1 parent 4c561fd
Showing 1 changed file with 3 additions and 1 deletion.
diff --git a/src/caffe/layers/base_data_layer.cu b/src/caffe/layers/base_data_layer.cu
@@ -20,7 +20,9 @@ void BasePrefetchingDataLayer<Dtype>::Forward_gpu(
     caffe_copy(batch->label_.count(), batch->label_.gpu_data(),
         top[1]->mutable_gpu_data());
   }
-
+  // Ensure the copy is synchronous wrt the host, so that the next batch isn't
+  // copied in meanwhile.
+  CUDA_CHECK(cudaStreamSynchronize(cudaStreamDefault));
   prefetch_free_.push(batch);
 }
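
The ordering this patch enforces can be illustrated with a host-only C++ model. This is a sketch, not Caffe code and not CUDA: `forward_with_sync`, `prefetch`, and `buffer_free` are hypothetical names, a `std::async` task stands in for the asynchronous prefetch -> top copy, `copy.wait()` plays the role of `cudaStreamSynchronize`, and fulfilling the promise stands in for `prefetch_free_.push(batch)`.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Host-only model of the fixed ordering (hypothetical names, no CUDA).
// Returns the data that the "top" blob ends up holding.
std::vector<int> forward_with_sync() {
  std::vector<int> prefetch(4, 1);  // current batch: all ones
  std::promise<void> buffer_free;   // models prefetch_free_.push(batch)
  std::future<void> free_signal = buffer_free.get_future();

  // Models the prefetch thread: it refills the buffer with the next batch
  // as soon as the buffer is handed back to it.
  std::thread producer([&prefetch, &free_signal] {
    free_signal.wait();
    prefetch.assign(4, 2);          // next batch: all twos
  });

  // Models the asynchronous prefetch -> top copy: the call returns at once,
  // but the buffer is only read some time later.
  std::future<std::vector<int>> copy =
      std::async(std::launch::async, [&prefetch] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        return prefetch;
      });

  copy.wait();                      // models cudaStreamSynchronize(...)
  buffer_free.set_value();          // release the buffer only after the copy
  producer.join();
  return copy.get();
}
```

Because the buffer is released only after the wait, the copy in this model always sees the current batch (all ones). Releasing the buffer before the wait would let the producer overwrite it while the copy is still pending, which is the race described in the commit message.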