
Fix TSAN data race in EventListenerTest.MultiCF #9528

Closed
hx235 wants to merge 1 commit

Conversation

@hx235 (Contributor) commented Feb 8, 2022

Context:
EventListenerTest.MultiCF occasionally failed with a TSAN data race report like the one below:

```
WARNING: ThreadSanitizer: data race (pid=2047633)
  Read of size 8 at 0x7b6000001440 by main thread:
    #0 std::vector<rocksdb::DB*, std::allocator<rocksdb::DB*> >::size()  (listener_test+0x52337c)
    #1 rocksdb::EventListenerTest_MultiCF_Test::TestBody() db/listener_test.cc:384:7 (listener_test+0x52337c)

  Previous write of size 8 at 0x7b6000001440 by thread T2:
    #0 void std::vector<rocksdb::DB*, std::allocator<rocksdb::DB*> >::_M_realloc_insert<rocksdb::DB* const&>(__gnu_cxx::__normal_iterator<rocksdb::DB**, std::vector<rocksdb::DB*, std::allocator<rocksdb::DB*> > >, rocksdb::DB* const&) (listener_test+0x550654)
    #1 std::vector<rocksdb::DB*, std::allocator<rocksdb::DB*> >::push_back(rocksdb::DB* const&)  (listener_test+0x550654)
    #2 rocksdb::TestFlushListener::OnFlushCompleted(rocksdb::DB*, rocksdb::FlushJobInfo const&) db/listener_test.cc:255:18 (listener_test+0x550654)
```

After investigation, it is due to the following:
(1) The ASSERT_OK(Flush(i)); that precedes the read std::vector::size() is supposed to block on DBImpl::bg_cv_ until the memtable flush finishes, and to be signaled at the end of the background flush, which happens after the write std::vector::push_back(). The sequence of execution should therefore have been synchronized as call flush() -> write -> return from flush() -> read, and would not have caused any TSAN data race.

(2) However, there are multiple places in the code that can signal this DBImpl::bg_cv_ and mistakenly wake up ASSERT_OK(Flush(i)); (or ASSERT_OK(dbfull()->TEST_WaitForFlushMemTable());) too early (with the lock available to them), resulting in a non-synchronized read and write and thus a TSAN data race.
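For reference, the wait behind TEST_WaitForFlushMemTable() is a standard condition-variable predicate loop. The following is a minimal, self-contained sketch of that pattern (illustrative only, not the actual RocksDB source; the stand-in names are assumptions), showing why the predicate only orders the waiter after the memtable state change, not after the listener callbacks:

```
#include <condition_variable>
#include <mutex>

struct DbSketch {
  std::mutex mutex_;               // stands in for DBImpl::mutex_
  std::condition_variable bg_cv_;  // stands in for DBImpl::bg_cv_
  int num_not_flushed_ = 0;        // stands in for cfd->imm()->NumNotFlushed()

  // The predicate only covers memtable state: once the flushed memtable is
  // installed (num_not_flushed_ reaches 0), ANY notify on bg_cv_ lets this
  // return -- even if NotifyOnFlushCompleted() (and thus the test listener's
  // push_back) has not run yet.
  void WaitForFlushMemTable() {
    std::unique_lock<std::mutex> l(mutex_);
    bg_cv_.wait(l, [&] { return num_not_flushed_ == 0; });
  }
};
```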

  • Reproduced by the following, suggested by @ajkr:
```
diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc
index 4ff87c1e4..52492e9cf 100644
--- a/db/db_impl/db_impl_compaction_flush.cc
+++ b/db/db_impl/db_impl_compaction_flush.cc
@@ -22,7 +22,7 @@
 #include "test_util/sync_point.h"
 #include "util/cast_util.h"
 #include "util/concurrent_task_limiter_impl.h"
 namespace ROCKSDB_NAMESPACE {

 bool DBImpl::EnoughRoomForCompaction(
@@ -855,6 +855,7 @@ void DBImpl::NotifyOnFlushCompleted(
        mutable_cf_options.level0_stop_writes_trigger);
   // release lock while notifying events
   mutex_.Unlock();
+  bg_cv_.SignalAll();
```

Summary:

  • Added synchronization between the read and the write via the ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency() mechanism (sketched below)
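To illustrate the mechanism (the marker names below are hypothetical; the actual patch defines its own), LoadDependency() establishes a happens-before edge between two named sync points, so the test read cannot start until the listener write has finished:

```
#include "test_util/sync_point.h"

// Hypothetical marker names, for illustration only.
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
    {{"TestFlushListener::OnFlushCompleted:PostPushBack",  // predecessor
      "EventListenerTest.MultiCF:PreVerifyState"}});       // successor
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();

// In TestFlushListener::OnFlushCompleted(), after flushed_dbs_.push_back(db):
TEST_SYNC_POINT("TestFlushListener::OnFlushCompleted:PostPushBack");

// In the test body, just before reading listener->flushed_dbs_.size():
TEST_SYNC_POINT("EventListenerTest.MultiCF:PreVerifyState");
```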

Test: ./listener_test --gtest_filter=EventListenerTest.MultiCF --gtest_repeat=10

  • pre-fix:
```
Repeating all tests (iteration 3)
Note: Google Test filter = EventListenerTest.MultiCF
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from EventListenerTest
[ RUN      ] EventListenerTest.MultiCF
==================
WARNING: ThreadSanitizer: data race (pid=3377137)
  Read of size 8 at 0x7b6000000840 by main thread:
    #0 std::vector<rocksdb::DB*, std::allocator<rocksdb::DB*> >::size()
    #1 rocksdb::EventListenerTest_MultiCF_Test::TestBody() db/listener_test.cc:384 (listener_test+0x4bb300)

  Previous write of size 8 at 0x7b6000000840 by thread T2:
    #0 void std::vector<rocksdb::DB*, std::allocator<rocksdb::DB*> >::_M_realloc_insert<rocksdb::DB* const&>(__gnu_cxx::__normal_iterator<rocksdb::DB**, std::vector<rocksdb::DB*, std::allocator<rocksdb::DB*> > >, rocksdb::DB* const&)
    #1 std::vector<rocksdb::DB*, std::allocator<rocksdb::DB*> >::push_back(rocksdb::DB* const&)
    #2 rocksdb::TestFlushListener::OnFlushCompleted(rocksdb::DB*, rocksdb::FlushJobInfo const&) db/listener_test.cc:255 (listener_test+0x4e820f)
```
  • post-fix: All passed

@facebook-github-bot: @hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot: @hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot: @hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ajkr (Contributor) commented Feb 9, 2022, on the following snippet under review:

```
ASSERT_OK(Flush(i));
ASSERT_OK(dbfull()->TEST_WaitForFlushMemTable());
ASSERT_EQ(listener->flushed_dbs_.size(), i);
ASSERT_EQ(listener->flushed_column_family_names_.size(), i);
ASSERT_TRUE(call_back);
```
I would have thought the problem is that TEST_WaitForFlushMemTable does not wait for the event listener to finish. What did you think the problem is?

Try this patch, for example:

```
diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc
index 4ff87c1e4..8afb1c782 100644
--- a/db/db_impl/db_impl_compaction_flush.cc
+++ b/db/db_impl/db_impl_compaction_flush.cc
@@ -855,6 +855,8 @@ void DBImpl::NotifyOnFlushCompleted(
        mutable_cf_options.level0_stop_writes_trigger);
   // release lock while notifying events
   mutex_.Unlock();
+  bg_cv_.SignalAll();
+  usleep(100000);
   {
     for (auto& info : *flush_jobs_info) {
       info->triggered_writes_slowdown = triggered_writes_slowdown;
```

@hx235 (Contributor, Author) commented Feb 9, 2022

Yeah, that's the problem, but I am thinking of fixing it in the way we write the test. The reason is that the design of "// release lock while notifying events" seems valid for optimizing performance, and TEST_WaitForFlushMemTable's underlying bg_cv_ seems to be waiting on that same lock.

And technically, TEST_WaitForFlushMemTable waits for the memtable to be flushed but does not necessarily need to cover the listeners.

Maybe we could have another TEST_WaitForFlushNotificationFinish, which would essentially be the same as using a sync point here, without using another cv (see the hypothetical sketch after this comment).

(Intended to write more context in the PR but got distracted...)

(I am also open to more discussion on fixing these flaky cases, including #9516, as they are not urgent to merge.)
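For concreteness, a hypothetical sketch of such a helper (it does not exist in RocksDB; the counter name and placement are assumptions). Per the comment above, it reuses bg_cv_ rather than adding a second condition variable:

```
#include <condition_variable>
#include <mutex>

struct DbSketch {
  std::mutex mutex_;                     // stands in for DBImpl::mutex_
  std::condition_variable bg_cv_;        // reused; no second cv needed
  int pending_flush_notifications_ = 0;  // hypothetical counter: ++ before the
                                         // listeners run, -- (then notify_all)
                                         // after they return, both under mutex_

  // Returns only after all in-flight flush-completed notifications
  // (including user listeners) have finished.
  void TEST_WaitForFlushNotificationFinish() {
    std::unique_lock<std::mutex> l(mutex_);
    bg_cv_.wait(l, [&] { return pending_flush_notifications_ == 0; });
  }
};
```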

@ajkr (Contributor) commented:

I think it is close to fixing it in the test, as you said, but the reason I commented on this line in particular is that call_back might not have been set to true by the time this assertion executes, because TEST_WaitForFlushMemTable() doesn't wait long enough.

@hx235 (Contributor, Author) commented Feb 9, 2022

> I think it is close to fixing it in the test, as you said, but the reason I commented on this line in particular is that call_back might not have been set to true by the time this assertion executes, because TEST_WaitForFlushMemTable() doesn't wait long enough.

Oh ..... yep, I somehow overlooked that and repeated the very mistake behind the flaky test …

@hx235 hx235 marked this pull request as draft February 9, 2022 01:12
@hx235 hx235 changed the title Attempt to fix TSAN data race in EventListenerTest.MultiCF [Draft] Attempt to fix TSAN data race in EventListenerTest.MultiCF Feb 9, 2022
@facebook-github-bot
Copy link
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@hx235 hx235 changed the title [Draft] Attempt to fix TSAN data race in EventListenerTest.MultiCF Fix TSAN data race in EventListenerTest.MultiCF Feb 10, 2022
@ajkr (Contributor) left a comment:

LGTM. I was thinking you could assert the callback was called 8 times after Close()
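A hypothetical version of that suggested check (the count and member names are assumptions based on the test context, not shipped code): after Close(), no background flush or listener can still be running, so the listener state can be read without racing:

```
// Hypothetical assertion after Close(); names assumed from the test context.
Close();
ASSERT_EQ(listener->flushed_dbs_.size(), 8);  // one entry per completed flush
```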

@hx235 (Contributor, Author) commented Feb 10, 2022

> LGTM. I was thinking you could assert the callback was called 8 times after Close()

I was going to, until I realized there is something called SyncPoint::LoadDependency(). And by the way, still writing my context....

@ajkr (Contributor) commented Feb 10, 2022

> SyncPoint::LoadDependency()

Works for me.

@facebook-github-bot: @hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot: @hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hx235 hx235 marked this pull request as ready for review February 10, 2022 03:08
facebook-github-bot pushed a commit that referenced this pull request Feb 22, 2022
Summary:
We often see flaky tests due to `DB::Flush()` or `DBImpl::TEST_WaitForFlushMemTable()` not waiting until event listeners complete. For example, #9084, #9400, #9528, plus two new ones this week: "EventListenerTest.OnSingleDBFlushTest" and "DBFlushTest.FireOnFlushCompletedAfterCommittedResult". I ran a `make check` with the below race-condition-coercing patch and fixed the issues it found, except for old BlobDB.

```
diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc
index 0e1864788..aaba68c4a 100644
--- a/db/db_impl/db_impl_compaction_flush.cc
+++ b/db/db_impl/db_impl_compaction_flush.cc
+++ b/db/db_impl/db_impl_compaction_flush.cc
@@ -861,6 +861,8 @@ void DBImpl::NotifyOnFlushCompleted(
        mutable_cf_options.level0_stop_writes_trigger);
   // release lock while notifying events
   mutex_.Unlock();
+  bg_cv_.SignalAll();
+  sleep(1);
   {
     for (auto& info : *flush_jobs_info) {
       info->triggered_writes_slowdown = triggered_writes_slowdown;
```

The reason I did not fix the old BlobDB issues is that it appears to have a fundamental (non-test) issue. In particular, it uses an EventListener to keep track of the files. OnFlushCompleted() could be delayed until even after a compaction involving that flushed file completes, causing the compaction to unexpectedly delete an untracked file.

Pull Request resolved: #9617

Test Plan: `make check` including the race condition coercing patch

Reviewed By: hx235

Differential Revision: D34384022

Pulled By: ajkr

fbshipit-source-id: 2652ded39b415277c5d6a628414345223930514e
facebook-github-bot pushed a commit that referenced this pull request Sep 12, 2022
…race condition (#10291)

Summary:
**Context/Summary:**
Previous experience with bugs and flaky tests taught us that there exist features in RocksDB vulnerable to race conditions caused by acquiring the db mutex at a particular timing. This PR aggressively exposes those vulnerable features by injecting spurious wakeups and sleeps so that the db mutex is acquired at various timings, in order to expose such race conditions.
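To make the injection idea concrete, here is a minimal, self-contained sketch (the function name and injection site are assumptions for illustration, not the actual #10291 implementation): before acquiring the db mutex, wake all bg_cv_ waiters and sleep briefly, so that other threads acquire the mutex at unusual timings.

```
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Sketch only: wraps db-mutex acquisition with a spurious wakeup plus a
// short sleep when COERCE_CONTEXT_SWITCH is defined.
void CoercedLock(std::mutex& db_mutex, std::condition_variable& bg_cv) {
#ifdef COERCE_CONTEXT_SWITCH
  bg_cv.notify_all();  // spurious wakeup for any predicate loop on bg_cv
  std::this_thread::sleep_for(std::chrono::milliseconds(1));
#endif
  db_mutex.lock();
}
```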

**Testing:**
- `COERCE_CONTEXT_SWITCH=1 make -j56 check` / `make -j56 db_stress` should reveal
    - flaky tests caused by db-mutex-related race conditions
       - Reverted #9528
       - A/B testing on `COMPILE_WITH_TSAN=1 make -j56 listener_test` w/ and w/o `COERCE_CONTEXT_SWITCH=1`, followed by `./listener_test --gtest_filter=EventListenerTest.MultiCF --gtest_repeat=10`
       - `COERCE_CONTEXT_SWITCH=1` can cause the expected test failure (i.e., expose the target TSAN data race error) within 10 runs, while the other couldn't.
       - This proves our injection can expose flaky tests caused by db-mutex-related race conditions faster.
    - known or new race-condition-type internal bugs, by continuously running this PR
- Performance
   - High ops-threads time: COERCE_CONTEXT_SWITCH=1 regressed to about 4 times slower (2:01.16 vs 0:22.10 elapsed). This PR will be run as a separate CI job, so this regression won't affect any existing job.
```
TEST_TMPDIR=$db /usr/bin/time ./db_stress \
--ops_per_thread=100000 --expected_values_dir=$exp --clear_column_family_one_in=0 \
--write_buffer_size=524288 --target_file_size_base=524288 --ingest_external_file_one_in=100 --compact_files_one_in=1000 --compact_range_one_in=1000
```
  - Start-up time: COERCE_CONTEXT_SWITCH=1 did not regress start-up time significantly (0:01.51 vs 0:01.29 elapsed, under 25%)
```
TEST_TMPDIR=$db ./db_stress -ops_per_thread=100000000 -expected_values_dir=$exp --clear_column_family_one_in=0 & sleep 120; pkill -9 db_stress

TEST_TMPDIR=$db /usr/bin/time ./db_stress \
--ops_per_thread=1 -reopen=0 --expected_values_dir=$exp --clear_column_family_one_in=0 --destroy_db_initially=0
```

Pull Request resolved: #10291

Reviewed By: ajkr

Differential Revision: D39231182

Pulled By: hx235

fbshipit-source-id: 7ab6695430460e0826727fd8c66679b32b3e44b6