Fix TSAN data race in EventListenerTest.MultiCF #9528
Conversation
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hx235 force-pushed from a3c388d to b81a0d9.

@hx235 has updated the pull request. You must reimport the pull request before landing.

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
db/listener_test.cc (outdated):

```cpp
ASSERT_OK(Flush(i));
ASSERT_OK(dbfull()->TEST_WaitForFlushMemTable());
ASSERT_EQ(listener->flushed_dbs_.size(), i);
ASSERT_EQ(listener->flushed_column_family_names_.size(), i);
ASSERT_TRUE(call_back);
```
I would have thought the problem is that TEST_WaitForFlushMemTable does not wait for the event listener to finish. What did you think the problem was?

Try this patch, for example:
```diff
diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc
index 4ff87c1e4..8afb1c782 100644
--- a/db/db_impl/db_impl_compaction_flush.cc
+++ b/db/db_impl/db_impl_compaction_flush.cc
@@ -855,6 +855,8 @@ void DBImpl::NotifyOnFlushCompleted(
         mutable_cf_options.level0_stop_writes_trigger);
     // release lock while notifying events
     mutex_.Unlock();
+    bg_cv_.SignalAll();
+    usleep(100000);
     {
       for (auto& info : *flush_jobs_info) {
         info->triggered_writes_slowdown = triggered_writes_slowdown;
```
Yeah, that's the problem, but I am thinking of fixing it in the way we write the test. The reason is that the design of "// release lock while notifying events" seems to be valid for optimizing performance, and TEST_WaitForFlushMemTable's underlying bg_cv_ seems to be waiting on that same lock. And technically, TEST_WaitForFlushMemTable waits for the memtable to be flushed but does not necessarily need to include listeners. Maybe we could have another TEST_WaitForFlushNotificationFinish, which would be essentially the same as using a sync point here, without using another cv.

(Intended to write more context in the PR but got distracted...)

(I am also open to more discussion on fixing these flaky cases, including #9516, as they are not urgent to merge.)
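For context, here is a paraphrase from memory (not verbatim RocksDB source) of the wait loop being discussed. The point is that the loop's predicate is about memtable state only, so it says nothing about whether OnFlushCompleted listeners have run:

```cpp
// Approximate shape of DBImpl::WaitForFlushMemTable(), paraphrased from
// memory rather than copied from the source tree. Once NumNotFlushed()
// reaches zero the wait returns, even though NotifyOnFlushCompleted()
// may still be running: the listener notification executes with mutex_
// released, per the "// release lock while notifying events" comment.
Status DBImpl::WaitForFlushMemTable(ColumnFamilyData* cfd) {
  InstrumentedMutexLock l(&mutex_);
  while (cfd->imm()->NumNotFlushed() > 0) {
    bg_cv_.Wait();  // re-checks the memtable predicate, not listener state
  }
  return Status::OK();
}
```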
I think it is close to fixing it in the test as you said, but the reason I commented on this line in particular is that call_back might not have been set to true by the time this assertion executes, because TEST_WaitForFlushMemTable() doesn't wait long enough.
Oh..... yep, I somehow overlooked that and repeated the same mistake that's behind the flaky test...
@hx235 force-pushed from b81a0d9 to 26942df.

@hx235 has updated the pull request. You must reimport the pull request before landing.
LGTM. I was thinking you could assert that the callback was called 8 times after Close(), until I realized there is something called SyncPoint::LoadDependency(). And by the way, still writing my context....
Works for me.
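As an aside, the assert-after-Close() idea mentioned above could look like the following sketch; the flush_completed_count_ member is a hypothetical atomic counter on the test's listener, not an existing field:

```cpp
// Hypothetical sketch only: assumes the test listener keeps an atomic
// counter, e.g. std::atomic<int> flush_completed_count_, incremented in
// its OnFlushCompleted() override. Close() waits out background work,
// so by this point every flush notification has fired.
ASSERT_OK(db_->Close());
ASSERT_EQ(8, listener->flush_completed_count_.load());
```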
@hx235 force-pushed from 26942df to 3809abd.

@hx235 has updated the pull request. You must reimport the pull request before landing.

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: We often see flaky tests due to `DB::Flush()` or `DBImpl::TEST_WaitForFlushMemTable()` not waiting until event listeners complete. For example, #9084, #9400, #9528, plus two new ones this week: "EventListenerTest.OnSingleDBFlushTest" and "DBFlushTest.FireOnFlushCompletedAfterCommittedResult". I ran a `make check` with the below race-condition-coercing patch and fixed the issues it found, besides old BlobDB.

```
diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc
index 0e1864788..aaba68c4a 100644
--- a/db/db_impl/db_impl_compaction_flush.cc
+++ b/db/db_impl/db_impl_compaction_flush.cc
@@ -861,6 +861,8 @@ void DBImpl::NotifyOnFlushCompleted(
         mutable_cf_options.level0_stop_writes_trigger);
     // release lock while notifying events
     mutex_.Unlock();
+    bg_cv_.SignalAll();
+    sleep(1);
     {
       for (auto& info : *flush_jobs_info) {
         info->triggered_writes_slowdown = triggered_writes_slowdown;
```

The reason I did not fix the old BlobDB issues is that it appears to have a fundamental (non-test) issue. In particular, it uses an EventListener to keep track of the files. OnFlushCompleted() could be delayed until even after a compaction involving that flushed file completes, causing the compaction to unexpectedly delete an untracked file.

Pull Request resolved: #9617

Test Plan: `make check` including the race-condition-coercing patch

Reviewed By: hx235

Differential Revision: D34384022

Pulled By: ajkr

fbshipit-source-id: 2652ded39b415277c5d6a628414345223930514e
…race condition (#10291)

Summary:

**Context/Summary:** Previous experience with bugs and flaky tests taught us that there exist features in RocksDB vulnerable to race conditions caused by acquiring the db mutex at a particular timing. This PR aggressively exposes those vulnerable features by injecting spurious wakeups and sleeps so that the db mutex is acquired at various timings, in order to expose such race conditions.

**Testing:**
- `COERCE_CONTEXT_SWITCH=1 make -j56 check` / `make -j56 db_stress` should reveal:
  - flaky tests caused by db mutex related race conditions
    - Reverted #9528
    - A/B testing on `COMPILE_WITH_TSAN=1 make -j56 listener_test` w/ and w/o `COERCE_CONTEXT_SWITCH=1`, followed by `./listener_test --gtest_filter=EventListenerTest.MultiCF --gtest_repeat=10`
      - `COERCE_CONTEXT_SWITCH=1` can cause the expected test failure (i.e., expose the target TSAN data race error) within 10 runs, while the other couldn't.
      - This proves our injection can expose flaky tests caused by db mutex related race conditions faster.
  - known or new race-condition-type internal bugs, by continuously running this PR
- Performance
  - High ops-threads time: `COERCE_CONTEXT_SWITCH=1` regressed by 4 times slower (2:01.16 vs 0:22.10 elapsed). This PR will be run as a separate CI job, so this regression won't affect any existing job.

```
TEST_TMPDIR=$db /usr/bin/time ./db_stress \
--ops_per_thread=100000 --expected_values_dir=$exp --clear_column_family_one_in=0 \
--write_buffer_size=524288 --target_file_size_base=524288 --ingest_external_file_one_in=100 --compact_files_one_in=1000 --compact_range_one_in=1000
```

  - Start-up time: `COERCE_CONTEXT_SWITCH=1` didn't regress by 25% (0:01.51 vs 0:01.29 elapsed)

```
TEST_TMPDIR=$db ./db_stress -ops_per_thread=100000000 -expected_values_dir=$exp --clear_column_family_one_in=0 & sleep 120; pkill -9 db_stress

TEST_TMPDIR=$db /usr/bin/time ./db_stress \
--ops_per_thread=1 -reopen=0 --expected_values_dir=$exp --clear_column_family_one_in=0 --destroy_db_initially=0
```

Pull Request resolved: #10291

Reviewed By: ajkr

Differential Revision: D39231182

Pulled By: hx235

fbshipit-source-id: 7ab6695430460e0826727fd8c66679b32b3e44b6
Context:

EventListenerTest.MultiCF occasionally failed on a TSAN data race as below. After investigation, it is due to the following:

(1) ASSERT_OK(Flush(i)); before the read std::vector::size() is supposed to block on DBImpl::bg_cv_ until the memtable flush finishes, getting signaled at the end of the background flush, which happens after the write std::vector::push_back(). So the sequence of execution should have been synchronized as call flush() -> write -> return from flush() -> read, and would not cause any TSAN data race. ASSERT_OK(dbfull()->TEST_WaitForFlushMemTable()); serves a similar purpose, based on the previous attempt to deflake this test.

(2) However, there are multiple places in the code that can signal this DBImpl::bg_cv_ and mistakenly wake up ASSERT_OK(Flush(i)); (or ASSERT_OK(dbfull()->TEST_WaitForFlushMemTable());) too early (and with the lock available to them), resulting in a non-synchronized read and write and thus a TSAN data race.

Summary:

Synchronize the read and the write through the ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency() mechanism.

Test:

./listener_test --gtest_filter=EventListenerTest.MultiCF --gtest_repeat=10

All passed.
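To illustrate the mechanism named in the Summary, here is a minimal hedged sketch of the SyncPoint::LoadDependency() pattern; the marker strings are illustrative stand-ins, not necessarily the ones this PR adds:

```cpp
// Declare that the listener-side marker must fire before the test-side
// marker. Marker names below are illustrative, not the PR's exact ones.
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
    {{"TestFlushListener::OnFlushCompleted:Done",  // predecessor
      "EventListenerTest.MultiCF:PreVerify"}});    // successor
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();

ASSERT_OK(Flush(i));
// Blocks until the predecessor marker has been reached, so the listener's
// push_back() happens-before the size() reads that follow.
TEST_SYNC_POINT("EventListenerTest.MultiCF:PreVerify");
ASSERT_EQ(listener->flushed_dbs_.size(), i);
ASSERT_EQ(listener->flushed_column_family_names_.size(), i);

ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->DisableProcessing();
```

The listener side would then place TEST_SYNC_POINT("TestFlushListener::OnFlushCompleted:Done") at the end of its OnFlushCompleted() override, so the dependency orders the listener's write before the test's reads without adding another condition variable.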