Integrate block cache tracer into db_impl #5433

HaoyuHuang · 2019-06-08T00:44:47Z

This PR integrates the block cache tracer class into db_impl.cc.
db_impl.cc contains a member variable of AtomicBlockCacheTraceWriter class and passes its reference to the block_based_table_reader.
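
The ownership pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `BlockCacheTracerSketch`, `TableReaderSketch`, and `DbSketch` are hypothetical names, and the sketch is single-threaded, so it omits the synchronization a real trace writer would need.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the tracer; only the ownership pattern matters.
class BlockCacheTracerSketch {
 public:
  void Start() { enabled_.store(true, std::memory_order_release); }
  void End() { enabled_.store(false, std::memory_order_release); }
  bool enabled() const { return enabled_.load(std::memory_order_acquire); }

  // Records one access if tracing is on; returns whether it was recorded.
  bool Record(const std::string& block_key) {
    if (!enabled()) return false;
    records_.push_back(block_key);
    return true;
  }
  std::size_t num_records() const { return records_.size(); }

 private:
  std::atomic<bool> enabled_{false};
  std::vector<std::string> records_;  // single-threaded sketch: no locking
};

// Hypothetical table reader: keeps a non-owning pointer to the DB's tracer.
class TableReaderSketch {
 public:
  explicit TableReaderSketch(BlockCacheTracerSketch* tracer)
      : tracer_(tracer) {}
  void ReadBlock(const std::string& block_key) {
    if (tracer_ != nullptr) tracer_->Record(block_key);
  }

 private:
  BlockCacheTracerSketch* tracer_;
};

// Hypothetical DB: owns the tracer as a member and hands it to each reader.
class DbSketch {
 public:
  TableReaderSketch NewTableReader() { return TableReaderSketch(&tracer_); }
  BlockCacheTracerSketch& tracer() { return tracer_; }

 private:
  BlockCacheTracerSketch tracer_;
};
```

Because the reader only borrows the tracer, starting or ending a trace on the DB takes effect for every reader without reopening any table.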

facebook-github-bot

@HaoyuHuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

…lt type (facebook#5432)

Summary: This affects some TSAN builds:

env/env_test.cc: In member function ‘virtual void rocksdb::EnvPosixTestWithParam_MultiRead_Test::TestBody()’:
env/env_test.cc:1126:76: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]
auto data = NewAligned(kSectorSize * 8, static_cast<const char>(i + 1));
env/env_test.cc:1154:77: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]
auto buf = NewAligned(kSectorSize * 8, static_cast<const char>(i*2 + 1));

Pull Request resolved: facebook#5432
Differential Revision: D15727277
Pulled By: ltamasi
fbshipit-source-id: dc0e687b123e7c4d703ccc0c16b7167e07d1c9b0
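
For readers unfamiliar with this diagnostic, here is a small sketch of the usual way to avoid it. It assumes the fix is to drop the qualifier from the cast, which the truncated title suggests but does not confirm; `MakeFilled` and `BuildBuffer` are hypothetical helpers standing in for the test's `NewAligned` call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper standing in for the test's NewAligned(size, fill).
std::string MakeFilled(std::size_t size, char fill) {
  return std::string(size, fill);
}

std::string BuildBuffer(int i) {
  // static_cast<const char>(i + 1) yields a prvalue, so the `const` is
  // discarded and GCC reports it under -Wignored-qualifiers.
  // Casting to plain `char` produces the same value without the warning.
  return MakeFilled(8, static_cast<char>(i + 1));
}
```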

facebook-github-bot · 2019-06-10T17:51:41Z

@HaoyuHuang has updated the pull request. Re-import the pull request

Summary: The patch reduces the contention over prepared_mutex_ using these techniques:

1) Move ::RemovePrepared() to be called from the commit callback when we have two write queues.
2) Use two separate mutexes for PreparedHeap: prepared_mutex_, needed for ::RemovePrepared(), and ::push_pop_mutex(), needed for ::AddPrepared(). Given that we call ::AddPrepared() only from the first write queue and ::RemovePrepared() mostly from the second, the two write queues no longer compete with each other over a single mutex. ::RemovePrepared() might occasionally need to acquire ::push_pop_mutex() if ::erase() ends up calling ::pop().
3) Acquire ::push_pop_mutex() on the first callback of the write queue and release it on the last.

Pull Request resolved: facebook#5420
Differential Revision: D15741985
Pulled By: maysamyabandeh
fbshipit-source-id: 84ce8016007e88bb6e10da5760ba1f0d26347735

…cebook#5413)

Summary: In a regular RocksDB instance, `MemTable::earliest_seqno_` is "db sequence number at the time of creation". However, we cannot use the db sequence number to set the value of `MemTable::earliest_seqno_` for a secondary instance, i.e. `DBImplSecondary`, due to the logic of MANIFEST and WAL replay. When replaying the log files of the primary, the secondary instance first replays the MANIFEST and updates the db sequence number if necessary. Next, the secondary replays WAL files, creates new memtables if necessary and inserts key-value pairs into memtables.

The following can occur when the db has two or more column families. Assume the db has column families "default" and "cf1". At a certain point in time, both "default" and "cf1" have data in memtables.

1. Primary triggers a flush and flushes "cf1". "default" is **not** flushed.
2. Secondary replays the MANIFEST and updates its db sequence number to the latest value learned from the MANIFEST.
3. Secondary starts to replay the WAL that contains the writes to "default". It is possible that the write batches' sequence numbers are smaller than the db sequence number. In this case, these write batches will be skipped, and these updates will not be visible to readers until "default" is later flushed.

Pull Request resolved: facebook#5413
Differential Revision: D15637407
Pulled By: riversand963
fbshipit-source-id: 3de3fe35cfc6f1b9f844f3f926f0df29717b6580

…ook#5314)

Summary: Instead of creating a new DataBlockIterator for every key in a MultiGet batch, reuse it if the next key is in the same block. This results in a small 1-2% cpu improvement.

TEST_TMPDIR=/dev/shm/multiget numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4

Without the change -
multireadrandom : 3.066 micros/op 326122 ops/sec; (29375968 of 29375968 found)

With the change -
multireadrandom : 3.003 micros/op 332945 ops/sec; (29983968 of 29983968 found)

Pull Request resolved: facebook#5314
Differential Revision: D15742108
Pulled By: anand1976
fbshipit-source-id: 220fb0b8eea9a0d602ddeb371528f7af7936d771
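
The reuse idea can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not RocksDB's MultiGet: keys are represented only by the id of the block they fall into, the batch is assumed to be sorted so same-block keys are adjacent, and `LookupBatch` and `BatchStats` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counters for the toy model: how many block iterators were set up versus
// how many key lookups were served.
struct BatchStats {
  std::size_t iterators_created = 0;
  std::size_t lookups = 0;
};

// block_of_key[i] is the (hypothetical) block id holding the i-th key of a
// sorted batch, so keys from the same block are adjacent.
BatchStats LookupBatch(const std::vector<int>& block_of_key) {
  BatchStats stats;
  bool have_iter = false;
  int current_block = 0;
  for (int block : block_of_key) {
    if (!have_iter || block != current_block) {
      // Only set up a new block iterator when the key lives in a new block.
      current_block = block;
      have_iter = true;
      ++stats.iterators_created;
    }
    ++stats.lookups;  // seek within the (possibly reused) iterator
  }
  return stats;
}
```

With a batch whose keys map to blocks 1, 1, 1, 2, 2, 3, the model creates three iterators for six lookups instead of six.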

…racing. (facebook#5421)

Summary: BlockCacheLookupContext only contains the caller for now. We will trace block accesses at five places:

1. BlockBasedTable::GetFilter.
2. BlockBasedTable::GetUncompressedDict.
3. BlockBasedTable::MaybeReadAndLoadToCache. (To trace access on data, index, and range deletion block.)
4. BlockBasedTable::Get. (To trace the referenced key and whether the referenced key exists in a fetched data block.)
5. BlockBasedTable::MultiGet. (To trace the referenced key and whether the referenced key exists in a fetched data block.)

We create the context at:

1. BlockBasedTable::Get. (kUserGet)
2. BlockBasedTable::MultiGet. (kUserMGet)
3. BlockBasedTable::NewIterator. (either kUserIterator, kCompaction, or external SST ingestion calls this function.)
4. BlockBasedTable::Open. (kPrefetch)
5. Index/Filter::CacheDependencies. (kPrefetch)
6. BlockBasedTable::ApproximateOffsetOf. (kCompaction or kUserApproximateSize).

I loaded 1 million key-value pairs into the database and ran the readrandom benchmark with a single thread. I gave the block cache 10 GB to make sure all reads hit the block cache after warmup. The throughput is comparable.

Throughput of this PR: 231334 ops/s.
Throughput of the master branch: 238428 ops/s.

Experiment setup:

RocksDB: version 6.2
Date: Mon Jun 10 10:42:51 2019
CPU: 24 * Intel Core Processor (Skylake)
CPUCache: 16384 KB
Keys: 20 bytes each
Values: 100 bytes each (100 bytes after compression)
Entries: 1000000
Prefix: 20 bytes
Keys per prefix: 0
RawSize: 114.4 MB (estimated)
FileSize: 114.4 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: NoCompression
Compression sampling rate: 0
Memtablerep: skip_list
Perf Level: 1

Load command: ./db_bench --benchmarks="fillseq" --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000

Run command: ./db_bench --benchmarks="readrandom,stats" --use_existing_db --threads=1 --duration=120 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000 --duration=120

TODOs:

1. Create a caller for external SST file ingestion and differentiate the callers for iterator.
2. Integrate tracer to trace block cache accesses.

Pull Request resolved: facebook#5421
Differential Revision: D15704258
Pulled By: HaoyuHuang
fbshipit-source-id: 4aa8a55f8cb1576ffb367bfa3186a91d8f06d93a
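
A lookup context that "only contains the caller" might look roughly like the sketch below. The caller names come from the summary above; everything else (the `CallerSketch` enum, `LookupContextSketch`, `ReadBlockSketch`, `GetSketch`) is a hypothetical layout chosen for illustration and may differ from the actual class.

```cpp
#include <cassert>
#include <string>

// Caller names taken from the summary; the enum itself is an assumption.
enum class CallerSketch {
  kUserGet,
  kUserMGet,
  kUserIterator,
  kCompaction,
  kPrefetch,
  kUserApproximateSize,
};

// Hypothetical context: carries only the caller, as the summary describes.
struct LookupContextSketch {
  explicit LookupContextSketch(CallerSketch c) : caller(c) {}
  CallerSketch caller;
};

const char* CallerName(CallerSketch c) {
  switch (c) {
    case CallerSketch::kUserGet: return "kUserGet";
    case CallerSketch::kUserMGet: return "kUserMGet";
    case CallerSketch::kUserIterator: return "kUserIterator";
    case CallerSketch::kCompaction: return "kCompaction";
    case CallerSketch::kPrefetch: return "kPrefetch";
    case CallerSketch::kUserApproximateSize: return "kUserApproximateSize";
  }
  return "unknown";
}

// Hypothetical low-level read: the place where an access would be traced.
// Here it just returns a label showing which caller reached it.
std::string ReadBlockSketch(const std::string& key,
                            const LookupContextSketch& ctx) {
  return std::string(CallerName(ctx.caller)) + ":" + key;
}

// Hypothetical entry point: creates the context once and passes it down.
std::string GetSketch(const std::string& key) {
  LookupContextSketch ctx(CallerSketch::kUserGet);
  return ReadBlockSketch(key, ctx);
}
```

Creating the context at the entry point and passing it by reference keeps the lower-level read path unaware of who called it, which is what lets one tracing site serve several callers.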

Summary: Use `CreateLoggerFromOptions` function to reduce code duplication.

Test plan (on my machine)
```
$ make clean && make -j32 db_secondary_test
$ KEEP_DB=1 ./db_secondary_test
```
Verify all info logs of the secondary instance are properly logged.

Pull Request resolved: facebook#5427
Differential Revision: D15748922
Pulled By: riversand963
fbshipit-source-id: bad7261df1b8373efc504f141efc7871e375a311

Summary: FlushScheduler's methods are instrumented with debug-time locks to check the scheduler state against a simple container definition. Since facebook#2286 the scope of such locks has been widened to the entire method bodies. As a result, the concurrency exercised during testing (in debug mode) is stricter than the concurrency level manifested at runtime (in release mode). The patch reverts this change to reduce the scope of such locks. Pull Request resolved: facebook#5372 Differential Revision: D15545831 Pulled By: maysamyabandeh fbshipit-source-id: 01d69191afb1dd807d4bdc990fc74813ae7b5426

Summary: To avoid deadlock, mutex_ should never be acquired while log_write_mutex_ is held. The patch documents that and also fixes one case in ::FlushWAL that acquires mutex_ through ::WriteStatusCheck when it already holds the lock on log_write_mutex_. Pull Request resolved: facebook#5437 Differential Revision: D15749722 Pulled By: maysamyabandeh fbshipit-source-id: f57b69c44b4b80cc6d7ddf3d3fdf4a9eb5a5a45a

…acebook#5438) Summary: This affects our "no compression" automated tests. Since PR facebook#5368, DBTest.DynamicMiscOptions has been failing with: db/db_test.cc:4889: Failure dbfull()->SetOptions({{"compression", "kSnappyCompression"}}) Invalid argument: Compression type Snappy is not linked with the binary. Pull Request resolved: facebook#5438 Differential Revision: D15752100 Pulled By: ltamasi fbshipit-source-id: 3f19eff7cafc03b333965be0203c5853d2a9cb71

Summary: If a memtable definitely covers a key, there isn't a need to check older memtables. We can skip them by checking the earliest sequence number. Pull Request resolved: facebook#4941 Differential Revision: D13932666 fbshipit-source-id: b9d52f234b8ad9dd3bf6547645cd457175a3ca9b

Summary: This PR contains the first commit for block cache trace analyzer. It reads a block cache trace file and prints statistics of the traces. We will extend this class to provide more functionalities. Pull Request resolved: facebook#5425 Differential Revision: D15709580 Pulled By: HaoyuHuang fbshipit-source-id: 2f43bd2311f460ab569880819d95eeae217c20bb

Summary: In secondary mode, it is possible that the secondary lists the primary's WAL directory, finds a WAL and tries to open it. It is possible that the primary deletes the WAL after the secondary lists the directory but before the secondary opens it. Then the secondary will fail to open the WAL file with a PathNotFound status. In this case, we can return OK without replaying the WAL and optionally replay more MANIFEST.

Test Plan (on my dev machine): Without this PR, the following will fail several times out of 100 runs.
```
~/gtest-parallel/gtest-parallel -r 100 -w 16 ./db_secondary_test --gtest_filter=DBSecondaryTest.SwitchToNewManifestDuringOpen
```
With this PR, the above should always succeed.

Pull Request resolved: facebook#5323
Differential Revision: D15763878
Pulled By: riversand963
fbshipit-source-id: c7164fa7cb8d9001abc258b6a2dc93613e4f38ff
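
The error-handling rule described above can be sketched as follows. This is only an illustration under stated assumptions: `StatusSketch` and `ReplayWalSketch` are hypothetical names, and the callable `open_wal` stands in for whatever actually opens the file.

```cpp
#include <cassert>

// Hypothetical status codes; only the three outcomes needed for the sketch.
enum class StatusSketch { kOk, kPathNotFound, kIOError };

// open_wal is a hypothetical callable that tries to open the WAL and
// returns a status. *replayed reports whether replay actually happened.
template <typename OpenFn>
StatusSketch ReplayWalSketch(OpenFn open_wal, bool* replayed) {
  *replayed = false;
  StatusSketch s = open_wal();
  if (s == StatusSketch::kPathNotFound) {
    // The primary deleted the file between listing and opening:
    // treat this as success and skip the replay.
    return StatusSketch::kOk;
  }
  if (s != StatusSketch::kOk) {
    return s;  // any other failure is still reported to the caller
  }
  *replayed = true;
  return StatusSketch::kOk;
}
```

Only the "file vanished" case is downgraded to success; other I/O errors still propagate, so real failures are not masked.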

…acebook#5111)" (facebook#5440) Summary: This reverts commit f3a7847. Pull Request resolved: facebook#5440 Differential Revision: D15765967 Pulled By: ltamasi fbshipit-source-id: d027fe24132e3729289cd7c01857a7eb449d9dd0

Summary: This is a port of this PR into WriteUnprepared: facebook#5014

This also reverts this test change to restore some flaky write unprepared tests: facebook#5315

Tested with:
$ gtest-parallel ./transaction_test --gtest_filter=MySQLStyleTransactionTest/MySQLStyleTransactionTest.TransactionStressTest/9 --repeat=128
[128/128] MySQLStyleTransactionTest/MySQLStyleTransactionTest.TransactionStressTest/9 (18250 ms)

Pull Request resolved: facebook#5439
Differential Revision: D15761405
Pulled By: lth
fbshipit-source-id: ae2581fd942d8a5b3f9278fd6bc3c1ac0b2c964c

…book#5436) Summary: Internally PreparedHeap is currently using a priority_queue. The rationale was that in the initial design PreparedHeap::AddPrepared could be called in arbitrary order. With the recent optimizations, we call ::AddPrepared only from the main write queue, which results in in-order insertion into PreparedHeap. The patch thus replaces the underlying priority_queue with a more efficient deque implementation. Pull Request resolved: facebook#5436 Differential Revision: D15752147 Pulled By: maysamyabandeh fbshipit-source-id: e6960f2b2097e13137dded1ceeff3b10b03b0aeb
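
Why in-order insertion makes a deque sufficient can be seen in a small sketch. `InOrderMinQueue` is a hypothetical simplification: it covers only push, top, and pop, and leaves out the erase handling and locking that the real PreparedHeap needs.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// When sequence numbers arrive in non-decreasing order, the smallest
// element is always at the front, so no heap ordering work is needed.
class InOrderMinQueue {
 public:
  void Push(uint64_t seq) {
    // The in-order assumption is what makes the deque sufficient.
    assert(items_.empty() || items_.back() <= seq);
    items_.push_back(seq);
  }
  bool Empty() const { return items_.empty(); }
  uint64_t Top() const { return items_.front(); }  // smallest element, O(1)
  void Pop() { items_.pop_front(); }               // O(1), no re-heapify

 private:
  std::deque<uint64_t> items_;
};
```

A priority_queue pays O(log n) per push and pop to maintain heap order; with ordered input the deque gets the same minimum-first behavior in constant time.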

Summary: CLANG complains that passing const to thread is not necessary. The patch removes it from the PreparedHeap::Concurrent test. Pull Request resolved: facebook#5443 Differential Revision: D15781598 Pulled By: maysamyabandeh fbshipit-source-id: 3aceb05d96182fa4726d6d37eed45fd3aac4c016

Summary: TSAN tests report a race condition. We temporarily exclude kPipelinedWrite from MultiThreaded until the race condition is fixed. Pull Request resolved: facebook#5442 Differential Revision: D15782349 Pulled By: maysamyabandeh fbshipit-source-id: 42b4f9b3fa9137f0675e13ad132c0a06800c1bdd

Summary: The tsan crash tests are failing with a data race complaint with the pipelined write option. Temporarily disable it until its concurrency issues are fixed. Pull Request resolved: facebook#5445 Differential Revision: D15783824 Pulled By: maysamyabandeh fbshipit-source-id: 413a0c3230b86f524fc7eeea2cf8e8375406e65b

Summary: CLANG would complain if we pass const to lambda function and appveyor complains if we don't (facebook#5443). The patch fixes that by using the default capture mode. Pull Request resolved: facebook#5447 Differential Revision: D15788722 Pulled By: maysamyabandeh fbshipit-source-id: 47e7f49264afe31fdafe42cb8bf93da126abfca9

riversand963

Thanks @HaoyuHuang for the PR. Left a few comments.

db/version_set.h

trace_replay/block_cache_tracer.h

table/block_based/block_based_table_reader.h

trace_replay/block_cache_tracer.cc

facebook-github-bot · 2019-06-13T17:58:20Z

@HaoyuHuang has updated the pull request. Re-import the pull request

facebook-github-bot · 2019-06-13T18:00:34Z

@HaoyuHuang has updated the pull request. Re-import the pull request

…may integrate these functionalities.

Summary: Verified with an Ampere Computing eMAG aarch64 system. Pull Request resolved: facebook#5258 Differential Revision: D15807309 Pulled By: maysamyabandeh fbshipit-source-id: ab85d2fd3fe40e6094430ab0eba557b1e979510d

facebook-github-bot · 2019-06-13T18:55:15Z

@HaoyuHuang has updated the pull request. Re-import the pull request

riversand963

LGTM

include/rocksdb/utilities/stackable_db.h

trace_replay/block_cache_tracer.cc

trace_replay/block_cache_tracer.h

facebook-github-bot · 2019-06-13T19:25:00Z

@HaoyuHuang has updated the pull request. Re-import the pull request

facebook-github-bot · 2019-06-13T19:37:55Z

@HaoyuHuang has updated the pull request. Re-import the pull request

facebook-github-bot

@HaoyuHuang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2019-06-13T20:51:31Z

@HaoyuHuang has updated the pull request. Re-import the pull request

facebook-github-bot

@HaoyuHuang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2019-06-13T23:39:45Z

@HaoyuHuang merged this pull request in bb41780.

Summary: This PR integrates the block cache tracer class into db_impl.cc. db_impl.cc contains a member variable of AtomicBlockCacheTraceWriter class and passes its reference to the block_based_table_reader. Pull Request resolved: facebook#5433 Differential Revision: D15728016 Pulled By: HaoyuHuang fbshipit-source-id: 23d5659e8c82d556833dcc1a5558aac8c1f7db71

Summary: This PR integrates the block cache tracer class into db_impl.cc. db_impl.cc contains a member variable of AtomicBlockCacheTraceWriter class and passes its reference to the block_based_table_reader. Pull Request resolved: facebook/rocksdb#5433 Differential Revision: D15728016 Pulled By: HaoyuHuang fbshipit-source-id: 23d5659e8c82d556833dcc1a5558aac8c1f7db71 Signed-off-by: Changlong Chen <levisonchen@live.cn>

HaoyuHuang added 2 commits June 7, 2019 17:28

Create block cache tracer object in db_impl

4123ac2

Update trace writer

2b8bdae

facebook-github-bot added the CLA Signed label Jun 8, 2019

facebook-github-bot reviewed Jun 8, 2019

HaoyuHuang requested a review from riversand963 June 8, 2019 00:46

ltamasi and others added 2 commits June 7, 2019 19:37

Acquiring lock on trace writer after downsampling

93b8793

Maysam Yabandeh and others added 8 commits June 10, 2019 11:53

HaoyuHuang requested a review from ltamasi June 11, 2019 17:17

siying and others added 10 commits June 11, 2019 11:46

riversand963 reviewed Jun 13, 2019

Address comments

9def503

remove tools/Makefile

96470fc

HaoyuHuang requested a review from riversand963 June 13, 2019 18:03

HaoyuHuang and others added 2 commits June 13, 2019 11:41

StackableDB delegates Start/EndBlockCacheTrace to db_impl so MyRocks …

f20f2c0

…may integrate these functionalities.

Support rocksdbjava aarch64 build and test (facebook#5258)

5c76ba9

Summary: Verified with an Ampere Computing eMAG aarch64 system. Pull Request resolved: facebook#5258 Differential Revision: D15807309 Pulled By: maysamyabandeh fbshipit-source-id: ab85d2fd3fe40e6094430ab0eba557b1e979510d

riversand963 approved these changes Jun 13, 2019

include/rocksdb/utilities/stackable_db.h (outdated)

trace_replay/block_cache_tracer.cc

trace_replay/block_cache_tracer.cc (outdated)

trace_replay/block_cache_tracer.h (outdated)

Address comments

c405b50

HaoyuHuang added 8 commits June 13, 2019 12:25

Create block cache tracer object in db_impl

5fb2ab1

Update trace writer

f126b62

Acquiring lock on trace writer after downsampling

06c1148

Address comments

7f8394e

remove tools/Makefile

6c42633

StackableDB delegates Start/EndBlockCacheTrace to db_impl so MyRocks …

1c4fe1d

…may integrate these functionalities.

Address comments

081030b

merge conflicts

80a01d5

facebook-github-bot reviewed Jun 13, 2019

Update TARGET

51697c2

facebook-github-bot reviewed Jun 13, 2019

facebook-github-bot closed this in bb41780 Jun 13, 2019

HaoyuHuang deleted the inte_block_tracer branch June 13, 2019 23:00

facebook-github-bot added the Merged label Jun 13, 2019
