Account memory of big memory users in BlockBasedTableReader in global memory limit #9748
Conversation
Force-pushed from 19622e4 to a951297
Force-pushed from b2537fc to 827d320
```
@@ -419,6 +423,12 @@ BlockBasedTableFactory::BlockBasedTableFactory(
     : table_options_(_table_options) {
   InitializeOptions();
   RegisterOptions(&table_options_, &block_based_table_type_info);

   if (table_options_.reserve_table_reader_memory &&
       table_options_.no_block_cache == false) {
```
[Note] A fun little thing I found: when `table_options_.no_block_cache` is set to 1 by option string, its negation `!table_options_.no_block_cache` in a dbg build (but not DEBUG_LEVEL=1) incorrectly evaluates to true ......
table/meta_blocks.cc (Outdated)
```cpp
} else if (key == TablePropertiesNames::kDbId) {
  new_table_properties->db_id = raw_val.ToString();
  new_table_properties->IncreaseApproximateMemoryUsage(raw_val.size());
```
Not my favorite code repetition either, but I couldn't find an easy way to avoid it ...
It could be an `else` branch for `pos != predefined_uint64_properties.end()`, with the rest of these if-else branches nested under that `else`.
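The restructuring suggested here might look like the following self-contained sketch (simplified stand-in types and hypothetical key strings, not the actual meta_blocks.cc code): the point is that every string-valued property falls into one `else` branch, so the memory-accounting call appears exactly once instead of being repeated per property.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Simplified stand-in for RocksDB's TableProperties (sketch only).
struct TableProperties {
  uint64_t data_size = 0;
  std::string db_id;
  std::string db_session_id;
  uint64_t approx_mem = 0;
  void IncreaseApproximateMemoryUsage(uint64_t n) { approx_mem += n; }
};

void ParseProperty(TableProperties* props,
                   const std::map<std::string, uint64_t*>& uint64_props,
                   const std::string& key, const std::string& raw_val) {
  auto pos = uint64_props.find(key);
  if (pos != uint64_props.end()) {
    *pos->second = raw_val.size();  // placeholder for real varint decoding
  } else {
    // Every string-valued property falls into this branch, so the
    // memory-accounting call below appears exactly once.
    if (key == "rocksdb.creating.db.identity") {
      props->db_id = raw_val;
    } else if (key == "rocksdb.creating.session.identity") {
      props->db_session_id = raw_val;
    }
    props->IncreaseApproximateMemoryUsage(raw_val.size());
  }
}
```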
Will fix this (previously I avoided it because of the nested if-else, but now I think it is still better than code duplication).
I think one longer-term fix is to avoid the `key == TablePropertiesNames::XX` pattern and use a set. Opened a tech-debt task for it.
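The set-based lookup proposed here could be sketched as a single table from property name to member, replacing the chained `key ==` comparisons (hypothetical names and key strings; not the actual RocksDB tech-debt fix):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Simplified stand-in holding only string-valued properties (sketch only).
struct StrProps {
  std::string db_id;
  std::string db_session_id;
  std::string db_host_id;
};

using StringMember = std::string StrProps::*;

const std::unordered_map<std::string, StringMember>& PredefinedStringProps() {
  // One name -> member table replaces the whole if-else chain.
  static const std::unordered_map<std::string, StringMember> kProps = {
      {"rocksdb.creating.db.identity", &StrProps::db_id},
      {"rocksdb.creating.session.identity", &StrProps::db_session_id},
      {"rocksdb.creating.host.identity", &StrProps::db_host_id},
  };
  return kProps;
}

bool AssignStringProp(StrProps* props, const std::string& key,
                      const std::string& raw_val) {
  auto it = PredefinedStringProps().find(key);
  if (it == PredefinedStringProps().end()) {
    return false;  // not a predefined string property
  }
  props->*(it->second) = raw_val;  // assign through pointer-to-member
  return true;
}
```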
Done, via on-the-fly memory approximation for table properties.
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Thanks very much for doing this work. Left a few suggestions/questions.
Force-pushed from 827d320 to 5325760
@hx235 has updated the pull request. You must reimport the pull request before landing.
Force-pushed from 5325760 to 3e9992d
@hx235 has updated the pull request. You must reimport the pull request before landing.
Force-pushed from 3e9992d to 99552c2
@hx235 has updated the pull request. You must reimport the pull request before landing.
Waiting for CI signals and re-benchmarking.
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 99552c2 to 432c0c2
@hx235 has updated the pull request. You must reimport the pull request before landing.
Force-pushed from 432c0c2 to 69a743c
@hx235 has updated the pull request. You must reimport the pull request before landing.
Do you mind importing it? Sorry it's hard to read on GitHub.
Replies to outdated comments:
Not sure how "disable the cache Insert() while that particular table is created" is done, but I updated the test to use ConfigureTableFactory() and added more comments. Let's see if this helps.
I meant: since you are the implementor of the cache, you could make Insert() return Status::OK() without doing any work while you calculate per-table memory.
Sure, although I need to fix the ci/circleci: build-linux-clang10-clang-analyze failure first. Importing it.
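The suggestion above amounts to a test cache whose Insert() pretends to succeed without charging anything while per-table memory is being measured. A minimal sketch, assuming a hypothetical `MeasuringCache` class (not the test's actual mock):

```cpp
#include <cassert>
#include <cstddef>

// Test-only cache: while "measuring", Insert() reports success but charges
// nothing, so per-table memory can be computed without cache interference.
class MeasuringCache {
 public:
  bool Insert(size_t bytes) {
    if (measuring_) {
      return true;  // pretend success, charge nothing
    }
    usage_ += bytes;
    return true;
  }
  void StartMeasuring() { measuring_ = true; }
  void StopMeasuring() { measuring_ = false; }
  size_t usage() const { return usage_; }

 private:
  bool measuring_ = false;
  size_t usage_ = 0;
};
```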
LGTM!
table/table_properties.cc (Outdated)
```cpp
std::pair<const uint64_t*, const uint64_t*> TEST_GetUint64TPropStartEndPosition(
    const TableProperties* const_props) {
  TableProperties* props = const_cast<TableProperties*>(const_props);
  uint64_t* pu = &props->orig_file_number;
  assert(static_cast<void*>(pu) == static_cast<void*>(props));
  const uint64_t* pu_end = reinterpret_cast<const uint64_t*>(&props->db_id);
  return std::pair<const uint64_t*, const uint64_t*>(pu, pu_end);
}

std::pair<const std::string*, const std::string*>
TEST_GetStringPropStartEndPosition(const TableProperties* const_props) {
  TableProperties* props = const_cast<TableProperties*>(const_props);
  std::string* ps = &props->db_id;
  assert(static_cast<void*>(const_cast<uint64_t*>(
             TEST_GetUint64TPropStartEndPosition(props).second)) ==
         static_cast<void*>(ps));
  const std::string* ps_end =
      reinterpret_cast<const std::string*>(&props->user_collected_properties);
  return std::pair<const std::string*, const std::string*>(ps, ps_end);
}
```
OK, it makes enough sense to me. Just make sure to eliminate all `TEST_` in non-test code, `const_cast`s, and `reinterpret_cast`s (ideally).

edit: I kind of doubt we should be doing this (`reinterpret_cast<const std::string*>(&props->user_collected_properties)`) at all. What if `user_collected_properties` has different alignment from `string`?
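The helpers above rely on treating a run of identically-typed struct members as one array. A toy version of that trick, on a struct invented for illustration: it assumes the compiler inserts no padding between the members (checked by the static_assert), and even then pointer arithmetic across distinct members is not formally sanctioned by the C++ standard, which is part of why the review pushes back on the casts.

```cpp
#include <cassert>
#include <cstdint>

// Toy struct standing in for the uint64_t prefix of TableProperties.
struct U64Fields {
  uint64_t orig_file_number = 1;
  uint64_t data_size = 2;
  uint64_t index_size = 3;
};

// Guards the no-padding assumption the field-range trick depends on.
static_assert(sizeof(U64Fields) == 3 * sizeof(uint64_t),
              "expects no padding between the uint64_t members");

uint64_t SumU64Fields(U64Fields* f) {
  // Works in practice on mainstream compilers given the static_assert,
  // but is not strictly portable C++.
  uint64_t* begin = &f->orig_file_number;
  uint64_t* end = begin + 3;  // one past index_size
  uint64_t sum = 0;
  for (uint64_t* it = begin; it != end; ++it) {
    sum += *it;
  }
  return sum;
}
```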
@hx235 has updated the pull request. You must reimport the pull request before landing.
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
…ator && protect ReleaseCacheReservation
…etXXPropStartEndPos()
Force-pushed from 3a0af6e to 4c73d59
@hx235 has updated the pull request. You must reimport the pull request before landing.
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 4c73d59 to 321fb98
@hx235 has updated the pull request. You must reimport the pull request before landing.
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
…tween fields in TableProperties struct (#9812)

Summary:
Context: #9748 (comment) reveals an issue with TEST_SetRandomTableProperties when non-zero padding is used between the last string field and the first non-string field in TableProperties. Fixed by #9748 (comment).

Pull Request resolved: #9812
Test Plan: No production code changes; rely on existing CI.
Reviewed By: ajkr
Differential Revision: D35423680
Pulled By: hx235
fbshipit-source-id: fd855eef3d32771bb79c65bd7012ab8bb3c400ab
…Test.CapMemoryUsageUnderCacheCapacity (#9869)

Summary:
**Context:** The unit test for #9748 keeps opening new files to see whether the new feature correctly constrains the opening based on block cache capacity. However, the unit test has two places written inefficiently that can lead to opening too many new files relative to the underlying operating system/file system constraint, even before hitting the block cache capacity:
(1) [opened_table_reader_num < 2 * max_table_reader_num](https://github.com/facebook/rocksdb/pull/9748/files?show-viewed-files=true&file-filters%5B%5D=#diff-ec9f5353e317df71093094734ba29193b94a998f0f9c9af924e4c99692195eeaR438), which can lead to 1200+ open files because of (2) below
(2) NewLRUCache(6 * CacheReservationManagerImpl<CacheEntryRole::kBlockBasedTableReader>::GetDummyEntrySize()) in [here](https://github.com/facebook/rocksdb/pull/9748/files?show-viewed-files=true&file-filters%5B%5D=#diff-ec9f5353e317df71093094734ba29193b94a998f0f9c9af924e4c99692195eeaR364)

Therefore we see CI failures like this on machines with a strict open-file limit of ~1000 (see the "table_1021" naming in the following error msg): https://app.circleci.com/pipelines/github/facebook/rocksdb/12886/workflows/75524682-3fa4-41ee-9a61-81827b51d99b/jobs/345270
```
fs_->NewWritableFile(path, foptions, &file, nullptr) IO error: While open a file for appending: /dev/shm/rocksdb.Jedwt/run-block_based_table_reader_test-CapMemoryUsageUnderCacheCapacity-BlockBasedTableReaderCapMemoryTest.CapMemoryUsageUnderCacheCapacity-0/block_based_table_reader_test_1668910_829492452552920927/**table_1021**: Too many open files
```
**Summary:**
- Revised the test to be more efficient in the above two places, including using 1.1 instead of 2 in the threshold and lowering the block cache capacity a bit
- Renamed some variables for clarity

Pull Request resolved: #9869
Test Plan:
- Manual inspection of the max opened table readers in all test cases, which is around ~389
- Circle CI to see if the error is gone

Reviewed By: ajkr
Differential Revision: D35752655
Pulled By: hx235
fbshipit-source-id: 8a0953d39d561babfa4257b8ed8550bb21b04839
Summary:
**Context:** Previous PRs #9748, #9073, and #8428 added a separate flag for each charged memory area. Such an API design is not scalable as we charge more and more memory areas. Also, we foresee an opportunity to consolidate this feature with other cache-usage-related features such as `cache_index_and_filter_blocks` using `CacheEntryRole`. Therefore we decided to consolidate all these flags with `CacheUsageOptions cache_usage_options`, and this PR serves as the first step by consolidating the memory-charging-related flags.

**Summary:**
- Replaced old API references with new ones, including making `kCompressionDictionaryBuildingBuffer` opt-out, and added a unit test for that
- Added missing db bench/stress tests for some memory charging features
- Renamed related test suites to indicate they are under the same theme of memory charging
- Refactored a commonly used mocked cache component in memory-charging-related tests to reduce code duplication
- Replaced the phrases "memory tracking" / "cache reservation" (other than CacheReservationManager-related ones) with "memory charging" as the standard description of this feature

Pull Request resolved: #9926
Test Plan:
- New unit test for opt-out `kCompressionDictionaryBuildingBuffer`: `TEST_F(ChargeCompressionDictionaryBuildingBufferTest, Basic)`
- New unit test for option validation/sanitization: `TEST_F(CacheUsageOptionsOverridesTest, SanitizeAndValidateOptions)`
- CI
- db bench (in case querying new options introduces regression) **+0.5% micros/op**: `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR -charge_compression_dictionary_building_buffer=1(remove this for comparison) -compression_max_dict_bytes=10000 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 | egrep 'fillseq'`

#-run | (pre-PR) avg micros/op | std micros/op | (post-PR) micros/op | std micros/op | change (%)
-- | -- | -- | -- | -- | --
10 | 3.9711 | 0.264408 | 3.9914 | 0.254563 | 0.5111933721
20 | 3.83905 | 0.0664488 | 3.8251 | 0.0695456 | **-0.3633711465**
40 | 3.86625 | 0.136669 | 3.8867 | 0.143765 | **0.5289363078**

- db_stress: `python3 tools/db_crashtest.py blackbox -charge_compression_dictionary_building_buffer=1 -charge_filter_construction=1 -charge_table_reader=1 -cache_size=1` killed as normal

Reviewed By: ajkr
Differential Revision: D36054712
Pulled By: hx235
fbshipit-source-id: d406e90f5e0c5ea4dbcb585a484ad9302d4302af
…rance rate from 1% to 5% (#10113)

Summary:
**Context:** #9748 added support to charge table reader memory to the block cache. The test `ChargeTableReaderTest/ChargeTableReaderTest.Basic` estimated the table reader memory, calculated the expected number of table readers opened based on this estimation, and asserted this number against the actual number. The expected number calculated from the estimated table reader memory will not be 100% accurate and should have tolerance for error. It was previously set to 1%, and we recently encountered an assertion failure `(opened_table_reader_num) <= (max_table_reader_num_capped_upper_bound), actual: 375 or 376 vs 374`, where `opened_table_reader_num` is the actual opened count and `max_table_reader_num_capped_upper_bound` is the estimated one (= 371 * 1.01). I believe it's safe to increase the error tolerance from 1% to 5%, hence this PR.

Pull Request resolved: #10113
Test Plan:
- CI again succeeds.

Reviewed By: ajkr
Differential Revision: D36911556
Pulled By: hx235
fbshipit-source-id: 259687dd77b450fea0f5658a5b567a1d31d4b1f7
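The arithmetic behind the tolerance bump above can be checked directly: with an estimated 371 openable table readers, a 1% tolerance caps the allowed count at 374, which the observed 375/376 exceeded, while 5% raises the cap to 389 (the helper name below is hypothetical, not the test's actual code).

```cpp
#include <cassert>
#include <cmath>

// Upper bound on opened table readers given an estimate and a tolerance,
// mirroring the max_table_reader_num_capped_upper_bound computation.
int CappedUpperBound(int estimated_num, double error_tolerance) {
  return static_cast<int>(
      std::floor(estimated_num * (1.0 + error_tolerance)));
}
```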
Context:
Through heap profiling, we discovered that `BlockBasedTableReader` objects can accumulate and lead to high memory usage (e.g., `max_open_file = -1`). These memories are currently not saved, not tracked, not constrained, and not cache evict-able. As a first step to improve this, similar to #8428, this PR tracks an estimate of a `BlockBasedTableReader` object's memory in block cache and fails future creation if the memory usage exceeds the available space of the cache at the time of creation.

Summary:
- Approximate (new) `BlockBasedTable::Rep` and `TableProperties` memory usage in addition to the existing estimated ones (filter block/index block/un-compression dictionary)
- Charge all of them to block cache in `BlockBasedTable::Open()` and release them on `~BlockBasedTable()`, as there is no memory usage fluctuation of concern in between

Test:
- `OpenDb`: -0.52% in ms.
  - Setup: `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -write_buffer_size=1048576`
  - Benchmark: `./db_bench -benchmarks=readrandom -use_existing_db=1 -db=/dev/shm/testdb -reserve_table_reader_memory=true (remove this when running w/o feature) -file_opening_threads=3 -open_files=-1 -report_open_timing=true | egrep 'OpenDb:'`
- `fillseq`: +0.54% in micros/op. `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -cost_write_buffer_to_cache=true -write_buffer_size=10000000000 | egrep 'fillseq'`
- `bloom filter`: -0.78% in ms/key. `./filter_bench -impl=2 -quick -reserve_table_builder_memory=true | grep 'Build avg'`
- Stress test: `python3 tools/db_crashtest.py blackbox --reserve_table_reader_memory=1 --cache_size=1` killed as normal
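The charge-on-open / release-on-destroy pattern described in the PR can be sketched with toy classes (hypothetical names; the real mechanism is RocksDB's CacheReservationManager charging dummy entries to the block cache):

```cpp
#include <cassert>
#include <cstddef>

// Toy cache with a hard capacity: reservation fails instead of evicting.
class ToyCache {
 public:
  explicit ToyCache(size_t capacity) : capacity_(capacity) {}
  bool Reserve(size_t bytes) {
    if (usage_ + bytes > capacity_) return false;
    usage_ += bytes;
    return true;
  }
  void Release(size_t bytes) { usage_ -= bytes; }
  size_t usage() const { return usage_; }

 private:
  size_t capacity_;
  size_t usage_ = 0;
};

// Charges an estimated footprint on construction and releases it on
// destruction, mirroring BlockBasedTable::Open() / ~BlockBasedTable().
class ToyTableReader {
 public:
  ToyTableReader(ToyCache* cache, size_t approx_mem) : cache_(cache) {
    if (cache_->Reserve(approx_mem)) {
      charged_ = approx_mem;
      ok_ = true;  // creation succeeds only if the cache had room
    }
  }
  ~ToyTableReader() {
    if (ok_) cache_->Release(charged_);
  }
  bool ok() const { return ok_; }

 private:
  ToyCache* cache_;
  size_t charged_ = 0;
  bool ok_ = false;
};
```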