
Account memory of big memory users in BlockBasedTableReader in global memory limit #9748

Closed
wants to merge 9 commits

Conversation

@hx235 (Contributor, Author) commented Mar 24, 2022

Context:
Through heap profiling, we discovered that BlockBasedTableReader objects can accumulate and lead to high memory usage (e.g., with max_open_files = -1). This memory is currently not accounted for, not tracked, not constrained, and not cache-evictable. As a first step toward improving this, and similar to #8428, this PR tracks an estimate of each BlockBasedTableReader object's memory in the block cache and fails future reader creation if the memory usage would exceed the cache's available space at creation time.

Summary:

  • Approximate the memory usage of big memory users (BlockBasedTable::Rep and TableProperties), in addition to the already-estimated ones (filter block / index block / uncompression dictionary)
  • Charge all of these memory usages to the block cache in BlockBasedTable::Open() and release them in ~BlockBasedTable(), as there is no memory-usage fluctuation of concern in between
  • Refactor CacheReservationManager (and its call sites) to add the concurrency support needed by BlockBasedTable in this PR

Test:

  • New unit tests
  • db bench: OpenDb: -0.52% in ms
    • Setup: `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -write_buffer_size=1048576`
    • Repeated runs of pre-change (feature off) and post-change (feature on) binaries, benchmarking OpenDb: `./db_bench -benchmarks=readrandom -use_existing_db=1 -db=/dev/shm/testdb -reserve_table_reader_memory=true -file_opening_threads=3 -open_files=-1 -report_open_timing=true | egrep 'OpenDb:'` (remove `-reserve_table_reader_memory=true` when running without the feature)
| #-run | (feature-off) avg ms | std ms | (feature-on) avg ms | std ms | change (%) |
| -- | -- | -- | -- | -- | -- |
| 10 | 11.4018 | 5.95173 | 9.47788 | 1.57538 | -16.87382694 |
| 20 | 9.23746 | 0.841053 | 9.32377 | 1.14074 | 0.9343477536 |
| 40 | 9.0876 | 0.671129 | 9.35053 | 1.11713 | 2.893283155 |
| 80 | 9.72514 | 2.28459 | 9.52013 | 1.0894 | -2.108041632 |
| 160 | 9.74677 | 0.991234 | 9.84743 | 1.73396 | 1.032752389 |
| 320 | 10.7297 | 5.11555 | 10.547 | 1.97692 | -1.70275031 |
| 640 | 11.7092 | 2.36565 | 11.7869 | 2.69377 | 0.6635807741 |
  • db bench on write, with cost-to-cache enabled in WriteBufferManager (in case this PR's CRM refactoring accidentally slows down anything in WBM): fillseq: +0.54% in micros/op
    • `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -cost_write_buffer_to_cache=true -write_buffer_size=10000000000 | egrep 'fillseq'`
| #-run | (pre-PR) avg micros/op | std micros/op | (post-PR) avg micros/op | std micros/op | change (%) |
| -- | -- | -- | -- | -- | -- |
| 10 | 6.15 | 0.260187 | 6.289 | 0.371192 | 2.260162602 |
| 20 | 7.28025 | 0.465402 | 7.37255 | 0.451256 | 1.267813605 |
| 40 | 7.06312 | 0.490654 | 7.13803 | 0.478676 | 1.060579461 |
| 80 | 7.14035 | 0.972831 | 7.14196 | 0.92971 | 0.02254791432 |
  • filter bench: bloom filter: -0.78% in ns/key
    • `./filter_bench -impl=2 -quick -reserve_table_builder_memory=true | grep 'Build avg'`

| #-run | (pre-PR) avg ns/key | std ns/key | (post-PR) avg ns/key | std ns/key | change (%) |
| -- | -- | -- | -- | -- | -- |
| 10 | 26.4369 | 0.442182 | 26.3273 | 0.422919 | -0.4145720565 |
| 20 | 26.4451 | 0.592787 | 26.1419 | 0.62451 | -1.1465262 |
  • Crash test: `python3 tools/db_crashtest.py blackbox --reserve_table_reader_memory=1 --cache_size=1` killed as normal

@hx235 hx235 added WIP Work in progress and removed CLA Signed labels Mar 24, 2022
@hx235 hx235 force-pushed the cap_table_reader_mem branch 2 times, most recently from b2537fc to 827d320 Compare March 24, 2022 20:51
@hx235 hx235 changed the title Account memory of BlockBasedTable::Rep + TableProperties of BlockBasedTableReader Account memory of BlockBasedTable::Rep + TableProperties of BlockBasedTableReader in global memory limit Mar 24, 2022
```cpp
@@ -419,6 +423,12 @@ BlockBasedTableFactory::BlockBasedTableFactory(
    : table_options_(_table_options) {
  InitializeOptions();
  RegisterOptions(&table_options_, &block_based_table_type_info);

  if (table_options_.reserve_table_reader_memory &&
      table_options_.no_block_cache == false) {
```
@hx235 (author):
[Note] A fun little thing I found: when table_options_.no_block_cache is set to 1 by option string, its negation !table_options_.no_block_cache incorrectly evaluates to true in a dbg build (but not with DEBUG_LEVEL=1)...

```cpp
} else if (key == TablePropertiesNames::kDbId) {
  new_table_properties->db_id = raw_val.ToString();
  new_table_properties->IncreaseApproximateMemoryUsage(raw_val.size());
```
@hx235 (author):
Not my favorite code repetition either, but I couldn't find an easy way to avoid it...

Reviewer (contributor):
It could be an `else` for `pos != predefined_uint64_properties.end()`, with the rest of these if-elses under that `else`.

@hx235 (author):

Will fix this (previously I avoided it because of the nested if-else, but now I think it is still better than code duplication).

@hx235 (author):

I think one longer-term fix is to avoid the `key == TablePropertiesNames::XX` pattern and use a set. Opened a tech-debt task for it.

@hx235 (author):

Done, via on-the-fly memory approximation for table properties.

@hx235 hx235 removed the WIP Work in progress label Mar 25, 2022
@hx235 hx235 requested a review from ajkr March 25, 2022 02:08
@facebook-github-bot:
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ajkr (Contributor) left a comment:
Thanks very much for doing this work. Left a few suggestions/questions.

Inline review threads (since resolved) on:
- table/block_based/block_based_table_reader_test.cc
- include/rocksdb/table.h
- table/block_based/block_based_table_reader.cc
- memory/cache_capacity_based_memory_allocator.h
@facebook-github-bot:

@hx235 has updated the pull request. You must reimport the pull request before landing.

@hx235 (author) commented Mar 29, 2022:

Waiting for CI signals and re-benchmarking.
Update:

  • Clarified the BlockBasedTableOptions::reserve_table_reader_memory API doc
  • Refactored block_based_table_reader_test, including extracting BlockBasedTableReaderTestBase, encapsulating the details of approximating table reader memory, removing an unused function, and minor code changes
  • Used CacheReservationManager::CacheReservationHandle for BlockBasedTableReader's reservation, for better RAII on cache reservation. This includes:
    • Modified CacheReservationManager and its API comments to better support multi-threaded usage of CacheReservationHandle; renamed usages of the previous CacheReservationManagerHandle
    • Removed the unused class CacheCapacityBasedMemoryAllocator

@facebook-github-bot:
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hx235 hx235 changed the title Account memory of BlockBasedTable::Rep + TableProperties of BlockBasedTableReader in global memory limit Account memory of big memory users in BlockBasedTable in global memory limit Mar 29, 2022
@facebook-github-bot:

@hx235 has updated the pull request. You must reimport the pull request before landing.

@ajkr (Contributor) left a comment:

Do you mind importing it? Sorry it's hard to read on GitHub.

Replies to outdated comments:

> Not sure how "disable the cache Insert() while that particular table is created." is done but updated the test to use ConfigureTableFactory() and added more comment. Let's see if this helps.

I meant since you are the implementor of the cache, you could make Insert() return Status::OK() without doing any work while you calculate per-table memory.

@hx235 (author) commented Mar 29, 2022:
Sure - although I need to fix the ci/circleci: build-linux-clang10-clang-analyze failure first. Importing it.

@ajkr (Contributor) left a comment:
LGTM!

Comment on lines 329 to 348

```cpp
std::pair<const uint64_t*, const uint64_t*> TEST_GetUint64TPropStartEndPosition(
    const TableProperties* const_props) {
  TableProperties* props = const_cast<TableProperties*>(const_props);
  uint64_t* pu = &props->orig_file_number;
  assert(static_cast<void*>(pu) == static_cast<void*>(props));
  const uint64_t* pu_end = reinterpret_cast<const uint64_t*>(&props->db_id);
  return std::pair<const uint64_t*, const uint64_t*>(pu, pu_end);
}

std::pair<const std::string*, const std::string*>
TEST_GetStringPropStartEndPosition(const TableProperties* const_props) {
  TableProperties* props = const_cast<TableProperties*>(const_props);
  std::string* ps = &props->db_id;
  assert(static_cast<void*>(const_cast<uint64_t*>(
             TEST_GetUint64TPropStartEndPosition(props).second)) ==
         static_cast<void*>(ps));
  const std::string* ps_end =
      reinterpret_cast<const std::string*>(&props->user_collected_properties);
  return std::pair<const std::string*, const std::string*>(ps, ps_end);
}
```
@ajkr (Contributor) commented Apr 5, 2022:

OK it makes enough sense to me. Just make sure to eliminate all TEST_ in non-test code, const_casts, and reinterpret_casts (ideally).

edit: I kind of doubt we should be doing this (reinterpret_cast<const std::string*>(&props->user_collected_properties)) at all. What if user_collected_properties has different alignment from string?

@facebook-github-bot:

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot:

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Apr 7, 2022
…tween fields in TableProperties struct (#9812)

Summary:
Context:
#9748 (comment) reveals an issue with TEST_SetRandomTableProperties when non-zero padding is used between the last string field and first non-string field in TableProperties.
Fixed by #9748 (comment)

Pull Request resolved: #9812

Test Plan: No production code changes and rely on existing CI

Reviewed By: ajkr

Differential Revision: D35423680

Pulled By: hx235

fbshipit-source-id: fd855eef3d32771bb79c65bd7012ab8bb3c400ab
facebook-github-bot pushed a commit that referenced this pull request Apr 20, 2022
…Test.CapMemoryUsageUnderCacheCapacity (#9869)

Summary:
**Context:**
Unit test for #9748 keeps opening new files to see whether the new feature is able to correctly constrain the opening based on block cache capacity.

However, the unit test has two places written inefficiently that can lead to opening too many new files relative to the underlying operating system / file system constraints, even before hitting the block cache capacity:
(1) [opened_table_reader_num < 2 * max_table_reader_num](https://github.com/facebook/rocksdb/pull/9748/files?show-viewed-files=true&file-filters%5B%5D=#diff-ec9f5353e317df71093094734ba29193b94a998f0f9c9af924e4c99692195eeaR438), which can lead to 1200+ open files because of (2) below
(2) NewLRUCache(6 * CacheReservationManagerImpl<CacheEntryRole::kBlockBasedTableReader>::GetDummyEntrySize()) in [here](https://github.com/facebook/rocksdb/pull/9748/files?show-viewed-files=true&file-filters%5B%5D=#diff-ec9f5353e317df71093094734ba29193b94a998f0f9c9af924e4c99692195eeaR364)

Therefore we see CI failures like this on machines with a strict open-file limit of ~1000 (note the "table_1021" naming in the following error message):
https://app.circleci.com/pipelines/github/facebook/rocksdb/12886/workflows/75524682-3fa4-41ee-9a61-81827b51d99b/jobs/345270
```
fs_->NewWritableFile(path, foptions, &file, nullptr)
IO error: While open a file for appending: /dev/shm/rocksdb.Jedwt/run-block_based_table_reader_test-CapMemoryUsageUnderCacheCapacity-BlockBasedTableReaderCapMemoryTest.CapMemoryUsageUnderCacheCapacity-0/block_based_table_reader_test_1668910_829492452552920927/**table_1021**: Too many open files
```

**Summary:**
- Revised the two places above to be more efficient, including using 1.1 instead of 2 as the threshold and lowering the block cache capacity a bit
- Renamed some variables for clarity

Pull Request resolved: #9869

Test Plan:
- Manual inspection of the max number of opened table readers in all test cases, which is around 389
- Circle CI to see if error is gone

Reviewed By: ajkr

Differential Revision: D35752655

Pulled By: hx235

fbshipit-source-id: 8a0953d39d561babfa4257b8ed8550bb21b04839
hx235 added a commit to hx235/rocksdb that referenced this pull request Apr 20, 2022
…Test.CapMemoryUsageUnderCacheCapacity (facebook#9869)

(Commit message identical to the one above.)
facebook-github-bot pushed a commit that referenced this pull request May 17, 2022
Summary:
**Context:**
Previous PRs #9748, #9073, and #8428 added a separate flag for each charged memory area. Such an API design is not scalable as we charge more and more memory areas. Also, we foresee an opportunity to consolidate this feature with other cache-usage-related features such as `cache_index_and_filter_blocks` using `CacheEntryRole`.

Therefore we decided to consolidate all these flags with `CacheUsageOptions cache_usage_options` and this PR serves as the first step by consolidating memory-charging related flags.

**Summary:**
- Replaced old API reference with new ones, including making `kCompressionDictionaryBuildingBuffer` opt-out and added a unit test for that
- Added missing db bench/stress test for some memory charging features
- Renamed related test suite to indicate they are under the same theme of memory charging
- Refactored a commonly used mocked cache component in memory charging related tests to reduce code duplication
- Replaced the phrases "memory tracking" / "cache reservation" (other than CacheReservationManager-related ones) with "memory charging" for standard description of this feature.

Pull Request resolved: #9926

Test Plan:
- New unit test for opt-out `kCompressionDictionaryBuildingBuffer` `TEST_F(ChargeCompressionDictionaryBuildingBufferTest, Basic)`
- New unit test for option validation/sanitization `TEST_F(CacheUsageOptionsOverridesTest, SanitizeAndValidateOptions)`
- CI
- db bench (in case querying new options introduces regression) **+0.5% micros/op**: `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR  -charge_compression_dictionary_building_buffer=1(remove this for comparison)  -compression_max_dict_bytes=10000 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 | egrep 'fillseq'`

#-run | (pre-PR) avg micros/op | std micros/op | (post-PR)  micros/op | std micros/op | change (%)
-- | -- | -- | -- | -- | --
10 | 3.9711 | 0.264408 | 3.9914 | 0.254563 | 0.5111933721
20 | 3.83905 | 0.0664488 | 3.8251 | 0.0695456 | **-0.3633711465**
40 | 3.86625 | 0.136669 | 3.8867 | 0.143765 | **0.5289363078**

- db_stress: `python3 tools/db_crashtest.py blackbox  -charge_compression_dictionary_building_buffer=1 -charge_filter_construction=1 -charge_table_reader=1 -cache_size=1` killed as normal

Reviewed By: ajkr

Differential Revision: D36054712

Pulled By: hx235

fbshipit-source-id: d406e90f5e0c5ea4dbcb585a484ad9302d4302af
facebook-github-bot pushed a commit that referenced this pull request Jun 4, 2022
…rance rate from 1% to 5% (#10113)

Summary:
**Context:**
#9748 added support for charging table reader memory to the block cache. The test `ChargeTableReaderTest/ChargeTableReaderTest.Basic` estimated the table reader memory, calculated the expected number of table readers opened based on this estimate, and asserted that number against the actual number. The expected number, being derived from estimated memory, cannot be 100% accurate and needs an error tolerance. It was previously set to 1%, and we recently encountered an assertion failure: `(opened_table_reader_num) <= (max_table_reader_num_capped_upper_bound), actual: 375 or 376 vs 374`, where `opened_table_reader_num` is the actual count and `max_table_reader_num_capped_upper_bound` is the estimated cap (= 371 * 1.01). I believe it is safe to increase the error tolerance from 1% to 5%, hence this PR.

Pull Request resolved: #10113

Test Plan: - CI again succeeds.

Reviewed By: ajkr

Differential Revision: D36911556

Pulled By: hx235

fbshipit-source-id: 259687dd77b450fea0f5658a5b567a1d31d4b1f7
@hx235 hx235 changed the title Account memory of big memory users in BlockBasedTable in global memory limit Account memory of big memory users in BlockBasedTableReader in global memory limit Jan 9, 2023