Reduce iterator key comparison for upper/lower bound check #5111

yiwu-arbug · 2019-03-26T22:29:30Z

Summary:
Previously if iterator upper/lower bound presents, DBIter will check the bound for every key. This patch turns the check into per-file or per-data block check when applicable, by checking against either file largest/smallest key or block index key.

Test Plan:
Test 1: sequential scan
Use the following db_bench test, and compare

master: master before Avoid per-key upper bound check in BlockBasedTableIterator #5101
bound: master with per-block bound check
block: master with Avoid per-key upper bound check in BlockBasedTableIterator #5101
full: master with Avoid per-key upper bound check in BlockBasedTableIterator #5101 and this patch.

./db_bench --db=/dev/shm/db_bench --use_existing_db=false --benchmarks=fillrandom --num=100000000 --compression_type=none --target_file_size_base=8000000
./db_bench  --db=/dev/shm/db_bench --use_existing_db=true --benchmarks=seekrandom --disable_auto_compactions=true --seek_nexts=1000000 --max_scan_distance=1000000 --perf_level=2 --cache_size=20000000000 --reads=2000 -num=100000000

The benchmark do multiple (relatively long) range scan on a DB that's uncompressed, with smaller SST files, and can full cached in memory, to amplify the benefit.

Throughput & latency:

master: 153758.065 micros/op 6 ops/sec; 453.2 MB/s (1641 of 2000 found)
bound: 151067.057 micros/op 6 ops/sec; 461.3 MB/s (1641 of 2000 found)
block: 143366.367 micros/op 6 ops/sec; 486.0 MB/s (1641 of 2000 found)
full: 139727.842 micros/op 7 ops/sec; 498.7 MB/s (1641 of 2000 found)
~10% throughput improvement.

Key comparison count (reported by perf context):

master: 7317070743
bound: 5900740076
block: 5938053000
full: 4478580527
~40% less key comparison.
Also verified block_cache_hit and block_read_count are all the same, as a proxy to verify the behavior of the iterator is the same.

CPU usage ("task-clock (msec)" as reported by perf stat):

master: 308792
bound: 303408
block: 287866
full: 280581
~9% cpu usage reduction.

Test 2: reverse scan
Test with the same benchmark but added --reverse_iterator=true.

Throughput & latency:

master: 203234.688 micros/op 4 ops/sec; 342.1 MB/s (1641 of 2000 found)
full: 198474.047 micros/op 5 ops/sec; 350.3 MB/s (1641 of 2000 found)
~2% throughput improvement

Key comparison count:

master: 8249900053
full: 7103805458
~14% less key comparison

CPU usage:

master: 407625
full: 398032
~2% cpu usage reduction

Also verified the optimization works when partitioned-index is being used.

yiwu-arbug · 2019-03-26T22:30:11Z

The patch contains both #5101 and #5098. I'll rebase once those patches are merged.

siying

@yiwu-arbug thank you for your continuous contibution!
I think it is a great approach in general. My only concern is that HintWithinLowerBound() is a virtual function, and calling it for every key is kind of expensive. And this is really virtual, in the sense that compiler or CPU really can't predict the class is BlockBasedTableIterator, or LevelIterator at all. I wonder whether there is a better way to solve a problem. Is it possible to make a protected field of InernalIterator and return it as a non-virtual function, for example?

yiwu-arbug · 2019-03-28T21:28:40Z

@siying Valid concern. But to do that, caller (e.g. LevelIterator) will need to pass down a callback to callee (e.g. BlockBasedTableIterator) and the callee will need to use this callback to update the protected field of caller. I feel it is making iterators of different layers couple together more. Not sure if there's a better way to avoid the virtual call.

siying · 2019-03-28T21:44:49Z

Instead of relying on callback, can levelIterator fetch SST file iterator's value and put it in local variable after each Seek(), Next(), Prev(), etc? Is it possible?

yiwu-arbug · 2019-03-28T21:47:47Z

Sure. That's doable.

yiwu-arbug · 2019-03-28T23:31:20Z

Still WIP

siying · 2019-03-29T19:41:37Z

@yiwu-arbug do you still plan to move away from the virtual function?

yiwu-arbug · 2019-03-30T01:44:38Z

yes, still working on it.

…rator (#5101)" (#5132)" This reverts commit ebcc8ae.

Signed-off-by: Yi Wu <yiwu@pingcap.com>

yiwu-arbug · 2019-04-04T22:50:49Z

@siying I tried to remove virtual call, and the patch is here: https://github.com/yiwu-arbug/rocksdb/commit/01dd9aac1a92b83c3a4c8086400d27595667594a
Quick benchmark shows it end up being 3-5% slower than with virtual call. I haven't dig deep enough. My current theory is since LevelIterator don't know whether DBIter really need the hint, it's doing more work to pass the hint up the chain when it is not needed.

I also did a quick benchmark for short range scan and #5142 and #5111 combined is neutral vs. master.

./db_bench --db=/dev/shm/db_bench --use_existing_db=true --benchmarks=seekrandom --disable_auto_compactions=true --seek_nexts=5 --max_scan_distance=1000000 --perf_level=2 --cache_size=20000000000 --reads=2000000 -num=100000000

Shall we keeping the virtual call?

siying · 2019-04-04T22:59:48Z

@yiwu-arbug Valid() seems to be pretty expensive call. Do you need to call it?

yiwu-arbug · 2019-04-04T23:53:24Z

Changed Valid() to a simpler null check (bd576fa), getting similar result:
master: seekrandom : 164920.829 micros/op 6 ops/sec; 422.5 MB/s (1641 of 2000 found)
w/o virtual: seekrandom : 154005.544 micros/op 6 ops/sec; 452.5 MB/s (1641 of 2000 found)
w/ virtual: seekrandom : 143550.769 micros/op 6 ops/sec; 485.4 MB/s (1641 of 2000 found)

my second guess is: passing the hint layer by layer up the chain is a lot of random memory access, which is more expensive than virtual table lookup.

Signed-off-by: Yi Wu <yiwu@pingcap.com>

db/version_set.cc

Signed-off-by: Yi Wu <yiwu@pingcap.com>

siying

I'm not able to understand why moving away from virtual function is slower. Do you have any theory?

db/db_iter.cc

siying · 2019-04-11T22:24:47Z

db/version_set.cc

+    bool hint_within_upper_bound =
+        read_options_.iterate_upper_bound != nullptr &&
+        user_comparator_.Compare(ExtractUserKey(file_largest_key(file_index_)),
+                                 *read_options_.iterate_upper_bound) < 0;


We do allow users to change upper bound for a new Seek(). So maybe we should do it in InitFileIterator and persistent the result in this iterator, even if we don't create a new file iterator.

good point, will fix.

wait, the upper bound check result won't change even in InitFileIterator we don't create new file iterator, so it is safe to not updating it in that case. Do I miss anything?

db/version_set.cc

siying · 2019-04-11T22:49:17Z

@yiwu-arbug comparing the virtual function and non-virtual function cases, are they doing the same amount of comparisons?

yiwu-arbug · 2019-04-11T22:51:38Z

@siying regarding virtual call, my theory is we shall not just look at the part that makes the function call, but overall cpu cost. In the non-virtual attempt, we are passing the result bottom up, which also takes cpu cycle. Those cpu cycles may look trivial, but the virtual call that they comparing with is not that significant as well.

yiwu-arbug · 2019-04-11T22:52:12Z

@siying yes, I verified and the number of comparison is exactly the same.

siying · 2019-04-11T23:20:16Z

@yiwu-arbug it's roughly the same number of calls, isn't? If it is non-virtual, it's going to be inlined so it's not even a function call, but directly reading from memory location.

yiwu-arbug · 2019-04-12T00:03:39Z

@siying but the total cpu cost also include the effort to make it non-virtual. At least in my patch, the return value is copied several times layer by layer up, which is maybe not cheaper than just a virtual table lookup.

siying · 2019-04-12T00:08:52Z

@yiwu-arbug isn't the CPU mostly reading from one variable to another?

yiwu-arbug · 2019-04-12T18:02:27Z

@siying Forgive for my laziness but resolving issue with virtual calls is not the main goal of this patch. As I can see the method we are trying is at least making the code very messy. And we don't use the same pattern in the rest of the rocksdb code. As per-my bench the virtual call doesn't cause noticeable regression, but if it does I'll come back to fix.

siying · 2019-05-15T21:21:03Z

table/block_based_table_reader.h

@@ -705,6 +710,8 @@ class BlockBasedTableIterator : public InternalIteratorBase<TValue> {
  TBlockIter block_iter_;
  bool block_iter_points_to_real_block_;
  bool is_out_of_bound_ = false;
+  // Whether current data block being fully within iterate upper bound.
+  bool data_block_within_upper_bound_ = true;


Should be default false to be safe?

Neither one is safer to me:

when used to check if next block needs to be read, setting to true is safer, it allow the next block to be read.

when used to check whether upper bound needs to be checked, setting to false is safer, it force upper bound to be checked.

The initial value shouldn't be used anyway, but I don't know a way to assert that.

Then it's better to find why it failed in the first place then. It introduces a potential bug. When you try to reproduce, which flavor did you run? Did you run ASAN?

btw, I don't understand 1. If the value is false, will we skip the next block? Why?

if the current block is not completely within upper bound, the next block will be completely out of upper bound, so it's safe to skip.

yiwu-arbug · 2019-05-16T22:05:45Z

@siying was the test fixed?

siying · 2019-05-16T22:11:45Z

Yes it is fixed.

siying

I read the code and suspect that the failure happens in read_options_.iterate_upper_bound == nullptr case.

yiwu-arbug · 2019-05-16T23:35:21Z

I read the code and suspect that the failure happens in read_options_.iterate_upper_bound == nullptr case.

I think so. Do you prefer null-checking read_options_.iterate_upper_bound in every NextAndGetReesult()?

siying · 2019-05-16T23:42:34Z

@yiwu-arbug if possible, I hope we avoid extra branching for read_options.iterate_upper_bound == nullptr because in some use cases, upper bound is on and off and causing branching mis-prediction. Hope about set a value safe for return value? Internally, looks like all uses are after a upper bound null check.

facebook-github-bot · 2019-05-16T23:52:55Z

@yiwu-arbug has updated the pull request. Re-import the pull request

yiwu-arbug · 2019-05-16T23:53:48Z

@siying make sense. done.

facebook-github-bot

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2019-05-17T17:48:16Z

@siying merged this pull request in f3a7847.

siying · 2019-05-17T18:11:46Z

Landed. Thank you for your contribution!

yiwu-arbug · 2019-05-17T18:27:59Z

Thanks much for the help to get the tricky patch merged!

…bleIterator Summary: PR facebook#5111 reduced the number of key comparisons when iterating with upper/lower bounds; however, this caused a regression for MyRocks. Reverting to the previous behavior as a hotfix. Test Plan: make check + affected MyRocks test

…bleIterator (#5428) Summary: PR #5111 reduced the number of key comparisons when iterating with upper/lower bounds; however, this caused a regression for MyRocks. Reverting to the previous behavior in BlockBasedTableIterator as a hotfix. Pull Request resolved: #5428 Differential Revision: D15721038 Pulled By: ltamasi fbshipit-source-id: 5450106442f1763bccd17f6cfd648697f2ae8b6c

…acebook#5111)" This reverts commit f3a7847.

…5111)" (#5440) Summary: This reverts commit f3a7847. Pull Request resolved: #5440 Differential Revision: D15765967 Pulled By: ltamasi fbshipit-source-id: d027fe24132e3729289cd7c01857a7eb449d9dd0

…5111) Summary: Previously if iterator upper/lower bound presents, `DBIter` will check the bound for every key. This patch turns the check into per-file or per-data block check when applicable, by checking against either file largest/smallest key or block index key. Pull Request resolved: facebook#5111 Differential Revision: D15330061 Pulled By: siying fbshipit-source-id: 8a653fe3cd50d94d81eb2d13b087326c58ee2024

…bleIterator (facebook#5428) Summary: PR facebook#5111 reduced the number of key comparisons when iterating with upper/lower bounds; however, this caused a regression for MyRocks. Reverting to the previous behavior in BlockBasedTableIterator as a hotfix. Pull Request resolved: facebook#5428 Differential Revision: D15721038 Pulled By: ltamasi fbshipit-source-id: 5450106442f1763bccd17f6cfd648697f2ae8b6c

…acebook#5111)" (facebook#5440) Summary: This reverts commit f3a7847. Pull Request resolved: facebook#5440 Differential Revision: D15765967 Pulled By: ltamasi fbshipit-source-id: d027fe24132e3729289cd7c01857a7eb449d9dd0

…pt) (#5468) Summary: This is a second attempt for #5111, with the fix to redo iterate bounds check after `SeekXXX()`. This is because MyRocks may change iterate bounds between seek. See #5111 for original benchmark result and discussion. Closes #5463. Pull Request resolved: #5468 Test Plan: Existing rocksdb tests, plus myrocks test `rocksdb.optimizer_loose_index_scans` and `rocksdb.group_min_max`. Differential Revision: D15863332 fbshipit-source-id: ab4aba5899838591806b8673899bd465f3f53e18

…pt) (facebook#5468) Summary: This is a second attempt for facebook#5111, with the fix to redo iterate bounds check after `SeekXXX()`. This is because MyRocks may change iterate bounds between seek. See facebook#5111 for original benchmark result and discussion. Closes facebook#5463. Pull Request resolved: facebook#5468 Test Plan: Existing rocksdb tests, plus myrocks test `rocksdb.optimizer_loose_index_scans` and `rocksdb.group_min_max`. Differential Revision: D15863332 fbshipit-source-id: ab4aba5899838591806b8673899bd465f3f53e18

facebook-github-bot added the CLA Signed label Mar 26, 2019

This was referenced Mar 26, 2019

Improvement for upper/lower bound #5059

Closed

Avoid per-key upper bound check in BlockBasedTableIterator #5101

Closed

petermattis mentioned this pull request Mar 26, 2019

perf: optimize {Lower,Upper}Bound handling cockroachdb/pebble#28

Closed

siying reviewed Mar 28, 2019

View reviewed changes

Yi Wu added 3 commits April 2, 2019 22:41

Revert "Revert "Avoid per-key upper bound check in BlockBasedTableIte…

49c20f0

…rator (#5101)" (#5132)" This reverts commit ebcc8ae.

Fix issue with format_version=4

df6426f

Signed-off-by: Yi Wu <yiwu@pingcap.com>

Fix skipping next file while index key is out of bound

cb7aa3f

Signed-off-by: Yi Wu <yiwu@pingcap.com>

Yi Wu added 2 commits April 5, 2019 18:13

add InternalItearator::user_key()

4adb3d3

Signed-off-by: Yi Wu <yiwu@pingcap.com>

Reduce iterator key comparison for upper/lower bound check

dc3728d

Signed-off-by: Yi Wu <yiwu@pingcap.com>

siying reviewed Apr 5, 2019

View reviewed changes

db/version_set.cc Outdated Show resolved Hide resolved

LevelIterator handle hint_within_lower_bound by itself

c9be457

Signed-off-by: Yi Wu <yiwu@pingcap.com>

siying reviewed Apr 11, 2019

View reviewed changes

siying reviewed May 15, 2019

View reviewed changes

siying reviewed May 16, 2019

View reviewed changes

flip data_block_within_upper_bound_ default value

09dc619

facebook-github-bot reviewed May 16, 2019

View reviewed changes

facebook-github-bot closed this in f3a7847 May 17, 2019

facebook-github-bot added the Merged label May 17, 2019

yiwu-arbug deleted the iter_bound branch May 17, 2019 18:28

ltamasi mentioned this pull request Jun 7, 2019

Revert to checking the upper bound on a per-key basis in BlockBasedTableIterator #5428

Closed

ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Jun 11, 2019

Revert "Reduce iterator key comparison for upper/lower bound check (f…

3562eb5

…acebook#5111)" This reverts commit f3a7847.

This was referenced Jun 17, 2019

Range scan optimization #5111 cause MyRocks test failure #5463

Closed

Reduce iterator key comparison for upper/lower bound check (2nd attempt) #5468

Closed

Reduce iterator key comparison for upper/lower bound check #5111

Reduce iterator key comparison for upper/lower bound check #5111

Conversation

yiwu-arbug commented Mar 26, 2019 • edited Loading

yiwu-arbug commented Mar 26, 2019

siying left a comment

Choose a reason for hiding this comment

yiwu-arbug commented Mar 28, 2019

siying commented Mar 28, 2019

yiwu-arbug commented Mar 28, 2019

yiwu-arbug commented Mar 28, 2019

siying commented Mar 29, 2019

yiwu-arbug commented Mar 30, 2019

yiwu-arbug commented Apr 4, 2019

siying commented Apr 4, 2019

yiwu-arbug commented Apr 4, 2019

siying left a comment

Choose a reason for hiding this comment

siying Apr 11, 2019

Choose a reason for hiding this comment

yiwu-arbug Apr 11, 2019

Choose a reason for hiding this comment

yiwu-arbug Apr 16, 2019

Choose a reason for hiding this comment

siying commented Apr 11, 2019

yiwu-arbug commented Apr 11, 2019

yiwu-arbug commented Apr 11, 2019

siying commented Apr 11, 2019

yiwu-arbug commented Apr 12, 2019

siying commented Apr 12, 2019

yiwu-arbug commented Apr 12, 2019 • edited Loading

siying May 15, 2019

Choose a reason for hiding this comment

yiwu-arbug May 15, 2019 • edited Loading

Choose a reason for hiding this comment

siying May 16, 2019

Choose a reason for hiding this comment

siying May 16, 2019

Choose a reason for hiding this comment

yiwu-arbug May 16, 2019

Choose a reason for hiding this comment

yiwu-arbug commented May 16, 2019

siying commented May 16, 2019

siying left a comment

Choose a reason for hiding this comment

yiwu-arbug commented May 16, 2019

siying commented May 16, 2019

facebook-github-bot commented May 16, 2019

yiwu-arbug commented May 16, 2019

facebook-github-bot left a comment

Choose a reason for hiding this comment

facebook-github-bot commented May 17, 2019

siying commented May 17, 2019

yiwu-arbug commented May 17, 2019

yiwu-arbug commented Mar 26, 2019 •

edited

Loading

yiwu-arbug commented Apr 12, 2019 •

edited

Loading

yiwu-arbug May 15, 2019 •

edited

Loading