Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce iterator key comparison for upper/lower bound check #5111

Closed
wants to merge 20 commits into from
Closed

Reduce iterator key comparison for upper/lower bound check #5111

wants to merge 20 commits into from

Conversation

yiwu-arbug
Copy link
Contributor

@yiwu-arbug yiwu-arbug commented Mar 26, 2019

Summary:
Previously if iterator upper/lower bound presents, DBIter will check the bound for every key. This patch turns the check into per-file or per-data block check when applicable, by checking against either file largest/smallest key or block index key.

Test Plan:
Test 1: sequential scan
Use the following db_bench test, and compare

./db_bench --db=/dev/shm/db_bench --use_existing_db=false --benchmarks=fillrandom --num=100000000 --compression_type=none --target_file_size_base=8000000
./db_bench  --db=/dev/shm/db_bench --use_existing_db=true --benchmarks=seekrandom --disable_auto_compactions=true --seek_nexts=1000000 --max_scan_distance=1000000 --perf_level=2 --cache_size=20000000000 --reads=2000 -num=100000000

The benchmark do multiple (relatively long) range scan on a DB that's uncompressed, with smaller SST files, and can full cached in memory, to amplify the benefit.

Throughput & latency:

  • master: 153758.065 micros/op 6 ops/sec; 453.2 MB/s (1641 of 2000 found)
  • bound: 151067.057 micros/op 6 ops/sec; 461.3 MB/s (1641 of 2000 found)
  • block: 143366.367 micros/op 6 ops/sec; 486.0 MB/s (1641 of 2000 found)
  • full: 139727.842 micros/op 7 ops/sec; 498.7 MB/s (1641 of 2000 found)
    ~10% throughput improvement.

Key comparison count (reported by perf context):

  • master: 7317070743
  • bound: 5900740076
  • block: 5938053000
  • full: 4478580527
    ~40% less key comparison.
    Also verified block_cache_hit and block_read_count are all the same, as a proxy to verify the behavior of the iterator is the same.

CPU usage ("task-clock (msec)" as reported by perf stat):

  • master: 308792
  • bound: 303408
  • block: 287866
  • full: 280581
    ~9% cpu usage reduction.

Test 2: reverse scan
Test with the same benchmark but added --reverse_iterator=true.

Throughput & latency:

  • master: 203234.688 micros/op 4 ops/sec; 342.1 MB/s (1641 of 2000 found)
  • full: 198474.047 micros/op 5 ops/sec; 350.3 MB/s (1641 of 2000 found)
    ~2% throughput improvement

Key comparison count:

  • master: 8249900053
  • full: 7103805458
    ~14% less key comparison

CPU usage:

  • master: 407625
  • full: 398032
    ~2% cpu usage reduction

Also verified the optimization works when partitioned-index is being used.

@yiwu-arbug
Copy link
Contributor Author

The patch contains both #5101 and #5098. I'll rebase once those patches are merged.

Copy link
Contributor

@siying siying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yiwu-arbug thank you for your continuous contibution!
I think it is a great approach in general. My only concern is that HintWithinLowerBound() is a virtual function, and calling it for every key is kind of expensive. And this is really virtual, in the sense that compiler or CPU really can't predict the class is BlockBasedTableIterator, or LevelIterator at all. I wonder whether there is a better way to solve a problem. Is it possible to make a protected field of InernalIterator and return it as a non-virtual function, for example?

@yiwu-arbug
Copy link
Contributor Author

@siying Valid concern. But to do that, caller (e.g. LevelIterator) will need to pass down a callback to callee (e.g. BlockBasedTableIterator) and the callee will need to use this callback to update the protected field of caller. I feel it is making iterators of different layers couple together more. Not sure if there's a better way to avoid the virtual call.

@siying
Copy link
Contributor

siying commented Mar 28, 2019

Instead of relying on callback, can levelIterator fetch SST file iterator's value and put it in local variable after each Seek(), Next(), Prev(), etc? Is it possible?

@yiwu-arbug
Copy link
Contributor Author

Sure. That's doable.

@yiwu-arbug
Copy link
Contributor Author

Still WIP

@siying
Copy link
Contributor

siying commented Mar 29, 2019

@yiwu-arbug do you still plan to move away from the virtual function?

@yiwu-arbug
Copy link
Contributor Author

yes, still working on it.

Yi Wu added 3 commits April 2, 2019 22:41
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Signed-off-by: Yi Wu <yiwu@pingcap.com>
@yiwu-arbug
Copy link
Contributor Author

@siying I tried to remove virtual call, and the patch is here: https://github.com/yiwu-arbug/rocksdb/commit/01dd9aac1a92b83c3a4c8086400d27595667594a
Quick benchmark shows it end up being 3-5% slower than with virtual call. I haven't dig deep enough. My current theory is since LevelIterator don't know whether DBIter really need the hint, it's doing more work to pass the hint up the chain when it is not needed.

I also did a quick benchmark for short range scan and #5142 and #5111 combined is neutral vs. master.

./db_bench --db=/dev/shm/db_bench --use_existing_db=true --benchmarks=seekrandom --disable_auto_compactions=true --seek_nexts=5 --max_scan_distance=1000000 --perf_level=2 --cache_size=20000000000 --reads=2000000 -num=100000000

Shall we keeping the virtual call?

@siying
Copy link
Contributor

siying commented Apr 4, 2019

@yiwu-arbug Valid() seems to be pretty expensive call. Do you need to call it?

@yiwu-arbug
Copy link
Contributor Author

Changed Valid() to a simpler null check (bd576fa), getting similar result:
master: seekrandom : 164920.829 micros/op 6 ops/sec; 422.5 MB/s (1641 of 2000 found)
w/o virtual: seekrandom : 154005.544 micros/op 6 ops/sec; 452.5 MB/s (1641 of 2000 found)
w/ virtual: seekrandom : 143550.769 micros/op 6 ops/sec; 485.4 MB/s (1641 of 2000 found)

my second guess is: passing the hint layer by layer up the chain is a lot of random memory access, which is more expensive than virtual table lookup.

Yi Wu added 2 commits April 5, 2019 18:13
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Signed-off-by: Yi Wu <yiwu@pingcap.com>
db/version_set.cc Outdated Show resolved Hide resolved
Signed-off-by: Yi Wu <yiwu@pingcap.com>
Copy link
Contributor

@siying siying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not able to understand why moving away from virtual function is slower. Do you have any theory?

db/db_iter.cc Outdated Show resolved Hide resolved
bool hint_within_upper_bound =
read_options_.iterate_upper_bound != nullptr &&
user_comparator_.Compare(ExtractUserKey(file_largest_key(file_index_)),
*read_options_.iterate_upper_bound) < 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do allow users to change upper bound for a new Seek(). So maybe we should do it in InitFileIterator and persistent the result in this iterator, even if we don't create a new file iterator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, will fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, the upper bound check result won't change even in InitFileIterator we don't create new file iterator, so it is safe to not updating it in that case. Do I miss anything?

db/version_set.cc Outdated Show resolved Hide resolved
@siying
Copy link
Contributor

siying commented Apr 11, 2019

@yiwu-arbug comparing the virtual function and non-virtual function cases, are they doing the same amount of comparisons?

@yiwu-arbug
Copy link
Contributor Author

@siying regarding virtual call, my theory is we shall not just look at the part that makes the function call, but overall cpu cost. In the non-virtual attempt, we are passing the result bottom up, which also takes cpu cycle. Those cpu cycles may look trivial, but the virtual call that they comparing with is not that significant as well.

@yiwu-arbug
Copy link
Contributor Author

@siying yes, I verified and the number of comparison is exactly the same.

@siying
Copy link
Contributor

siying commented Apr 11, 2019

@yiwu-arbug it's roughly the same number of calls, isn't? If it is non-virtual, it's going to be inlined so it's not even a function call, but directly reading from memory location.

@yiwu-arbug
Copy link
Contributor Author

@siying but the total cpu cost also include the effort to make it non-virtual. At least in my patch, the return value is copied several times layer by layer up, which is maybe not cheaper than just a virtual table lookup.

@siying
Copy link
Contributor

siying commented Apr 12, 2019

@yiwu-arbug isn't the CPU mostly reading from one variable to another?

@yiwu-arbug
Copy link
Contributor Author

yiwu-arbug commented Apr 12, 2019

@siying Forgive for my laziness but resolving issue with virtual calls is not the main goal of this patch. As I can see the method we are trying is at least making the code very messy. And we don't use the same pattern in the rest of the rocksdb code. As per-my bench the virtual call doesn't cause noticeable regression, but if it does I'll come back to fix.

@@ -705,6 +710,8 @@ class BlockBasedTableIterator : public InternalIteratorBase<TValue> {
TBlockIter block_iter_;
bool block_iter_points_to_real_block_;
bool is_out_of_bound_ = false;
// Whether current data block being fully within iterate upper bound.
bool data_block_within_upper_bound_ = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be default false to be safe?

Copy link
Contributor Author

@yiwu-arbug yiwu-arbug May 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither one is safer to me:

  1. when used to check if next block needs to be read, setting to true is safer, it allow the next block to be read.
  2. when used to check whether upper bound needs to be checked, setting to false is safer, it force upper bound to be checked.

The initial value shouldn't be used anyway, but I don't know a way to assert that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it's better to find why it failed in the first place then. It introduces a potential bug. When you try to reproduce, which flavor did you run? Did you run ASAN?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, I don't understand 1. If the value is false, will we skip the next block? Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the current block is not completely within upper bound, the next block will be completely out of upper bound, so it's safe to skip.

@yiwu-arbug
Copy link
Contributor Author

@siying was the test fixed?

@siying
Copy link
Contributor

siying commented May 16, 2019

Yes it is fixed.

Copy link
Contributor

@siying siying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the code and suspect that the failure happens in read_options_.iterate_upper_bound == nullptr case.

@yiwu-arbug
Copy link
Contributor Author

I read the code and suspect that the failure happens in read_options_.iterate_upper_bound == nullptr case.

I think so. Do you prefer null-checking read_options_.iterate_upper_bound in every NextAndGetReesult()?

@siying
Copy link
Contributor

siying commented May 16, 2019

@yiwu-arbug if possible, I hope we avoid extra branching for read_options.iterate_upper_bound == nullptr because in some use cases, upper bound is on and off and causing branching mis-prediction. Hope about set a value safe for return value? Internally, looks like all uses are after a upper bound null check.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has updated the pull request. Re-import the pull request

@yiwu-arbug
Copy link
Contributor Author

@siying make sense. done.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@siying merged this pull request in f3a7847.

@siying
Copy link
Contributor

siying commented May 17, 2019

Landed. Thank you for your contribution!

@yiwu-arbug
Copy link
Contributor Author

Thanks much for the help to get the tricky patch merged!

@yiwu-arbug yiwu-arbug deleted the iter_bound branch May 17, 2019 18:28
ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Jun 7, 2019
…bleIterator

Summary:
PR facebook#5111 reduced the number of key comparisons when iterating with
upper/lower bounds; however, this caused a regression for MyRocks.
Reverting to the previous behavior as a hotfix.

Test Plan:
make check + affected MyRocks test
facebook-github-bot pushed a commit that referenced this pull request Jun 7, 2019
…bleIterator (#5428)

Summary:
PR #5111 reduced the number of key comparisons when iterating with
upper/lower bounds; however, this caused a regression for MyRocks.
Reverting to the previous behavior in BlockBasedTableIterator as a hotfix.
Pull Request resolved: #5428

Differential Revision: D15721038

Pulled By: ltamasi

fbshipit-source-id: 5450106442f1763bccd17f6cfd648697f2ae8b6c
ltamasi added a commit to ltamasi/rocksdb that referenced this pull request Jun 11, 2019
facebook-github-bot pushed a commit that referenced this pull request Jun 11, 2019
…5111)" (#5440)

Summary:
This reverts commit f3a7847.
Pull Request resolved: #5440

Differential Revision: D15765967

Pulled By: ltamasi

fbshipit-source-id: d027fe24132e3729289cd7c01857a7eb449d9dd0
vagogte pushed a commit to vagogte/rocksdb that referenced this pull request Jun 18, 2019
…5111)

Summary:
Previously if iterator upper/lower bound presents, `DBIter` will check the bound for every key. This patch turns the check into per-file or per-data block check when applicable, by checking against either file largest/smallest key or block index key.
Pull Request resolved: facebook#5111

Differential Revision: D15330061

Pulled By: siying

fbshipit-source-id: 8a653fe3cd50d94d81eb2d13b087326c58ee2024
vagogte pushed a commit to vagogte/rocksdb that referenced this pull request Jun 18, 2019
…bleIterator (facebook#5428)

Summary:
PR facebook#5111 reduced the number of key comparisons when iterating with
upper/lower bounds; however, this caused a regression for MyRocks.
Reverting to the previous behavior in BlockBasedTableIterator as a hotfix.
Pull Request resolved: facebook#5428

Differential Revision: D15721038

Pulled By: ltamasi

fbshipit-source-id: 5450106442f1763bccd17f6cfd648697f2ae8b6c
vagogte pushed a commit to vagogte/rocksdb that referenced this pull request Jun 18, 2019
…acebook#5111)" (facebook#5440)

Summary:
This reverts commit f3a7847.
Pull Request resolved: facebook#5440

Differential Revision: D15765967

Pulled By: ltamasi

fbshipit-source-id: d027fe24132e3729289cd7c01857a7eb449d9dd0
facebook-github-bot pushed a commit that referenced this pull request Jul 2, 2019
…pt) (#5468)

Summary:
This is a second attempt for #5111, with the fix to redo iterate bounds check after `SeekXXX()`. This is because MyRocks may change iterate bounds between seek.

See #5111 for original benchmark result and discussion.

Closes #5463.
Pull Request resolved: #5468

Test Plan: Existing rocksdb tests, plus myrocks test `rocksdb.optimizer_loose_index_scans` and `rocksdb.group_min_max`.

Differential Revision: D15863332

fbshipit-source-id: ab4aba5899838591806b8673899bd465f3f53e18
merryChris pushed a commit to merryChris/rocksdb that referenced this pull request Nov 18, 2019
…pt) (facebook#5468)

Summary:
This is a second attempt for facebook#5111, with the fix to redo iterate bounds check after `SeekXXX()`. This is because MyRocks may change iterate bounds between seek.

See facebook#5111 for original benchmark result and discussion.

Closes facebook#5463.
Pull Request resolved: facebook#5468

Test Plan: Existing rocksdb tests, plus myrocks test `rocksdb.optimizer_loose_index_scans` and `rocksdb.group_min_max`.

Differential Revision: D15863332

fbshipit-source-id: ab4aba5899838591806b8673899bd465f3f53e18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants