fix: Bypass window store cache when doing windowed pull queries #6548

AlanConfluent · 2020-10-29T23:58:39Z

Description

Windowed pull queries currently perform poorly. After a lot of investigation, it became clear that the cause is the use of the Streams cache. We experimented with disabling the cache entirely, and pull query performance was good. The problem is that disabling it globally would hurt performance in other areas, such as persistent queries.

This PR disables the use of the Streams cache for windowed pull queries only. There was a lot of discussion about whether Streams should expose a public API to bypass the cache during a state store lookup. For now, the decision is not to add this to the existing API; the next public API is expected to expose cache bypassing.

To give ksqlDB good performance for all pull query types in the meantime, we decided to use reflection to work around the missing public API and skip the caching layers.
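As a rough illustration of the reflection approach described above, the sketch below reads a private field on a caching wrapper to reach the layer underneath it. Every name here (`CacheBypassSketch`, `WindowLookup`, `CachingLayer`, `PersistentLayer`, the `wrapped` field) is a hypothetical stand-in invented for this example; it does not reproduce the Kafka Streams classes or the code in this PR.

```java
import java.lang.reflect.Field;

// Hypothetical stand-ins for illustration; these are NOT the Kafka Streams classes.
final class CacheBypassSketch {

  /** Minimal lookup interface shared by every layer. */
  interface WindowLookup {
    String fetch(String key);
  }

  /** Bottom layer: pretend this reads from the persistent store. */
  static final class PersistentLayer implements WindowLookup {
    @Override
    public String fetch(final String key) {
      return "persistent:" + key;
    }
  }

  /** Caching wrapper that hides the layer beneath it in a private field. */
  static final class CachingLayer implements WindowLookup {
    private final WindowLookup wrapped;

    CachingLayer(final WindowLookup wrapped) {
      this.wrapped = wrapped;
    }

    @Override
    public String fetch(final String key) {
      return "cached:" + key;
    }
  }

  /** Reads the private "wrapped" field to reach the layer under the cache. */
  static WindowLookup bypass(final CachingLayer cache) {
    try {
      final Field field = CachingLayer.class.getDeclaredField("wrapped");
      field.setAccessible(true);
      return (WindowLookup) field.get(cache);
    } catch (final ReflectiveOperationException e) {
      throw new IllegalStateException("Could not bypass the caching layer", e);
    }
  }

  public static void main(final String[] args) {
    final CachingLayer store = new CachingLayer(new PersistentLayer());
    System.out.println(store.fetch("k1"));         // prints "cached:k1"
    System.out.println(bypass(store).fetch("k1")); // prints "persistent:k1"
  }
}
```

The trade-off is the usual one for reflection: it depends on a private field name that can change without notice, which is why a public API would be preferable once one exists.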

Testing done

Ran local unit tests. Ran RestQueryTranslationTests. Ran pull query benchmarks for windowed queries.

Reviewer checklist

Ensure docs are updated if necessary (e.g., if a user-visible feature is being added or changed).
Ensure relevant issues are linked (the description should include text like "Fixes #").

guozhangwang

Overall LGTM. Maybe the only way to tell whether the performance is similar to the case with caching disabled directly via config is to run some benchmarks. I'm just wondering whether calling reflection on every fetch call would be too costly or whether it is negligible. If the benchmarks show it is costly, we can consider caching the provider / type along with the CompositeReadOnlyXXStore (actually, does that work without changing Streams code? Anyway, we'll see if it is really necessary).

vvcephei

Thanks for this, @AlanConfluent! It looks good to me. I just had a couple of minor comments.

vvcephei · 2020-10-30T14:24:37Z

...ain/java/io/confluent/ksql/execution/streams/materialization/ks/SessionStoreCacheBypass.java

+        if (!(store instanceof SessionStore)) {
+          break;
+        }


This is interesting; why does it happen?

...Ah, I see. It's because the RocksDBSessionStore wraps a SegmentedBytesStore.

The rest seems fairly obvious, but this one might be subtle enough to warrant a comment.

Yes, it's exactly as you mentioned. Stopping at the last SessionStore is sufficient since it is below the cache and still has the expected interface for fetching. Added a few comments about that.
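The stopping rule discussed in this thread can be sketched as a loop that keeps descending through wrapped layers and breaks as soon as the next layer no longer implements the session interface. The types below (`SessionLike`, `WrappingLayer`, `MeteredLayer`, `CachingLayer`, `RocksLayer`, `SegmentedBytes`) are hypothetical stand-ins chosen to mirror the shape of the discussion; they are assumptions for illustration, not the real Kafka Streams types or the code in `SessionStoreCacheBypass`.

```java
import java.lang.reflect.Field;

// Hypothetical stand-ins; none of these are the real Kafka Streams types.
final class LastSessionLayerSketch {

  /** Stand-in for the session store interface. */
  interface SessionLike { }

  /** Base class for layers that hold the next layer in a private field. */
  abstract static class WrappingLayer {
    private final Object wrapped;

    WrappingLayer(final Object wrapped) {
      this.wrapped = wrapped;
    }
  }

  static final class MeteredLayer extends WrappingLayer implements SessionLike {
    MeteredLayer(final Object wrapped) {
      super(wrapped);
    }
  }

  static final class CachingLayer extends WrappingLayer implements SessionLike {
    CachingLayer(final Object wrapped) {
      super(wrapped);
    }
  }

  /** Still a SessionLike, but the thing it wraps is not. */
  static final class RocksLayer extends WrappingLayer implements SessionLike {
    RocksLayer(final Object wrapped) {
      super(wrapped);
    }
  }

  /** Lowest layer; it does not expose the session interface. */
  static final class SegmentedBytes { }

  /** Reads the private "wrapped" field of a layer via reflection. */
  static Object readWrapped(final WrappingLayer layer) {
    try {
      final Field field = WrappingLayer.class.getDeclaredField("wrapped");
      field.setAccessible(true);
      return field.get(layer);
    } catch (final ReflectiveOperationException e) {
      throw new IllegalStateException("Could not read wrapped layer", e);
    }
  }

  /** Descends until the next layer down is no longer a SessionLike. */
  static SessionLike lowestSessionLayer(final SessionLike top) {
    SessionLike current = top;
    while (current instanceof WrappingLayer) {
      final Object inner = readWrapped((WrappingLayer) current);
      if (!(inner instanceof SessionLike)) {
        // The layer below lacks the fetch interface we need, so stop here.
        break;
      }
      current = (SessionLike) inner;
    }
    return current;
  }

  public static void main(final String[] args) {
    final RocksLayer rocks = new RocksLayer(new SegmentedBytes());
    final SessionLike top = new MeteredLayer(new CachingLayer(rocks));
    System.out.println(lowestSessionLayer(top) == rocks); // prints "true"
  }
}
```

In this sketch the loop lands on `RocksLayer`, which sits below the caching layer yet still offers the expected interface, matching the reasoning given in the reply above.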

vvcephei · 2020-10-30T14:34:53Z

.../java/io/confluent/ksql/execution/streams/materialization/ks/WindowStoreCacheBypassTest.java

+  }
+
+  @Test
+  public void shouldAvoidNonWindowStore() throws IllegalAccessException {


It looks like this test verifies that we bottom out at a wrapped store when it doesn't wrap a WindowStore, right?
It seems like a similar test could positively verify that we actually do skip the caching layer, or any other wrapped layer, but I didn't see that test. Did I miss it?

The logic was technically cache agnostic. So long as the last WindowStore layer is not a cache, which we know it wouldn't be since the caching layer is built upon that, this check would hopefully be sufficient.

I changed the bypassing logic so that it verifies that it actually passes the caching layer when it's run and the tests verify that as well.

Actually, I removed the bypass check because a user can configure the store without a cache. I don't want to reproduce the logic for determining whether the cache is enabled, so the check is gone.

AlanConfluent · 2020-10-31T00:28:15Z

Maybe the only way to tell whether the performance is similar to the case with caching disabled directly via config is to run some benchmarks. I'm just wondering whether calling reflection on every fetch call would be too costly or whether it is negligible. If the benchmarks show it is costly, we can consider caching the provider / type along with the CompositeReadOnlyXXStore (actually, does that work without changing Streams code? Anyway, we'll see if it is really necessary).

From my benchmarking, the reflection doesn't appear to have a significant effect on performance. I think performance is dominated by other factors, namely query parsing and the actual reads from RocksDB. I'm hitting more or less the same numbers as with caching disabled.

On our standard single-node windowed benchmark, we can do about 1800 qps, and in benchmarks with this change, I'm seeing 1850 qps on my latest run, so the overhead is well within the noise.
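The numbers above suggest no extra caching is needed. For reference only, if per-fetch reflection ever did become a bottleneck, one generic way to amortize it is to resolve the `Field` once per class and reuse it. The sketch below shows that idea with hypothetical names (`FieldLookupCacheSketch`, `Holder`, the `wrapped` field); it is not the provider/type caching alongside `CompositeReadOnlyXXStore` mentioned in review, just a minimal illustration of memoizing a reflective lookup.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: memoize the reflective field lookup per class.
final class FieldLookupCacheSketch {

  private static final Map<Class<?>, Field> FIELDS = new ConcurrentHashMap<>();

  /** Counts how many times the reflective lookup actually ran. */
  static final AtomicInteger LOOKUPS = new AtomicInteger();

  /** Example wrapper with a private field named "wrapped". */
  static final class Holder {
    private final Object wrapped;

    Holder(final Object wrapped) {
      this.wrapped = wrapped;
    }
  }

  /** Resolves the "wrapped" field once per class and caches the result. */
  static Field wrappedField(final Class<?> type) {
    return FIELDS.computeIfAbsent(type, t -> {
      try {
        final Field field = t.getDeclaredField("wrapped");
        field.setAccessible(true);
        LOOKUPS.incrementAndGet();
        return field;
      } catch (final NoSuchFieldException e) {
        throw new IllegalStateException("No wrapped field on " + t, e);
      }
    });
  }

  /** Reads the wrapped value, reusing the cached Field on later calls. */
  static Object unwrap(final Object layer) {
    try {
      return wrappedField(layer.getClass()).get(layer);
    } catch (final IllegalAccessException e) {
      throw new IllegalStateException("Could not read wrapped value", e);
    }
  }

  public static void main(final String[] args) {
    final Holder holder = new Holder("inner");
    System.out.println(unwrap(holder)); // prints "inner"
    System.out.println(unwrap(holder)); // prints "inner"
    System.out.println(LOOKUPS.get());  // prints "1"
  }
}
```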

guozhangwang · 2020-10-31T02:07:25Z

On our standard single-node windowed benchmark, we can do about 1800 qps, and in benchmarks with this change, I'm seeing 1850 qps on my latest run, so the overhead is well within the noise.

That's great!!

vpapavas · 2020-11-02T02:48:32Z

ksqldb-rest-app/src/test/java/io/confluent/ksql/rest/integration/PullQueryFunctionalTest.java

@@ -129,6 +129,7 @@
      .withProperty(KsqlRestConfig.ADVERTISED_LISTENER_CONFIG, "http://localhost:8188")
      .withProperty(KsqlConfig.KSQL_QUERY_PULL_ENABLE_STANDBY_READS, true)
      .withProperty(KsqlConfig.KSQL_STREAMS_PREFIX + "num.standby.replicas", 1)
+      .withProperty(KsqlConfig.KSQL_STREAMS_PREFIX + "cache.max.bytes.buffering", 10000)


Why did you add these?

I just added them to one of the pull query tests because I wanted to exercise the cache bypassing code in a functional test. It's hard to set up a realistic scenario in unit tests since it's not easy to construct all of the state store layers as they actually exist.

If something basic goes wrong in the reflection or types with underlying streams when faced with the cache, these tests should fail. The real test for this will be the automated benchmarks which test that this bypassing effect works as intended.

Ok, makes sense

vpapavas

Thank you Alan! LGTM!

AlanConfluent added 5 commits October 27, 2020 16:29

fix: Removes the streams cache for window stores since this hurts pull query performance

7efd454

Checkpoint that kina works, added session too

a95014a

Changes to new methodology

38fc464

Removes old code and gets style validated

1fce993

Adds more tests

5062bba

AlanConfluent requested review from vvcephei, vpapavas and guozhangwang October 29, 2020 23:58

AlanConfluent requested a review from a team as a code owner October 29, 2020 23:58

guozhangwang approved these changes Oct 30, 2020

vvcephei approved these changes Oct 30, 2020

AlanConfluent added 2 commits October 30, 2020 11:20

Remove unused field in test

81089d8

Feedback again

7cfe7de

AlanConfluent force-pushed the remove_window_store_cache branch from 0aa7346 to 7cfe7de Compare October 30, 2020 23:44

vpapavas reviewed Nov 2, 2020

vpapavas approved these changes Nov 2, 2020

AlanConfluent changed the title ~~Bypass window store cache when doing windowed pull queries~~ fix: Bypass window store cache when doing windowed pull queries Nov 2, 2020

AlanConfluent merged commit 8f84e41 into confluentinc:master Nov 2, 2020
