Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't use queryId of last terminate command after restore #6278

Merged
merged 8 commits into from
Sep 24, 2020

Conversation

vpapavas
Copy link
Member

@vpapavas vpapavas commented Sep 22, 2020

Description

Fixes #6010
The problem is that if the last command before a restart was a terminate, then the first command after the restart gets the same queryId as the terminate. Moreover, if the command topic contains only terminate commands, the next queryId will be 0. To avoid this, we always increase the queryId for every command, whether it has a plan or not

Testing done

Unit tests.

Manual testing

ksql> CREATE STREAM A (COLUMN STRING) WITH (KAFKA_TOPIC='A', VALUE_FORMAT='JSON', partitions=1);

 Message
----------------
 Stream created
----------------
ksql> CREATE STREAM B AS SELECT * FROM A;

 Message
--------------------------------
 Created query with ID CSAS_B_3
--------------------------------
ksql> CREATE STREAM C AS SELECT * FROM A;

 Message
--------------------------------
 Created query with ID CSAS_C_5
--------------------------------
ksql> terminate CSAS_B_3;

 Message
-------------------
 Query terminated.
-------------------
ksql> terminate CSAS_C_5;

 Message
-------------------
 Query terminated.
-------------------

Restart

ksql> CREATE STREAM D AS SELECT * FROM A;

 Message
--------------------------------
 Created query with ID CSAS_D_7
--------------------------------

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@stevenpyzhang
Copy link
Member

@vpapavas can you try this test scenario? I'm not sure this is fixing the issue.

Delete any existing command topic
Startup server

create stream test as select * from KSQL_PROCESSING_LOG;
terminate CSAS_TEST_0;
drop stream test;

Stop the server. Start the server up.

create stream test as select level from KSQL_PROCESSING_LOG;

I think even with this change, we'd duplicate the CSAS_TEST_0 as the QueryId

Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix Vicky! The test coverage gives me more confidence that this works.

The change, however, I think is fixing the issue with a band-aid instead of fixing it at the root. We're adding more edge cases to the code and making the queryID generation more difficult to reason about. I think we should look for another solution that simplifies queryID generation instead of adding more conditions. Let's chat offline if you need some help coming up with solutions.

Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! It looks like you accidentally left some of the previous commit's work (handleRestore). Just for anyone reading this discussion, I have thought about whether this is backwards compatible and i think it is because any old commands that used the old numbering (e.g. the first query was always _0) will have their queryIds persisted. It might break some people's automated tests in the worst case

@vpapavas vpapavas merged commit 2753ccd into confluentinc:master Sep 24, 2020
@vpapavas vpapavas deleted the issue-6010 branch January 26, 2022 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Query Id doesn't use correct offset if the last query in command topic has an associated terminate command
3 participants