Restarting KSQL with active transient queries can leave orphaned internal topics #4009

Closed
big-andy-coates opened this issue Dec 2, 2019 · 5 comments · Fixed by #6714
Labels
bug P0 Denotes must-have for a given milestone

@big-andy-coates
Contributor

Describe the bug
Restarting KSQL while there are running push/transient queries that use internal changelog or repartition topics may result in those internal topics not being deleted.

This has been raised on the community Slack:

https://confluentcommunity.slack.com/archives/C6UJNMY67/p1574980967041200

If this is the case, it's probably because we're not calling close() on transient queries on shutdown, which would clean up these internal topics. Closing them on shutdown would fix half the problem.
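
As a rough illustration of the close-on-shutdown idea, here is a minimal Python sketch. It is not KSQL's code: `TransientQuery`, `TransientQueryRegistry` and the `delete_topic` callback are hypothetical names, and the callback stands in for whatever admin call actually deletes a Kafka topic.

```python
from typing import Callable, Dict, List


class TransientQuery:
    """Hypothetical stand-in for a running transient (push) query."""

    def __init__(self, query_id: str, internal_topics: List[str],
                 delete_topic: Callable[[str], None]) -> None:
        self.query_id = query_id
        self.internal_topics = internal_topics
        self._delete_topic = delete_topic  # assumed topic-deletion hook
        self.closed = False

    def close(self) -> None:
        # Idempotent: a second close() must not delete the topics again.
        if self.closed:
            return
        for topic in self.internal_topics:
            self._delete_topic(topic)
        self.closed = True


class TransientQueryRegistry:
    """Tracks live transient queries so shutdown can close every one."""

    def __init__(self) -> None:
        self._queries: Dict[str, TransientQuery] = {}

    def register(self, query: TransientQuery) -> None:
        self._queries[query.query_id] = query

    def close_all(self) -> List[str]:
        """Close all registered queries; return the ids that failed to close."""
        failed: List[str] = []
        for query_id, query in list(self._queries.items()):
            try:
                query.close()
                del self._queries[query_id]
            except Exception:
                # Keep going so one failure does not leak the other queries' topics.
                failed.append(query_id)
        return failed
```

The shutdown path would call `close_all()` once before the process exits. This only covers a graceful shutdown; it does nothing for a crash or hard kill.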

Of course, we may still have transient topics left around if a node crashes or is killed while there are transient queries. At the moment, a specific node would not be able to clean these up on restart, as they could belong to another node. We'd need to encode a node identity into the topic name to allow a restarted node to clean up its own orphaned topics.
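
To make the node-identity idea concrete, here is a minimal sketch. The naming scheme (`<service id>-transient-<node id>-<query id>-<suffix>`) is an assumption made up for illustration, not KSQL's actual internal topic format, and it assumes node ids contain no `-` so that one node's prefix can never match another node's topics.

```python
from typing import Iterable, List


def transient_prefix(service_id: str, node_id: str) -> str:
    """Prefix shared by every transient internal topic owned by one node."""
    if "-" in node_id:
        # Assumption of this sketch: without it, "node1" would also match "node1-a".
        raise ValueError("node_id must not contain '-'")
    return f"{service_id}-transient-{node_id}-"


def internal_topic_name(service_id: str, node_id: str,
                        query_id: str, suffix: str) -> str:
    """Build a topic name that carries the owning node's identity."""
    return f"{transient_prefix(service_id, node_id)}{query_id}-{suffix}"


def orphaned_topics(all_topics: Iterable[str],
                    service_id: str, node_id: str) -> List[str]:
    """Topics this node left behind.

    Meant to run on restart, before the node starts any new transient query,
    so every topic carrying this node's prefix is an orphan.
    """
    prefix = transient_prefix(service_id, node_id)
    return [topic for topic in all_topics if topic.startswith(prefix)]
```

A restarted node would list the cluster's topics, pass them through `orphaned_topics` with its own id, and delete the result. Topics owned by other nodes are never selected, which addresses the ownership problem described above.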

@progocz

progocz commented Dec 28, 2019

We could also add support for transient topics to Kafka in order to properly solve the issue.

@big-andy-coates
Contributor Author

Did some testing. Internal topics are generally cleaned up, but occasionally not. Looks to be some kind of race condition.

Regardless, on a hard kill or crash they will not be cleaned up, so we need some way of cleaning up, e.g. store the set of transient queries in a topic/table and use this to clean up on restart?
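
A minimal sketch of the "store the set of transient queries" idea, under some loud assumptions: a local JSON file stands in for the topic/table, `TransientQueryLog` and `delete_topic` are hypothetical names, and the store is assumed to be per node (a shared store would need to be keyed by node so a restart never deletes topics that are still live elsewhere).

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


class TransientQueryLog:
    """Durable record of running transient queries and their internal topics."""

    def __init__(self, path: Path) -> None:
        self._path = path  # stand-in for a topic/table

    def _load(self) -> Dict[str, List[str]]:
        if not self._path.exists():
            return {}
        return json.loads(self._path.read_text())

    def _save(self, entries: Dict[str, List[str]]) -> None:
        self._path.write_text(json.dumps(entries))

    def record_start(self, query_id: str, internal_topics: List[str]) -> None:
        # Written before the query runs, so a crash always leaves a trace.
        entries = self._load()
        entries[query_id] = list(internal_topics)
        self._save(entries)

    def record_stop(self, query_id: str) -> None:
        # Written after a clean close(), once the topics are already deleted.
        entries = self._load()
        entries.pop(query_id, None)
        self._save(entries)

    def clean_up_on_restart(self, delete_topic: Callable[[str], None]) -> List[str]:
        """Delete topics of queries that never stopped cleanly; return their ids."""
        entries = self._load()
        cleaned: List[str] = []
        for query_id, topics in entries.items():
            for topic in topics:
                delete_topic(topic)
            cleaned.append(query_id)
        self._save({})
        return cleaned
```

On startup the node would call `clean_up_on_restart` before accepting new queries. Anything still recorded at that point belongs to a query that was running when the node died.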

@derekjn derekjn added this to the 0.11.0 milestone Jun 12, 2020
@agavra agavra added the P0 Denotes must-have for a given milestone label Jun 24, 2020
@vcrfxia vcrfxia modified the milestones: 0.11.0, 0.12.0 Jul 13, 2020
@apurvam apurvam removed this from the 0.12.0 milestone Jul 30, 2020
@apurvam
Contributor

apurvam commented Jul 30, 2020

I'm dropping this off the milestone, as I don't think we are going to work on it: it's just been carried over blindly.

@rodesai
Contributor

rodesai commented Oct 8, 2020

I think we should consider prioritizing this in 0.14 or 0.15 as we've seen numerous leaks on our soak cluster (see #6360). I'll mark as needs-triage again so we can discuss.

@apurvam
Contributor

apurvam commented Oct 14, 2020

I think we should do this for 0.15.

8 participants