Restarting KSQL with active transient queries can leave orphaned internal topics #4009

big-andy-coates · 2019-12-02T11:24:08Z

Describe the bug
Restarting KSQL while there are running push/transient queries that use internal changelog or repartition topics may result in those internal topics not being deleted.

This has been raised on the community Slack:

https://confluentcommunity.slack.com/archives/C6UJNMY67/p1574980967041200

If so, it's probably because we're not calling close() on transient queries during shutdown, which would clean up their internal topics. Fixing that would solve half the problem.

Of course, transient topics may still be left behind if a node crashes or is killed while transient queries are running. At the moment, a restarted node cannot clean these up, as they could belong to another node. We'd need to encode a node identity into the topic name so that a restarted node can recognize and clean up its own orphaned topics.
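A minimal sketch of the node-identity idea, in Python for brevity. The topic format, the `__` separator, and the helper names (`internal_topic_name`, `owning_node`, `orphaned_topics`) are all hypothetical and are not ksqlDB's actual naming scheme; it also assumes each node has an ID that stays stable across restarts. Because transient queries do not survive a restart, every topic tagged with the restarting node's ID can be treated as orphaned, while topics tagged with other nodes' IDs are left alone.

```python
from typing import Iterable, List, Optional

# Hypothetical naming scheme for illustration only; ksqlDB's real internal
# topic names differ. Node IDs are assumed to be stable across restarts.
SEP = "__"


def internal_topic_name(service_id: str, node_id: str, query_id: str, suffix: str) -> str:
    """Build a transient-query internal topic name that embeds the owning node's ID."""
    if SEP in service_id or SEP in node_id:
        raise ValueError(f"service and node IDs must not contain {SEP!r}")
    return f"{service_id}transient{SEP}{node_id}{SEP}{query_id}-{suffix}"


def owning_node(topic: str, service_id: str) -> Optional[str]:
    """Return the node ID encoded in a transient topic name, or None for other topics."""
    prefix = f"{service_id}transient{SEP}"
    if not topic.startswith(prefix):
        return None
    node_id, sep, _rest = topic[len(prefix):].partition(SEP)
    return node_id if sep else None


def orphaned_topics(all_topics: Iterable[str], service_id: str, node_id: str) -> List[str]:
    """Transient queries do not survive a restart, so every topic this node owns is orphaned."""
    return sorted(t for t in all_topics if owning_node(t, service_id) == node_id)


if __name__ == "__main__":
    topics = [
        internal_topic_name("ksql_", "node-1", "q42", "repartition"),
        internal_topic_name("ksql_", "node-2", "q7", "changelog"),
        "orders",
    ]
    # Only node-1's topic is selected; node-2's query may still be running.
    print(orphaned_topics(topics, "ksql_", "node-1"))
    # ['ksql_transient__node-1__q42-repartition']
```

The returned names would then be passed to the Kafka Admin client for deletion. Encoding the owner in the name keeps the check stateless, at the cost of longer topic names, which must stay within Kafka's 249-character limit.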

progocz · 2019-12-28T21:14:12Z

We could also add support for transient topics to Kafka itself to solve the issue properly.

big-andy-coates · 2020-01-10T18:45:57Z

Did some testing. Internal topics are generally cleaned up, but occasionally not. Looks to be some kind of race condition.

Regardless, on a hard kill or crash they will not be removed, so we need some other cleanup mechanism, e.g. store the set of transient queries in a topic/table and use it to clean up on restart?
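The topic/table idea could look roughly like the sketch below. It assumes a compacted topic keyed by query ID, modelled here as an in-memory list of records with `None` as a tombstone; `TransientQueryRegistry` and its methods are hypothetical, and the `delete_topics` callback stands in for a real deletion call such as the Kafka Admin client's. A query is registered before it starts and tombstoned after a clean `close()`, so after a hard kill any surviving entries identify topics that still need deleting.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Sketch only: a list of (key, value) records stands in for a compacted Kafka
# topic. A value of None is a tombstone, as in Kafka log compaction.
Record = Tuple[str, Optional[List[str]]]


class TransientQueryRegistry:
    """Hypothetical registry of a node's running transient queries and their internal topics."""

    def __init__(self, log: List[Record]):
        self.log = log  # stands in for the backing topic

    def register(self, query_id: str, internal_topics: List[str]) -> None:
        # Written before the query starts, so a crash leaves a record behind.
        self.log.append((query_id, list(internal_topics)))

    def unregister(self, query_id: str) -> None:
        # Written after a clean close() has deleted the query's topics.
        self.log.append((query_id, None))

    def live_entries(self) -> Dict[str, List[str]]:
        # Replay the log into a table: latest value per key wins, tombstones remove the key.
        table: Dict[str, List[str]] = {}
        for key, value in self.log:
            if value is None:
                table.pop(key, None)
            else:
                table[key] = value
        return table

    def clean_up_on_restart(self, delete_topics: Callable[[List[str]], None]) -> List[str]:
        """Delete the topics of queries that never closed cleanly, then tombstone them."""
        deleted: List[str] = []
        for query_id, topics in self.live_entries().items():
            delete_topics(topics)
            deleted.extend(topics)
            self.unregister(query_id)
        return deleted


if __name__ == "__main__":
    log: List[Record] = []
    registry = TransientQueryRegistry(log)
    registry.register("q1", ["q1-changelog"])
    registry.register("q2", ["q2-repartition", "q2-changelog"])
    registry.unregister("q1")  # q1 closed cleanly before the crash
    # The node is hard-killed, restarts, and replays the same log.
    print(TransientQueryRegistry(log).clean_up_on_restart(lambda topics: None))
    # ['q2-repartition', 'q2-changelog']
```

In a multi-node cluster each node would still need to restrict cleanup to its own entries (for example, by including a node ID in the key); otherwise it could delete topics belonging to queries still running elsewhere.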

apurvam · 2020-07-30T00:17:16Z

I'm dropping this off the milestone, as I don't think we are going to work on it: it's just been carried over blindly.

rodesai · 2020-10-08T07:19:36Z

I think we should consider prioritizing this in 0.14 or 0.15, as we've seen numerous leaks on our soak cluster (see #6360). I'll mark it as needs-triage again so we can discuss.

apurvam · 2020-10-14T17:39:12Z

I think we should do this for 0.15.

big-andy-coates added the bug label Dec 2, 2019

big-andy-coates mentioned this issue Jan 10, 2020

Confirm / deny that queries get stuck in PENDING_SHUTDOWN if topics deleted from under them. #4267

Closed

derekjn added this to the 0.11.0 milestone Jun 12, 2020

agavra added the P0 label (must-have for a given milestone) Jun 24, 2020

vcrfxia assigned rodesai Jun 24, 2020

vcrfxia modified the milestones: 0.11.0, 0.12.0 Jul 13, 2020

apurvam unassigned rodesai Jul 30, 2020

apurvam removed this from the 0.12.0 milestone Jul 30, 2020

rodesai mentioned this issue Oct 8, 2020

ksqlDB leaks state stores for transient queries #6360

Closed

rodesai added the needs-triage label Oct 8, 2020

apurvam added this to the 0.15.0 milestone Oct 14, 2020

apurvam removed the needs-triage label Oct 14, 2020

rodesai assigned AlanConfluent Nov 10, 2020

AlanConfluent mentioned this issue Dec 2, 2020

fix: Removes orphaned topics from transient queries #6714

Merged

AlanConfluent closed this as completed in #6714 Dec 10, 2020

AlanConfluent mentioned this issue Jun 23, 2021

Cleanup abandoned state stores on startup #7720

Closed
