Add index blocked after injecting a network partition between the PD leader and all other pods #53909

Closed
Lily2025 opened this issue Jun 11, 2024 · 3 comments · Fixed by #54149
Labels: affects-8.1, affects-8.2, component/ddl, severity/major, type/bug


Lily2025 commented Jun 11, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Set tidb_enable_dist_task='off', then:
1. Run sysbench.
2. Add an index to one of the tables.
3. Inject a network partition between the PD leader and all other pods.

tidb logs:
tidb-0-2024-06-09T22-21-12.334.log.tar.gz
tidb-1-2024-06-09T22-09-07.701.log.tar.gz

2. What did you expect to see? (Required)

Add index should succeed.

3. What did you see instead (Required)

Add index is blocked after injecting a network partition between the PD leader and all other pods.

operator logs:
the status of ddl job is not synced after 1h0m0s (now: 2024-06-09 12:39:47, jobId: 491, job type: add index /* ingest */, state: running)
operatorLogs:
[2024-06-09 11:39:29] ###### start adding index
ALTER TABLE sbtest1 ADD INDEX index_test_1717904369842(c)
[2024-06-09 11:39:29] ###### wait for ddl job finish
[2024-06-09 12:39:47] ###### wait for ddl job finish timeout(1h0m0s)
select job_id, job_type, state from information_schema.ddl_jobs where query = 'ALTER TABLE sbtest1 ADD INDEX index_test_1717904369842(c)'
jobId: 491, job type: add index /* ingest */, state: running

4. What is your TiDB version? (Required)

./tidb-server -V
Release Version: v8.2.0-alpha
Edition: Community
Git Commit Hash: d75ff82
Git Branch: heads/refs/tags/v8.2.0-alpha
UTC Build Time: 2024-06-08 11:46:12
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: unistore
2024-06-09T11:28:46.842+0800

Lily2025 added the type/bug label Jun 11, 2024

Lily2025 (Author) commented

/component ddl
/severity critical

Lily2025 (Author) commented

/assign tangenta

tangenta (Contributor) commented

$ rg "DDL worker closed|owner info\"=\"\[ddl\]" tidb-0.log tidb-1.log| rg "2024/06/09 1[12]" | less

tidb-0.log:[2024/06/09 11:40:10.144 +08:00] [INFO] [manager.go:460] ["watch canceled, no owner"] ["owner info"="[ddl] ownerManager f6dee9b0-f00d-4d0c-910f-0239e104af26 watch owner key /tidb/ddl/fg/owner/23be8ffa4b9d8a51"]
tidb-0.log:[2024/06/09 11:40:10.144 +08:00] [INFO] [manager.go:232] ["retire owner"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager f6dee9b0-f00d-4d0c-910f-0239e104af26"]
tidb-0.log:[2024/06/09 11:40:15.846 +08:00] [INFO] [ddl_worker.go:188] ["DDL worker closed"] [worker="worker 3, tp add index"] [category=ddl] ["take time"=3.591µs]
tidb-1.log:[2024/06/09 11:30:06.164 +08:00] [INFO] [manager.go:301] ["failed to campaign"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager c6f823db-400a-4217-9ff0-0acd05dae2a9"] [error="etcdserver: mvcc: required revision has been compacted"]

There has been no DDL owner since the fault injection, because OnRetireOwner is blocked in jobScheduler.close(). In other words, jobScheduler cancels the context, but the add index worker does not quit, so close() never returns.
