[bug] drop/delete terribly slow #9636
It definitely shouldn't be taking that long. Can you gather some profiles while the slow delete is in progress?
The workaround I would try is to break out the regular expression, running a separate DROP MEASUREMENT for each matching measurement. If you run the DROP calls as separate statements, you may trigger a compaction on each run.
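The exact commands were lost from this thread's formatting. For profile gathering, InfluxDB 1.5's debug endpoints are likely what was being requested here, along these lines:

```bash
# Capture CPU/heap/goroutine profiles while the slow DELETE/DROP is running
# (blocks ~30s while the CPU profile samples).
curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=true"
# Runtime counters are useful alongside the profiles.
curl -o vars.txt "http://localhost:8086/debug/vars"
```

And a sketch of breaking the regex into per-measurement DROPs with the influx CLI ('mydb' and the pattern are placeholders; assumes measurement names contain no commas):

```bash
# List measurements matching the pattern, then drop them one at a time.
influx -database 'mydb' -format csv \
  -execute 'SHOW MEASUREMENTS WITH MEASUREMENT =~ /{{regexp_pattern}}/' |
  tail -n +2 | cut -d, -f2 |
  while read -r m; do
    influx -database 'mydb' -execute "DROP MEASUREMENT \"$m\""
  done
```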
I will try to provide the details you need tomorrow. Can I somehow turn compaction off? I also have a problem with my influx db copy program (that I've written myself months ago). It now takes about 3-10x more time than it was taking on v1.3.x. I'll be investigating this tomorrow; for now the 3x-10x more time is due to multiple "timeout error" responses that I'm receiving and retrying (my copy program deals with this). I suspect the problem is due to the number of threads - I'm using 48 threads and http-timeout = 300s, and just after about 300s of data copying I'm starting to receive "timeout error". Is there a way to tell Influx to allow this? The copy program works like an influx db backup tool: it can copy one influx database to another, local/remote etc. I need such a tool because when I'm doing a full data regenerate, I need to save it to a temp database first (it takes hours), so as not to disturb the original database which is used by Grafana. The better option would be to generate the temp db and then just rename it, but unfortunately InfluxDB does not support this either :-(
Uploaded the requested profiles, captured while the delete was running.
Delete still running (already 19 minutes); it was at least 40x faster in v1.3.x.
I don't believe that is supported. Maybe setting `max-concurrent-compactions = 1` would suffice here. https://docs.influxdata.com/influxdb/v1.5/administration/config/#max-concurrent-compactions-0
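For reference, a minimal sketch of where that option lives in influxdb.conf (assuming the 1.5 config layout):

```toml
[data]
  # 0 (the default) lets InfluxDB derive a limit from the CPU count;
  # 1 serializes compactions, reducing I/O pressure during deletes.
  max-concurrent-compactions = 1
```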
Overwriting existing points is expensive and should generally be avoided; if you must overwrite, do it in bulk.
I suspect you'll get much better performance using the native backup and restore tools.
Default should be unlimited: https://docs.influxdata.com/influxdb/v1.5/administration/config/#max-connection-limit-0
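A sketch of the native tools in 1.5+ (portable online format; database names, host, and paths are placeholders):

```bash
# Portable (online) backup of one database; 8088 is the default RPC port.
influxd backup -portable -database mydb -host 127.0.0.1:8088 /tmp/mydb_backup

# Restore into a differently named database on the same or another server.
influxd restore -portable -db mydb -newdb mydb_temp /tmp/mydb_backup
```

The `-newdb` flag is what enables the "generate into a temp database, then swap" workflow discussed above, without a literal rename.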
Thanks, this will be very helpful in identifying the performance bottleneck. @rbetts care to prioritize/assign this?
@lukaszgryglicki Thanks for the very complete report. We'll triage this in an upcoming grooming session.
BTW: I've killed the query after 40 minutes and implemented an alternate solution:
https://github.com/cncf/devstats/blob/master/cmd/z2influx/z2influx.go
https://docs.influxdata.com/influxdb/v1.5/administration/backup_and_restore/
I'll double check tomorrow then, but a quick question: copying A_temp to A should be orders of magnitude faster than just generating A (to avoid downtime). Is this feature new in v1.5?
Maybe.
Seems like regexps are also very slow.
I am having a similar issue where I need to delete lots of measurements (a malformed graphite template led to 100k measurements being created). Deleting an individual measurement takes between 20s and 2m, when it finishes at all (sometimes it just hangs). I am unable to drop by regex (I get an error). The system is reasonably loaded, using 50% of system RAM and ~70% load.

This seems entirely related to the number of measurements, not the amount of data. These operations were fast yesterday (less than 5 seconds, but not instant) when I had 50 measurements. I am not writing any data to the new bugged measurements, so my data has not grown much, but the number of measurements has: I went from ~50 measurements to ~110000. Since they are raw graphite items, their patterns are easy to regex, but if I were to delete the measurements individually it would take weeks.

Requesting the debug profiles using the command listed above takes 30s to start (the curl stats show no activity for 30s, then the download happens). This feels like the kind of latency that is happening during other operations (everything just feels very slow and laggy). Due to that (and the lack of regex drop), I can't dump all three stats during the same measurement delete operation, so I just start the next one as soon as I can. As a note, the vars download is not slow.

Here is an archive of my profile data. System info: AWS EC2 m5.large.
We've replaced influx with PostgreSQL and delete is instant.
This is kind of drifting off-topic, but how do you deal with the dynamic-schema benefits of influxdb? Do you just not index the tags and store all measurements as strings? We have a fairly complex pile of pre-existing metrics, and developing a SQL schema for all of them in a reasonable way would be very difficult (unless we just crudely stored them as text or something); the benefit of influxdb (and other storage systems like it) is that it is specific to the problem of metrics storage and querying (as opposed to a general-purpose database).
I've implemented something that dynamically creates tables/columns/indices as needed. It took more than a week, but it works great and is faster.
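A minimal sketch of that dynamic-schema approach, assuming psycopg2; the table/column layout here is illustrative, not the commenter's actual code:

```python
import psycopg2

def ensure_measurement_table(cur, measurement, tags, fields):
    """Create a per-measurement table with indexed tag columns on demand."""
    tag_cols = ", ".join(f'"{t}" text' for t in tags)
    field_cols = ", ".join(f'"{f}" double precision' for f in fields)
    cur.execute(
        f'CREATE TABLE IF NOT EXISTS "{measurement}" '
        f'(time timestamptz NOT NULL, {tag_cols}, {field_cols})'
    )
    # Index each tag column so tag-filtered queries stay fast.
    for t in tags:
        cur.execute(
            f'CREATE INDEX IF NOT EXISTS "idx_{measurement}_{t}" '
            f'ON "{measurement}" ("{t}")'
        )

conn = psycopg2.connect("dbname=metrics")
with conn, conn.cursor() as cur:
    ensure_measurement_table(cur, "cpu_load", ["host", "region"], ["value"])
```

With this layout, deleting a measurement is a plain `DROP TABLE`, which Postgres executes near-instantly.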
I can reproduce this bug, but not consistently. I created a database with 100000 measurements and about 8 million series, and have twice seen it deadlock. Delete from 1 measurement:
Delete from about 9 measurements:
Delete from about 90 measurements:
Delete from about 900 measurements (gave up, ctrl-C after ~5m):
Adding some debug logging to track down where it gets stuck.
The deadlock is on a lock taken during the delete.
Opening as there is still possibly a performance issue.
Related to #10056
@jacobmarble do you have an update about this investigation/bug fix?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
don't close.
I am using 1.7.7 and the issue still persists. Deleting a single measurement with 1 million series takes forever. iostat shows disk utilization is not very high.
Still present in 1.7.8.
It's not just the delete that is slow; the garbage collection and/or compaction that happens after the delete is terrible too. I deleted a couple hundred measurements and the load jumped 6x for over an hour. Memory maxed out. IO wait is 50% or so of all CPU time (cloud resources!).
If I remember correctly, we decided in 1.5 to block on delete, rather than accept the delete request and handle it asynchronously. My earlier analysis probably reflects this. So "that's a feature, not a bug" is my offhand comment for that. @bryanspears have you tried limiting concurrent compactions? Since your bulk delete operation likely touched several shards, and those shards all require multiple levels of compaction, limiting concurrency has helped other folks in the past. Start with limiting to 1. https://docs.influxdata.com/influxdb/v1.7/administration/config/#max-concurrent-compactions-0
Possibly related to #15271
@jacobmarble your suggestion has stabilized our small influx setup. For anyone else on limited, I/O-constrained cloud resources: these configuration options helped dramatically. Compaction does take longer, but that's better than the alternative crash that was occurring for us.
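The commenter's exact settings were lost from this thread; a sketch of the kind of `[data]` options that throttle compaction on I/O-limited hosts (values are illustrative):

```toml
[data]
  # Run at most one compaction at a time.
  max-concurrent-compactions = 1
  # Cap the rate at which completed compactions are written to disk;
  # the burst value permits short spikes above the cap.
  compact-throughput = "16m"
  compact-throughput-burst = "16m"
```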
|
@bryanspears I'm glad to know that limiting concurrent compactions helped. In many situations, compaction duration doesn't cost anything, and slower compactions free resources for queries and writes. For anyone else reading: when dealing with this sort of write-vs-read contention, I suggest limiting concurrent compactions as a starting point.
Another way to help WAL I/O is to write in batches, say 5000 points per batch.
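A sketch of batched writes with the influxdb-python client (the database, measurement, and point values are placeholders):

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="mydb")
points = [
    {"measurement": "cpu", "tags": {"host": f"h{i % 10}"},
     "fields": {"value": float(i)}}
    for i in range(100_000)
]
# batch_size=5000 makes the client split the write into 5000-point
# HTTP requests, which bounds WAL pressure per request.
client.write_points(points, batch_size=5000)
```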
This issue was the first hit for a search about slow DROP DATABASE. I'm optimizing for disk space, so I very much would like to delete this temporary database.
I am cleaning up some invalid values from my measurement in Python code. The table has data for a year at roughly one-minute resolution.
The query is tremendously slow. I am wondering if it is possible to pass an array of times so the index is rebuilt only once; unfortunately, that query was not working last time I tried.
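A sketch of the cleanup described above (assumes influxdb-python; the database, measurement, and timestamps are placeholders). InfluxQL's DELETE has no `time IN (...)` form, which is why each bad timestamp must be deleted separately:

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="mydb")
bad_times = ["2020-01-01T00:01:00Z", "2020-01-01T00:02:00Z"]
for t in bad_times:
    # One DELETE per timestamp; each one touches the index again.
    client.query(f"DELETE FROM \"power\" WHERE time = '{t}'")
```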
Experiencing the same difficulties here with InfluxDB v1.8: it takes more than 25 min to drop 1 measurement whose data is stored over 3 shards of 4 weeks duration. I have 42k measurements to delete; that's not acceptable for our production platform, so I'm going to need to find a trick.
Wow, over 3 years have passed since I reported this and it's not fixed...
Not sure how related, but I've got a measurement with a single point on influxdb 1.8.5, and even when I run a drop on it, I hit the same slowness.
I am also getting the same issue, even though I have only 7-8 measurements and none of them has more than 1500 records. I issue the delete command via curl -X POST; iostat output is attached (iostat.txt). I am at the default configuration, so please let me know if I need to set some configuration options.
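The full command was truncated above; a hypothetical form of a delete issued via the HTTP API ('mydb' and the measurement name are placeholders):

```bash
# DELETE statements go to the /query endpoint via POST.
curl -X POST "http://localhost:8086/query?db=mydb" \
  --data-urlencode 'q=DELETE FROM "my_measurement"'
```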
We also want to drop measurements of hosts which do not exist anymore, on influxdb v1.8.5, but after dropping a few of them the database just locks up and does not accept further drop statements. Then we have to wait up to a day until it is available again. Would be nice to get this finally solved. Also, does someone have a clue whether this issue persists in version 2.x?
Experiencing this issue as well. We had a problem a while back which caused a measurement to be created with about 300k series. The new measurement is small, only about 1m rows, but trying to drop the measurement hangs forever. The cardinality of the database is blown up now (before this measurement, it was around 20k) and is causing all kinds of problems. I'm not even able to successfully drop any series from this measurement now; for example, a drop series query filtering on just 10 series hangs forever. Nothing useful in the logs. We're on v1.8.6 with the TSI index. This issue is also causing our db to take about an hour to start up... any ideas here?
Having this issue too, influx 1.8.10. In my case I'm using a query like:
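(The exact query did not survive in this thread; a hypothetical InfluxQL delete of the shape described, with placeholder measurement, tag, and time bound:)

```sql
-- Hypothetical tag- and time-filtered delete.
DELETE FROM "my_measurement" WHERE "host" = 'web01' AND time < '2022-01-01T00:00:00Z'
```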
Bug report
Influx version: v1.5.1
OS version: Ubuntu Linux 17.04
Steps to reproduce:
1. drop series from /{{regexp_pattern}}/
2. drop measurement /{{regexp_pattern}}/
3. delete from /{{regexp_pattern}}/
4. It will take > 30 minutes.
Expected behavior:
The drop/delete should complete quickly (it was at least 40x faster in v1.3.x).