-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
quick fix to sort issue #87
Conversation
It has been observed that in some cases the computed sort value for a DocumentMatch becomes corrupted. The problem has been traced back to the doc values uncompressed slices, but for now a quick fix is proposed to copy the bytes associate with the sort key, ensuring that no other doc values operations can corrupt them.
This is a quick fix that addresses the problem I found (though I do not yet have simple testcase to add). It's not clear if this is the best fix, but it should be safe, as we just create an additional copy of the data we care about. There should will also be some slight perf hit caused by the additional copy of sort keys. |
LGTM as a workaround |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for fix, how will this impact memory usage during a query execution though?
Yesterday I spoke with @sreekanth-cb from the Bleve project to better understand why this issue seemed to affect Bluge and not Bleve. The key change has to do with the data type used for the sort key inside a match. In Bleve, sort keys are So, with this deeper understanding of the problem, it now seems that this is the correct fix given the current design (doc values are made available inside a callback, and are only valid for the duration of the callback) @voldyman fortunately this is somewhat easy to do back-of-the-napkin math to try and predict the behavior. With this change, one would expect the following additional memory allocations: number of allocations: number of search hits * number of elements in the sort key In the strict sense, it will of course perform slower than before, because additional copies are now made. However, without the fix, the sort order is easily corrupted (by design we reuse a buffer). However, these copies were already being made in the Bleve code-base, so if you're comparing against Bleve it should still be comparable. The only other way to fix the problem would be to change the design of doc values, such that they don't require a copy. However, this would either require giving up compression of the doc values, or allocating new buffers for each decompression, both of which seem likely to cause an even larger performance regression. |
I'm going to merge this as we now have high confidence this is a real bug, and this fix restores correct behavior. However, this fix copies all sort values, but really only doc values that are used for sorting need to be copied. This matters because some sort values are not derived from doc values, or they may originate form doc values, but produce a new safe-to-use []byte. In those cases we do not need to make an additional copy of the bytes. I will continue to investigate this, and see if there a way to offer a tighter fix. |
It has been observed that in some cases the computed sort value
for a DocumentMatch becomes corrupted. The problem has been
traced back to the doc values uncompressed slices, but for now
a quick fix is proposed to copy the bytes associate with the
sort key, ensuring that no other doc values operations can
corrupt them.