Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quick fix to sort issue #87

Merged
merged 3 commits into from
Jan 4, 2022
Merged

quick fix to sort issue #87

merged 3 commits into from
Jan 4, 2022

Conversation

mschoch
Copy link
Member

@mschoch mschoch commented Dec 27, 2021

It has been observed that in some cases the computed sort value
for a DocumentMatch becomes corrupted. The problem has been
traced back to the doc values uncompressed slices, but for now
a quick fix is proposed to copy the bytes associate with the
sort key, ensuring that no other doc values operations can
corrupt them.

It has been observed that in some cases the computed sort value
for a DocumentMatch becomes corrupted.  The problem has been
traced back to the doc values uncompressed slices, but for now
a quick fix is proposed to copy the bytes associate with the
sort key, ensuring that no other doc values operations can
corrupt them.
@mschoch
Copy link
Member Author

mschoch commented Dec 27, 2021

This is a quick fix that addresses the problem I found (though I do not yet have simple testcase to add).

It's not clear if this is the best fix, but it should be safe, as we just create an additional copy of the data we care about. There should will also be some slight perf hit caused by the additional copy of sort keys.

@tmm1
Copy link
Contributor

tmm1 commented Dec 27, 2021

LGTM as a workaround

Copy link
Contributor

@voldyman voldyman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for fix, how will this impact memory usage during a query execution though?

@mschoch
Copy link
Member Author

mschoch commented Jan 4, 2022

Yesterday I spoke with @sreekanth-cb from the Bleve project to better understand why this issue seemed to affect Bluge and not Bleve. The key change has to do with the data type used for the sort key inside a match. In Bleve, sort keys are []string and during the development of Bluge I switched this to [][]byte. The thinking was that since Bluge is more focused on Go application developers, and not forcing decisions which can be made by the application, we should avoid the conversion to string. Also, there was an obvious performance gain as well (avoid unnecessary copies). However, what I didn't realize at the time was that this conversion to a string was not just an API convenience for Bleve, it was also ensuring correct behavior by copying a value that required copying.

So, with this deeper understanding of the problem, it now seems that this is the correct fix given the current design (doc values are made available inside a callback, and are only valid for the duration of the callback)

@voldyman fortunately this is somewhat easy to do back-of-the-napkin math to try and predict the behavior. With this change, one would expect the following additional memory allocations:

number of allocations: number of search hits * number of elements in the sort key
amount allocated: number of search hits * avg sort key size in bytes

In the strict sense, it will of course perform slower than before, because additional copies are now made. However, without the fix, the sort order is easily corrupted (by design we reuse a buffer). However, these copies were already being made in the Bleve code-base, so if you're comparing against Bleve it should still be comparable.

The only other way to fix the problem would be to change the design of doc values, such that they don't require a copy. However, this would either require giving up compression of the doc values, or allocating new buffers for each decompression, both of which seem likely to cause an even larger performance regression.

@mschoch
Copy link
Member Author

mschoch commented Jan 4, 2022

I'm going to merge this as we now have high confidence this is a real bug, and this fix restores correct behavior.

However, this fix copies all sort values, but really only doc values that are used for sorting need to be copied. This matters because some sort values are not derived from doc values, or they may originate form doc values, but produce a new safe-to-use []byte. In those cases we do not need to make an additional copy of the bytes. I will continue to investigate this, and see if there a way to offer a tighter fix.

@mschoch mschoch merged commit f34c4ac into master Jan 4, 2022
@mschoch mschoch deleted the sort-bug-quick-fix branch January 4, 2022 15:23
@mschoch mschoch added this to the v0.1.9 milestone Jan 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants