quick fix to sort issue #87

mschoch · 2021-12-27T19:14:14Z

It has been observed that in some cases the computed sort value
for a DocumentMatch becomes corrupted. The problem has been
traced back to the doc values uncompressed slices, but for now
a quick fix is proposed to copy the bytes associate with the
sort key, ensuring that no other doc values operations can
corrupt them.

It has been observed that in some cases the computed sort value for a DocumentMatch becomes corrupted. The problem has been traced back to the doc values uncompressed slices, but for now a quick fix is proposed to copy the bytes associate with the sort key, ensuring that no other doc values operations can corrupt them.

mschoch · 2021-12-27T19:16:22Z

This is a quick fix that addresses the problem I found (though I do not yet have simple testcase to add).

It's not clear if this is the best fix, but it should be safe, as we just create an additional copy of the data we care about. There should will also be some slight perf hit caused by the additional copy of sort keys.

tmm1 · 2021-12-27T19:20:37Z

LGTM as a workaround

voldyman

Looks good for fix, how will this impact memory usage during a query execution though?

mschoch · 2022-01-04T15:13:23Z

Yesterday I spoke with @sreekanth-cb from the Bleve project to better understand why this issue seemed to affect Bluge and not Bleve. The key change has to do with the data type used for the sort key inside a match. In Bleve, sort keys are []string and during the development of Bluge I switched this to [][]byte. The thinking was that since Bluge is more focused on Go application developers, and not forcing decisions which can be made by the application, we should avoid the conversion to string. Also, there was an obvious performance gain as well (avoid unnecessary copies). However, what I didn't realize at the time was that this conversion to a string was not just an API convenience for Bleve, it was also ensuring correct behavior by copying a value that required copying.

So, with this deeper understanding of the problem, it now seems that this is the correct fix given the current design (doc values are made available inside a callback, and are only valid for the duration of the callback)

@voldyman fortunately this is somewhat easy to do back-of-the-napkin math to try and predict the behavior. With this change, one would expect the following additional memory allocations:

number of allocations: number of search hits * number of elements in the sort key
amount allocated: number of search hits * avg sort key size in bytes

In the strict sense, it will of course perform slower than before, because additional copies are now made. However, without the fix, the sort order is easily corrupted (by design we reuse a buffer). However, these copies were already being made in the Bleve code-base, so if you're comparing against Bleve it should still be comparable.

The only other way to fix the problem would be to change the design of doc values, such that they don't require a copy. However, this would either require giving up compression of the doc values, or allocating new buffers for each decompression, both of which seem likely to cause an even larger performance regression.

mschoch · 2022-01-04T15:20:49Z

I'm going to merge this as we now have high confidence this is a real bug, and this fix restores correct behavior.

However, this fix copies all sort values, but really only doc values that are used for sorting need to be copied. This matters because some sort values are not derived from doc values, or they may originate form doc values, but produce a new safe-to-use []byte. In those cases we do not need to make an additional copy of the bytes. I will continue to investigate this, and see if there a way to offer a tighter fix.

mschoch requested review from michaeljs1990, rubiojr, tmm1 and voldyman December 27, 2021 19:16

mschoch added 2 commits December 27, 2021 18:04

add a unit test to reproduce the problem

0e1d4de

fix CI failure on Windows, closing reader

72326e3

voldyman approved these changes Dec 28, 2021

View reviewed changes

mschoch merged commit f34c4ac into master Jan 4, 2022

mschoch deleted the sort-bug-quick-fix branch January 4, 2022 15:23

mschoch added this to the v0.1.9 milestone Jan 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

quick fix to sort issue #87

quick fix to sort issue #87

mschoch commented Dec 27, 2021

mschoch commented Dec 27, 2021

tmm1 commented Dec 27, 2021

voldyman left a comment

mschoch commented Jan 4, 2022 •

edited

Loading

mschoch commented Jan 4, 2022

quick fix to sort issue #87

quick fix to sort issue #87

Conversation

mschoch commented Dec 27, 2021

mschoch commented Dec 27, 2021

tmm1 commented Dec 27, 2021

voldyman left a comment

Choose a reason for hiding this comment

mschoch commented Jan 4, 2022 • edited Loading

mschoch commented Jan 4, 2022

mschoch commented Jan 4, 2022 •

edited

Loading