Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support of multi vector in jni #1364

Merged

Conversation

heemin32
Copy link
Collaborator

@heemin32 heemin32 commented Dec 28, 2023

Description

With this PR, QueryIndex and QueryIndex_WithFilter JNI method can take a list of parentId to dedupe search result per parent. To do that, custom result collector is implemented which will be passed as HNSW search parameter to faiss library.

Issues Resolved

N/A

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@jmazanec15
Copy link
Member

@heemin32 Can you give detailed summary on what is being changed in PR description?

jni/include/knn_extension/faiss/utils/BitSet.h Outdated Show resolved Hide resolved
jni/include/knn_extension/faiss/utils/BitSet.h Outdated Show resolved Hide resolved
jni/include/knn_extension/faiss/utils/BitSet.h Outdated Show resolved Hide resolved
jni/include/knn_extension/faiss/utils/BitSet.h Outdated Show resolved Hide resolved
jni/src/knn_extension/faiss/utils/BitSet.cpp Outdated Show resolved Hide resolved
jni/src/knn_extension/faiss/utils/BitSet.cpp Outdated Show resolved Hide resolved
jni/src/knn_extension/faiss/utils/BitSet.cpp Outdated Show resolved Hide resolved
@heemin32 heemin32 force-pushed the multi-vector branch 2 times, most recently from d6b2116 to 7a00cab Compare December 29, 2023 19:40
Copy link

codecov bot commented Dec 29, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (7b47bae) 85.10% compared to head (411dbfc) 85.10%.

Additional details and impacted files
@@                   Coverage Diff                   @@
##             feature/multi-vector    #1364   +/-   ##
=======================================================
  Coverage                   85.10%   85.10%           
  Complexity                   1251     1251           
=======================================================
  Files                         162      162           
  Lines                        5101     5101           
  Branches                      477      477           
=======================================================
  Hits                         4341     4341           
  Misses                        554      554           
  Partials                      206      206           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

jni/tests/knn_extension/faiss/utils/HeapTest.cpp Outdated Show resolved Hide resolved
jni/include/knn_extension/faiss/utils/BitSet.h Outdated Show resolved Hide resolved
jni/include/knn_extension/faiss/utils/BitSet.h Outdated Show resolved Hide resolved
jni/include/knn_extension/faiss/utils/Heap.h Outdated Show resolved Hide resolved
uint64_t word = this->bitmap[i] >> (index & 63); // Equivalent of bitmap[i] >> (index % 64)

if (word != 0) {
return index + __builtin_ctzll(word);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment around usage of this? What does word represent? It looks like its the 64-bit int that is like the "bit page" shifted over offset bits.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at the comment in Heap.h file and let me know if you still need some comment here to understand the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure which comment in Heap.h you are referring

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean

 * Here a block is 64 bit. However, for simplicity let's assume its size is 8 bits.
 * Then, if have an array of 3, 7, and 10, it will be represented in bitmap as follow.
 *            [0]      [1]
 * bitmap: 10001000 00000100
 *
 * for next_set_bit call with 4
 * 1. it looks for bitmap[0]
 * 2. bitmap[0] >> 4
 * 3. count trailing zero of the result from step 2 which is 3
 * 4. return 4(current index) + 3(result from step 3)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats not in Heap.h, but makes sense

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. BitSet.h :)

Signed-off-by: Heemin Kim <heemin@amazon.com>
uint64_t word = this->bitmap[i] >> (index & 63); // Equivalent of bitmap[i] >> (index % 64)

if (word != 0) {
return index + __builtin_ctzll(word);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure which comment in Heap.h you are referring

jni/src/knn_extension/faiss/utils/BitSet.cpp Show resolved Hide resolved
jni/include/knn_extension/faiss/utils/Heap.h Show resolved Hide resolved
@heemin32 heemin32 merged commit df6d1fa into opensearch-project:feature/multi-vector Jan 3, 2024
23 of 47 checks passed
@heemin32 heemin32 deleted the multi-vector branch January 3, 2024 21:52
heemin32 added a commit to heemin32/k-NN that referenced this pull request Jan 5, 2024
Signed-off-by: Heemin Kim <heemin@amazon.com>
heemin32 added a commit to heemin32/k-NN that referenced this pull request Jan 16, 2024
Signed-off-by: Heemin Kim <heemin@amazon.com>
heemin32 added a commit that referenced this pull request Jan 16, 2024
Signed-off-by: Heemin Kim <heemin@amazon.com>
heemin32 added a commit that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (#1358)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Initialize id_map as null (#1363)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Add support of multi vector in jni (#1364)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Multi vector support for Faiss HNSW (#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Add data generation script for nested field (#1388)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Add perf test for nested field (#1394)

Signed-off-by: Heemin Kim <heemin@amazon.com>

---------

Signed-off-by: Heemin Kim <heemin@amazon.com>
heemin32 added a commit that referenced this pull request Jan 19, 2024
Signed-off-by: Heemin Kim <heemin@amazon.com>
heemin32 added a commit to heemin32/k-NN that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (opensearch-project#1358)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Initialize id_map as null (opensearch-project#1363)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Add support of multi vector in jni (opensearch-project#1364)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Multi vector support for Faiss HNSW (opensearch-project#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Add data generation script for nested field (opensearch-project#1388)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Add perf test for nested field (opensearch-project#1394)

Signed-off-by: Heemin Kim <heemin@amazon.com>

---------

Signed-off-by: Heemin Kim <heemin@amazon.com>
(cherry picked from commit 709b448)
heemin32 added a commit to heemin32/k-NN that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (opensearch-project#1358)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Initialize id_map as null (opensearch-project#1363)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Add support of multi vector in jni (opensearch-project#1364)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Multi vector support for Faiss HNSW (opensearch-project#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Add data generation script for nested field (opensearch-project#1388)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Add perf test for nested field (opensearch-project#1394)

Signed-off-by: Heemin Kim <heemin@amazon.com>

---------

Signed-off-by: Heemin Kim <heemin@amazon.com>
(cherry picked from commit 709b448)
heemin32 added a commit that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (#1358)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Initialize id_map as null (#1363)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Add support of multi vector in jni (#1364)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Multi vector support for Faiss HNSW (#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Add data generation script for nested field (#1388)

Signed-off-by: Heemin Kim <heemin@amazon.com>

* Add perf test for nested field (#1394)

Signed-off-by: Heemin Kim <heemin@amazon.com>

---------

Signed-off-by: Heemin Kim <heemin@amazon.com>
(cherry picked from commit 709b448)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants