[Bug]: HNSW - Cannot return the results in a contigious 2D array. Probably ef or M is too small #2620

Open
tazarov opened this issue Aug 2, 2024 · 0 comments
tazarov commented Aug 2, 2024

What happened?

While testing a tangential issue, I ran into a reproducible HNSW index error:

...
  File "/Users/tazarov/experiments/chroma/chroma-taz-21/chromadb/segment/impl/vector/local_hnsw.py", line 157, in query_vectors
    result_labels, distances = self._index.knn_query(
                               ^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Cannot return the results in a contigious 2D array. Probably ef or M is too small

The code to reproduce this:

import gc
import uuid

import chromadb
import numpy as np

np.random.seed(42)


def main():
    data = np.random.uniform(-1, 1, (1000, 500, 1536))

    client = chromadb.PersistentClient("contiguous2d")
    # client = chromadb.HttpClient()
    collection = client.get_or_create_collection("test_collection")
    for i in range(data.shape[0]):
        print("Iteration: ", str(i))
        gc.collect()
        # One fresh UUID per embedding in this batch (the loop variable is unused).
        ids = [str(uuid.uuid4()) for _ in range(data[i].shape[0])]
        collection.add(ids=ids, embeddings=data[i])
        collection.query(query_embeddings=[data[i][np.random.choice(data[i].shape[0])].tolist()], n_results=10)
        collection.delete(ids=ids)
        gc.collect()


if __name__ == "__main__":
    main()

Note: The error typically occurs somewhere between the 20th and 50th iterations.

The issue can be reproduced with both PersistentClient and HttpClient (with the server running in a Docker container).
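
Until the root cause is understood, a caller-side workaround might look like the sketch below. It is not a fix and may hide the underlying index problem. It assumes the error surfaces as a RuntimeError carrying the message quoted above, as it does in the PersistentClient traceback. With HttpClient the exception type may differ, so the check would need adapting. The names `query_with_backoff` and `query_fn` are hypothetical and not part of Chroma.

```python
from typing import Any, Callable

# Substring of the error text quoted in this issue (spelling as emitted).
HNSW_CONTIGUOUS_ERROR = "Cannot return the results in a contigious 2D array"


def query_with_backoff(
    query_fn: Callable[[int], Any],
    n_results: int,
    min_results: int = 1,
) -> Any:
    """Call query_fn(k), halving k each time the HNSW contiguity error is raised.

    query_fn is a hypothetical adapter supplied by the caller, for example:
        lambda k: collection.query(query_embeddings=[vec], n_results=k)
    """
    k = n_results
    while True:
        try:
            return query_fn(k)
        except RuntimeError as exc:
            # Re-raise unrelated errors, or give up once k cannot shrink further.
            if HNSW_CONTIGUOUS_ERROR not in str(exc) or k <= min_results:
                raise
            k = max(min_results, k // 2)
```

In the reproduction script, the `collection.query(...)` call could be wrapped this way to see whether a smaller `n_results` succeeds on the failing iteration, which would help narrow down how many neighbours the index can still return.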

Versions

Chroma 0.5.3 and latest main; hnswlib 0.7.3 and 0.7.5

Tested HW configs:

  • Python 3.11, macOS (M3 CPU)
  • Python 3.12, Ubuntu 24.04 (4 core/16GB, Intel CPU)
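
For anyone trying to reproduce this on another machine, the same version details can be collected with a short script such as the sketch below. The distribution names in the default list are assumptions (in particular `chroma-hnswlib` as the name of the installed hnswlib build). Packages that are not installed are reported as `None`.

```python
import platform
from importlib import metadata


def collect_versions(packages=("chromadb", "chroma-hnswlib", "numpy")):
    """Return a dict of interpreter, platform, and installed package versions.

    The default package names are assumptions; pass your own list if the
    distributions are named differently in your environment.
    """
    versions = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed under this distribution name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```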

Relevant log output

Traceback (most recent call last):
  File "/home/ubuntu/hnsw_contiguous.py", line 22, in <module>
    main()
  File "/home/ubuntu/hnsw_contiguous.py", line 17, in main
    collection.query(query_embeddings=[data[i][np.random.choice(data[i].shape[0])].tolist()], n_results=10)
  File "/home/ubuntu/venv/lib/python3.12/site-packages/chromadb/api/models/Collection.py", line 195, in query
    query_results = self._client._query(
                    ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venv/lib/python3.12/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 146, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venv/lib/python3.12/site-packages/chromadb/rate_limiting/__init__.py", line 47, in wrapper
    return f(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venv/lib/python3.12/site-packages/chromadb/api/segment.py", line 738, in _query
    results = vector_reader.query_vectors(query)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venv/lib/python3.12/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 146, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venv/lib/python3.12/site-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 372, in query_vectors
    hnsw_results = super().query_vectors(hnsw_query)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venv/lib/python3.12/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 146, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/venv/lib/python3.12/site-packages/chromadb/segment/impl/vector/local_hnsw.py", line 156, in query_vectors
    result_labels, distances = self._index.knn_query(
                               ^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Cannot return the results in a contigious 2D array. Probably ef or M is too small

tazarov added the bug and index labels on Aug 2, 2024
atroyn self-assigned this on Sep 13, 2024