[FEATURE] Support batch ingestion in TextEmbeddingProcessor & SparseEncodingProcessor #743

Closed
chishui opened this issue May 9, 2024 · 11 comments

@chishui (Contributor) commented May 9, 2024

Is your feature request related to a problem?

RFC: opensearch-project/OpenSearch#12457

We have implemented the batch ingestion logic in OpenSearch core in version 2.14. Now we want to enable batch ingestion in the neural-search processors TextEmbeddingProcessor and SparseEncodingProcessor so that we can better utilize the remote ML server's GPU capacity and accelerate the ingestion process. Based on our benchmark, batching can reduce total ingestion time by 77% without triggering throttling errors (P90, SageMaker); please refer to here to see the benchmark results.

What solution would you like?

  1. In InferenceProcessor, override the Processor batchExecute API and add a default implementation that combines the List<String> inferenceText from multiple docs, then reuses mlCommonsClientAccessor.inferenceSentences and mlCommonsClientAccessor.inferenceSentencesWithMapResult. After getting the inference results, map them back to each doc and update the docs.
  2. We'll sort the docs by length before sending them for inference to achieve better performance, and the inference results will be restored to the original order before being processed (see the sketch after this list).
    (This was originally proposed in ml-commons. But as @ylwu-amzn suggested that we can reuse input_docs_processed_step_size as the max batch size, it makes more sense to sort the docs in neural-search, where we can ensure that we won't sort docs from TextImageEmbeddingProcessor.)
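
To make step 2 concrete, here is a minimal, self-contained sketch of the combine → sort-by-length → infer → restore-order flow. It is plain Java for illustration only and is not the actual InferenceProcessor implementation: the class and method names are made up, and fakeInferenceSentences is just a stub standing in for the ML Commons client call (mlCommonsClientAccessor.inferenceSentences).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Illustration only -- NOT the actual InferenceProcessor code. It demonstrates the idea
// above: combine texts from several docs into one batch, sort them by length before
// inference, then restore the results to the original order so they can be mapped back
// onto their documents.
public class BatchInferenceSketch {

    // Stub standing in for the ML Commons client call: returns one fake
    // single-dimension "embedding" per input text.
    static List<float[]> fakeInferenceSentences(List<String> texts) {
        return texts.stream().map(t -> new float[] { t.length() }).collect(Collectors.toList());
    }

    // Runs one batched inference over all texts; the returned list is aligned with the
    // original input order, ready to be mapped back to each document.
    static List<float[]> batchInference(List<String> inferenceTexts) {
        // Remember each text's original position, then order the positions by text length.
        List<Integer> order = IntStream.range(0, inferenceTexts.size())
            .boxed()
            .sorted(Comparator.comparingInt((Integer i) -> inferenceTexts.get(i).length()))
            .collect(Collectors.toList());
        List<String> sortedTexts = order.stream().map(inferenceTexts::get).collect(Collectors.toList());

        // One call for the whole batch instead of one call per document.
        List<float[]> sortedResults = fakeInferenceSentences(sortedTexts);

        // Restore the results to the original document order before further processing.
        float[][] restored = new float[inferenceTexts.size()][];
        for (int i = 0; i < order.size(); i++) {
            restored[order.get(i)] = sortedResults.get(i);
        }
        return Arrays.asList(restored);
    }

    public static void main(String[] args) {
        List<String> texts = List.of("a much longer passage of text", "short", "medium length text");
        List<float[]> embeddings = batchInference(texts);
        for (int i = 0; i < texts.size(); i++) {
            System.out.println(texts.get(i) + " -> " + embeddings.get(i)[0]);
        }
    }
}
```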

What alternatives have you considered?

N/A

Do you have any additional context?

N/A

@zhichao-aws (Member)

Hi @chishui, could you please provide an example API request body to create a batched ingest processor?

@martin-gaievski (Member)

We are not adding this feature for TextImageEmbeddingProcessor; is there a plan to do it later?

@chishui (Contributor, Author) commented May 20, 2024

@zhichao-aws there are no changes to how TextEmbeddingProcessor and SparseEncodingProcessor are created. Only when the user calls the _bulk API with the batch_size parameter will the processors see documents in batches.
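
For illustration only, a bulk request that opts into batching could look like the following. The index name and field name are made up, and it is assumed that the index's default ingest pipeline already contains a text_embedding or sparse_encoding processor; batch_size is passed as a query parameter on _bulk:

```
POST _bulk?batch_size=5
{ "index": { "_index": "my-index", "_id": "1" } }
{ "passage_text": "first document" }
{ "index": { "_index": "my-index", "_id": "2" } }
{ "passage_text": "second document" }
```

With a batch_size greater than 1, the ingest pipeline hands the processors up to that many documents at a time instead of one by one.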

@chishui (Contributor, Author) commented May 21, 2024

@martin-gaievski we don't have a plan to support TextImageEmbeddingProcessor as it requires text and image to be grouped in one request.

@chishui (Contributor, Author) commented May 21, 2024

Updated the description in the "What solution would you like?" section.

@chishui closed this as completed May 28, 2024
@navneet1v (Collaborator)

@chishui given that the feature is merged across all the components, what are the final benchmark numbers for batch ingestion?

@navneet1v (Collaborator)

@chishui can you attach the documentation issue here to track this feature?

@navneet1v (Collaborator)

@chishui I looked into the benchmarks here: opensearch-project/OpenSearch#12457 (comment). When we say there is a significant improvement in throughput, I see it is with a GPU instance and not with any other model. Is this correct?

@chishui (Contributor, Author) commented May 30, 2024

> @chishui given that the feature is merged across all the components, what are the final benchmark numbers for batch ingestion?

This is the recent benchmark I redid with a neural sparse model hosted on SageMaker:

  • Based on OpenSearch-v2.14.0
  • OpenSearch host type: r6a.4xlarge
    • 16 vCPU
  • 1 shard
  • OpenSearch benchmark host type: c6a.4xlarge
  • OpenSearch JVM: Xms: 48g, Xmx: 48g
  • Data: https://github.com/iai-group/DBpedia-Entity/ (300k, text only)

SageMaker

  • SageMaker host type: g5.xlarge
  • Processor: Sparse Encoding
  • Benchmark Setup
    • Bulk size: 100
    • client: 1
| Metrics | no batch | batch (batch size=50, input_docs_processed_step_size=20) |
| --- | --- | --- |
| Min Throughput (docs/s) | 57.73 | 17.35 |
| Mean Throughput (docs/s) | 93.26 | 214.94 |
| Median Throughput (docs/s) | 93.9 | 226.36 |
| Max Throughput (docs/s) | 96.38 | 239.91 |
| Latency P50 (ms) | 1180.89 | 427.71 |
| Latency P90 (ms) | 1134.32 | 562.268 |
| Latency P99 (ms) | 1248.18 | 796.35 |
| Total Benchmark Time (s) | 3275 | 1361 |
| Error Rate (%) | 0 | 0 |

@chishui (Contributor, Author) commented May 30, 2024

> @chishui I looked into the benchmarks here: opensearch-project/OpenSearch#12457 (comment). When we say there is a significant improvement in throughput, I see it is with a GPU instance and not with any other model. Is this correct?

We benchmarked on SageMaker, OpenAI, and Cohere; all of the results are listed in the comment you linked, and we saw a throughput improvement for all of these services. The GPU instance mentioned is the one SageMaker used, since we can choose the instance type for SageMaker, but we have no idea what kind of instances OpenAI and Cohere use.

@chishui (Contributor, Author) commented Jun 6, 2024

> @chishui can you attach the documentation issue here to track this feature?

opensearch-project/documentation-website#7305
