Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Concurrent Segment Search] NPE with size and terminate_after parameters when using concurrent search #10054

Closed
jed326 opened this issue Sep 14, 2023 · 3 comments · Fixed by #10082
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance

Comments

@jed326
Copy link
Collaborator

jed326 commented Sep 14, 2023

When size=0 and terminate_after parameters are used in a search request and concurrent search path is used, an NPE is thrown

To Reproduce
This was done on nyc_taxis data set.

dev-dsk-dengjay-2b-698455fb % curl -X GET "--/_search?size=0&track_total_hits=true&pretty&terminate_after=1000000" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'

{
  "error" : {
    "root_cause" : [
      {
        "type" : "null_pointer_exception",
        "reason" : "Cannot invoke \"org.apache.lucene.search.CollectorManager.newCollector()\" because \"this.manager\" is null"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "nyc_taxis",
        "node" : "K7DzKxU4Tyin-01SQBj-7A",
        "reason" : {
          "type" : "null_pointer_exception",
          "reason" : "Cannot invoke \"org.apache.lucene.search.CollectorManager.newCollector()\" because \"this.manager\" is null"
        }
      }
    ],
    "caused_by" : {
      "type" : "null_pointer_exception",
      "reason" : "Cannot invoke \"org.apache.lucene.search.CollectorManager.newCollector()\" because \"this.manager\" is null",
      "caused_by" : {
        "type" : "null_pointer_exception",
        "reason" : "Cannot invoke \"org.apache.lucene.search.CollectorManager.newCollector()\" because \"this.manager\" is null"
      }
    }
  },
  "status" : 500
}

This was found while doing some debugging for

@jed326 jed326 added enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc labels Sep 14, 2023
@jed326 jed326 self-assigned this Sep 14, 2023
@jed326
Copy link
Collaborator Author

jed326 commented Sep 15, 2023

It looks like the issue also related to track_total_hits:

dengjay@88665a3f24d4 OpenSearch % curl -X GET "localhost:9200/_search?size=0&track_total_hits=false&pretty&terminate_after=10000000" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'

{
  "took" : 13,
  "timed_out" : false,
  "terminated_early" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "max_score" : null,
    "hits" : [ ]
  }
}

When size=0 we use EmptyTopDocsCollectorContext, and in the createManager call below manager ends up being null since we don't go into this if block if track_total_hits=true.

if (hitCount == -1) {
if (trackTotalHitsUpTo != SearchContext.TRACK_TOTAL_HITS_ACCURATE) {
manager = new EarlyTerminatingCollectorManager<>(
new TotalHitCountCollectorManager(sort),
trackTotalHitsUpTo,
false
);
}

Consequently, the manager passed into ctx.createManager will be null:

public static CollectorManager<? extends Collector, ReduceableSearchResult> createQueryCollectorManager(
List<QueryCollectorContext> collectorContexts
) throws IOException {
CollectorManager<?, ReduceableSearchResult> manager = null;
for (QueryCollectorContext ctx : collectorContexts) {
manager = ctx.createManager(manager);
}
return manager;
}

And EarlyTerminatingCollectorManager gets created with a null manager:

@Override
CollectorManager<? extends Collector, ReduceableSearchResult> createManager(
CollectorManager<? extends Collector, ReduceableSearchResult> in
) throws IOException {
return new EarlyTerminatingCollectorManager<>(in, numHits, true);
}

@jed326
Copy link
Collaborator Author

jed326 commented Sep 15, 2023

For the special case of size=0, we used an EmptyTopDocsCollectorContext:

if (searchContext.size() == 0) {
// no matter what the value of from is
return new EmptyTopDocsCollectorContext(
reader,
query,
searchContext.sort(),
searchContext.trackTotalHitsUpTo(),
hasFilterCollector
);
} else if (searchContext.scrollContext() != null) {

However, unlike SimpleTopDocsCollectorContext this does not take numDocs in as a parameter.

@jed326
Copy link
Collaborator Author

jed326 commented Sep 16, 2023

Looks like there are actually at least 2 more problems related to size/terminate_after/track_total_hits:

  1. track_total_hits does not correctly track the total hits when terminate_after is used. This is specific to concurrent search:
curl -X GET "--/_search?pretty&size=1&track_total_hits=true&terminate_after=1"  -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'
{
  "took" : 1,
  "timed_out" : false,
  "terminated_early" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [...
    ]
  }
}
curl -X GET "--/_search?pretty&size=1&track_total_hits=true"  -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 165346692,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [...
    ]
  }
}
  1. There's a problem with size=0 and track_total_hits for the aggs too.
curl -X GET "opens-clust-1gj8zaf4fng1b-d6fa7bd00441ed0d.elb.us-east-1.amazonaws.com/_search?track_total_hits=true&pretty" -H 'Content-Type: application/json' -d'
{
            "size": 0,
        "query": {
          "bool": {
            "filter": {
              "range": {
                "trip_distance": {
                  "lt": 50,
                  "gte": 0
                }
              }
            }
          }
        },
        "aggs": {
          "distance_histo": {
            "histogram": {
              "field": "trip_distance",
              "interval": 1
            },
            "aggs": {
              "total_amount_stats": {
                "stats": {
                  "field": "total_amount"
                }
              }
            }
          }
        }
}'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Collector managers should all be non-null"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "nyc_taxis",
        "node" : "K7DzKxU4Tyin-01SQBj-7A",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "Collector managers should all be non-null"
        }
      }
    ],
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Collector managers should all be non-null",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "Collector managers should all be non-null"
      }
    }
  },
  "status" : 400
}

Basically looks like we need to take a closer look at the special case of size=0 for concurrent segment search. Going to take a quick look at both of these first and then see if we need to move these into separate issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance
Projects
Status: Done
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants