
Stop words + AND queries: why you suddenly get 0 results (and the setting that fixes it)

#opensearch #elasticsearch #search #debugging #gotcha

Context

You have a full-text search endpoint that builds a boolean query from user input. A common pattern is to split the user query into keywords and create one multi_match clause per keyword, then combine them with bool.must (AND semantics).

This can fail unexpectedly when the field analyzer removes stop words (e.g. “of”, “the”): a query containing a stop word can return zero results, even though the same query without it (e.g. “Lord rings” instead of “Lord of the rings”) returns hits.
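For reference, here is a minimal Python sketch of that naive pattern. The function name build_and_query is hypothetical. The field names and the my_english analyzer are borrowed from the anonymised example below. No zero_terms_query is set, so every clause gets the engine default.

```python
from typing import Any


def build_and_query(user_query: str, fields: list[str], analyzer: str) -> dict[str, Any]:
    """Naive builder (hypothetical name): one multi_match clause per
    whitespace-separated keyword, all combined under bool.must (AND).
    zero_terms_query is not set, so the default ("none") applies."""
    keywords = user_query.split()
    must = [
        {
            "multi_match": {
                "query": keyword,
                "fields": fields,
                "analyzer": analyzer,
            }
        }
        for keyword in keywords
    ]
    return {"query": {"bool": {"must": must}}}


# Produces four clauses: "Lord", "of", "the", "Rings".
query = build_and_query("Lord of the Rings", ["fulltext^10", "name^4"], "my_english")
```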

Finding

Why AND + per-keyword multi_match can return 0 results

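When OpenSearch/Elasticsearch executes a multi_match query, it analyzes the query string with the selected analyzer. If analysis produces zero tokens (for example, because every token was a stop word), the clause becomes a zero-terms query, and the zero_terms_query parameter decides what it matches.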

  • With the default behavior (zero_terms_query omitted, which is the same as "none"), a zero-terms query matches no documents.
  • If that clause sits in bool.must, the entire query becomes unsatisfiable: MATCH(Lord) AND MATCH(of) AND MATCH(the) AND MATCH(rings), where MATCH(of) and MATCH(the) analyze to zero tokens and therefore match nothing.

This is different from a term that does produce tokens but happens to match 0 documents (e.g. a rare word). zero_terms_query does not change that case.
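To check in advance whether a keyword analyzes to zero tokens, you can send it to the _analyze endpoint with the same analyzer, using a body like {"analyzer": "my_english", "text": "of"}. An empty tokens array means a clause built from that keyword would be a zero-terms query. The Python sketch below only builds the request body and interprets a response. The helper names are hypothetical, and the sample responses are hand-written in the shape _analyze returns, not captured from a real cluster.

```python
from typing import Any


def analyze_request_body(analyzer: str, text: str) -> dict[str, Any]:
    """Body for a request to /<index>/_analyze (index name is up to you)."""
    return {"analyzer": analyzer, "text": text}


def is_zero_terms(analyze_response: dict[str, Any]) -> bool:
    """True if the analyzer produced no tokens, i.e. a multi_match clause
    built from the same text would fall back to zero_terms_query."""
    return len(analyze_response.get("tokens", [])) == 0


# Hand-written responses in the shape returned by _analyze (illustrative only).
stop_word_response = {"tokens": []}
content_word_response = {
    "tokens": [
        {"token": "lord", "start_offset": 0, "end_offset": 4,
         "type": "<ALPHANUM>", "position": 0}
    ]
}

print(is_zero_terms(stop_word_response))     # True
print(is_zero_terms(content_word_response))  # False
```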

What zero_terms_query: "all" actually does

Setting zero_terms_query: "all" tells OpenSearch:

  • If analysis yields zero tokens, treat this clause as matching all documents.

So in an AND chain, stop-word-only keywords become neutral rather than destructive:

  • MATCH(Lord) = restricts results
  • MATCH(of) = ALL (neutral)
  • MATCH(the) = ALL (neutral)
  • MATCH(rings) = restricts results

Net effect: the query behaves like Lord AND rings.
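In set terms, let M(k) be the set of documents matched by the clause for keyword k and D the set of all documents in the index (notation introduced here). bool.must is an intersection, and D is its identity element, which is why "all" makes a clause neutral:

```latex
\begin{aligned}
R &= M(\text{lord}) \cap M(\text{of}) \cap M(\text{the}) \cap M(\text{rings}) \\
\texttt{none}&:\; M(\text{of}) = M(\text{the}) = \emptyset \;\Rightarrow\; R = \emptyset \\
\texttt{all}&:\; M(\text{of}) = M(\text{the}) = D \;\Rightarrow\; R = M(\text{lord}) \cap M(\text{rings}) \quad (X \cap D = X)
\end{aligned}
```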

Code Example

Anonymised example: per-keyword multi_match with AND

{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "lord",
            "fields": ["fulltext^10", "name^4"],
            "analyzer": "my_english",
            "fuzziness": "1",
            "zero_terms_query": "all"
          }
        },
        {
          "multi_match": {
            "query": "of",
            "fields": ["fulltext^10", "name^4"],
            "analyzer": "my_english",
            "fuzziness": "1",
            "zero_terms_query": "all"
          }
        },
        {
          "multi_match": {
            "query": "rings",
            "fields": ["fulltext^10", "name^4"],
            "analyzer": "my_english",
            "fuzziness": "1",
            "zero_terms_query": "all"
          }
        }
      ]
    }
  }
}

The order of processing (mental model)

  1. Your app builds the JSON query DSL

    • Often: split the user query into keywords → map each keyword to a multi_match clause → combine the clauses with bool.must.
  2. OpenSearch applies the analyzer per clause

    • Typically tokenizer + lowercase + stop-word filter + stemming; the exact chain depends on the analyzer.
  3. If analysis yields no tokens

    • Clause becomes a zero-terms query.
    • Without zero_terms_query: "all": clause matches nothing.
    • With zero_terms_query: "all": clause matches everything (neutral).
  4. Boolean logic runs

    • bool.must intersects matches from each clause.
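The four steps can be mimicked with a toy in-memory model in Python. It illustrates the set logic only and does not reflect how OpenSearch evaluates queries. The analyzer is simplified to whitespace splitting, lowercasing, and a stop-word filter, with no stemming, fuzziness, or scoring. STOP_WORDS and DOCS are made-up data.

```python
# Toy model of the four steps; NOT how OpenSearch is implemented.
# STOP_WORDS and DOCS are made-up illustration data.
STOP_WORDS = {"a", "an", "and", "of", "the"}

DOCS = {
    1: "The Lord of the Rings",
    2: "The Lord of the Flies",
    3: "Rings of Saturn",
}


def analyze(text: str) -> list[str]:
    """Step 2 (simplified): whitespace split + lowercase + stop-word filter."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def match_clause(keyword: str, zero_terms_query: str = "none") -> set[int]:
    """Steps 2-3: analyze the clause text and return matching doc ids."""
    terms = analyze(keyword)
    if not terms:  # zero-terms query: "all" -> every doc, "none" -> no doc
        return set(DOCS) if zero_terms_query == "all" else set()
    # A doc matches if it contains any of the clause's terms.
    return {doc_id for doc_id, text in DOCS.items()
            if any(term in analyze(text) for term in terms)}


def bool_must(user_query: str, zero_terms_query: str = "none") -> set[int]:
    """Steps 1 and 4: one clause per keyword, results intersected (AND)."""
    result = set(DOCS)
    for keyword in user_query.split():
        result &= match_clause(keyword, zero_terms_query)
    return result


print(bool_must("Lord of the Rings"))                          # set()
print(bool_must("Lord of the Rings", zero_terms_query="all"))  # {1}
print(bool_must("Lord Narnia", zero_terms_query="all"))        # set()
```

The last call shows the distinction made earlier. "narnia" produces a token that simply matches nothing, so zero_terms_query: "all" does not rescue it.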

If you rely on AND semantics and per-token clauses, the most robust pattern is:

  • Pre-filter stop words in the app (so you don’t generate pointless clauses).
  • Still set zero_terms_query: "all" as a safety net in case:
    • your stop-word list differs from the index analyzer's,
    • the language or analyzer changes,
    • or other input is removed entirely by analysis (e.g. a lone “&”, which the standard tokenizer discards).
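The Python sketch below combines both measures. APP_STOP_WORDS and build_robust_and_query are hypothetical names. The fields, analyzer, and fuzziness values mirror the anonymised example above.

```python
from typing import Any

# Hypothetical app-side stop-word list. It will rarely match the index
# analyzer's list exactly, which is why zero_terms_query stays as a safety net.
APP_STOP_WORDS = frozenset({"a", "an", "and", "of", "the"})


def build_robust_and_query(
    user_query: str, fields: list[str], analyzer: str, fuzziness: str = "1"
) -> dict[str, Any]:
    """One multi_match clause per non-stop-word keyword, ANDed via bool.must."""
    keywords = user_query.split()
    # 1) Pre-filter stop words so no pointless clauses are generated.
    filtered = [k for k in keywords if k.lower() not in APP_STOP_WORDS]
    # If every keyword was a stop word, keep the originals and let the
    # analyzer plus zero_terms_query decide (a design choice of this sketch).
    if not filtered:
        filtered = keywords
    must = [
        {
            "multi_match": {
                "query": keyword,
                "fields": fields,
                "analyzer": analyzer,
                "fuzziness": fuzziness,
                # 2) Safety net: a clause that still analyzes to zero tokens
                #    matches all documents instead of none.
                "zero_terms_query": "all",
            }
        }
        for keyword in filtered
    ]
    return {"query": {"bool": {"must": must}}}


query = build_robust_and_query("Lord of the Rings", ["fulltext^10", "name^4"], "my_english")
```

Falling back to the unfiltered keywords when every word is a stop word is a choice made for this sketch. It leaves the final decision to the index analyzer and zero_terms_query instead of sending a bool query with an empty must list.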

Why It Works

zero_terms_query does not “break AND semantics”. It only changes behavior for clauses where the analyzer produced no query terms at all.

That’s exactly the pathological case for stop words in per-token AND queries: without zero_terms_query, stop words turn into “match nothing” constraints. With it, they become “match all” constraints, which is the closest equivalent to “ignore this token”.
