Stop words + AND queries: why you suddenly get 0 results (and the setting that fixes it)
Context
You have a full-text search endpoint that builds a boolean query from user input.
A common pattern is to split the user query into keywords and create one multi_match clause per keyword, then combine them with bool.must (AND semantics).
This can fail unexpectedly when the field analyzer removes stop words (e.g. “of”, “the”). Queries containing stop words can return zero results, even though a shorter query without them (e.g. “lord rings”) returns hits.
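A minimal TypeScript sketch of that per-keyword pattern, assuming the app splits on whitespace. The helper `buildNaiveAndQuery` and the type names are hypothetical, and the field names are placeholders. It deliberately sets no `zero_terms_query`, so each clause falls back to the server default.

```typescript
// Hypothetical helper illustrating the per-keyword AND pattern.
// Field names are placeholders; only the query shape matters here.

interface MultiMatchClause {
  multi_match: {
    query: string;
    fields: string[];
    zero_terms_query?: "none" | "all";
  };
}

interface AndQuery {
  query: { bool: { must: MultiMatchClause[] } };
}

// Split on whitespace and emit one multi_match clause per keyword.
// No zero_terms_query is set, so the server default applies to every clause.
function buildNaiveAndQuery(userQuery: string, fields: string[]): AndQuery {
  const keywords = userQuery
    .trim()
    .split(/\s+/)
    .filter((k) => k.length > 0);
  return {
    query: {
      bool: {
        must: keywords.map((keyword) => ({
          multi_match: { query: keyword, fields },
        })),
      },
    },
  };
}

const naive = buildNaiveAndQuery("Lord of the Rings", ["fulltext^10", "name^4"]);
console.log(JSON.stringify(naive, null, 2));
```

Every word, including "of" and "the", becomes its own mandatory clause. That is what sets up the failure described below.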
Finding
Why AND + per-keyword multi_match can return 0 results
When OpenSearch/Elasticsearch executes a multi_match query, it analyzes the query string using the selected analyzer. If analysis produces zero tokens, the query becomes a zero-terms query.
- With the default behavior (zero_terms_query omitted, which is equivalent to "none"), a zero-terms query matches no documents.
- If you then combine that clause in bool.must, the entire query becomes unsatisfiable: MATCH(Lord) AND MATCH(of) AND MATCH(the) AND MATCH(rings), where MATCH(of) and MATCH(the) analyze to zero tokens and therefore match nothing.
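In set terms (notation introduced here for illustration): let D be the set of all documents and M(t) the set of documents matching keyword t. Under the default behavior, each stop-word clause contributes the empty set, so the bool.must intersection collapses:

```latex
\begin{aligned}
\text{result} &= M(\text{lord}) \cap M(\text{of}) \cap M(\text{the}) \cap M(\text{rings}) \\
              &= M(\text{lord}) \cap \varnothing \cap \varnothing \cap M(\text{rings}) \\
              &= \varnothing
\end{aligned}
```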
This is different from a term that produces tokens but happens to match 0 documents (e.g. a rare word). zero_terms_query does not change that case.
What zero_terms_query: "all" actually does
Setting zero_terms_query: "all" tells OpenSearch:
- If analysis yields zero tokens, treat this clause as matching all documents.
So in an AND chain, stop-word-only keywords become neutral rather than destructive:
- MATCH(Lord) = restricts results
- MATCH(of) = ALL (neutral)
- MATCH(the) = ALL (neutral)
- MATCH(rings) = restricts results
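Net effect: the query behaves like Lord AND rings. One consequence: if every keyword is a stop word, every clause matches all documents, so the query returns the whole index.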
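In set terms (with D the set of all documents and M(t) the documents matching keyword t, notation introduced here for illustration), zero_terms_query: "all" makes each stop-word clause contribute D. Because every M(t) is a subset of D, intersecting with D changes nothing:

```latex
\begin{aligned}
\text{result} &= M(\text{lord}) \cap D \cap D \cap M(\text{rings}) \\
              &= M(\text{lord}) \cap M(\text{rings})
\end{aligned}
```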
Code Example
Anonymised example: per-keyword multi_match with AND
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "lord",
"fields": ["fulltext^10", "name^4"],
"analyzer": "my_english",
"fuzziness": "1",
"zero_terms_query": "all"
}
},
{
"multi_match": {
"query": "of",
"fields": ["fulltext^10", "name^4"],
"analyzer": "my_english",
"fuzziness": "1",
"zero_terms_query": "all"
}
},
{
"multi_match": {
"query": "rings",
"fields": ["fulltext^10", "name^4"],
"analyzer": "my_english",
"fuzziness": "1",
"zero_terms_query": "all"
}
}
]
}
}
}
The order of processing (mental model)
- Your app builds the JSON DSL
  - Often: split the user query into tokens → map tokens to multi_match clauses → combine with bool.must.
- OpenSearch applies the analyzer per clause
  - Tokenizer + lowercase + stemming + stop-word filter.
- If analysis yields no tokens
  - The clause becomes a zero-terms query.
  - Without zero_terms_query: "all": the clause matches nothing.
  - With zero_terms_query: "all": the clause matches everything (neutral).
- Boolean logic runs
  - bool.must intersects the matches from each clause.
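To make the four steps concrete, here is a toy TypeScript model of them. It is a simulation under stated assumptions, not how OpenSearch works internally. The analyzer, its stop-word list and the three documents are invented, and fuzziness, field boosts and scoring are ignored. The model runs the same AND query twice, once treating zero-terms clauses as "none" and once as "all".

```typescript
// Toy, in-memory model of the processing order. NOT OpenSearch's implementation:
// the analyzer, stop words and documents are hypothetical; no fuzziness or scoring.

type ZeroTermsMode = "none" | "all";

// Hypothetical stop-word list standing in for the analyzer's stop filter.
const STOP_WORDS = new Set(["of", "the", "a", "an", "and"]);

// Step 2: toy analyzer (lowercase, split on non-alphanumerics, drop stop words).
function analyze(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));
}

interface Doc {
  id: number;
  text: string;
}

// Steps 2-3: evaluate one clause and return the ids of matching documents.
function matchClause(keyword: string, docs: Doc[], mode: ZeroTermsMode): Set<number> {
  const terms = analyze(keyword);
  if (terms.length === 0) {
    // Zero-terms query: "none" matches nothing, "all" matches every document.
    return mode === "all" ? new Set<number>(docs.map((d) => d.id)) : new Set<number>();
  }
  const matches = new Set<number>();
  for (const doc of docs) {
    const docTerms = new Set(analyze(doc.text));
    // A match query's default operator is OR across the analyzed terms.
    if (terms.some((t) => docTerms.has(t))) matches.add(doc.id);
  }
  return matches;
}

// Steps 1 and 4: one clause per keyword, then intersect (bool.must).
function searchAnd(userQuery: string, docs: Doc[], mode: ZeroTermsMode): number[] {
  const keywords = userQuery.split(/\s+/).filter((k) => k.length > 0);
  let result = new Set<number>(docs.map((d) => d.id));
  for (const keyword of keywords) {
    const clause = matchClause(keyword, docs, mode);
    result = new Set<number>(Array.from(result).filter((id) => clause.has(id)));
  }
  return Array.from(result).sort((a, b) => a - b);
}

const docs: Doc[] = [
  { id: 1, text: "The Lord of the Rings" },
  { id: 2, text: "Lord of the Flies" },
  { id: 3, text: "Rings of Power" },
];

const withDefault = searchAnd("Lord of the Rings", docs, "none"); // returns []
const withAll = searchAnd("Lord of the Rings", docs, "all"); // returns [1]
console.log(withDefault, withAll);
```

The rare-word case behaves differently: in this model, searchAnd("Lord zzzz", docs, "all") still returns an empty result, because "zzzz" analyzes to a real token that simply matches no document.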
Recommended pattern: strip stop words in the app AND set zero_terms_query
If you rely on AND semantics and per-token clauses, the most robust pattern is:
- Pre-filter stop words in the app (so you don’t generate pointless clauses).
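- Still set zero_terms_query: "all" as a safety net in case:
  - your stop-word list differs from the one used by the analyzer applied to the query (the field's search analyzer, unless you override it with analyzer),
  - the language or analyzer changes,
  - or analysis unexpectedly removes other tokens entirely.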
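A minimal TypeScript sketch of this pattern, under stated assumptions. APP_STOP_WORDS is a hypothetical hand-written list (in a Node app it could instead come from a package such as stopword), buildSafeAndQuery is a hypothetical helper, and the field names, analyzer name and fuzziness value mirror the anonymised example.

```typescript
// Sketch: drop stop words in the app, and still set zero_terms_query: "all"
// on every clause as a safety net. APP_STOP_WORDS is a hypothetical list.

const APP_STOP_WORDS = new Set(["a", "an", "and", "of", "the", "to", "in"]);

interface MultiMatchClause {
  multi_match: {
    query: string;
    fields: string[];
    analyzer: string;
    fuzziness: string;
    zero_terms_query: "all";
  };
}

function buildSafeAndQuery(userQuery: string, fields: string[], analyzer: string) {
  const keywords = userQuery.split(/\s+/).filter((k) => k.length > 0);
  const filtered = keywords.filter((k) => !APP_STOP_WORDS.has(k.toLowerCase()));
  // If the app-side list removes everything, fall back to the original keywords
  // and let zero_terms_query decide how the server treats them.
  const effective = filtered.length > 0 ? filtered : keywords;
  const must = effective.map(
    (keyword): MultiMatchClause => ({
      multi_match: {
        query: keyword,
        fields,
        analyzer,
        fuzziness: "1",
        zero_terms_query: "all",
      },
    }),
  );
  return { query: { bool: { must } } };
}

const body = buildSafeAndQuery("Lord of the Rings", ["fulltext^10", "name^4"], "my_english");
console.log(JSON.stringify(body, null, 2));
```

One design choice worth flagging: with the fallback, a query made only of stop words (e.g. "the of") produces clauses that all match every document, so it returns the whole index. If that is undesirable, return an empty result from the app instead of sending the query.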
Why It Works
zero_terms_query does not “break AND semantics”. It only changes behavior for clauses where the analyzer produced no query terms at all.
That’s exactly the pathological case for stop words in per-token AND queries:
without zero_terms_query, stop words turn into “match nothing” constraints.
With it, they become “match all” constraints, which is the closest equivalent to “ignore this token”.
References
- OpenSearch / Elasticsearch docs: zero_terms_query — Explains how queries behave when analysis results in zero tokens
- stopword (npm) — Useful for pre-filtering stop words before building per-token queries