One of our most frequently asked for and used feature is a custom query parser. Beyond simple prefix and term matching, we can also parse custom queries that lets you add filters and restrict the search to specific fields. Let’s look into why we built this and how it works…
First, we’ll talk about how a search engine is different from a relational data store like MySQL or PostgreSQL. In a relational database, you have a schema that defines columns with specific types. Types include varchar, char, integer, etc.
In a search engine, beyond basic types, you can have something like the n-gram type. An n-gram data structure divides a word like “introduction” into a set of uni-grams, bi-grams, tri-grams, etc. This is very useful for matching autocomplete terms.
To increase the range of matches, it’s easier to preprocess the input being indexed to account for word stems, extended latin characters (like ñ), spelling corrections and synonyms. We index the possible variations of the input text into a multi-field and during query time, we include all of these fields in the search automatically. All of this is transparent to the user and is done out of the box.
The challenge is to devise a query language as close to natural language as possible and rewrite it to an abstract syntax tree that can then be transformed to perform arbitrarily complex queries.
While designing the parser, we strove to be as close to the Google query syntax as possible. Google’s syntax has a few rules:
- the query is parsed into terms and operators
- terms can be exact or phrases
- exact terms are within double quotes, eg: “john smith”
- phrases match one or more words in any order
- boolean operators like and, or and not can be used with parenthesis to group queries
- you can use + and - to mark specific terms as present or absent
- you can restrict the search to specific fields with a prefix
A product search example
Say you have a product search engine with the fields: SKU, name, description, price, category and availability. Here are some queries you can use:
Restrict to a single category
"gold ring" category:ring
Restrict to one or more categories
"gold ring" category:ring category:necklace "gold ring" (category:ring or category:necklace)
"gold ring" +engraved color:red
The query parser is fully documented. Read about it on the API reference page.
Photo Credit: mendhak.