Understanding advanced search
This syntax is available when you use the Matches any of the Words type of query. You can use these special characters:
Character | Description |
---|---|
+ | Performs the AND operation |
- | Negates a single token |
\| | Performs the OR operation |
( and ) | Signifies precedence |
" | Wraps a number of tokens to signify a phrase for searching |
* | At the end of a term, signifies a prefix query |
? | Matches any single character |
~N | After a word, signifies edit distance (fuzziness) |
~N | After a phrase, signifies slop amount |
\ | Followed by one of the above symbols, allows the search for that symbol, and disables its special meaning. |
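If you build queries programmatically, escaping can be automated. The following is a minimal Python sketch, not part of the product: the function name `escape_query_text` is hypothetical, and it assumes that the backslash itself also needs escaping, which the table does not state explicitly.

```python
# Characters with a special meaning in the advanced search syntax.
# Assumption: the backslash is treated as special too.
SPECIAL_CHARACTERS = set('+-|()"*?~\\')


def escape_query_text(text: str) -> str:
    """Prefix every special character with a backslash so it is searched literally."""
    return "".join("\\" + ch if ch in SPECIAL_CHARACTERS else ch for ch in text)


print(escape_query_text("C++ (draft)"))  # C\+\+ \(draft\)
```

Escape only the text you want treated literally; operators you intend to use must stay unescaped.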
The AND operation forces the inclusion of the additional terms. This is similar to the normal behaviour, but results MUST include the term which is prefixed by the ‘+’ symbol. To illustrate the difference, the search:
legal document
matches any document which contains the word ‘legal’ or the word ‘document’. But the search:
legal +document
matches only documents which contain the word ‘document’ and also contain the word ‘legal’.
Conversely, negating ‘legal’ with the ‘-’ symbol matches documents which contain the word ‘document’ only if they do not contain the word ‘legal’.
The OR operator ‘|’ becomes significantly more powerful when combined with the parenthesis precedence operator. The query:
legal +(document | contract)
searches for the term ‘legal’ only in documents that contain at least one of the terms ‘document’ or ‘contract’.
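The way these operators combine can be modelled with ordinary set operations. The Python sketch below is only an illustration of the semantics described above, using an invented five-document collection; it says nothing about how the platform evaluates queries internally.

```python
# A tiny, made-up collection: document id -> text.
documents = {
    "a": "legal document template",
    "b": "legal contract for review",
    "c": "contract renewal notice",
    "d": "legal advice",
    "e": "document scanning guide",
}


def containing(word: str) -> set:
    """Return the ids of documents whose text contains the given word."""
    return {doc_id for doc_id, text in documents.items() if word in text.split()}


legal = containing("legal")
document = containing("document")
contract = containing("contract")

print(sorted(legal | document))               # OR:  ['a', 'b', 'd', 'e']
print(sorted(legal & document))               # AND: ['a']
print(sorted(document - legal))               # 'document' without 'legal': ['e']
print(sorted(legal & (document | contract)))  # grouping: ['a', 'b']
```

The last line shows why the parentheses matter: the OR is resolved first, and the result is then narrowed to documents that also contain ‘legal’.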
The phrase-wrapping operator, the quotation marks, indicates sections which should be matched as whole phrases rather than as individual words.
The phrase-wrapping operator can be used alongside other operators in the same way that words can. The opposite is not true: other operators cannot appear inside quotes.
Prefix queries match any word that starts with the given segment before the ‘*’. For example, the query ‘Joh*’ matches “John” but also “Johannesburg”.
Similarly, the ‘?’ operator matches any single missing character, so that the query ‘Joh?’ matches “John” but not “Johannesburg”.
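Python's standard `fnmatch` module happens to use the same two wildcard characters, so it can be used to try out the behaviour of prefix and single-character matching. This is a sketch for experimentation only: the patterns and word list are illustrative, and case-sensitive matching is an assumption that may not reflect how the search index treats case.

```python
from fnmatch import fnmatchcase

words = ["John", "Johan", "Johannesburg", "Jon"]

# '*' at the end of a term: any word starting with "Joh".
prefix_matches = [w for w in words if fnmatchcase(w, "Joh*")]

# '?': exactly one character after "Joh".
single_char_matches = [w for w in words if fnmatchcase(w, "Joh?")]

print(prefix_matches)       # ['John', 'Johan', 'Johannesburg']
print(single_char_matches)  # ['John']
```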
Edit distance is the number of changes (substitutions, removals, additions) that need to be made for a word to match another. “John” has an edit distance of 1 from “Johan” or “Jon”. By appending a ~ followed by a number, you can specify matches within an edit distance of a word provided in the query.
As such, this query:
John~1
matches John, Johan, and Jon.
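To check how far apart two words are, the edit distance can be computed directly. The sketch below implements the classic Levenshtein distance (substitutions, removals, additions) in Python; the exact distance measure used by the search engine is not stated in this page and may differ in detail, for example in how swapped letters are counted.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, removals and additions to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # remove a character from a
                current[j - 1] + 1,      # add a character to a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


print(edit_distance("John", "Johan"))         # 1
print(edit_distance("John", "Jon"))           # 1
print(edit_distance("John", "Johannesburg"))  # 8
```

With a fuzziness of 1, “Johan” and “Jon” are therefore within range of “John”, while “Johannesburg” is not.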
Using the tilde after a phrase has a different meaning: slop is the number of words (more accurately, tokens) which can come between tokens in a phrase. Thus:
"are authorised"~1
matches both “are authorised” and “are not authorised” (or indeed, any other word in between ‘are’ and ‘authorised’).
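The idea of slop can be sketched in a few lines of Python. This is a deliberately simplified model, not the search engine's algorithm: it splits on whitespace, ignores case, requires the phrase tokens to appear in their original order, and treats slop as the total number of extra tokens allowed between them. The function name `matches_with_slop` is hypothetical.

```python
def matches_with_slop(text: str, phrase: str, slop: int) -> bool:
    """True if the phrase tokens occur in order with at most `slop` extra tokens between them."""
    tokens = text.lower().split()
    wanted = phrase.lower().split()
    if not wanted:
        return False

    def search(start: int, index: int, budget: int) -> bool:
        # All phrase tokens found: the phrase matches.
        if index == len(wanted):
            return True
        # Allow up to `budget` intervening tokens before the next phrase token.
        for skipped in range(budget + 1):
            pos = start + skipped
            if pos < len(tokens) and tokens[pos] == wanted[index]:
                if search(pos + 1, index + 1, budget - skipped):
                    return True
        return False

    return any(
        tokens[i] == wanted[0] and search(i + 1, 1, slop)
        for i in range(len(tokens))
    )


print(matches_with_slop("You are authorised", "are authorised", 0))      # True
print(matches_with_slop("You are not authorised", "are authorised", 0))  # False
print(matches_with_slop("You are not authorised", "are authorised", 1))  # True
```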
The “Match Exact” checkbox next to each term in a query has different functionality depending on the field you’re using. It has no effect at all in the “Body” text.
If you uncheck it, Exonar analyses the term in the query to provide some context sensitivity to the search.
For example, if you use the telephone numbers field with +441134961122 as your search term, Exonar recognises it as an internationally formatted UK number, and divides it up into components:
Component | Description |
---|---|
+ | Prefix |
44 | International dialling code |
(0)113 | Area code with optional prefix 0 |
4961122 | Local telephone number |
When checked, the search term must exactly match the data in the indexes we have stored.
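To give a feel for what context-sensitive matching of telephone numbers involves, the Python sketch below reduces differently formatted UK numbers to one comparable form. It is an illustration only and not Exonar's implementation: the function name `normalise_uk_number` is hypothetical, and it handles only the `+44`, `0044` and leading `0` forms.

```python
import re


def normalise_uk_number(raw: str) -> str:
    """Reduce a UK number to its digits without the international code or the optional 0."""
    # Keep digits and a leading '+'; drop spaces, brackets and other punctuation.
    compact = re.sub(r"[^\d+]", "", raw)
    if compact.startswith("+44"):
        compact = compact[3:]
    elif compact.startswith("0044"):
        compact = compact[4:]
    # Drop the optional 0 in front of the area code, if present.
    if compact.startswith("0"):
        compact = compact[1:]
    return compact


print(normalise_uk_number("+441134961122"))        # 1134961122
print(normalise_uk_number("+44 (0)113 496 1122"))  # 1134961122
print(normalise_uk_number("0113 496 1122"))        # 1134961122
```

An exact match, by contrast, would treat these three strings as different values.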
+441134961122 is a fictional UK telephone number provided by Ofcom for use in film and television.