2.9.0 (April 28 2011)¶
New Features¶
ENGINE-193 Did You Mean? Support¶
Customers can now enable a power query suggestion feature to provide “Did You Mean?” and spell correction support. Did You Mean will return suggested queries, with highlighted changes, in the query response.
To define a custom spelling dictionary, create a Word Set and assign the word set to the text or fieldedText dimension’s didYouMeanDictionary attribute.
It is not necessary to define a custom spelling dictionary to use Did You Mean? because the feature will automatically use the indexed words as its original spelling dictionary.
Example:
<dimension id="freetext" key="venue_name" didYouMean="true" didYouMeanDictionary-ref="customDict" />
<wordset id="customDict">
animalz tintinnabulation fantabulous supercalifragilisticexpialidocious
</wordset>
Did You Mean at Indexing¶
When Did You Mean is enabled, a separate spellcheck and relevance index is created based upon the didYouMeanDictionary and any indexed words. This index has additional memory requirements associated with it.
Did You Mean at Query Time¶
The query API has been enhanced to include a didYouMean response containing alternate query suggestions. The query request API also includes options that control highlighting of the suggested query words, the number of suggestions to return, and whether to report if a word was considered a good search term.
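To make the shape of this exchange concrete, here is a minimal Python sketch of a suggestion pass. It is not the engine's implementation: these notes do not describe the matching algorithm, so `difflib.get_close_matches` from the Python standard library stands in for it, and every name used (`build_dictionary`, `did_you_mean`, `max_suggestions`, `highlight`, the report keys) is hypothetical.

```python
from difflib import get_close_matches


def build_dictionary(indexed_words, custom_words=()):
    """Combine indexed words with an optional custom word set (all lowercased)."""
    return {w.lower() for w in indexed_words} | {w.lower() for w in custom_words}


def did_you_mean(query, dictionary, max_suggestions=1, highlight=("<b>", "</b>")):
    """Return (suggested_query, report) for a whitespace-separated query.

    Assumption: a word counts as a "good search term" when it is in the
    dictionary. Unknown words get up to max_suggestions close matches, and
    the best match is wrapped in the highlight markers.
    """
    vocabulary = sorted(dictionary)
    suggested, report = [], []
    for word in query.split():
        known = word.lower() in dictionary
        candidates = [] if known else get_close_matches(
            word.lower(), vocabulary, n=max_suggestions)
        report.append({"word": word, "known": known, "suggestions": candidates})
        if candidates:
            suggested.append(highlight[0] + candidates[0] + highlight[1])
        else:
            suggested.append(word)
    return " ".join(suggested), report


dictionary = build_dictionary(["concert", "hall"], ["animalz", "fantabulous"])
suggestion, report = did_you_mean("fantabulos concert", dictionary)
print(suggestion)
```

The report list mirrors the per-word information mentioned above: whether the word was recognized and which alternatives were found for it.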
ENGINE-428 Support for match highlighting against tree type dimensions¶
The highlighting query criterion and response have been enhanced to return specifically which value indexed in a tree type dimension was a match.
ENGINE-208 Support for Synonyms¶
Customers can now create synonym dictionaries to be reused when analyzing text and fieldedText type dimensions.
They are defined as elements within the dimensions.xml file and then referred to in dimension definitions. The words in the synonym dictionaries are tokenized (split into terms) using the analyzer rules for the dimension to which they apply, so punctuation, whitespace and other extra characters will be automatically removed. These features should minimize any formatting requirements to create a synonym dictionary.
Synonyms can consist of one or more words (phrases). The lookup word or phrase maps to a list of one or more synonyms. To identify a phrase synonym, the phrase must be surrounded with double-quotes (“massachusetts state”).
A synonym dictionary is created using the thesaurus element. This example creates a simple synonym dictionary for a handful of US States:
<dimensions>
<dimension id="freetext" key="venue_name" synonyms-ref="us-states" />
<thesaurus id="us-states">
<lookup id="OR">Oregon Ore</lookup>
<lookup id="ME">Maine</lookup>
<lookup id="FL">Florida Fla</lookup>
<lookup id="North Carolina">NC "tarheel state"</lookup>
</thesaurus>
</dimensions>
Synonym dictionaries and dimensions can appear in any order in the dimensions.xml file. The thesaurus element does not need to be defined first in order to be referred to in a dimension definition.
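As a rough illustration of how the example above can be read, the following Python sketch parses the same XML with the standard library and splits each lookup body into synonyms, treating double-quoted text as a single phrase. The function name `read_thesauri` is hypothetical, and the sketch only splits on whitespace and quotes; it does not apply the dimension's analyzer rules.

```python
import shlex
import xml.etree.ElementTree as ET

DIMENSIONS_XML = """
<dimensions>
  <dimension id="freetext" key="venue_name" synonyms-ref="us-states" />
  <thesaurus id="us-states">
    <lookup id="OR">Oregon Ore</lookup>
    <lookup id="ME">Maine</lookup>
    <lookup id="FL">Florida Fla</lookup>
    <lookup id="North Carolina">NC "tarheel state"</lookup>
  </thesaurus>
</dimensions>
"""


def read_thesauri(xml_text):
    """Return {thesaurus id: {lookup id: [synonym, ...]}}."""
    root = ET.fromstring(xml_text.strip())
    thesauri = {}
    for thesaurus in root.iter("thesaurus"):
        entries = {}
        for lookup in thesaurus.iter("lookup"):
            # shlex.split keeps a double-quoted phrase together as one synonym.
            entries[lookup.get("id")] = shlex.split(lookup.text or "")
        thesauri[thesaurus.get("id")] = entries
    return thesauri


thesauri = read_thesauri(DIMENSIONS_XML)
print(thesauri["us-states"]["North Carolina"])  # ['NC', 'tarheel state']
```

Because the whole file is parsed before anything is looked up, the order of the dimension and thesaurus elements does not matter in this sketch either.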
Synonym Reduction at Indexing¶
For efficient storage of synonyms in a text dimension, the engine performs synonym reduction if the synonym lookup has the same number of analyzed words as the synonym itself. In the above example for US States, synonym reduction means that any instances of “Ore” and “Oregon” will be replaced by “OR”. Similarly, all instances of “tarheel state” will be replaced with “North Carolina”, but “North Carolina” will not be reduced to “NC” since the former has 2 words and the latter but 1. To maintain the original term before it is reduced, use the storeOriginalTerm text type dimension attribute.
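The word-count rule can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the engine's code: the analyzer is reduced to lowercasing and whitespace splitting, and the names `analyze`, `build_reduction_table` and `reduce_text` are hypothetical.

```python
import shlex

# Lookup -> synonym body, copied from the US States example.
THESAURUS = {
    "OR": "Oregon Ore",
    "ME": "Maine",
    "FL": "Florida Fla",
    "North Carolina": 'NC "tarheel state"',
}


def analyze(text):
    """Stand-in analyzer: lowercase and split on whitespace."""
    return tuple(text.lower().split())


def build_reduction_table(thesaurus):
    """Keep only synonyms whose analyzed word count equals the lookup's."""
    table = {}
    for lookup, body in thesaurus.items():
        lookup_len = len(analyze(lookup))
        for synonym in shlex.split(body):
            terms = analyze(synonym)
            if len(terms) == lookup_len:
                table[terms] = lookup
    return table


def reduce_text(text, table):
    """Replace reducible synonyms with their lookup, longest match first."""
    words = text.split()
    longest = max((len(key) for key in table), default=1)
    out, i = [], 0
    while i < len(words):
        for n in range(longest, 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if len(key) == n and key in table:
                out.append(table[key])
                i += n
                break
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


table = build_reduction_table(THESAURUS)
print(reduce_text("Visit Oregon and the Tarheel State", table))
# Visit OR and the North Carolina
```

Note that "NC" never enters the table, because it has one analyzed word while its lookup "North Carolina" has two.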
Synonym Expansion at Query Time¶
Synonym reduction takes place on the query search string in a similar fashion to what happens at index time. In addition, the query string is expanded to include any synonyms that may be found by scanning the original query, but only if the number of words in the synonym lookup is not equal to the number of words in the synonym itself. In the above example for US States, synonym expansion means that searching for “North Carolina” will also include searching for the phrase “tarheel state” or the word “NC”.
There is no additional query request parameter necessary to enable synonym dictionaries other than to define the synonym dictionaries with the text dimension to be searched.
Synonym Dictionary Inversion¶
Because the engine optimizes and inverts the analyzed synonym dictionaries, queries can search on the lookup value or any synonym value and find the same items.
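A minimal Python sketch of the expansion and inversion rules follows. It rests on several assumptions: the analyzer is reduced to lowercasing and whitespace splitting, the whole query is treated as a single phrase, and the names `analyze`, `build_expansion_table` and `expand_query` are hypothetical. Synonyms with the same word count as their lookup are left out here because reduction already maps them onto the lookup.

```python
import shlex

# Lookup -> synonym body, copied from the US States example.
THESAURUS = {
    "OR": "Oregon Ore",
    "ME": "Maine",
    "FL": "Florida Fla",
    "North Carolina": 'NC "tarheel state"',
}


def analyze(text):
    """Stand-in analyzer: lowercase and split on whitespace."""
    return tuple(text.lower().split())


def build_expansion_table(thesaurus):
    """Map lookup <-> synonym in both directions when word counts differ."""
    table = {}
    for lookup, body in thesaurus.items():
        lookup_terms = analyze(lookup)
        for synonym in shlex.split(body):
            synonym_terms = analyze(synonym)
            if len(synonym_terms) != len(lookup_terms):
                table.setdefault(lookup_terms, []).append(synonym)
                # Inversion: searching on the synonym also reaches the lookup.
                table.setdefault(synonym_terms, []).append(lookup)
    return table


def expand_query(query, table):
    """Return the query followed by any alternatives to search for as well."""
    return [query] + table.get(analyze(query), [])


table = build_expansion_table(THESAURUS)
print(expand_query("North Carolina", table))  # ['North Carolina', 'NC']
print(expand_query("nc", table))              # ['nc', 'North Carolina']
```

A query for "Oregon" comes back unexpanded in this sketch, since "Oregon" and "OR" have the same word count and are handled by reduction instead.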
ENGINE-411 Support for a Word Delimiter Tokenizer¶
Customers can now make use of a special text and fieldedText dimension feature that breaks apart and builds up words from the original text. This new feature is particularly useful for product names, numbers and words that are hyphen-delimited, making it easier for users to find results without knowing how the exact word appeared in the original document.
To enable the Word Delimiter feature, use the dimension attribute wordDelimiter.
Features¶
- Split on intra-word delimiters (by default, all non-alphanumeric characters) and combine word parts to create a single word. For example, wi-fi becomes wi, fi, wifi and, with storeOriginalWord enabled, wi-fi. Splitting on intra-word delimiters also works with numbers, converting 555-322-1212 into 555, 322, 1212 and 5553221212.
- Split on case transitions. For example, with ignoreCase enabled, PowerShot becomes power, shot and powershot. McDonald becomes mc and donald.
- Correctly stem English possessives. For example, O'Neil's becomes O'Neil.
- Split on letter-number transitions. For example, MD80 becomes MD, 80 and, with storeOriginalWord enabled, MD80.
- Maintains leading and trailing currency symbols, plus (+) and minus (-) signs. With storeOriginalWord enabled, customers can search for $10 or 10, 20¢ or 20, +20, 10- or -10.
- Maintains trailing % symbols. With storeOriginalWord enabled, customers can search for 10% or 10.
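The splitting rules listed above can be approximated with a short Python sketch. It is an illustration only and covers ASCII letters and digits; possessives, currency symbols, signs and percent symbols are not handled. The function `split_word` and its `ignore_case`, `catenate` and `store_original` parameters are hypothetical, and because these notes do not state exactly when the engine emits the combined form, the sketch leaves that choice to the caller.

```python
import re

# A part is a run of digits, a run of capitals not followed by a lowercase
# letter (e.g. "MD"), or an optionally capitalized lowercase run (e.g. "Power").
_PART = re.compile(r"[0-9]+|[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_word(word, ignore_case=False, catenate=True, store_original=False):
    """Split on delimiters, case transitions and letter-number transitions."""
    parts = _PART.findall(word)
    tokens = list(parts)
    if catenate and len(parts) > 1:
        tokens.append("".join(parts))   # combined form, e.g. "wifi"
    if store_original:
        tokens.append(word)             # original form, e.g. "wi-fi"
    if ignore_case:
        tokens = [t.lower() for t in tokens]
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(tokens))


print(split_word("wi-fi", store_original=True))
# ['wi', 'fi', 'wifi', 'wi-fi']
print(split_word("PowerShot", ignore_case=True))
# ['power', 'shot', 'powershot']
print(split_word("MD80", catenate=False, store_original=True))
# ['MD', '80', 'MD80']
```

Indexing every one of these forms is what lets a user find the same document by typing wifi, wi-fi or wi fi.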
ENGINE-228 Expose Lucene-style QueryParser as Advanced Parser Option¶
Customers can now create an advanced query using an experimental query parser modeled after Apache Lucene’s query parser. The query parser can be selected using the queryParser option of a text or fieldedText search criterion.
Behavioral Changes¶
ENGINE-411 The default tokenizer for text and fieldedText dimensions is now wordDelimiter¶
Customers can enable the standard tokenizer by using the dimension attribute tokenizer=standard.
ENGINE-418 Removed use of radians¶
Radians for geoloc type dimensions are no longer supported.
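Feeds that still carry coordinates in radians will need converting before indexing. The sketch below assumes that decimal degrees are the unit that remains supported, which these notes do not state explicitly; the helper name `radians_to_degrees` is hypothetical.

```python
import math


def radians_to_degrees(lat_rad, lon_rad):
    """Convert a (latitude, longitude) pair from radians to decimal degrees."""
    return math.degrees(lat_rad), math.degrees(lon_rad)


lat, lon = radians_to_degrees(math.pi / 4, -math.pi / 2)
print(round(lat, 6), round(lon, 6))  # 45.0 -90.0
```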
ENGINE-421 Removed Locale support from WordSets¶
Locales can no longer be specified for Word Sets. Word Sets are no longer associated with a locale.
ENGINE-428 Highlighting & Values query response API Changes¶
Changeset property data for tree type dimensions is now returned in a dictionary that includes the original value in the tree plus its label, which is the value associated with the name attribute of the element tag that indexed the item.
Bug Fixes¶
ENGINE-434 fieldPositionIncrementGap was ignored¶
fieldPositionIncrementGap is designed to work on fields of a fieldedText dimension. This setting was being ignored. It now works.
ENGINE-398 Multi-threaded query setting is for boot time only¶
The multi-threaded query option that was new in version 2.8.7 did not properly apply itself unless the engine was rebooted. This setting may now be enabled or disabled on-the-fly using the Admin Tool Settings page.
ENGINE-414 Bulk changesets aren’t correctly committed¶
If a Bulk changeset was applied and no subsequent delta changesets were applied, the bulk changeset was not properly committed, so it had to be reapplied when the engine rebooted.
ENGINE-417 Admin Tool buttons and options disappear if engine has never been set to read-only¶
If the engine’s read-only setting had never been selected in the Admin Tool, various features and buttons were removed from the Admin Tool. This has been fixed.
ENGINE-423 Admin Tool ignores username/password changes¶
Changes to the username or password of protected feed URLs in the Admin Tool were not detected and were therefore ignored.
ENGINE-424 Engine was incorrectly logging a success message when a changeset feed failed¶
Engine was incorrectly logging a success message when a changeset feed failed.