2.8.6 (February 2 2011)

New Features

ENGINE-271 Engine Data Can Be Set to Read-Only

The Admin Tool’s Settings Page now includes a new option to disable changes to changesets and indices.

ENGINE-313 Engine Configuration Settings Can Now Be Downloaded and Uploaded

Available from the Admin Tool’s Settings Page.

ENGINE-318 The engine now supports logging of HTTP requests

The Admin Tool’s Settings Page now includes a new option to enable/disable HTTP logging.

ENGINE-330 Italian is now a supported language

Refer to the documentation for the default Italian stop words.

ENGINE-266 Support for Word Sets

Customers can now create lists of words (word sets) to be reused when analyzing text type dimensions. Word sets can now be used to define stop words, stemming exclusion words and analysis exclusion words.

Word sets can be defined with or without a Locale. They are defined as elements within the dimensions.xml file and then referred to in dimension definitions. The list of words in the word set definition is tokenized (split into words) using the analyzer rules for the dimension to which it applies, so punctuation, whitespace and other extra characters are automatically removed. This feature should minimize the formatting required to create a word set.

This example creates a simple word set and includes extra punctuation that will not be included in the final word set:

<wordset id="stopwords">a an, the, that; whose: </wordset>
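The effect of this tokenization can be approximated in a few lines of Python. This is only an illustrative sketch: the engine applies the analyzer rules of the dimension, whereas the regular expression and the function name tokenize_word_set used here are assumptions.

```python
import re


def tokenize_word_set(raw_text):
    """Split the text of a word set definition into individual words."""
    # \w+ is a stand-in for the dimension's analyzer rules (assumption);
    # punctuation and whitespace between words are simply discarded.
    return re.findall(r"\w+", raw_text)


words = tokenize_word_set("a an, the, that; whose: ")
print(words)  # ['a', 'an', 'the', 'that', 'whose']
```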

Word sets can also be defined for a specific Locale as follows in this example for English and French:

<wordset id="stopwords">
  <locale id="en_US">a an the that whose</locale>
  <locale id="fr">à un de la les le</locale>
</wordset>

If the word set does not contain a locale, then the Locale used is the default Locale for the engine. If locales are specified, then the Locale of the dimension determines which list of words is used. If no match is found, then the default Locale is used.

<dimension id="en_text" type="text" stopWords-ref="stopwords" />
<dimension id="fr_text" type="text" stopWords-ref="stopwords" locale="fr" />
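The lookup rules described above can be modeled roughly as follows. This is a sketch under stated assumptions, not the engine's implementation: the default Locale value en_US, the function name resolve_word_set and the dictionary representation of a word set are all hypothetical.

```python
ENGINE_DEFAULT_LOCALE = "en_US"  # assumed default Locale for the engine


def resolve_word_set(word_set, dimension_locale=None,
                     default_locale=ENGINE_DEFAULT_LOCALE):
    """Pick the list of words that applies to a dimension."""
    # A word set defined without locale elements is modeled as a plain list.
    if isinstance(word_set, list):
        return word_set
    # A dimension without a locale attribute falls back to the default Locale.
    locale = dimension_locale or default_locale
    if locale in word_set:
        return word_set[locale]
    # No match for the dimension's Locale: use the default Locale's words.
    return word_set.get(default_locale, [])


stopwords = {
    "en_US": ["a", "an", "the", "that", "whose"],
    "fr": ["à", "un", "de", "la", "les", "le"],
}

print(resolve_word_set(stopwords))        # words used by en_text
print(resolve_word_set(stopwords, "fr"))  # words used by fr_text
```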

Word sets and dimensions can appear in any order in the dimensions.xml file. Word sets do not need to be defined first in order to refer to them in a dimension definition.

Enhancements

ENGINE-67 Admin Tool Dimensions Page now shows index statistics and coverage per user-defined Content Type

This feature makes it easier to understand the index coverage statistics by filtering the items shown by content-type. Customers can determine which dimension serves as the content type discriminator.

ENGINE-230 All supported languages now provide for excluding words from the stemming filter

Word sets are used to create stemming exclusion lists, that is, lists of words that should not be stemmed.

ENGINE-232 New text type dimension feature “enablePositionIncrements” for refined stop word handling in text dimensions

This new feature allows for more detailed accounting of original word positions when a stop word is removed during the indexing process. When it is on, queries against the original text need to include the original number of stop words for exact phrase matching. Exact phrase matching is enabled in word proximity searches where wordDistance = 0.

Take the example text “United States of the Americas”, in which both stemming and stop words are enabled for the dimension. When enablePositionIncrements is on, exact phrase matching against the original text succeeds only if the query has two stop words between “States” and “Americas”. For example, “United States of America” would not match because there is only one stop word in the query, whereas “United States of the Americas” would match.

When enablePositionIncrements is off (the default), exact phrase matching against the original text succeeds with any number of stop words between the query terms. For example, “United in the States of America” would be an exact phrase match to the example text above.

Because this setting is applied at index time, it applies to an entire dimension and cannot be modified at query time.
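The behavior described for this entry can be reproduced with a small position-based model. The following Python sketch is illustrative only: the stop word list, the toy stemmer (which only strips a trailing “s”) and the function names are assumptions, not the engine's analyzer.

```python
import re

STOP_WORDS = {"of", "the", "in"}  # assumed stop word list for the example


def stem(word):
    # Toy stemmer (assumption): strip a trailing "s" from longer words.
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def analyze(text, enable_position_increments):
    """Return (term, position) pairs after stop word removal and stemming."""
    tokens = []
    position = -1
    for word in re.findall(r"\w+", text.lower()):
        if word in STOP_WORDS:
            if enable_position_increments:
                position += 1  # the removed stop word still occupies a position
            continue
        position += 1
        tokens.append((stem(word), position))
    return tokens


def exact_phrase_match(document, query, enable_position_increments):
    """Model of a proximity search with wordDistance = 0."""
    doc = analyze(document, enable_position_increments)
    qry = analyze(query, enable_position_increments)
    if not qry:
        return False
    doc_entries = set(doc)
    first_term, first_pos = qry[0]
    for term, pos in doc:
        if term != first_term:
            continue
        offset = pos - first_pos
        # Every query term must sit at the same relative position in the document.
        if all((t, p + offset) in doc_entries for t, p in qry):
            return True
    return False


text = "United States of the Americas"
print(exact_phrase_match(text, "United States of America", True))          # False
print(exact_phrase_match(text, "United States of the Americas", True))     # True
print(exact_phrase_match(text, "United in the States of America", False))  # True
```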

ENGINE-244 fieldPositionIncrementGap – New Text Type Dimension Feature

Customers can now specify whether two changeset properties included in a fieldedText dimension should be treated as adjacent for proximity (phrase) matching, including exact matches (word proximity searches where wordDistance = 0).

Consider the following example:

<dimension id="freetext" type="fieldedText">
  <field id="name" key="venue_name" />
  <field id="location" key="address,city,state" fieldPositionIncrementGap="1" />
</dimension>

Without fieldPositionIncrementGap, the property keys address, city and state would not be used to create a phrase in the location field. With the setting shown in the above example, the text is indexed in such a way that proximity matching can be applied across the properties; e.g. “Salem, Oregon” with wordDistance = 0 might match items with city = “Salem” and state = “Oregon”.

Note that this attribute can be applied to any text type dimension or field element. Because this setting is applied at index time, it applies to an entire dimension and cannot be modified at query time.
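One way to picture the gap is as the distance between the last word of one property and the first word of the next. The Python sketch below models that idea only; the sample item, the function names and the wide gap of 100 used for comparison are assumptions, and the engine's actual default gap is not stated here.

```python
import re


def index_field(item, keys, gap):
    """Return (word, position) pairs for the values of `keys`, in order."""
    indexed = []
    for key in keys:
        words = re.findall(r"\w+", item.get(key, "").lower())
        # The first word of a later property sits `gap` positions after the
        # last word of the previous one; gap=1 makes the two words adjacent.
        position = indexed[-1][1] + gap if indexed else 0
        for word in words:
            indexed.append((word, position))
            position += 1
    return indexed


def exact_phrase_match(indexed, phrase):
    """Model of a proximity search with wordDistance = 0."""
    query = re.findall(r"\w+", phrase.lower())
    entries = set(indexed)
    return bool(query) and any(
        all((word, start + k) in entries for k, word in enumerate(query))
        for term, start in indexed
        if term == query[0]
    )


# Made-up sample item for the "location" field of the example above.
venue = {"address": "12 Main St", "city": "Salem", "state": "Oregon"}
keys = ["address", "city", "state"]

adjacent = index_field(venue, keys, gap=1)
separated = index_field(venue, keys, gap=100)  # arbitrary wide gap for contrast

print(exact_phrase_match(adjacent, "Salem, Oregon"))   # True
print(exact_phrase_match(separated, "Salem, Oregon"))  # False
```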

ENGINE-245 Customers can now create lists of words that should be completely excluded from analysis for text type dimensions

The new text dimension attribute is noAnalysis-ref, which refers to a word set.

ENGINE-247 New Whitespace Tokenizer is Available for Text Type Dimensions

A new Whitespace tokenizer is available for text type dimensions when indexing proper names or hyphenated names. The standard tokenizer will break words at hyphens.

The new whitespace tokenizer will not break words at hyphens (dashes) nor connector characters such as the underline character. This feature may be useful for analyzing proper names. It recognizes any Unicode dash, currency symbol or punctuation connector as part of the word.

To choose the new tokenizer, use the tokenizer attribute. For example:

<dimension id="freetext" type="fieldedText">
  <field id="firstname" tokenizer="whitespace" />
  <field id="lastname" />
</dimension>
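The difference between the two tokenizers can be illustrated with Unicode character categories. This Python sketch is an approximation based on the description above (dashes, currency symbols and connector punctuation are kept as part of the word); the function names and the sample text are assumptions, and the engine's tokenizers may differ in other respects.

```python
import unicodedata

# Unicode general categories: Pd = dash, Sc = currency symbol, Pc = connector.
WORD_PUNCTUATION = frozenset({"Pd", "Sc", "Pc"})


def tokenize(text, extra_categories=frozenset()):
    """Split text into runs of letters/digits plus any extra categories."""
    tokens, current = [], []
    for ch in text:
        if ch.isalnum() or unicodedata.category(ch) in extra_categories:
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def standard_like(text):
    # Breaks words at hyphens and other punctuation.
    return tokenize(text)


def whitespace_like(text):
    # Keeps dashes, currency symbols and connectors inside the word.
    return tokenize(text, WORD_PUNCTUATION)


sample = "Mary-Jane O_Neil, $5"
print(standard_like(sample))    # ['Mary', 'Jane', 'O', 'Neil', '5']
print(whitespace_like(sample))  # ['Mary-Jane', 'O_Neil', '$5']
```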

ENGINE-327 Improved Keyword startsWith Relevance

Keyword type dimensions now support richer relevance scoring by ranking exact matches higher than those that start with the queried text. To determine how the relevance changes, use the new scoring criterion in the query.
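The ranking idea can be sketched in Python as follows. The numeric scores, the function name rank_keywords and the sample values are hypothetical; the sketch only shows exact matches sorting ahead of startsWith matches and does not reflect the engine's scoring formula.

```python
def rank_keywords(values, query):
    """Return values matching the query, exact matches first."""

    def score(value):
        if value == query:
            return 2  # exact match (hypothetical score)
        if value.startswith(query):
            return 1  # startsWith match (hypothetical score)
        return 0

    matches = [v for v in values if score(v) > 0]
    # sorted() is stable, so equally scored values keep their original order.
    return sorted(matches, key=score, reverse=True)


values = ["Salem Heights", "Salem", "Portland", "Salemtown"]
print(rank_keywords(values, "Salem"))  # ['Salem', 'Salem Heights', 'Salemtown']
```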

ENGINE-328 Improved Stemming Relevance When Using “storeOriginalWord”

This version extends support for storeOriginalWord (introduced in version 2.8.5) to the stemming filters. With stemming enabled, the words “star, starred, starring, stars” are all reduced to the word “star”. When storeOriginalWord is disabled, then searching for “star” will not be able to match against the original word and therefore matches to any of the stemmed forms are equally relevant.

With storeOriginalWord enabled, if the query term is an exact match for the original term, then those items are ranked higher than matches against the stemmed forms.
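The effect on ranking can be modeled with a toy scorer. In this Python sketch the stem table, the score values and the function names are assumptions made for illustration; only the ordering behavior follows the description above.

```python
# Toy stem table for the example words (assumption).
STEMS = {"starred": "star", "starring": "star", "stars": "star"}


def stem(word):
    return STEMS.get(word, word)


def rank(indexed_words, query, store_original_word):
    """Return matching words, best match first."""

    def score(word):
        value = 1.0 if stem(word) == stem(query) else 0.0
        # With the original word stored, an exact match earns a boost.
        if value and store_original_word and word == query:
            value += 1.0
        return value

    hits = [w for w in indexed_words if score(w) > 0]
    return sorted(hits, key=score, reverse=True)  # stable for equal scores


words = ["starring", "stars", "star", "moon"]
print(rank(words, "stars", store_original_word=False))  # ['starring', 'stars', 'star']
print(rank(words, "stars", store_original_word=True))   # ['stars', 'starring', 'star']
```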

ENGINE-334 Improved Relevance with Accent Folding When Using “storeOriginalWord”

This version extends support for storeOriginalWord to the accent folder. With accent folding enabled, the words “the” and “thé” are both reduced to the word “the”. When storeOriginalWord is disabled, then searching for “thé” will not be able to match against the original word and therefore matches to any of the folded forms are equally relevant.

With storeOriginalWord enabled, if the query term is an exact match for the original term, then those items are ranked higher than matches against the folded forms.
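Accent folding itself can be approximated with Unicode normalization. The Python sketch below is an assumption about how such folding might work, not the engine's accent folder; the function names and score values are hypothetical.

```python
import unicodedata


def fold_accents(word):
    """Remove combining accent marks, e.g. "thé" becomes "the"."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def rank_folded(indexed_words, query, store_original_word):
    """Return words whose folded form matches the query, best match first."""

    def score(word):
        value = 1.0 if fold_accents(word) == fold_accents(query) else 0.0
        # With the original word stored, an exact match earns a boost.
        if value and store_original_word and word == query:
            value += 1.0
        return value

    hits = [w for w in indexed_words if score(w) > 0]
    return sorted(hits, key=score, reverse=True)  # stable for equal scores


print(fold_accents("thé"))                       # the
print(rank_folded(["the", "thé"], "thé", True))  # ['thé', 'the']
```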

ENGINE-338 Phrase Queries Now Support Double Metaphone Phonetic Analysis

Phrase Queries Now Support Double Metaphone Phonetic Analysis.

ENGINE-349 Admin Tool allows dimensions to be cleared

Admin Tool allows dimensions to be cleared (and all indices removed).

ENGINE-368 Multiple Indexing and Changeset Loading Performance Improvements

Multiple Indexing and Changeset Loading Performance Improvements.

ENGINE-382 Customers Can Now Use Word Sets to Define Stop Words

The stopWords feature for text dimensions now supports using Word Sets instead of a delimited string of stop words. Use the new attribute stopWords-ref to identify the stored Word Set to use for stop words.

Behavioral Changes

ENGINE-319 Posting Empty Changeset Returns Status Code “not-modified” Rather Than an Error

Posting Empty Changeset Returns Status Code “not-modified” Rather Than an Error.

Bug Fixes

ENGINE-1 Addresses incompatibility issues with the Admin Tool when used with Microsoft Internet Explorer versions

Addresses incompatibility issues with the Admin Tool when used with Microsoft Internet Explorer versions.

ENGINE-329 Shutdown message does not display in log

Shutdown message does not display in log.