Defining Synonyms/Thesauruses

Synonym dictionaries are created as part of the dimensions XML document.

New in version 2.9.

Synonym dictionaries are defined as thesaurus elements within the dimensions.xml file and are then referred to from dimension definitions by reference. The entries in a thesaurus definition are analyzed using the analyzer rules of the dimension to which they apply, so punctuation, whitespace and other extra characters are automatically removed, or words are broken up at punctuation marks.

Thesauruses and dimensions can appear in any order in the dimensions.xml file.

This example creates a simple synonym dictionary for sample US states:

<thesaurus id="us_states">
  <lookup value="OR">oregon or ore</lookup>
  <lookup value="NY">"new york" ny nys</lookup>
  <lookup value="NC">"north carolina" nc</lookup>
</thesaurus>

The value attribute of each lookup acts as its id: it is the key that the engine uses for synonym reduction. It can be a single word or a phrase. There should be one lookup for each list of synonyms. The synonyms are listed as a set of words separated by whitespace. If a synonym is a phrase, enclose its words in double quotes.

To use a synonym dictionary in a text type dimension, refer to the thesaurus using the appropriate reference attribute, for example synonyms-ref.
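As a minimal sketch, a dimension could pick up the us_states dictionary defined above as follows. The dimension id state is hypothetical, and any other attributes the dimension needs (such as its type) are omitted here.

```xml
<!-- Hypothetical dimension id; attributes other than synonyms-ref are omitted -->
<dimension id="state" synonyms-ref="us_states"/>
```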

Creating Merged Synonym Dictionaries

Synonym dictionaries can be created in manageable blocks and merged with other synonym dictionaries before they are used on a dimension. There are several variants that you can use to refer to an individual synonym dictionary or merged synonym dictionaries.

When synonym dictionaries are merged, two lookups with the same id (value attribute) are combined into a single lookup containing the merged, unique list of synonyms.
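To illustrate the merge rule, here is a sketch with two dictionaries that both define a lookup for OR. The thesaurus ids statesShort and statesLong are made up for this example, and the merged lookup shown in the comment is an assumed result based on the rule above, not output captured from the engine. The order of the merged synonyms is not specified here.

```xml
<thesaurus id="statesShort">
  <lookup value="OR">oregon ore</lookup>
</thesaurus>
<thesaurus id="statesLong">
  <lookup value="OR">oregon "state of oregon"</lookup>
</thesaurus>

<!-- Assumed effective lookup once both dictionaries are merged:
     a single OR entry holding the unique synonyms from each source. -->
<!-- <lookup value="OR">oregon ore "state of oregon"</lookup> -->
```

The word oregon appears in both source lookups but only once in the merged list.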

Variant 1: Synonym Dictionary Reference with Dimension

This variant refers to a single synonym dictionary by its id on the dimension declaration. It is also the easiest way to reuse a synonym dictionary.

<dimension id="example" synonyms-ref="mySynonyms"/>
<thesaurus id="mySynonyms">
  <lookup value="st">Saint Sante, Ste, Santa</lookup>
</thesaurus>

Variant 2: Multiple Synonym Dictionary References with Dimension

This variant allows multiple pre-defined synonym dictionaries to be merged by listing the necessary thesaurus ids, separated by commas, on the dimension declaration.

<dimension id="example" synonyms-ref="mySynonyms,states"/>
<thesaurus id="mySynonyms">
  <lookup value="st">Saint Sante, Ste, Santa</lookup>
</thesaurus>
<thesaurus id="states">
  <lookup value="OR">Oregon Ore</lookup>
  <lookup value="ME">Maine</lookup>
</thesaurus>

Variant 3: Synonym Dictionaries Referring to Thesauruses with Dimension

This variant allows you to create a new synonym dictionary by merging the words from one or more existing synonym dictionaries, which are listed in its thesaurus-ref attribute. It also allows you to add further lookups of your own to the merged synonym dictionary.

<dimension id="example" synonyms-ref="combinedSynonyms"/>
<thesaurus id="combinedSynonyms" thesaurus-ref="mySynonyms,states">
  <lookup value="FL">Florida Fla</lookup>
  <lookup value="AK">Alaska</lookup>
</thesaurus>
<thesaurus id="mySynonyms">
  <lookup value="st">Saint Sante, Ste, Santa</lookup>
</thesaurus>
<thesaurus id="states">
  <lookup value="OR">Oregon Ore</lookup>
  <lookup value="ME">Maine</lookup>
</thesaurus>
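
Under the merge rule described earlier, the dimension in this example should behave as if it referred to a single dictionary like the one sketched below. The id combinedSynonymsExpanded is hypothetical and used only for illustration, and the order of the lookups shown is arbitrary.

```xml
<!-- Assumed equivalent of combinedSynonyms after merging mySynonyms and states -->
<thesaurus id="combinedSynonymsExpanded">
  <lookup value="FL">Florida Fla</lookup>
  <lookup value="AK">Alaska</lookup>
  <lookup value="st">Saint Sante, Ste, Santa</lookup>
  <lookup value="OR">Oregon Ore</lookup>
  <lookup value="ME">Maine</lookup>
</thesaurus>
```

Keeping the source dictionaries separate, rather than writing the expanded form by hand, means that mySynonyms and states can still be reused on other dimensions.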