One of the core features of the Discovery Search Engine is its ability to exploit the structure of your content, without which the engine would not understand what the content is. This structure is realized through the definition of each dimension by which the items in a dataset can be described. Defining a dimension consists of telling the engine both the dimension’s type and the location of the dimension’s value.
Dimensions are currently defined in an XML document that is pushed to the Discovery Search Engine using HTTP POST. We will discuss how to create a dimension definition document as well as how to push the file to the engine.
Importing a Dimension Definition to an Engine
Once the dimension definition has been created, it can be imported to the engine using the Admin Tool’s Dimensions Page. For more information on the Admin Tool, refer to Admin Tool Interface.
The engine also provides a web service to which a dimensions definition file can be posted. For more information on the engine web services, refer to Web Services.
Once the dimension definition has been imported, the Admin Tool reports indexing progress. The Indices Page of the Admin Tool shows how the changeset data is currently made available in the imported dimensions.
Exporting a Dimension Definition from an Engine
Once the dimension definition has been created, it can be exported to a file using the Admin Tool’s Dimensions Page.
The engine also provides a web service with which a dimensions definition file can be exported. For more information on the engine web services, refer to Web Services.
A dimension definition document has a dimensions element at its root, with zero or more dimension sub-elements. Each dimension element carries an id attribute, which references the name of a property defined for each item in the changeset XML document.
The id attribute value can be used in queries.
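As a minimal sketch of the structure described above, a dimension definition document might look like the following. The ids title and tags are hypothetical changeset property names chosen for illustration; the type values are taken from the examples elsewhere in this document.

```xml
<!-- Root container; it may hold zero or more dimension elements. -->
<dimensions>
  <!-- Each id refers to a (hypothetical) changeset property of the same name. -->
  <dimension id="title" type="text"/>
  <dimension id="tags" type="keyword"/>
</dimensions>
```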
The dimensions element serves as a container for all dimensions. It also sets default values for attributes that apply to all of its nested dimensions. For example, the default value for stemming is true for all text dimensions. To change the default value to false, declare the attribute on the dimensions element with that value.
For example, to set the distance unit for all queries to “kilometers”, set the
distanceUnit attribute on this element to “km”.
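The kilometers setting described above could be written as follows. This is only a sketch; the nested dimensions are left as a placeholder comment because the attribute is declared on the root element and no particular dimension is implied.

```xml
<!-- distanceUnit="km" applies to all queries against this set of dimensions. -->
<dimensions distanceUnit="km">
  <!-- dimension elements go here -->
</dimensions>
```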
This is the element that defines a dimension.
An id is required and must be unique within the set of all dimension elements. The id is often used as the dimension name. The search Criteria specification refers to a dimension by its id.
The value of id is case-sensitive, regardless of the ignoreCase attribute of the dimension element.
The dimension element may contain one or more nested element elements.
A type determines the type of dimension to use. Each type may have optional or required attributes. For information on these specific attributes, read the description of the dimension type elements below.
<dimensions stemming="false">
  <dimension id="freetext" type="text"/>
  <dimension id="document_body" type="text" stemming="true"/>
</dimensions>
key="<Changeset Data Property(s)>"
A key maps a changeset property (by name) to the index for the dimension. If key is not specified, then key is assumed to be the same value as id.
To combine more than one changeset property into a single dimension index, separate the changeset property names with commas.
The value of key is case-sensitive, regardless of the ignoreCase attribute of the dimension element.
The key attribute is also valid for a field element of a text type dimension.
<dimension id="tags" type="keyword" key="tags,user_tags"/>
The delimiters attribute specifies the delimiting characters that the indexer uses to split the values of the changeset properties named by the key attribute into individual values.
Only applicable for dimension types Keyword, Tree, Integer, Long, Double and Geoloc.
<dimension id="tags" type="keyword" delimiters=",-"/>
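To illustrate how key and delimiters might work together, the sketch below combines the two attributes on one dimension. The sample property value in the comment is hypothetical, and the resulting values are what the description above suggests rather than verified engine output.

```xml
<!-- Both changeset properties feed the same index. -->
<!-- A property value such as "red,green-blue" would be split on "," and "-" -->
<!-- into the individual values red, green and blue. -->
<dimension id="tags" type="keyword" key="tags,user_tags" delimiters=",-"/>
```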
maxQueryClauses="2048"
New in version 2.8.1.
When the engine optimizes startsWith or other complex queries against text dimensions, the memory requirements for the request may grow. Use this attribute to limit how much memory is available for optimizing queries. If the value is exceeded, the server returns HTTP status code 500 (Internal Server Error).
maxQueryClauses is used in combination with the dimension attribute minStartsWithLength. Generally, maxQueryClauses is triggered by startsWith queries with the fewest letters, since they can expand to hundreds of possible matches. A well-tuned index finds a balance between the lowest minStartsWithLength and the highest maxQueryClauses.
To disable any limits, set maxQueryClauses to 0.
Warning: Removing any limits to maxQueryClauses could make predicting server-side memory requirements difficult because the engine will allocate whatever memory is required to satisfy the query criterion.
<dimensions maxQueryClauses="2048">
  <dimension id="freetext" type="text"/>
</dimensions>
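Since maxQueryClauses is tuned together with minStartsWithLength, a combined configuration might look like the sketch below. The value 3 is purely illustrative, and it is an assumption here that minStartsWithLength takes a character count; consult the text dimension description for its actual format.

```xml
<dimensions maxQueryClauses="2048">
  <!-- Assumption: minStartsWithLength is a minimum number of characters. -->
  <!-- Longer prefixes expand to fewer matches, so fewer clauses are needed. -->
  <dimension id="freetext" type="text" minStartsWithLength="3"/>
</dimensions>
```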
ignoreCase="true"|"false"
New in version 2.7.0.
When this attribute is true, all searches against dimension values are case-insensitive. This is done by converting all indexed values to lower case using the Locale specified in the locale attribute.
The attributes of the dimensions element apply to all defined dimensions unless overridden for a specific dimension. In that case, the attributes of the dimension element are used instead.
<dimensions ignoreCase="true">
  <dimension id="freetext" type="text"/>
</dimensions>
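The override rule can be sketched as follows: the root element sets the default, and one dimension declares its own value. The sku dimension is a hypothetical example.

```xml
<dimensions ignoreCase="true">
  <!-- Inherits ignoreCase="true" from the dimensions element. -->
  <dimension id="freetext" type="text"/>
  <!-- Overrides the default; searches on this dimension stay case-sensitive. -->
  <dimension id="sku" type="keyword" ignoreCase="false"/>
</dimensions>
```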
locale="<locale spec>"
The engine supports non-US English collation and sorting. For US English users, no changes need to be made. For users with specific language requirements the engine supports specifying a locale for string comparisons and sorting.
<dimensions locale="Language">
<dimensions locale="Language_Country">
<dimensions locale="Language_Country_Variant">
For more information on how to specify a Locale, refer to Locale Specs.
<dimensions ignoreCase="true" locale="en"> </dimensions>
<dimensions ignoreCase="true" locale="pt_BR"> </dimensions>
<dimensions ignoreCase="true" locale="en_US_POSIX"> </dimensions>
reindexMethod="warmSwap"|"createDrop"
New in version 2.8.3.
This attribute configures how the engine recreates a dimension that has been changed. The default procedure is to drop the dimension first, then recreate it. This means that the dimension is available while it is being populated, even if it is not yet complete. Using warmSwap creates the new index offline. When that index is complete, it replaces the old dimension. This means that there must be enough memory to complete the indexing process while the old index remains in place.
warmSwap provides the best data availability.
<dimensions ignoreCase="true">
  <dimension id="freetext" type="text" reindexMethod="warmSwap"/>
</dimensions>
This is the element that defines a value of an integer, double, time, tree, mutex or ordered dimension type.
<element id="344" name="Comedy"/>
The element element specifies a facet of the dimension and/or provides mapping information from a coded id to a self-documenting description.
The following dimension types may not specify elements: Geoloc, Keyword and Text.
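A dimension with nested element elements might be declared as in the sketch below. The genre dimension and the Drama entry are hypothetical, and the literal spelling type="mutex" is an assumption based on the type names listed above.

```xml
<dimension id="genre" type="mutex">
  <!-- Each element maps a coded id to a readable name. -->
  <element id="344" name="Comedy"/>
  <element id="345" name="Drama"/>
</dimension>
```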
This is the element that defines a field of a text type dimension.
<field id="title" />
The field element specifies a field to create in a text dimension.
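Putting this together with the key attribute, a text dimension with fields might be sketched as follows. Nesting field inside dimension is assumed by analogy with the element element, and the body field and the changeset property name content are hypothetical.

```xml
<dimension id="freetext" type="text">
  <!-- No key given; assumed to default to the field id, as it does for dimensions. -->
  <field id="title"/>
  <!-- key maps a differently named (hypothetical) changeset property to this field. -->
  <field id="body" key="content"/>
</dimension>
```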