Tutorial¶
Overview¶
In this tutorial we’ll walk you through setting up the engine, pushing some sample data to it and performing queries against it.
Installation¶
To keep this tutorial simple we’ll describe the quickest way to install the engine. Note that this diverges from the recommended practice, which keeps your data separate from the binary distribution. For detailed installation instructions please see Engine and Data Tool Installation for UNIX Variants.
- Unzip the distribution
- Get a command prompt inside the unzipped directory
- Run bin/server.sh to start the engine
If you are not running on a UNIX variant then you can invoke Java directly. See the contents of the server.sh script for details.
The output will look like this:
You can verify that the engine is running by pointing your web browser at http://localhost:8090 which will display the following page.
Import Data¶
We’ll use a small example dataset for this tutorial:
id | price | weight | shape | tags |
---|---|---|---|---|
Square | 14 | 11.2 | square | A, B, C |
LightCircle | 13 | 5.3 | circle | A |
HeavyTriangle | 12 | 22.4 | triangle | B, C |
CheapTriangle | 6 | 15.5 | triangle | A, B |
CheapLightCircle | 9 | 8.4 | circle | A, C |
CheapHeavyCircle | 3 | 22.3 | circle | C |
ExpensiveSquare | 24 | 15.4 | square | B |
ExpensiveLightTriangle | 29 | 2.1 | triangle | A, C |
ExpensiveHeavySquare | 21 | 27.8 | square | A, B, C |
Please download the file tutorial_changeset.xml.
Upload this changeset XML file to the engine from the Changesets tab in the web interface.
Alternatively you can POST changeset XML from the command line.
$ curl -qs --header 'Content-Type: text/xml' --data-binary \
@tutorial_changeset.xml http://localhost:8090/ws/changeset
Once uploaded you can see an entry on the Changesets page.
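The same upload can be scripted. The following is a minimal Python sketch using only the standard library. It assumes the engine is listening on http://localhost:8090 as in this tutorial; the helper names `build_changeset_request` and `post_changeset` are hypothetical and are not part of the engine.

```python
import urllib.request

ENGINE_URL = "http://localhost:8090"  # assumption: the address used in this tutorial


def build_changeset_request(xml_bytes, base_url=ENGINE_URL):
    """Build (but do not send) a POST request equivalent to the curl command."""
    return urllib.request.Request(
        url=base_url + "/ws/changeset",
        data=xml_bytes,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


def post_changeset(path, base_url=ENGINE_URL):
    """Read a changeset file, POST it to the engine and return the HTTP status."""
    with open(path, "rb") as f:
        request = build_changeset_request(f.read(), base_url)
    with urllib.request.urlopen(request) as response:
        return response.status
```

With the engine running, `post_changeset("tutorial_changeset.xml")` would send the downloaded file. Building the request separately from sending it makes it possible to inspect the URL and headers without a running engine.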
Define indexes¶
We’ll define the following dimensions.
- price: An integer dimension. If something is less than 10 it is cheap, greater than 20 it is expensive, otherwise it is moderate.
- weight: A double dimension. If something is less than 10 it is light, greater than 20 it is heavy, otherwise it is normal.
- shape: A tree dimension.
- tags: A keyword dimension.
Please download the file tutorial_dimensions.xml.
Import this dimensions XML file to the engine from the Dimensions tab in the web interface.
Alternatively you can POST dimensions XML from the command line.
$ curl -qs --header 'Content-Type: text/xml' --data-binary \
@tutorial_dimensions.xml http://localhost:8090/ws/dimensions
Once uploaded you can see the definition on the Dimensions page.
Performing Queries¶
The engine has a built-in web interface that you can use to test queries. To access it go to the Applications tab and select the JSON application.
The JSON application page looks like this:
You’ll see that the page is divided into four sections. The top left allows you to use different web services; we’ll ignore this for now. The top right tells you the current state of the page. The bottom left contains an editable text area where you can enter your query and the bottom right contains the results of the last executed query.
When the page first loads, the Request section contains a default query template and the Response section shows the engine’s response to that query.
Congratulations, you’ve run your first query! Let’s try asking for something useful now.
Simple Searches¶
price is similar to 20¶
To search for a price similar to 20 enter the following JSON into the Request section.
{
"criteria": [
{
"dimension": "price",
"value": 20
}
],
"startIndex": 0,
"pageSize": 10
}
Once you’ve entered the JSON, tab out of the edit control or click outside it to perform the search. The Response section will show the actual engine response, and a new Results section will appear above the Request and Response sections with the data shown in a simple grid.
The engine response looks like this:
{
"availableSize": 9,
"currentPageSize": 9,
"datasetSize": 9,
"exactMatches": [false, false, false, false, false, false, false, false, false],
"exactSize": 0,
"itemIds": ["ExpensiveHeavySquare", "ExpensiveSquare", "Square", "LightCircle",
"HeavyTriangle", "ExpensiveLightTriangle", "CheapLightCircle",
"CheapTriangle", "CheapHeavyCircle"],
"relevanceValues": [0.9999999995343387, 0.9999999981373549, 0.9999999972060323,
0.999999996740371, 0.9999999962747097, 0.9999999958090484,
0.9999999948777258, 0.999999993480742, 0.9999999920837581],
"startIndex": 0,
"totalSize": 9
}
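To read a response like this programmatically, the arrays can be combined into one row per item. The Python sketch below assumes that `itemIds`, `relevanceValues` and `exactMatches` are index-aligned, which the response layout suggests but this tutorial does not state explicitly. The helper name `ranked_items` is hypothetical.

```python
import json


def ranked_items(response):
    """Return (item id, relevance, exact match) tuples in response order.

    Assumption: the three arrays are parallel, so position i in each
    array describes the same item.
    """
    return list(zip(response["itemIds"],
                    response["relevanceValues"],
                    response["exactMatches"]))


# Abbreviated copy of the response above (first three entries only).
sample = json.loads("""
{
  "exactMatches": [false, false, false],
  "itemIds": ["ExpensiveHeavySquare", "ExpensiveSquare", "Square"],
  "relevanceValues": [0.9999999995343387, 0.9999999981373549, 0.9999999972060323]
}
""")

for item_id, relevance, exact in ranked_items(sample):
    print(f"{item_id:22} {relevance:.10f} exact={exact}")
```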
price is less than or equal to 9, cheapest first¶
Request:
{
"criteria": [
{
"dimension": "price",
"value": 0,
"max": 9
}
],
"startIndex": 0,
"pageSize": 10
}
Results:
Faceted Search¶
Facet search (drilldown counts) tells you how many items are available for each well-defined choice under a dimension.
If you don’t pass in any criteria, the counts are calculated over the whole indexed dataset; otherwise they are calculated over the result of the query.
facet search on price¶
Request:
{
"drillDown": [
{
"dimension": "price"
}
],
"startIndex": 0,
"pageSize": 10
}
Response:
{
"datasetSize": 9,
"drillDown": [
{
"dimension": "price",
"exactCounts": [3, 3, 3],
"fuzzyCounts": [3, 3, 3],
"ids": ["cheap", "moderate", "expensive"]
}
]
}
You’ll see that there are 3 items in each of the cheap, moderate and expensive categories.
The fuzzyCounts are calculated over the entire result set, whereas the exactCounts are calculated only over the part of the result set that matches your query exactly.
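The counts can be checked by hand against the example dataset. The Python sketch below applies the price thresholds from Define indexes to the nine prices in the table. It only reproduces the arithmetic and is not how the engine computes facets; the function name `price_bucket` is hypothetical.

```python
from collections import Counter

# Prices copied from the example dataset table.
PRICES = {
    "Square": 14, "LightCircle": 13, "HeavyTriangle": 12,
    "CheapTriangle": 6, "CheapLightCircle": 9, "CheapHeavyCircle": 3,
    "ExpensiveSquare": 24, "ExpensiveLightTriangle": 29,
    "ExpensiveHeavySquare": 21,
}


def price_bucket(price):
    """Less than 10 is cheap, greater than 20 is expensive, otherwise moderate."""
    if price < 10:
        return "cheap"
    if price > 20:
        return "expensive"
    return "moderate"


counts = Counter(price_bucket(p) for p in PRICES.values())
print([counts[b] for b in ("cheap", "moderate", "expensive")])  # [3, 3, 3]
```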
price is less than or equal to 9, cheapest first, drill down on tags¶
Request:
{
"criteria": [
{
"dimension": "price",
"value": 0,
"max": 9
}
],
"drillDown": [
{
"dimension": "tags"
}
],
"startIndex": 0,
"pageSize": 10
}
Response:
{
"availableSize": 3,
"currentPageSize": 3,
"datasetSize": 9,
"drillDown": [
{
"dimension": "tags",
"exactCounts": [0, 0, 0],
"fuzzyCounts": [2, 1, 2],
"ids": ["A", "B", "C"]
}
],
"exactMatches": [false, false, false],
"exactSize": 0,
"itemIds": ["CheapHeavyCircle", "CheapTriangle", "CheapLightCircle"],
"relevanceValues": [0.9999999986030161, 0.9999999972060323, 0.9999999958090484],
"startIndex": 0,
"totalSize": 3
}
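The fuzzyCounts in this response can be reproduced from the tags of the three returned items. This is a small Python sketch of that tally, with the tags copied from the dataset table; the helper name `tag_counts` is hypothetical.

```python
from collections import Counter

# Tags of the three items returned by the query, from the dataset table.
RESULT_TAGS = {
    "CheapHeavyCircle": ["C"],
    "CheapTriangle": ["A", "B"],
    "CheapLightCircle": ["A", "C"],
}


def tag_counts(items, tag_ids=("A", "B", "C")):
    """Count how many items carry each tag, in the order given by tag_ids."""
    counts = Counter(tag for tags in items.values() for tag in tags)
    return [counts[tag] for tag in tag_ids]


print(tag_counts(RESULT_TAGS))  # [2, 1, 2]
```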
Administration¶
Web-based Administration¶
The web-based administration application is quite powerful. It can be used to upload dimensions and changesets. It can also be used to temporarily turn on or turn off pull-based dimensions and changeset feeds.
Status page¶
Verify the engine release version, the memory footprint and any activities currently running in the background.
Changesets¶
Examine previously applied changesets or upload changeset files. Gzip encoded files are also supported.
Dimensions¶
See what dimensions are defined or upload dimensions files.
Indices¶
See the distribution of the data across your dimensions. This can be used to verify quickly that the changesets and dimensions you defined result in expected categorization of your data.
Admin¶
This page is used to configure feed sources (where the engine pulls the data over HTTP) for dimensions and changeset XML data.
The checkpoint interval is how often any previously applied changesets are squashed into a single checkpoint XML file. This is used to quickly reset the engine’s state when it is restarted.
Note that any configuration changes made there take effect immediately but are not stored in the discovery.properties file. All configuration changes made through the web application are lost when the engine is shut down. The only way of making persistent configuration changes is to edit the discovery.properties file in the DISCOVERY_DIR.
Example files¶
changeset¶
This is a snippet of the example changeset. To run through this tutorial fully you should download the full file as described in Import Data.
<changeset>
<set-item id="Square">
<properties>
<struct>
<entry name="price"> <int>14</int> </entry>
<entry name="weight"> <real>11.2</real> </entry>
<entry name="shape"> <string>square</string> </entry>
<entry name="tags">
<array>
<element> <string>A</string> </element>
<element> <string>B</string> </element>
<element> <string>C</string> </element>
</array>
</entry>
</struct>
</properties>
</set-item>
<!-- More set-items to add, update or remove items -->
</changeset>
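If your data lives elsewhere, a changeset in this shape can be generated rather than written by hand. The Python sketch below builds the `set-item` shown above with the standard library. The element and attribute names are taken from the snippet; the helper names `add_entry` and `set_item` are hypothetical, and the sketch covers only the four properties used in this tutorial.

```python
import xml.etree.ElementTree as ET


def add_entry(struct, name, type_tag, value):
    """Append <entry name=...><type_tag>value</type_tag></entry> to struct."""
    entry = ET.SubElement(struct, "entry", name=name)
    ET.SubElement(entry, type_tag).text = str(value)


def set_item(changeset, item_id, price, weight, shape, tags):
    """Append a set-item element mirroring the snippet above."""
    item = ET.SubElement(changeset, "set-item", id=item_id)
    struct = ET.SubElement(ET.SubElement(item, "properties"), "struct")
    add_entry(struct, "price", "int", price)
    add_entry(struct, "weight", "real", weight)
    add_entry(struct, "shape", "string", shape)
    tags_entry = ET.SubElement(struct, "entry", name="tags")
    array = ET.SubElement(tags_entry, "array")
    for tag in tags:
        ET.SubElement(ET.SubElement(array, "element"), "string").text = tag
    return item


changeset = ET.Element("changeset")
set_item(changeset, "Square", 14, 11.2, "square", ["A", "B", "C"])
xml_text = ET.tostring(changeset, encoding="unicode")
print(xml_text)
```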
dimensions¶
This is a snippet of the example dimensions. To run through this tutorial fully you should download the full file as described in Define indexes.
<dimensions>
<dimension id='price' type='integer'>
<element id='cheap' value='(,10)'/>
<element id='moderate' value='[10,20]'/>
<element id='expensive' value='(20,)'/>
</dimension>
<dimension id='weight' type='double'>
<element id='light' value='(,10)'/>
<element id='normal' value='[10,20]'/>
<element id='heavy' value='(20,)'/>
</dimension>
<dimension id='shape' type='tree'>
<element id='square'/>
<element id='circle'/>
<element id='triangle'/>
</dimension>
<dimension id='tags' type='keyword'/>
</dimensions>
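The `value` attributes on the price and weight elements read like standard mathematical interval notation, where a parenthesis excludes the bound, a square bracket includes it, and a missing bound means unbounded. That reading is an assumption, although it agrees with the thresholds described in Define indexes. The Python sketch below illustrates it; `parse_interval` and `contains` are hypothetical helpers, not the engine's parser.

```python
def parse_interval(text):
    """Parse '(,10)', '[10,20]' or '(20,)' into its bounds.

    Returns (low, low_inclusive, high, high_inclusive); a bound of None
    means the interval is unbounded on that side.
    """
    low_inclusive = text[0] == "["
    high_inclusive = text[-1] == "]"
    low_text, high_text = text[1:-1].split(",")
    low = float(low_text) if low_text.strip() else None
    high = float(high_text) if high_text.strip() else None
    return low, low_inclusive, high, high_inclusive


def contains(interval, value):
    """Return True if value lies inside the interval string."""
    low, low_inc, high, high_inc = parse_interval(interval)
    if low is not None and (value < low or (value == low and not low_inc)):
        return False
    if high is not None and (value > high or (value == high and not high_inc)):
        return False
    return True


# A price of 10 falls outside 'cheap' but inside 'moderate'.
print(contains("(,10)", 10), contains("[10,20]", 10))  # False True
```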