Tutorial

Overview

In this tutorial we’ll walk you through setting up the engine, pushing some sample data to it and performing queries against it.

Installation

To keep this tutorial simple we’ll describe the quickest way to install the engine. Note that this diverges from the recommended practice, which keeps your data separate from the binary distribution. For detailed installation instructions please see Engine and Data Tool Installation for UNIX Variants.

  1. Unzip the distribution
  2. Get a command prompt inside the unzipped directory
  3. Run bin/server.sh to start the engine

If you are not running on a UNIX variant then you can invoke Java directly. See the contents of the server.sh script for details.

The output will look like this:

../_images/TutorialInstallScreenshot.png

You can verify that the engine is running by pointing your web browser at http://localhost:8090 which will display the following page.

../_images/TutorialStatusPage.png
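
The same check can be scripted instead of done in a browser. The following is a minimal sketch, assuming the status page answers a plain GET with HTTP 200 when the engine is up; the function name and the `opener` parameter are illustrative and not part of the engine.

```python
import urllib.error
import urllib.request

# URL of the status page given in the text above.
STATUS_URL = "http://localhost:8090"


def engine_is_up(url=STATUS_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if a GET on the status page succeeds with HTTP 200.

    Assumption: a running engine answers 200 here. `opener` exists only so
    the function can be exercised without a live engine.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False
```

Calling `engine_is_up()` with no arguments checks the default address used in this tutorial.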

Import Data

We’ll use a small example dataset for this tutorial:

id                      price  weight  shape     tags
Square                  14     11.2    square    A, B, C
LightCircle             13     5.3     circle    A
HeavyTriangle           12     22.4    triangle  B, C
CheapTriangle           6      15.5    triangle  A, B
CheapLightCircle        9      8.4     circle    A, C
CheapHeavyCircle        3      22.3    circle    C
ExpensiveSquare         24     15.4    square    B
ExpensiveLightTriangle  29     2.1     triangle  A, C
ExpensiveHeavySquare    21     27.8    square    A, B, C

Please download the file tutorial_changeset.xml.

Upload this changeset XML file to the engine from the Changesets tab in the web interface.

../_images/TutorialUploadChangeset.png

Alternatively you can POST changeset XML from the command line.

$ curl -qs --header 'Content-Type: text/xml' --data-binary \
    @tutorial_changeset.xml http://localhost:8090/ws/changeset
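
If curl is not available, the same POST can be sketched with the Python standard library. The URL and Content-Type header come from the curl command above; the helper names are illustrative, and the shape of the engine's reply is not documented here, so the sketch simply returns the status code and the raw body.

```python
import urllib.request

# Endpoint taken from the curl command above.
CHANGESET_URL = "http://localhost:8090/ws/changeset"


def build_xml_post(url, xml_path):
    """Build (but do not send) a POST request carrying the file's raw bytes."""
    with open(xml_path, "rb") as handle:
        body = handle.read()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


def post_xml(url, xml_path, timeout=30.0):
    """Send the request and return (HTTP status code, response body as text)."""
    request = build_xml_post(url, xml_path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")
```

With the engine running, `post_xml(CHANGESET_URL, "tutorial_changeset.xml")` corresponds to the curl invocation.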

Once uploaded you can see an entry on the Changesets page.

../_images/TutorialChangesetUploaded.png

Define indexes

We’ll define the following dimensions.

  • price An integer dimension. If something is less than 10 it is cheap, greater than 20 it is expensive, otherwise it is moderate.
  • weight A double dimension. If something is less than 10 it is light, greater than 20 it is heavy, otherwise it is normal.
  • shape A tree dimension.
  • tags A keyword dimension.

Please download the file tutorial_dimensions.xml.

Import this dimensions XML file to the engine from the Dimensions tab in the web interface.

../_images/TutorialUploadDimensions.png

Alternatively you can POST dimensions XML from the command line.

$ curl -qs --header 'Content-Type: text/xml' --data-binary \
    @tutorial_dimensions.xml http://localhost:8090/ws/dimensions

Once uploaded you can see the definition on the Dimensions page.

../_images/TutorialDimensionsUploaded.png

Performing Queries

The engine has a built-in web interface that you can use to test queries. To access it go to the Applications tab and select the JSON application.

../_images/TutorialApplications.png

The JSON application page looks like this:

../_images/TutorialJsonDefault.png

You’ll see that the page is divided into four sections. The top left allows you to use different web services; we’ll ignore this for now. The top right tells you the current state of the page. The bottom left contains an editable text area where you can enter your query and the bottom right contains the results of the last executed query.

When the page first loads, the Request section contains a default query template and the Response section contains the engine’s response to that query.

Congratulations, you’ve run your first query! Let’s try asking for something useful now.

Simple Searches

price is similar to 20

To search for a price similar to 20 enter the following JSON into the Request section.

{
  "criteria": [
    {
      "dimension": "price",
      "value": 20
    }
  ],
  "startIndex": 0,
  "pageSize": 10
}

Once you’ve entered the JSON, tab out of the edit control or click outside it to perform the search. The Response section will show the actual engine response, and a new Results section will appear above the Request and Response sections with the data shown in a simple grid.

../_images/TutorialPriceSimilar20Results.png

The engine response looks like this:

{
  "currentPageSize": 9,
  "datasetSize": 9,
  "exactMatches": [false, false, false, false, false, false, false, false, false],
  "exactSize": 0,
  "itemIds": ["ExpensiveHeavySquare", "ExpensiveSquare", "Square", "LightCircle",
              "HeavyTriangle", "ExpensiveLightTriangle", "CheapLightCircle",
              "CheapTriangle", "CheapHeavyCircle"],
  "relevanceValues": [0.9999999995343387, 0.9999999981373549, 0.9999999972060323,
                      0.999999996740371, 0.9999999962747097, 0.9999999958090484,
                      0.9999999948777258, 0.999999993480742, 0.9999999920837581],
  "startIndex": 0,
  "totalSize": 9
}
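
For this dataset, the order of itemIds matches a sort by absolute distance between each item's price and the requested value of 20. The sketch below checks that observation against the example data. It is an illustration only: it does not reproduce the engine's relevance values, and the function name is hypothetical.

```python
# id -> price, copied from the example dataset in Import Data.
PRICES = {
    "Square": 14, "LightCircle": 13, "HeavyTriangle": 12,
    "CheapTriangle": 6, "CheapLightCircle": 9, "CheapHeavyCircle": 3,
    "ExpensiveSquare": 24, "ExpensiveLightTriangle": 29,
    "ExpensiveHeavySquare": 21,
}

# itemIds exactly as returned in the response above.
RESPONSE_ITEM_IDS = [
    "ExpensiveHeavySquare", "ExpensiveSquare", "Square", "LightCircle",
    "HeavyTriangle", "ExpensiveLightTriangle", "CheapLightCircle",
    "CheapTriangle", "CheapHeavyCircle",
]


def rank_by_distance(prices, target):
    """Order item ids by the absolute distance of their price from target."""
    return sorted(prices, key=lambda item_id: abs(prices[item_id] - target))


print(rank_by_distance(PRICES, 20) == RESPONSE_ITEM_IDS)  # prints True
```

No item costs exactly 20, which is consistent with exactSize being 0 and every exactMatches entry being false.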

price is less than or equal to 9, cheapest first

Request:

{
  "criteria": [
    {
      "dimension": "price",
      "value": 0,
      "max": 9
    }
  ],
  "startIndex": 0,
  "pageSize": 10
}

Results:

../_images/TutorialPriceLessThan10CheapestFirst.png
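
The two requests in this section differ only in the criterion, so they can be generated by a small helper. This is a sketch that uses only the keys shown in the requests above (dimension, value, max, startIndex, pageSize); the function name is illustrative, and no other criterion keys are assumed to exist.

```python
import json


def single_criterion_query(dimension, value, max_value=None,
                           start_index=0, page_size=10):
    """Build a query with one criterion, mirroring the requests above."""
    criterion = {"dimension": dimension, "value": value}
    if max_value is not None:
        # "max" is the upper bound used in the second request.
        criterion["max"] = max_value
    return {
        "criteria": [criterion],
        "startIndex": start_index,
        "pageSize": page_size,
    }


# price is similar to 20
print(json.dumps(single_criterion_query("price", 20), indent=2))
# price is less than or equal to 9, cheapest first
print(json.dumps(single_criterion_query("price", 0, max_value=9), indent=2))
```

The printed JSON can be pasted into the Request section of the JSON application.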

Administration

Web-based Administration

The web-based administration application can be used to upload dimensions and changesets, and to temporarily turn pull-based dimensions and changeset feeds on or off.

Status page

Check the engine release version, its memory footprint, and any activities currently running in the background.

Changesets

Examine previously applied changesets or upload changeset files. Gzip encoded files are also supported.
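
A large changeset file can be compressed before it is uploaded on this page. The sketch below writes a gzip copy with the Python standard library. The `.gz` suffix is a convention assumed here; the text does not say whether the engine requires a particular file name.

```python
import gzip
import shutil


def gzip_file(source_path, target_path=None):
    """Write a gzip-compressed copy of source_path and return its path."""
    # Assumption: a ".gz" suffix is acceptable; the engine's naming rules
    # for compressed uploads are not documented in this tutorial.
    target_path = target_path or source_path + ".gz"
    with open(source_path, "rb") as src, gzip.open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target_path
```

For example, `gzip_file("tutorial_changeset.xml")` produces `tutorial_changeset.xml.gz` next to the original.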

Dimensions

See what dimensions are defined or upload dimensions files.

Indices

See the distribution of the data across your dimensions. This can be used to verify quickly that the changesets and dimensions you defined result in expected categorization of your data.
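
To know what distribution to expect for the tutorial data, the banding rules from Define indexes can be applied by hand. The sketch below tallies the example dataset into price and weight bands. It only illustrates the rules as described there; how the Indices page actually presents the counts is not covered by this tutorial.

```python
from collections import Counter

# (id, price, weight) rows from the example dataset in Import Data.
ITEMS = [
    ("Square", 14, 11.2), ("LightCircle", 13, 5.3),
    ("HeavyTriangle", 12, 22.4), ("CheapTriangle", 6, 15.5),
    ("CheapLightCircle", 9, 8.4), ("CheapHeavyCircle", 3, 22.3),
    ("ExpensiveSquare", 24, 15.4), ("ExpensiveLightTriangle", 29, 2.1),
    ("ExpensiveHeavySquare", 21, 27.8),
]


def band(value, low_name, middle_name, high_name):
    """Below 10 is the low band, above 20 the high band, else the middle."""
    if value < 10:
        return low_name
    if value > 20:
        return high_name
    return middle_name


price_counts = Counter(
    band(price, "cheap", "moderate", "expensive") for _, price, _ in ITEMS
)
weight_counts = Counter(
    band(weight, "light", "normal", "heavy") for _, _, weight in ITEMS
)
print(dict(price_counts))
print(dict(weight_counts))
```

For this dataset every price band and every weight band contains three items.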

Admin

This page is used to configure feed sources (the locations from which the engine pulls data over HTTP) for dimensions and changeset XML data.

The checkpoint interval is how often any previously applied changesets are squashed into a single checkpoint XML file. The checkpoint is used to quickly restore the engine’s state when it is restarted.

Note that configuration changes made here take effect immediately but are not stored in the discovery.properties file, so they are lost when the engine is shut down. The only way to make persistent configuration changes is to edit the discovery.properties file in the DISCOVERY_DIR.

Example files

changeset

This is a snippet of the example changeset. To run through this tutorial fully you should download the full file as described in Import Data.

<changeset>
  <set-item id="Square">
    <properties>
      <struct>
        <entry name="price"> <int>14</int> </entry>
        <entry name="weight"> <real>11.2</real> </entry>
        <entry name="shape"> <string>square</string> </entry>
        <entry name="tags">
          <array>
            <element> <string>A</string> </element>
            <element> <string>B</string> </element>
            <element> <string>C</string> </element>
          </array>
        </entry>
      </struct>
    </properties>
  </set-item>
  <!-- More set-items to add, update or remove items -->
</changeset>
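
A changeset in this shape can also be generated from data rather than written by hand. The sketch below builds a set-item with xml.etree.ElementTree, using only the element names visible in the snippet above (int, real, string, array, element). The mapping from Python types to those elements is an assumption made for illustration, and other value types the engine may support are not covered.

```python
import xml.etree.ElementTree as ET


def value_element(value):
    """Map a Python value to one of the typed elements used in the snippet."""
    if isinstance(value, (list, tuple)):
        array = ET.Element("array")
        for member in value:
            ET.SubElement(array, "element").append(value_element(member))
        return array
    if isinstance(value, bool):
        # bool is a subclass of int; the snippet shows no boolean element.
        raise TypeError("boolean properties are not covered by this sketch")
    if isinstance(value, int):
        tag = "int"
    elif isinstance(value, float):
        tag = "real"
    elif isinstance(value, str):
        tag = "string"
    else:
        raise TypeError(f"unsupported property type: {type(value).__name__}")
    node = ET.Element(tag)
    node.text = str(value)
    return node


def set_item(item_id, properties):
    """Build a <set-item> element with one <entry> per property."""
    item = ET.Element("set-item", {"id": item_id})
    struct = ET.SubElement(ET.SubElement(item, "properties"), "struct")
    for name, value in properties.items():
        entry = ET.SubElement(struct, "entry", {"name": name})
        entry.append(value_element(value))
    return item


changeset = ET.Element("changeset")
changeset.append(set_item(
    "LightCircle",
    {"price": 13, "weight": 5.3, "shape": "circle", "tags": ["A"]},
))
print(ET.tostring(changeset, encoding="unicode"))
```

The output is a single-line XML document with the same nesting as the snippet, here for the LightCircle row of the example dataset.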

dimensions

This is a snippet of the example dimensions. To run through this tutorial fully you should download the full file as described in Define indexes.

<dimensions>
  <dimension id='price' type='integer'>
    <element id='cheap' value='(,10)'/>
    <element id='moderate' value='[10,20]'/>
    <element id='expensive' value='(20,)'/>
  </dimension>
  <dimension id='weight' type='double'>
    <element id='light' value='(,10)'/>
    <element id='normal' value='[10,20]'/>
    <element id='heavy' value='(20,)'/>
  </dimension>
  <dimension id='shape' type='tree'>
    <element id='square'/>
    <element id='circle'/>
    <element id='triangle'/>
  </dimension>
  <dimension id='tags' type='keyword'/>
</dimensions>
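
The value attributes read like mathematical interval notation: a round bracket excludes the bound, a square bracket includes it, and a missing bound means unbounded. That reading matches the description in Define indexes, but it is an interpretation and not a statement of the engine's parser. Under that assumption, the sketch below parses the price dimension and classifies a few values; all function names are illustrative.

```python
import xml.etree.ElementTree as ET

# The price dimension, copied from the snippet above.
DIMENSION_XML = """
<dimension id='price' type='integer'>
  <element id='cheap' value='(,10)'/>
  <element id='moderate' value='[10,20]'/>
  <element id='expensive' value='(20,)'/>
</dimension>
"""


def parse_interval(text):
    """Return (low, low_inclusive, high, high_inclusive); None is unbounded."""
    low_text, high_text = text[1:-1].split(",")
    low = float(low_text) if low_text.strip() else None
    high = float(high_text) if high_text.strip() else None
    return low, text[0] == "[", high, text[-1] == "]"


def contains(interval, value):
    """True if value lies inside the interval, honouring open/closed bounds."""
    low, low_inclusive, high, high_inclusive = interval
    if low is not None and (value < low or (value == low and not low_inclusive)):
        return False
    if high is not None and (value > high or (value == high and not high_inclusive)):
        return False
    return True


def classify(dimension, value):
    """Return the id of the first element whose interval contains value."""
    for element in dimension.findall("element"):
        if contains(parse_interval(element.get("value")), value):
            return element.get("id")
    return None


price = ET.fromstring(DIMENSION_XML.strip())
print([classify(price, p) for p in (9, 10, 20, 21)])
# prints ['cheap', 'moderate', 'moderate', 'expensive']
```

The boundary values 10 and 20 both land in moderate because that element uses square brackets on both sides.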