Examples

The data used to generate the following examples can be downloaded here: Example Real Estate Dataset.

Querying geoloc dimensions

The Discovery Search Engine supports searching geographic data by zip code or by latitude and longitude. Changeset entries typically encode geographic data as latitude and longitude values, but zip codes are supported as well.

If you have latitude and longitude values, the changeset entries look like this:

<entry name="myLat">
    <string>40.6949005</string>
</entry>
<entry name="myLong">
    <string>-74.270401</string>
</entry>

If you have a zip code instead, the changeset entry looks like this:

<entry name="myZipcode"><string>10072</string></entry>

When defining a geoloc dimension, you associate the name of the entry with the type of geographic data it represents. For example, if the changeset uses zip code data, as in the previous example, the dimensions file would be configured as follows:

<dimension id="location" zipcode="myZipcode" type="geoloc" />

Likewise, if the changeset uses latitude and longitude data, the dimensions file would look like the following:

<dimension id="location" latitude="myLat" longitude="myLong" type="geoloc" />

If you do not specify names for zip code or latitude/longitude, the Discovery Search Engine defaults to the entry names “zipcode”, “latitude”, and “longitude”. This means the simplest definition for a geoloc dimension is as follows:

<dimension id="location" type="geoloc"></dimension>
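
The defaulting rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the engine's implementation; the function name `geoloc_entry_names` and the `DEFAULT_ENTRY_NAMES` table are hypothetical.

```python
import xml.etree.ElementTree as ET

# Default entry names, as stated in the text above.
DEFAULT_ENTRY_NAMES = {
    "zipcode": "zipcode",
    "latitude": "latitude",
    "longitude": "longitude",
}


def geoloc_entry_names(dimension_xml: str) -> dict:
    """Return the changeset entry names a geoloc dimension would read from.

    Hypothetical helper: attributes present on the <dimension> element win,
    otherwise the documented default name is used.
    """
    element = ET.fromstring(dimension_xml)
    if element.get("type") != "geoloc":
        raise ValueError("not a geoloc dimension")
    return {
        attribute: element.get(attribute, default)
        for attribute, default in DEFAULT_ENTRY_NAMES.items()
    }


explicit = geoloc_entry_names(
    '<dimension id="location" latitude="myLat" longitude="myLong" type="geoloc" />'
)
simplest = geoloc_entry_names('<dimension id="location" type="geoloc"></dimension>')
print(explicit)
print(simplest)
```

With the explicit definition, the latitude and longitude entries resolve to myLat and myLong while the zip code falls back to its default name; with the simplest definition, all three names are the defaults.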

The sample dataset contains zip codes for 50 real estate listings. A basic query for all listings in the 10072 zip code looks like this:

{
    "criteria": [
        {
            "dimension": "location",
            "zipcode": "10072"
        }
    ],
    "startIndex": 0,
    "pageSize": 5
}

There are eight such listings in the dataset, as the exactSize value in the engine's response shows. The first page returns five of them.

{
    "itemIds": ["41", "44", "45", "46", "47"],
    "exactMatches": [true, true, true, true, true],
    "relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0],
    "pageSize": 5,
    "currentPageSize": 5,
    "startIndex": 0,
    "exactSize": 8,
    "totalSize": 50,
    "datasetSize": 50}

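When queries like this one are generated by a client, it can help to build them programmatically. The following is a minimal sketch that assumes nothing beyond the JSON structure shown above; the helper name `geoloc_query` is hypothetical, and the optional distance settings it accepts are the ones described in the sections that follow.

```python
import json

# Distance settings covered on this page; anything else is rejected.
DISTANCE_SETTINGS = {"exactDistance", "normalDistance", "cullDistance"}


def geoloc_query(zipcode, start_index=0, page_size=5, **distance_settings):
    """Build a query dict for the "location" dimension (hypothetical helper)."""
    unknown = set(distance_settings) - DISTANCE_SETTINGS
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    criterion = {"dimension": "location", "zipcode": zipcode}
    criterion.update(distance_settings)
    return {
        "criteria": [criterion],
        "startIndex": start_index,
        "pageSize": page_size,
    }


basic = geoloc_query("10072")
print(json.dumps(basic, indent=4))
```

Calling `geoloc_query("10072")` produces the same structure as the basic query above.
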
exactDistance

The exactDistance setting determines what the Discovery Search Engine counts as an exact match. The setting defines a radius, in miles, from the location being searched. All items whose location falls within that radius are considered exact matches.

In this example, every item within 1 mile of the searched location should be treated as an exact match.

{
    "criteria": [
        {
            "dimension": "location",
            "zipcode": "10072",
            "exactDistance": 1
        }
    ],
    "startIndex": 0,
    "pageSize": 5
}

In the first example, exactSize was 8. In the following response it rises to 11, because listings within 1 mile of the searched location now also count as exact matches. Everything else has remained the same.

{
    "itemIds": ["41", "44", "45", "46", "47"],
    "exactMatches": [true, true, true, true, true],
    "relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0],
    "pageSize": 5,
    "currentPageSize": 5,
    "startIndex": 0,
    "exactSize": 11,
    "totalSize": 50,
    "datasetSize": 50}

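The radius test behind exactDistance can be illustrated with a great-circle distance check. The documentation does not say how the engine measures distance or how it maps a zip code to a location, so the haversine formula, the Earth radius constant, and the two listing coordinates below are all assumptions made for this sketch.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_exact_match(search_point, item_point, exact_distance):
    """True when the item lies within exact_distance miles of the search point."""
    return distance_miles(*search_point, *item_point) <= exact_distance


search_point = (40.6949005, -74.270401)  # coordinates from the changeset example
nearby = (40.7000000, -74.270401)        # hypothetical listing, under a mile away
far = (41.6949005, -74.270401)           # hypothetical listing, one degree north

print(is_exact_match(search_point, nearby, 1))
print(is_exact_match(search_point, far, 1))
```
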
normalDistance

The normalDistance setting tells the Discovery Search Engine how to calculate the relevance scores of listings that fall outside a given radius. As with the exactDistance setting, the value defines a radius, in miles, around the location being searched.

{
    "criteria": [
        {
            "dimension": "location",
            "zipcode": "10072",
            "normalDistance": 0.5
        }
    ],
    "startIndex": 0,
    "pageSize": 5
}

The first page of the response is identical to the first example, because it contains only exact matches. The difference applies to listings whose location falls outside the radius defined in the query: their relevance scores are much lower.

{
    "itemIds": ["41", "44", "45", "46", "47"],
    "exactMatches": [true, true, true, true, true],
    "relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0],
    "pageSize": 5,
    "currentPageSize": 5,
    "startIndex": 0,
    "exactSize": 8,
    "totalSize": 50,
    "datasetSize": 50}

cullDistance

Setting the cullDistance in the query establishes a radius beyond which all listings are culled, or removed from the result set.

{
    "criteria": [
        {
            "dimension": "location",
            "zipcode": "10072",
            "cullDistance": 0.5
        }
    ],
    "startIndex": 0,
    "pageSize": 5
}

The response from this search looks similar to the first example in terms of exact matches and relevance scores. The big difference is the totalSize in the result. In this particular example, only ten listings lie within 0.5 miles of the 10072 zip code, and those are the only items returned by the Discovery Search Engine.

{
    "itemIds": ["41", "44", "45", "46", "47"],
    "exactMatches": [true, true, true, true, true],
    "relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0],
    "pageSize": 5,
    "currentPageSize": 5,
    "startIndex": 0,
    "exactSize": 8,
    "totalSize": 10,
    "datasetSize": 50}

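Because culling changes totalSize, it also changes how many pages a client has to request to see every result. The sketch below assumes that startIndex is an item offset (not a page number), which the examples on this page suggest but do not state; `page_requests` is a hypothetical helper.

```python
import math


def page_requests(base_query, total_size):
    """Return one query per page needed to cover total_size results.

    Assumes startIndex is an item offset, so page n starts at n * pageSize.
    """
    page_size = base_query["pageSize"]
    pages = math.ceil(total_size / page_size)
    return [dict(base_query, startIndex=page * page_size) for page in range(pages)]


cull_query = {
    "criteria": [
        {"dimension": "location", "zipcode": "10072", "cullDistance": 0.5}
    ],
    "startIndex": 0,
    "pageSize": 5,
}

# totalSize is taken from the response above.
offsets = [query["startIndex"] for query in page_requests(cull_query, total_size=10)]
print(offsets)
```

Under that assumption, the culled result set of 10 items fits in two pages, whereas an unculled result set of 50 items would need ten.
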
Querying mutex dimensions

Mutex dimensions are for values that are mutually exclusive. In the example dataset, the type dimension is a mutex dimension, which means that a listing can be either a rental or a sales property, but never both.

{
    "criteria": [
        {
            "dimension": "type",
            "id": "rentals"
        }
    ],
    "startIndex": 0,
    "pageSize": 5
}

Half the listings in the example dataset are rentals and the other half are sales, so the search for rentals returns a result set whose size is 25.

{
    "itemIds": ["26", "27", "28", "29", "30"],
    "exactMatches": [true, true, true, true, true],
    "relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0],
    "pageSize": 5,
    "currentPageSize": 5,
    "startIndex": 0,
    "exactSize": 25,
    "totalSize": 25,
    "datasetSize": 50}

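The response carries its results as parallel arrays: the n-th entry of itemIds, exactMatches, and relevanceValues all describe the same listing. A client will often want them as one row per item, which can be sketched as follows (`result_rows` is a hypothetical helper name).

```python
response = {
    "itemIds": ["26", "27", "28", "29", "30"],
    "exactMatches": [True, True, True, True, True],
    "relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0],
    "pageSize": 5,
    "currentPageSize": 5,
    "startIndex": 0,
    "exactSize": 25,
    "totalSize": 25,
    "datasetSize": 50,
}


def result_rows(response):
    """Combine the parallel arrays into (item_id, is_exact, relevance) tuples."""
    return list(zip(
        response["itemIds"],
        response["exactMatches"],
        response["relevanceValues"],
    ))


rows = result_rows(response)
for item_id, is_exact, relevance in rows:
    print(item_id, is_exact, relevance)
```
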
Querying tree dimensions

Tree dimensions represent hierarchical data structures. The example dataset defines a tree dimension named style that represents categories of architectural styles.

    <dimension type="tree" id="style">
        <element id="multi-family">
            <element id="apartment"/>
            <element id="condo"/>
            <element id="co-op"/>
            <element id="townhome"/>
        </element>
        <element id="single-family">
            <element id="colonial">
                <element id="new england" />
                <element id="cape cod" />
            </element>
            <element id="classical">
                <element id="federal" />
                <element id="greek revival" />
                <element id="tidewater" />
                <element id="antebellum" />
            </element>
            <element id="victorian">
                <element id="gothic" />
                <element id="second empire" />
                <element id="queen anne" />
            </element>
            <element id="contemporary">
                <element id="ranch"/>
                <element id="raised ranch" />
                <element id="split-level" />
                <element id="bauhaus" />
                <element id="art moderne" />
                <element id="transitional" />
            </element>
        </element>
    </dimension>

In this example, a query for single-family homes is executed.

{
    "criteria": [
        {
            "dimension": "style",
            "id": "single-family"
        }
    ],
    "startIndex": 0,
    "pageSize": 5
}

There are two properties in the example dataset that are specifically identified as single-family homes, and you can see in the response that these two properties are listed first. Notice that all matches are exact, but only the first two listings have a relevance score of 1.0.

{
    "itemIds": ["31", "6", "10", "15", "19"],
    "exactMatches": [true, true, true, true, true],
    "relevanceValues": [1.0, 1.0, 0.99898016, 0.99898016, 0.99898016],
    "pageSize": 5,
    "currentPageSize": 5,
    "startIndex": 0,
    "exactSize": 40,
    "totalSize": 50,
    "datasetSize": 50}

The relevance score is calculated according to each listing’s proximity to the style searched for. After listings whose style is single-family, listings of styles that are children of single-family are listed.

ID  Style
31  single-family
6   single-family
10  classical
15  victorian
19  contemporary
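
In the response above, listings whose style is a child of single-family are still flagged as exact matches. One way to express that rule is "the listing's style is the queried node or one of its descendants". The sketch below applies that reading to the style tree from the dimensions file (whitespace condensed). It is an interpretation of the responses shown on this page, not the engine's code, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

STYLE_TREE = """
<dimension type="tree" id="style">
  <element id="multi-family">
    <element id="apartment"/><element id="condo"/>
    <element id="co-op"/><element id="townhome"/>
  </element>
  <element id="single-family">
    <element id="colonial">
      <element id="new england"/><element id="cape cod"/>
    </element>
    <element id="classical">
      <element id="federal"/><element id="greek revival"/>
      <element id="tidewater"/><element id="antebellum"/>
    </element>
    <element id="victorian">
      <element id="gothic"/><element id="second empire"/><element id="queen anne"/>
    </element>
    <element id="contemporary">
      <element id="ranch"/><element id="raised ranch"/><element id="split-level"/>
      <element id="bauhaus"/><element id="art moderne"/><element id="transitional"/>
    </element>
  </element>
</dimension>
"""


def parent_map(tree_xml):
    """Map every element id to its parent's id (None for top-level nodes)."""
    parents = {}

    def walk(node, parent_id):
        for child in node.findall("element"):
            parents[child.get("id")] = parent_id
            walk(child, child.get("id"))

    walk(ET.fromstring(tree_xml), None)
    return parents


def is_exact(query_id, listing_style, parents):
    """True if listing_style is query_id itself or one of its descendants."""
    node = listing_style
    while node is not None:
        if node == query_id:
            return True
        node = parents[node]
    return False


PARENTS = parent_map(STYLE_TREE)
for style in ["single-family", "classical", "victorian", "apartment"]:
    print(style, is_exact("single-family", style, PARENTS))
```

Under this reading, classical and victorian listings are exact matches for a single-family query, while an apartment listing is not.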

In the first example, single-family is a parent node, at the top of the tree. The next query searches for homes whose style is federal, a style much lower in the hierarchy.

{
    "criteria": [
        {
            "dimension": "style",
            "id": "federal"
        }
    ],
    "startIndex": 0,
    "pageSize": 5
}

The response differs from the first example because now only the first two listings are considered exact matches, and the relevance scores are significantly lower for the remaining items.

{
    "itemIds": ["11", "36", "10", "35", "12"],
    "exactMatches": [true, true, false, false, false],
    "relevanceValues": [1.0, 1.0, 0.22472607, 0.22472607, 0.2245151],
    "pageSize": 5,
    "currentPageSize": 5,
    "startIndex": 0,
    "exactSize": 2,
    "totalSize": 50,
    "datasetSize": 50}

The following table shows the styles of the items in the response. The listings that follow the first two are not considered exact matches because their styles are either parents or siblings of the federal style. In the first example, the listings that followed were children of single-family.

ID  Style
11  federal
36  federal
10  classical
35  classical
12  greek revival

In the final example, we will search for homes whose style is cape cod. The query also sets a page size of 10 to retrieve a longer list of results, which illustrates how the engine traverses the tree when calculating relevance scores.

{
    "criteria": [
        {
            "dimension": "style",
            "id": "cape cod"
        }
    ],
    "startIndex": 0,
    "pageSize": 10
}

The response:

{
    "itemIds": ["34", "9", "32", "7", "33", "8", "31", "6", "10", "15"],
    "exactMatches": [true, true, false, false, false, false, false, false, false, false],
    "relevanceValues": [1.0, 1.0, 0.22472607, 0.22472607, 0.2245151, 0.2245151, 0.049330108, 0.049330108, 0.048394997, 0.048394997],
    "pageSize": 10,
    "currentPageSize": 10,
    "startIndex": 0,
    "exactSize": 2,
    "totalSize": 50,
    "datasetSize": 50}

You can see from the following table that the Discovery Search Engine first returns exact matches, those homes whose style is cape cod, followed by matches for the direct parent of the cape cod style, colonial. These are followed by the sibling style, new england. Finally, once all the siblings have been returned, the engine returns listings whose style is single-family, the direct parent of colonial.

ID  Style
34  cape cod
9   cape cod
32  colonial
7   colonial
33  new england
8   new england
31  single-family
6   single-family
10  classical
15  victorian
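
The ordering in the table is consistent with ranking each style by how far up the tree you must climb from the queried node to reach a common ancestor, and then by how far down you must go to reach the listing's style. The sketch below checks that reading against the ten results above. The `PARENT` dictionary is a hand-written excerpt of the style tree, and the ranking key is an inference from the responses shown here, not a documented algorithm.

```python
# Excerpt of the style tree: child id -> parent id (None for the top-level node).
PARENT = {
    "single-family": None,
    "colonial": "single-family",
    "classical": "single-family",
    "victorian": "single-family",
    "new england": "colonial",
    "cape cod": "colonial",
}


def path_to_root(style):
    """Return [style, parent, grandparent, ...] up to the top-level node."""
    path = [style]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def steps(query_id, listing_style):
    """Return (levels up from the query, levels down to the listing's style)."""
    down_path = path_to_root(listing_style)
    for levels_up, node in enumerate(path_to_root(query_id)):
        if node in down_path:
            return levels_up, down_path.index(node)
    raise ValueError("styles share no common ancestor in this excerpt")


# Styles of the ten results, in the order the engine returned them.
returned_styles = [
    "cape cod", "cape cod", "colonial", "colonial", "new england",
    "new england", "single-family", "single-family", "classical", "victorian",
]

for style in returned_styles:
    print(style, steps("cape cod", style))
```

Sorting the returned styles by this key leaves their order unchanged: the exact matches come first, then the parent, the sibling, the grandparent, and finally the grandparent's other children.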

Weight

The weight of a criterion determines how much that criterion contributes to a listing's relevance value. In the following example, a query is executed for a three-bedroom home whose style is townhome.

{
    "criteria": [
        {
            "dimension": "bedroom",
            "value": 3
        },
        {
            "dimension": "style",
            "id": "townhome"
        }
    ],
    "startIndex": 0,
    "pageSize": 5
}

The response indicates that there are no listings that are exact matches. There are two items whose style is townhome, but neither of them has three bedrooms. Therefore all of the relevance scores are less than 1.0.

{
    "itemIds": ["30", "5", "1", "26", "2"],
    "exactMatches": [false, false, false, false, false],
    "relevanceValues": [0.9999995, 0.9999995, 0.6097551, 0.6097551, 0.60964316],
    "pageSize": 5,
    "currentPageSize": 5,
    "startIndex": 0,
    "exactSize": 0,
    "totalSize": 50,
    "datasetSize": 50}

In the next example, the weight of the style criterion is lowered from the default value of 1.0 to 0.5. This means that whether a listing is a townhome is not as important as whether it has three bedrooms.

{
    "criteria": [
        {
            "dimension": "bedroom",
            "value": 3
        },
        {
            "dimension": "style",
            "id": "townhome",
            "weight": 0.5
        }
    ],
    "startIndex": 0,
    "pageSize": 5
}

The response includes the same list of items as the previous search, but the relevance scores of the third through fifth items are higher (about 0.74 instead of 0.61), while the top two scores are essentially unchanged. This is because the second query says that whether a listing is a townhome is less important. As a consequence, a mismatch on style lowers a listing's relevance score less than it did in the original query.

{
    "itemIds": ["30", "5", "1", "26", "2"],
    "exactMatches": [false, false, false, false, false],
    "relevanceValues": [0.99999934, 0.99999934, 0.73983604, 0.73983604, 0.7397614],
    "pageSize": 5,
    "currentPageSize": 5,
    "startIndex": 0,
    "exactSize": 0,
    "totalSize": 50,
    "datasetSize": 50}

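The numbers in the two responses are consistent with a weighted average of per-criterion scores, although the documentation does not state the formula. The sketch below tests that hypothesis for item 1: it assumes the item scores 1.0 on the bedroom criterion, backs out the style score from the first response, and then predicts the relevance under the new weight. Both the formula and the per-criterion scores are assumptions.

```python
def weighted_relevance(scores, weights):
    """Weighted arithmetic mean of per-criterion scores (assumed formula)."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


observed_default = 0.6097551    # item 1, both weights 1.0 (first response)
observed_reweighted = 0.73983604  # item 1, style weight 0.5 (second response)

# Assume the bedroom criterion scores 1.0 for item 1 and solve
# (1.0 + style_score) / 2 = observed_default for the style score.
bedroom_score = 1.0
style_score = 2 * observed_default - bedroom_score

predicted = weighted_relevance([bedroom_score, style_score], [1.0, 0.5])
print(predicted, observed_reweighted)
```

The predicted value agrees with the observed one to about six decimal places, which supports the weighted-average reading without proving it.
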
Querying with drill down counts

{
    "criteria": [
        {
            "dimension": "bedroom",
            "value": 3
        }
    ],
    "drillDown": [
        {
            "dimension": "bedroom"
        }
    ],
    "startIndex": 0,
    "pageSize": 10
}

{
    "itemIds": ["10", "11", "12", "13", "14", "15", "16", "17", "18", "19"],
    "exactMatches": [true, true, true, true, true, true, true, true, true, true],
    "relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "pageSize": 10,
    "currentPageSize": 10,
    "startIndex": 0,
    "exactSize": 38,
    "totalSize": 50,
    "drillDown": [
        {
            "dimension": "bedroom",
            "ids": ["studio", "1", "2", "3", "4", "5+"],
            "exactCounts": [0, 8, 2, 38, 2, 0],
            "fuzzyCounts": [0, 8, 2, 38, 2, 0]}],
    "datasetSize": 50}

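Each drillDown entry in the response appears to use parallel arrays again: ids lists the values of the dimension, and exactCounts and fuzzyCounts give one count per id. A minimal sketch for turning an entry into a lookup table follows; `counts_by_id` is a hypothetical helper, and the pairing by position is inferred from the response above.

```python
drill_down_entry = {
    "dimension": "bedroom",
    "ids": ["studio", "1", "2", "3", "4", "5+"],
    "exactCounts": [0, 8, 2, 38, 2, 0],
    "fuzzyCounts": [0, 8, 2, 38, 2, 0],
}


def counts_by_id(entry, field="exactCounts"):
    """Pair each id with its count, assuming the arrays line up by position."""
    return dict(zip(entry["ids"], entry[field]))


exact_counts = counts_by_id(drill_down_entry)
non_empty = {value: count for value, count in exact_counts.items() if count > 0}
print(non_empty)
```

Here the count for three bedrooms (38) equals the exactSize of the response, and the counts over all ids add up to the datasetSize of 50.
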
Rendering parameters

When a search is executed against the Discovery Search Engine, a list of IDs is returned. In a typical integration, these IDs are then used to generate a second HTTP request that retrieves the data associated with each ID. The Discovery Search Engine’s renderParameters setting is designed to simplify the second HTTP request by generating a query string that can be used in an HTTP GET request.

The first example shows a request that specifies the use of a ‘,’ delimiter to separate the list of itemIds.

{
    "criteria": [
        {
            "dimension": "price",
            "value": 200000
        }
    ],
    "renderParameters": {
        "itemIdsDelimiter": ","
    },
    "startIndex": 0,
    "pageSize": 5
}

In the response, the renderParameters value is a string in which the itemIds are listed, delimited by a comma, which is represented as %2C when URL-encoded.

{
    "itemIds": ["1", "2", "3", "4", "5"],
    "exactMatches": [true, true, true, true, false],
    "relevanceValues": [1.0, 1.0, 1.0, 1.0, 0.95],
    "pageSize": 5,
    "currentPageSize": 5,
    "startIndex": 0,
    "exactSize": 4,
    "totalSize": 25,
    "datasetSize": 50,
    "renderParameters": "startIndex=0&pageSize=5&exactSize=4&totalSize=25&itemIds=1%2C2%2C3%2C4%2C5&exactMatches=11110"}

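On the receiving side, the renderParameters string can be decoded with a standard query-string parser. The sketch below uses Python's `urllib.parse.parse_qs`; the function name `parse_render_parameters` is hypothetical, and reading exactMatches as one digit per item (1 for exact, 0 otherwise) is an inference from the response above.

```python
from urllib.parse import parse_qs


def parse_render_parameters(query_string, delimiter=","):
    """Decode a renderParameters string into Python values."""
    # parse_qs URL-decodes the values (%2C becomes ",", "+" becomes a space).
    fields = {key: values[0] for key, values in parse_qs(query_string).items()}
    return {
        "startIndex": int(fields["startIndex"]),
        "pageSize": int(fields["pageSize"]),
        "exactSize": int(fields["exactSize"]),
        "totalSize": int(fields["totalSize"]),
        "itemIds": fields["itemIds"].split(delimiter),
        # Assumed encoding: one character per item, "1" meaning exact match.
        "exactMatches": [flag == "1" for flag in fields["exactMatches"]],
    }


rendered = (
    "startIndex=0&pageSize=5&exactSize=4&totalSize=25"
    "&itemIds=1%2C2%2C3%2C4%2C5&exactMatches=11110"
)
parsed = parse_render_parameters(rendered)
print(parsed["itemIds"])
print(parsed["exactMatches"])
```

The `delimiter` argument must match the itemIdsDelimiter used in the request.
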
The second example shows a typical query that relies on the default delimiter, which is a space character.

{
    "criteria": [
        {
            "dimension": "price",
            "value": 200000
        }
    ],
    "renderParameters": true,
    "startIndex": 0,
    "pageSize": 5
}

Once URL-encoded, the space character is rendered as +.

{
    "itemIds": ["1", "2", "3", "4", "5"],
    "exactMatches": [true, true, true, true, false],
    "relevanceValues": [1.0, 1.0, 1.0, 1.0, 0.95],
    "pageSize": 5,
    "currentPageSize": 5,
    "startIndex": 0,
    "exactSize": 4,
    "totalSize": 25,
    "datasetSize": 50,
    "renderParameters": "startIndex=0&pageSize=5&exactSize=4&totalSize=25&itemIds=1+2+3+4+5&exactMatches=11110"}