Examples¶
The data used to generate the following examples can be downloaded here: Example Real Estate Dataset.
Querying geoloc dimensions¶
The Discovery Search Engine supports searching geographic data by zip code or by latitude and longitude. Changeset entries typically encode geographic data as latitude and longitude values, but zip codes are also supported.
If you have latitude and longitude values, the changeset entries look like this:
<entry name="myLat"><string>40.6949005</string></entry>
<entry name="myLong"><string>-74.270401</string></entry>
If you have a zip code instead, the changeset entry looks like this:
<entry name="myZipcode"><string>10072</string></entry>
When defining a geoloc dimension, you associate the name of the entry with the type of geographic data it represents. For example, if the changeset uses zip code data, as in the previous example, the dimensions file would be configured like so:
<dimension id="location" zipcode="myZipcode" type="geoloc" />
Likewise, if the changeset uses latitude and longitude data, the dimensions file would look like the following:
<dimension id="location" latitude="myLat" longitude="myLong" type="geoloc" />
If you do not specify names for the zip code or latitude/longitude entries, the Discovery Search Engine defaults to the names “zipcode”, “latitude” and “longitude”. This means the simplest definition for a geoloc dimension is as follows:
<dimension id="location" type="geoloc"></dimension>
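The defaulting rule described above can be illustrated with a short Python sketch. It is not part of the Discovery Search Engine; the function name `geoloc_entry_names` and the `DEFAULT_NAMES` table are hypothetical, and the sketch only mirrors the attribute names and defaults stated in the text.

```python
import xml.etree.ElementTree as ET

# Default entry names, as stated in the text above.
DEFAULT_NAMES = {
    "zipcode": "zipcode",
    "latitude": "latitude",
    "longitude": "longitude",
}


def geoloc_entry_names(dimension_xml: str) -> dict:
    """Return the changeset entry names a geoloc dimension would read from.

    Hypothetical helper: it mimics the documented defaults, it is not engine code.
    """
    element = ET.fromstring(dimension_xml)
    if element.get("type") != "geoloc":
        raise ValueError("not a geoloc dimension")
    # Use the attribute when present, otherwise fall back to the default name.
    return {key: element.get(key, default) for key, default in DEFAULT_NAMES.items()}


explicit = geoloc_entry_names('<dimension id="location" zipcode="myZipcode" type="geoloc" />')
implicit = geoloc_entry_names('<dimension id="location" type="geoloc"></dimension>')
print(explicit)
print(implicit)
```

With the explicit definition, only the zip code entry name changes; the simplest definition resolves to the three default names.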
The sample dataset contains zip codes for 50 real estate listings. A basic query for all listings in the 10072 zip code looks like this:
{ "criteria": [ { "dimension": "location", "zipcode": "10072" } ], "startIndex": 0, "pageSize": 5 }
There are two such listings in the dataset, which is reflected in the response from the engine.
{ "itemIds": ["41", "44", "45", "46", "47"], "exactMatches": [true, true, true, true, true], "relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0], "pageSize": 5, "currentPageSize": 5, "startIndex": 0, "exactSize": 8, "totalSize": 50, "datasetSize": 50}
exactDistance¶
The exactDistance setting determines how the Discovery Search Engine decides what counts as an exact match. The setting defines a radius, in miles, from the location being searched. All items whose location falls within that radius are considered exact matches.
In this example, all items within 1 mile of the searched location are treated as exact matches.
{ "criteria": [ { "dimension": "location", "zipcode": "10072", "exactDistance": 1 } ], "startIndex": 0, "pageSize": 5 }
In the first example, only the first two listings were considered exact matches, but you can see in the following response that now all of the listings are considered exact. Everything else has remained the same.
{ "itemIds": ["41", "44", "45", "46", "47"], "exactMatches": [true, true, true, true, true], "relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0], "pageSize": 5, "currentPageSize": 5, "startIndex": 0, "exactSize": 11, "totalSize": 50, "datasetSize": 50}
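To make the idea of a radius in miles concrete, here is a minimal Python sketch of a radius test. It assumes a great-circle (haversine) distance, which is a common choice but is not confirmed by this page as the engine's method. The function names and the two listing coordinates are hypothetical; only the search coordinates come from the changeset example earlier on this page.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_exact_match(search_point, item_point, exact_distance):
    """True when the item lies within exact_distance miles of the search point."""
    return haversine_miles(*search_point, *item_point) <= exact_distance


search = (40.6949005, -74.270401)  # coordinates from the changeset example
nearby = (40.7000000, -74.270401)  # hypothetical listing, well under a mile away
far = (41.6949005, -74.270401)     # hypothetical listing, one degree of latitude away

print(is_exact_match(search, nearby, exact_distance=1))
print(is_exact_match(search, far, exact_distance=1))
```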
normalDistance¶
The normalDistance setting tells the Discovery Search Engine how to calculate the relevance scores of listings that fall outside a given radius. As with the exactDistance setting, the value defines a radius, in miles, around the location being searched.
{ "criteria": [ { "dimension": "location", "zipcode": "10072", "normalDistance": 0.5 } ], "startIndex": 0, "pageSize": 5 }
The response shows a result set very similar to the first example, but in this case, the relevance scores are much lower for those listings whose location falls outside of the radius defined in the query.
{ "itemIds": ["41", "44", "45", "46", "47"], "exactMatches": [true, true, true, true, true], "relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0], "pageSize": 5, "currentPageSize": 5, "startIndex": 0, "exactSize": 8, "totalSize": 50, "datasetSize": 50}
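The geoloc queries on this page differ only in their optional distance settings: exactDistance, normalDistance, and the cullDistance setting covered in the next section. The following Python sketch builds such queries from keyword arguments. The helper name `geoloc_query` is hypothetical; the JSON keys are the ones used in the examples on this page.

```python
import json


def geoloc_query(zipcode, exact_distance=None, normal_distance=None,
                 cull_distance=None, start_index=0, page_size=5):
    """Build a geoloc query body; distance settings are added only when given."""
    criterion = {"dimension": "location", "zipcode": zipcode}
    optional = {
        "exactDistance": exact_distance,
        "normalDistance": normal_distance,
        "cullDistance": cull_distance,
    }
    criterion.update({key: value for key, value in optional.items() if value is not None})
    return {"criteria": [criterion], "startIndex": start_index, "pageSize": page_size}


print(json.dumps(geoloc_query("10072", normal_distance=0.5), indent=2))
```

Leaving out unset settings keeps the generated body identical to the hand-written examples, so the engine's own defaults still apply.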
cullDistance¶
Setting cullDistance in the query establishes a radius beyond which all listings are culled, or removed from the result set.
{
"criteria": [
{
"dimension": "location",
"zipcode": "10072",
"cullDistance": 0.5
}
],
"startIndex": 0,
"pageSize": 5
}
The response from this search looks similar to the first example in terms of exact matches and relevance scores. The big difference is the totalSize in the result. In this particular example, there are only six listings within 0.5 miles of the 10072 zip code, and those are the only items returned by the Discovery Search Engine.
{
"itemIds": ["41", "44", "45", "46", "47"],
"exactMatches": [true, true, true, true, true],
"relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0],
"pageSize": 5,
"currentPageSize": 5,
"startIndex": 0,
"exactSize": 8,
"totalSize": 10,
"datasetSize": 50}
Querying mutex dimensions¶
Mutex dimensions are for values that are mutually exclusive. In the example dataset, the type dimension is a mutex dimension, which means that a listing can be either a rentals or a sales property, but never both.
{ "criteria": [ { "dimension": "type", "id": "rentals" } ], "startIndex": 0, "pageSize": 5 }
Half the listings in the example dataset are rentals and the other half are sales, so the search for rentals returns a result set whose size is 25.
{ "itemIds": ["26", "27", "28", "29", "30"], "exactMatches": [true, true, true, true, true], "relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0], "pageSize": 5, "currentPageSize": 5, "startIndex": 0, "exactSize": 25, "totalSize": 25, "datasetSize": 50}
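Because the result set holds 25 items and each response carries only pageSize of them, a client needs several requests to read all results. The Python sketch below generates one query per page. It assumes that startIndex is a zero-based item offset, which the examples on this page suggest but do not state; the helper name `page_queries` is hypothetical.

```python
def page_queries(criteria, total_size, page_size):
    """Yield one query body per page, assuming startIndex is a zero-based item offset."""
    for start_index in range(0, total_size, page_size):
        yield {"criteria": criteria, "startIndex": start_index, "pageSize": page_size}


rental_criteria = [{"dimension": "type", "id": "rentals"}]
queries = list(page_queries(rental_criteria, total_size=25, page_size=5))
print([query["startIndex"] for query in queries])
```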
Querying tree dimensions¶
Tree dimensions represent hierarchical data structures. The example dataset defines a tree dimension named style that represents categories of architectural styles.
<dimension type="tree" id="style">
  <element id="multi-family">
    <element id="apartment"/>
    <element id="condo"/>
    <element id="co-op"/>
    <element id="townhome"/>
  </element>
  <element id="single-family">
    <element id="colonial">
      <element id="new england" />
      <element id="cape cod" />
    </element>
    <element id="classical">
      <element id="federal" />
      <element id="greek revival" />
      <element id="tidewater" />
      <element id="antebellum" />
    </element>
    <element id="victorian">
      <element id="gothic" />
      <element id="second empire" />
      <element id="queen anne" />
    </element>
    <element id="contemporary">
      <element id="ranch"/>
      <element id="raised ranch" />
      <element id="split-level" />
      <element id="bauhaus" />
      <element id="art moderne" />
      <element id="transitional" />
    </element>
  </element>
</dimension>
In this example, a query for single-family homes is executed.
{
"criteria": [
{
"dimension": "style",
"id": "single-family"
}
],
"startIndex": 0,
"pageSize": 5
}
There are two properties in the example dataset that are specifically identified as single-family homes, and you can see in the response that these two properties are listed first. Notice that all matches are exact, but only the first two listings have a relevance score of 1.0.
{
"itemIds": ["31", "6", "10", "15", "19"],
"exactMatches": [true, true, true, true, true],
"relevanceValues": [1.0, 1.0, 0.99898016, 0.99898016, 0.99898016],
"pageSize": 5,
"currentPageSize": 5,
"startIndex": 0,
"exactSize": 40,
"totalSize": 50,
"datasetSize": 50}
The relevance score is calculated according to each listing’s proximity to the style searched for. After the listings whose style is single-family, listings whose styles are children of single-family are listed.
ID   Style
31   single-family
6    single-family
10   classical
15   victorian
19   contemporary
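In the response above, listings whose style is a child of single-family are still flagged as exact matches. One reading of that behavior, offered here as an assumption rather than documented engine logic, is that an item matches exactly when its style is the searched node or one of its descendants. A minimal Python sketch, using a hand-written parent map for part of the example tree (the names `PARENT`, `ancestors_and_self` and `is_exact` are illustrative):

```python
# Parent of each style in part of the example tree; a top-level node maps to None.
PARENT = {
    "single-family": None,
    "colonial": "single-family",
    "classical": "single-family",
    "victorian": "single-family",
    "contemporary": "single-family",
    "new england": "colonial",
    "cape cod": "colonial",
}


def ancestors_and_self(style):
    """Return the style followed by its ancestors, walking up to the top of the tree."""
    chain = []
    while style is not None:
        chain.append(style)
        style = PARENT[style]
    return chain


def is_exact(searched, item_style):
    """Assumed rule: exact when the item's style is the searched node or lies beneath it."""
    return searched in ancestors_and_self(item_style)


for style in ["single-family", "classical", "victorian", "contemporary"]:
    print(style, is_exact("single-family", style))
```

Under this rule the relationship is one-directional: a search for a child style does not treat its parent as an exact match.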
In the first example, single-family is a parent node at the top of the tree. The next query searches for homes whose style is federal, a style much lower in the hierarchy.
{
"criteria": [
{
"dimension": "style",
"id": "federal"
}
],
"startIndex": 0,
"pageSize": 5
}
The response differs from the first example because now only the first two listings are considered exact matches, and the relevance scores are significantly lower for the remaining items.
{
"itemIds": ["11", "36", "10", "35", "12"],
"exactMatches": [true, true, false, false, false],
"relevanceValues": [1.0, 1.0, 0.22472607, 0.22472607, 0.2245151],
"pageSize": 5,
"currentPageSize": 5,
"startIndex": 0,
"exactSize": 2,
"totalSize": 50,
"datasetSize": 50}
The following table shows the styles of the items in the response. The listings that follow the first two are not considered exact matches because their styles are either parents or siblings of the federal style. In the first example, the listings that followed were children of single-family.
ID   Style
11   federal
36   federal
10   classical
35   classical
12   greek
In the final example, we search for homes whose style is cape cod. The query also sets a page size of 10 to retrieve a longer list of results, which illustrates how the engine traverses the tree when calculating relevance scores.
{ "criteria": [ { "dimension": "style", "id": "cape cod" } ], "startIndex": 0, "pageSize": 10 }
The response:
{ "itemIds": ["34", "9", "32", "7", "33", "8", "31", "6", "10", "15"], "exactMatches": [true, true, false, false, false, false, false, false, false, false], "relevanceValues": [1.0, 1.0, 0.22472607, 0.22472607, 0.2245151, 0.2245151, 0.049330108, 0.049330108, 0.048394997, 0.048394997], "pageSize": 10, "currentPageSize": 10, "startIndex": 0, "exactSize": 2, "totalSize": 50, "datasetSize": 50}
You can see from the following table that the Discovery Search Engine first returns exact matches, those homes whose style is cape cod, followed by matches of the direct parent of the cape cod style, colonial. These are followed by the sibling new england. Finally, once all the siblings have been returned, listings whose style is single-family, the direct parent of colonial, are returned.
ID   Style
34   cape cod
9    cape cod
32   colonial
7    colonial
33   new england
8    new england
31   single-family
6    single-family
10   classical
15   victorian
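The order in the table above can be reproduced with a simple tree walk: climb from the searched style toward the top of the tree, and at each ancestor list that ancestor first and then the styles beneath it. The Python sketch below implements that ordering. It is an inference from the example responses, not the engine's actual scoring formula, and it yields only an order, not relevance values. The names `PARENT`, `path_to_root` and `proximity_key` are hypothetical.

```python
# Parent of each style in part of the example tree; a top-level node maps to None.
PARENT = {
    "single-family": None,
    "colonial": "single-family",
    "classical": "single-family",
    "victorian": "single-family",
    "new england": "colonial",
    "cape cod": "colonial",
}


def path_to_root(style):
    """Return the style followed by its ancestors, walking up to the top of the tree."""
    path = []
    while style is not None:
        path.append(style)
        style = PARENT[style]
    return path


def proximity_key(searched, item_style):
    """Sort key: (steps up from the searched style to the nearest shared node,
    steps down from that shared node to the item's style). Smaller sorts first."""
    item_path = path_to_root(item_style)
    searched_path = path_to_root(searched)
    for steps_up, node in enumerate(searched_path):
        if node in item_path:
            return (steps_up, item_path.index(node))
    return (len(searched_path), len(item_path))  # no shared node in this tree


styles = ["classical", "single-family", "new england", "victorian", "cape cod", "colonial"]
ranked = sorted(styles, key=lambda style: proximity_key("cape cod", style))
print(ranked)
```

Styles with equal keys, such as classical and victorian here, keep their input order because Python's sort is stable.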
Weight¶
The weight of a criterion is used to calculate the relevance value of a listing relative to that criterion. In the following example, a query is executed for a three-bedroom home whose style is townhome.
{ "criteria": [ { "dimension": "bedroom", "value": 3 }, { "dimension": "style", "id": "townhome" } ], "startIndex": 0, "pageSize": 5 }
The response indicates that no listings are exact matches. There are two items whose style is townhome, but neither of them has three bedrooms. Therefore all of the relevance scores are less than 1.0.
{ "itemIds": ["30", "5", "1", "26", "2"], "exactMatches": [false, false, false, false, false], "relevanceValues": [0.9999995, 0.9999995, 0.6097551, 0.6097551, 0.60964316], "pageSize": 5, "currentPageSize": 5, "startIndex": 0, "exactSize": 0, "totalSize": 50, "datasetSize": 50}
In the next example, the weight of the style criterion is lowered from the default value of 1.0 to 0.5. This means that whether a listing is a townhome is less important than whether it has three bedrooms.
{ "criteria": [ { "dimension": "bedroom", "value": 3 }, { "dimension": "style", "id": "townhome", "weight": 0.5 } ], "startIndex": 0, "pageSize": 5 }
The response includes the same list of items as the previous search, but the relevance scores of the listings that are not townhomes are higher. This is because the second query says that whether a listing is a townhome or not is less important, so a style mismatch lowers the overall score less than it did in the original query.
{ "itemIds": ["30", "5", "1", "26", "2"], "exactMatches": [false, false, false, false, false], "relevanceValues": [0.9999994, 0.9999994, 0.73983604, 0.73983604, 0.7397614], "pageSize": 5, "currentPageSize": 5, "startIndex": 0, "exactSize": 0, "totalSize": 50, "datasetSize": 50}
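The change in scores between the two responses appears consistent with a weighted average of per-criterion relevance values, although this page does not state the engine's actual formula. The Python sketch below shows that arithmetic under that assumption. The function name `combined_relevance` and the per-criterion scores (1.0 for bedrooms, 0.22 for style) are hypothetical inputs chosen for illustration.

```python
def combined_relevance(scored_criteria):
    """Weighted average of (relevance, weight) pairs, one pair per criterion.

    Assumed model for illustration; not a documented engine formula.
    """
    total_weight = sum(weight for _, weight in scored_criteria)
    if total_weight == 0:
        raise ValueError("at least one criterion needs a non-zero weight")
    return sum(relevance * weight for relevance, weight in scored_criteria) / total_weight


bedroom_score = 1.0  # hypothetical: the listing has three bedrooms
style_score = 0.22   # hypothetical: the listing's style is near, but not, townhome

equal = combined_relevance([(bedroom_score, 1.0), (style_score, 1.0)])
reduced = combined_relevance([(bedroom_score, 1.0), (style_score, 0.5)])
print(round(equal, 2), round(reduced, 2))
```

Halving the weight of the weaker criterion pulls the combined score toward the stronger one, which is the direction of change seen for the non-townhome listings.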
Querying with drill down counts¶
{ "criteria": [ { "dimension": "bedroom", "value": 3 } ], "drillDown": [ { "dimension": "bedroom" } ], "startIndex": 0, "pageSize": 10 }
{ "itemIds": ["10", "11", "12", "13", "14", "15", "16", "17", "18", "19"], "exactMatches": [true, true, true, true, true, true, true, true, true, true], "relevanceValues": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], "pageSize": 10, "currentPageSize": 10, "startIndex": 0, "exactSize": 38, "totalSize": 50, "drillDown": [ { "dimension": "bedroom", "ids": ["studio", "1", "2", "3", "4", "5+"], "exactCounts": [0, 8, 2, 38, 2, 0], "fuzzyCounts": [0, 8, 2, 38, 2, 0]}], "datasetSize": 50}
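In the drillDown part of the response, ids, exactCounts and fuzzyCounts look like parallel arrays: the count of 38 at the position of "3" equals the exactSize of the three-bedroom search. Assuming that reading, the Python sketch below pairs each id with its counts. The helper name `drill_down_counts` is hypothetical, and the response fragment is copied from the example above.

```python
import json

response = json.loads("""
{ "drillDown": [ { "dimension": "bedroom",
                   "ids": ["studio", "1", "2", "3", "4", "5+"],
                   "exactCounts": [0, 8, 2, 38, 2, 0],
                   "fuzzyCounts": [0, 8, 2, 38, 2, 0] } ] }
""")


def drill_down_counts(response):
    """Map each dimension to {id: {"exact": n, "fuzzy": n}}, pairing the arrays by position."""
    return {
        entry["dimension"]: {
            item_id: {"exact": exact, "fuzzy": fuzzy}
            for item_id, exact, fuzzy in zip(
                entry["ids"], entry["exactCounts"], entry["fuzzyCounts"]
            )
        }
        for entry in response["drillDown"]
    }


counts = drill_down_counts(response)
for item_id, count in counts["bedroom"].items():
    print(item_id, count["exact"], count["fuzzy"])
```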
Rendering parameters¶
When a search is executed against the Discovery Search Engine, a list of IDs is returned. In a typical integration, these IDs are used to generate a second HTTP request to retrieve the data associated with each ID. The Discovery Search Engine’s renderParameters setting is designed to simplify the second HTTP request by generating a query string that can be used in an HTTP GET request.
The first example shows a request that specifies a ‘,’ delimiter to separate the list of itemIds.
{ "criteria": [ { "dimension": "price", "value": 200000 } ], "renderParameters": { "itemIdsDelimiter": "," }, "startIndex": 0, "pageSize": 5 }
In the response, the renderParameters value is a string in which the itemIds are listed, delimited by a comma, which is represented as %2C when URL-encoded.
{ "itemIds": ["1", "2", "3", "4", "5"], "exactMatches": [true, true, true, true, false], "relevanceValues": [1.0, 1.0, 1.0, 1.0, 0.95], "pageSize": 5, "currentPageSize": 5, "startIndex": 0, "exactSize": 4, "totalSize": 25, "datasetSize": 50, "renderParameters": "startIndex=0&pageSize=5&exactSize=4&totalSize=25&itemIds=1%2C2%2C3%2C4%2C5&exactMatches=11110"}
The second example shows a typical query that relies on the default delimiter, which is a space character.
{ "criteria": [ { "dimension": "price", "value": 200000 } ], "renderParameters": true, "startIndex": 0, "pageSize": 5 }
Once URL-encoded, the space character is rendered as +.
{ "itemIds": ["1", "2", "3", "4", "5"], "exactMatches": [true, true, true, true, false], "relevanceValues": [1.0, 1.0, 1.0, 1.0, 0.95], "pageSize": 5, "currentPageSize": 5, "startIndex": 0, "exactSize": 4, "totalSize": 25, "datasetSize": 50, "renderParameters": "startIndex=0&pageSize=5&exactSize=4&totalSize=25&itemIds=1+2+3+4+5&exactMatches=11110"}
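On the receiving side, the renderParameters string can be decoded with a standard query-string parser. The Python sketch below uses `urllib.parse` from the standard library. The helper name `parse_render_parameters` is hypothetical, and reading the exactMatches digits as one flag per item is an assumption based on how `11110` lines up with the exactMatches array in the responses.

```python
from urllib.parse import parse_qs, urlencode


def parse_render_parameters(render_parameters, delimiter=" "):
    """Decode a renderParameters query string into typed values."""
    fields = {key: values[0] for key, values in parse_qs(render_parameters).items()}
    return {
        "itemIds": fields["itemIds"].split(delimiter),
        # Assumed: one digit per item, "1" for an exact match.
        "exactMatches": [flag == "1" for flag in fields["exactMatches"]],
        "startIndex": int(fields["startIndex"]),
        "pageSize": int(fields["pageSize"]),
    }


comma = parse_render_parameters(
    "startIndex=0&pageSize=5&exactSize=4&totalSize=25"
    "&itemIds=1%2C2%2C3%2C4%2C5&exactMatches=11110",
    delimiter=",",
)
space = parse_render_parameters(
    "startIndex=0&pageSize=5&exactSize=4&totalSize=25"
    "&itemIds=1+2+3+4+5&exactMatches=11110"
)
print(comma["itemIds"], space["itemIds"])

# The same encoding rules, seen from the encoding side.
print(urlencode({"itemIds": "1,2,3,4,5"}))
print(urlencode({"itemIds": "1 2 3 4 5"}))
```

Both delimiters decode to the same list of IDs, so the choice mainly depends on what the receiving service expects.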