Web Services¶
The Discovery Search Engine supports a variety of web service entry points.
Changesets¶
The Changesets web service supports listing stored changesets, posting new changesets and exporting existing ones.
Listing Stored Changesets¶
List stored changesets. URL Parameters: full - returns extended history past the last snapshot.
http://example.com:8090/ws/changeset [GET or HEAD]
Example Request¶
GET /ws/changeset HTTP/1.1
Example Response¶
HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
<changesets total="1">
<changeset id="a873e97062faebe9a4eb4402e6952188" snapshot="true"
date="Wed, 13 Oct 2010 20:36:31 GMT" md5="ed37baa943969c6fadabed9806f58fb9"
href="http://localhost:8090/ws/changeset/a873e97062faebe9a4eb4402e6952188"
length="67291575" rawLength="14785239" />
</changesets>
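A client can read this listing with any XML parser. The following is a minimal Python sketch, assuming the response body has the shape shown in the example above. The function name parse_changeset_listing is illustrative and not part of the engine's API.

```python
import xml.etree.ElementTree as ET


def parse_changeset_listing(xml_text):
    """Return one dict per <changeset> element in a /ws/changeset listing."""
    root = ET.fromstring(xml_text)
    changesets = []
    for node in root.findall("changeset"):
        changesets.append({
            "id": node.get("id"),
            # Assumption: a missing snapshot attribute means "not a snapshot".
            "snapshot": node.get("snapshot") == "true",
            "date": node.get("date"),
            "md5": node.get("md5"),
            "href": node.get("href"),
            # Assumption: length and rawLength are always present, as in the example.
            "length": int(node.get("length")),
            "rawLength": int(node.get("rawLength")),
        })
    return changesets


# Body copied from the example response above.
SAMPLE_LISTING = """<changesets total="1">
<changeset id="a873e97062faebe9a4eb4402e6952188" snapshot="true"
date="Wed, 13 Oct 2010 20:36:31 GMT" md5="ed37baa943969c6fadabed9806f58fb9"
href="http://localhost:8090/ws/changeset/a873e97062faebe9a4eb4402e6952188"
length="67291575" rawLength="14785239" />
</changesets>"""

listing = parse_changeset_listing(SAMPLE_LISTING)
```

Each entry's href can then be fetched to export that changeset.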
Exporting a Stored Changeset¶
Retrieve the contents of a changeset by identifier id.
http://example.com:8090/ws/changeset/[id] [GET or HEAD]
Parameters¶
id
- (Required.) Identifier of changeset to retrieve
Example Request¶
GET /ws/changeset/123 HTTP/1.1
Example Response¶
HTTP/1.1 200 OK
Content-Type: text/xml
[Changeset follows in response body]
Apply/Store Changeset¶
Apply and store a new changeset. The input is a changeset. In the response, the Location header is set to the URL of the changeset, and the body lists the changeset identifier.
http://example.com:8090/ws/changeset [POST]
Query Parameters¶
type
- Specifies the type of changeset to post: one of delta (default), snapshot, full, bulk or reset.
Example Request¶
POST /ws/changeset HTTP/1.1
Content-Type: text/xml
[Changeset follows in request body]
Example Response¶
HTTP/1.1 201 Created
Content-Type: text/plain
3b9897903f544040ba40e48b82d01310
Status Codes¶
201
- Changeset was created
204
- Changeset was empty (as of version 2.8.5)
400
- Changeset was invalid
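The type parameter and the status codes can be handled on the client side roughly as follows. This is a sketch only: it assumes the POST goes to /ws/changeset as in the example request above, and the helper names changeset_post_url and describe_post_status are hypothetical.

```python
from urllib.parse import urlencode

# Changeset types listed under Query Parameters above.
VALID_TYPES = ("delta", "snapshot", "full", "bulk", "reset")


def changeset_post_url(base_url, changeset_type="delta"):
    """Build the URL for posting a changeset of the given type."""
    if changeset_type not in VALID_TYPES:
        raise ValueError("unknown changeset type: %r" % (changeset_type,))
    query = urlencode({"type": changeset_type})
    return base_url.rstrip("/") + "/ws/changeset?" + query


def describe_post_status(status):
    """Map the documented status codes to a short outcome label."""
    outcomes = {201: "created", 204: "empty", 400: "invalid"}
    # Any other code is not described in this document.
    return outcomes.get(status, "unexpected")
```

On a 201 response the changeset identifier can be read from the body, or its URL from the Location header.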
Dimensions¶
The Dimensions web service supports importing a new dimensions document or exporting the existing one. The input is a dimensions document when importing.
http://example.com:8090/ws/dimensions [GET or POST]
Export the Dimensions Document¶
Retrieve the current defined dimensions. The output is a dimension definition document.
http://example.com:8090/ws/dimensions [GET]
Example Usage¶
curl http://example.com:8090/ws/dimensions
Post a new Dimensions Document¶
Set the current defined dimensions. The input is a dimensions specification document. The content type is text/xml.
http://example.com:8090/ws/dimensions [POST]
Example Usage¶
curl --header 'Content-Type: text/xml' --data-binary @dimensions.xml \
http://example.com:8090/ws/dimensions
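The same POST can be prepared from Python with the standard library. The sketch below only builds the request object; the function name build_dimensions_post is illustrative, and the caller is assumed to supply the dimensions document as bytes.

```python
import urllib.request


def build_dimensions_post(base_url, dimensions_xml):
    """Prepare (but do not send) a POST of a dimensions document.

    dimensions_xml must be bytes, e.g. the contents of dimensions.xml.
    """
    return urllib.request.Request(
        base_url.rstrip("/") + "/ws/dimensions",
        data=dimensions_xml,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it, which is the equivalent of the curl command above.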
Item¶
Fetches changeset property data for the requested item. In contrast to /ws/values, /ws/item returns the original changeset property data (which may be different than what was actually indexed).
http://example.com:8090/ws/item/[id] [GET]
Parameters¶
id
- (Required.) Identifier of item to retrieve
Example Request¶
GET /ws/item/123 HTTP/1.1
Example Response¶
HTTP/1.1 200 OK
Content-Type: application/json
{
"_id": "123",
"genres": "Science Fiction,Action,Adventure",
"release_year": "2009",
"MPAA_rating": "PG-13",
"duration": "127",
"title": "Star Trek",
"synopsis": "Boldly going where no one has gone before."
}
DidYouMean¶
Fetches suggested query values for the requested query string. Did You Mean? suggestions can also be requested via a search criterion.
See: DidYouMean Criterion for a description of the Did You Mean? request API, including additional information, data types, and default and valid values.
http://example.com:8090/ws/didyoumean/[dimension] [GET or POST]
Parameters¶
query
- (Required.) Query string to process for suggestions
maxSuggestions
- Maximum number of suggestions to return (integer)
distanceAlgorithm
- Word distance algorithm to use (string)
morePopular
- Uses indexed term relevance to identify the best suggestions (boolean)
highlighting
- Enables or disables highlighting of the suggested queries
highlighting.preTemplate
- If highlighting is enabled, specifies the string to place before the replaced query term.
highlighting.postTemplate
- If highlighting is enabled, specifies the string to place after the replaced query term.
escapeHtml
- Sanitizes any HTML in the original query and suggestions (boolean)
Example Request¶
GET /ws/didyoumean/freetext?query=st+marys&distanceAlgorithm=levenshtein
&highlighting.preTemplate=%3Ci%3E&highlighting.postTemplate=%3C/i%3E HTTP/1.1
Example Response¶
See: DidYouMean for a description of the Did You Mean? response JSON format.
HTTP/1.1 200 OK
Content-Type: application/json
{
"tokenCount": 2,
"uncertainCount": 1,
"query": {
"value": "st marys",
"label": "<i>st</i> <i>marys</i>"
},
"suggestions": [
{
"value": "st mary",
"label": "st <i>mary</i>"
},
{
"value": "st maria",
"label": "st <i>maria</i>"
},
{
"value": "st mark",
"label": "st <i>mark</i>"
},
{
"value": "st mar",
"label": "st <i>mar</i>"
}
]
}
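Because the templates contain characters such as < and >, the query string has to be URL-encoded. The following Python sketch builds such a URL and pulls the suggested values out of a response of the shape shown above. The helper names didyoumean_url and suggestion_values are illustrative only.

```python
import json
from urllib.parse import urlencode


def didyoumean_url(base_url, dimension, query, options=None):
    """Build a /ws/didyoumean GET URL with a properly encoded query string.

    options is a dict of the optional parameters, keyed by the names
    listed above (for example "highlighting.preTemplate").
    """
    params = {"query": query}
    params.update(options or {})
    return "%s/ws/didyoumean/%s?%s" % (
        base_url.rstrip("/"), dimension, urlencode(params))


def suggestion_values(response_text):
    """Return the plain suggested query strings from a JSON response."""
    payload = json.loads(response_text)
    return [entry["value"] for entry in payload.get("suggestions", [])]
```

Note that urlencode also escapes the slash in </i> as %2F, which decodes to the same value as the literal slash used in the example request.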
Items¶
Fetches changeset property data for one or more items, returning the results in a JSON array in the order in which the items were specified in the request.
If any of the requested items are not found, the response will indicate which items are missing.
New in version 2.8.3.
The property data fetched from the changeset database is cached.
http://example.com:8090/ws/items [GET or POST]
Parameters (GET)¶
items
- (Required.) Delimited list of identifiers of items to retrieve.
delimiter
- The delimiter used to separate the ids in the request. Default is a comma (,).
properties
- Changeset property ids (names) for which data should be returned. Default is all properties.
Example Request (GET)¶
GET /ws/items?items=123,missing_id&delimiter=, HTTP/1.1
Example Request (POST)¶
The POST method takes a JSON object as its input. The input defines the identifiers and (optionally) properties to return.
curl --header 'Content-Type: application/json' \
--data-binary @request.json \
http://example.com:8090/ws/items
The format of the JSON request object (request.json above is a placeholder file name) is:
{
"ids": [
"123",
"missing_id"
],
"properties":
[
"_id",
"genres",
"release_year",
"MPAA_rating",
"duration",
"title",
"synopsis"
]
}
Example Response (POST and GET)¶
If a requested id is not found, its entry in the response contains only the “_id” and a key “_exists” with the value false.
HTTP/1.1 200 OK
Content-Type: application/json
[
{
"_id": "123",
"genres": "Science Fiction,Action,Adventure",
"release_year": "2009",
"MPAA_rating": "PG-13",
"duration": "127",
"title": "Star Trek",
"synopsis": "Boldly going where no one has gone before."
},
{
"_id": "missing_id",
"_exists": false
}
]
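A client typically needs to build the POST body and then separate the items that were found from those that were not. The sketch below shows one way to do both in Python, assuming the request and response shapes shown above. The function names are hypothetical.

```python
import json


def build_items_request(ids, properties=None):
    """Serialize the JSON request object for POST /ws/items."""
    body = {"ids": list(ids)}
    if properties is not None:
        # Omitting "properties" requests all properties (the documented default).
        body["properties"] = list(properties)
    return json.dumps(body)


def split_found_and_missing(response_text):
    """Return (found_items, missing_ids) from a /ws/items response."""
    found, missing = [], []
    for item in json.loads(response_text):
        if item.get("_exists", True) is False:
            missing.append(item["_id"])
        else:
            found.append(item)
    return found, missing
```

Since the response preserves the order of the request, found items remain in the order in which they were asked for.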
Values¶
NOTE: This web service is provided as-is for development purposes only. Transparensee reserves the right to alter this API at any time in the future.
Fetches indexed values for the requested item. In contrast to /ws/item, /ws/values returns the indexed data (which may be different than the original changeset property data).
http://example.com:8090/ws/values/[id] [GET]
Parameters¶
id
- (Required.) Identifier of item to retrieve
Example Request¶
GET /ws/values/123 HTTP/1.1
Example Response¶
In this example, there are four dimensions defined: “content-type,” “genres,” “release_year” and “title.”
HTTP/1.1 200 OK
Content-Type: application/json
{
"id": "123",
"content-type": "Movie",
"genres": ["Action","Adventure","Science Fiction"],
"release_year": 2009,
"title":"star trek"
}
Statistics¶
The Discovery Search Engine exposes a number of data points of interest through a web service at /ws/statistics. The statistics are transient: all counters reset when the engine stops.
http://example.com:8090/ws/statistics [GET]
We provide a munin plugin that can be used to generate graphs based on the exposed data.
The project page is hosted on GitHub at:
http://github.com/t11e/discovery_munin
You can download the latest source as a zip file from:
http://github.com/t11e/discovery_munin/releases
Example Usage¶
To see the available data points:
$ curl http://localhost:8090/ws/statistics
http
changeset
item
checkpoint
changeset.apply
index
index.query.tree
index.query.long
index.query.keyword
index.query.groupBy
index.query.integer
index.query.geoloc
index.query.text
index.query.time
index.query.double
query
json
xmlrpc
To get the data itself, issue a GET to the URL with /fetch/${datapoint} appended, where datapoint is a space-delimited list of data point names as output by /ws/statistics (written with + in the URL, as in item+http).
Get the HTTP statistics:
$ curl http://localhost:8090/ws/statistics/fetch/http
http.time.count: 2663
http.time.mean: 88.51032669921146
http.time.min: 0
http.time.max: 34629
http.time.variance: 480464.4911592846
http.time.stddev: 693.1554595898994
http.time.sum: 235703
http.uncaught.io: 0
http.uncaught.runtime: 0
http.uncaught.error: 0
Get the item statistics:
$ curl http://localhost:8090/ws/statistics/fetch/item
item.count: 2381
item.errors: 0
item.disk: 8518018
Get both the HTTP and item statistics:
$ curl http://localhost:8090/ws/statistics/fetch/item+http
item.count: 2381
item.errors: 0
item.disk: 8518018
http.time.count: 2668
http.time.mean: 88.34632683658175
http.time.min: 0
http.time.max: 34629
http.time.variance: 479578.0629898773
http.time.stddev: 692.515749272085
http.time.sum: 235708
http.uncaught.io: 0
http.uncaught.runtime: 0
http.uncaught.error: 0
The returned data expands as needed, so if an engine has only serviced one query it will look like this:
$ curl http://localhost:8090/ws/statistics/fetch/query
query.regular.count: 1
query.regular.size.mean: 22555
query.regular.size.sum: 22555
query.regular.time.mean: 91
query.regular.time.sum: 91
If it has serviced two queries then you’ll get more information:
$ curl http://localhost:8090/ws/statistics/fetch/query
query.regular.count: 2
query.regular.size.mean: 22556.0
query.regular.size.min: 22555
query.regular.size.max: 22557
query.regular.size.variance: 2.0
query.regular.size.stddev: 1.4142135623730951
query.regular.size.sum: 45112
query.regular.time.mean: 70.0
query.regular.time.min: 49
query.regular.time.max: 91
query.regular.time.variance: 882.0
query.regular.time.stddev: 29.698484809834994
query.regular.time.sum: 140
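The fetch output is plain text with one name: value pair per line, so it is easy to load into a dictionary. The sketch below is one way to do that in Python; parse_statistics is an illustrative name. Because the set of fields expands as more samples are recorded, callers should look fields up with a default rather than assume they exist.

```python
def parse_statistics(text):
    """Parse /ws/statistics/fetch output into a {name: float} dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skip blank lines and anything that is not "name: value"
        stats[name.strip()] = float(value)
    return stats


# Output copied from the single-query example above.
ONE_QUERY = """query.regular.count: 1
query.regular.size.mean: 22555
query.regular.size.sum: 22555
query.regular.time.mean: 91
query.regular.time.sum: 91"""

one_query_stats = parse_statistics(ONE_QUERY)
# Fields such as min, max and stddev are absent until more queries are served.
stddev = one_query_stats.get("query.regular.time.stddev")
```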
Field details¶
The most interesting fields are described here.
Total time taken for all queries returning non-empty results
- query.regular.time.sum
Total time taken for queries returning empty results
- query.empty.time.sum
Total number of invalid/failed/successful XMLRPC calls
- xmlrpc.invalid.count
- xmlrpc.failed.count
- xmlrpc.success.count
Total time taken for invalid/failed/successful XMLRPC calls
- xmlrpc.invalid.sum
- xmlrpc.failed.sum
- xmlrpc.success.sum
Total number of invalid/failed/successful JSON calls
- json.invalid.count
- json.failed.count
- json.success.count
Total time taken for invalid/failed/successful JSON calls
- json.invalid.sum
- json.failed.sum
- json.success.sum
Number of items in the current partition
- index.items
Number of indexes
- index.count
Number of items in the dataset
- item.count
Size of the dataset on disk (db/items directory)
- item.disk
Number of created changesets by type:
- changeset.reset.size.count
- changeset.delta.size.count
- changeset.snapshot.size.count
- changeset.bulk.size.count
- changeset.checkpoint.size.count
Total uncompressed size:
- changeset.reset.size.sum
- changeset.delta.size.sum
- changeset.snapshot.size.sum
- changeset.bulk.size.sum
- changeset.checkpoint.size.sum
Total compressed size:
- changeset.reset.compressed.sum
- changeset.delta.compressed.sum
- changeset.snapshot.compressed.sum
- changeset.bulk.compressed.sum
- changeset.checkpoint.compressed.sum
Total number of applied changesets (those that are written to the DB file)
- changeset.apply.count
Breakdown of item actions across the changeset applications
- changeset.apply.created.sum
- changeset.apply.modified.sum
- changeset.apply.deleted.sum
Total time taken generating checkpoints:
- checkpoint.time.sum
Total number of checkpoints generated:
- checkpoint.time.count
Total number of HTTP requests served:
- http.time.count
Total time taken to serve them:
- http.time.sum
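Several useful figures can be derived from these sum and count fields. The Python sketch below computes a mean time and a changeset compression ratio from a dictionary of already-parsed statistics. The function names are hypothetical, and mean_time applies only to groups that expose both .time.sum and .time.count, such as http and checkpoint.

```python
def mean_time(stats, prefix):
    """Average time per event, e.g. mean_time(stats, "http")."""
    count = stats.get(prefix + ".time.count", 0)
    if not count:
        return None  # nothing recorded yet
    return stats[prefix + ".time.sum"] / count


def compression_ratio(stats, changeset_type):
    """Compressed size divided by uncompressed size for one changeset type."""
    raw = stats.get("changeset.%s.size.sum" % changeset_type, 0)
    if not raw:
        return None  # no changesets of this type recorded
    return stats["changeset.%s.compressed.sum" % changeset_type] / raw
```

With the HTTP figures from the earlier example (a sum of 235703 over 2663 requests), mean_time reproduces the reported http.time.mean.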
Checkpoint¶
The checkpoint web service will create a checkpoint on demand. This web service performs an action and takes no parameters.
New in version 2.8.3.
Queryable¶
The Queryable web service can be used to determine whether the engine is in a state in which it can respond to queries, i.e. whether it is “queryable.”
The service is meant to be used by smart load-balancers (such as Amazon’s elastic load balancer), though it could be used by other applications interacting with the engine.
When the system is able to respond to queries, it will return status code 204 (No content). When it is not able to respond with query results, it will return status code 503 (Service Unavailable).
While the engine is booting up and populating its indexes, the queryable web service will report failure until the indexes are fully populated, unless the URL contains the whilePopulating URL parameter.
This web service is also available on the path /queryable for compatibility with older releases. Release 3.8 introduced the current path of /ws/queryable.
Optional URL Parameters¶
The following options can be specified as query parameters on the URL:
success
- An integer status code value that determines the status code returned if the engine is “queryable.” For example, if using an Amazon elastic load balancer that does not support status code 204, success=200 might be an appropriate option.
Default: 204 (No Content)
error
- An integer status code value that determines the status code returned if the engine is not “queryable”.
Default: 503 (Service Unavailable)
New in version 3.8.
retryAfter
- Optional value for the response header Retry-After for when the engine is not “queryable”.
Default: 30
New in version 3.8.
whilePopulating
- A string boolean that indicates whether to reply with success if the engine is starting up but has not yet fully populated its indexes.
Default: false
New in version 2.8.3.
Example Usage¶
To check whether the engine can respond to queries while it is populating indexes on startup:
$ curl http://example.com:8090/ws/queryable?whilePopulating=true
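A health-check client needs to build the URL with whichever options it uses and then compare the returned status code with the expected success code. The following Python sketch shows that logic without performing any network call. The helper names queryable_url and is_queryable are illustrative.

```python
from urllib.parse import urlencode


def queryable_url(base_url, success=None, error=None,
                  retry_after=None, while_populating=None):
    """Build a /ws/queryable URL; options left as None are omitted."""
    params = {}
    if success is not None:
        params["success"] = success
    if error is not None:
        params["error"] = error
    if retry_after is not None:
        params["retryAfter"] = retry_after
    if while_populating is not None:
        # The parameter is documented as a string boolean.
        params["whilePopulating"] = "true" if while_populating else "false"
    url = base_url.rstrip("/") + "/ws/queryable"
    if params:
        url += "?" + urlencode(params)
    return url


def is_queryable(status, success=204):
    """True if the returned status matches the expected success code."""
    return status == success
```

If the success parameter is overridden in the URL, the same value should be passed to is_queryable.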
System State¶
http://example.com:8090/ws/info/system-state [GET]
Example Response¶
<system-state>
<running>
<true />
</running>
<queryable>
<true />
</queryable>
</system-state>
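The response can be turned into a simple mapping of flag names to booleans. The Python sketch below assumes that each flag element wraps a single <true /> child as shown above, and that a false state is reported with a <false /> child, which this document does not show. The function name parse_system_state is illustrative.

```python
import xml.etree.ElementTree as ET


def parse_system_state(xml_text):
    """Return {"running": bool, "queryable": bool, ...} from the response."""
    root = ET.fromstring(xml_text)
    state = {}
    for flag in root:
        children = list(flag)
        # Assumption: anything other than a <true /> child counts as false.
        state[flag.tag] = bool(children) and children[0].tag == "true"
    return state
```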