Getting changesets to the engine
The Discovery Search Engine has been designed as a push-based system. However, it is sometimes more convenient for it to pull the data.
If you would like to use a pull-based approach, refer to the Changeset Publisher chapter of the documentation.
If you would like to use a push-based approach, you will need to track the changes in your data and record which of them have been successfully pushed to the Discovery Search Engine. The changes tracked should include the creation, modification and deletion of records. Pushing is probably best handled with a persistent queue, so that failed pushes can be retried later.
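The tracking described above could be sketched as a small durable queue. This is a minimal illustration rather than part of the engine: the `ChangeQueue` class, its table layout and the three `kind` values are assumptions made for the example, and SQLite is only one possible backing store.

```python
import sqlite3


class ChangeQueue:
    """Hypothetical durable queue of record changes awaiting a push."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to survive restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS changes ("
            " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
            " item_id TEXT NOT NULL,"
            " kind TEXT NOT NULL"
            "   CHECK (kind IN ('create', 'modify', 'delete')),"
            " pushed INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def record(self, item_id, kind):
        """Remember that a record was created, modified or deleted."""
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO changes (item_id, kind) VALUES (?, ?)",
                (item_id, kind),
            )

    def pending(self):
        """Return unpushed changes as (seq, item_id, kind), oldest first."""
        cursor = self.db.execute(
            "SELECT seq, item_id, kind FROM changes"
            " WHERE pushed = 0 ORDER BY seq"
        )
        return cursor.fetchall()

    def mark_pushed(self, seqs):
        """Flag changes as pushed; call only after the engine accepts them."""
        with self.db:
            self.db.executemany(
                "UPDATE changes SET pushed = 1 WHERE seq = ?",
                [(seq,) for seq in seqs],
            )
```

A push job would read `pending()`, build and POST a changeset from those rows, and call `mark_pushed()` only when the POST succeeds, so that a failed push leaves its rows queued for the next attempt.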
Validating and pushing an initial changeset to the engine
The changeset publisher should generate an XML document that you can validate against the schema at http://transparensee.com/dtd/zebra/changeset-1.0.dtd.
$ xmllint --dtdvalid http://transparensee.com/dtd/zebra/changeset-1.0.dtd \
    --noout changeset.xml
If the document validates, push an initial changeset to the engine at its changeset URL, ensuring that the Content-Type header is text/xml. Note that curl does not stream data and may fail when POSTing large changesets. The engine returns the id of the changeset.
$ curl --header 'Content-type: text/xml' --data-binary @changeset.xml \
    http://localhost:8090/ws/changeset
86d9cc25c9df185898e0affb8385b978-0
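For large changesets, one alternative to curl is to POST the file from a script that hands the HTTP client an open file object, so the body is read in blocks instead of being loaded into memory first. The following is a sketch using only the Python standard library. The function names are hypothetical, the URL is the one from the curl example, and sending an explicit Content-Length (rather than chunked encoding) is an assumption about what the engine expects.

```python
import os
import urllib.request


def build_push_request(fileobj, size, url):
    """Build a POST request whose body is read from an open binary file."""
    return urllib.request.Request(
        url,
        data=fileobj,
        method="POST",
        headers={
            "Content-Type": "text/xml",
            # An explicit length avoids chunked transfer encoding.
            "Content-Length": str(size),
        },
    )


def push_changeset(path, url="http://localhost:8090/ws/changeset"):
    """POST the changeset file and return the response body (the id)."""
    with open(path, "rb") as fileobj:
        request = build_push_request(fileobj, os.path.getsize(path), url)
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8").strip()
```

Calling `push_changeset("changeset.xml")` against a running engine should return the changeset id as a string. `urlopen` raises `urllib.error.HTTPError` on an error status, which a caller can catch to leave the changes queued for a retry.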
Changeset document structure
A changeset document consists of a set of actions to add and remove items and add and remove properties for each of those items. Items are the uniquely defined objects that users of your system are searching for, while properties are the characteristics of the objects by which users will perform searches. Properties usually map well to columns in a database and can be named similarly for clarity.
Definition of actions:
- set-item: Creates the item if it does not already exist. Removes any existing properties of the item, and sets the properties specified.
- add-item: Adds the item. Ignored if the item already exists.
- add-to-item: Adds the properties to the item.
- remove-item: Deletes the item. Ignored if the item doesn’t exist.
- remove-from-item: Deletes the properties from the item. Properties that don’t exist are ignored.
The order of actions is important: they are guaranteed to be executed in the order in which they are defined.
An item cannot be deleted then added (or added then deleted) within the same changeset – this will cause a conflict on the server due to transaction semantics.
Attempts to remove an item that doesn’t exist or add properties to an item that doesn’t exist will fail silently.
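Because the add/delete conflict is only reported by the server, it can be useful to check a changeset before pushing it. The sketch below is a hypothetical client-side check, not part of the engine. It assumes that both add-item and set-item count as "adding" an item, which the text above does not state explicitly for set-item.

```python
import xml.etree.ElementTree as ET

# Assumption: set-item is treated as an add because it can create the item.
ADD_ACTIONS = {"add-item", "set-item"}
REMOVE_ACTIONS = {"remove-item"}


def find_add_remove_conflicts(changeset_xml):
    """Return the sorted ids that are both added and removed."""
    root = ET.fromstring(changeset_xml)
    added, removed = set(), set()
    for action in root:  # direct children of <changeset>, in document order
        item_id = action.get("id")
        if action.tag in ADD_ACTIONS:
            added.add(item_id)
        elif action.tag in REMOVE_ACTIONS:
            removed.add(item_id)
    return sorted(added & removed)
```

An empty result means no item id appears in both an add and a remove action. Any ids returned should be split across separate changesets before pushing.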
Example changeset:
<?xml version="1.0"?>
<changeset>
  <set-item id="007">
    <properties>
      <struct>
        <entry name="gender"><int>1</int></entry>
        <entry name="religion"><int>5</int></entry>
        <entry name="weight"><int>140</int></entry>
        <entry name="height"><real>61.5</real></entry>
        <entry name="bio"><string>Stand up man</string></entry>
        <entry name="ethnicity">
          <array>
            <element><int>1</int></element>
            <element><int>5</int></element>
          </array>
        </entry>
      </struct>
    </properties>
  </set-item>
  <add-item id="001"/>
  <remove-item id="002"/>
  <add-to-item id="001">
    <properties>
      <struct>
        <entry name="bio"><string>Nice guy</string></entry>
        <entry name="religion"><int>4</int></entry>
        <entry name="ethnicity">
          <array>
            <element><int>2</int></element>
          </array>
        </entry>
      </struct>
    </properties>
  </add-to-item>
  <remove-from-item id="003">
    <all/>
  </remove-from-item>
  <remove-from-item id="003">
    <properties>
      <struct>
        <entry name="bio"/>
      </struct>
    </properties>
  </remove-from-item>
</changeset>
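A changeset like the example can also be generated from application data rather than written by hand. The sketch below builds a set-item action from a Python dictionary. The helper names are hypothetical, and the type mapping (int, real, string, array) covers only the value elements that appear in the example; other types are rejected rather than guessed.

```python
import xml.etree.ElementTree as ET


def value_element(value):
    """Map a Python value to the value elements seen in the example."""
    if isinstance(value, bool):
        # bool is a subclass of int; the example shows no boolean element.
        raise TypeError("bool values are not covered by this sketch")
    if isinstance(value, int):
        element = ET.Element("int")
        element.text = str(value)
    elif isinstance(value, float):
        element = ET.Element("real")
        element.text = repr(value)
    elif isinstance(value, str):
        element = ET.Element("string")
        element.text = value
    elif isinstance(value, (list, tuple)):
        element = ET.Element("array")
        for member in value:
            ET.SubElement(element, "element").append(value_element(member))
    else:
        raise TypeError(f"unsupported property type: {type(value).__name__}")
    return element


def set_item(changeset, item_id, properties):
    """Append a <set-item> action carrying the given properties."""
    item = ET.SubElement(changeset, "set-item", id=item_id)
    struct = ET.SubElement(ET.SubElement(item, "properties"), "struct")
    for name, value in properties.items():
        ET.SubElement(struct, "entry", name=name).append(value_element(value))
    return item


changeset = ET.Element("changeset")
set_item(changeset, "007", {"weight": 140, "height": 61.5, "ethnicity": [1, 5]})
xml_text = ET.tostring(changeset, encoding="unicode")
```

Here `xml_text` holds the serialized changeset without an XML declaration. A document produced this way should still be validated against the DTD before it is pushed.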