Engine Configuration Scenarios

The discovery engine supports two general types of server configuration, depending on customer needs.

  • Single Engine. For customers who do not need backup redundancy, for example in a development environment, a single-engine setup provides the most flexibility and control because the same instance performs all engine functions.
  • Data Distribution. This configuration scenario allows easy horizontal scaling by adding cloned copies of a query engine. It works best with a minimum of two engine instances, as described in more detail below. It is also the easiest configuration to use for a production system, because capacity can be increased easily as traffic grows.

Single Engine

There are no special instructions to configure a single engine instance since a single engine can perform all the required tasks. The discovery.properties file includes all of the necessary configuration information.

Data Distribution

There are two server types in this configuration: Data Master and Data Slave. It is important to understand that any engine instance can support both data distribution and query processing; however, in this model it is practical to dedicate each engine instance to a single role.

Data Master Configuration

The Data Master’s sole role is to process changesets and dimensions and to make them available to any engine that requests them. Any engine instance, regardless of the selected configuration, can serve as a Data Master because this feature is built into the engine’s core functionality.

If you plan to have the Data Master instance also respond to queries, you can configure this in discovery.properties. However, Transparensee recommends that the Data Master not be configured to respond to queries in a Data Distribution configuration.

The following discovery.properties configures an instance that acts only as a Data Master and does not respond to queries. Refer to Dimension, Changeset, Feed & Checkpoints Configuration for a full reference to the discovery.properties settings that apply to data and query distribution.

port = 8090
total-partitions = 0
# JVM Memory requirements for a Data Master are minimal
jvm.memory = 256
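To make the Data Master example concrete, the following Python sketch reads settings in the simple key = value form shown above and flags an instance as master-only. It assumes, based only on this example, that total-partitions = 0 is what keeps the instance from answering queries. The helper names parse_properties and is_master_only are hypothetical and not part of Discovery, and the parser covers only the subset of the properties format used on this page (no line continuations, escapes, or ':' separators).

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple 'key = value' lines, skipping blank lines and '#' comments.

    Handles only the subset of the properties format used in these examples.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as URLs stay intact
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def is_master_only(props: dict[str, str]) -> bool:
    """Assumption: total-partitions = 0 means the instance serves no queries."""
    return props.get("total-partitions") == "0"


MASTER_CONFIG = """\
port = 8090
total-partitions = 0
# JVM Memory requirements for a Data Master are minimal
jvm.memory = 256
"""

props = parse_properties(MASTER_CONFIG)
print(props)                  # {'port': '8090', 'total-partitions': '0', 'jvm.memory': '256'}
print(is_master_only(props))  # True
```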

Data Slave Configuration

The Data Slave’s primary role is to respond to query requests. Its secondary role is to periodically query the Data Master for the most recent changeset data. If you have multiple Data Slaves, each one pulls its changesets from the Data Master independently of the other slaves.

The Data Master does not need to be restarted for a Data Slave to connect to it and receive changesets.

Queries are distributed across the Data Slaves by your load balancer of choice. Discovery search engines in the Data Distribution configuration do not load balance automatically.
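A dedicated load balancer normally fills this role. Purely to illustrate the round-robin policy that many balancers use by default, the following Python sketch rotates requests across a pool of Data Slaves. The slave hostnames and the RoundRobinBalancer class are hypothetical; the Data Master is left out of the pool, in line with the recommendation that it not answer queries.

```python
from itertools import cycle
from typing import Iterable, Iterator


class RoundRobinBalancer:
    """Hands out Data Slave base URLs in strict rotation (illustrative only)."""

    def __init__(self, slaves: Iterable[str]) -> None:
        pool = list(slaves)
        if not pool:
            raise ValueError("at least one Data Slave is required")
        # cycle() repeats the pool endlessly: slave1, slave2, slave1, ...
        self._slaves: Iterator[str] = cycle(pool)

    def next_slave(self) -> str:
        return next(self._slaves)


# Hypothetical Data Slave addresses; the Data Master is deliberately absent
balancer = RoundRobinBalancer([
    "http://slave1.example.com:8091",
    "http://slave2.example.com:8091",
])
picks = [balancer.next_slave() for _ in range(3)]
print(picks)
```

A production balancer would typically also health-check each slave and take unresponsive ones out of rotation.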

The following discovery.properties configures a Data Slave that responds to queries and checks the Data Master defined in the previous example for new changeset data and dimensions every 2 minutes (master-interval is given in milliseconds, so 120000 ms = 2 minutes). Refer to Dimension, Changeset, Feed & Checkpoints Configuration for a full reference to the discovery.properties settings that apply to data and query distribution.

port = 8091
jvm.memory = 512
master = http://example.com:8090
master-interval = 120000
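Each Data Slave runs its own pull cycle against the master URL every master-interval milliseconds. The following Python sketch models that cycle under stated assumptions: fetch_changesets is a hypothetical placeholder for however the engine actually requests changesets and dimensions (the request protocol is not described here), and the rounds parameter exists only so the loop terminates for illustration; a real slave keeps polling.

```python
import time
from typing import Callable


def poll_master(
    master: str,
    master_interval_ms: int,
    fetch_changesets: Callable[[str], None],
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Pull from the Data Master, waiting master_interval_ms between pulls.

    fetch_changesets is a hypothetical stand-in, not a real Discovery API.
    """
    interval_s = master_interval_ms / 1000  # 120000 ms -> 120.0 s (2 minutes)
    for i in range(rounds):
        fetch_changesets(master)
        if i < rounds - 1:  # no wait needed after the final pull
            sleep(interval_s)


# Record calls instead of touching the network or actually sleeping
calls: list[str] = []
sleeps: list[float] = []
poll_master("http://example.com:8090", 120000, calls.append, rounds=3, sleep=sleeps.append)
print(len(calls), sleeps)  # 3 [120.0, 120.0]
```

Because the sketch sleeps a fixed interval after each fetch, the effective period is the interval plus the fetch time; that small drift is acceptable for an illustration.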