Engine and Data Tool Installation for UNIX Variants

This document covers the installation of the Transparensee Discovery Search Engine and Discovery Data Tool on UNIX variants. It assumes a basic understanding of UNIX system administration concepts and tools. Command syntax may vary slightly depending on your UNIX variant.

Note that the Discovery Data Tool is an optional, separate component. It may be deployed alongside the Discovery Engine on the same server, or deployed separately. To ease maintenance and configuration, the Discovery Engine and the Discovery Data Tool use similar directory layouts, start/stop scripts, and configuration files. Both are set up to run as UNIX daemons in the same manner, and both can be upgraded in a similar fashion.

If you are upgrading an existing engine, familiarize yourself with this section and then refer to Upgrading an Engine for UNIX Variants for specific instructions, best practices and tips.

Hardware Requirements

We require the following minimum hardware to run the Discovery Engine:

2.0 GHz Athlon
1 GB RAM
40 GB HD

Software Requirements

The following software needs to be installed to run the engine or data tool:

Oracle Java SE Runtime Environment 7.

Recommended Installation Directory Hierarchy

Transparensee recommends a simple directory structure whose top-level directory we call the home directory. The home directory contains a read-only archives subdirectory for the compressed release files, a read-only releases subdirectory for the uncompressed binaries, and a working engines subdirectory with one directory for each running instance of the engine or data tool. This structure makes it easier to manage server instances at every stage of your development life cycle and simplifies upgrading an engine or data tool instance.
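The following sketch creates that layout. The create_discovery_home function is a hypothetical helper written for this example, not part of the Discovery distribution; the examples in this document use ~discovery, the discovery user's home directory, as the home directory.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with Discovery): create the recommended
# top-level layout (archives, releases and engines) under a home directory.
create_discovery_home() {
  local home_dir=$1
  if [ -z "$home_dir" ]; then
    echo >&2 "usage: create_discovery_home HOME_DIR"
    return 1
  fi
  mkdir -p "$home_dir/archives" "$home_dir/releases" "$home_dir/engines"
}

# Example, run as the discovery user:
# create_discovery_home ~discovery
```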

Archives Directory

The recommended directory structure is designed so that the archives directory can be considered read-only. It should contain one compressed file for each version of the Discovery Search Engine and Discovery Data Tool you receive, stored under the name it was delivered with.

For example, engine version 2.8.3 would be delivered as discovery-2.8.3.zip. Copy this file to archives/discovery-2.8.3.zip.

Follow the same procedure for subsequent version upgrades you receive from Transparensee.
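As a sketch of this step, the hypothetical archive_release function below copies a delivered archive into archives/ under its original name and removes write permission from the copy. It assumes the home directory layout described above and refuses to overwrite an archive that is already present.

```shell
#!/bin/bash
# Hypothetical helper: copy a delivered release archive into archives/
# under its original name, then remove write permission from the copy so
# the archives directory can be treated as read-only.
archive_release() {
  local zip_file=$1 home_dir=$2
  local dest
  dest="$home_dir/archives/$(basename "$zip_file")"
  if [ ! -f "$zip_file" ]; then
    echo >&2 "No such archive: $zip_file"
    return 1
  fi
  if [ -e "$dest" ]; then
    echo >&2 "Already archived: $dest"
    return 1
  fi
  mkdir -p "$home_dir/archives" || return 1
  cp "$zip_file" "$dest" && chmod a-w "$dest"
}

# Example:
# archive_release discovery-2.8.3.zip ~discovery
```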

Releases Directory

The recommended directory structure is designed so that the releases directory, like the archives directory, can be considered read-only.

It should contain one directory for each version of the Discovery Search Engine and Discovery Data Tool, named after the compressed file (without the .zip extension) and holding the uncompressed, ready-to-run binaries.

For example, engine version 2.8.3 would be delivered as discovery-2.8.3.zip. When you unzip this file under releases it will unpack into releases/discovery-2.8.3/. Similarly, data tool version 1.10 would be delivered as discovery_datatool-1.10.zip, and when you unzip this file under releases it will unpack into releases/discovery_datatool-1.10/.

Follow the same procedure for subsequent version upgrades you receive from Transparensee.

Each engine release directory contains a bin directory. The discovery shell script in that directory starts and stops the engine from that release.

Each data tool release directory contains a bin directory. The discovery_datatool shell script in that directory starts and stops the data tool from that release.
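The unpacking step can be sketched as follows. The helper names are hypothetical, and the sketch assumes, as in the examples above, that an archive named NAME-VERSION.zip unpacks into a directory NAME-VERSION/ whose bin directory contains a script called NAME (discovery or discovery_datatool). It also requires the unzip command.

```shell
#!/bin/bash
# Hypothetical helpers for unpacking an archived release into releases/.

# discovery-2.8.3.zip -> discovery-2.8.3
release_name() {
  basename "$1" .zip
}

# discovery-2.8.3.zip -> discovery
# discovery_datatool-1.10.zip -> discovery_datatool
script_name() {
  local name
  name=$(release_name "$1")
  echo "${name%-*}"
}

install_release() {
  local zip_file=$1 home_dir=$2 name script
  name=$(release_name "$zip_file")
  script=$(script_name "$zip_file")
  if [ ! -f "$zip_file" ]; then
    echo >&2 "No such archive: $zip_file"
    return 1
  fi
  if [ -d "$home_dir/releases/$name" ]; then
    echo >&2 "Release already installed: $home_dir/releases/$name"
    return 1
  fi
  unzip -q "$zip_file" -d "$home_dir/releases" || return 1
  # Confirm the start/stop script is where the instance symlinks expect it.
  if [ ! -x "$home_dir/releases/$name/bin/$script" ]; then
    echo >&2 "Missing start/stop script: $home_dir/releases/$name/bin/$script"
    return 1
  fi
}
```

For example, install_release ~discovery/archives/discovery-2.8.3.zip ~discovery would unpack into ~discovery/releases/discovery-2.8.3/ and then check for bin/discovery inside it.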

Engines Directory

There should be one engine data directory for each running instance of an engine or data tool. We recommend that these directories be created under the engines directory.

For the Discovery Engine, the directory should contain the instance-specific discovery.properties file and a symlink to the discovery start/stop script of the release this instance runs. The engine creates all other files and directories automatically when it starts, including the index definitions, changeset data and log files.

For the Discovery Data Tool, the directory should contain the instance-specific datatool.properties file, the discovery_datatool.xml configuration file, and a symlink to the discovery_datatool start/stop script of the release this instance runs. The data tool creates all other directories automatically when it starts.
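Setting up a data tool instance directory can be sketched like this. The create_datatool_instance helper is hypothetical; it writes an empty datatool.properties (which is permitted, since default values are used when no properties are set) and a relative symlink like the ones in the example layout. It cannot create discovery_datatool.xml for you, because that file's contents depend on your data source.

```shell
#!/bin/bash
# Hypothetical helper: create a data tool instance directory under engines/
# with an empty datatool.properties and a relative symlink to the
# discovery_datatool script of the chosen release.
create_datatool_instance() {
  local home_dir=$1 instance=$2 release=$3
  local dir="$home_dir/engines/$instance"
  mkdir -p "$dir" || return 1
  # An empty datatool.properties is valid; defaults are used.
  [ -f "$dir/datatool.properties" ] || : > "$dir/datatool.properties"
  ln -sf "../../releases/$release/bin/discovery_datatool" "$dir/discovery_datatool" || return 1
  if [ ! -f "$dir/discovery_datatool.xml" ]; then
    echo >&2 "Remember to add $dir/discovery_datatool.xml before starting"
  fi
}

# Example:
# create_datatool_instance ~discovery production_feed discovery_datatool-1.10
```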

Recommended Installation Example

In this example, we consider a single box that was configured originally for engine version 2.6.10. Since then, other versions have been archived and installed.

The production directory includes a discovery symlink that points to the discovery script for version 2.8.1. The testing directory is set up to test the newer 2.8.3 version.

A production_feed instance of the Discovery Data Tool is configured and includes a symlink that points to the discovery_datatool script for version 1.10.

A testing_feed instance of the Discovery Data Tool is presumably configured to connect to a testing database and includes a symlink that points to the discovery_datatool script for version 1.10.

~discovery/
   archives/
      discovery-2.6.10.zip
      discovery-2.7.1.zip
      discovery-2.7.2.zip
      discovery-2.8.1.zip
      discovery-2.8.3.zip
      discovery_datatool-1.10.zip
   engines/
      production/
         discovery # symlink to ../../releases/discovery-2.8.1/bin/discovery
         discovery.properties
      production_feed/
         discovery_datatool # symlink to ../../releases/discovery_datatool-1.10/bin/discovery_datatool
         datatool.properties
         discovery_datatool.xml
      testing/
         discovery # symlink to ../../releases/discovery-2.8.3/bin/discovery
         discovery.properties
      testing_feed/
         discovery_datatool # symlink to ../../releases/discovery_datatool-1.10/bin/discovery_datatool
         datatool.properties
         discovery_datatool.xml
   releases/
      discovery-2.6.10/
      discovery-2.7.1/
      discovery-2.7.2/
      discovery-2.8.1/
      discovery-2.8.3/
         bin/
            discovery
            discovery-spawner
            server.sh
         lib/
            README
            VERSION
      discovery_datatool-1.10/
         bin/
            discovery_datatool
            datatool-spawner
            run.sh
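To see at a glance which release each instance runs, you can read the symlinks. The show_instance_releases function below is a hypothetical helper that assumes the symlink names and the ../../releases/RELEASE/bin/SCRIPT targets used in this layout.

```shell
#!/bin/bash
# Hypothetical helper: print each instance under engines/ together with the
# release its start/stop symlink points to.
show_instance_releases() {
  local home_dir=$1 dir link target release
  for dir in "$home_dir"/engines/*/; do
    dir=${dir%/}
    for link in "$dir/discovery" "$dir/discovery_datatool"; do
      if [ -L "$link" ]; then
        target=$(readlink "$link")   # ../../releases/discovery-2.8.1/bin/discovery
        release=${target%/bin/*}     # ../../releases/discovery-2.8.1
        printf '%s %s\n' "${dir##*/}" "${release##*/}"
      fi
    done
  done
}

# Example: show_instance_releases ~discovery
```

Against the layout above, it would report production with discovery-2.8.1, testing with discovery-2.8.3, and both feed instances with discovery_datatool-1.10.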

Shell Script Configuration Variables

The Discovery Engine shell scripts refer to the releases directory containing the discovery engine release files as RELEASE_DIR. The directory in engines that contains the per-instance data and configuration files is referred to as DISCOVERY_DIR.

The Discovery Data Tool shell scripts refer to the releases directory containing the data tool release files as RELEASE_DIR. The directory in engines that contains the per-instance data and configuration files is referred to as DATATOOL_DIR.

discovery.properties File

The discovery.properties file, located in the instance's subdirectory of the engines directory, configures an engine instance. The most important settings are the port to listen on and the amount of memory to allocate to the JVM. For information on other discovery.properties settings, refer to discovery.properties Reference.

The discovery.properties file is a plain text file that can be edited with any text editor or created from the command line. The following example creates a directory for the production engine instance, configures it to listen on port 8090 and to allocate no more than 512 MB of RAM to the JVM, and links it to the 2.8.3 release:

$ cd ~discovery/engines
$ mkdir production
$ cd production
$ # Now create discovery.properties as described above
$ # or use the next line for a quick setup
$ echo "port = 8090" >discovery.properties
$ echo "jvm.memory = 512" >>discovery.properties
$ # create a symlink to the discovery start/stop script
$ ln -s ../../releases/discovery-2.8.3/bin/discovery .
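To read a setting back out of discovery.properties (for example, to confirm which port an instance uses), a small awk-based helper is enough for simple files. get_property is a hypothetical name; the sketch only handles one key = value entry per line, skips lines starting with #, and lets later entries override earlier ones. It does not implement the full Java properties syntax, such as : separators or line continuations.

```shell
#!/bin/bash
# Hypothetical helper: print the value of KEY from a simple properties file.
# Supports only "key = value" lines; later entries override earlier ones.
get_property() {
  local file=$1 key=$2
  awk -v key="$key" '
    /^[ \t]*#/ { next }
    {
      pos = index($0, "=")
      if (pos == 0) next
      k = substr($0, 1, pos - 1)
      v = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { val = v; found = 1 }
    }
    END { if (found) print val }
  ' "$file"
}

# Example:
# get_property ~discovery/engines/production/discovery.properties port
```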

datatool.properties and discovery_datatool.xml Files

The datatool.properties file, located in the instance's subdirectory of the engines directory, configures an instance of the data tool. For basic use of the Discovery Data Tool, the default configuration values are sufficient. However, a datatool.properties file must exist, even if it is empty and no properties are set.

The datatool.properties file is a plain text file that can be edited with any text editor or created from the command line.

For information on the settings available, please see Configuration options in datatool.properties.

The Discovery Data Tool cannot start without a valid discovery_datatool.xml configuration file. Please see Data Tool for details.
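Because the data tool will not start without both files, a quick pre-start check can save a trip through the logs. check_datatool_instance is a hypothetical helper; it only checks that the files exist and does not validate the XML.

```shell
#!/bin/bash
# Hypothetical pre-start check for a data tool instance directory:
# datatool.properties must exist (it may be empty) and
# discovery_datatool.xml must be present. The XML is not validated.
check_datatool_instance() {
  local dir=$1 status=0
  if [ ! -f "$dir/datatool.properties" ]; then
    echo >&2 "Missing $dir/datatool.properties (an empty file is enough)"
    status=1
  fi
  if [ ! -f "$dir/discovery_datatool.xml" ]; then
    echo >&2 "Missing $dir/discovery_datatool.xml"
    status=1
  fi
  return $status
}

# Example:
# check_datatool_instance ~discovery/engines/production_feed && echo "ready"
```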

Developer Mode - Starting and Stopping

To start an engine instance manually, first change to the directory of the instance you want to start, then use its discovery symlink to start it. This example starts the production instance:

$ cd ~discovery/engines/production
$ ./discovery start

To stop an engine instance manually, change to the directory of the instance you want to stop and use its discovery symlink to stop it:

$ cd ~discovery/engines/production
$ ./discovery stop

Starting and stopping a data tool instance is quite similar to working with an engine. The only difference is that the script is named discovery_datatool.

To start a data tool instance manually, first change to the directory of the instance you want to start, then use its discovery_datatool symlink to start it. This example starts the production_feed instance:

$ cd ~discovery/engines/production_feed
$ ./discovery_datatool start

To stop a data tool instance manually, change to the directory of the instance you want to stop and use its discovery_datatool symlink to stop it:

$ cd ~discovery/engines/production_feed
$ ./discovery_datatool stop
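When several instances live under the same engines directory, the manual steps above can be repeated for each one. The for_each_instance function below is a hypothetical convenience wrapper; it assumes each instance directory contains either a discovery or a discovery_datatool symlink and passes the given command (such as start or stop) to it.

```shell
#!/bin/bash
# Hypothetical helper: change into each instance directory under engines/
# and run its start/stop script with the given command.
for_each_instance() {
  local home_dir=$1 command=$2 dir
  for dir in "$home_dir"/engines/*/; do
    if [ -x "$dir/discovery" ]; then
      (cd "$dir" && ./discovery "$command")
    elif [ -x "$dir/discovery_datatool" ]; then
      (cd "$dir" && ./discovery_datatool "$command")
    fi
  done
}

# Example:
# for_each_instance ~discovery stop
```

Instances are processed in the order the shell expands engines/*/, so add your own ordering if your instances depend on each other.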

The discovery and discovery_datatool scripts are useful while your team is in active development because they make it easy to start, stop and monitor an engine or data tool instance. As you move into production, you will most likely want these components to start automatically when your server boots. Refer to the section on starting at system startup (init.d) below for information about configuring init.d scripts to start instances of the Discovery Search Engine and Discovery Data Tool automatically.

For information about the other discovery shell script command line options, refer to Discovery Script Command Line Options.

Create a User to Run the Discovery Components

Running the engine or data tool as root poses an unnecessary security risk. Never run the Discovery components as root, even if they are started automatically when your server boots. Instead, create a non-privileged user to run the engine and data tool:

# adduser discovery
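If you script your server setup, you may want to create the user only when it is missing. This is a sketch with hypothetical helper names; adduser behaves differently across distributions (on some it is interactive), so substitute useradd or your platform's equivalent as needed.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the named user account exists.
user_exists() {
  id "$1" >/dev/null 2>&1
}

# Hypothetical helper: create the account only if it is missing.
# Must be run as root when the user does not exist yet.
ensure_user() {
  local name=${1:-discovery}
  if user_exists "$name"; then
    echo "User $name already exists"
  else
    adduser "$name"
  fi
}

# Example (as root):
# ensure_user discovery
```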

Extract the Release Archive

The engine and data tool can be installed in any location. Unless you configure them to listen on privileged ports (below 1024), they can be run as a user with basic privileges. Extract the compressed file and change to the newly created discovery-X.X.X/ directory.

The user that will run the Discovery Search Engine needs write permission in the discovery-X.X.X/ directory.

$ unzip discovery-X.X.X.zip
$ cd discovery-X.X.X/

Configure and Start Engine

Edit discovery.properties to configure the engine port (the default is 8090):

port = 8090

The Discovery Search Engine uses your machine’s hostname and its resolved IP address to generate unique IDs. If your machine’s hostname is not resolved remotely (for example, by DNS or NIS), make sure it can be resolved locally, in /etc/hosts or elsewhere. For example, your /etc/hosts file might look like this:

127.0.0.1 localhost
192.168.1.100 myhostname
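Since the engine will not start if the hostname cannot be resolved, it can help to check resolution before starting. The hostname_resolves function is a hypothetical helper: it uses getent hosts where available (most Linux systems) and otherwise falls back to a simple search of /etc/hosts, which ignores DNS and NIS and treats the name as a regular expression.

```shell
#!/bin/bash
# Hypothetical pre-start check: succeed if the given name (default: this
# machine's hostname) resolves. Prefers getent; falls back to /etc/hosts.
hostname_resolves() {
  local name=${1:-$(hostname)}
  if command -v getent >/dev/null 2>&1; then
    getent hosts "$name" >/dev/null
  else
    grep -Eq "^[^#]*[[:space:]]$name([[:space:]]|\$)" /etc/hosts
  fi
}

# Example:
# hostname_resolves || echo "Add $(hostname) to /etc/hosts before starting the engine"
```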

If the machine’s hostname cannot be resolved, the engine will not start, and you will see an exception like the one below in logs/discovery.log:

[20111026 11:53:39,672] [0000000a] [ERROR ] [com.t11e.discovery.Application] Fatal uncaught exception
java.lang.RuntimeException: java.lang.RuntimeException: java.net.UnknownHostException: myhostname: myhostname
        at com.t11e.discovery.common.functors.UniqueStringGenerator.newBaseDigest(UniqueStringGenerator.java:98)
        at com.t11e.discovery.common.functors.UniqueStringGenerator.<init>(UniqueStringGenerator.java:20)

Caused by: java.net.UnknownHostException: myhostname: myhostname
        at java.net.InetAddress.getLocalHost(InetAddress.java:1360)
        at com.t11e.discovery.common.functors.UniqueStringGenerator.newBaseDigest(UniqueStringGenerator.java:92)
        ... 72 more

Starting the Discovery Engine and/or Discovery Data Tool at System Startup (init.d)

Each Discovery Engine and Discovery Data Tool release contains an init-script folder with sample init.d and sysconfig files that can be customized and installed to start the component at system startup. The init.d scripts are structured so that basic settings, such as the directory containing the instance’s properties file and the user to run as, can be configured without modifying the provided init.d script. This makes future upgrades easy: just drop in the updated init.d script without merging any local changes.

The init-script folders each contain a standard init script wrapper for its discovery component.

To install the engine init.d script:

Copy init.d/discovery to /etc/init.d/discovery.
Copy sysconfig/discovery file to /etc/sysconfig/discovery.
Edit /etc/sysconfig/discovery and set the RELEASE_DIR variable.

The default settings assume a user named discovery. The default run directory is /opt/discovery/engines/production.

Archived release zip files should be stored in /opt/discovery/archives and the unzipped releases should be in /opt/discovery/releases.

Once configured, you can start and stop the engine by running /etc/init.d/discovery start and /etc/init.d/discovery stop.
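The three installation steps can be scripted as shown below. install_init_script is a hypothetical helper; it assumes it is run from inside a release's init-script folder (containing init.d/ and sysconfig/), and it accepts an optional destination root so you can try it against a scratch directory before writing to /etc. The same function works for discovery_datatool.

```shell
#!/bin/bash
# Hypothetical helper automating the installation steps:
#   $1  component name: discovery or discovery_datatool
#   $2  release directory, written as the RELEASE_DIR value
#   $3  optional destination root (defaults to /)
# Run from inside the release's init-script folder.
install_init_script() {
  local name=$1 release_dir=$2 root=${3:-}
  local conf="$root/etc/sysconfig/$name"
  mkdir -p "$root/etc/init.d" "$root/etc/sysconfig" || return 1
  cp "init.d/$name" "$root/etc/init.d/$name" || return 1
  chmod 755 "$root/etc/init.d/$name"
  cp "sysconfig/$name" "$conf" || return 1
  # Set RELEASE_DIR without relying on GNU-only "sed -i".
  sed "s|^RELEASE_DIR=.*|RELEASE_DIR=$release_dir|" "$conf" > "$conf.tmp" &&
    mv "$conf.tmp" "$conf"
}

# Example (as root, from the release's init-script folder):
# install_init_script discovery /opt/discovery/releases/discovery-2.8.3
```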

/etc/init.d/discovery

#!/bin/bash

# chkconfig: 2345 85 15
# description: Enable discovery search engine
### BEGIN INIT INFO
# Provides:          discovery
# Required-Start:    $network $syslog
# Required-Stop:     $network $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start discovery search engine at boot time
# Description:       Enable discovery search engine
### END INIT INFO

set -e

CONF_FILE=/etc/sysconfig/discovery
DISCOVERY_DIR=/opt/discovery/engines/production
DISCOVERY_USER=discovery

if [ -f $CONF_FILE ]; then
    . $CONF_FILE
fi

if [ -z "$RELEASE_DIR" ]; then
  echo >&2 "RELEASE_DIR must be specified in $CONF_FILE"
  exit 1
fi
if [ ! -x "$RELEASE_DIR/bin/discovery" ]; then
  echo >&2 "Release is missing or not executable: $RELEASE_DIR/bin/discovery"
  exit 1
fi
if [ ! -d "$DISCOVERY_DIR" ]; then
  echo >&2 "DISCOVERY_DIR is missing: $DISCOVERY_DIR"
  exit 1
fi
if [ ! -f "$DISCOVERY_DIR/discovery.properties" ]; then
  echo >&2 "Discovery properties file is missing: $DISCOVERY_DIR/discovery.properties"
  exit 1
fi

cd "$DISCOVERY_DIR"
if [ "$USER" != "$DISCOVERY_USER" ]; then
  su -l "$DISCOVERY_USER" \
    -c "env DISCOVERY_DIR=\"$DISCOVERY_DIR\" RELEASE_DIR=\"$RELEASE_DIR\" \"$RELEASE_DIR\"/bin/discovery $*"
else
  export DISCOVERY_DIR RELEASE_DIR
  "$RELEASE_DIR/bin/discovery" $*
fi

/etc/sysconfig/discovery

# You may customize the startup of the discovery
# search engine by specifying variables in this
# file, RELEASE_DIR is required
#
# DISCOVERY_USER
#   will set what user account the engine is started
#   under
# DISCOVERY_DIR
#   will set which directory is used for the data
#   files and properties files for startup of the
#   engine. The default is
#   /opt/discovery/engines/production
# RELEASE_DIR
#   will set which version of the engine is started

RELEASE_DIR=
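For reference, a filled-in /etc/sysconfig/discovery might look like the sketch below. The release path is an assumption (substitute the release you installed), and the two optional lines simply restate the defaults from the init.d script, so they can be left out.

```shell
# Example /etc/sysconfig/discovery (release path is an assumption).
RELEASE_DIR=/opt/discovery/releases/discovery-2.8.3

# Optional: these repeat the init.d script defaults.
DISCOVERY_DIR=/opt/discovery/engines/production
DISCOVERY_USER=discovery
```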

To install the data tool init.d script:

Copy init.d/discovery_datatool to /etc/init.d/discovery_datatool.
Copy sysconfig/discovery_datatool file to /etc/sysconfig/discovery_datatool.
Edit /etc/sysconfig/discovery_datatool and set the RELEASE_DIR variable.

The default settings assume a user named discovery. The default run directory is /opt/discovery/engines/feed.

Archived release zip files should be stored in /opt/discovery/archives and the unzipped releases should be in /opt/discovery/releases.

Once configured, you can start and stop the data tool by running /etc/init.d/discovery_datatool start and /etc/init.d/discovery_datatool stop.

/etc/init.d/discovery_datatool

#!/bin/bash

# chkconfig: 2345 85 15
# description: Enable discovery data tool
### BEGIN INIT INFO
# Provides:          discovery_datatool
# Required-Start:    $network $syslog
# Required-Stop:     $network $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start discovery data tool at boot time
# Description:       Enable discovery data tool
### END INIT INFO

set -e

CONF_FILE=/etc/sysconfig/discovery_datatool
DATATOOL_DIR=/opt/discovery/engines/feed
DATATOOL_USER=discovery

if [ -f $CONF_FILE ]; then
    . $CONF_FILE
fi

if [ -z "$RELEASE_DIR" ]; then
  echo >&2 "RELEASE_DIR must be specified in $CONF_FILE"
  exit 1
fi
if [ ! -x "$RELEASE_DIR/bin/discovery_datatool" ]; then
  echo >&2 "Release is missing or not executable: $RELEASE_DIR/bin/discovery_datatool"
  exit 1
fi
if [ ! -d "$DATATOOL_DIR" ]; then
  echo >&2 "DATATOOL_DIR is missing: $DATATOOL_DIR"
  exit 1
fi
if [ ! -f "$DATATOOL_DIR/datatool.properties" ]; then
  echo >&2 "Discovery data tool properties file is missing: $DATATOOL_DIR/datatool.properties"
  exit 1
fi

cd "$DATATOOL_DIR"
if [ "$USER" != "$DATATOOL_USER" ]; then
  su -l "$DATATOOL_USER" \
    -c "env DATATOOL_DIR=\"$DATATOOL_DIR\" RELEASE_DIR=\"$RELEASE_DIR\" \"$RELEASE_DIR\"/bin/discovery_datatool $*"
else
  export DATATOOL_DIR RELEASE_DIR
  "$RELEASE_DIR/bin/discovery_datatool" $*
fi

/etc/sysconfig/discovery_datatool

# You may customize the startup of the discovery
# data tool by specifying variables in this
# file, RELEASE_DIR is required
#
# DATATOOL_USER
#   Which user account to use when running the data tool
#
# DATATOOL_DIR
#   The directory used for the configuration
#   files for startup of the data tool. The default is
#   /opt/discovery/engines/feed
#
# RELEASE_DIR
#   Determines which version of the data tool is started

RELEASE_DIR=

Startup with Multiple Engine or Data Tool Instances

If you need multiple engine or data tool instances on a single server, copy the scripts in /etc/init.d and /etc/sysconfig, giving each copy a different name. Each init.d script references its configuration file under /etc/sysconfig through the CONF_FILE variable, so you will need to update that line in each copied init.d script to reference its corresponding configuration file.

File Naming Guidelines

We recommend using a suffix that identifies which engine or data tool instance is being started, and keeping the init.d script name in sync with the configuration file name.

For example, /etc/init.d/discovery-testing would set CONF_FILE to /etc/sysconfig/discovery-testing, which would be a copy of /etc/sysconfig/discovery with DISCOVERY_DIR set to the testing instance directory.
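Creating such a pair of files can be sketched as follows. clone_engine_init_script is a hypothetical helper that copies the installed engine files, rewrites CONF_FILE, and sets DISCOVERY_DIR. It also renames the LSB Provides entry, on the assumption that each init script should declare a distinct facility name. It accepts an optional destination root so it can be tried against a scratch directory.

```shell
#!/bin/bash
# Hypothetical helper: derive /etc/init.d/discovery-SUFFIX and
# /etc/sysconfig/discovery-SUFFIX from the installed engine files.
#   $1  suffix, e.g. testing
#   $2  instance directory, e.g. /opt/discovery/engines/testing
#   $3  optional destination root (defaults to /)
clone_engine_init_script() {
  local suffix=$1 instance_dir=$2 root=${3:-}
  local name="discovery-$suffix"
  local init="$root/etc/init.d/$name" conf="$root/etc/sysconfig/$name"
  # Point CONF_FILE at the new file and give the script its own LSB name.
  sed -e "s|^CONF_FILE=.*|CONF_FILE=/etc/sysconfig/$name|" \
      -e "s|^\(# Provides:[[:space:]]*\).*|\1$name|" \
      "$root/etc/init.d/discovery" > "$init" || return 1
  chmod 755 "$init"
  # Keep the other settings, replacing any DISCOVERY_DIR line.
  grep -v '^DISCOVERY_DIR=' "$root/etc/sysconfig/discovery" > "$conf"
  echo "DISCOVERY_DIR=$instance_dir" >> "$conf"
}

# Example (as root):
# clone_engine_init_script testing /opt/discovery/engines/testing
```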

DEBIAN

Debian users should follow the previous instructions but replace all references to /etc/sysconfig with /etc/default.
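On Debian, the path change can be applied while copying the init.d script. to_debian_paths is a hypothetical one-line helper built on sed.

```shell
#!/bin/bash
# Hypothetical helper: print a copy of an init.d script with every
# /etc/sysconfig/ path rewritten to /etc/default/.
to_debian_paths() {
  sed 's|/etc/sysconfig/|/etc/default/|g' "$1"
}

# Example (as root, from the release's init-script folder):
# to_debian_paths init.d/discovery > /etc/init.d/discovery
# chmod 755 /etc/init.d/discovery
# cp sysconfig/discovery /etc/default/discovery
```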

Starting the Discovery Engine or Data Tool at System Startup (chkconfig)

The supplied init.d scripts contain chkconfig and LSB 3.1 compliant comments that configure the engine or data tool to start at run levels 2, 3, 4 and 5.

To enable startup at these run levels, run (as root):

# chkconfig discovery reset

or

# chkconfig discovery_datatool reset

Data Tool Log Files

When you use the discovery_datatool start/stop script, the data tool log files, datatool-[0-9].log, are written to the logs directory of the instance's subdirectory under engines.

As of version 1.10, the Discovery Data Tool uses the java.util.logging framework introduced in Java 1.4, with log file rotation enabled. The logging configuration file is called logging.properties.

Monitor datatool-0.log to make sure the data tool started successfully:

$ cd ~discovery/engines/production_feed
$ tail -f logs/datatool-0.log
[20120110 14:49:03,009] [0000000a] [INFO   ] \
  [com.t11e.discovery.datatool.ConfigurationManager] \
    Checking that the configuration is valid...
[20120110 14:49:03,184] [0000000a] [INFO   ] \
  [com.t11e.discovery.datatool.ConfigurationManager] \
    Configuration is valid.
[20120110 14:49:03,513] [0000000a] [INFO   ] \
  [com.t11e.discovery.datatool.WebServerMain] \
    Discovery Data Tool listening for HTTP on 8089
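Instead of watching tail -f by hand, a script can wait for the startup message. wait_for_log is a hypothetical helper that polls once per second until the pattern appears or the timeout (in seconds) expires. Because it searches the whole file, a matching line from an earlier run can satisfy it, so treat a success as a hint rather than proof that the current run started.

```shell
#!/bin/bash
# Hypothetical helper: wait until FILE contains PATTERN, or give up after
# TIMEOUT seconds (default 60). Polls once per second.
wait_for_log() {
  local file=$1 pattern=$2 timeout=${3:-60} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if [ -f "$file" ] && grep -q "$pattern" "$file"; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo >&2 "Timed out waiting for '$pattern' in $file"
  return 1
}

# Example, using the message from the sample output above:
# cd ~discovery/engines/production_feed
# ./discovery_datatool start
# wait_for_log logs/datatool-0.log "Discovery Data Tool listening for HTTP" 120
```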

Engine Log Files

When you use the discovery start/stop script, the engine log file, discovery.log, is written to the logs directory of the instance's subdirectory under engines.

As of version 3.13, the Discovery Engine logs via log4j and is preconfigured to rotate the log files by size. You can customize logging by setting the log4j.configuration system property. For more information on configuring the Discovery Engine’s logging, refer to Discovery Log File.

Monitor discovery.log to make sure the engine started successfully:

$ cd ~discovery/engines/production
$ tail -f logs/discovery.log
Java HotSpot(TM) 64-Bit Server VM version 16.3-b01-279

[20050824 13:14:20,747] [00c1f10e] [info ] \
  [com.t11e.common.property.ConfigurationFactory] Loading configuration \
  from: common
[20050824 13:14:20,914] [00c1f10e] [info ] \
  [com.t11e.common.property.ConfigurationFactory] Loading configuration \
   from: rabbit
[20050824 13:14:20,930] [00c1f10e] [info ] \
  [com.t11e.common.property.ConfigurationFactory] Loading configuration \
  from: zebra
[20050824 13:14:20,979] [00c1f10e] [info ] \
  [com.t11e.common.property.ConfigurationFactory] Loading configuration \
  from: discovery
[20050824 13:14:20,982] [00c1f10e] [warn ] \
  [com.t11e.common.property.ConfigurationFactory] Unable to load configuration \
  from local (ignored)
[20050824 13:14:20,984] [00c1f10e] [info ] \
  [com.t11e.common.property.ConfigurationFactory] Loading configuration \
  from system properties
[20050824 13:14:21,279] [00c1f10e] [info ] [com.t11e.zebra.Application] \
  Starting
[20050824 13:14:24,368] [00c1f10e] [info ] \
 [com.t11e.common.http.JettyHttpServer] Listening on http://localhost:8090

While the engine is running, you can check its status in the Admin Tool Interface at http://localhost:8090 (use the port configured in discovery.properties if it differs).
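A scripted status check can derive the URL from the instance's discovery.properties. admin_url and engine_is_up are hypothetical helpers; they assume the engine runs on the local machine, fall back to the default port 8090 when no port is set, and use curl to confirm that the Admin Tool answers.

```shell
#!/bin/bash
# Hypothetical helper: build the Admin Tool URL from the "port = N" line in
# INSTANCE_DIR/discovery.properties, defaulting to 8090.
admin_url() {
  local port
  port=$(sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
    "$1/discovery.properties" 2>/dev/null | tail -n 1)
  echo "http://localhost:${port:-8090}"
}

# Hypothetical helper: succeed if the Admin Tool responds without an HTTP error.
engine_is_up() {
  curl -fsS -o /dev/null "$(admin_url "$1")"
}

# Example:
# engine_is_up ~discovery/engines/production && echo "engine is up"
```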

Engine Settings File

As of version 2.8.5, the engine’s settings are stored in discovery.settings. Do not edit this file by hand; it is managed by the Admin Tool.

When you upgrade to version 2.8.5, the discovery.settings file is created automatically, with the settings migrated from discovery.properties.

The original settings remain in discovery.properties but are ignored. Only the settings that determine the engine’s startup environment still take effect from discovery.properties.

For more information, refer to discovery.properties Reference.