Engine and Data Tool Installation for UNIX Variants

This document covers the installation of the Transparensee Discovery Search Engine and Discovery Data Tool on UNIX variants. It assumes a basic understanding of UNIX system administration concepts and tools. Command syntax may vary slightly depending on your UNIX variant.

Note that the Discovery Data Tool is an optional separate component. It may be deployed alongside the Discovery Engine on the same server, or may be deployed separately. In order to ease maintenance and configuration, both the Discovery Engine and the Discovery Data Tool use similar directory layouts, start/stop scripts, and configuration files. They are set up for running as UNIX daemons in the same manner, and can be upgraded in a similar fashion.

If you are interested in upgrading an existing engine, you should be familiar with this section and refer to Upgrading an Engine for UNIX Variants for specific instructions, best practices and tips.

Hardware Requirements

We require the following minimum hardware to run the Discovery Engine:

  • 2.0 GHz Athlon
  • 1 GB RAM
  • 40 GB HD

Software Requirements

The following software needs to be installed to run the engine or data tool:

  • Sun Java Runtime Environment 1.6.
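
The requirement above can be checked from a shell before installing. The following is a minimal sketch and not part of the Discovery release: the function names are hypothetical, and the check accepts only 1.6.x version strings because 1.6 is the only version named here.

```shell
# Hypothetical helper: succeed only for a 1.6 or 1.6.x version string.
is_java_1_6() {
  case "$1" in
    1.6|1.6.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the version reported by whichever java is first on the PATH.
# `java -version` writes to stderr, hence the 2>&1.
detect_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

A typical call would be is_java_1_6 "$(detect_java_version)", followed by a warning if it fails.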

Create a user to run the Discovery Components

Running the engine or data tool as root poses an unnecessary security risk. It is never recommended to run the Discovery components as root, even if you are going to automatically start them when your server boots. Instead we recommend that you create a non-privileged user to run the engine and data tool.

$ adduser discovery

Extract the Release Archive

The engine and data tool can be installed at any location. Unless you configure the applications to run on restricted ports, they can be run by a user with basic privileges. Extract the compressed file and change to the newly created discovery/ directory.

The user that will run the Discovery Search Engine needs to have write permissions in the discovery/ directory.

$ unzip discovery-X.X.X.zip
$ cd discovery_X.X.X/

Configure and Start Engine

Edit discovery.properties to configure the engine port (the default is 8090).

port = 8090
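
To confirm which port a given properties file selects, a small helper such as the one below may be useful. It is a sketch under assumptions: the function name engine_port is hypothetical, and it expects the port line to look like the one shown above.

```shell
# Hypothetical helper: print the value of "port" from a properties file,
# falling back to 8090 (the documented default) when no port line exists.
# If the key appears more than once, the last occurrence is printed.
engine_port() {
  props=${1:-discovery.properties}
  port=$(sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$props" | tail -n 1)
  printf '%s\n' "${port:-8090}"
}
```

Run it from the directory holding discovery.properties, or pass the file path as the first argument.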

The Discovery Search Engine uses your machine’s hostname and resolved IP to generate unique ids. If your machine’s IP is not resolved remotely (e.g. by DNS or NIS), make sure it can be resolved locally, either in /etc/hosts or elsewhere. For example your /etc/hosts may look like this:

127.0.0.1 localhost
192.168.1.100 myhostname

If the machine’s host name cannot be resolved, then the engine will not start. You will see an exception like the one below in the file logs/discovery.log.

[20111026 11:53:39,672] [0000000a] [ERROR ] [com.t11e.discovery.Application] Fatal uncaught exception
java.lang.RuntimeException: java.lang.RuntimeException: java.net.UnknownHostException: myhostname: myhostname
        at com.t11e.discovery.common.functors.UniqueStringGenerator.newBaseDigest(UniqueStringGenerator.java:98)
        at com.t11e.discovery.common.functors.UniqueStringGenerator.<init>(UniqueStringGenerator.java:20)

Caused by: java.net.UnknownHostException: myhostname: myhostname
        at java.net.InetAddress.getLocalHost(InetAddress.java:1360)
        at com.t11e.discovery.common.functors.UniqueStringGenerator.newBaseDigest(UniqueStringGenerator.java:92)
        ... 72 more
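
The local resolution requirement described above can be checked before starting the engine. The sketch below only inspects a hosts-format file and does not consult DNS or NIS; the function name host_in_hosts_file is hypothetical.

```shell
# Hypothetical helper: succeed if NAME appears as a hostname or alias on a
# non-comment line of a hosts-format file (default /etc/hosts).
host_in_hosts_file() {
  name=$1
  file=${2:-/etc/hosts}
  awk -v name="$name" '
    { sub(/#.*/, "") }   # drop comments; awk re-splits the fields
    { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }
  ' "$file"
}
```

For example, host_in_hosts_file "$(hostname)" returns a non-zero status when the machine's own name has no entry in /etc/hosts.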

Starting the Discovery Engine and/or Discovery Data Tool At System Startup (init.d)

Each Discovery Engine and Data Tool release contains an init-script folder with sample init.d and sysconfig files that can be customized and installed to start the component at system startup. The init.d scripts are structured so that basic settings, such as the directory holding your instance’s properties file and the user to run as, can be configured without modifying the provided init.d script. This makes future upgrades easy: drop in the updated init.d script without having to merge any local changes.

The init-script folders each contain a standard init script wrapper for its discovery component.

To install the engine init.d script:

  • Copy init.d/discovery to /etc/init.d/discovery.
  • Copy sysconfig/discovery file to /etc/sysconfig/discovery.
  • Edit /etc/sysconfig/discovery and set the RELEASE_DIR variable.
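
The three steps above can be sketched as a shell function. This is an illustration, not a shipped installer: the function name is hypothetical, the init-script/ path inside the release is an assumption, and the optional second argument (an alternate root directory) exists only so the steps can be tried outside /etc.

```shell
# Hypothetical helper: copy the sample init.d and sysconfig files from an
# unpacked release and set RELEASE_DIR. ROOT is empty for a real install
# (run as root), or a scratch directory for a dry run.
# Assumption: the samples live under RELEASE_DIR/init-script/.
install_engine_init() {
  release_dir=$1
  root=${2:-}
  src="$release_dir/init-script"
  mkdir -p "$root/etc/init.d" "$root/etc/sysconfig" || return 1
  cp "$src/init.d/discovery" "$root/etc/init.d/discovery" || return 1
  chmod 755 "$root/etc/init.d/discovery" || return 1
  # Fill in the empty RELEASE_DIR= line while copying the sysconfig file.
  # Paths containing "|" or "&" would need escaping for sed.
  sed "s|^RELEASE_DIR=.*|RELEASE_DIR=$release_dir|" \
    "$src/sysconfig/discovery" > "$root/etc/sysconfig/discovery"
}
```

Passing a scratch directory as the second argument lets you inspect the resulting files before installing them for real.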

The default settings require a discovery user. The default run directory is /opt/discovery/engines/production.

Archived release zip files should be stored in /opt/discovery/archives and the unzipped releases should be in /opt/discovery/releases.

Once configured you can start/stop the engine by running /etc/init.d/discovery.

/etc/init.d/discovery

#!/bin/bash

# chkconfig: 2345 85 15
# description: Enable discovery search engine
### BEGIN INIT INFO
# Provides:          discovery
# Required-Start:    $network $syslog
# Required-Stop:     $network $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start discovery search engine at boot time
# Description:       Enable discovery search engine
### END INIT INFO

set -e

CONF_FILE=/etc/sysconfig/discovery
DISCOVERY_DIR=/opt/discovery/engines/production
DISCOVERY_USER=discovery

if [ -f $CONF_FILE ]; then
    . $CONF_FILE
fi

if [ -z "$RELEASE_DIR" ]; then
  echo >&2 "RELEASE_DIR must be specified in $CONF_FILE"
  exit 1
fi
if [ ! -x "$RELEASE_DIR/bin/discovery" ]; then
  echo >&2 "Release is missing or not executable: $RELEASE_DIR/bin/discovery"
  exit 1
fi
if [ ! -d "$DISCOVERY_DIR" ]; then
  echo >&2 "DISCOVERY_DIR is missing: $DISCOVERY_DIR"
  exit 1
fi
if [ ! -f "$DISCOVERY_DIR/discovery.properties" ]; then
  echo >&2 "Discovery properties file is missing: $DISCOVERY_DIR/discovery.properties"
  exit 1
fi

cd "$DISCOVERY_DIR"
if [ "$USER" != "$DISCOVERY_USER" ]; then
  su -l "$DISCOVERY_USER" \
    -c "env DISCOVERY_DIR=\"$DISCOVERY_DIR\" RELEASE_DIR=\"$RELEASE_DIR\" \"$RELEASE_DIR\"/bin/discovery $*"
else
  export DISCOVERY_DIR RELEASE_DIR
  "$RELEASE_DIR/bin/discovery" $*
fi

/etc/sysconfig/discovery

# You may customize the startup of the discovery
# search engine by specifying variables in this
# file, RELEASE_DIR is required
#
# DISCOVERY_USER
#   will set what user account the engine is started
#   under
# DISCOVERY_DIR
#   will set which directory is used for the data
#   files and properties files for startup of the
#   engine. The default is
#   /opt/discovery/engines/production
# RELEASE_DIR
#   will set which version of the engine is started

RELEASE_DIR=

To install the data tool init.d script:

  • Copy init.d/discovery_datatool to /etc/init.d/discovery_datatool.
  • Copy sysconfig/discovery_datatool file to /etc/sysconfig/discovery_datatool.
  • Edit /etc/sysconfig/discovery_datatool and set the RELEASE_DIR variable.

The default settings require a discovery user. The default run directory is /opt/discovery/engines/feed.

Archived release zip files should be stored in /opt/discovery/archives and the unzipped releases should be in /opt/discovery/releases.

Once configured you can start/stop the data tool by running /etc/init.d/discovery_datatool.

/etc/init.d/discovery_datatool

#!/bin/bash

# chkconfig: 2345 85 15
# description: Enable discovery data tool
### BEGIN INIT INFO
# Provides:          discovery_datatool
# Required-Start:    $network $syslog
# Required-Stop:     $network $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start discovery data tool at boot time
# Description:       Enable discovery data tool
### END INIT INFO

set -e

CONF_FILE=/etc/sysconfig/discovery_datatool
DATATOOL_DIR=/opt/discovery/engines/feed
DATATOOL_USER=discovery

if [ -f $CONF_FILE ]; then
    . $CONF_FILE
fi

if [ -z "$RELEASE_DIR" ]; then
  echo >&2 "RELEASE_DIR must be specified in $CONF_FILE"
  exit 1
fi
if [ ! -x "$RELEASE_DIR/bin/discovery_datatool" ]; then
  echo >&2 "Release is missing or not executable: $RELEASE_DIR/bin/discovery_datatool"
  exit 1
fi
if [ ! -d "$DATATOOL_DIR" ]; then
  echo >&2 "DATATOOL_DIR is missing: $DATATOOL_DIR"
  exit 1
fi
if [ ! -f "$DATATOOL_DIR/datatool.properties" ]; then
  echo >&2 "Discovery data tool properties file is missing: $DATATOOL_DIR/datatool.properties"
  exit 1
fi

cd "$DATATOOL_DIR"
if [ "$USER" != "$DATATOOL_USER" ]; then
  su -l "$DATATOOL_USER" \
    -c "env DATATOOL_DIR=\"$DATATOOL_DIR\" RELEASE_DIR=\"$RELEASE_DIR\" \"$RELEASE_DIR\"/bin/discovery_datatool $*"
else
  export DATATOOL_DIR RELEASE_DIR
  "$RELEASE_DIR/bin/discovery_datatool" $*
fi

/etc/sysconfig/discovery_datatool

# You may customize the startup of the discovery
# data tool by specifying variables in this
# file, RELEASE_DIR is required
#
# DATATOOL_USER
#   Which user account to use when running the data tool
#
# DATATOOL_DIR
#   The directory used for the configuration
#   files for startup of the data tool. The default is
#   /opt/discovery/engines/feed
#
# RELEASE_DIR
#   Determines which version of the data tool is started

RELEASE_DIR=

Startup with Multiple Engine or Data Tool Instances

If you require multiple engine installs or multiple data tool installs on a single box, then copy the discovery scripts in /etc/init.d and /etc/sysconfig. Give each install a different name. Note that each init.d script references its configuration file under /etc/sysconfig with the variable CONF_FILE. You will need to update that line in the init.d script to reference its corresponding configuration file.

File Naming Guidelines

Use a suffix that identifies which engine or data tool is being started, and keep the init.d script name in sync with the configuration file name.

For example, /etc/init.d/discovery-testing would have its CONF_FILE variable set to /etc/sysconfig/discovery-testing. That configuration file would be created from a copy of /etc/sysconfig/discovery and would set the DISCOVERY_DIR for the testing instance.
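
Creating an additional engine instance along these lines can be sketched as follows. The function name and its arguments are hypothetical, and the optional third argument (an alternate root directory) is there only so the result can be inspected outside /etc.

```shell
# Hypothetical helper: derive a suffixed init.d script and sysconfig file
# from the stock "discovery" pair. ROOT is empty for the real /etc.
clone_engine_instance() {
  suffix=$1
  instance_dir=$2
  root=${3:-}
  init="$root/etc/init.d/discovery"
  conf="$root/etc/sysconfig/discovery"
  # Point the copied init script at its own configuration file.
  sed "s|^CONF_FILE=.*|CONF_FILE=/etc/sysconfig/discovery-$suffix|" \
    "$init" > "$init-$suffix" || return 1
  chmod 755 "$init-$suffix" || return 1
  cp "$conf" "$conf-$suffix" || return 1
  # The stock sysconfig file does not set DISCOVERY_DIR, so append it.
  printf 'DISCOVERY_DIR=%s\n' "$instance_dir" >> "$conf-$suffix"
}
```

For instance, clone_engine_instance testing /opt/discovery/engines/testing would produce the discovery-testing pair described above; the instance directory used here is only an example.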

Debian

Debian users should follow the previous instructions but replace all references to /etc/sysconfig with /etc/default.
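
One way to apply that substitution to a copy of a supplied init script is sketched below. The function name is hypothetical; it prints the rewritten script to standard output rather than editing the file in place.

```shell
# Hypothetical helper: print a copy of an init script with every
# /etc/sysconfig reference rewritten to /etc/default.
use_etc_default() {
  sed 's|/etc/sysconfig|/etc/default|g' "$1"
}
```

Redirect the output to the destination under /etc/init.d, then place the matching configuration file under /etc/default.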

Starting the Discovery Engine or Data Tool At System Startup (chkconfig)

The supplied init.d scripts for starting and stopping the discovery components contain chkconfig and LSB 3.1 compliant comments instructing the engine or data tool to start at run levels 2, 3, 4 and 5.

To enable startup at these run levels run:

# chkconfig discovery reset

or

# chkconfig discovery_datatool reset

Data Tool Log Files

When using the discovery_datatool start/stop script, the data tool log file, datatool-[0-9].log, is located in the logs directory of that instance’s subdirectory under engines.

As of version 1.10, the Discovery Data Tool uses JVM 1.4 logging and enables log file rotation. The logging configuration file is called logging.properties.

Monitor datatool-0.log to ensure the server started successfully.

$ cd ~discovery/engines/production_feed
$ tail -f log/datatool-0.log
[20120110 14:49:03,009] [0000000a] [INFO   ] \
  [com.t11e.discovery.datatool.ConfigurationManager] \
    Checking that the configuration is valid...
[20120110 14:49:03,184] [0000000a] [INFO   ] \
  [com.t11e.discovery.datatool.ConfigurationManager] \
    Configuration is valid.
[20120110 14:49:03,513] [0000000a] [INFO   ] \
  [com.t11e.discovery.datatool.WebServerMain] \
    Discovery Data Tool listening for HTTP on 8089
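
Instead of watching the log by eye, the startup message shown above can be searched for in a script. This is a sketch only: the function name is hypothetical, and it assumes the message text matches the sample output.

```shell
# Hypothetical helper: print the HTTP port announced in a data tool log,
# or return non-zero if no "listening for HTTP on" line has been written.
datatool_listening_port() {
  port=$(sed -n 's/.*Discovery Data Tool listening for HTTP on \([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
  [ -n "$port" ] && printf '%s\n' "$port"
}
```

Pass the path of datatool-0.log as the argument; an empty result means the startup message has not appeared yet.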

Engine Log Files

When using the discovery start/stop script, the engine log file, discovery.log, is located in the logs directory of that instance’s subdirectory under engines.

As of version 3.13, the Discovery Engine logs via log4j and is preconfigured to rotate the log files by size. You can customize logging by setting the log4j.configuration system property. For more information on configuring the Discovery engine’s logging, refer to Discovery Log File.

Monitor discovery.log to ensure the server started successfully.

$ cd ~discovery/engines/production
$ tail -f log/discovery.log
Java HotSpot(TM) 64-Bit Server VM version 16.3-b01-279

[20050824 13:14:20,747] [00c1f10e] [info ] \
  [com.t11e.common.property.ConfigurationFactory] Loading configuration \
  from: common
[20050824 13:14:20,914] [00c1f10e] [info ] \
  [com.t11e.common.property.ConfigurationFactory] Loading configuration \
   from: rabbit
[20050824 13:14:20,930] [00c1f10e] [info ] \
  [com.t11e.common.property.ConfigurationFactory] Loading configuration \
  from: zebra
[20050824 13:14:20,979] [00c1f10e] [info ] \
  [com.t11e.common.property.ConfigurationFactory] Loading configuration \
  from: discovery
[20050824 13:14:20,982] [00c1f10e] [warn ] \
  [com.t11e.common.property.ConfigurationFactory] Unable to load configuration \
  from local (ignored)
[20050824 13:14:20,984] [00c1f10e] [info ] \
  [com.t11e.common.property.ConfigurationFactory] Loading configuration \
  from system properties
[20050824 13:14:21,279] [00c1f10e] [info ] [com.t11e.zebra.Application] \
  Starting
[20050824 13:14:24,368] [00c1f10e] [info ] \
 [com.t11e.common.http.JettyHttpServer] Listening on http://localhost:8090

When the engine is running, its status can be checked at http://localhost:8090, the URL of the Admin Tool Interface.

Engine Settings File

As of version 2.8.5, the engine’s settings are stored in discovery.settings. This file should not be edited as it is intended to be managed by the Admin Tool.

When a customer upgrades to version 2.8.5, the discovery.settings file is automatically created with copies of the settings migrated from discovery.properties.

The original settings remain in discovery.properties but they are ignored. Only settings that determine the startup options for the engine’s environment remain in discovery.properties.

For more information, refer to discovery.properties Reference.