3.14 (August 4 2014)

Improvements

ENGINE-952 Support Unicode 6.1 for character normalization and accent folding

Updates text normalization and case folding to use ICU instead of the JDK, decoupling these features from the version of Java that the engine runs on and giving access to the latest Unicode standard.

This has the side effect of obsoleting the normalizeFullWidthChars text dimension option (ENGINE-953).
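
As a rough illustration of what accent and case folding do to text, the sketch below uses Python's standard unicodedata module. It is not the engine's ICU-based implementation, and the function name fold is hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Illustrative accent and case folding; not the engine's ICU pipeline."""
    # Decompose so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, leaving only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case fold, then recompose whatever remains.
    return unicodedata.normalize("NFC", stripped.casefold())


# Both spellings fold to the same key, so they would match each other.
print(fold("Café") == fold("CAFE"))
```

Because the folding rules come from the Unicode data tables, results depend on the Unicode version of the library in use, which is the coupling this change removes from the JDK.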

ENGINE-954 Chinese language support

Text dimensions now support transliteration from Traditional Chinese to Simplified Chinese. To take advantage of this feature, set the locale for any relevant text dimension (or field) to one of the following:

Language                Country     Locale
Chinese                             zh
Chinese (Simplified)    China       zh_CN
Chinese (Simplified)    Singapore   zh_SG
Chinese (Traditional)   Hong Kong   zh_HK
Chinese (Traditional)   Taiwan      zh_TW

ENGINE-955 Better Chinese word detection

Text dimensions now have experimental support for better Chinese word segmentation. This uses the smartcn tokenizer from the Lucene project to detect word boundaries using a Hidden Markov Model instead of relying on whitespace.

To enable, set the tokenizer attribute on your text dimension (or field) to smartcn.

ENGINE-969 ENGINE-970 ENGINE-972 Upgrades dependent libraries

The following library dependencies have been upgraded:

Library            Previous   Current
jackson-core       2.3.3      2.4.1
spring-framework   4.0.3      4.0.5
woodstox-core      4.3.0      4.4.0

Bug Fixes

ENGINE-964 Race conditions when applying changesets and dimensions at the same time

Fixes some race conditions in tree dimensions that could cause the engine to hang when a dimensions file was posted while a changeset was already being applied.

Additionally, you no longer need to wait for the dimensions to fully populate if you clear them during engine startup.

ENGINE-965 Item storage can fail when populating indices and applying deltas concurrently

Application of deltas during index population could sometimes hit a race condition when writing to the item storage.

The item storage has transitioned from Apache Derby to Apache Lucene, which resolves this issue.

See ENGINE-814 and ENGINE-816.

ENGINE-967 Incorrect facet counts for 1D scalar dimensions

The 1D scalar dimensions didn’t correctly handle changeset deletion events when recording facet counts. This affects integer, double, time, and long dimensions.

To verify, create a 1D scalar dimension with facet nodes.

<dimension id="example" type="integer">
    <element id="small" value="[0, 10)"/>
    <element id="medium" value="[10, 20)"/>
    <element id="large" value="[20, 30)"/>
</dimension>

Query against just one of the nodes.

{
  "criteria": [
    {
      "dimension": "example",
      "id": "small",
      "cull": true
    }
  ],
  "facets": {
    "example": {}
  }
}

The facet count reported for the queried node should be equal to both exactSize and totalSize.

{
    "exactSize": 1163,
    "totalSize": 1163,
    "datasetSize": 6253,
    "facets": {
        "example": {
            "data": {
                "small": {
                    "count": 1163
                },
                "medium": {
                    "count": 4658
                },
                "large": {
                    "count": 432
                }
            }
        }
    }
}
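
The check can also be scripted. The following is a minimal sketch, assuming the response body is available as a JSON string; the function name is hypothetical and the sample is a trimmed copy of the response shown above.

```python
import json


def queried_node_count_matches(response_text: str, dimension: str, node: str) -> bool:
    """True when the node's facet count equals both exactSize and totalSize."""
    response = json.loads(response_text)
    count = response["facets"][dimension]["data"][node]["count"]
    return count == response["exactSize"] == response["totalSize"]


# Trimmed copy of the example response.
sample = """
{
    "exactSize": 1163,
    "totalSize": 1163,
    "datasetSize": 6253,
    "facets": {"example": {"data": {"small": {"count": 1163}}}}
}
"""
print(queried_node_count_matches(sample, "example", "small"))
```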

ENGINE-977 Example server.sh and server.bat script changes

Release 3.13 included some improvements to the engine startup scripts that introduced the location of the discovery.properties file as a command line parameter when executing the jar. The example scripts for custom startup (server.bat and server.sh) were not changed to reflect this parameter.

The last line in both scripts has been changed from:

java -jar lib/discovery.jar

to

java -jar lib/discovery.jar discovery.properties

Compatibility Changes

ENGINE-962 Simplify checkpoint triggering mechanism

The esoteric checkpoint triggering has been replaced with a simpler mechanism.

As a reminder, the engine wakes up at checkpoint time, performs some tests, and creates a checkpoint if any trigger is met.

The new, simpler triggers are based on the current generation of changesets. They do not include the first changeset, and a checkpoint is triggered when either of the following conditions is met:

  1. there are more than 500 changesets
  2. the total compressed size of the changesets exceeds 100MB

Additionally, the conditions are now logged.

[com.t11e.discovery.item.changeset.checkpoint.DefaultCheckpointPolicy] Checkpointing is required, failed checks trigger it:
[com.t11e.discovery.item.changeset.checkpoint.DefaultCheckpointPolicy]     FAIL number of changesets is 1350, maximum is 500
[com.t11e.discovery.item.changeset.checkpoint.DefaultCheckpointPolicy]     PASS total compressed bytes is 13128757, maximum is 104857600
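
The decision rule described above can be sketched as follows. This is an illustration only, not the code of DefaultCheckpointPolicy: the function and constant names are hypothetical, and the caller is assumed to have already left the first changeset out of both figures. The limits are the ones shown in the log output.

```python
MAX_CHANGESETS = 500
MAX_COMPRESSED_BYTES = 100 * 1024 * 1024  # 104857600, as reported in the log


def checkpoint_required(changeset_count: int, compressed_bytes: int) -> bool:
    """Return True when either limit is exceeded.

    Both arguments describe the current generation of changesets and are
    assumed to exclude the first changeset.
    """
    too_many = changeset_count > MAX_CHANGESETS
    too_large = compressed_bytes > MAX_COMPRESSED_BYTES
    return too_many or too_large


# Figures taken from the log excerpt: the changeset count alone triggers it.
print(checkpoint_required(1350, 13128757))
```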

ENGINE-814 ENGINE-816 Migrates item storage from Derby to Lucene

The internal item storage is now based on Apache Lucene instead of Apache Derby. This fixes some stability issues, improves performance, and reduces disk usage.

Due to this change, the engine will recreate the db/items directory on startup. This process will happen automatically on either upgrade or downgrade.

Removed features

ENGINE-963 Changesets tab in the admin interface no longer displays applied counts

The applied counts are no longer reported in the rightmost column on the changesets tab of the admin interface.

Additionally, an internal webservice option (/ws/changeset?meta=1) now generates simpler XML. The <applied/> elements have been replaced with a single state attribute on the parent element.

Was:

<changesets>
    <changeset id="0001" type="delta">
        <applied state="applied" created="10" modified="2" deleted="1" />
    </changeset>
</changesets>

Now:

<changesets>
    <changeset id="0001" type="delta" state="applied"/>
</changesets>
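
Scripts that read this output may need to handle both shapes during a rollout. The sketch below is one way to do that with Python's standard XML parser; the function name is hypothetical and only the elements and attributes shown in the two samples are assumed.

```python
import xml.etree.ElementTree as ET


def changeset_states(xml_text: str) -> dict:
    """Map changeset id to state for either the old or the new XML shape."""
    states = {}
    for changeset in ET.fromstring(xml_text).findall("changeset"):
        state = changeset.get("state")  # new shape: attribute on <changeset>
        if state is None:
            applied = changeset.find("applied")  # old shape: child element
            state = applied.get("state") if applied is not None else None
        states[changeset.get("id")] = state
    return states


new_xml = '<changesets><changeset id="0001" type="delta" state="applied"/></changesets>'
print(changeset_states(new_xml))
```

The created, modified, and deleted counts from the old shape have no equivalent in the new output, so the sketch does not try to read them.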

ENGINE-966 Changesets tab of the admin interface no longer displays total item counts

The changesets tab no longer displays the number of items in the dataset; see the indices tab instead.

The internal webservice at /ws/itemsSummary has been removed.

ENGINE-953 Remove normalizeFullWidthChars text dimension attribute

Text dimensions no longer support the normalizeFullWidthChars option. Instead, full-width characters are always converted to their half-width equivalents as part of Unicode normalization (ENGINE-952).
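
For readers unfamiliar with the effect, Unicode compatibility normalization maps full-width Latin letters and digits to their ordinary counterparts. The sketch below shows this with Python's standard unicodedata module and the NFKC form; which normalization form the engine applies through ICU is not stated here, so NFKC is an assumption made for illustration.

```python
import unicodedata

# Full-width letters and digits (U+FF21.. and U+FF11..).
full_width = "ＡＢＣ１２３"

# Compatibility normalization maps them to plain ASCII characters.
half_width = unicodedata.normalize("NFKC", full_width)
print(half_width)
```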