3.14 (August 4, 2014)
Improvements
ENGINE-952 Support Unicode 6.1 for character normalization and accent folding
Text normalization and case folding now use ICU instead of the JDK. This decouples these features from the version of Java the engine runs on and gives access to the latest Unicode standard.
This has the side effect of obsoleting the normalizeFullWidthChars text dimension option (ENGINE-953).
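For illustration only, the sketch below shows what accent folding followed by case folding does to a string. It deliberately uses the JDK's `java.text.Normalizer` rather than ICU, so its results depend on the Unicode version of the running JVM, which is exactly the coupling this change removes. The class name `AccentFoldingSketch` is hypothetical, and `toLowerCase(Locale.ROOT)` only approximates full Unicode case folding.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical illustration of accent folding; not the engine's ICU-based implementation. */
public class AccentFoldingSketch {

    /** Decomposes accented characters, drops the combining marks, then lower-cases. */
    static String fold(String input) {
        // NFD splits a precomposed "è" into "e" followed by U+0300 COMBINING GRAVE ACCENT.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove nonspacing combining marks (general category Mn), i.e. the accents.
        String unaccented = decomposed.replaceAll("\\p{Mn}", "");
        // Approximates case folding; full Unicode case folding would also map e.g. "ß" to "ss".
        return unaccented.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(fold("Crème Brûlée")); // prints: creme brulee
    }
}
```

Stripping every nonspacing mark is a simplification that suits Latin-script text; ICU's folding data is more nuanced.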
ENGINE-954 Chinese language support
Text dimensions now support transliteration from Traditional Chinese to Simplified Chinese. To take advantage of this feature, set the locale of any relevant text dimension (or field) to one of the following:
Language | Country | Locale |
---|---|---|
Chinese | | zh |
Chinese (Simplified) | China | zh_CN |
Chinese (Simplified) | Singapore | zh_SG |
Chinese (Traditional) | Hong Kong | zh_HK |
Chinese (Traditional) | Taiwan | zh_TW |
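A minimal sketch of the idea, assuming a character-by-character mapping that is applied only when the locale is one of those in the table above. The engine's actual conversion data is not described in these notes; real Traditional-to-Simplified conversion relies on a much larger table and is not always one-to-one. The `ChineseTransliterationSketch` class and its three-entry `TRADITIONAL_TO_SIMPLIFIED` map are hypothetical.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of locale-gated Traditional -> Simplified transliteration. */
public class ChineseTransliterationSketch {
    // Locales from the release-note table; "zh" alone has no country.
    static final Set<String> CHINESE_LOCALES = Set.of("zh", "zh_CN", "zh_SG", "zh_HK", "zh_TW");

    // Hypothetical, tiny excerpt of a Traditional -> Simplified mapping (code point to code point).
    static final Map<Integer, Integer> TRADITIONAL_TO_SIMPLIFIED = Map.of(
            (int) '國', (int) '国',
            (int) '語', (int) '语',
            (int) '學', (int) '学');

    /** Replaces mapped Traditional characters when the locale is one of the listed Chinese locales. */
    static String transliterate(String text, String locale) {
        if (!CHINESE_LOCALES.contains(locale)) {
            return text; // other locales are left untouched in this sketch
        }
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
                .map(cp -> TRADITIONAL_TO_SIMPLIFIED.getOrDefault(cp, cp)) // unmapped characters pass through
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("學中國語", "zh_TW")); // prints: 学中国语
        System.out.println(transliterate("學中國語", "en_US")); // prints: 學中國語
    }
}
```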
ENGINE-955 Better Chinese word detection
Text dimensions now have experimental support for better word segmentation. This uses the smartcn tokenizer from the Lucene project, which detects word boundaries with a Hidden Markov Model instead of relying on whitespace.
To enable, set the tokenizer attribute on your text dimension (or field) to smartcn.
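To see why whitespace alone is not enough for Chinese, the sketch below contrasts whitespace splitting with a simple dictionary-based segmenter. Note that it uses forward maximum matching, a much simpler technique than the Hidden Markov Model used by smartcn; the `SegmentationSketch` class and its four-word dictionary are hypothetical, and the code assumes all characters lie in the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of word segmentation; not the smartcn algorithm. */
public class SegmentationSketch {
    // Hypothetical toy dictionary; real segmenters use large lexicons and statistics.
    static final Set<String> DICTIONARY = Set.of("我", "爱", "北京", "天安门");
    static final int MAX_WORD_LENGTH = 3;

    /** Whitespace tokenization: Chinese text written without spaces stays a single token. */
    static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    /** Forward maximum matching: at each position, take the longest dictionary word. */
    static List<String> forwardMaximumMatch(String text) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + MAX_WORD_LENGTH);
            // Shrink the candidate until it is a known word; fall back to a single character.
            while (end > i + 1 && !DICTIONARY.contains(text.substring(i, end))) {
                end--;
            }
            words.add(text.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        String sentence = "我爱北京天安门";
        System.out.println(splitOnWhitespace(sentence));   // prints: [我爱北京天安门]
        System.out.println(forwardMaximumMatch(sentence)); // prints: [我, 爱, 北京, 天安门]
    }
}
```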
ENGINE-969 ENGINE-970 ENGINE-972 Upgrades dependent libraries
The following dependent libraries have been upgraded.
Library | Previous | Current |
---|---|---|
jackson-core | 2.3.3 | 2.4.1 |
spring-framework | 4.0.3 | 4.0.5 |
woodstox-core | 4.3.0 | 4.4.0 |
Bug Fixes
ENGINE-964 Race conditions when applying changesets and dimensions at the same time
Fixes race conditions in tree dimensions that could cause the engine to hang when a dimensions file was posted while a changeset was already being applied.
Additionally, you no longer need to wait for the dimensions to fully populate if you clear them during engine startup.
ENGINE-965 Item storage can fail when populating indices and applying deltas concurrently
Applying deltas during index population could sometimes hit a race condition when writing to the item storage.
The item storage has moved from Apache Derby to Apache Lucene, which resolves this issue.
See ENGINE-814 and ENGINE-816.
ENGINE-967 Incorrect facet counts for 1D scalar dimensions
The 1D scalar dimensions didn’t correctly handle changeset deletion events when recording facet counts. This affects integer, double, time, and long dimensions.
To verify, create a 1D scalar dimension with facet nodes.
```xml
<dimension id="example" type="integer">
  <element id="small" value="[0, 10)"/>
  <element id="medium" value="[10, 20)"/>
  <element id="large" value="[20, 30)"/>
</dimension>
```
Query against just one of the nodes.
```json
{
  "criteria": [
    {
      "dimension": "example",
      "id": "small",
      "cull": true
    }
  ],
  "facets": {
    "example": {}
  }
}
```
The reported facet count for that node equals exactSize and totalSize.
```json
{
  "exactSize": 1163,
  "totalSize": 1163,
  "datasetSize": 6253,
  "facets": {
    "example": {
      "data": {
        "small": {
          "count": 1163
        },
        "medium": {
          "count": 4658
        },
        "large": {
          "count": 432
        }
      }
    }
  }
}
```
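The bug concerned keeping per-node counts in sync with changeset events. As a hypothetical sketch (not the engine's code), a facet counter for a 1D integer dimension can assign each value to the node whose half-open range contains it, increment on create, and decrement on delete. The `ScalarFacetCounter` class and its method names are illustrative assumptions; only the `[lo, hi)` notation comes from the dimension example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical facet counter for a 1D integer dimension with half-open range nodes. */
public class ScalarFacetCounter {
    // Node id -> {lower bound (inclusive), upper bound (exclusive)}, in declaration order.
    private final Map<String, long[]> ranges = new LinkedHashMap<>();
    private final Map<String, Long> counts = new LinkedHashMap<>();

    /** Registers a node using the "[lo, hi)" notation from the dimension example. */
    public void addNode(String id, String interval) {
        String s = interval.trim();
        if (!s.startsWith("[") || !s.endsWith(")")) {
            throw new IllegalArgumentException("sketch only supports [lo, hi): " + interval);
        }
        String[] bounds = s.substring(1, s.length() - 1).split(",");
        ranges.put(id, new long[] {Long.parseLong(bounds[0].trim()), Long.parseLong(bounds[1].trim())});
        counts.put(id, 0L);
    }

    /** Create event: count the item in the node containing its value. */
    public void onCreate(long value) {
        adjust(value, 1);
    }

    /** Delete event, the kind of event this fix concerns: the item must leave its node's count. */
    public void onDelete(long value) {
        adjust(value, -1);
    }

    public long count(String node) {
        return counts.getOrDefault(node, 0L);
    }

    public Map<String, Long> counts() {
        return counts;
    }

    private void adjust(long value, long delta) {
        for (Map.Entry<String, long[]> e : ranges.entrySet()) {
            long[] r = e.getValue();
            if (value >= r[0] && value < r[1]) {
                counts.merge(e.getKey(), delta, Long::sum);
                return; // nodes do not overlap in this example
            }
        }
        // Values outside every node are not counted.
    }

    public static void main(String[] args) {
        ScalarFacetCounter facets = new ScalarFacetCounter();
        facets.addNode("small", "[0, 10)");
        facets.addNode("medium", "[10, 20)");
        facets.addNode("large", "[20, 30)");
        facets.onCreate(3);
        facets.onCreate(12);
        facets.onCreate(15);
        facets.onDelete(12); // without this decrement "medium" would report 2
        System.out.println(facets.counts()); // prints: {small=1, medium=1, large=0}
    }
}
```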
ENGINE-977 Changes to the example server.sh and server.bat scripts
Release 3.13 included improvements to the engine startup scripts that pass the location of the discovery.properties file as a command-line parameter when executing the jar. The example scripts for custom startup (server.bat and server.sh) were not updated to pass this parameter.
The last line in both scripts has been changed from:
```
java -jar lib/discovery.jar
```
to:
```
java -jar lib/discovery.jar discovery.properties
```
Compatibility Changes
ENGINE-962 Simplify checkpoint triggering mechanism
The esoteric checkpoint triggers have been replaced with a simpler mechanism.
As a reminder, the engine wakes up at checkpoint time and runs a set of checks; if any trigger condition is met, it creates a checkpoint.
The new triggers are based on the current generation of changesets, excluding its first changeset. A checkpoint is triggered when any of the following conditions is met:
- there are more than 500 changesets
- the total compressed size of the changesets exceeds 100MB (104,857,600 bytes)
Additionally, the result of each check is now logged. A FAIL line marks a condition that has been met and therefore triggers the checkpoint:
```
[com.t11e.discovery.item.changeset.checkpoint.DefaultCheckpointPolicy] Checkpointing is required, failed checks trigger it:
[com.t11e.discovery.item.changeset.checkpoint.DefaultCheckpointPolicy] FAIL number of changesets is 1350, maximum is 500
[com.t11e.discovery.item.changeset.checkpoint.DefaultCheckpointPolicy] PASS total compressed bytes is 13128757, maximum is 104857600
```
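The trigger rules above can be sketched as a single decision function. This is a hypothetical illustration, not the engine's `DefaultCheckpointPolicy`: the class name, the method signature, and the representation of a generation as a list of compressed sizes are assumptions. The thresholds and the PASS/FAIL message wording follow the log lines above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the simplified checkpoint triggers; not DefaultCheckpointPolicy itself. */
public class CheckpointPolicySketch {
    static final int MAX_CHANGESETS = 500;
    static final long MAX_COMPRESSED_BYTES = 100L * 1024 * 1024; // 104857600, matching the log output

    /**
     * Decides whether to checkpoint the current generation.
     *
     * @param compressedSizes compressed size in bytes of each changeset in the generation,
     *                        with the first changeset at index 0
     * @param log             receives one PASS/FAIL line per check
     * @return true if any condition is met (a check "fails"), so a checkpoint should be created
     */
    static boolean checkpointRequired(List<Long> compressedSizes, List<String> log) {
        // The first changeset of the generation is excluded from both checks.
        int count = Math.max(0, compressedSizes.size() - 1);
        long totalBytes = 0;
        for (int i = 1; i < compressedSizes.size(); i++) {
            totalBytes += compressedSizes.get(i);
        }
        boolean tooMany = count > MAX_CHANGESETS;
        boolean tooLarge = totalBytes > MAX_COMPRESSED_BYTES;
        log.add((tooMany ? "FAIL" : "PASS") + " number of changesets is " + count
                + ", maximum is " + MAX_CHANGESETS);
        log.add((tooLarge ? "FAIL" : "PASS") + " total compressed bytes is " + totalBytes
                + ", maximum is " + MAX_COMPRESSED_BYTES);
        return tooMany || tooLarge;
    }

    public static void main(String[] args) {
        // Hypothetical generation: 1351 changesets of 10,000 compressed bytes each.
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < 1351; i++) {
            sizes.add(10_000L);
        }
        List<String> log = new ArrayList<>();
        System.out.println("checkpoint required: " + checkpointRequired(sizes, log));
        log.forEach(System.out::println);
    }
}
```

Run as-is, this prints `checkpoint required: true`, then a FAIL line for 1350 changesets and a PASS line for 13500000 compressed bytes.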
ENGINE-814 ENGINE-816 Migrates item storage from Derby to Lucene
The internal item storage is now based on Apache Lucene instead of Apache Derby. This fixes some stability issues, improves performance, and reduces disk usage.
Due to this change, the engine will recreate the db/items directory on startup. This happens automatically on both upgrade and downgrade.
Removed features
ENGINE-963 Changesets tab in the admin interface no longer displays applied counts
The applied counts are no longer reported in the rightmost column of the changesets tab in the admin interface.
An internal webservice option (/ws/changeset?meta=1) now generates simpler XML: the <applied/> elements have been replaced with a single state attribute on the parent <changeset> element.
Was:
```xml
<changesets>
  <changeset id="0001" type="delta">
    <applied state="applied" created="10" modified="2" deleted="1" />
  </changeset>
</changesets>
```
Now:
```xml
<changesets>
  <changeset id="0001" type="delta" state="applied"/>
</changesets>
```
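A client that reads this internal webservice across an upgrade may need to accept both shapes. The following is a hypothetical sketch using the JDK's DOM parser: the `ChangesetStateReader` class is illustrative, and since the endpoint is internal its format may change again. Only the `state` value is recovered; the created/modified/deleted counts have no equivalent in the new format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical client-side reader for /ws/changeset?meta=1 responses. */
public class ChangesetStateReader {

    /** Returns changeset id -> state, accepting both the old and the new response shape. */
    static Map<String, String> readStates(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse changeset XML", e);
        }
        Map<String, String> states = new LinkedHashMap<>();
        NodeList changesets = doc.getElementsByTagName("changeset");
        for (int i = 0; i < changesets.getLength(); i++) {
            Element changeset = (Element) changesets.item(i);
            // New shape: state="..." on <changeset>; getAttribute returns "" when absent.
            String state = changeset.getAttribute("state");
            if (state.isEmpty()) {
                // Old shape: state="..." on a nested <applied/> element.
                NodeList applied = changeset.getElementsByTagName("applied");
                if (applied.getLength() > 0) {
                    state = ((Element) applied.item(0)).getAttribute("state");
                }
            }
            states.put(changeset.getAttribute("id"), state);
        }
        return states;
    }

    public static void main(String[] args) {
        String oldXml = "<changesets><changeset id=\"0001\" type=\"delta\">"
                + "<applied state=\"applied\" created=\"10\" modified=\"2\" deleted=\"1\"/>"
                + "</changeset></changesets>";
        String newXml = "<changesets><changeset id=\"0001\" type=\"delta\" state=\"applied\"/></changesets>";
        System.out.println(readStates(oldXml)); // prints: {0001=applied}
        System.out.println(readStates(newXml)); // prints: {0001=applied}
    }
}
```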
ENGINE-966 Changesets tab of the admin interface no longer displays total item counts
The changesets tab no longer displays the number of items in the dataset; see the indices tab instead.
The internal webservice at /ws/itemsSummary has been removed.
ENGINE-953 Remove normalizeFullWidthChars text dimension attribute
Text dimensions no longer support the normalizeFullWidthChars option. Instead, full-width characters are always converted to their half-width equivalents as part of Unicode normalization (ENGINE-952).
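These notes do not say which normalization form the engine applies. As an illustration only, the sketch below assumes an NFKC-style compatibility normalization and uses the JDK's `java.text.Normalizer` instead of ICU; the class name `FullWidthSketch` is hypothetical. Under NFKC, full-width Latin letters, digits, and punctuation, as well as the ideographic space, map to their ASCII counterparts.

```java
import java.text.Normalizer;

/** Hypothetical illustration of full-width to half-width conversion via Unicode NFKC. */
public class FullWidthSketch {

    /** NFKC replaces compatibility characters, such as full-width forms, with their plain equivalents. */
    static String normalize(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Full-width letters, digits and period, plus U+3000 IDEOGRAPHIC SPACE.
        String fullWidth = "Ｄｉｓｃｏｖｅｒｙ　３．１４";
        System.out.println(normalize(fullWidth)); // prints: Discovery 3.14
    }
}
```

NFKC also folds other compatibility characters (for example, ligatures), so it does more than width conversion alone.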