4.1 (October 11 2016)

Compatibility Changes

ENGINE-995 Minimum Java version is now Java 8

Oracle stopped supporting Java 7 in April 2015.

The Discovery Search Engine now requires a minimum of Java 8 to run.

Improvements

ENGINE-1039 Upgrades Apache Lucene from 6.1.0 to 6.2.1

The Apache Lucene library has been upgraded from 6.1.0 to 6.2.1.

ENGINE-1048 Feed fetches now log duration

Log entries for feed fetches now include an indication of how long the fetch took.

You will now see log entries similar to the following:

[2016-08-31 14:07:30,470] [INFO ] Fetched dimensions after 609ms from http://localhost:8092
[2016-08-31 14:19:20,657] [INFO ] Fetched changeset feed after 143ms from http://localhost:8089/ws/publisher/example?profile=test
[2016-08-31 14:05:30,699] [WARN ] Unable to connect when fetching changeset feed from: http://example.com:8089
[2016-08-31 14:06:38,421] [ERROR] Problem fetching changeset feed after 8ms from http://localhost:8089/ws/publisher/invalid?profile=test
com.t11e.discovery.common.http.HttpErrorException: [404] Not Found http://localhost:8089/ws/publisher/invalid?profile=test
... SNIP ...
[2016-08-31 14:08:21,963] [INFO ] Fetched changesets after 44s:135ms from http://localhost:8091
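
If you want to track fetch durations from these entries, the duration can be pulled out with a regular expression. The following is a minimal sketch based only on the two formats shown above (for example 609ms and 44s:135ms); the function name fetch_duration_ms is hypothetical, and longer formats such as minutes are an unknown that this sketch does not handle.

```python
import re

# Matches the duration that follows "after" in the sample entries,
# e.g. "609ms" or "44s:135ms". Formats not shown in these notes
# (minutes, hours) are deliberately not handled.
DURATION_RE = re.compile(
    r"after (?:(?P<s>\d+)s:)?(?P<ms>\d+)ms from (?P<url>\S+)"
)


def fetch_duration_ms(line):
    """Return (milliseconds, url) for a fetch log line, or None if absent."""
    match = DURATION_RE.search(line)
    if match is None:
        return None
    seconds = int(match.group("s") or 0)
    return seconds * 1000 + int(match.group("ms")), match.group("url")
```

Entries without a duration, such as the WARN entry above, yield None.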

ENGINE-1043 Adds highlighting support to keyword dimensions

Highlighting now supports keyword dimensions.

To preserve backward compatibility, highlighting only applies to text dimensions by default. To enable highlighting for a keyword dimension, add it to the new includeDimensions option of the highlighting request. Be aware that once this option is set, you must also list any text dimensions that you want highlighted.

The following example will highlight only dimensions a and c as long as they have a type of text or keyword.

{
  "criteria": [
    {"dimension":"a","value":"foo"},
    {"dimension":"b","value":"bar"},
    {"dimension":"c","value":"baz"}
  ],
  "highlighting": {
    "includeDimensions": ["a", "c"]
  },
  "pageSize": 10
}

For more information, see Highlighting Criterion.
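
Because includeDimensions replaces the text-only default, it is easy to drop text dimensions by accident when adding a keyword dimension. The following is a minimal client-side sketch that builds the request; the helper name with_highlighting is hypothetical, and the assumption that a is a text dimension and c is a keyword dimension is for illustration only.

```python
import json


def with_highlighting(query, text_dimensions, keyword_dimensions):
    """Return a copy of query that highlights the given dimensions.

    Setting includeDimensions replaces the text-only default, so the
    text dimensions are listed alongside the keyword ones.
    """
    include = list(text_dimensions) + [
        d for d in keyword_dimensions if d not in text_dimensions
    ]
    result = dict(query)
    result["highlighting"] = {"includeDimensions": include}
    return result


query = {
    "criteria": [
        {"dimension": "a", "value": "foo"},
        {"dimension": "c", "value": "baz"},
    ],
    "pageSize": 10,
}
# Assumption: "a" is a text dimension and "c" is a keyword dimension.
request = with_highlighting(query, ["a"], ["c"])
print(json.dumps(request, indent=2))
```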

ENGINE-1030 Upgrades dependent libraries

Upgrades or removes dependent libraries.

library              previous  current
jackson-core-asl     2.5.3     removed
woodstox-core-asl    4.4.1     removed
commons-collections  4.0       removed
commons-lang3        3.4       removed
slf4j                1.7.12    1.7.13
spring-framework     4.1.6     4.1.8
icu4j                55.1      56.1

ENGINE-1031 Split the project into modules

The project has been split up into smaller modules to allow for quicker iteration of new features. These modules are only visible as extra jar files in the lib/ directory of the distribution.

ENGINE-1032 Improved debug timing

The timings generated by the Debug API are now nested when groupBy is used. The key total has been renamed to _total and the key _other has been added to account for non-annotated work.

Adding this to your query:

{
    "debug": {"time": true}
}

Used to generate this:

{
  "debug": {
    "time": {
      "groupBy": {
        "context": 0
      },
      "search": 1,
      "searchFuzzy": 56,
      "count": 7,
      "page": 0,
      "properties": 0,
      "highlighting": 1,
      "indexValues": 0,
      "total": 79
    }
  }
}

But now generates this:

{
  "debug": {
    "time": {
      "groupBy": {
        "context": 0,
        "secondPass": 0,
        "secondPassFuzzy": 5,
        "secondPassMerge": 0,
        "properties": 2,
        "_total": 8
      },
      "search": 3,
      "searchFuzzy": 49,
      "count": 6,
      "topGroups": 0,
      "properties": 0,
      "highlighting": 3,
      "indexValues": 0,
      "_other": 1,
      "_total": 73
    }
  }
}
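
Clients that read these timings may need to cope with both shapes during an upgrade. The following is a minimal sketch, assuming the content of the time object has already been parsed into a Python dict; the function names total_ms and flatten are hypothetical.

```python
def total_ms(timings):
    """Return the overall time, accepting the old and the new key name."""
    if "_total" in timings:
        return timings["_total"]
    return timings["total"]


def flatten(timings, prefix=""):
    """Flatten nested timing sections into dotted keys, e.g. groupBy._total."""
    flat = {}
    for key, value in timings.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat
```

With the new output above, flatten would report the nested groupBy total under the key groupBy._total, separate from the top-level _total.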

ENGINE-1033 Expose background merges

The background process that merges Lucene segments is now exposed via the same progress service used to report changeset application.

Example log snippet:

[2016-08-01 15:16:16,634] [INFO ] Merging segments on index-1 using 1 thread across 10 segments, 0 deleted docs, 72848 max docs, and 193.9 MiB of disk space [###############] [done]   100% [203,336,114/203,336,114] (9s, 203,336,114 bytes, 21,191,882.647 bytes/sec, 21,191,882 bytes/sec)

ENGINE-1034 Improve progress display on the status page

Improves the layout of the progress table on the status page. The longer changeset identifiers and paths were causing the description column to take up too much space.

ENGINE-1035 Automatically save completed merges, even if no changesets are flowing

Background segment merges that finish after the last changeset are now automatically committed when no more changesets are forthcoming. The engine also checks for pending merges on startup when the changeset backlog is empty.

ENGINE-1036 Don’t convert the topN grouped data when not needed

Small optimization to only convert the topN grouped rows when necessary.

ENGINE-1029 Upgrades Lucene

Upgrade Lucene from 5.4.0 to 6.1.0.

Note that Lucene 6 no longer supports Java 7, see ENGINE-995 above.

Also: ENGINE-1037 (Lucene 5.4.1), ENGINE-1028 (Lucene 5.5.0), ENGINE-1029 (Lucene 6.0, 6.0.1)

ENGINE-1038 Change default Lucene similarity

This is an internal change that migrates the engine from our custom BoostOnlySimilarity to BM25Similarity, which is the new default in Lucene 6.0.

Bug Fixes

ENGINE-1051 Fixes groupBy query building for some multi-valued single criterion searches

An uncommon edge case was found in the process used to generate the query supporting a groupBy search that applied only to optional clauses. When triggered, the engine would return extra results at the end of the fuzzy section that were not matches for the original (simple) query.

This is triggered with a single criterion search that contains multiple values.

To reproduce, index the following documents:

{"_id":"hit","group":"good","tag":"a"}
{"_id":"miss","group":"bad"}

Against these dimensions:

<dimension id="group" type="groupBy"/>
<dimension id="tag" type="keyword"/>

And execute this query:

{
  "criteria": [
    {
      "dimension": "tag",
      "id": ["a","b"],
      "cull": true
    }
  ],
  "groupBy": {
    "dimension": "group"
  },
  "pageSize": 2
}

Versions of the engine before 4.0 correctly returned just the group good, but version 4.0 incorrectly returned the group bad. This has now been resolved.

ENGINE-1044 Better fragment highlighting of arrays

The fragment highlighter would convert arrays of indexed values to a single string instead of highlighting each element individually. It now treats arrays the same as the inline highlighter.

With this example document:

{
  "test": [
    "foo",
    "bar",
    "baz"
  ]
}

Fragment highlighting of bar would generate:

{
  "test": "[foo, <b>bar</b>"
}

It now generates:

{
  "test": [
    "foo",
    "<b>bar</b>",
    "baz"
  ]
}
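
The difference can be illustrated with a toy model. The following sketch is not the engine's highlighter; it only shows the idea of highlighting each array element on its own so that the array shape survives. The function name highlight and the whole-word matching rule are assumptions made for illustration.

```python
import re


def highlight(value, term):
    """Wrap whole-word matches of term in <b> tags.

    Lists are handled element by element so the array shape is preserved.
    """
    if isinstance(value, list):
        return [highlight(item, term) for item in value]
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.sub(pattern, lambda m: "<b>" + m.group(0) + "</b>", value)
```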

ENGINE-1045 Hit detection in indexValues only uses the first criterion for each dimension

Only the first matching criterion was used for hitDetection using the indexValues API.

With the following inputs.

Dimension:

<dimension id="example" type="keyword"/>

Item:

{
 "_id": "001",
 "example": ["a", "b"]
}

Query:

{
 "criteria": [
   {"dimension":"example","value":"a"},
   {"dimension":"example","value":"b"}
 ],
 "indexValues": [
   {"dimension":"example","hitDetection":true}
 ],
 "pageSize": 1
}

Previously, the engine would only detect hits that matched the first criterion for the dimension:

{
 "indexValues": {
   "example": {
     "value": [
       ["a", "b"]
     ],
     "hit": [
       [true, false]
     ]
   }
 }
}

It now correctly detects b as being a hit:

{
  "indexValues": {
    "example": {
      "value": [
        ["a", "b"]
      ],
      "hit": [
        [true, true]
      ]
    }
  }
}
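
The corrected behavior can be modelled in a few lines. This is a toy sketch assuming exact keyword matching, not the engine's implementation; the function name detect_hits is hypothetical. The point is that every criterion on the dimension contributes to hit detection, not just the first.

```python
def detect_hits(item_values, criteria, dimension):
    """Mark each indexed value that matches any criterion on the dimension."""
    wanted = {c["value"] for c in criteria if c["dimension"] == dimension}
    return [value in wanted for value in item_values]


criteria = [
    {"dimension": "example", "value": "a"},
    {"dimension": "example", "value": "b"},
]
hits = detect_hits(["a", "b"], criteria, "example")
```

Passing only the first criterion to detect_hits reproduces the old result, where b is not marked as a hit.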

ENGINE-1046 Index switching pauses if the most recent changeset is a reset

When switching indices due to a dimension file change, the engine would leave the old index in place if the most recent changeset was of type reset. Pushing a non-reset changeset to the engine would allow the index switch to occur. Because no items are indexed at that point, this bug only affects clients that push different dimension files to an engine that has been reset for test purposes.

To reproduce:

  1. Start a test engine
  2. Hit the “Clear all changesets and indexed data” button on the changesets tab
  3. Upload a new dimensions file on the dimensions tab
  4. Check the indices tab to see if the index has the new dimensions applied

ENGINE-1042 User quoted phrases can miss some hits

When performing a search with a user-quoted phrase the engine would sometimes stop processing alternate terms at a position for the current segment. This would cause it to miss matching documents.

Your query must contain a user-quoted phrase (the JSON-escaped quotes) to trigger this bug:

{
  "criteria": [
    {
      "dimension": "example",
      "value": "\"Bob's phrasing with alternate terms\""
    }
  ]
}
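
When building such a query in code, the escaping is usually handled by the JSON serializer rather than written by hand. The following is a minimal sketch; the helper name quoted_phrase_query is hypothetical, and the dimension name example is taken from the query above.

```python
import json


def quoted_phrase_query(dimension, phrase):
    """Build a query whose value is a user-quoted phrase."""
    # The literal double quotes are part of the value; json.dumps
    # escapes them as \" when the query is serialized.
    return {"criteria": [{"dimension": dimension, "value": '"' + phrase + '"'}]}


body = json.dumps(
    quoted_phrase_query("example", "Bob's phrasing with alternate terms")
)
print(body)
```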