4.2 (May 24 2017)

Compatibility Changes

ENGINE-1057 Paged search results now have a soft limit of 10,000

Any combination of startIndex and pageSize that exceeds this limit will act as if the search only contained 10,000 matches.

See max-paged-docs to learn about relaxing this limit.
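The effect of the limit on a single page can be modelled with a little arithmetic. The sketch below is illustrative only: the function name and the max_paged_docs parameter are hypothetical stand-ins, and it assumes the limit applies to startIndex + pageSize as described above rather than reproducing the engine's actual code.

```python
def items_on_page(start_index, page_size, total_matches, max_paged_docs=10_000):
    """Estimate how many items a page can contain under the soft limit.

    Assumption: the search behaves as if it held at most
    max_paged_docs matches, as the release note describes.
    """
    if start_index < 0 or page_size < 0:
        raise ValueError("start_index and page_size must be non-negative")
    # The number of matches that paging is allowed to reach.
    reachable = min(total_matches, max_paged_docs)
    # Offset just past the last item this page could return.
    end = min(start_index + page_size, reachable)
    return max(0, end - start_index)
```

For example, a page of 20 starting at 9,990 in a search with 50,000 matches would hold only 10 items under this model, and any page starting at 10,000 or later would be empty.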

ENGINE-1054 No longer restricts startIndex to be less than totalSize

If you request a page that starts after the last result, the returned startIndex is no longer capped based on the totalSize of the results.

Query:

{
  "startIndex": 100,
  "pageSize": 10,
  "criteria":[{"dimension":"example"}]
}

Now preserves startIndex:

{
  "itemIds": [],
  "exactMatches": [],
  "relevanceValues": [],
  "pageSize": 10,
  "currentPageSize": 0,
  "startIndex": 100,
  "exactSize": 24,
  "totalSize": 59,
  "datasetSize": 5000
}

Previously the startIndex was capped based on totalSize:

{
  "itemIds": [],
  "exactMatches": [],
  "relevanceValues": [],
  "pageSize": 10,
  "currentPageSize": 0,
  "startIndex": 59,
  "exactSize": 24,
  "totalSize": 59,
  "datasetSize": 5000
}
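Clients that relied on the capped value can derive it from the response themselves. The following is a minimal sketch with hypothetical helper names; it uses only the response fields shown in the examples above.

```python
def capped_start_index(response):
    """Recreate the pre-4.2 value: startIndex limited to totalSize."""
    return min(response["startIndex"], response["totalSize"])


def is_past_last_result(response):
    """True when the requested page starts at or after the end of the results."""
    return (
        response["currentPageSize"] == 0
        and response["startIndex"] >= response["totalSize"]
    )


# Response fields copied from the 4.2 example above.
response = {"pageSize": 10, "currentPageSize": 0, "startIndex": 100, "totalSize": 59}
```

With the 4.2 response above, capped_start_index(response) gives 59 and is_past_last_result(response) is true.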

Improvements

ENGINE-1062 Dynamically defined facets for numeric dimensions

You can now specify the facet buckets for numeric dimensions (integer, long, double, time) at query time as an alternative to specifying them in the dimensions file (index time). This is useful when you have facet bounds that change frequently.

Previously you could only specify facet buckets in the dimensions file, like so:

<dimension id="available" type="time" format="yyyy-MM-dd">
  <element id="today" value="2017-01-30"/>
  <element id="tomorrow" value="2017-01-31"/>
  <element id="thisweekend" value="[2017-02-04, 2017-02-05]"/>
  <element id="nextweek" value="[2017-02-06, 2017-02-12]"/>
</dimension>

Now you can trim down the dimensions definition and specify the buckets at query time.

<dimension id="available" type="time" format="yyyy-MM-dd"/>

Query:

{
    "facets": {
        "available": {
            "dynamic": {
                "today": "2017-01-30",
                "tomorrow": "2017-01-31",
                "thisweekend": "[2017-02-04, 2017-02-05]",
                "nextweek": "[2017-02-06, 2017-02-12]"
            }
        }
    }
}

See dynamic facets field to learn more.
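Because the buckets are now part of the query, a client can compute them at request time. The sketch below builds the "facets" fragment for a given day. The function name is hypothetical, dates are written as year-month-day to match the example values, and "thisweekend" is taken to mean the coming Saturday and Sunday, which is an assumption about the intended bucket.

```python
from datetime import date, timedelta


def build_dynamic_facets(today):
    """Build the "facets" part of a query with buckets relative to today."""
    day = timedelta(days=1)
    # weekday(): Monday is 0 and Saturday is 5. On a Sunday this picks
    # the following Saturday (an assumption for this sketch).
    saturday = today + ((5 - today.weekday()) % 7) * day
    # The Monday that starts the next calendar week.
    next_monday = today + (7 - today.weekday()) * day

    def span(start, end):
        # Range syntax as used in the examples: "[start, end]".
        return "[{}, {}]".format(start.isoformat(), end.isoformat())

    return {
        "facets": {
            "available": {
                "dynamic": {
                    "today": today.isoformat(),
                    "tomorrow": (today + day).isoformat(),
                    "thisweekend": span(saturday, saturday + day),
                    "nextweek": span(next_monday, next_monday + 6 * day),
                }
            }
        }
    }
```

Calling build_dynamic_facets(date(2017, 1, 30)) yields the weekend and next-week ranges used in the query above; the result can be merged into the rest of the query before it is serialized to JSON.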

ENGINE-1041 Migrate away from legacy numeric fields to the new Lucene 6 point fields

Lucene 6 introduced better support for indexing numeric data with its new N-dimensional point fields. These replace the previous term/trie based numeric fields. The engine now takes advantage of the new field types for integer, double, long, time, and geoloc dimensions.

ENGINE-1064 No longer show zero count rows on the indices tab for keyword dimensions

Rows with a count of zero are now filtered out when displaying the indices tab for keyword dimensions. This is helpful for quick data navigation when combined with the provider and content type filters.

ENGINE-1049 Upgrades Apache Lucene from 6.2.1 to 6.5.1

The Apache Lucene library has been upgraded from 6.2.1 to 6.5.1.

ENGINE-1057 Adds soft limit for deep paging and groupBy topN

Using startIndex and pageSize to obtain search results past the first 10,000 hits is now prevented. Similarly, a previously undocumented limit of 100 for the groupBy topN option is now documented and configurable.

See max-paged-docs and max-groupby-topn to learn about relaxing these limits.

ENGINE-1050 Support for specifying different sort criteria for fuzzy tail

Adds sortByFuzzy as an option that allows you to specify a different sort order for the fuzzy tail. When enabled, this forces exact matches to be ordered first.

Example to randomize exact matches and order fuzzy matches by distance to a location:

{
  "criteria": [{"dimension":"example"}],
  "pageSize": 20,
  "sortBy": [
    {"builtin":"random"}
  ],
  "sortByFuzzy": [
    {"dimension":"location","longitude":-74.04,"latitude":40.69}
  ]
}
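The resulting order can be pictured as two independently sorted segments. The sketch below is a conceptual model only, not the engine's implementation: the function, the hit dictionaries, and their "exact" and "distance" fields are hypothetical, and a deterministic key stands in for the random builtin.

```python
def order_results(hits, exact_key, fuzzy_key):
    """Order exact matches first, sorting each segment with its own key."""
    exact = sorted((h for h in hits if h["exact"]), key=exact_key)
    fuzzy = sorted((h for h in hits if not h["exact"]), key=fuzzy_key)
    # Exact matches always precede the fuzzy tail.
    return exact + fuzzy


# Hypothetical hits used only for illustration.
hits = [
    {"id": "a", "exact": False, "distance": 9.0},
    {"id": "b", "exact": True, "distance": 3.0},
    {"id": "c", "exact": False, "distance": 1.0},
    {"id": "d", "exact": True, "distance": 7.0},
]
ordered = order_results(
    hits,
    exact_key=lambda h: h["id"],
    fuzzy_key=lambda h: h["distance"],
)
ordered_ids = [h["id"] for h in ordered]
```

Here the exact hits "b" and "d" come first in their own order, followed by the fuzzy hits "c" and "a" ordered by distance.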

ENGINE-1056 Debug API no longer exposes placeholder fuzzy queries when there is no fuzzy tail

The debug API response will no longer contain queryFuzzy or explainFuzzy when there is no fuzzy tail. Previously a fake empty fuzzy tail would be described.

ENGINE-1055 Invalid queries now return an HTTP 400 instead of 500

Queries that failed validation previously returned an HTTP 500 status (server error). They now return an HTTP 400 status (bad request), and the engine logs the payload.

You can trigger this by POSTing a query with either invalid JSON or a negative startIndex.
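A client can catch both of these cases before sending the request. The following sketch checks only the two failures named above; the function name is hypothetical, and the engine's own validation is assumed to cover more than this.

```python
import json


def find_query_problems(raw_body):
    """Return a list of problems that would make the query invalid."""
    try:
        query = json.loads(raw_body)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError
        return ["invalid JSON: {}".format(exc)]

    problems = []
    # Only inspect startIndex when the body is a JSON object.
    start_index = query.get("startIndex", 0) if isinstance(query, dict) else 0
    if isinstance(start_index, (int, float)) and start_index < 0:
        problems.append("startIndex must not be negative")
    return problems
```

An empty list means neither of the two checks failed; it does not guarantee that the engine will accept the query.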

ENGINE-1058 Upgrades dependent libraries

The following dependent libraries have been upgraded:

library             previous  current
icu4j               56.1      58.2
jetty               8.1.15    9.4.1
springframework     4.1.8     4.3.6
commons-fileupload  1.3.1     1.3.2
commons-io          2.4       2.5
slf4j               1.7.13    1.7.22

Bug Fixes

ENGINE-1063 Indices tab forgets the selected dimension when you change the provider or content type

If you have a provider or content type filter dimension configured in the settings tab of the admin interface, your current selection was lost when you changed the provider or content type while looking at a single dimension’s data on the indices tab. You can now change these selections without losing your place.

ENGINE-1059 XML entity limit in Java 8u101

In release 4.1 we migrated from our older bundled Woodstox XML parser to the one provided by the JVM. Oracle updated this parser in Java 8u101 to enforce a default limit on entity expansion. Some client changesets hit this limit and cannot be processed, generating an error like so:

[20170105 10:04:10,698] [0000001c] [ERROR] [com.t11e.discovery.lucene.ChangesetIndexUpdater] [index-0] Problem processing changesets, will no longer process changes
java.lang.RuntimeException: java.lang.RuntimeException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[929140,10699]
Message: JAXP00010004: The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING".
        at com.t11e.discovery.lucene.ChangesetRecoverer.apply(ChangesetRecoverer.java:472)

The engine now sets the appropriate JAXP property to ensure this limit is not enforced.

This affects Java 8u101 and higher. Clients still using Java 8u91 or lower are not affected by this bug. If you aren’t ready to upgrade to release 4.2 and need a temporary workaround for release 4.1, you can add -Djdk.xml.totalEntitySizeLimit=0 to the jvm.args line in your discovery.properties file.

ENGINE-1052 Queries that use groupBy and a custom sortBy can have an incorrectly populated exactMatches array

This bug was triggered when using groupBy with a custom sortBy that does not place exact matches first. When populating the exactMatches array, the engine incorrectly promoted a group to exact only if its first matching document (based on the current sortBy) was exact, instead of making a group exact when any of its matching documents is exact. With this change, the contents of the exactMatches array agree with exactSize and any facet or drillDown counts.

Query:

{
  "criteria": [{"dimension":"example"}],
  "groupBy": {"dimension":"group"},
  "sortBy": [{"builtin":"exactMatch","reverse":true}],
  "pageSize": 10
}

Could previously return:

{
  "itemIds": ["g1","g2","g3"],
  "exactMatches": [false,false,true],
  "relevanceValues": [1.0,1.0,1.0],
  "isGrouped": true,
  "pageSize": 10,
  "currentPageSize": 3,
  "startIndex": 0,
  "exactSize": 2,
  "totalSize": 3,
  "datasetSize": 5000
}

And will now return:

{
  "itemIds": ["g1","g2","g3"],
  "exactMatches": [true,false,true],
  "relevanceValues": [1.0,1.0,1.0],
  "isGrouped": true,
  "pageSize": 10,
  "currentPageSize": 3,
  "startIndex": 0,
  "exactSize": 2,
  "totalSize": 3,
  "datasetSize": 5000
}
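The difference between the two rules can be shown with a small model. In this sketch each group is a list of booleans, one per matching document in sortBy order, saying whether that document is an exact match. The function names and the sample data are hypothetical and chosen to mirror the responses above.

```python
def group_exact_by_first(docs_exact):
    """Old rule: only the first document in sortBy order decides."""
    return bool(docs_exact) and docs_exact[0]


def group_exact_by_any(docs_exact):
    """New rule: a group is exact when any of its documents is exact."""
    return any(docs_exact)


# Hypothetical groups, sorted so that non-exact documents come first,
# as the reversed exactMatch sort in the query above would do.
groups = {
    "g1": [False, True],
    "g2": [False, False],
    "g3": [True, True],
}
old_flags = [group_exact_by_first(docs) for docs in groups.values()]
new_flags = [group_exact_by_any(docs) for docs in groups.values()]
```

Under the old rule g1 is reported as not exact because its first document is fuzzy. Under the new rule two groups are exact, which agrees with an exactSize of 2.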

ENGINE-1053 Returned page can be too large when spanning the exact/fuzzy boundary with groupBy enabled

If the current page spanned the exact and fuzzy boundary when using groupBy, the parallel arrays in the response could be longer than the requested pageSize. This bug was introduced in release 4.0.

Query:

{
  "criteria": [{"dimension":"mysearch"}],
  "groupBy": {"dimension": "mygroup"},
  "startIndex": 1,
  "pageSize": 2
}

Could previously return:

{
  "itemIds": ["g2","g3","g4"],
  "exactMatches": [true,false,false],
  "relevanceValues": [1.0,0.0,0.0],
  "isGrouped": true,
  "pageSize": 2,
  "currentPageSize": 3,
  "startIndex": 1,
  "exactSize": 2,
  "totalSize": 35,
  "datasetSize": 5000
}

And will now return:

{
  "itemIds": ["g2","g3"],
  "exactMatches": [true,false],
  "relevanceValues": [1.0,0.0],
  "isGrouped": true,
  "pageSize": 2,
  "currentPageSize": 2,
  "startIndex": 1,
  "exactSize": 2,
  "totalSize": 35,
  "datasetSize": 5000
}
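A client-side sanity check for this class of problem might look like the sketch below. The function name is hypothetical, and it assumes, based on the example responses, that itemIds, exactMatches, and relevanceValues each hold currentPageSize entries and that currentPageSize never exceeds pageSize.

```python
PARALLEL_ARRAYS = ("itemIds", "exactMatches", "relevanceValues")


def page_problems(response):
    """Return a list of inconsistencies found in a search response page."""
    problems = []
    for name in PARALLEL_ARRAYS:
        # Each parallel array should describe the same set of items.
        if len(response[name]) != response["currentPageSize"]:
            problems.append("{} length differs from currentPageSize".format(name))
    if response["currentPageSize"] > response["pageSize"]:
        problems.append("currentPageSize exceeds pageSize")
    return problems


# The two responses from the example above, reduced to the fields checked.
before_fix = {
    "itemIds": ["g2", "g3", "g4"],
    "exactMatches": [True, False, False],
    "relevanceValues": [1.0, 0.0, 0.0],
    "pageSize": 2,
    "currentPageSize": 3,
}
after_fix = {
    "itemIds": ["g2", "g3"],
    "exactMatches": [True, False],
    "relevanceValues": [1.0, 0.0],
    "pageSize": 2,
    "currentPageSize": 2,
}
```

The pre-fix response is flagged because its page is larger than requested, while the corrected response passes with no problems.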