Vespa Product Updates, December 2018: ONNX Import and Map Attribute Grouping

Hi Vespa Community!

Today we’re kicking off a blog post series of need-to-know updates on Vespa, summarizing the features and fixes detailed in GitHub issues.

We welcome your contributions and feedback about any new features or improvements you’d like to see.

For December, we’re excited to share the following product news:

Streaming Search Performance Improvement
Streaming Search is a solution for applications where each query only searches a small, statically determined subset of the corpus. In this case, Vespa searches without building reverse indexes, reducing storage cost and making writes more efficient. With the latest changes, the document type is used to further limit data scanning, resulting in lower latencies and higher throughput. Read more here.

ONNX Integration
ONNX is an open ecosystem for interchangeable AI models. Vespa now supports importing models in the ONNX format and transforming the models into Tensors for use in ranking. This adds to the TensorFlow import included earlier this year and allows Vespa to support many training tools. While Vespa’s strength is real-time model evaluation over large datasets, to get started using single data points, try the stateless model evaluation API. Explore this integration more in Ranking with ONNX models.
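As a minimal sketch of the idea, an ONNX model bundled in the application package can be referenced from a ranking expression. The model file name and profile name below are hypothetical, and the exact syntax may vary between Vespa versions; see Ranking with ONNX models for the authoritative reference:

rank-profile onnx_ranking {
  first-phase {
    # models/my_model.onnx is a hypothetical model file placed in the application package
    expression: sum(onnx("my_model.onnx"))
  }
}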

Precise Transaction Log Pruning
Vespa is built for large applications running continuous integration and deployment. This means nodes restart often for software upgrades, and node restart time matters. A common pattern is to keep serving while restarting hosts one by one. Vespa has optimized transaction log pruning with prepareRestart, which flushes as much as possible before stopping; this is quicker than replaying the same data after restarting. This feature is on by default. Learn more in live upgrade and prepareRestart.

Grouping on Maps
Grouping is used to implement faceting. Vespa has added support to group using map attribute fields, creating a group for values whose keys match the specified key, or field values referenced by the key. This support is useful to create indirections and relations in data and is great for use cases with structured data like e-commerce. Leverage key values instead of field names to simplify the search definition. Read more in Grouping on Map Attributes.
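As a hedged sketch of what this can look like (the field and key names are hypothetical), a map field is declared with attribute struct-fields and then grouped on by key:

field my_map type map<string, long> {
  indexing: summary
  struct-field key { indexing: attribute }
  struct-field value { indexing: attribute }
}

all(group(my_map{"color"}) each(output(count())))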

Questions or suggestions? Send us a tweet or an email.

Result diversification using Vespa result grouping

Photo by Robert Lukeman on Unsplash

Result diversification is a broad topic, and this blog post only scratches the surface of
the problem. One might classify result diversification as a multi-objective list optimization
problem. Conceptually, the core search or recommendation ranking function produces a ranked list of documents
scored on a per-document basis. On the other hand, diversification looks at the overall list of ranked
documents. From a high-level perspective, the diversity constraints can be expressed as a
similarity function which re-orders or removes similar matches to obtain an
ordering that optimizes variety. This post does not cover query intent diversification, where the
query specification is ambiguous and the engine might show different types of results based on the most probable query intents.

Before getting into the details on result diversification with
Vespa, one needs to have a basic understanding of the distributed query execution in
Vespa. For a query request that asks for a list of 100 best-ranking hits, the Vespa
stateless container layer will fan out the query to the content nodes. Each content node
produces a locally ranked list of top 100 hits out of potentially hundreds of millions of
documents matching the query specification. The stateless layer
merges the top 100 ranking matches from
all the content nodes and retains the global 100 top-ranking hits.
The number of hits that are returned is controlled by the
hits
or the YQL limit
parameter.
See also Vespa search sizing
for more details on Vespa query execution and scaling Vespa deployments.
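For illustration, assuming the doc document type used later in this post, both of the following ask for the global top 100 hits; the first uses the YQL limit clause and the second the hits request parameter (URL encoding omitted for readability, and the query parameter value is just an example):

select * from doc where userQuery() limit 100;

/search/?query=vespa&yql=select * from doc where userQuery();&hits=100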

In the following sections, a differentiation is made between hits and
matches: matches are documents that match the query formulation, while hits are typically a small subset of the
matches that are surfaced to the stateless search container.

The Vespa grouping language is a list processing
language that describes how the query
matches should be grouped, aggregated, and presented in results as hits. A grouping
statement takes the list of all matches to a query as input and groups/aggregates it,
possibly in multiple nested and parallel ways to produce the result output.

The Vespa grouping list processing language runs over matches selected by the query formulation,
supporting matches retrieved by traditional query filters, keywords, and nearest neighbor
search. Thus, the Vespa grouping framework allows building rich search experience result
pages with facets and list diversification irrespective of the query retrieval method.

The following YQL query
asks for ten best ranking hits from a doc document type
ranked by the default ranking profile:

select * from doc where userQuery() limit 10;

The list of hits is not diversified in any way; it is just a flat list
of the ten top-ranking hits out of possibly millions of matches from possibly billions of
documents in the index.

In the first example of using the Vespa grouping language, one can imagine there has been an
offline process which has categorized documents
into predefined categories, each uniquely identified by a numeric identifier.

schema doc {
  document doc {
    field title type string {..}
    field category type int {
      indexing: summary | attribute
    }
    field doc_embedding type tensor<int8>(x[128]) {
      indexing: attribute | index
    }
  }
  rank-profile default {
    first-phase { expression { .. } }
  }
  document-summary short {
    summary title type string {
      source: title
    }
  }
}

Fields used in grouping expressions must be declared with attribute.

The following YQL query
groups results for a userQuery() by category.

select * from doc where userQuery() limit 10 | all(group(category) max(10)
each(max(2) each(output(summary(short)))));

The above query and grouping specification groups all matches
by the category field. The un-diversified ranked list of ten hits is
retained in the result set when using limit 10.
The hits from the diversified result set are emitted
by the each(output(summary(short))) expression.

To remove the un-diversified ranked list, use limit 0:

select * from doc where userQuery() limit 0 | all(group(category) max(10)
each(max(2) each(output(summary(short)))));

Groups are by default sorted by the maximum hit relevancy score within the
group. The outer max(10) controls the maximum number of groups
returned. In
addition, the two highest-ranking hits (max(2)) are emitted for each of the
unique groups, in this case, categories.
The Vespa grouping API supports pagination
using continuation tokens.
In the above example, the grouping expression used a single document attribute; it is also
possible to group by expressions. See the grouping
reference documentation.
In the example below, the group identifier is a concatenation of category and brand:

all(group(cat(category,brand)) max(10)
each(max(2) each(output(summary(short)))));

Result grouping is also supported when using dense retrieval with the
nearest neighbor search query
operator:

select * from doc where
([{"targetHits":100}]nearestNeighbor(doc_embedding,query_embedding)) limit 0 |
all(group(category) max(10) each(max(2) each(output(summary(short)))));

In the dense retrieval case, using the (approximate) nearest neighbor search operator,
the number of matches exposed to grouping is limited by the targetHits.
This behavior is due to the nature of
approximate nearest neighbor search: there is no clear separation between a
match and a non-match, as there is with sparse, term-based query retrieval.

The default
ordering
of groups is, as mentioned, the maximum relevance score of the hits in the group.

all(group(category) max(10) each(max(1) each(output(summary(short)))))

is equivalent to

all(group(category) order(-max(relevance())) max(10) each(max(1)
each(output(summary(short)))))

The minus sign denotes descending sort order.

It is possible to order groups by more complex expressions working on
match aggregates like sum() and count(), for example:

  • Number of matches in the group times the maximum relevance:
    order(-max(relevance())*count()) (see the sketch below)
  • The sum of a document attribute times the
    maximum relevance: order(-max(relevance())*sum(ctr))
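As a sketch, combining the first of these orderings with the earlier category grouping gives:

select * from doc where userQuery() limit 0 |
all(group(category) order(-max(relevance())*count()) max(10)
each(max(2) each(output(summary(short)))));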

The grouping examples in the previous sections
use “bucketing” as the diversity similarity
function. The advantage of group bucketing is scalability: it is fast to compute globally over many nodes
and many matches. The downside is that the expressiveness of the diversity similarity function is limited.

Once the grouped result hits have been computed in parallel over all content nodes,
the resulting hits can be post-processed using a more complex diversity similarity function. At
the post-processing stage, the potentially large number of matches has been reduced to lists of top-ranked hits. Therefore,
the diversity similarity function can use a richer set of features and higher compute complexity.

This type of architecture is a phased or tiered architecture where the first phase selects
diverse “bucketed” candidates efficiently over potentially a large number of matches and
the subsequent phases post-process the results using a more complex diversity function:

  • Vespa grouping language is used to fetch candidate documents which are diversified and
    ranked using bucketing.
  • Post-processing logic runs on the top-ranked result lists from the parallel grouping execution and
    implements a more complex diversity similarity function.

It is also considerably easier to implement custom business logic like “Never display more than four of type x for users of type z” in
the post-processing stage. The downside of custom post-processing is that it complicates pagination support.

Post-processing diversity and business logic routines are best added by writing stateless
searchers, for example implementing
this diversity algorithm.
Deploying the post-processing logic in a searcher avoids extra network round trips, serialization, and
de-serialization. The internal communication protocol between stateless and stateful Vespa nodes
is binary and secured using mTLS, so there is considerably less overhead doing round trips between
a stateless searcher component and the content nodes.

The custom searcher function can also build and process the grouping request and response programmatically; see the
Searcher grouping API.

Four main components drive serving performance of query requests that
use Vespa result grouping. In order of importance:

  • The number of matches the query produces per node. All the document matches
    get
    exposed to the grouping framework. The total result hit count of the query is
    equal to the number of matches exposed to grouping.
  • The total number of unique values in the field that is grouped on.
  • Ordering groups – using ordering expressions involving aggregates like count()
    or sum() is more resource-intensive than using the default max relevance order.
  • Finally, the number of nodes involved in the query.

The query selection logic controls the number of matches.
Therefore, efficient
retrievers like weakAnd/wand
or approximate nearest neighbor
search
expose fewer matches to ranking and grouping.
Thus, reducing the number of matches can improve the serving performance significantly and
also enhance the quality of the groups, as low-scoring documents are excluded from the
result grouping phase.
The grouping language also allows limiting the number of matches that are grouped.
For example, to limit the number of matches per node, use an expression of the form:

all(max(K) all(group(category) .... ))

In this expression, K is the maximum number of matches per
node that grouping runs over. Only the top-K ranked matches from the matching and ranking phases
are considered.
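For example, the following query (the value 1000 is only illustrative) restricts grouping to the 1000 best-ranking matches per node before grouping by category:

select * from doc where userQuery() limit 0 |
all(max(1000) all(group(category) max(10)
each(max(2) each(output(summary(short))))));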

It’s also possible to limit the number of matches exposed to grouping by using
the match-phase
degradation ranking
feature. The match-phase limit sets an upper limit on the number of documents matched and ranked,
using a document-side quality attribute. The match-phase degradation feature supports diversity,
so that the matches exposed to grouping are also diversified. Match-phase diversification is
currently not supported for the approximate nearest neighbor search operator.
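A hedged sketch of a rank profile using match-phase with diversity, assuming the schema has a numeric popularity quality attribute (the attribute names and limits are illustrative; consult the match-phase reference for the exact syntax in your Vespa version):

rank-profile diversified {
  match-phase {
    attribute: popularity
    max-hits: 10000
    diversity {
      attribute: category
      min-groups: 10
    }
  }
  first-phase { expression { .. } }
}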

The number of unique values in the field is data-dependent; fewer unique values is better.
The number of nodes in the cluster
to which the query is fanned out increases network bandwidth usage. The performance
impact can be mitigated using the
precision parameter,
limiting the number of
unique groups returned to the stateless container nodes and reducing network bandwidth
usage.
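For example, precision is specified in the grouping expression alongside max() (the value 100 is illustrative):

all(group(category) precision(100) max(10) each(max(2) each(output(summary(short)))))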

This blog post covered how to use Vespa result grouping to produce a mixed
result set for both search and recommendation use cases. As with many search and recommendation
problems, diversification is best solved using a phased execution with gradually increasing complexity.

This post only covered
single-level grouping expressions.
However, the Vespa grouping framework also allows
running multi-level grouping. For
example, group by category, then by brand to further diversify the result.
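A sketch of such a two-level grouping, reusing the fields from the earlier examples:

select * from doc where userQuery() limit 0 |
all(group(category) max(10) each(group(brand) max(2)
each(max(1) each(output(summary(short))))));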
For further reading, see result
grouping and
result grouping
reference.