Vespa Product Updates, October/November 2019: Nearest Neighbor and Tensor Ranking, Optimized JSON Tensor Feed Format, Matched Elements in Complex Multi-value Fields, Large Weighted Set Update Performance, and Datadog Monitoring Support

Kristian Aune

Head of Customer Success, Vespa


In the September Vespa product update, we mentioned Tensor Float Support, Reduced Memory Use for Text Attributes, Prometheus Monitoring Support, and Query Dispatch Integrated in Container.

This month, we’re excited to share the following updates:

Nearest Neighbor and Tensor Ranking

Tensors are native to Vespa. We compared elastic.co to vespa.ai, testing nearest neighbor ranking using a dense tensor dot product. With an out-of-the-box configuration, Vespa performed 5 times faster than Elastic. View the test results.
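
To make the ranking function concrete, here is a minimal sketch of brute-force nearest neighbor ranking by dense dot product in plain Python. It illustrates the scoring idea only; it is not how Vespa or Elastic implement it, and the names `dot`, `top_k`, and the sample vectors are hypothetical.

```python
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Dense dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_k(query: List[float], docs: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every document against the query and return the k best, highest score first."""
    scored = [(doc_id, dot(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical two-dimensional document vectors.
docs = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [0.0, 1.0]}
result = top_k([1.0, 0.2], docs, k=2)
```

Every document is scored, so the cost grows linearly with the number of documents and with the vector length.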

Optimized JSON Tensor Feed Format

A tensor is a data type used for advanced ranking and recommendation use cases in Vespa. This month, we released an optimized tensor format, enabling a more than 10x improvement in feed rate. Read more.
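
As a rough illustration of why a compact format helps, the sketch below converts a verbose per-cell representation of a one-dimensional indexed tensor into a flat list of values and compares the serialized sizes. The field names `cells`, `address`, `value`, and `values` reflect our understanding of Vespa's document JSON format and should be checked against the linked documentation; the helper `to_short_form` is hypothetical.

```python
import json
from typing import Any, Dict


def to_short_form(verbose: Dict[str, Any], dimension: str) -> Dict[str, Any]:
    """Flatten a 1-D indexed tensor from per-cell form into an ordered value list."""
    cells = sorted(verbose["cells"], key=lambda c: int(c["address"][dimension]))
    return {"values": [c["value"] for c in cells]}


# Assumed verbose form: one object per cell, with an explicit address.
verbose = {
    "cells": [{"address": {"x": str(i)}, "value": float(i)} for i in range(4)]
}
short = to_short_form(verbose, "x")

verbose_bytes = len(json.dumps(verbose))
short_bytes = len(json.dumps(short))
```

In this sketch the short form drops the repeated address objects, so there is less JSON to generate and parse per tensor.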

Matched Elements in Complex Multi-value Fields

Vespa is used in many use cases with structured data – documents can have arrays of structs or maps. Such arrays and maps can grow large, and often only the entries matching the query are relevant. You can now use the recently released matched-elements-only setting to return only the matching elements. This increases performance and simplifies front-end code.
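
The effect of the setting can be pictured with a small conceptual sketch: given an array of structs, keep only the elements whose field matches the query term. This is an illustration of the idea, not Vespa's implementation; in Vespa the filtering happens inside the engine. The function name, field names, and sample data are hypothetical.

```python
from typing import Any, Dict, List


def filter_matched_elements(
    elements: List[Dict[str, Any]], field: str, term: Any
) -> List[Dict[str, Any]]:
    """Return only the array elements whose `field` equals the query term."""
    return [element for element in elements if element.get(field) == term]


# Hypothetical array-of-struct field on a document.
people = [
    {"name": "alice", "city": "oslo"},
    {"name": "bob", "city": "trondheim"},
    {"name": "carol", "city": "oslo"},
]
matched = filter_matched_elements(people, "city", "oslo")
```

Without this kind of filtering, the front end would receive all three elements and have to repeat the matching logic itself.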

Large Weighted Set Update Performance

Weighted sets in documents are used to store a large number of elements used in ranking. Such sets are often updated at high volume, in real time, enabling online big data serving. Vespa-7.129 includes a performance optimization for updating large sets. For example, updating a set with 10K elements, without fast-search, is 86.5% faster.

Datadog Monitoring Support

Vespa is often used in large-scale, mission-critical applications. For easy integration into dashboards, Vespa is now in Datadog’s integrations-extras GitHub repository. Existing Datadog users will now find it easy to monitor Vespa. Read more.

About Vespa: Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to grow.

We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.

Vespa Product Updates, December 2019: Improved ONNX Support, New Rank Feature attributeMatch().maxWeight, Free Lists for Attribute Multivalue Mapping, Faster Updates for Out-of-Sync Documents, ZooKeeper 3.5.6

Kristian Aune

Head of Customer Success, Vespa


In the November Vespa product update, we mentioned Nearest Neighbor and Tensor Ranking, Optimized JSON Tensor Feed Format, Matched Elements in Complex Multi-value Fields, Large Weighted Set Update Performance and Datadog Monitoring Support.

Today, we’re excited to share the following updates:

Improved ONNX Support

Vespa has added more operations to its ONNX model API, such as General Matrix Multiplication (GEMM); see the list of supported opsets.
Vespa has also improved support for PyTorch through ONNX; see the pytorch_test.py example.

New Rank Feature attributeMatch().maxWeight

attributeMatch(name).maxWeight was added in Vespa-7.135.5. Its value is the maximum weight of the attribute keys matched in a weighted set attribute.
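
A minimal sketch of what the feature computes, assuming the weighted set is modeled as a key-to-weight mapping. The function name is hypothetical, and returning 0 when nothing matches is an assumption made for this example, not documented Vespa behavior.

```python
from typing import Dict, Iterable


def max_matched_weight(weighted_set: Dict[str, int], query_keys: Iterable[str]) -> int:
    """Maximum weight among the set's keys that the query matched."""
    weights = [weighted_set[key] for key in query_keys if key in weighted_set]
    # Assumption for this sketch: no matched keys yields 0.
    return max(weights) if weights else 0


# Hypothetical weighted set attribute and query terms.
tags = {"a": 10, "b": 30, "c": 20}
score = max_matched_weight(tags, ["a", "c", "z"])
```

Here the key `b` has the highest weight in the set, but it is not matched by the query, so it does not contribute to the result.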

Free Lists for Attribute Multivalue Mapping

Since Vespa-7.141.8, multivalue attributes use a free list to improve performance. This reduces CPU use (no compaction jobs) and cuts memory use by approximately 10%. This primarily benefits applications with a high update rate to such attributes.
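
The general free-list technique can be sketched as follows: slots released by an update are remembered and reused by later writes, so the store does not need a separate compaction pass to reclaim them. This is a generic illustration with hypothetical names, not Vespa's actual data structure.

```python
from typing import List, Optional


class SlotStore:
    """Stores value lists in slots and reuses released slots via a free list."""

    def __init__(self) -> None:
        self._slots: List[Optional[List[int]]] = []
        self._free: List[int] = []  # indexes of released slots

    def put(self, values: List[int]) -> int:
        """Store values, preferring a released slot over growing the store."""
        if self._free:
            index = self._free.pop()
            self._slots[index] = values
        else:
            index = len(self._slots)
            self._slots.append(values)
        return index

    def release(self, index: int) -> None:
        """Mark a slot as reusable."""
        self._slots[index] = None
        self._free.append(index)

    def capacity(self) -> int:
        """Number of slots allocated so far, in use or free."""
        return len(self._slots)


store = SlotStore()
first = store.put([1, 2, 3])
second = store.put([4, 5])
store.release(first)
reused = store.put([6])  # lands in the released slot
```

With a high update rate, most writes can land in recently released slots, which keeps the store from growing between cleanups.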

Faster Updates for Out-of-Sync Documents

Vespa handles replica consistency using bucket checksums. Updating a document can be cheaper than putting a new document, because it requires fewer updates to posting lists. For updates to documents in inconsistent buckets, a GET-UPDATE is now used instead of a GET-PUT whenever the document to update is consistent across replicas. This is the common case when only a subset of the documents in the bucket are out of sync. This is useful for applications with high update rates that update multi-value fields with large sets. Explore details here.

ZooKeeper 3.5.6

Vespa now uses Apache ZooKeeper 3.5.6 and can encrypt communication between ZooKeeper servers.
