Vespa Product Updates, July 2021

Kristian Aune

Head of Customer Success, Vespa

In the previous update,
we mentioned bfloat16 and int8 tensor value types, case-sensitive attribute search,
attributes with hashed dictionary and Hamming distance metric for ANN search.

This month, we’re excited to share the following updates:


HTTP/2 is now available for both search and feed endpoints.
HTTP/2 delivers more efficient network usage and increases security.
With HTTP/2, it becomes possible to feed equally efficiently using the
document/v1 REST API as with the Vespa HTTP client.
There is also a new, simplified vespa-feed-client.
Read more.
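
As a rough illustration of what a document/v1 write looks like, the Python sketch below builds (but does not send) a request that puts one document. The endpoint, namespace, document type, document ID, and field name are made-up placeholders. Python's standard library only speaks HTTP/1.1, so an HTTP/2-capable client would be needed to get the efficiency described above.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_put_request(endpoint, namespace, doctype, doc_id, fields):
    """Build a document/v1 request that writes one document (not sent here)."""
    # Path layout: /document/v1/<namespace>/<document type>/docid/<id>
    path = "/document/v1/{}/{}/docid/{}".format(
        quote(namespace, safe=""),
        quote(doctype, safe=""),
        quote(doc_id, safe=""),
    )
    # The document fields are wrapped in a top-level "fields" object.
    body = json.dumps({"fields": fields}).encode("utf-8")
    return Request(
        endpoint + path,
        data=body,
        method="POST",  # POST writes a full document
        headers={"Content-Type": "application/json"},
    )


# Placeholder values, for illustration only.
req = build_put_request(
    "http://localhost:8080", "mynamespace", "music", "doc-1", {"song": "Example"}
)
print(req.get_method(), req.full_url)
```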

ONNX Runtime in the Vespa Container

We have also integrated ONNX Runtime in the stateless Vespa container,
which allows ONNX models to be used with:

  • An automatically generated REST API for stateless model serving.
  • Lightweight request handlers that serve models with some custom code, without the need for content nodes.
  • Model evaluation in Searchers for query processing and enrichment.
  • Model evaluation in Document Processors for transforming content before ingestion.
  • Processing of results from the content nodes to add additional ranking phases.

Read more.
The Vespa Factory is the automated build and test system for Vespa, now open for everyone.
Use it to inspect changes in each release, as Vespa normally releases 4 times a week.
The Vespa Factory is useful to track performance improvements tested in the performance test suite,
see e.g. testrun/31415803.

Berlin Buzzwords 2021 recordings

Approximate nearest neighbor with filtering and real time updates has generated much attention,
and Vespa’s real time indexing structures are well explained in these talks
at Berlin Buzzwords 2021.
The search engine debate is a follow-up to the
Haystack event last January.

About Vespa: Largely developed by Yahoo engineers,
Vespa is an open source big data processing and serving engine.
It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform.
Thanks to feedback and contributions from the community, Vespa continues to grow.

Vespa Newsletter, July 2023

Kristian Aune

Head of Customer Success, Vespa

In the previous update,
we mentioned multi-vector HNSW Indexing, global-phase re-ranking, LangChain support, improved bfloat16 throughput,
and new document feed/export features in the Vespa CLI.
Today, we’re excited to share Vector Streaming Search, multiple new embedding features,
MIPS support, and performance optimizations:

When searching personal data or other data sets which are divided into many subsets you never search across,
maintaining global indexes is unnecessarily expensive.
Vespa streaming search is built for these use cases, and now supports vectors in searching and ranking.

This enables vector search in personal search use cases such as personal assistants
at typically less than 5% of the usual cost,
while delivering complete rather than approximate results,
something which is often crucial with personal data.
Read more in our announcement blog post.
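
The idea can be pictured with a small conceptual sketch in Python. This is not how Vespa implements streaming search; it only shows why a per-user scan needs no global index and returns exact results. The class name, method names, and data are hypothetical.

```python
import math
from collections import defaultdict


class GroupedStore:
    """Toy store that keeps vectors per group (e.g. per user) with no global index."""

    def __init__(self):
        self._groups = defaultdict(list)

    def add(self, group, doc_id, vector):
        self._groups[group].append((doc_id, vector))

    def nearest(self, group, query, k=3):
        """Exact nearest neighbors: scan every vector in one group only."""
        scored = [
            (math.dist(query, vector), doc_id)
            for doc_id, vector in self._groups.get(group, [])
        ]
        scored.sort()
        return [doc_id for _, doc_id in scored[:k]]


store = GroupedStore()
store.add("alice", "a1", [0.0, 0.1])
store.add("alice", "a2", [1.0, 1.0])
store.add("alice", "a3", [0.2, 0.0])
store.add("bob", "b1", [0.0, 0.0])

# Only Alice's documents are scanned; Bob's closer vector is never considered.
print(store.nearest("alice", [0.0, 0.0], k=2))
```

Because every vector in the selected group is compared against the query, the result is complete, and the cost scales with the size of one group rather than the whole data set.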

Use Embedder Models from Huggingface

Vespa now comes with generic support for embedding models hosted on Huggingface.
With the new Huggingface Embedder functionality,
developers can export embedding models from Huggingface
and import them in ONNX format in Vespa for accelerated inference close to where the data is created.
The Huggingface Embedder supports multilingual embedding models as well as multi-vector representations –
read more.

GPU Acceleration of Embedding Models

GPU acceleration of embedding model inferences is now supported,
unlocking larger and more powerful embedding models while maintaining low serving latency.
With this, Vespa embedders can efficiently process large amounts of text data,
resulting in faster response times, improved scalability, and lower cost.

Embedding GPU acceleration is available both on Vespa Cloud and for Open Source Vespa use –
read more.

More models for Vespa Cloud users

As more teams use embeddings to improve search and recommendation use cases,
easy access to models is key for productivity. From the E5 paper:

E5 is a family of state-of-the-art text embeddings that transfer well to a wide range of tasks.
The model is trained in a contrastive manner with weak supervision signals
from our curated large-scale text pair dataset (called CCPairs).
E5 can be readily used as a general-purpose embedding model for any tasks
requiring a single-vector representation of texts such as retrieval, clustering, and classification,
achieving strong performance in both zero-shot and fine-tuned settings.

Vespa Cloud users can find a set of E5 models on the
model hub.

Dotproduct distance metric for ANN

The Maximum Inner Product Search (MIPS) problem arises naturally in recommender systems,
where item recommendations and user preferences are modeled with vectors,
and the scoring is just the dot product (inner product) between the item vector and the query vector.

Vespa supports a range of distance metrics
for approximate nearest neighbor search.
Since 8.172, Vespa supports a dotproduct distance metric,
used for distance calculations, along with an extension to HNSW index structures.
The blog post explains how an extra dimension is used to map points onto a 3D hemisphere,
so that all vectors have the same magnitude
and the problem becomes solvable as a nearest neighbor problem.
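
The extra-dimension idea can be checked with a small worked example. The Python sketch below uses the textbook reduction from maximum inner product search to nearest neighbor search: each item vector gets one extra component so that all items share the same magnitude, and the query gets a zero in that position. It illustrates the general technique, not necessarily the exact transformation Vespa applies internally. The function names and data are made up.

```python
import math


def augment_items(items):
    """Append one component so every item has the same magnitude (the max norm)."""
    max_sq_norm = max(sum(x * x for x in v) for v in items)
    augmented = []
    for v in items:
        sq_norm = sum(x * x for x in v)
        # max(..., 0.0) guards against tiny negative values from rounding.
        extra = math.sqrt(max(max_sq_norm - sq_norm, 0.0))
        augmented.append(list(v) + [extra])
    return augmented


def augment_query(query):
    """The query gets a zero in the extra dimension, so dot products are unchanged."""
    return list(query) + [0.0]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


items = [[1.0, 0.0], [0.0, 3.0], [2.0, 2.0]]
query = [1.0, 1.0]

# Best item by inner product in the original space.
best_by_dot = max(range(len(items)), key=lambda i: dot(items[i], query))

# Best item by Euclidean distance in the augmented space.
aug_items = augment_items(items)
aug_query = augment_query(query)
best_by_dist = min(range(len(items)), key=lambda i: math.dist(aug_items[i], aug_query))

print(best_by_dot, best_by_dist)
```

With all augmented items at magnitude M, the squared distance to the query is M² + |q|² − 2·(item · query), so the item with the smallest distance is the one with the largest dot product.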

Optimizations and features

  • Query using emojis!
    The Unicode Characters of Category “Other Symbol” contains emojis, math symbols, etc.
    From Vespa 8.172 these are indexed as letter characters to support searching for them.
    E.g., you can now try vespa query 'select * from music where song contains "🍉"'.
  • Sorting on multivalue fields like array
    or weightedset is now supported:
    Ascending sort order uses the lowest value while descending sort order uses the highest value.
    E.g., descending order sort on an array field with [“apple”, “banana”, “melon”] will use “melon” as the sort value –
    see the reference documentation.
  • Since Vespa 8.185, you can balance feed vs query resource usage using feeding
    niceness – use this configuration to de-prioritize feeding.
  • Since Vespa 8.178, users can use conditional puts with auto-create –
    read more.
  • With lidspace max-bloat-factor,
    you can fine-tune the lid space compaction job in the content node (since Vespa 8.171).
  • Vespa supports multivalue attributes,
    like arrays and maps.
    In Vespa 8.181 the static memory usage of multivalue attributes is reduced by up to 40%.
    This is useful for applications with many such fields, with little data each –
    see #26640 for details.
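
The multivalue sort rule from the list above (lowest value for ascending order, highest value for descending order) can be sketched in a few lines of Python. This only models the rule as described here, assuming every document has at least one value in the field. The function names and documents are hypothetical.

```python
def multivalue_sort_key(values, descending):
    """Pick the value that represents a multivalue field when sorting."""
    # Descending order uses the highest value, ascending order the lowest.
    return max(values) if descending else min(values)


def sort_docs(docs, field, descending=False):
    """Sort documents on a multivalue field using the rule above."""
    return sorted(
        docs,
        key=lambda doc: multivalue_sort_key(doc[field], descending),
        reverse=descending,
    )


docs = [
    {"id": 1, "fruits": ["apple", "banana", "melon"]},
    {"id": 2, "fruits": ["cherry", "kiwi"]},
    {"id": 3, "fruits": ["banana", "zucchini"]},
]

# Ascending keys: apple, cherry, banana. Descending keys: melon, kiwi, zucchini.
print([d["id"] for d in sort_docs(docs, "fruits")])
print([d["id"] for d in sort_docs(docs, "fruits", descending=True)])
```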

Blog posts since last newsletter

Thanks for reading! Try out Vespa on Vespa Cloud
or grab the latest release and run it yourself! 😀