Vespa Newsletter, November 2021 | Vespa Blog

Kristian Aune

Head of Customer Success, Vespa

In the previous update,
we mentioned Vespa CLI, Nearest neighbor search performance improvement, Paged tensor attributes, mTLS,
improved Feed performance, and the SentencePiece Embedder. This time, we have the following updates:

Schema Inheritance

In applications with multiple document types
it is often convenient to put common fields in shared parent document types to avoid duplication.
This is done by declaring that the document type in a schema inherits other types.

However, this does not inherit the other elements of the schema,
such as rank profiles and fields outside the document.
From 7.487.27 onwards, you can also let a schema inherit another.
It will then include all the content of the parent schema, not just the document type part.

In Vespa 7.498.22, we also added support for letting structs inherit each other;
see #19949.

Improved data dump performance

The visit operation
is used to export data in batch from a Vespa instance.
In November, we added features to increase throughput when visiting a lot of data:

  • Streaming HTTP responses enables higher throughput,
    particularly where the client has high latency to the Vespa instance.
  • Slicing lets you partition the selected document space
    and iterate over the slices in parallel using multiple clients to get linear scaling with the number of clients.
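
As a rough illustration of slicing, the sketch below builds one visit URL per slice so that each client can fetch its own partition in parallel. The endpoint, namespace and document type are made-up placeholders, and the parameter names `slices`, `sliceId` and `stream` are assumptions to verify against the /document/v1 API reference.

```python
from urllib.parse import urlencode


def slice_urls(endpoint, namespace, doctype, slices):
    """Build one visit URL per slice (hypothetical helper, not part of Vespa)."""
    base = f"{endpoint}/document/v1/{namespace}/{doctype}/docid"
    return [
        # Each client gets the same slice count but a distinct slice id.
        base + "?" + urlencode({"slices": slices, "sliceId": slice_id, "stream": "true"})
        for slice_id in range(slices)
    ]


if __name__ == "__main__":
    # Placeholder endpoint and names; replace with your own application.
    for url in slice_urls("http://localhost:8080", "mynamespace", "music", 4):
        print(url)
```

Each URL can then be handed to a separate client process, which follows the continuation tokens within its own slice.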

Matching all your documents

Vespa now has a true query item,
simplifying queries matching all documents, like select * from sources music, books where true.

More query performance tuning

More configuration options are added for query performance tuning:

  • min-hits-per-thread
  • termwise-limit
  • num-search-partitions

These address various aspects of query and document matching,
see the schema reference.

Faster deployment

Vespa application packages can become large, especially when you want to use modern large ML models.
Such applications will now deploy faster, due to a series of optimizations we have made over the last few months.
Distribution to content nodes is faster, and rank profiles are evaluated in parallel using multiple threads –
we have measured an 8x improvement on some complex applications.

Hamming distance

Bitwise Hamming distance is now supported as a mathematical operation in
ranking expressions,
in addition to being a distance metric option in nearest neighbor searches.
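
To make the operation concrete, here is a small Python sketch of bitwise Hamming distance over int8 vectors. It illustrates the math only and is not Vespa's implementation; the function name is invented for this example.

```python
def hamming_distance(a, b):
    """Count the differing bits between two equal-length sequences of int8 values."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Mask to 8 bits so negative int8 values are compared by their
    # two's-complement bit pattern.
    return sum(bin((x ^ y) & 0xFF).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    print(hamming_distance([0b00001111, -1], [0, 0]))
```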

The neural search paradigm shift

November 8, Jo Kristian Bergum from the Vespa team presented
From research to production – bringing the neural search paradigm shift to production at Glasgow University.
The slides are available.

About Vespa: Largely developed by Yahoo engineers,
Vespa is an open source big data processing and serving engine.
It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Yahoo Ad Platform.
Thanks to feedback and contributions from the community, Vespa continues to grow.

Vespa Newsletter, December 2021 | Vespa Blog

Kristian Aune

Head of Customer Success, Vespa

In the previous update,
we mentioned schema inheritance, improved data dump performance,
“true” query item, faster deployments and Hamming distance for ranking.
This time, we have the following updates:

Tensor performance improvements

Since Vespa 7.507.67, Euclidean distance calculations using int8 are 250% faster, using HW-accelerated instructions.
This speeds up feeding to HNSW-based indices, and reduces latency for nearest neighbor queries.
This is relevant for applications with large data sets per node – using int8 instead of float uses 4x less memory,
and the performance improvement is measured to bring us to 10k puts/node when using HNSW.

With Vespa 7.514.11, tensor field memory alignment for types <= 16 bytes is optimized.
E.g., a 104-bit (13-byte) int8 tensor field is now aligned at 16 bytes, down from 32, a 2x improvement.
Query latency might improve too, due to less memory bandwidth used.
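
The arithmetic behind these numbers can be sketched as follows. This assumes that alignment simply rounds the field size up to the next multiple of the alignment boundary; the helper name is invented for illustration.

```python
def aligned_size(num_bytes, alignment):
    """Round num_bytes up to the next multiple of alignment."""
    return -(-num_bytes // alignment) * alignment


tensor_bytes = 104 // 8  # a 104-bit int8 tensor field is 13 bytes

new_size = aligned_size(tensor_bytes, 16)
old_size = aligned_size(tensor_bytes, 32)

# int8 cells use 1 byte each, float cells use 4 bytes each.
int8_bytes = 13 * 1
float_bytes = 13 * 4

if __name__ == "__main__":
    print(old_size / new_size)       # memory ratio, old vs. new alignment
    print(float_bytes / int8_bytes)  # memory ratio, float vs. int8 cells
```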

Refer to #20073 Representing SPANN with Vespa
for details on this work, and also see
Bringing the neural search paradigm shift to production
from the London Information Retrieval Meetup Group.

Match features

Any Vespa rank feature or function output can be returned along with regular document fields by adding it to the list of
summary-features of the rank profile.
If a feature is both used for ranking and returned with results,
it is re-calculated by Vespa when fetching the document data of the final result
as this happens after the global merge of matched and scored documents.
This can be wasteful when these features are the output of complex functions such as a neural language model.

The new match-features
allows you to configure features that are returned from content nodes
as part of the document information returned before merging the global list of matches.
This avoids re-calculating such features for serving results
and makes it possible to use them as inputs to a (third) re-ranking evaluated over the globally best ranking hits.
Furthermore, calculating match-features is also part of the
multi-threaded per-matching-and-ranking execution on the content nodes,
while features fetched with summary-features are single-threaded.

Vespa IntelliJ plugin

Shahar Ariel has created an IntelliJ plugin for editing schema files.
Thanks a lot for the contribution!

Vespa Newsletter, January 2022 | Vespa Blog

Kristian Aune

Head of Customer Success, Vespa

In the previous update,
we mentioned Tensor performance improvements, Match features and the Vespa IntelliJ plugin.
Today, we’re excited to share the following updates:

Faster node recovery and re-balancing

When Vespa content nodes are added or removed,
data is auto-migrated between nodes
to maintain the configured data distribution.
The throughput of this migration is throttled to avoid impact to regular query and write traffic.
We have worked to improve this throughput by using available resources better,
and since November we have been able to approximately double it –
read the blog post.

Reindexing speed

Most schema changes in Vespa are effected immediately,
but some require re-indexing.
Reindexing the corpus can take time, and consumes resources.
It is now possible to configure how fast to re-index in order to balance this tradeoff,
see reindex speed.
Read more about schema changes.


pyvespa 0.14.0 is released with the following changes:

  • Add retry strategy to delete_data,
    get_data and update_data (#222).
  • Deployment parameter disk_folder defaults to the current working directory for both Docker and Cloud deployments.
  • Vespa connection now accepts cert and key as separate arguments.
    Using both certificate and key values in the cert file continues to work as before.

Explore the new text-image
and text-video sample applications with pyvespa,
and read more about pyvespa.

Improved support for Weak And and unstructured user input

You can now use type=weakAnd in the Query API.
Used with userInput,
it is easy to create a query using weakAnd
with unstructured input data in a query, for a better relevance / performance tradeoff compared to all / any queries.


Semantic Rules have added better support for making synonym expansion rules through the * operator,
see #20386,
and proper stemming in multiple languages,
see Semantic Rules directives.
Read more about query rewriting.

Language detection

If no language is explicitly set in a document or a query, and stemming/NLP tokenization is used,
Vespa will run a language detector on the available text.
Since Vespa 7.518.53, the default has changed from Optimaize to OpenNLP.
Read more.

New blog posts

  • ML model serving at scale
    is about model serving latency and concurrency,
    and is a great primer on inference threads, intra-operation threads and inter-operation threads.
  • Billion-scale knn part two
    goes in detail on tensor vector precision types, memory usage, precision and performance
    for both nearest neighbor and approximate nearest neighbor search.
    Also learn how HNSW works with number of links in the graph and neighbors to explore at insert time,
    and how this affects precision.

Vespa Newsletter, April 2022 | Vespa Blog

Kristian Aune

Head of Customer Success, Vespa

In the previous update,
we mentioned faster node recovery and re-balancing, reindexing speed, WeakAnd, synonyms and language detection.
Today, we’re excited to share the following updates:

In the news

In Dmitry Kan’s
Vector Podcast,
enjoy Jo Kristian Bergum from the Vespa Team in the episode
Journey of Vespa from Sparse into Neural Search.
This is a great 90 minutes of Vespa, vector search, multi-stage ranking and approximate nearest neighbor, and more!

Compact tensor format

Vespa now supports short form parsing for unbound dense tensors (e.g. tensor(d0[],d1[]))
and partially unbound tensors (e.g. tensor(d0[],d1[128])).
Available since Vespa 7.459.15.
Refer to document-json-format.html#tensor.

Modular rank profiles

A rank-profile
is a named set of ranking expression functions and settings which can be selected in the query.
Complex applications typically have multiple schemas and rank profiles.
Multiple inheritance of rank profiles and defining profiles in separate files
are supported from Vespa 7.538.


Result Grouping is used to aggregate data in fields in query hits,
to implement use cases like number of items per category, inventory check, maximum values per category, etc.
As the aggregation functions can span the full corpus, temporary memory usage can be a problem for some queries.
Use the new configuration parameters,
such as defaultMaxHits,
to control grouping result set sizes.


pyvespa is Vespa’s simplified python bindings for query and ranking experiments.
With pyvespa 0.16.0, it is possible to specify the
Docker image –
use this for M1 testing, ref pyvespa#231.
With pyvespa 0.17.0, one can deploy to Docker using POST, without using a disk mount –
see pyvespa#296.

New query guides

Vespa has unmatched query performance for (approximate) nearest neighbor search
with filters and real-time update-able fields.
It can however be a challenge to balance the cost/performance tradeoffs to get the configuration optimal.
The new guides practical search performance
and nearest neighbor search are great resources,
exploring multithreaded queries, use of embeddings, HNSW configuration and multivalue query operators and more –
including advanced query tracing.

Get ready for Vespa 8

Vespa uses semantic versioning and releases new features continuously on minor versions.
Major version changes are used to mark versions which break compatibility,
by removing previously deprecated features, changing default values and similar.
The next time this happens will be in June, when we release Vespa 8.
Review the release notes to make sure your applications
are compatible with Vespa 8.

Vespa Newsletter, June 2022 | Vespa Blog

Kristian Aune

Head of Customer Success, Vespa

In the previous update,
we mentioned tensor formats, grouping improvements, new query guides,
modular rank profiles and pyvespa docker image and deployments.
Today, we’re excited to share the following updates:

Vespa 8

Vespa 8 is released. Vespa is now on Java 17 and
CentOS Stream 8.
Read more about what this means for you in the blog post.

Pre/Post ANN filtering support

Approximate Nearest Neighbor is a popular feature in Vector Search applications, also supported in Vespa.
Vespa has integral support for combining ANN search with filters,
like “similar articles to this, in US market, not older than 14 days”.
From Vespa 7.586.113, users can configure whether to use pre- or post-filtering, with thresholds.
This enables a much better toolset to trade off precision with performance, i.e. balance cost and quality.
Read more in constrained-approximate-nearest-neighbor-search.

Fuzzy matching

Thanks to alexeyche, Vespa supports fuzzy query matching since 7.585 –
a user typing “spageti” will now match documents with “spaghetti”.
This is implemented using Levenshtein edit distance search –
e.g. one must make two “edits” (one-character changes) to make “spaghetti” from “spageti”.
Find the full contribution in #21689.
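
To illustrate the edit distance mentioned above, here is a textbook dynamic-programming version of Levenshtein distance in Python. It is a sketch of the general technique, not the code from the contribution, and the function name is chosen for this example.

```python
def levenshtein(source, target):
    """Return the minimum number of single-character edits turning source into target."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("spageti", "spaghetti"))
```

For "spageti" and "spaghetti" the two edits are inserting the missing "h" and the missing "t".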


pyvespa 0.22 introduces an experimental ranking module
to support learning to rank tasks that can be applied to
data collected from Vespa applications containing ranking features.
It starts by creating a listwise ranking framework based on TensorFlow Ranking that covers data pipelines,
fitting models and feature selection algorithms.

Embedding support

A common technique in modern big data serving applications is to map the subject data – say, text or images –
to points in an abstract vector space and then do computation in that vector space.
For example, retrieve similar data by finding nearby points in the vector space,
or using the vectors as input to a neural net.
This mapping is usually referred to as embedding.
Read more about Vespa’s built-in support.

Tensors and ranking

enables ranking expression evaluation without de-serialization, to decrease latency, at the expense of more memory used.
Supported for tensor field types with at least one mapped dimension.

Tensor short format
is now supported in the /document/v1 API.

Support for importing onnx models in rank profiles is added.

Blog posts and training videos

Find great Vespa blog posts on
constrained ANN-search,
hybrid billion scale vector search,
and Lester Solbakken + Jo Kristian Bergum at the
Berlin Buzzwords conference –
follow Jo Kristian for industry leading commentary.

New training videos for Vespa startup troubleshooting and auto document redistribution
are available: Troubleshooting startup, singlenode; Troubleshooting startup, multinode; and Bucket distribution - intro.

Vespa Newsletter, September 2022 | Vespa Blog

Kristian Aune

Head of Customer Success, Vespa

In the previous update,
we mentioned Vespa 8, pre/post ANN filtering support, fuzzy matching, pyvespa experimental ranking module,
embedding support and new tensor features.
Today, we’re excited to share the following updates:

Rank-phase statistics

With rank-phase statistics
it is easy to measure relative query performance on a per-document-level,
like “Which documents appear most often in results, which ones never do?”.
The statistics are written in configurable attributes per document,
for analysis using the Vespa query- and aggregation APIs.
Use this feature for real-time tracking of ranking performance,
and combine with real-time updates for tuning.

Schema feeding flexibility

Since Vespa 8.20, a document feed can be allowed to contain unknown fields.
While the default behavior is to reject feeds with unknown fields,
this can make it easier to optimize or evolve the schema to new use cases,
with less need to coordinate with client feeds.

Beta: Query Builder and Trace Visualizer

New beta applications for building queries and analyzing query traces are now available.
This is the first step towards helping users experiment easily with queries,
and the Trace Visualizer can be used to help pinpoint query latency bottlenecks.

Rank trace profiling

Use rank trace profiling to expose information about how time spent on ranking is distributed between individual
rank features.
Available since Vespa 8.48,
use trace.profileDepth
as a query parameter, e.g. &tracelevel=1&trace.profileDepth=10.
This feature can be used for content node rank performance analysis.
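
A minimal sketch of adding these parameters to a query URL is shown below. The endpoint and the YQL string are placeholders, and the helper is hypothetical; only tracelevel and trace.profileDepth come from the text above.

```python
from urllib.parse import urlencode


def profiling_query_url(endpoint, yql, profile_depth=10):
    """Build a search URL with rank trace profiling enabled (illustrative helper)."""
    params = {
        "yql": yql,
        "tracelevel": 1,
        "trace.profileDepth": profile_depth,
    }
    return f"{endpoint}/search/?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder endpoint and query.
    print(profiling_query_url("http://localhost:8080", "select * from sources * where true"))
```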

Feeding bandwidth test

When doing feeding throughput tests, it can often be hard to distinguish latency inside your Vespa application
vs. validating the available bandwidth between client and server.
Since Vespa 8.35, the vespa-feed-client
supports the --speed-test parameter for bandwidth testing.
Note that both client and server Vespa must be on 8.35 or higher.

Training video

Vespa allows plugging in your own Java code in both the document- and query-flows, to implement advanced use cases.
Using query tracing and a debugger can be very useful in developing and troubleshooting this custom code.
For an introduction, see the video Debugging a Vespa Searcher.

Vespa Newsletter, October 2022 | Vespa Blog

Kristian Aune

Head of Customer Success, Vespa

In the previous update,
we mentioned Rank-phase statistics, Schema feeding flexibility, the Query Builder and Trace Visualizer,
Rank trace profiling, the new --speed-test parameter and a new video.
Today, we’re excited to share the following updates:

Create vector embeddings in Vespa without custom Java code

An increasingly popular reason for using Vespa is the ability to use vector embeddings
to be able to retrieve documents by semantic similarity in addition to retrieving by text tokens or attributes.
Since Vespa 8.52, we have made this easier by making it possible to use BERT-style models
to create document and query embeddings inside Vespa without writing any custom code.

The BertBase embedder bundled with Vespa
uses a WordPiece embedder to produce a token sequence that is then input to a transformer model.
A BERT-Base compatible transformer model must have three inputs:

  • A token sequence (input_ids)
  • An attention mask (attention_mask)
  • (Optionally) Token types for cross encoding (token_type_ids)
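
The three inputs can be pictured with the Python sketch below, which pads a token sequence to a fixed length. It is an illustration only, not the embedder's code: the special token ids 101 ([CLS]), 102 ([SEP]) and 0 (padding) are assumptions taken from common BERT vocabularies, and the function name is invented.

```python
def bert_inputs(token_ids, max_length, cls_id=101, sep_id=102, pad_id=0):
    """Build input_ids, attention_mask and token_type_ids for a single text.

    Assumes max_length >= 2 so the two special tokens always fit.
    """
    # Truncate so that [CLS] + tokens + [SEP] fits within max_length.
    ids = [cls_id] + list(token_ids)[: max_length - 2] + [sep_id]
    padding = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * padding,
        # 1 marks real tokens, 0 marks padding.
        "attention_mask": [1] * len(ids) + [0] * padding,
        # A single text segment uses token type 0 throughout.
        "token_type_ids": [0] * max_length,
    }


if __name__ == "__main__":
    print(bert_inputs([7, 8, 9], max_length=8))
```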

Model hub: Provided ML models on Vespa Cloud

The BERT base embedder allows you to use vector search without bringing your own vectors, or writing any Java code –
but you still have to bring the model.
For our Vespa Cloud users we have made this even simpler by
providing the models as part of the platform as well.

For us working on Vespa, it is always a goal to empower application developers
by making it as simple as possible to get started,
while at the same time being able to scale seamlessly to more data, higher traffic, and more complex use cases.
So of course you can still bring your own models, write your own embedders, or pass in your own vectors,
and mix and match all these capabilities in all possible ways.

Improved query performance for filters with common terms

When making text indexes,
Vespa stores a bitvector in addition to the posting list for frequent terms to enable maximally fast matching.
If the field is used as a filter only, no ranking is needed,
and the bitvector will be used instead of the posting list.
This makes queries using such terms faster and cheaper.
The bitvector optimization is now also available for
attribute fields with fast-search.

Paged attributes

Fields which are stored in column stores suitable for random memory access are called attributes in Vespa.
These are used for matching, ranking and grouping, and enabling high-throughput partial updates.
By default, attributes are stored completely in memory to make all accesses maximally fast,
but some have also supported paging out to disk
to support a wider range of tradeoffs between lookup speed and memory cost –
see e.g. hybrid billion scale vector search.

Since Vespa 8.69, paging support has been extended to all attribute types,
except tensor with fast-rank and

ARM64 support

Vespa container images are now released as multiplatform, supporting both x86_64 and ARM64.
ARM64 is also available on Vespa Cloud.
Read more.

Query result highlighting for arrays of string

Highlighting query words in results helps users see why a particular document is returned in their search result.
Since Vespa 8.53, this is supported for arrays of string in addition to single-value strings –
see the schema reference.

Vespa scripts are becoming Go only

The Vespa Container image was set on a diet and now has zero Perl-dependencies.
Most Vespa utilities have now instead been ported to using Go to support a wider range of client platforms
without requiring any dependencies.

Vespa Newsletter, November 2022 | Vespa Blog

Kristian Aune

Head of Customer Success, Vespa

In the previous update,
we mentioned Vector Embeddings, Vespa Cloud Model Hub, Paged Attributes, ARM64 Support, and Result Highlighting.
Today, we’re excited to share the following updates:

Improved performance when using ANN and pre-filter

Since Vespa 8.78.45, multithreaded pre-filtering before running the
approximate nearest neighbor query operator can be enabled
in the rank-profile.
Multithreading can cut latencies for applications using pre-filtering,
where the filtering amounts to a significant part of the query latency.
Read more.

Better hit estimates from parent document attributes

Applications can use parent/child to normalize data –
keeping fields common for many documents in a parent schema.
This simplifies updating such fields and makes the update use fewer resources with many children.
When using parent fields in matching,
one can use fast-search
for better performance by using a dictionary.
Since Vespa 8.84.14, a parent field with fast-search set will have a better hit estimate using the dictionary data.
The estimate is then used when creating the query plan to limit the candidate result set quicker,
resulting in lower query latency.

New XGBoost and LightGBM model training notebooks

Vespa supports gradient boosting decision tree (GBDT) models trained with
XGBoost and LightGBM.
To get you started, we have released two new sample notebooks for easy training of XGBoost and LightGBM models in
Vespa sample apps notebooks.
Linked from these is an exciting blog post series on using these models in Product Search applications.

Vespa Cloud on GCP

Vespa Cloud has been available in AWS zones since its start in 2019.
Now, we are happy to announce Vespa Cloud availability in Google Cloud Platform (GCP) zones!
To add a GCP zone to your application,
simply add <region>gcp-us-central1-f</region> to deployment.xml.
See the announcement for more details.

Vespa Newsletter, January 2023 | Vespa Blog

Kristian Aune

Head of Customer Success, Vespa

It’s a busy winter at the Vespa HQ. We are working on some major new features,
which will be announced soon, but we’re also finding the time to make smaller improvements – see below!

Interested in search ranking? Don’t miss these blog posts

We have done some deep diving into using machine learning to improve ranking in search applications lately,
and of course, we’re blogging and open-sourcing all the details to make it easy for you to build on what we are doing.
See these recent blog posts:

New Vespa improvements

In the previous update,
we mentioned ANN pre-filter performance, parent field hit estimates,
model training notebooks, and Vespa Cloud GCP Support.
This time, we have the following improvements:

Simpler tensor JSON format

Since Vespa 8.111, Vespa allows tensor field values to be written in JSON
without the intermediate map containing “blocks”, “cells” or “values”.
The tensor type will then dictate the format of the tensor content.
Tensors can also be returned in this format,
e.g. in model evaluation,
by requesting the format short-value –
see the tensor format documentation.
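
As a rough illustration, the Python sketch below converts a dense one-dimensional tensor from a verbose "cells" form into a plain list of values, which is the shape the direct format takes when the tensor type is known. The verbose input layout and the helper name are assumptions made for this example; check the tensor format documentation for the exact accepted forms.

```python
import json


def cells_to_direct(cells, dimension, size):
    """Flatten verbose cells for a dense tensor(dimension[size]) into a list of values."""
    values = [0.0] * size
    for cell in cells:
        # The address maps the dimension name to an index given as a string.
        values[int(cell["address"][dimension])] = cell["value"]
    return values


# Assumed verbose form for a field of type tensor(x[3]).
verbose = {
    "cells": [
        {"address": {"x": "0"}, "value": 1.0},
        {"address": {"x": "1"}, "value": 2.0},
        {"address": {"x": "2"}, "value": 3.0},
    ]
}

if __name__ == "__main__":
    # The field value becomes the list itself, without an intermediate map.
    print(json.dumps({"myTensor": cells_to_direct(verbose["cells"], "x", 3)}))
```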

Supplying values for missing fields during indexing

Vespa allows you to add fields outside the “document” section in the schema configuration
that get their values from fields in the document.
For example, you can add a vector embedding of a title and description field like this:

field myEmbedding type tensor(x[128]) {
    indexing: input title . " " . input description | embed | attribute
}

But what if descriptions are sometimes missing?
Then Vespa won’t produce an embedding value at all, which may not be what you want.
From 8.116, you can specify an alternative value for expressions that don’t produce a value
using the || syntax:

field myEmbedding type tensor(x[128]) {
    indexing: input title . " " . (input description || "") | embed | attribute
}
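
The behavior described above can be modeled in a few lines of Python. This is only an analogy for the indexing expressions, assuming that a concatenation produces no value when any input is missing; the function names are invented and nothing here is Vespa code.

```python
def concat(*parts):
    """Join the parts, or return None if any part is missing."""
    if any(part is None for part in parts):
        return None
    return "".join(parts)


def or_else(value, fallback):
    """Model the || syntax: use fallback when value is missing."""
    return fallback if value is None else value


def embedding_input(title, description, use_fallback):
    """Return the text that would be passed on to the embed step, or None."""
    if use_fallback:
        description = or_else(description, "")
    return concat(title, " ", description)


if __name__ == "__main__":
    print(embedding_input("A title", None, use_fallback=False))
    print(repr(embedding_input("A title", None, use_fallback=True)))
```

Without the fallback, a missing description means no text reaches the embed step; with it, the title alone is embedded.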

Since January 31, it is possible to set up private connectivity between a customer’s VPC
and their Vespa Cloud application using AWS PrivateLink.
This provides clients safe, non-public access to their applications
using private IPs accessible from within their own VPCs –
read more.

Content node performance

Vespa content nodes store the data written to Vespa, maintain indexes over it, and run matching and ranking.
Most applications spend the majority of their hardware resources on content nodes.

  • Improving query performance is made easier with new match phase profiling
    since Vespa 8.114. This gives insight into what’s most costly in matching your queries (ranking was already supported).
    Read more at phased-ranking.html.
  • Since Vespa 8.116, Vespa requires minimum
    Haswell microarchitecture.
    A more recent optimization target enables better optimizations and, in some cases, gives 10-15% better ranking performance.
    It is still possible to run on older microarchitectures, but then you must compile from source;
    see #25693.

Vespa start and stop script improvements

Vespa runs in many environments, from various self-hosted technology stacks to Vespa Cloud –
see multinode-systems
and basic-search-on-gke.
To support running as a non-root user inside containers with better debug support,
the vespa start/stop-scripts are now refactored and simplified –
this will also make Vespa start/stop snappier in some cases.

Container Performance and Security

With Vespa 8.111, Vespa upgraded its embedded Jetty server from version 9.x to 11.0.13.
The upgrade increases performance in some use cases, mainly when using HTTP/2,
and also includes several security fixes provided with the Jetty upgrade.

Log settings in services.xml

During debugging, it is useful to be able to tune which messages end up in the log,
especially when developing custom components.
This can be done with the vespa-logctl tool on each node.
Since Vespa 8.100, you can also control log settings in services.xml –
see logging.
This is also very convenient when deploying on Vespa Cloud.

Vespa Cloud: Autoscaling with multiple groups

When allocating resources on Vespa Cloud
you can specify both the number of nodes and node groups you want in content clusters
(each group has one or more complete copies of all the data and can handle a query independently):

<nodes count="20" groups="2">

If you want the system to automatically find the best values for the given load, you can configure ranges:

<nodes count="[10, 30]" groups="[1, 3]">

This might lead to groups of sizes from 4 to 30, which may be fine,
but sometimes you want to control the size of groups instead.
From 8.116, you can configure group size instead of (or in addition to) the number of groups:

<nodes count="[10, 30]" group-size="10">

Like the other values, group-size can also be a range.
See the documentation.
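
The range of group sizes quoted above can be reproduced with a short enumeration. The sketch assumes that nodes are always divided evenly among groups, so only node counts divisible by the number of groups are considered; the function names are made up for this example.

```python
def group_size_range(count_range, groups_range):
    """Return (min, max) group size over all even splits of nodes into groups."""
    sizes = [
        count // groups
        for count in range(count_range[0], count_range[1] + 1)
        for groups in range(groups_range[0], groups_range[1] + 1)
        if count % groups == 0  # assumption: groups are equally sized
    ]
    return min(sizes), max(sizes)


def counts_for_group_size(count_range, group_size):
    """Return the node counts in range that split into groups of exactly group_size."""
    return [
        count
        for count in range(count_range[0], count_range[1] + 1)
        if count % group_size == 0
    ]


if __name__ == "__main__":
    print(group_size_range((10, 30), (1, 3)))
    print(counts_for_group_size((10, 30), 10))
```

With count [10, 30] and groups [1, 3], the smallest group comes from 12 nodes in 3 groups and the largest from 30 nodes in a single group.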

In addition to choosing resources, a content cluster must also be configured with a redundancy –
the number of copies to keep of each piece of data in each group.
With variable groups this may cause you to have more copies than you strictly need to avoid data loss,
so since 8.116, you can instead configure the minimum redundancy:


The system will then ensure you have at least this many copies of the data,
but not make more copies than necessary in each group.

Vespa Cloud: Separate read/write data plane access control

When configuring the client certificate to use for your incoming requests (data plane) on Vespa Cloud,
you can now specify whether each certificate should have read- or write-access or both.
This allows you to e.g., use one certificate for clients with read access while having another –
perhaps less distributed – certificate for write access.
See the Security Guide
for more details on how to configure it.

Thanks for reading! Try out Vespa on Vespa Cloud
or grab the latest release and run it yourself! 😀

Vespa Newsletter, March 2023 | Vespa Blog

Kristian Aune

Head of Customer Success, Vespa

In the previous update,
we mentioned Better Tensor formats, AWS PrivateLink, Autoscaling, Data Plane Access Control
as well as Container and Content Node Performance.

We also want to thank you for your PRs! In particular (see below),
most of the new pyvespa features were submitted from non-Vespa Team members – thank you!
We are grateful for the contributions, please do keep those PRs coming!

We’re excited to share the following updates:

GPU-accelerated ML inference

In machine learning, computing model inference is a good candidate for being accelerated by special-purpose hardware, such as GPUs.
Vespa supports evaluating multiple types of machine-learned models in stateless containers,
e.g., TensorFlow,
and LightGBM models.
For some use cases, using a GPU makes it possible to perform model inference with higher performance,
and at a lower price point, when compared to using a general-purpose CPU.

The Vespa Team is announcing support for GPU-accelerated ONNX model inference in Vespa,
including support for GPU instances in Vespa Cloud –
read more.

Vespa Cloud: BCP-aware autoscaling

As part of a business continuity plan (BCP),
applications are often deployed to multiple zones so the system has ready, on-hand capacity
to absorb traffic should a zone fail.
Using autoscaling in Vespa Cloud sets aside resources in each zone to handle an equal share of the traffic from the other zones
in case one of them goes down – i.e., it assumes a flat BCP structure.

This is not always how applications wish to structure their BCP traffic shifting though –
so applications can now define their BCP structure explicitly
using the BCP tag.
Also, during a BCP event, when it is acceptable to have some delay until capacity is ready,
you can set a deadline until another zone must have sufficient capacity to accept the overload;
permitting delays like this allows autoscaling to save resources.

Vespa for e-commerce


Vespa is often used in e-commerce applications.
We have added exciting features to the shopping sample application:

  • Use NLP techniques to generate query suggestions from the index content
    based on spaCy and en_core_web_sm.
  • Use the fuzzy query operator
    and prefix search for great query suggestions –
    this handles misspelled words and creates much better suggestions than prefix search alone.
  • For query-contextualized navigation,
    the order in which the groups are rendered is determined by both counting and the relevance of the hits.
  • Native embedders are used to map the textual query and document representations into dense high-dimensional vectors,
    which are used for semantic search – see embeddings.
    The application uses an open-source embedding model,
    and inference is performed using stateless model evaluation,
    during document and query processing.
  • Hybrid ranking /
    Vector search:
    The default retrieval uses approximate nearest neighbor search in combination with traditional lexical matching.
    The keyword and vector matching is constrained by the filters such as brand, price, or category.

Optimizations and features

  • Vespa supports multiple schemas with multiple fields.
    This can amount to thousands of fields.
    Vespa’s index structures are built for real-time, high-throughput reads and writes.
    With Vespa 8.140, the static memory usage is cut by 75%, depending on field types.
    Find more details in #26350.
  • Extracting documents is made easier using vespa visit in the Vespa CLI.
    This makes it easier to clone applications
    with data to/from self-hosted/Vespa Cloud applications.


Pyvespa – the Vespa Python experimentation library – is now split into two repositories:
pyvespa and learntorank;
this is for better separation of the python API and to facilitate prototyping and experimentation for data scientists.
Pyvespa 0.32 has been released with many new features for fields and ranking;
see the release notes.

This time, most of the new pyvespa features are submitted from non-Vespa Team members!
We are grateful for – and welcome more – contributions. Keep those PRs coming!

GCP Private Service Connect in Vespa Cloud

In January, we announced AWS PrivateLink.
We are now happy to announce support for GCP Private Service Connect in Vespa Cloud.
With this service, you can set up private endpoint services on your application clusters in Google Cloud,
providing clients with safe, non-public access to the application!

In addition, Vespa Cloud supports deployment to both AWS and GCP regions in the same application deployment.
This support simplifies migration projects, optimizes costs, adds cloud provider redundancy, and reduces complexity.
We’ve made adopting Vespa Cloud into your processes easy!

Blog posts since the last newsletter

Thanks for reading! Try out Vespa on Vespa Cloud
or grab the latest release and run it yourself! 😀