Vespa Product Updates, October 2020

Kristian Aune

Head of Customer Success, Vespa


Photo by Ilya Pavlov on Unsplash

In the September updates,
we mentioned ONNX runtime integration and feeding improvements.

This month, we’re excited to share the following updates:

New Container thread pool configurations

When deploying, application changes are
live-reloaded into the running JVM.
New code requires JVM JIT compilation, which temporarily increases load on the container
and causes higher query latencies for a second or two.
Many parallel threads aggravate this problem.
Vespa now has a dedicated container thread pool for feeding.
Instead of the previous fixed default of 500 threads, it now defaults to twice the number of logical CPUs.
This both improves feed throughput and reduces the latency impact of deployments.
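
If the defaults do not fit your workload, container thread pools can be tuned in services.xml.
Below is a minimal sketch only, assuming the threadpool tuning element from the services reference of that era;
tag names and values are illustrative, so verify them against the reference for your Vespa version:

<container id="default" version="1.0">
  <search>
    <threadpool>
      <!-- Illustrative: cap the search handler thread pool size -->
      <max-threads>500</max-threads>
    </threadpool>
  </search>
</container>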

Improved document/v1 API throughput

Vespa users feed their applications through feed containers in their Vespa cluster,
using either an asynchronous or a synchronous HTTP API.
Optimizations and fine-tuning of concurrent execution in these feed containers,
together with a change to asynchronous handling of requests in the synchronous
document/v1 API,
have made the feed container more efficient.
This has greatly increased quality of service for both search and feed during container restarts.
As a bonus, we also see a 50% increase in throughput in our performance test suite for the synchronous HTTP API
since Vespa 7.304.50.
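
For reference, a single synchronous write through document/v1 is a plain HTTP request.
The namespace, document type and fields below are made-up examples; substitute your own schema's names:

curl -X POST -H "Content-Type: application/json" --data '
{
    "fields": {
        "artist": "The Weeknd",
        "album": "After Hours"
    }
}' \
http://localhost:8080/document/v1/mynamespace/music/docid/1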

No more visibility-delay for feeding

Visibility-delay
was used to batch writes for increased write throughput.
With the recent optimizations, there is no longer any gain in batching writes;
feeding is just as fast without it, so the batching code has been removed.
Visibility-delay still works for queries, as a short-lived cache with a maximum TTL of one second.
The Vespa team recommends that you stop using this feature, as it no longer provides any advantage.
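
If your services.xml still sets a visibility-delay, the element can simply be removed.
A sketch, assuming the setting lives under engine/proton in the content cluster;
verify the location against the services reference for your version:

<content id="music" version="1.0">
  <engine>
    <proton>
      <!-- No longer improves write throughput; remove this element -->
      <visibility-delay>1.0</visibility-delay>
    </proton>
  </engine>
</content>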


About Vespa: Largely developed by Yahoo engineers,
Vespa is an open source big data processing and serving engine.
It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform.
Thanks to feedback and contributions from the community, Vespa continues to grow.

We welcome your contributions and feedback (tweet
or email) about any of these new features or future improvements you’d like to request.

Subscribe to the mailing list for more frequent updates!

Vespa Newsletter, October 2022

Kristian Aune

Head of Customer Success, Vespa


In the previous update,
we mentioned Rank-phase statistics, Schema feeding flexibility, the Query Builder and Trace Visualizer,
Rank trace profiling, the new --speed-test parameter and a new video.
Today, we’re excited to share the following updates:

Create vector embeddings in Vespa without custom Java code

An increasingly popular reason for using Vespa is the ability to use vector embeddings
to retrieve documents by semantic similarity, in addition to retrieving by text tokens or attributes.
Since Vespa 8.52, this has become easier: you can use BERT-style models
to create document and query embeddings inside Vespa without writing any custom code.

The BertBase embedder bundled with Vespa
uses a WordPiece embedder to produce a token sequence that is then input to a transformer model.
A BERT-Base compatible transformer model takes the following inputs:

  • A token sequence (input_ids)
  • An attention mask (attention_mask)
  • (Optionally) Token types for cross encoding (token_type_ids)

Give this a try at
simple-semantic-search.
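
Configuring the embedder is a component declaration in services.xml.
A minimal sketch, assuming the bert-embedder component type described in the embedding documentation;
the component id and model paths are illustrative:

<component id="myBert" type="bert-embedder">
  <!-- ONNX transformer model and WordPiece vocabulary, bundled in the application package -->
  <transformer-model path="models/bert-base.onnx"/>
  <tokenizer-vocab path="models/vocab.txt"/>
</component>

In the schema, an indexing expression such as input text | embed | attribute can then
create the document embedding at feed time.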

Model hub: Provided ML models on Vespa Cloud

The BERT base embedder allows you to use vector search without bringing your own vectors, or writing any Java code –
but you still have to bring the model.
For our Vespa Cloud users, we have made this even simpler by
providing the models as part of the platform as well.

For us working on Vespa.ai, it is always a goal to empower application developers
by making it as simple as possible to get started,
while at the same time being able to scale seamlessly to more data, higher traffic, and more complex use cases.
So of course you can still bring your own models, write your own embedders, or pass in your own vectors,
and mix and match all these capabilities in all possible ways.

Improved query performance for filters with common terms

When building text indexes,
Vespa stores a bitvector in addition to the posting list for frequent terms, to enable maximally fast matching.
If the field is used as a filter only, no ranking is needed,
and the bitvector will be used instead of the posting list.
This makes queries using such terms faster and cheaper.
The bitvector optimization is now also available for
attribute fields with fast-search.
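
For example, a string attribute meant purely for filtering can be declared with fast-search and
rank: filter, letting Vespa match using only the bitvector (field name illustrative):

field category type string {
    indexing: attribute | summary
    # fast-search builds dictionary structures for the attribute;
    # rank: filter marks it as match-only, so the bitvector suffices
    attribute: fast-search
    rank: filter
}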

Paged attributes

Fields stored in column stores suitable for random memory access are called attributes in Vespa.
These are used for matching, ranking and grouping, and enable high-throughput partial updates.
By default, attributes are stored completely in memory to make all accesses maximally fast,
but some have also supported paging out to disk
to support a wider range of tradeoffs between lookup speed and memory cost –
see e.g. hybrid billion scale vector search.

Since Vespa 8.69, paging support has been extended to all attribute types,
except tensor with fast-rank and
predicate.
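
Enabling paging is a one-line schema change (field name and type illustrative):

field popularity type long {
    indexing: attribute | summary
    # Allow attribute data to be paged out of memory to disk
    attribute: paged
}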

ARM64 support

Vespa container images are now released as multiplatform, supporting both x86_64 and ARM64.
ARM64 is also available on Vespa Cloud.
Read more.
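
Docker resolves the multiplatform manifest to the host architecture automatically,
so the quick-start command is the same on both platforms:

# Pulls the x86_64 or ARM64 variant to match the host
docker run --detach --name vespa --hostname vespa-container \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa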

Query result highlighting for arrays of string

Highlighting query words in results helps users see why a particular document is returned in their search result.
Since Vespa 8.53, this is supported for arrays of string in addition to single-value strings –
see the schema reference.
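
A sketch of a schema field with highlighting enabled, assuming the bolding setting from the
schema reference (field name illustrative):

field comments type array<string> {
    indexing: index | summary
    # Highlight matched query terms in the returned summary
    bolding: on
}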

Vespa scripts are becoming Go-only

The Vespa container image was put on a diet and now has zero Perl dependencies.
Most Vespa utilities have instead been ported to Go, to support a wider range of client platforms
without requiring any dependencies.

Vespa Newsletter, October 2023

Kristian Aune

Head of Customer Success, Vespa


First, we are happy to announce the improved search UI at search.vespa.ai!
It features AI-generated suggestions, paragraph indexing with hybrid ranking, a results-based AI-generated abstract (RAG),
and original formatting in search results.
We hope this lets you find the right answer quicker and explore the rich Vespa feature space more easily –
please let us know and get started with queries,
like how to configure two-phased ranking.
And even better: the application itself is open source, so you can see for yourself how you could do something similar –
read more in the blog post.

In the previous update,
we mentioned multilingual models, more control over ANN queries, mapped tensors in queries,
and multiple new features in pyvespa and the Vespa CLI.
Today, we’re excited to share the following updates:

Vespa.ai is its own company!

We have spun out the Vespa.ai team as a separate company.
This will let us provide the community with more features even faster,
and help more companies run their Vespa applications cost-effectively
and with high quality on Vespa Cloud –
read more in the announcement.
Join us at slack.vespa.ai,
and please let us know what you want from us in the future.

Vespa Cloud Enclave – Bring your own cloud

Vespa Cloud Enclave lets Vespa Cloud applications in AWS and GCP run in your cloud account/project
while everything is still fully managed by Vespa Cloud’s automation with access to all Vespa Cloud features.
While this adds some administration overhead,
it lets you keep all data within resources controlled by your company at all times,
which is often a requirement for enterprises dealing with sensitive data.
Read more.

Lucene Linguistics integration

The Lucene Linguistics component, added in #27929,
lets you replace the default linguistics module in Vespa with Lucene’s, supporting 40 languages.
This can make it easier to migrate existing text search applications from Lucene-based engines to Vespa
by keeping the linguistics treatment unchanged.
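
Enabling it is a component declaration in services.xml.
A sketch only; the class and bundle names are assumptions based on the announcement,
so verify them against the linguistics documentation before use:

<container id="default" version="1.0">
  <!-- Replace the default linguistics implementation with Lucene's (names assumed) -->
  <component id="linguistics"
             class="com.yahoo.language.lucene.LuceneLinguistics"
             bundle="lucene-linguistics"/>
</container>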

Lucene Linguistics is a contribution to Vespa from Dainius Jocas in the Vinted team –
read the announcement in the blog post for more details.
Also, see their own blog post
for how they adopted Vespa for serving personalized second-hand fashion recommendations at Vinted.

Much faster fuzzy matching

Fuzzy matching lets you match attribute field values within a given edit distance from the value given in a query:

select * from music where myArtistAttribute contains
    ({maxEditDistance: 1}fuzzy("the weekend"))

In Vespa 8.238 we made optimizations to our fuzzy search implementation when matching with
maxEditDistance of 1 or 2.
Fuzzy searching would previously run a linear scan of all dictionary terms.
We now use Deterministic Finite Automata (DFA) to generate the next possible successor term to any mismatching candidate term,
allowing us to skip all terms between the two immediately.
This enables sublinear dictionary matching.
To avoid having to build a DFA for each query term explicitly,
we use a custom lookup table-oriented implementation based on the paper Fast string correction with Levenshtein automata (2002)
by Klaus U. Schulz and Stoyan Mihov.

Internal performance testing on a dataset derived from English Wikipedia (approx. 250K unique terms)
shows speedups of 10x to 40x for pure fuzzy searches.
For fuzzy searches combined with filters, we have seen up to 180x speedup.
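
To try it yourself, the query above can be issued directly with the Vespa CLI
(schema and field names illustrative):

vespa query 'yql=select * from music where myArtistAttribute contains ({maxEditDistance: 1}fuzzy("the weekend"))'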

Cluster-specific model-serving settings

You can deploy machine-learned models for ranking and inference both in container and content clusters,
and container clusters optionally let you run models on GPUs.
In larger applications, you often want to set up multiple clusters to be able to size for different workloads separately.

[Illustration: Vespa clusters overview]

From Vespa 8.220, you can configure GPU model inference settings per container cluster:

<container id="c1" version="1.0">
  <model-evaluation>
    <onnx>
      <models>
        <model name="mul">
          <intraop-threads>2</intraop-threads>

Instrumenting indexing performance

We have made it easier to find bottlenecks in the write path with a new set of metrics:

content.proton.executor.field_writer.utilization
content.proton.executor.field_writer.saturation

If .saturation is close to 1.0 and higher than .utilization, it indicates that worker threads are a bottleneck.
You can then use the Vespa Cloud Console searchnode API
and the documentation
to spot the limiting factor that prevents full CPU utilization when feeding.
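
A quick way to inspect these metrics on a content node is through its metrics proxy;
the port below is the default, and the grep filter is just an illustration:

# Default metrics-proxy port assumed; adjust for your deployment
curl -s http://localhost:19092/metrics/v1/values | \
  grep -E 'field_writer\.(utilization|saturation)'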

Automated BM25 reconfiguration

Vespa has had BM25 ranking for a long time:

field content type string {
    indexing: index | summary
    index: enable-bm25
}

However, setting enable-bm25 on a field with already-indexed data required a manual procedure for the index setting to take effect.
Since Vespa 8.241.13, this happens as automated reindexing in the background, as with other schema changes;
see the example
for how to observe the reindexing progress after enabling the field.
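
Once enabled, the field's BM25 score can be used directly in a rank profile; a minimal sketch:

rank-profile default {
    first-phase {
        # Rank by the BM25 text score of the content field
        expression: bm25(content)
    }
}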

Minor feature improvements

  • The deploy feature in the Vespa CLI is improved with better deployment status tracking,
    as well as other minor changes for ease-of-use.
  • Nested grouping in query results, when grouping over arrays of struct or map,
    is scoped to preserve structure/order of the lower level from Vespa 8.223.
  • Document summaries can now inherit multiple
    other summary classes – since Vespa 8.250; see the sketch below.
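
A sketch of multiple inheritance in a schema, assuming comma-separated parents as in the
document-summary reference; the class names are illustrative:

document-summary compact inherits titles, timestamps {
    # Inherits all summary fields from both parent classes
}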

Performance improvements

  • In Vespa 8.220 we have changed how small allocations (under 128 kB)
    are handled for paged attributes (attributes on disk).
    Instead of mmapping each allocation, they share mmapped areas of 1 MB.
    This greatly reduces the number of mmapped areas used by vespa-proton-bin.
  • Vespa uses ONNXRuntime for model inference.
    Since Vespa 8.250, this supports bfloat16 and float16 as datatypes for ONNX models.
  • Custom components deployed to the Vespa container can use URLs to point to resources to be loaded at configuration time.
    From Vespa 8.216, the content will be cached on the nodes that need it.
    The cache saves bandwidth on subsequent deployments –
    see adding-files-to-the-component-configuration.

Did you know: Production deployment with code diff details

Tracking changes to the application through deployments is easy using the Vespa Cloud Console,
where the source is linked to the repository if added in the deploy command.

Add the link to the code diff at deploy time using --source-url:

vespa prod deploy --source-url https://github.com/vespa-engine/sample-apps/commit/aa2d125229c4811771028525915a4779a8a7be6f

Find more details and how to automate in
source-code-repository-integration.


Thanks for reading! Try out Vespa by
deploying an application for free to Vespa Cloud
or install and run it yourself.