Changes in OS support for Vespa

Currently, we support CentOS Stream 8 for open-source Vespa, as announced in the
Upcoming changes in OS support for Vespa blog
post in 2022. The choice of CentOS Stream was made around the time Red Hat announced the EOL for CentOS Linux 8 and the
new CentOS Stream initiative.
Other groups were scrambling to become the successor to CentOS, and the landscape was not settled. This is now about to change.

Introduction

We are committed to providing Vespa releases that we have high confidence in. Internally, at Vespa.ai,
we have migrated to AlmaLinux 8 and
Red Hat Enterprise Linux 8. CentOS Stream 8 is
also approaching its EOL, which is May 31st, 2024. Because of this, we want to change the supported OS for open-source Vespa.

Vespa is released up to 4 times a week depending on internal testing and battle-proven verification in our production
systems. Each high-confidence version is published as RPMs and a container image.
RPMs are built and published on Copr, and container images are
published on Docker Hub. In this blog post, we will look at options going
forward and announce the selected OS support for the upcoming Vespa releases.

Options

There is a wide selection of Linux distributions out there with different purposes and target groups. We need to choose
an OS that is as close as possible to what we use internally and that is acceptable to our open-source users. These
criteria effectively limit the options to enterprise Linux-based distributions.

RPMs

Binary releases of Vespa consist of a set of RPMs. This set has to be built somewhere and must be uploaded to a
repository where it can be downloaded by package managers. These RPMs are then installed either on a host machine or in
container images. We will still build our RPMs on Copr, but we
can choose to compile for environments that are either downstream or upstream of RHEL. In the time
between the EOL of CentOS Linux 8 (end of 2021) and now, Copr has added support for building for
EPEL 8 and 9. This means that we can build for EPEL 8 and install the resulting RPMs on
RHEL 8 and its downstream clones.
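
To make this concrete, here is a sketch of what installation looks like on an EL8-based host. The repository file URL follows Copr’s convention for group projects; verify it against the Copr project page before use:

    # Install the config-manager plugin, enable the Vespa Copr repository
    # (EPEL 8 chroot), and install the Vespa RPMs.
    sudo dnf install -y 'dnf-command(config-manager)'
    sudo dnf config-manager --add-repo \
      https://copr.fedorainfracloud.org/coprs/g/vespa/vespa/repo/epel-8/group_vespa-vespa-epel-8.repo
    sudo dnf install -y vespa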

Distribution of RPMs is currently done on Copr as the built RPMs are directly put in the repository there. The repositories
have limited capacity, and Copr only guarantees that the latest version is available. It would be nice to have an archive
of more than just the most recent version, but this will rely on vendors offering space and network traffic for the RPMs
to be hosted.

Container images

Given the choice of building RPMs on Copr for EPEL 8, this opens up a few options when selecting a base image for our
container image:

  • AlmaLinux 8
  • Rocky Linux 8
  • RHEL 8 (using the free developer subscription discussed below)

We should be able to select any of the above due to the RHEL ABI compatibility
and the distributions’ respective guarantees to be binary compatible with RHEL.

Red Hat has also announced a free version of RHEL
for developers and small workloads. However, it is not hassle-free, as it requires registration at Red Hat
to be able to use the version. We believe that this will not be well received by the consumers of Vespa
releases.

Selection

Container Image

Considering the options, we have chosen AlmaLinux 8 as the supported OS going forward. The main reasons for this decision
are:

  • AlmaLinux is used in the Vespa Cloud production systems
  • The OS is available to anyone free of charge
  • We will still be able to leverage the Copr build system for open-source Vespa artifacts

We will use the docker.io/almalinux:8 image as the base for the Vespa container image on Docker Hub.
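
Using the image is unchanged by the base switch. As a minimal sketch (the ports are Vespa’s default data plane and config server ports):

    # Pull and run the Vespa container image from Docker Hub.
    docker pull docker.io/vespaengine/vespa
    docker run --detach --name vespa --hostname vespa-container \
      --publish 8080:8080 --publish 19071:19071 \
      docker.io/vespaengine/vespa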

RPM distribution

Although we will continue to build RPMs on Copr, we are going to switch to a new RPM repository that can keep an archive
of a limited set of historic releases. We have been accepted as an open-source project at
Cloudsmith and will use the
vespa/open-source-rpms repository to distribute our RPMs.
Cloudsmith generously allows qualified open-source projects to store 50 GB of artifacts and use 200 GB of network traffic. The
vespa-engine.repo repository definition
will be updated shortly, and information about how to install Vespa from RPMs can be found in the
documentation. Within our storage limits, we will be able to store
approximately 50 Vespa releases.

Compatibility for current Vespa installations

The consumers of Vespa container images should not notice any differences when Vespa changes the base of the container
image to AlmaLinux 8. Everything comes preinstalled in the image, and this is tested the same way as it was before. If
you use the Vespa container image as a base for custom images, the differences between the latest AlmaLinux 8 and CentOS
Stream 8 are minimal. We do not expect any changes to be required.

For consumers that install Vespa RPMs in their container images or install directly on host instances, we will continue
to build and publish RPMs for CentOS Stream 8 on Copr until Dec 31st, 2023. RPMs built on EPEL 8 will be forward compatible
with CentOS Stream 8 due to the RHEL ABI compatibility. This
means that you can make the switch by replacing the repository configuration with the one defined in
vespa-engine.repo the next time Vespa
is upgraded. If you do not, no new Vespa versions will be available for upgrade once we stop building for CentOS Stream 8.
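
As a sketch, such a switch on an EL8 host could look as follows, assuming the old Copr definition lives in /etc/yum.repos.d and using Cloudsmith’s standard setup script for public repositories; verify the exact steps against the documentation:

    # Remove the old Copr repository definition (the file name may differ).
    sudo rm -f /etc/yum.repos.d/group_vespa-vespa-*.repo
    # Add the Cloudsmith-hosted repository via Cloudsmith's setup script.
    curl -1sLf 'https://dl.cloudsmith.io/public/vespa/open-source-rpms/setup.rpm.sh' | sudo -E bash
    # Upgrade Vespa from the new repository.
    sudo dnf upgrade -y vespa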

Future OS support

Predicting the path of future OS support is not trivial in an environment where the landscape is changing. Red Hat
announced that it is closing off the RHEL sources and strengthening
CentOS Stream. The Open Enterprise Linux Association has emerged as a response to this, and AlmaLinux
commits to binary compatibility. We expect the landscape to keep changing, and
hopefully, we will have more clarity when deciding on the next OS to support.

Regardless of the landscape, we periodically publish a preview on AlmaLinux 9 that can be used at your own risk
here. Please use this for testing purposes only.

Summary

We have selected AlmaLinux 8 as the supported OS for Vespa going forward. The change is expected to have no impact on the
consumers of Vespa container images and RPMs. The primary RPM repository has moved to a Cloudsmith-hosted
repository where we can have an archive of releases allowing installation of not just the latest Vespa version.

Introducing TensorFlow support

In previous blog posts we have talked about Vespa’s tensor API which enables some advanced ranking capabilities. The primary use case is for machine learned ranking, where you train your models using some machine learning framework, convert the models to Vespa’s tensor format, and deploy them to Vespa. This works well, but converting trained models to Vespa form is cumbersome.

We are now happy to announce a new feature that makes this process a lot easier: TensorFlow import. With this feature you can directly deploy models you’ve trained in TensorFlow to Vespa, and use these models during ranking. This means that the models are executed in parallel over multiple threads and machines for a single query, which makes it possible to evaluate the model over any number of data items and still bound the total response time. In addition the data items to evaluate with the TensorFlow model can be selected dynamically with a query, and with a cheaper first-phase rank function if needed. Since the TensorFlow models are evaluated on the nodes storing the data, we avoid sending any data over the wire for evaluation.

In this post we’d like to introduce this new feature by discussing how it works, some assumptions behind working with TensorFlow and Vespa, and how to use the feature.

Vespa is optimized to evaluate models repeatedly over many data items (documents).  To do this efficiently, we do not evaluate the model using the TensorFlow inference engine. TensorFlow adds a non-trivial amount of overhead and instrumentation which it uses to manage potentially large scale computations. This is significant in our case, since we need to evaluate models on a micro-second scale. Hence our approach is to extract the parameters (weights) into Vespa tensors, and use the model specification in the TensorFlow graph to generate efficient Vespa tensor expressions.

Importing TensorFlow models is as simple as saving the TensorFlow model using the
SavedModel API,
adding those files to the Vespa application package, and referencing the model using the new TensorFlow ranking feature.
For instance, if your files are in models/my_model in the application package:

first-phase {
    expression: sum(tensorflow("my_model/saved"))
}

The above expression runs the model and sums it to a single scalar value to use in ranking. One thing you will have to provide is the input(s), or feed, to the graph. Vespa expects you to provide a macro with the same name as the input placeholder. In the macro you can specify where the input should come from, be it a parameter sent along with the query, a document field (possibly in a parent document) or a constant.
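
To illustrate, here is a minimal rank profile sketch. It assumes the saved graph has a single placeholder named input, fed from a hypothetical document tensor field doc_embedding; in later Vespa versions the macro keyword is function:

    rank-profile tf_ranking inherits default {
        # The graph's input placeholder is named "input"; Vespa resolves it
        # by invoking the macro with the same name.
        macro input() {
            expression: attribute(doc_embedding)
        }
        first-phase {
            expression: sum(tensorflow("my_model/saved"))
        }
    }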

As mentioned, Vespa evaluates the imported models once per document. Depending on the requirements of the application, this can impose some natural limitations on the size and complexity of the models that can be evaluated. However, Vespa has a number of other search and rank features that can be used to reduce the search space before running the machine learned models. Typically, one would use the search and first ranking phases to select a relatively small number of candidate documents, which are then given their final rank score in the more computationally expensive second phase model evaluation.

Also note that TensorFlow import is new to Vespa, and we currently only support a subset of the TensorFlow operations. While the supported operations should suffice for many relevant use cases, there are some that are not supported yet due to potentially being too expensive to evaluate per document. For instance, convolutional networks and recurrent networks (LSTMs etc) are not supported. We are continually working to add functionality, if you find that we have some glaring omissions, please let us know.

Going forward we are focusing on further improving performance of our tensor framework for important use cases. We’ll follow up this post with one showing how the performance of evaluation in Vespa compares with TensorFlow serving. We will also add more supported frameworks and our next target is ONNX.

You can read more about this feature in the Ranking with TensorFlow models in Vespa documentation. We are excited to announce the TensorFlow support, and we’re eager to hear what you are building with it.

Introducing ONNX support

ONNX (Open Neural Network eXchange) is an open format for the sharing of neural network and other machine learned models between various machine learning and deep learning frameworks. As the open big data serving engine, Vespa aims to make it simple to evaluate machine learned models at serving time at scale. By adding ONNX support in Vespa in addition to our existing TensorFlow support, we’ve made it possible to evaluate models from all the commonly used ML frameworks with low latency over large amounts of data.

Introduction

With the rise of deep learning in the last few years, we’ve naturally enough seen an increase of deep learning frameworks as well: TensorFlow, PyTorch/Caffe2, MxNet etc. One reason for these different frameworks to exist is that they have been developed and optimized around some characteristic, such as fast training on distributed systems or GPUs, or efficient evaluation on mobile devices. Previously, complex projects with non-trivial data pipelines have been unable to pick the best framework for any given subtask due to lacking interoperability between these frameworks. ONNX is a solution to this problem.

ONNX

ONNX is an open format for AI models, and represents an effort to push open standards in AI forward. The goal is to help increase the speed of innovation in the AI community by enabling interoperability between different frameworks and thus streamlining the process of getting models from research to production.

There is one commonality between the frameworks mentioned above that enables an open format such as ONNX, and that is that they all make use of dataflow graphs in one way or another. While there are differences between each framework, they all provide APIs enabling developers to construct computational graphs and runtimes to process these graphs. Even though these graphs are conceptually similar, each framework has been a siloed stack of API, graph and runtime. The goal of ONNX is to empower developers to select the framework that works best for their project, by providing an extensible computational graph model that works as a common intermediate representation at any stage of development or deployment.

Vespa is an open source project which fits well within such an ecosystem, and we aim to make the process of deploying and serving models to production that have been trained on any framework as smooth as possible. Vespa is optimized toward serving and evaluating over potentially very large datasets while still responding in real time. In contrast to other ML model serving options, Vespa can more efficiently evaluate models over many data points. As such, Vespa is an excellent choice when combining model evaluation with serving of various types of content.

Our ONNX support is quite similar to our TensorFlow support. Importing ONNX models is as simple as adding the model to the Vespa application package (under “models/”) and referencing the model using the new ONNX ranking feature:

    expression: sum(onnx("my_model.onnx"))

The above expression runs the model and sums it to a single scalar value to use in ranking. You will have to provide the inputs to the graph. Vespa expects you to provide a macro with the same name as the input tensor. In the macro you can specify where the input should come from, be it a document field, constant or a parameter sent along with the query. More information can be found in the documentation about ONNX import.
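
As with TensorFlow import, a minimal sketch, here assuming the ONNX graph has a single input tensor named input that is fed from a tensor sent along with the query (the query tensor’s type must be declared in a query profile type; names are hypothetical):

    rank-profile onnx_ranking inherits default {
        # The graph's input tensor is named "input"; here it is fed from a
        # tensor passed with the query.
        macro input() {
            expression: query(user_embedding)
        }
        first-phase {
            expression: sum(onnx("my_model.onnx"))
        }
    }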

Internally, Vespa converts the ONNX operations to Vespa’s tensor API. We do the same for TensorFlow import, so the cost of evaluating ONNX and TensorFlow models is the same. We have put a lot of effort into optimizing the evaluation of tensors, and evaluating neural network models can be quite efficient.

ONNX support is also quite new to Vespa, so we do not support all current ONNX operations. Part of the reason we don’t support all operations yet is that some are potentially too expensive to evaluate per document, such as convolutional neural networks and recurrent networks (LSTMs etc). ONNX also contains an extension, ONNX-ML, with additional operations for non-neural-network cases. Support for this extension will come at a later point. We are continually working to add functionality, so please reach out to us if there is something you would like to have added.

Going forward we are continually working on improving performance as well as supporting more of the ONNX (and ONNX-ML) standard. You can read more about ranking with ONNX models in the Vespa documentation. We are excited to announce ONNX support. Let us know what you are building with it!

Vespa Product Updates, September 2019: Tensor Float Support, Reduced Memory Use for Text Attributes, Prometheus Monitoring Support, and Query Dispatch Integrated in Container

Kristian Aune, Head of Customer Success, Vespa


In the August Vespa product update, we mentioned BM25 Rank Feature, Searchable Parent References, Tensor Summary Features, and Metrics Export. Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to grow.

This month, we’re excited to share the following updates with you:

Tensor Float Support

Tensors now support float cell values, for example tensor<float>(key{}, x[100]). Using the 32-bit float type cuts the memory footprint in half compared to the 64-bit double and can increase ranking performance by up to 30%. Vespa’s TensorFlow and ONNX integrations now convert to float tensors for higher performance. Read more.
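
As an illustration, a float-cell tensor attribute is declared in the schema like this minimal sketch (the field name is hypothetical):

    # A dense float tensor stored as an attribute for use in ranking.
    field embedding type tensor<float>(x[100]) {
        indexing: attribute
    }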

Reduced Memory Use for Text Attributes 

Attributes in Vespa are fields stored in columnar form in memory for access during ranking and grouping. From Vespa 7.102, the enum store used to hold attribute data uses a set of smaller buffers instead of one large buffer. This typically cuts static memory usage by 5%, but more importantly reduces peak memory usage (during background compaction) by 30%.

Prometheus Monitoring Support

Integrating with the Prometheus open-source monitoring solution is now easy to do
using the new interface to Vespa metrics.
Read more.

Query Dispatch Integrated in Container

The Vespa query flow is optimized for multi-phase evaluation over a large set of search nodes. Since Vespa 7.109.10, the dispatch function is integrated into the Vespa Container process, which simplifies the architecture with one less service to manage. Read more.

We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.

Vespa Product Updates, October/November 2019: Nearest Neighbor and Tensor Ranking, Optimized JSON Tensor Feed Format, Matched Elements in Complex Multi-value Fields, Large Weighted Set Update Performance, and Datadog Monitoring Support

Kristian Aune, Head of Customer Success, Vespa


In the September Vespa product update, we mentioned Tensor Float Support, Reduced Memory Use for Text Attributes, Prometheus Monitoring Support, and Query Dispatch Integrated in Container.

This month, we’re excited to share the following updates:

Nearest Neighbor and Tensor Ranking

Tensors are native to Vespa. We compared elastic.co to vespa.ai on nearest neighbor ranking using dense tensor dot product. With out-of-the-box configurations, Vespa performed 5 times faster than Elastic. View the test results.

Optimized JSON Tensor Feed Format

A tensor is a data type used for advanced ranking and recommendation use cases in Vespa. This month, we released an optimized tensor format, enabling a more than 10x improvement in feed rate. Read more.

Matched Elements in Complex Multi-value Fields 

Vespa is used in many use cases with structured data – documents can have arrays of structs or maps. Such arrays and maps can grow large, and often only the entries matching the query are relevant. You can now use the recently released matched-elements-only setting to return matches only. This increases performance and simplifies front-end code.
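
A sketch of the setting on a hypothetical map field; see the documentation for the exact requirements on the struct fields:

    # Only the map entries matched by the query are returned in the summary.
    field my_map type map<string, string> {
        indexing: summary
        summary: matched-elements-only
        struct-field key { indexing: attribute }
    }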

Large Weighted Set Update Performance

Weighted sets in documents are used to store a large number of elements used in ranking. Such sets are often updated at high volume, in real-time, enabling online big data serving. Vespa 7.129 includes a performance optimization for updating large sets. For example, a set with 10K elements, without fast-search, is 86.5% faster to update.

Datadog Monitoring Support

Vespa is often used in large scale mission-critical applications. For easy integration into dashboards,
Vespa is now in Datadog’s integrations-extras GitHub repository.
Existing Datadog users will now find it easy to monitor Vespa.
Read more.

About Vespa: Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to grow.

We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.

Vespa Product Updates, December 2019: Improved ONNX Support, New Rank Feature attributeMatch().maxWeight, Free Lists for Attribute Multivalue Mapping, Faster Updates for Out-of-Sync Documents, ZooKeeper 3.5.6

Kristian Aune, Head of Customer Success, Vespa


In the November Vespa product update, we mentioned Nearest Neighbor and Tensor Ranking, Optimized JSON Tensor Feed Format, Matched Elements in Complex Multi-value Fields, Large Weighted Set Update Performance and Datadog Monitoring Support.

Today, we’re excited to share the following updates:

Improved ONNX Support

Vespa has added more operations to its ONNX model API, such as General Matrix Multiplication (GEMM) –
see list of supported opsets.
Vespa has also improved support for PyTorch through ONNX,
see the pytorch_test.py example.

New Rank Feature attributeMatch().maxWeight

attributeMatch(name).maxWeight was added in Vespa 7.135.5. The value is the maximum weight of the attribute keys matched in a weighted set attribute.
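
For illustration, a sketch of the feature in use, assuming a hypothetical weighted set attribute named tags:

    rank-profile max_tag_weight inherits default {
        first-phase {
            # Rank by the highest weight among the matched keys in "tags".
            expression: attributeMatch(tags).maxWeight
        }
    }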

Free Lists for Attribute Multivalue Mapping

Since Vespa 7.141.8, multivalue attributes use free lists to improve performance. This reduces CPU usage (no compaction jobs) and memory usage by approximately 10%. It primarily benefits applications with a high update rate to such attributes.

Faster Updates for Out-of-Sync Documents

Vespa handles replica consistency using bucket checksums. Updating documents can be cheaper than putting a new document, due to fewer updates to posting lists. For updates to documents in inconsistent buckets, a GET-UPDATE is now used instead of a GET-PUT whenever the document to update is consistent across replicas. This is the common case when only a subset of the documents in the bucket is out of sync. This is useful for applications with high update rates, updating multi-value fields with large sets. Explore details here.

ZooKeeper 3.5.6

Vespa now uses Apache ZooKeeper 3.5.6 and can encrypt communication between ZooKeeper servers.

About Vespa: Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to grow.

We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.

Upcoming changes in OS support for Vespa

Today we support CentOS Linux 7 for open source Vespa release artifacts.
These are published as RPMs on Copr and as a
container image on Docker Hub. Vespa is released up to
4 times a week depending on internal testing and battle-proven verification in our production systems.
In this blog post we will look at options going forward and announce the selected OS support for the
upcoming Vespa 8 release.

We are committed to providing Vespa releases that we have high confidence in. Internally in Yahoo we
migrated to RHEL 8 (Red Hat Enterprise Linux)
last year and we want to upgrade the supported OS for open source Vespa as well. Although CentOS 7 is
supported until June 2024, we want to move to an OS that is closer to what we use internally.

This could have been easy, but Red Hat surprised us and deprecated CentOS Linux 8 in their December 2020 announcement.
As a result, CentOS Linux 8 is now EOL. Since then, the community and users of CentOS Linux have
been considering what to do, and the void has also given rise to new distributions that are rebuilds of
RHEL, just like CentOS Linux 8 was.

There are a myriad of Linux distributions out there with different purposes and target groups. For us,
we need to choose an OS that is as close to what we use internally as possible. This criterion limits
the options significantly.

Figure 1: Package flow around Red Hat Enterprise Linux

In the figure above, we see RHEL in the middle, with the nightly build that at some point becomes the
next minor version of RHEL and the released RHEL. Downstream, to the right, we have distributions that
emerged after the announcement of the CentOS Linux deprecation. Among these are AlmaLinux
and Rocky Linux. Both seem to have gained traction and are now publishing
releases approximately one week after RHEL releases. Since these are rebuilds of the RHEL packages, we
should be able to assume the same ABI compatibility.

Furthest upstream we have Fedora Linux,
which is the source of all packages in the figure. A given major version of RHEL is branched off a version
of Fedora (Fedora 28 for the RHEL 8 release). The packages then undergo integration testing and QA before
they enter the RHEL nightly build. If the build and test of the RHEL nightly succeed, the package set is
pushed to CentOS Stream. The stability of CentOS Stream could be a
concern compared to the downstream distributions. In the CentOS Stream is continuous delivery
blog post, the author, who has insight into the RHEL pipelines, writes that the updates to the CentOS Stream package
set are the RHEL nightly builds. Packages that go into the nightly builds are integration tested and quality
gated. The blog post CentOS Stream: Why It’s Awesome further details
how packages flow in CentOS Stream and RHEL. Since the RHEL nightly build should be releasable any
day to become the next minor version of RHEL, we can assume that the ABI compatibilities
are valid for CentOS Stream as well. If these break, it is a bug and will be fixed. Another advantage for
us is that we can continue using Fedora Copr to build our packages.

Red Hat has also announced a free version of RHEL
for developers and small workloads. However, it is not hassle-free, as it requires registration at Red Hat
to be able to use the version. We believe that this will not be well received by the consumers of Vespa
releases.

Considering the options, we have chosen CentOS Stream 8 as the supported OS for the next major version of
Vespa, which is due this year. The main reasons for this decision are:

  • The OS is available to anyone free of charge
  • We will still be able to leverage the Copr build system for open source Vespa artifacts
  • Updates and security fixes will be available earlier than for distributions downstream RHEL
  • CentOS Stream is close enough to what we use in production to be able to deliver open source Vespa releases with high confidence

This means that we will still build and distribute Vespa RPMs on Copr, and we will use the quay.io/centos/centos:stream8 image as the base for the container image on Docker Hub.
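
One way to verify which OS a Vespa image is based on is to inspect /etc/os-release inside the image; a quick sketch:

    # Print the OS identification of the Vespa container image.
    docker run --rm --entrypoint cat docker.io/vespaengine/vespa /etc/os-release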

The consumers of Vespa artifacts can be impacted by this change depending on their environments. For those
that use our RPMs to install directly on VMs or bare metal hosts, the OS will have to be upgraded to CentOS
Stream 8 to be compatible. The OS must also be kept up to date as the Vespa releases will be based on the
state of CentOS Stream 8 at build time. Consumers of the container images will need to upgrade their hosts to
an OS with a kernel close enough to, or newer than, the CentOS Stream kernel, which is currently 4.18.0.

We have selected CentOS Stream 8 and this will be the only supported OS for Vespa 8 artifacts. This decision
will impact those that consume Vespa artifacts like RPMs or container images to various degrees.

Vespa support in langchain

If you are experimenting with large language models (LLMs) in your applications, you are probably
doing so with Langchain. That has now become much simpler with the addition of a Vespa retriever in Langchain.
See the documentation on the Langchain site.
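
A minimal sketch of using the retriever, based on the Langchain documentation at the time; the endpoint, YQL, and field names below are examples, not requirements:

    # pip install pyvespa langchain
    from vespa.application import Vespa
    from langchain.retrievers.vespa_retriever import VespaRetriever

    # Connect to a Vespa endpoint; Vespa's own documentation search serves
    # as an example here.
    vespa_app = Vespa(url="https://doc-search.vespa.oath.cloud", port=443)

    # A plain Vespa query body; adjust the yql and ranking to your schema.
    body = {
        "yql": "select content from paragraph where userQuery()",
        "hits": 5,
        "ranking": "documentation",
    }

    retriever = VespaRetriever(app=vespa_app, body=body, content_field="content")
    docs = retriever.get_relevant_documents("what is vespa?")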

Also, thanks to Langchain creator Harrison Chase for that perceptive observation
about all you Vespa builders!