Introducing ONNX support | Vespa Blog

ONNX (Open Neural Network eXchange) is an open format for the sharing of neural network and other machine learned models between various machine learning and deep learning frameworks. As the open big data serving engine, Vespa aims to make it simple to evaluate machine learned models at serving time at scale. By adding ONNX support in Vespa in addition to our existing TensorFlow support, we’ve made it possible to evaluate models from all the commonly used ML frameworks with low latency over large amounts of data.


With the rise of deep learning in the last few years, we’ve naturally enough seen an increase of deep learning frameworks as well: TensorFlow, PyTorch/Caffe2, MxNet etc. One reason for these different frameworks to exist is that they have been developed and optimized around some characteristic, such as fast training on distributed systems or GPUs, or efficient evaluation on mobile devices. Previously, complex projects with non-trivial data pipelines have been unable to pick the best framework for any given subtask due to lacking interoperability between these frameworks. ONNX is a solution to this problem.


ONNX is an open format for AI models, and represents an effort to push open standards in AI forward. The goal is to help increase the speed of innovation in the AI community by enabling interoperability between different frameworks and thus streamlining the process of getting models from research to production.

There is one commonality between the frameworks mentioned above that enables an open format such as ONNX, and that is that they all make use of dataflow graphs in one way or another. While there are differences between each framework, they all provide APIs enabling developers to construct computational graphs and runtimes to process these graphs. Even though these graphs are conceptually similar, each framework has been a siloed stack of API, graph and runtime. The goal of ONNX is to empower developers to select the framework that works best for their project, by providing an extensible computational graph model that works as a common intermediate representation at any stage of development or deployment.

Vespa is an open source project which fits well within such an ecosystem, and we aim to make the process of deploying and serving models to production that have been trained on any framework as smooth as possible. Vespa is optimized toward serving and evaluating over potentially very large datasets while still responding in real time. In contrast to other ML model serving options, Vespa can more efficiently evaluate models over many data points. As such, Vespa is an excellent choice when combining model evaluation with serving of various types of content.

Our ONNX support is quite similar to our TensorFlow support. Importing ONNX models is as simple as adding the model to the Vespa application package (under “models/”) and referencing the model using the new ONNX ranking feature:

    expression: sum(onnx("my_model.onnx"))

The above expression runs the model and sums it to a single scalar value to use in ranking. You will have to provide the inputs to the graph. Vespa expects you to provide a macro with the same name as the input tensor. In the macro you can specify where the input should come from, be it a document field, constant or a parameter sent along with the query. More information can be had in the documentation about ONNX import.

Internally, Vespa converts the ONNX operations to Vespa’s tensor API. We do the same for TensorFlow import. So the cost of evaluating ONNX and TensorFlow models are the same. We have put a lot of effort in optimizing the evaluation of tensors, and evaluating neural network models can be quite efficient.

ONNX support is also quite new to Vespa, so we do not support all current ONNX operations. Part of the reason we don’t support all operations yet is that some are potentially too expensive to evaluate per document, such as convolutional neural networks and recurrent networks (LSTMs etc). ONNX also contains an extension, ONNX-ML, which contains additional operations for non-neural network cases. Support for this extension will come later at some point. We are continually working to add functionality, so please reach out to us if there is something you would like to have added.

Going forward we are continually working on improving performance as well as supporting more of the ONNX (and ONNX-ML) standard. You can read more about ranking with ONNX models in the Vespa documentation. We are excited to announce ONNX support. Let us know what you are building with it!

Vespa Product Updates, December 2018: ONNX Import and Map Attribute Grouping

Hi Vespa Community!

Today we’re kicking off a blog post series of need-to-know updates on Vespa, summarizing the features and fixes detailed in Github issues.

We welcome your contributions and feedback about any new features or improvements you’d like to see.

For December, we’re excited to share the following product news:

Streaming Search Performance Improvement
Streaming Search is a solution for applications where each query only searches a small, statically determined subset of the corpus. In this case, Vespa searches without building reverse indexes, reducing storage cost and making writes more efficient. With the latest changes, the document type is used to further limit data scanning, resulting in lower latencies and higher throughput. Read more here.

ONNX Integration
ONNX is an open ecosystem for interchangeable AI models. Vespa now supports importing models in the ONNX format and transforming the models into Tensors for use in ranking. This adds to the TensorFlow import included earlier this year and allows Vespa to support many training tools. While Vespa’s strength is real-time model evaluation over large datasets, to get started using single data points, try the stateless model evaluation API. Explore this integration more in Ranking with ONNX models.

Precise Transaction Log Pruning
Vespa is built for large applications running continuous integration and deployment. This means nodes restart often for software upgrades, and node restart time matters. A common pattern is serving while restarting hosts one by one. Vespa has optimized transaction log pruning with prepareRestart, due to flushing as much as possible before stopping, which is quicker than replaying the same data after restarting. This feature is on by default. Learn more in live upgrade and prepareRestart.

Grouping on Maps
Grouping is used to implement faceting. Vespa has added support to group using map attribute fields, creating a group for values whose keys match the specified key, or field values referenced by the key. This support is useful to create indirections and relations in data and is great for use cases with structured data like e-commerce. Leverage key values instead of field names to simplify the search definition. Read more in Grouping on Map Attributes.

Questions or suggestions? Send us a tweet or an email.

Vespa Product Updates, December 2019: Improved ONNX support, New rank feature attributeMatch().maxWeight, Free lists for attribute multivalue mapping, faster updates for out-of-sync documents, Zookeeper 3.5.6

Kristian Aune

Kristian Aune

Head of Customer Success, Vespa

In the November Vespa product update, we mentioned Nearest Neighbor and Tensor Ranking, Optimized JSON Tensor Feed Format, Matched Elements in Complex Multi-value Fields, Large Weighted Set Update Performance and Datadog Monitoring Support.

Today, we’re excited to share the following updates:

Improved ONNX Support

Vespa has added more operations to its ONNX model API, such as GEneral Matrix to Matrix Multiplication (GEMM) –
see list of supported opsets.
Vespa has also improved support for PyTorch through ONNX,
see the example.

New Rank Feature attributeMatch().maxWeight

attributeMatch(name).maxWeight was added in Vespa-7.135.5. The value is  the maximum weight of the attribute keys matched in a weighted set attribute.

Free Lists for Attribute Multivalue Mapping

Since Vespa-7.141.8, multivalue attributes uses a free list to improve performance. This reduces CPU (no compaction jobs) and approximately 10% memory. This primarily benefits applications with a high update rate to such attributes.

Faster Updates for Out-of-Sync Documents

Vespa handles replica consistency using bucket checksums. Updating documents can be cheaper than putting a new document, due to less updates to posting lists. For updates to documents in inconsistent buckets, a GET-UPDATE is now used instead of a GET-PUT whenever the document to update is consistent across replicas. This is the common case when only a subset of the documents in the bucket are out of sync. This is useful for applications with high update rates, updating multi-value fields with large sets. Explore details here.

ZooKeeper 3.5.6

Vespa now uses Apache ZooKeeper 3.5.6 and can encrypt communication between ZooKeeper servers.

About Vespa: Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to grow.

We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.