How Using Screwdriver for CI/CD Reduced Vespa’s Time Spent on Builds and Pull Requests by 75%

By Arnstein Ressem, Principal Software Systems Engineer, Verizon Media

When Vespa was open sourced in 2017 we looked for a continuous integration platform to build our source code on. We looked at several hosted solutions as well as Screwdriver – an open source CI/CD platform built by Yahoo/Verizon Media – that had just been open sourced in 2016. Another platform seemed the best fit for us at that point in time, and we integrated with it.

Photo by Bill Oxford on Unsplash

The Vespa codebase is large, with approximately 700 KLOC of C++, 700 KLOC of Java, and more than 10,000 unit tests. For a given version of Vespa we build the complete codebase and version the artifacts with semantic versioning. We always build from the master branch and have no feature branches.

Compiling and testing this codebase is resource demanding, and we soon realized that the provider's default VMs were not up to the task: builds took more than 2 hours to complete. This was a serious issue for the developers waiting for feedback on their pull requests. We ended up subscribing to a premium plan and added more caching of Maven artifacts and compiled C++ objects (ccache) to bring the build time to just under one hour.

In the fall of 2020 we became aware of big changes in the selected CI/CD platform and needed to migrate to something else. As part of this work we took another look at the open source version of Screwdriver, as we knew that the project had matured significantly over the past years. Screwdriver is an open source build platform designed for Continuous Delivery that can easily be deployed on different IaaS providers and is currently an incubating project in the Continuous Delivery Foundation.

Vespa pipeline on Screwdriver

The Vespa team got access to a hosted instance at cd.screwdriver.cd (invite only, but publicly readable with guest access). Working closely with the Screwdriver team, we were able to reduce the build times for the master branch and pull requests from 50 minutes on the previous solution to 18 minutes. This result was obtained by using Screwdriver’s configurable resource management and fast build caches. We also appreciated the small set of requirements on container images, which allowed us to optimize the build image for our jobs.

Screwdriver integrated with pull request builds on GitHub

To further improve developer feedback and productivity, we added pull request analysis that checks whether only C++ or only Java source code was touched. In those cases we build and test only the affected language. This brought pull request build times down from 18 minutes to 12 minutes for C++ and 8 minutes for Java, allowing more issues to be discovered in pull requests without developers having to wait long for review and merge.

We are very happy to have reduced the time spent on builds and pull requests by 75% on average, which leads to better productivity and happier developers.

Enhancing Vespa’s Embedding Management Capabilities

Photo by vnwayne fan on Unsplash

We are thrilled to announce significant updates to Vespa’s support for inference with text embedding models
that map text into vector representations: general support for Huggingface models, including multilingual embedding, embedding inference on GPUs, and new recommended models available on the Vespa Cloud model hub.

Vespa’s best-in-class vector and multi-vector
search support and inference with embedding models
allow developers to build feature-rich semantic search applications
without managing separate systems for embedding inference and vector search over the embedding representations.

Vespa query request using the embed functionality to produce the vector embedding inside Vespa

About text embedding models

Text embedding models have revolutionized natural language processing (NLP) and information retrieval tasks
by capturing the semantic meaning of unstructured text data.
Unlike traditional representations that treat words as discrete symbols,
embedding models map text into continuous vector spaces.


Embedding models trained on multilingual datasets can represent concepts across different languages, enabling information retrieval across
diverse linguistic contexts.

Embedder Models from Huggingface

Vespa now comes with generic support for embedding models hosted on Huggingface.

With the new Huggingface Embedder functionality,
developers can export embedding models from Huggingface
and import them into Vespa in ONNX format for accelerated inference close to where the data is created:

<container id="default" version="1.0">
    <component id="my-embedder-id" type="hugging-face-embedder">
        <transformer-model model-id="cloud-model-id"
                           path="my-models/model.onnx"/>
        <tokenizer-model   model-id="cloud-model-id"
                           path="my-models/tokenizer.json"/>
    </component>
    ...
</container>

The Huggingface Embedder also supports multilingual embedding models that handle hundreds of languages.
Multilingual embedding representations open new possibilities for cross-lingual applications
that combine Vespa linguistic processing
with multilingual vector representations to implement
hybrid search.
The new Huggingface Embedder also supports
multi-vector representations,
simplifying the deployment of semantic search applications at scale
without maintaining complex fan-out relationships caused by model input context length constraints.
Read more about the Huggingface embedder in the
documentation.
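
As a minimal sketch of how such an embedder can be used from a schema (the schema and field names are illustrative, the 384-dimensional output size is an assumption about the exported model, and the embedder id refers to the component configured above), a string field can be embedded into a single vector, and an array of text chunks into a multi-vector tensor:

schema doc {
    document doc {
        field title type string {
            indexing: summary | index
        }
        field chunks type array<string> {
            indexing: summary | index
        }
    }

    # Single-vector representation produced by the configured embedder
    # (384 dimensions assumed for the exported model)
    field title_embedding type tensor<float>(x[384]) {
        indexing: input title | embed my-embedder-id | attribute | index
        attribute {
            distance-metric: angular
        }
    }

    # Multi-vector representation: one vector per element in the chunks array
    field chunk_embeddings type tensor<float>(p{},x[384]) {
        indexing: input chunks | embed my-embedder-id | attribute | index
        attribute {
            distance-metric: angular
        }
    }
}

Nearest neighbor search over the multi-vector field then matches each document by its closest chunk vector, which is what removes the need for the fan-out relationships mentioned above.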

GPU Acceleration of Embedding Models

Vespa now supports GPU acceleration of embedding model inferences.
By harnessing the power of GPUs, Vespa embedders can efficiently process large amounts of text data,
resulting in faster response times, improved scalability, and lower cost.
GPU support in Vespa also makes it possible to use larger and more powerful embedding models
while maintaining low serving latency and cost-effectiveness.

GPU acceleration is automatically enabled in Vespa Cloud for instances where GPUs are available.
Configure your stateless Vespa container cluster with a GPU resource in services.xml.
For open-source Vespa, specify the GPU device using the
embedder ONNX configuration.
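
As a sketch of the Vespa Cloud case (the node count and resource sizes below are illustrative; consult the Vespa Cloud documentation for the supported GPU resource combinations), the container cluster hosting the embedder requests a GPU in services.xml:

<container id="default" version="1.0">
    <component id="my-embedder-id" type="hugging-face-embedder">
        ...
    </component>
    <!-- Illustrative node count and resources; GPU is requested inside <resources> -->
    <nodes count="2">
        <resources vcpu="4" memory="16Gb" disk="125Gb">
            <gpu count="1" memory="16Gb"/>
        </resources>
    </nodes>
</container>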

Vespa Model Hub Updates

To make it easier to create embedding applications,
we have added new state-of-the-art text embedding models to the Vespa Model Hub for
Vespa Cloud users. The Vespa Model Hub is a centralized repository of selected models,
making it easier for developers to discover and use powerful open-source embedding models.

This expansion of the model hub provides developers with a broader range of embedding options.
It empowers them to make tradeoffs between embedding quality, inference latency,
and the resource footprint that follows from embedding dimensionality.

We expand the hub with the following open-source text embedding models:

Embedding Model      | Dimensionality | Metric  | Language     | Vespa Hub Model Id
e5-small-v2          | 384            | angular | English      | e5-small-v2
e5-base-v2           | 768            | angular | English      | e5-base-v2
e5-large-v2          | 1024           | angular | English      | e5-large-v2
multilingual-e5-base | 768            | angular | Multilingual | multilingual-e5-base

These embedding models perform strongly on various tasks,
as demonstrated on the MTEB: Massive Text Embedding Benchmark leaderboard.
The MTEB includes 56 datasets across 8 tasks, such as semantic search, clustering, classification, and re-ranking.

MTEB Leaderboard; notice the strong performance of the E5-v2 models

Developers using Vespa Cloud can add these embedding models to their application by referencing the Vespa Cloud Model hub identifier:

<component id="e5" type="hugging-face-embedder">
    <transformer-model model-id="e5-small-v2"/>
</component>

With three lines of configuration added to the Vespa app, Vespa Cloud developers can use the embed functionality for
embedding queries and embedding document fields.
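
As an illustration of query-time embedding with this component, a query request (shown here as the JSON body of a POST to the query API) can ask Vespa to embed the query text into the tensor used by nearestNeighbor. The schema name, embedding field, rank profile, and query text below are hypothetical:

{
    "yql": "select * from doc where {targetHits: 10}nearestNeighbor(embedding, q)",
    "input.query(q)": "embed(e5, @text)",
    "text": "how to manage text embedding models",
    "ranking": "semantic"
}

The rank profile referenced by ranking must declare the query tensor input query(q) with the same dimensionality as the embedding field, so the embedded query vector can be compared against the document vectors.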

Producing the embeddings closer to the Vespa storage and indexes avoids network transfer-related latency and egress costs,
which can be substantial for high-dimensional vector representations.
In addition, with Vespa Cloud’s auto-scaling feature,
developers do not need to worry about scaling with changes in inference traffic volume.

Vespa Cloud also allows bringing your own models using the Huggingface Embedder,
with model files submitted in the application package. In Vespa Cloud, inference with embedding models is
automatically accelerated with GPU if the application uses Vespa Cloud GPU instances.
Read more on the Vespa Cloud model hub.

Summary

The improved Vespa embedding management options offer a significant leap in capabilities for anybody working with embeddings in online applications,
enabling developers to leverage state-of-the-art models, accelerate inference with GPUs,
and access a broader range of embedding options through the Vespa model hub.
All this functionality is available in Vespa version 8.179.37 and later.

Got questions? Join the Vespa community in Vespa Slack.