A new visual identity for a new era

Two months after announcing that Vespa.ai had become an independent company,
we have reached another small – but important – milestone on our journey into the new era:
unveiling our new logo and visual identity.

New Vespa logo

The overall brand identity reflects Vespa’s personality as a bold, Scandinavian company. The
geometric and forward-leaning logo is inspired by Vespa’s no-nonsense attitude and unbeatable
performance. The symbol can be interpreted as a simple box, the initial letter V from “Vespa”, or
simple brackets. It can also be seen as a three-dimensional, scalable container of data, echoing
the concept of “Moving mountains, not data”.

In addition to the new logo, we are updating our typographic palette, ensuring a professional yet
approachable tone of voice and cross-platform legibility. The color scheme picks up on elements
from Scandinavian nature, such as heather, rocks, light, and glaciers. Our high-contrast brand
icons echo the 45° angle from the logo, while their thin hairlines point to Vespa’s high level of
performance and precision. Our photographic style reflects our Scandinavian origin: Vespa is solid
like mountains, agile like water, and complex like a finely tuned ecosystem.

The updated profile will roll out in the weeks to come. While this is an important visual marker
of a new era, we are even more excited about the new features for both the Vespa platform and
Vespa Cloud that we will be announcing in the months to come.

🎄Advent of Tensors 2023 🎅

Andreas Eriksen

Senior Vespa Engineer

Jo Kristian Bergum

Vespa Solutions Architect

Greetings, Vespa enthusiasts! Prepare to embark on a festive journey as we bring you the Advent of Tensors.
In this advent, we’ll be diving into the magical world of Vespa Tensors, combining the joy of the holiday season with the power of expressing distributed real-time tensor computations over evolving datasets.

🌟 What’s Vespa Tensors?

Vespa Tensors, the enchanting framework behind this festive challenge, is not your commodity vector similarity search library.
Developed by elves close to the North Pole, Vespa Tensors adds a touch of magic to your real-time big data serving use cases,
offering powerful tensor operations for all your real-time AI serving workloads. Vespa tensors offer much more than simple similarity search over a single vector representation, enabling features such as multi-vector indexing and feature computations for recommender systems, to name a few AI-powered use cases.

You can learn more about Vespa tensors in the Vespa tensor user guide, or if you prefer a blog
post form: Computing with tensors in Vespa.
All challenges are expressed and solved in the Vespa tensor playground which runs in your browser with zero dependencies.
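To give a flavor of what tensor computations look like, here is a plain-Python analogy (not Vespa syntax) of multiplying two mapped (sparse) tensors cell by cell and sum-reducing the result – the kind of operation a Vespa tensor expression would write as `reduce(a * b, sum)`:

```python
# Plain-Python analogy (not Vespa syntax) of a mapped-tensor dot product:
# cells only combine when their sparse labels match, and the product
# cells are then sum-reduced into a single number.
def mapped_dot(a: dict, b: dict) -> float:
    return sum(value * b[label] for label, value in a.items() if label in b)

print(mapped_dot({"gift": 2.0, "coal": 1.0}, {"gift": 3.0}))  # 6.0
```

The real thing runs distributed over your documents at query time, which is what the challenges explore.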

✨ Daily Challenges for 24 Days

For the next 24 days, we’ll unravel a new tensor coding challenge each day, designed to stretch your skills and explore the fascinating capabilities of Vespa Tensors.
Whether you’re a seasoned Vespa expert or a curious beginner, these challenges are sure to spark your interest. Each day, a new challenge will be unlocked in the table below, and the solution to the previous day’s challenge will be published.

🔍 Explore the Magic of Tensors

Discover the magic behind tensor computations and learn how Vespa Tensors can help solve complex real-time AI serving problems. From expressing gift recommendation systems to facial recognition systems for tracking who’s been nice or naughty, each challenge is a delightful exploration into the capabilities of the Vespa tensor framework.

🚀 Spread the Holiday Cheer in Code

Join us on this fun sleigh ride and spread the holiday cheer by expressing compute over data with tensor expressions.
Whether you’re coding by the fireplace or sipping hot cocoa, let the Advent of Tensors add a sprinkle of magic to your advent routine. So grab your coding hat, put on your festive coding sweater, and get ready for 24 days of tensor computations.
Submit your solution to the form associated with each challenge, and get the chance to win exclusive Vespa swag!

P.S. If you are interested in discussing the challenges with many other Vespa enthusiasts, come join us in the festive Vespa Slack space.

The ✨ 2023 Challenges ✨

Day | Challenge | Playground Link | Solution Link
1 | Santa’s Festive Weigh-In! 🎅 | link |
2 | Santa’s Fitness Challenge! 🏋️‍♂️ | |
3 | Santa’s Grand Weight Gain Gala! 🎉 | |
4 | Distance to Dasher 🦌 | |
5 | Roaming Reindeer 🦌🦌🦌 | |
6 | Who’s been Naughty or Nice 🎁 | |
7 | The Great Reindeer Rally! 🦌 | |
8 | Let It Snow! ☃️🌨️🌍 | |
9 | Carol’s Festive 80s Playlist 🎶 | |
10 | Lost in New York 3: Taxicab distance 🏙️ | |
11 | Santa’s Sleigh Soberness Test 🍸 | |
12 | Santa’s Bag Comparison Bonanza! 🎁✨ | |
13 | Festive Jaccard Jamboree! 🎁🔍✨ | |
14 | Santa’s Embedding Retrieval System For Gift Matching 🎁🔍✨ | |
15 | Santa’s International Taxing Quest! 🌍💸 | |
16 | Santa’s Behavioral Analytics! 📊🎅 | |
17 | Santa’s Chimney Area Calculator! 🏠📏 | |
18 | Reduced to Tears 🎁🤔 | |
19 | Santa’s Face Recognition System 🔍✨ | |
20 | Margins and Mistletoe! 💰✨ | |
21 | Gift Quality Check with Chebyshev! 🎁📊 | |
22 | Celsius Conversion Quest! 🌡️🔍✨ | |
23 | Santa’s Face Recognition System – A/B Testing 🔍✨ | |
24 | Scrambling Santas 🎅🎅🎅 | |

Hands-On RAG guide for personal data with Vespa and LlamaIndex

This blog post is a hands-on RAG tutorial demonstrating how to use Vespa streaming mode for cost-efficient retrieval of personal data. You can read more about Vespa streaming search in these two blog posts:

This blog post is also available as a runnable notebook, where you can have this app up and running on
Vespa Cloud in minutes (Open In Colab).

The blog post covers:

  • Configuring Vespa and using Vespa streaming mode with PyVespa.
  • Using Vespa native built-in embedders in combination with streaming mode.
  • Ranking in Vespa, including hybrid retrieval and ranking methods, freshness (recency) features, and Vespa Rank Fusion.
  • Query federation and blending retrieved results from multiple sources/schemas.
  • Connecting LlamaIndex retrievers with a Vespa app to build generative AI pipelines.

TL;DR: Vespa streaming mode

Vespa’s streaming search solution lets you make the user a part of the document ID so that Vespa can co-locate each user’s data on a small set of nodes and on the same chunk of disk.
Streaming mode allows searching over a user’s data with low latency, without keeping any user’s data in memory or paying the cost of managing indexes.

  • There is no accuracy drop for vector search as it uses exact vector search
  • Several orders of magnitude higher write throughput (No expensive index builds to support approximate search)
  • Documents (including vector data) are 100% disk-based, significantly reducing deployment cost
  • Queries are restricted to the content of a single user ID (groupname)

Storage cost is the primary cost driver of Vespa streaming mode, since no data is kept in memory. Avoiding memory usage lowers deployment costs significantly.
For example, Vespa Cloud allows storing streaming mode data at below $0.30 per GB per month. Yes, that is per month.
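In streaming mode, the group name becomes part of each document’s ID. As a hypothetical illustration (the namespace and group values here are made up), a mail document for one user could be addressed as:

```
id:assistant:mail:g=alice@example.com:1
```

Here `assistant` is the namespace, `mail` the document type, the `g=` modifier carries the group name Vespa uses to co-locate that user’s data, and `1` is the user-provided local ID.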

Getting started with LlamaIndex and PyVespa

The focus is on using the streaming mode feature in combination with multiple Vespa schemas; in our case,
we imagine building RAG over personal mail and calendar data, allowing effortless query federation and blending
of the results from multiple data sources for a given user.

First, we must install dependencies:

! pip3 install pyvespa llama-index

Synthetic Mail & Calendar Data

There are few public email datasets because people care about their privacy, so this notebook uses synthetic data to examine how to use Vespa streaming mode.
We create two generator functions that return Python dicts with synthetic mail and calendar data.

Notice that each dict has three keys: id, groupname, and fields.

This is the expected feed format for PyVespa feed operations, which
PyVespa uses to build Vespa document v1 API requests.
The groupname key is only relevant with streaming mode.


from typing import Iterator

def synthetic_mail_data_generator() -> Iterator[dict]:
    synthetic_mails = [
        {
            "id": 1,
            "groupname": "[email protected]",
            "fields": {
                "subject": "LlamaIndex news, 2023-11-14",
                "to": "[email protected]",
                "body": """Hello Llama Friends 🦙 LlamaIndex is 1 year old this week! 🎉 To celebrate, we're taking a stroll down memory
                    lane on our blog with twelve milestones from our first year. Be sure to check it out.""",
                "from": "[email protected]",
                "display_date": "2023-11-15T09:00:00Z"
            }
        },
        {
            "id": 2,
            "groupname": "[email protected]",
            "fields": {
                "subject": "Dentist Appointment Reminder",
                "to": "[email protected]",
                "body": "Dear Jo Kristian,\nThis is a reminder for your upcoming dentist appointment on 2023-12-04 at 09:30. Please arrive 15 minutes early.\nBest regards,\nDr. Dentist",
                "from": "[email protected]",
                "display_date": "2023-11-15T15:30:00Z"
            }
        },
        {
            "id": 1,
            "groupname": "[email protected]",
            "fields": {
                "subject": "Wildlife Update: Giraffe Edition",
                "to": "[email protected]",
                "body": "Dear Wildlife Enthusiasts 🦒, We're thrilled to share the latest insights into giraffe behavior in the wild. Join us on an adventure as we explore their natural habitat and learn more about these majestic creatures.",
                "from": "[email protected]",
                "display_date": "2023-11-12T14:30:00Z"
            }
        },
        {
            "id": 1,
            "groupname": "[email protected]",
            "fields": {
                "subject": "Antarctica Expedition: Penguin Chronicles",
                "to": "[email protected]",
                "body": "Greetings Explorers 🐧, Our team is embarking on an exciting expedition to Antarctica to study penguin colonies. Stay tuned for live updates and behind-the-scenes footage as we dive into the world of these fascinating birds.",
                "from": "[email protected]",
                "display_date": "2023-11-11T11:45:00Z"
            }
        },
        {
            "id": 1,
            "groupname": "[email protected]",
            "fields": {
                "subject": "Space Exploration News: November Edition",
                "to": "[email protected]",
                "body": "Hello Space Enthusiasts 🚀, Join us as we highlight the latest discoveries and breakthroughs in space exploration. From distant galaxies to new technologies, there's a lot to explore!",
                "from": "[email protected]",
                "display_date": "2023-11-01T16:20:00Z"
            }
        },
        {
            "id": 1,
            "groupname": "[email protected]",
            "fields": {
                "subject": "Ocean Discovery: Hidden Treasures Unveiled",
                "to": "[email protected]",
                "body": "Dear Ocean Explorers 🌊, Dive deep into the secrets of the ocean with our latest discoveries. From undiscovered species to underwater landscapes, our team is uncovering the wonders of the deep blue.",
                "from": "[email protected]",
                "display_date": "2023-10-01T10:15:00Z"
            }
        }
    ]
    for mail in synthetic_mails:
        yield mail


Similarly, for the calendar data:

from typing import Iterator

def synthetic_calendar_data_generator() -> Iterator[dict]:
    calendar_data = [
        {
            "id": 1,
            "groupname": "[email protected]",
            "fields": {
                "subject": "Dentist Appointment",
                "to": "[email protected]",
                "body": "Dentist appointment at 2023-12-04 at 09:30 - 1 hour duration",
                "from": "[email protected]",
                "display_date": "2023-11-15T15:30:00Z",
                "duration": 60,
            }
        },
        {
            "id": 2,
            "groupname": "[email protected]",
            "fields": {
                "subject": "Public Cloud Platform Events",
                "to": "[email protected]",
                "body": "The cloud team continues to push new features and improvements to the platform. Join us for a live demo of the latest updates",
                "from": "public-cloud-platform-events",
                "display_date": "2023-11-21T09:30:00Z",
                "duration": 60,
            }
        }
    ]
    for event in calendar_data:
        yield event

Defining a Vespa application

PyVespa helps us build the Vespa application package.
A Vespa application package comprises configuration files, code (plugins), and models.

We define two Vespa schemas for our mail and calendar data. PyVespa
offers a programmatic API for creating the schema. Ultimately, the programmatic representation is serialized to files (<schema-name>.sd).

In the following, we define the fields and their types. Note that we set mode to streaming,
which enables Vespa streaming mode for this schema.
Other valid modes are indexed and store-only.

mail schema

from vespa.package import Schema, Document, Field, FieldSet, HNSW

mail_schema = Schema(
    name="mail",
    mode="streaming",
    document=Document(
        fields=[
            Field(name="id", type="string", indexing=["summary", "index"]),
            Field(name="subject", type="string", indexing=["summary", "index"]),
            Field(name="to", type="string", indexing=["summary", "index"]),
            Field(name="from", type="string", indexing=["summary", "index"]),
            Field(name="body", type="string", indexing=["summary", "index"]),
            Field(name="display_date", type="string", indexing=["summary"]),
            Field(name="timestamp", type="long",
                  indexing=["input display_date", "to_epoch_second", "summary", "attribute"],
                  is_document_field=False),
            Field(name="embedding", type="tensor<bfloat16>(x[384])",
                  indexing=["\"passage: \" . input subject .\" \". input body", "embed e5", "attribute", "index"],
                  ann=HNSW(distance_metric="angular"),
                  is_document_field=False)
        ]
    ),
    fieldsets=[FieldSet(name="default", fields=["subject", "body", "to", "from"])]
)

In the mail schema, we have six document fields; these are provided by us when we feed documents of type mail to this app.
The fieldset defines
which fields are matched against when a query does not mention explicit field names. We can add as many fieldsets as we like without duplicating content.
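For illustration, a hypothetical YQL query against the mail schema could look like the following; the userQuery() terms are then matched against subject, body, to, and from via the default fieldset, without naming any field:

```
select * from mail where userQuery()
```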

In addition to the fields within the document, there are two synthetic fields in the schema, timestamp, and embedding,
using Vespa indexing expressions
taking inputs from the document and performing conversions.

  • the timestamp field takes the input display_date and uses the to_epoch_second converter to turn the
    display date into an epoch timestamp. This is useful because we can then calculate the document’s age and use the freshness(timestamp) rank feature during ranking phases.
  • the embedding tensor field takes the subject and body as input. It feeds that into an embed function that uses an embedding model to map the string input into an embedding vector representation
    using 384-dimensions with bfloat16 precision. Vectors in Vespa are represented as Tensors.
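As a rough plain-Python illustration (not Vespa’s implementation; Vespa’s actual freshness feature and its maximum age are configurable), a freshness-style feature is close to 1.0 for brand-new documents and decays toward 0 as the document’s age approaches the maximum age:

```python
# Illustrative sketch of a freshness-style rank feature; the 90-day
# max_age default here is an assumption for the example, not Vespa's.
def freshness(age_seconds: float, max_age_seconds: float = 90 * 24 * 3600) -> float:
    return max(0.0, 1.0 - age_seconds / max_age_seconds)

print(freshness(0))               # 1.0 for a document written right now
print(freshness(45 * 24 * 3600))  # 0.5 halfway through the max age
```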

calendar schema

from vespa.package import Schema, Document, Field

calendar_schema = Schema(
    name="calendar",
    inherits="mail",
    mode="streaming",
    document=Document(
        inherits="mail",
        fields=[
            Field(name="duration", type="int", indexing=["summary", "index"]),
            Field(name="guests", type="array<string>", indexing=["summary", "index"]),
            Field(name="location", type="string", indexing=["summary", "index"]),
            Field(name="url", type="string", indexing=["summary", "index"]),
            Field(name="address", type="string", indexing=["summary", "index"])
        ]
    )
)

The calendar schema inherits from the mail schema, meaning we don’t have to define the embedding field for the
calendar schema.

Configuring embedders

The observant reader might have noticed the e5 argument to the embed expression in the above mail schema embedding field.
The e5 argument references a component of the type hugging-face-embedder. In this
example, we use the e5-small-v2 text embedding model that maps text to 384-dimensional vectors.

from vespa.package import ApplicationPackage, Component, Parameter

vespa_app_name = "assistant"
vespa_application_package = ApplicationPackage(
    name=vespa_app_name,
    schema=[mail_schema, calendar_schema],
    components=[
        Component(id="e5", type="hugging-face-embedder",
                  parameters=[
                      Parameter("transformer-model", {"url": "https://github.com/vespa-engine/sample-apps/raw/master/simple-semantic-search/model/e5-small-v2-int8.onnx"}),
                      Parameter("tokenizer-model", {"url": "https://raw.githubusercontent.com/vespa-engine/sample-apps/master/simple-semantic-search/model/tokenizer.json"})
                  ])
    ]
)

We share and reuse the same embedding model for both schemas. Note that embedding inference is resource-intensive.


In the last step of configuring the Vespa app, we add ranking support by adding rank-profiles to the schemas. Vespa supports phased ranking and has a rich set of built-in rank features.

One can also define custom functions with ranking expressions.

from vespa.package import RankProfile, Function, GlobalPhaseRanking, FirstPhaseRanking

keywords_and_freshness = RankProfile(
    name="keywords_and_freshness",
    functions=[Function(name="my_function",
                        expression="nativeRank(subject) + nativeRank(body) + freshness(timestamp)")],
    first_phase=FirstPhaseRanking(expression="my_function"),
    match_features=["nativeRank(subject)", "nativeRank(body)", "my_function", "freshness(timestamp)"],
)

semantic = RankProfile(
    name="semantic",
    functions=[Function(name="cosine", expression="max(0,cos(distance(field, embedding)))")],
    inputs=[("query(q)", "tensor<float>(x[384])"), ("query(threshold)", "double", "0.75")],
    first_phase=FirstPhaseRanking(expression="if(cosine > query(threshold), cosine, -1)"),
    match_features=["cosine", "freshness(timestamp)", "distance(field, embedding)", "query(threshold)"],
)

fusion = RankProfile(
    name="fusion",
    inherits="semantic",
    functions=[Function(name="keywords_and_freshness",
                        expression="nativeRank(subject) + nativeRank(body) + freshness(timestamp)"),
               Function(name="semantic", expression="cos(distance(field,embedding))")],
    inputs=[("query(q)", "tensor<float>(x[384])"), ("query(threshold)", "double", "0.75")],
    first_phase=FirstPhaseRanking(expression="if(cosine > query(threshold), cosine, -1)"),
    # fuse keyword and semantic rankings with reciprocal rank fusion in the global phase
    global_phase=GlobalPhaseRanking(rerank_count=1000,
                                    expression="reciprocal_rank_fusion(semantic, keywords_and_freshness)"),
)

for schema in [mail_schema, calendar_schema]:
    for profile in [keywords_and_freshness, semantic, fusion]:
        schema.add_rank_profile(profile)
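To illustrate the idea behind global-phase rank fusion, here is a small plain-Python sketch of reciprocal rank fusion (RRF); this is illustrative only, since Vespa evaluates it natively at query time:

```python
# Reciprocal rank fusion: each ranked list contributes 1 / (k + rank) per
# document, and the fused score is the sum of contributions across lists.
# k = 60 is a commonly used smoothing constant.
def rrf(rankings: list, k: int = 60) -> dict:
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores

fused = rrf([["d1", "d2", "d3"], ["d2", "d1"]])
# d1 and d2 each occupy ranks 1 and 2 across the two lists, so they tie,
# and both outscore d3, which appears only once.
```

The appeal of RRF is that it blends rankings whose raw scores live on different scales (keyword scores vs. cosine similarities) without any score normalization.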

Changes in OS support for Vespa

Photo by Claudio Schwarz on Unsplash

Currently, we support CentOS Stream 8 for open-source Vespa, as announced in the
Upcoming changes in OS support for Vespa blog post in 2022.
The choice of CentOS Stream was made around the time Red Hat announced the EOL for CentOS 8 and the
new CentOS Stream initiative.
Other groups were scrambling to be the successor to CentOS, and the landscape was not settled. This is now about to change.


We are committed to providing Vespa releases that we have high confidence in. Internally, at Vespa.ai,
we have migrated to AlmaLinux 8 and
Red Hat Enterprise Linux 8. CentOS Stream 8 is
also approaching its EOL, which is May 31st, 2024. Because of this, we want to change the supported OS for open-source Vespa.

Vespa is released up to 4 times a week depending on internal testing and battle-proven verification in our production
systems. Each high-confidence version is published as RPMs and a container image.
RPMs are built and published on Copr, and container images are
published on Docker Hub. In this blog post, we will look at options going
forward and announce the selected OS support for the upcoming Vespa releases.


There is a wide selection of Linux distributions out there with different purposes and target groups. We need to choose
an OS that is as close as possible to what we use internally and that is acceptable to our open-source users. These
criteria limit the options significantly, to an enterprise-Linux-based distribution.


A binary release of Vespa is a set of RPMs. This set has to be built somewhere and uploaded to a
repository where package managers can download it. These RPMs are then installed either on a host machine or in
container images. We will still build our RPMs on Copr, but we
have a choice there to compile for different environments that are either downstream or upstream of RHEL. In the time
between the EOL of CentOS 8 (end of 2021) and now, Copr has added support for building for
EPEL 8 and 9. This means that we can build for EPEL 8 and install the result on
RHEL 8 and its downstream clones.

Distribution of RPMs is currently done on Copr as the built RPMs are directly put in the repository there. The repositories
have limited capacity, and Copr only guarantees that the latest version is available. It would be nice to have an archive
of more than just the most recent version, but this will rely on vendors offering space and network traffic for the RPMs
to be hosted.

Container images

Given the choice of building RPMs on Copr for EPEL 8, this opens up a few options when selecting a base image for our
container image:

We should be able to select any of the above due to the RHEL ABI compatibility
and the distributions’ respective guarantees to be binary compatible with RHEL.

Red Hat has also announced a free version of RHEL
for developers and small workloads. However, it is not hassle-free, as it requires registration at Red Hat
to be able to use this version. We believe that this will not be well received by the consumers of Vespa.

Container Image

Considering the options, we have chosen AlmaLinux 8 as the supported OS going forward. The main reasons for this decision are:
  • AlmaLinux is used in the Vespa Cloud production systems
  • The OS is available to anyone free of charge
  • We will still be able to leverage the Copr build system for open-source Vespa artifacts

We will use the docker.io/almalinux:8 image as the base for the Vespa container image on Docker Hub.

RPM distribution

Although we will continue to build RPMs on Copr, we are going to switch to a new RPM repository that can keep an archive
of a limited set of historic releases. We have been accepted as an open-source project at
Cloudsmith and will use the
vespa/open-source-rpms repository to distribute our RPMs.
Cloudsmith generously allows qualified open-source projects to store 50 GB and have 200 GB of network traffic. The
vespa-engine.repo repository definition
will be updated shortly, and information about how to install Vespa from RPMs can be found in the
documentation. Within our storage limits, we will be able to store
approximately 50 Vespa releases.

Compatibility for current Vespa installations

The consumers of Vespa container images should not notice any differences when Vespa changes the base of the container
image to AlmaLinux 8. Everything comes preinstalled in the image, and this is tested the same way as it was before. If
you use the Vespa container image as a base of custom images, the differences between the latest AlmaLinux 8 and CentOS
Stream 8 are minimal. We do not expect any changes to be required.

For consumers that install Vespa RPMs in their container images or install directly on host instances, we will continue
to build and publish RPMs for CentOS Stream 8 on Copr until Dec 31st, 2023. RPMs built on EPEL 8 will be forward compatible
with CentOS Stream 8 due to the RHEL ABI compatibility. This
means that you can make the switch by replacing the repository configuration with the one defined in
vespa-engine.repo the next time Vespa
is upgraded. If you do not, no new Vespa versions will be available for upgrade once we stop building for CentOS Stream 8.

Future OS support

Predicting the path of future OS support is not trivial in an environment where the landscape is changing. Red Hat
announced that it is closing off the RHEL sources and strengthening
CentOS Stream. The Open Enterprise Linux Association has emerged in response, and AlmaLinux
commits to binary compatibility with RHEL. We expect the landscape to keep changing, and
hopefully, we will have more clarity when deciding on the next OS to support.

Regardless of the landscape, we are periodically publishing a preview on AlmaLinux 9 that can be used at your own risk
here. Please use this for testing purposes only.


We have selected AlmaLinux 8 as the supported OS for Vespa going forward. The change is expected to have no impact on the
consumers of Vespa container images and RPMs. The primary RPM repository has moved to a Cloudsmith-hosted
repository where we can have an archive of releases allowing installation of not just the latest Vespa version.

Anonymized endpoints and token authentication in Vespa Cloud

Morten Tokle

Principal Software Systems Engineer

Martin Polden

Principal Vespa Engineer

When you deploy a Vespa application on Vespa Cloud your application is assigned an endpoint for each container cluster declared in your application package. This is the endpoint you communicate with when you query or feed documents to your application.

Since the launch of Vespa Cloud these endpoints have included many dimensions identifying your exact cluster, on the following format {service}.{instance}.{application}.{tenant}.{zone}.z.vespa-app.cloud. This format allows easy identification of a given endpoint.

However, while this format makes it easy to identify where an endpoint points, it also reveals details of the application that you might want to keep confidential. This is why we are introducing anonymized endpoints in Vespa Cloud. Anonymized endpoints are created on-demand when you deploy your application and have the format {generated-id}.{generated-id}.{scope}.vespa-app.cloud. As with existing endpoints, details of anonymized endpoints and where they point are shown in the Vespa Cloud Console for your application.

Anonymized endpoints are now the default for all new applications in Vespa Cloud. They have also been enabled for existing applications, but with backward compatibility: endpoints on the old format continue to work for now but are marked as deprecated in the Vespa Cloud console. We will continue to support the previous format for existing applications, but we encourage using the new endpoints.

In addition to making your endpoint details confidential, this new endpoint format allows Vespa Cloud to optimize certificate issuing. It allows for much faster deployments of new applications as they no longer have to wait for a new certificate to be published.

No action is needed to enable this feature. You can find the new anonymized endpoints in the Vespa Console for your application or by running the Vespa CLI command vespa status.

In addition to anonymized endpoints, we are introducing support for data plane authentication using access tokens. Token authentication is intended for cases where mTLS authentication is unavailable or impractical. For example, edge runtimes like the Vercel edge runtime are built on the V8 JavaScript engine, which does not support mTLS authentication. Access tokens are created and defined in the Vespa Cloud console and referenced in the application package. See instructions for creating and referencing tokens in the application package in the security guide.

Note it’s still required to define a data plane certificate for mTLS authentication; mTLS is still the preferred authentication method for data plane access, and applications configuring token-based authentication will have two distinct endpoints.


Application endpoints in Vespa Console – Deprecated legacy mTLS endpoint name and two anonymized endpoints, one with mTLS support and the other with token authentication. Using token-based authentication on the mTLS endpoint is not supported.

Using data plane authentication tokens

Using the token endpoint from the above screenshot, https://ed82e42a.eeafe078.z.vespa-app.cloud/, we can authenticate against it by
adding a standard Authorization HTTP header to the data plane requests. For example as demonstrated below using curl:

curl -H "Authorization: Bearer vespa_cloud_...." https://ed82e42a.eeafe078.z.vespa-app.cloud/


Using the latest release of pyvespa, you can interact with token endpoints by setting an environment variable named VESPA_CLOUD_SECRET_TOKEN. If this environment variable is present, pyvespa will read this and use it when interacting with the token endpoint.

import os
os.environ['VESPA_CLOUD_SECRET_TOKEN'] = "vespa_cloud_...."
from vespa.application import Vespa
vespa = Vespa(url="https://ed82e42a.eeafe078.z.vespa-app.cloud")

In this case, pyvespa will read the environment variable VESPA_CLOUD_SECRET_TOKEN and use that when interacting with the data plane endpoint of your application. There are no changes concerning control-plane authentication, which requires a valid developer/API key.

We do not plan to add token-based authentication to other Vespa data plane tools like vespa-cli or vespa-feed-client, as these are not designed for lightweight edge runtimes.

Edge Runtimes

This is a minimalistic example of using a Cloudflare Worker, where we have stored the secret Vespa Cloud token using the Cloudflare Workers functionality for storing secrets. Note that Cloudflare Workers also support mTLS.

export default {
    async fetch(request, env, ctx) {
        const secret_token = env.vespa_cloud_secret_key;
        return fetch('https://ed82e42a.eeafe078.z.vespa-app.cloud/',
                     {headers: {'Authorization': `Bearer ${secret_token}`}});
    }
};

Consult your preferred edge runtime provider documentation on how to store and access secrets.

Security recommendations

It may be easier to use a token for authentication, but we still recommend using mTLS wherever possible. Before using a token for your application, consider the following recommendations.

Token expiration

While the cryptographic properties of tokens are comparable to certificates, it is recommended that tokens have a shorter expiration. Tokens are part of the request headers and not used to set up the connection. This means they are more likely to be included in e.g. log outputs. The default token expiration in Vespa Cloud is 30 days, but it is possible to create tokens with shorter expiration.

Token secret storage

The token value should be treated as a secret, and never be included in source code. Make sure to use a secure way of accessing the tokens and in such a way that they are not exposed in any log output.


Keeping your data safe is a number one priority for us. With these changes, we continue to improve the developer friendliness of Vespa Cloud while maintaining the highest level of security. With anonymized endpoints, we improve deployment time for new applications by several minutes, avoiding waiting for certificate issuing. Furthermore, anonymized endpoints eliminate disclosing tenant and application details in certificates and DNS entries.

Open Sourcing Vespa, Yahoo’s Big Data Processing and Serving Engine

By Jon Bratseth, Distinguished Architect, Vespa

Ever since we open sourced Hadoop in 2006, Yahoo – and now, Oath – has been committed to opening up its big data infrastructure to the larger developer community. Today, we are taking another major step in this direction by making Vespa, Yahoo’s big data processing and serving engine, available as open source on GitHub.

Building applications increasingly means dealing with huge amounts of data. While developers can use the Hadoop stack to store and batch process big data, and Storm to stream-process data, these technologies do not help with serving results to end users. Serving is challenging at large scale, especially when it is necessary to make computations quickly over data while a user is waiting, as with applications that feature search, recommendation, and personalization.

By releasing Vespa, we are making it easy for anyone to build applications that can compute responses to user requests, over large datasets, at real time and at internet scale – capabilities that up until now, have been within reach of only a few large companies.

Serving often involves more than looking up items by ID or computing a few numbers from a model. Many applications need to compute over large datasets at serving time. Two well-known examples are search and recommendation. To deliver a search result or a list of recommended articles to a user, you need to find all the items matching the query, determine how good each item is for the particular request using a relevance/recommendation model, organize the matches to remove duplicates, add navigation aids, and then return a response to the user. As these computations depend on features of the request, such as the user’s query or interests, it won’t do to compute the result upfront. It must be done at serving time, and since a user is waiting, it has to be done fast. Combining speedy completion of the aforementioned operations with the ability to perform them over large amounts of data requires a lot of infrastructure – distributed algorithms, data distribution and management, efficient data structures and memory management, and more. This is what Vespa provides in a neatly-packaged and easy to use engine.

With over 1 billion users, we currently use Vespa across many different Oath brands – including Yahoo.com, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Gemini, Flickr, and others – to process and serve billions of daily requests over billions of documents while responding to search queries, making recommendations, and providing personalized content and advertisements, to name just a few use cases. In fact, Vespa processes and serves content and ads almost 90,000 times every second with latencies in the tens of milliseconds. On Flickr alone, Vespa performs keyword and image searches on the scale of a few hundred queries per second on tens of billions of images. Additionally, Vespa makes direct contributions to our company’s revenue stream by serving over 3 billion native ad requests per day via Yahoo Gemini, at a peak of 140k requests per second (per Oath internal data).

With Vespa, our teams build applications that:

  • Select content items using SQL-like queries and text search
  • Organize all matches to generate data-driven pages
  • Rank matches by handwritten or machine-learned relevance models
  • Serve results with response times in the low milliseconds
  • Write data in real-time, thousands of times per second per node
  • Grow, shrink, and re-configure clusters while serving and writing data

To achieve both speed and scale, Vespa distributes data and computation over many machines without any single master as a bottleneck. Where conventional applications work by pulling data into a stateless tier for processing, Vespa instead pushes computations to the data. This involves managing clusters of nodes with background redistribution of data in case of machine failures or the addition of new capacity, implementing distributed low latency query and processing algorithms, handling distributed data consistency, and a lot more. It’s a ton of hard work!

As the team behind Vespa, we have been working on developing search and serving capabilities ever since building alltheweb.com, which was later acquired by Yahoo. Over the last couple of years we have rewritten most of the engine from scratch to incorporate our experience into a modern technology stack. Vespa is larger in scope and lines of code than any open source project we’ve ever released. Now that this has been battle-proven on Yahoo’s largest and most critical systems, we are pleased to release it to the world.

Vespa gives application developers the ability to feed data and models of any size to the serving system and make the final computations at request time. This often produces a better user experience at lower cost (for buying and running hardware) and complexity compared to pre-computing answers to requests. Furthermore, it allows developers to work in a more interactive way, where they navigate and interact with complex calculations in real time rather than having to start offline jobs and check the results later.

Vespa can be run on premises or in the cloud. We provide both Docker images and rpm packages for Vespa, as well as guides for running them both on your own laptop or as an AWS cluster.

We’ll follow up this initial announcement with a series of posts on our blog showing how to build a real-world application with Vespa, but you can get started right now by following the getting started guide in our comprehensive documentation.

Managing distributed systems is not easy. We have worked hard to make it easy to develop and operate applications on Vespa so that you can focus on creating features that make use of the ability to compute over large datasets in real time, rather than the details of managing clusters and data. You should be able to get an application up and running in less than ten minutes by following the documentation.

We can’t wait to see what you’ll build with it!

The basics of Vespa applications

Distributed computation over large data sets in real-time — what we call big data serving — is a complex task. We have worked hard to hide this complexity to make it as easy as possible to create your own production quality Vespa application.
The quick-start guides
take you through the steps of getting Vespa up and running, deploying a basic application, writing data and issuing some queries to it, but without room for explanation.
Here, we’ll explain the basics of creating your own Vespa application.
The blog search and recommendation tutorial
covers these topics in full detail with hands-on instructions.
Update 2021-05-20: Blog tutorials are replaced by the
News search and recommendation tutorial:

Application packages

The configuration, components and models which make up an application run by Vespa are contained in an application package. The application package:

  • Defines which clusters and services should run and how they should be configured
  • Contains the document types the application will use
  • Contains the ranking models to execute
  • Configures how data will be processed during feeding and indexing
  • Configures how queries will be pre- and post-processed

The three mandatory parts of the application specification are the search definition, the services specification, and the hosts specification — all of which have their own file in the application package.
This is enough to set up a basic, production-ready Vespa application, such as the
sample application.
Most applications, however, are much larger and may contain machine-learned ranking models and application-specific Java components which perform tasks such as query enrichment and post-search processing.

The schema definition

Data stored in Vespa is represented as a set of documents of a type defined in the application package. An application can have multiple document types. Each search definition describes one such document type: it lists the name and data type of each field found in the document and configures the behaviour of these fields, for example whether field values are kept in memory or stored on disk, and whether they should be indexed. It can also contain ranking profiles, which are used to select the most relevant documents among the set of matches for a given query, and it specifies which fields to return.

The services definition

A Vespa application consists of a set of services, such as stateless query and document processing containers and stateful content clusters. Which services to run, where to run those services and the configuration of those services are all set up in services.xml. This includes the search endpoint(s), the document feeding API, the content cluster, and how documents are stored and searched.

The hosts definition

The deployment specification hosts.xml contains a list of all hosts that are part of the application, with an alias for each of them. The aliases are used in services.xml to define which services are to be started on which nodes.

Deploying applications

After the application package has been constructed, it is deployed using vespa-deploy. This uploads the package to the configuration cluster and pushes the configuration to all nodes. The Vespa cluster is then configured and ready for use.

One of the nice features is that new configurations are loaded without service disruption. When a new application package is deployed, the configuration system pushes the new generation to all nodes defined in the application, which consume and apply the new configuration without restarting their services. In the rare cases that do require a restart, the vespa-deploy command will notify you.

Writing data to Vespa

One of the required files when setting up a Vespa application is the search definition. This file (or files) contains a document definition which defines the fields and their data types for each document type. Data is written to Vespa using Vespa’s JSON document format. The data in this format must match the search definition for the document type.

The process of writing data to Vespa is called feeding, and there are multiple tools that can be used to feed data to Vespa for various use cases. For instance there is a REST API for smaller updates and a Java client that can be embedded into other applications.
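To illustrate, here is a minimal sketch of what a feed operation can look like when using the REST API. The namespace, document type, and field names below are hypothetical; consult the document API documentation for the exact format:

```python
import json

def make_put_operation(namespace, doctype, docid, fields):
    """Build a Vespa JSON 'put' operation and the matching
    /document/v1 REST path for it (names here are illustrative)."""
    path = f"/document/v1/{namespace}/{doctype}/docid/{docid}"
    operation = {
        "put": f"id:{namespace}:{doctype}::{docid}",
        "fields": fields,
    }
    return path, json.dumps(operation)

# Hypothetical document matching a 'music' document type:
path, body = make_put_operation("mynamespace", "music", "1",
                                {"title": "Stargazing", "artist": "Kygo"})
```

The JSON body would then be sent as an HTTP PUT or POST to that path on the container endpoint.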

An important concept in writing data to Vespa is that of document processors. These processors can be chained together to form a processing pipeline to process each document before indexing. This is useful for many use cases, including enrichment by pulling in relevant data from other sources.

Querying Vespa

If you know the id of the document you want, you can fetch it directly using the document API. However, with Vespa you are usually more interested in searching for relevant documents given some query.

Basic querying in Vespa is done through YQL, an SQL-like language. An example is:

select title,isbn from music where artist contains "kygo"

Here we select the fields “title” and “isbn” from the document type “music” where the field “artist” contains the string “kygo”. Wildcards (*) are supported in both the field list and the document type, returning all available fields across all defined document types.
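Such a query can be sent to the search endpoint as an HTTP GET parameter. A small sketch, assuming the default endpoint and port used throughout this post:

```python
from urllib.parse import urlencode

yql = 'select title,isbn from music where artist contains "kygo"'
# URL-encode the YQL and attach it to the default search endpoint:
query_url = "http://localhost:8080/search/?" + urlencode({"yql": yql})
```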

Such queries are sent to Vespa over HTTP, but many applications choose to build the queries in Java components running inside Vespa instead. Such components are called searchers, and can be used to build or modify queries, run multiple queries for each incoming request, and filter and modify results. Similar to document processor chains, you can set up chains of searchers. Vespa contains a set of default searchers which perform common operations such as stemming and federation to multiple content clusters.

Ranking models

Ranking executes a ranking expression specified in the search definition on all the documents matching a query. The documents with the highest rank scores are returned.

A ranking expression is a mathematical function over features (named values).

Features are either sent with the query, attributes of the document, constants in the application package or features computed by Vespa from both the query and document – example:

rank-profile popularity inherits default {
    first-phase {
        expression: 0.7 * nativeRank(title, description) + 0.3 * attribute(popularity)
    }
}

Here, each document is ranked by the nativeRank function but boosted by a popularity score. This score can be updated at regular intervals, for instance from user feedback, using partial document updates from some external system such as a Hadoop cluster.
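As a sanity check, the first-phase expression above is just a weighted sum and can be evaluated outside Vespa:

```python
def first_phase_score(native_rank, popularity):
    # Mirrors the rank expression above: text relevance blended
    # with a popularity attribute (weights 0.7 and 0.3).
    return 0.7 * native_rank + 0.3 * popularity

# Illustrative feature values; Vespa computes nativeRank itself:
score = first_phase_score(0.5, 1.0)
```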

In real applications ranking expressions often get much more complicated than this.

For example, a recommendation application may use a deep neural net to compute a recommendation score, or a search application may use a machine-learned gradient boosted decision tree. To support such complex models, Vespa allows ranking expressions to compute over tensors in addition to scalars. This makes it possible to work effectively with large models and parameter spaces.

As complex ranking models can be expensive to compute over many documents, it is often a good idea to use a cheaper function to find good candidates and then rank only those using the full model. To do this you can configure both a first-phase and second-phase ranking expression, where the second-phase function is only computed on the best candidate documents.
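A small sketch of the idea (not Vespa code; Vespa evaluates the configured expressions internally):

```python
def phased_rank(docs, cheap, expensive, rerank_count=100):
    """Score all docs with a cheap first-phase function, then re-rank
    only the best candidates with an expensive second-phase function."""
    candidates = sorted(docs, key=cheap, reverse=True)[:rerank_count]
    return sorted(candidates, key=expensive, reverse=True)

# Synthetic documents with a cheap feature and an expensive model score:
docs = [{"id": i, "term_hits": i % 5, "model_score": (i * 7) % 11}
        for i in range(1000)]
top = phased_rank(docs,
                  cheap=lambda d: d["term_hits"],
                  expensive=lambda d: d["model_score"],
                  rerank_count=100)
```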

Grouping and aggregation

In addition to returning the set of results ordered by a relevance score, Vespa can group and aggregate data over all the documents selected by a query. Common use cases include:

  • Group documents by unique value of some field.
  • Group documents by time and date, for instance sort bug tickets by date of creation into the buckets Today, Past Week, Past Month, Past Year, and Everything else.
  • Calculate the minimum/maximum/average value for a given field.

Groups can be nested arbitrarily and multiple groupings and aggregations can be executed in the same query.
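To illustrate what grouping and aggregation compute, here is a client-side sketch of the same operations (in practice Vespa performs this server-side over the distributed index):

```python
from collections import defaultdict

def group_and_aggregate(hits, group_field, value_field):
    """Group hits by one field and compute count/min/max/average of
    another field per group, mimicking Vespa's grouping language."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit[group_field]].append(hit[value_field])
    return {key: {"count": len(vals), "min": min(vals),
                  "max": max(vals), "avg": sum(vals) / len(vals)}
            for key, vals in groups.items()}

# Hypothetical hits with a category and a document length:
hits = [{"category": "sports", "length": 100},
        {"category": "sports", "length": 300},
        {"category": "news", "length": 200}]
summary = group_and_aggregate(hits, "category", "length")
```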

More information

You should now have a basic understanding of the core concepts in building Vespa applications.
To try out these core features in practice, head on over to the
blog search and recommendation tutorial.
Update 2021-05-20: Blog tutorials are replaced by the
News search and recommendation tutorial.

We’ll post some more in-depth blog posts with concrete examples soon.

Blog search application in Vespa

Update 2021-05-20:
This blog post refers to Vespa sample applications that do not exist anymore.
Please refer to the
News search and recommendation tutorial
for an updated version of text and sample applications.


This is the first of a series of blog posts where data from WordPress.com (WP) is used to highlight how Vespa can be used to store, search and recommend blog posts. The data was made available during a Kaggle challenge to predict which blog posts someone would like based on their past behavior. It contains many ingredients that are necessary to showcase needs, challenges and possible solutions that are useful for those interested in building and deploying such applications in production.

The end goal is to build an application where:

  • Users will be able to search and manipulate the pool of blog posts available.
  • Users will get blog post recommendations from the content pool based on their interest.

This part addresses:

  • How to describe the dataset used as well as any information connected to the data.
  • How to set up a basic blog post search engine using Vespa.

The next parts show how to extend this basic search engine application with machine learned models to create a blog recommendation engine.


The dataset contains blog posts written by WP bloggers and actions, in this case ‘likes’, performed by WP readers on blog posts they have interacted with. The dataset is publicly available at Kaggle and was released during a challenge to develop algorithms to help predict which blog posts users would most likely ‘like’ if they were exposed to them. The data includes these fields per blog post:

  • post_id – unique numerical id identifying the blog post
  • date_gmt – string representing the date of blog post creation in GMT format yyyy-mm-dd hh:mm:ss
  • author – unique numerical id identifying the author of the blog post
  • url – blog post URL
  • title – blog post title
  • blog – unique numerical id identifying the blog that the blog post belongs to
  • tags – array of strings representing the tags of the blog post
  • content – body text of the blog post, in HTML format
  • categories – array of strings representing the categories the blog post was assigned to

For the user actions:

  • post_id – unique numerical id identifying the blog post
  • uid – unique numerical id identifying the user that liked post_id
  • dt – date of the interaction in GMT format yyyy-mm-dd hh:mm:ss

Downloading raw data

For the purposes of this post, it is sufficient to use the first release of training data that consists of 5 weeks of posts as well as all the ‘like’ actions that occurred during those 5 weeks.

This first release of training data is available here – once downloaded, unzip it. The 1,196,111 line trainPosts.json will be our practice document data. This file is around 5GB in size.


Indexing the full data set requires 23GB disk space. We have tested with a Docker container with 10GB RAM. We used similar settings as described in the vespa quick start guide. As in the guide we assume that the $VESPA_SAMPLE_APPS env variable points to the directory with your local clone of the vespa sample apps:

$ docker run -m 10G --detach --name vespa --hostname vespa --privileged --volume $VESPA_SAMPLE_APPS:/vespa-sample-apps --publish 8080:8080 vespaengine/vespa

Searching blog posts

Functional specification:

  • Blog post title, content, tags and categories must all be searchable
  • Allow blog posts to be sorted by both relevance and date
  • Allow grouping of search results by tag or category

In terms of data, Vespa operates with the notion of documents. A document represents a single, searchable item in your system, e.g., a blog post, a photo, or a news article. Each document type must be defined in the Vespa configuration through a search definition. Think of a search definition as being similar to a table definition in a relational database; it consists of a set of fields, each with a given name, a specific type, and some optional properties.

As an example, for this simple blog post search application, we could create the document type blog_post with the following fields:

  • url – of type uri
  • title – of type string
  • content – of type string (string fields can be of any length)
  • date_gmt – of type string (to store the creation date in GMT format)

The data fed into Vespa must match the structure of the search definition, and the hits returned when searching will be in this format as well.

Application Packages

A Vespa application package is the set of configuration files and Java plugins that together define the behavior of a Vespa system: what functionality to use, the available document types, how ranking will be done and how data will be processed during feeding and indexing. The search definition, e.g., blog_post.sd, is a required part of an application package — the other required files are services.xml and hosts.xml.

The sample application blog search creates a simple but functional blog post search engine. The application package is found in src/main/application.

Services Specification

services.xml defines the services that make up the Vespa application — which services to run and how many nodes per service:

<?xml version='1.0' encoding='UTF-8'?>
<services version='1.0'>

  <container id='default' version='1.0'>
      <search/>
      <document-api/>
      <nodes>
          <node hostalias="node1"/>
      </nodes>
  </container>

  <content id='blog_post' version='1.0'>
      <redundancy>1</redundancy>
      <documents>
          <document mode="index" type="blog_post"/>
      </documents>
      <nodes>
          <node hostalias="node1"/>
      </nodes>
  </content>

</services>

  • <container> defines the container cluster for document, query and result processing
  • <search> sets up the search endpoint for Vespa queries. The default port is 8080.
  • <document-api> sets up the document endpoint for feeding.
  • <nodes> defines the nodes required per service. (See the reference for more on container cluster setup.)
  • <content> defines how documents are stored and searched
  • <redundancy> denotes how many copies to keep of each document.
  • <documents> assigns the document types in the search definition — the content cluster capacity can be increased by adding node elements — see elastic Vespa. (See also the reference for more on content cluster setup.)
  • <nodes> defines the hosts for the content cluster.

Deployment Specification

hosts.xml contains a list of all the hosts/nodes that are part of the application, with an alias for each of them. Here we use a single node:

<?xml version="1.0" encoding="utf-8" ?>
<hosts>
  <host name="localhost">
    <alias>node1</alias>
  </host>
</hosts>

Search Definition

The blog_post document type mentioned in src/main/application/services.xml is defined in the search definition. src/main/application/searchdefinitions/blog_post.sd contains the search definition for a document of type blog_post:

search blog_post {

    document blog_post {

        field date_gmt type string {
            indexing: summary
        }

        field language type string {
            indexing: summary
        }

        field author type string {
            indexing: summary
        }

        field url type string {
            indexing: summary
        }

        field title type string {
            indexing: summary | index
        }

        field blog type string {
            indexing: summary
        }

        field post_id type string {
            indexing: summary
        }

        field tags type array<string> {
            indexing: summary
        }

        field blogname type string {
            indexing: summary
        }

        field content type string {
            indexing: summary | index
        }

        field categories type array<string> {
            indexing: summary
        }

        field date type int {
            indexing: summary | attribute
        }

    }

    fieldset default {
        fields: title, content
    }

    rank-profile post inherits default {

        first-phase {
            expression: nativeRank(title, content)
        }

    }

}

The document block is wrapped inside another element called search. The name following both of these elements, here blog_post, must be exactly the same.

The field property indexing configures the indexing pipeline for a field, which defines how Vespa will treat input during indexing — see the indexing language. Each part of the indexing pipeline is separated by the pipe character ‘|’.

Deploy the Application Package

Once done with the application package, deploy the Vespa application — build and start Vespa as in the quick start. Deploy the application:

$ cd /vespa-sample-apps/blog-search
$ vespa-deploy prepare src/main/application && vespa-deploy activate

This prints that the application was activated successfully and also the checksum, timestamp and generation for this deployment (more on that later). Pointing a browser to http://localhost:8080/ApplicationStatus returns JSON-formatted information about the active application, including its checksum, timestamp and generation (and should be the same as the values when vespa-deploy activate was run). The generation will increase by 1 each time a new application is successfully deployed, and is the easiest way to verify that the correct version is active.

The Vespa node is now configured and ready for use.

Feeding Data

The data fed to Vespa must match the search definition for the document type. The data downloaded from Kaggle, contained in trainPosts.json, must be converted to a valid Vespa document format before it can be fed to Vespa. Find a parser in the utility repository. Since the full data set is unnecessarily large for the purposes of this first part of the post, we use only the first 10,000 lines of it, but feel free to load all 1.1M entries:

$ head -10000 trainPosts.json > trainPostsSmall.json
$ python parse.py trainPostsSmall.json > feed.json
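The parser itself lives in the utility repository; as a rough, hypothetical sketch of the kind of transformation it performs (the real parse.py may differ in details):

```python
import json

def to_vespa_feed(raw_line):
    """Turn one raw trainPosts.json record into a Vespa feed
    operation. Illustrative only; field selection and the document
    id scheme are assumptions, not the actual parse.py logic."""
    post = json.loads(raw_line)
    return {
        "put": f"id:blog-search:blog_post::{post['post_id']}",
        "fields": {
            "post_id": str(post["post_id"]),
            "title": post.get("title", ""),
            "content": post.get("content", ""),
            "tags": post.get("tags", []),
            "categories": post.get("categories", []),
        },
    }

raw = ('{"post_id": 42, "title": "On fjords", "content": "...",'
       ' "tags": ["norway"], "categories": ["travel"]}')
doc = to_vespa_feed(raw)
```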

Send this to Vespa using one of the tools Vespa provides for feeding. Here we will use the Java feeding API:

$ java -jar $VESPA_HOME/lib/jars/vespa-http-client-jar-with-dependencies.jar --verbose --file feed.json --host localhost --port 8080

Note that in the sample-apps/blog-search directory, there is a file with sample data. You may also feed this file using this method.

Track feeding progress

Use the Metrics API to track number of documents indexed:

$ curl -s 'http://localhost:19112/state/v1/metrics' | tr ',' '\n' | grep -A 2 proton.doctypes.blog_post.numdocs

You can also inspect the search node state by running:

$ vespa-proton-cmd --local getState  

Fetch documents

Fetch documents by document id using the Document API:

$ curl -s 'http://localhost:8080/document/v1/blog-search/blog_post/docid/1750271' | python -m json.tool

The first query

Searching with Vespa is done using HTTP GET requests. The only mandatory parameter is the query, passed as yql=<yql query>. More details can be found in the Search API.

Given the above search definition, where the fields title and content are part of the fieldset default, any document containing the word “music” in one or more of these two fields matches our query below:

$ curl -s 'http://localhost:8080/search/?yql=select+*+from+sources+*+where+default+contains+%22music%22%3B' | python -m json.tool

Looking at the output, please note:

  • The field documentid in the output and how it matches the value we assigned to each put operation when feeding data to Vespa.
  • Each hit has a property named relevance, which indicates how well the given document matches our query, using a pre-defined default ranking function. You have full control over ranking — more about ranking and ordering later. The hits are sorted by this value.
  • When multiple hits have the same relevance score their internal ordering is undefined. However, their internal ordering will not change unless the documents are re-indexed.
  • Add &tracelevel=9 to dump query parsing details
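To illustrate the points above, here is a trimmed-down, hand-written response in the shape of Vespa's JSON result format (the exact payload of a real response will differ):

```python
import json

# A miniature, hand-written search response; field names follow
# the default JSON result format, values are illustrative only.
response = json.loads("""
{"root": {"children": [
  {"id": "index:blog_post/0/abc", "relevance": 0.30,
   "fields": {"documentid": "id:blog-search:blog_post::10",
              "title": "A music post"}},
  {"id": "index:blog_post/0/def", "relevance": 0.25,
   "fields": {"documentid": "id:blog-search:blog_post::11",
              "title": "Another music post"}}
]}}
""")
hits = response["root"]["children"]
relevances = [hit["relevance"] for hit in hits]
```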

Other examples


Once more a search for the single term “music”, but this time against the explicit field title. This means that we only want to match documents that contain the word “music” in the field title. As expected, you will see fewer hits for this query than for the previous one.


This is a query for the two terms “music” and “festival”, combined with an AND operation; it finds documents that match both terms — but not just one of them.


This is a single-term query in the special field sddocname for the value “blog_post”. This is a common and useful Vespa trick to get the number of indexed documents for a certain document type (search definition): sddocname is a special and reserved field which is always set to the name of the document type for a given document. The documents are all of type blog_post, and will therefore automatically have the field sddocname set to that value.

This means that the query above really means “Return all documents of type blog_post”, and as such all documents in the index are returned.
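For reference, the three queries described above can be written in YQL roughly as follows (a sketch; the original example URLs are not reproduced here):

```python
from urllib.parse import urlencode

# YQL sketches of the three example queries described above:
queries = [
    'select * from sources * where title contains "music";',
    'select * from sources * where default contains "music" '
    'and default contains "festival";',
    'select * from sources * where sddocname contains "blog_post";',
]
urls = ["http://localhost:8080/search/?" + urlencode({"yql": q})
        for q in queries]
```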

Vespa Meetup in Sunnyvale

WHAT: Vespa meetup with various presentations from the Vespa team.

Several Vespa developers from Norway are in Sunnyvale, use this opportunity to learn more about the open big data serving engine Vespa and meet the team behind it.

WHEN: Monday, December 4th, 6:00pm – 8:00pm PDT

WHERE: Oath/Yahoo Sunnyvale Campus
Building E, Classroom 9 & 10
700 First Avenue, Sunnyvale, CA 94089

MANDATORY REGISTRATION: https://goo.gl/forms/7kK2vlaipgsSSSH42


6.00 pm: Welcome & Intro

6.15 pm: Vespa tips and tricks

7.00 pm: Tensors in Vespa, intro and usecases

7.45 pm: Vespa future and roadmap

7.50 pm: Q&A

This meetup is a good arena for sharing experiences, getting good tips, learning inside details of Vespa, and discussing and influencing the roadmap, and it is a great opportunity for the Vespa team to meet our users. Hope to see many of you!

Blog recommendation in Vespa

Update 2021-05-20:
This blog post refers to Vespa sample applications that do not exist anymore.
Please refer to the
News search and recommendation tutorial
for an updated version of text and sample applications.


This post builds upon the previous blog search application and extends the basic search engine to include machine learned models to help us recommend blog posts to users that arrive at our application. Assume that once a user arrives, we obtain their user identification number, denoted here by user_id, send this information to Vespa, and expect to receive a blog post recommendation list containing 100 blog posts tailored to that specific user.


Collaborative Filtering

We will start our recommendation system by implementing the collaborative filtering algorithm for implicit feedback described in (Hu et al. 2008). The data is said to be implicit because the users did not explicitly rate each blog post they have read. Instead, they have “liked” blog posts they likely enjoyed (positive feedback) but did not have the chance to “dislike” blog posts they did not enjoy (absence of negative feedback). Because of that, implicit feedback is said to be inherently noisy: the fact that a user did not “like” a blog post might have many different reasons unrelated to their feelings about that blog post.

In terms of modeling, a big difference between explicit and implicit feedback datasets is that the ratings for explicit feedback are typically unknown for the majority of user-item pairs and are treated as missing values, ignored by the training algorithm. For an implicit dataset, we assume a rating of zero when the user has not liked a blog post. To encode the fact that a value of zero can arise for different reasons, we use the concept of confidence introduced by (Hu et al. 2008), which gives positive feedback a higher weight than negative feedback.

Once we train the collaborative filtering model, we will have one vector representing a latent factor for each user and item contained in the training set. Those vectors will later be used in the Vespa ranking framework to make recommendations to a user based on the dot product between the user and document latent factors. An obvious problem with this approach is that new users and new documents will not have those latent factors available to them. This is what is called a cold start problem, and it will be addressed with content-based techniques described in future posts.
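The serving-time computation is just a dot product between the two latent factor vectors, for example:

```python
def recommendation_score(user_factors, item_factors):
    """Dot product between user and item latent factor vectors,
    as evaluated in the ranking framework at serving time."""
    return sum(u * v for u, v in zip(user_factors, item_factors))

# Toy latent factors (real vectors come from the training job):
user = [0.12, -0.05, 0.33]
liked_item = [0.10, -0.04, 0.30]
unrelated_item = [-0.20, 0.15, -0.25]
```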

Evaluation metrics

The evaluation metric used by Kaggle for this challenge was Mean Average Precision at 100 (MAP@100). However, since we do not have information about which blog posts the users did not like (we have only positive feedback), and since we cannot observe user reactions to the recommendations we make (this is an offline evaluation, unlike the A/B testing typically performed by companies running recommendation systems), we make a remark similar to the one in (Hu et al. 2008) and prefer recall-oriented measures. Following (Hu et al. 2008), we will use the expected percentile ranking.
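A sketch of the expected percentile ranking computation (lower is better; around 50% corresponds to random recommendations):

```python
def expected_percentile_ranking(test_likes, recommendations):
    """For each held-out (user, item) like, take the percentile
    position of the item in that user's ranked recommendation list
    (0% = top of the list), then average over all held-out likes."""
    total, count = 0.0, 0
    for user, item in test_likes:
        ranked = recommendations[user]
        rank = ranked.index(item)
        total += rank / (len(ranked) - 1)  # percentile in [0, 1]
        count += 1
    return 100.0 * total / count

# Toy example: one user, five ranked items, two held-out likes:
recs = {"u1": ["a", "b", "c", "d", "e"]}
score = expected_percentile_ranking([("u1", "a"), ("u1", "e")], recs)
```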

Evaluation Framework

Generate training and test sets

In order to evaluate the gains obtained by the recommendation system when we start to improve it with more accurate algorithms, we will split the dataset we have available into training and test sets. The training set will contain document (blog post) and user action (likes) pairs as well as any information available about the documents contained in the training set. There is no additional information about the users besides the blog posts they have liked. The test set will be formed by a series of documents available to be recommended and a set of users to whom we need to make recommendations. This list of test set documents constitutes the Vespa content pool, which is the set of documents stored in Vespa that are available to be served to users. The user actions will be hidden from the test set and used later to evaluate the recommendations made by Vespa.

To create an application that more closely resembles the challenges faced by companies when building their recommendation systems, we decided to construct the training and test sets in such a way that:

  • There will be blog posts that had been liked in the training set by a set of users and that had also been liked in the test set by another set of users, even though this information will be hidden in the test set. Those cases are interesting to evaluate if the exploitation (as opposed to exploration) component of the system is working well. That is, if we are able to identify high quality blog posts based on the available information during training and exploit this knowledge by recommending those high quality blog posts to another set of users that might like them as well.
  • There will be blog posts in the test set that had never been seen in the training set. Those cases are interesting in order to evaluate how the system deals with the cold-start problem. Systems that are too biased towards exploitation will fail to recommend new and unexplored blog posts, leading to a feedback loop that will cause the system to focus into a small share of the available content.

A key challenge faced by recommender system designers is how to balance the exploitation/exploration components of their system, and our training/test set split outlined above will try to replicate this challenge in our application. Notice that this split is different from the approach taken by the Kaggle competition where the blog posts available in the test set had never been seen in the training set, which removes the exploitation component of the equation.

The Spark job uses trainPosts.json and creates the folders blog-job/training_set_ids and blog-job/test_set_ids containing files with post_id and user_id pairs:

$ cd blog-recommendation; export SPARK_LOCAL_IP=""
$ spark-submit --class "com.yahoo.example.blog.BlogRecommendationApp" \
  --master local[4] ../blog-tutorial-shared/target/scala-*/blog-support*.jar \
  --task split_set --input_file ../trainPosts.json \
  --test_perc_stage1 0.05 --test_perc_stage2 0.20 --seed 123 \
  --output_path blog-job/training_and_test_indices
  • test_perc_stage1: The percentage of the blog posts that will be located only on the test set (exploration component).
  • test_perc_stage2: The percentage of the remaining (post_id, user_id) pairs that should be moved to the test set (exploitation component).
  • seed: seed value used to make the split reproducible.

Compute user and item latent factors

Use the complete training set to compute user and item latent factors. We leave the discussion of tuning and improving this model to the section about model tuning and offline evaluation. Submit the Spark job to compute the user and item latent factors:

$ spark-submit --class "com.yahoo.example.blog.BlogRecommendationApp" \
  --master local[4] ../blog-tutorial-shared/target/scala-*/blog-support*.jar \
  --task collaborative_filtering \
  --input_file blog-job/training_and_test_indices/training_set_ids \
  --rank 10 --numIterations 10 --lambda 0.01 \
  --output_path blog-job/user_item_cf
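The latent factors come from a low-rank matrix factorization of the user-item like matrix. As an illustration of the idea only (not the actual Spark implementation, which uses MLlib's collaborative filtering), here is a tiny stochastic-gradient sketch in plain Python; the function and parameter names are ours:

```python
import random

def factorize(likes, num_users, num_items, rank=10,
              iterations=10, lam=0.01, lr=0.05, seed=0):
    """Fit user and item latent factors so that the dot product
    user[u] . item[i] approximates 1.0 for each observed like (u, i).
    lam is the L2 regularization weight, as in the --lambda flag."""
    rng = random.Random(seed)
    user = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(num_users)]
    item = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(num_items)]
    for _ in range(iterations):
        for u, i in likes:
            pred = sum(a * b for a, b in zip(user[u], item[i]))
            err = 1.0 - pred  # an implicit "like" is treated as rating 1.0
            for k in range(rank):
                uk, ik = user[u][k], item[i][k]
                user[u][k] += lr * (err * ik - lam * uk)
                item[i][k] += lr * (err * uk - lam * ik)
    return user, item
```

With rank=10 this produces 10-dimensional vectors per user and per post, which is exactly the shape of the user_item_cf vectors verified below.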

Verify the vectors for the latent factors for users and posts:

$ head -1 blog-job/user_item_cf/user_features/part-00000 | python -m json.tool
{
    "user_id": 270,
    "user_item_cf": {
        "user_item_cf:0": -1.750116e-05,
        "user_item_cf:1": 9.730623e-05,
        "user_item_cf:2": 8.515047e-05,
        "user_item_cf:3": 6.9297894e-05,
        "user_item_cf:4": 7.343942e-05,
        "user_item_cf:5": -0.00017635927,
        "user_item_cf:6": 5.7642872e-05,
        "user_item_cf:7": -6.6685796e-05,
        "user_item_cf:8": 8.5506894e-05,
        "user_item_cf:9": -1.7209566e-05
    }
}
$ head -1 blog-job/user_item_cf/product_features/part-00000 | python -m json.tool
{
    "post_id": 20,
    "user_item_cf": {
        "user_item_cf:0": 0.0019320602,
        "user_item_cf:1": -0.004728486,
        "user_item_cf:2": 0.0032499845,
        "user_item_cf:3": -0.006453364,
        "user_item_cf:4": 0.0015929453,
        "user_item_cf:5": -0.00420313,
        "user_item_cf:6": 0.009350027,
        "user_item_cf:7": -0.0015649397,
        "user_item_cf:8": 0.009262732,
        "user_item_cf:9": -0.0030964287
    }
}

At this point, the vectors with latent factors can be added to posts and users.

Add vectors to search definitions using tensors

Modern machine learning applications often make use of large, multidimensional feature spaces and perform complex operations on those features, such as in large logistic regression and deep learning models. It is therefore necessary to have an expressive framework to define and evaluate ranking expressions of such complexity at scale.

Vespa comes with a Tensor framework, which unifies and generalizes scalar, vector and matrix operations, handles the sparseness inherent to most machine learning applications (in most cases, the instances evaluated by a model lack values for most of the features) and allows models to be continuously updated. Additional information about the Tensor framework can be found in the tensor user guide.

We want to have those latent factors available in a Tensor representation to be used during ranking by the Tensor framework. A tensor field user_item_cf is added to blog_post.sd to hold the blog post latent factor:

field user_item_cf type tensor(user_item_cf[10]) {
	indexing: summary | attribute
	attribute: tensor(user_item_cf[10])
}

field has_user_item_cf type byte {
	indexing: summary | attribute
	attribute: fast-search
}

A new search definition user.sd defines a document type named user to hold information for users:

search user {
    document user {
        field user_id type string {
            indexing: summary | attribute
            attribute: fast-search
        }

        field has_read_items type array<string> {
            indexing: summary | attribute
        }

        field user_item_cf type tensor(user_item_cf[10]) {
            indexing: summary | attribute
            attribute: tensor(user_item_cf[10])
        }

        field has_user_item_cf type byte {
            indexing: summary | attribute
            attribute: fast-search
        }
    }
}

  • user_id: unique identifier for the user
  • user_item_cf: tensor that will hold the user latent factor
  • has_user_item_cf: flag to indicate the user has a latent factor

Join and feed data

Build the application, then deploy it (in the Docker container):

$ vespa-deploy prepare /vespa-sample-apps/blog-recommendation/target/application && \
  vespa-deploy activate

Wait for app to activate (200 OK):

$ curl -s --head http://localhost:8080/ApplicationStatus

The code to join the latent factors in blog-job/user_item_cf into blog_post and user documents is implemented in tutorial_feed_content_and_tensor_vespa.pig. After joining in the new fields, a Vespa feed is generated and fed to Vespa directly from Pig:

$ pig -Dvespa.feed.defaultport=8080 -Dvespa.feed.random.startup.sleep.ms=0 \
  -x local \
  -f ../blog-tutorial-shared/src/main/pig/tutorial_feed_content_and_tensor_vespa.pig \
  -param VESPA_HADOOP_JAR=../vespa-hadoop*.jar \
  -param DATA_PATH=../trainPosts.json \
  -param TEST_INDICES=blog-job/training_and_test_indices/testing_set_ids \
  -param BLOG_POST_FACTORS=blog-job/user_item_cf/product_features \
  -param USER_FACTORS=blog-job/user_item_cf/user_features \
  -param ENDPOINT=localhost

A successful data join and feed will output:

Successfully read 1196111 records from: "file:///Users/kraune/github/vespa-engine/sample-apps/trainPosts.json"
Successfully read 341416 records from: "file:///Users/kraune/github/vespa-engine/sample-apps/blog-recommendation/blog-job/training_and_test_indices/testing_set_ids"
Successfully read 323727 records from: "file:///Users/kraune/github/vespa-engine/sample-apps/blog-recommendation/blog-job/user_item_cf/product_features"
Successfully read 6290 records from: "file:///Users/kraune/github/vespa-engine/sample-apps/blog-recommendation/blog-job/user_item_cf/user_features"

Successfully stored 286237 records in: "localhost"

Sample blog post and user:
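For illustration, the user shown in the Spark output above (user_id 270) could be represented as a feed document along these lines. This is our own sketch: the document id namespace is hypothetical, and the actual feed is generated by the Pig script above.

```json
{
    "put": "id:blog-recommendation:user::270",
    "fields": {
        "user_id": "270",
        "has_user_item_cf": 1,
        "user_item_cf": {
            "cells": [
                {"address": {"user_item_cf": "0"}, "value": -1.750116e-05},
                {"address": {"user_item_cf": "1"}, "value": 9.730623e-05},
                {"address": {"user_item_cf": "2"}, "value": 8.515047e-05},
                {"address": {"user_item_cf": "3"}, "value": 6.9297894e-05},
                {"address": {"user_item_cf": "4"}, "value": 7.343942e-05},
                {"address": {"user_item_cf": "5"}, "value": -0.00017635927},
                {"address": {"user_item_cf": "6"}, "value": 5.7642872e-05},
                {"address": {"user_item_cf": "7"}, "value": -6.6685796e-05},
                {"address": {"user_item_cf": "8"}, "value": 8.5506894e-05},
                {"address": {"user_item_cf": "9"}, "value": -1.7209566e-05}
            ]
        }
    }
}
```

Since user_item_cf is an indexed tensor of size 10, all ten cells must be present in the feed.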


Set up a rank function to return the best matching blog posts given some user latent factor. Rank the documents using a dot product between the user and blog post latent factors, i.e. the dot product of the query tensor and the blog post tensor (the sum of the elementwise products of the two tensors) – from blog_post.sd:

rank-profile tensor {
    first-phase {
        expression {
            sum(query(user_item_cf) * attribute(user_item_cf))
        }
    }
}

Configure the ranking framework to expect that query(user_item_cf) is a tensor, and that it is compatible with the attribute in a query profile type – see search/query-profiles/types/root.xml and search/query-profiles/default.xml:

<query-profile-type id="root" inherits="native">
    <field name="ranking.features.query(user_item_cf)" type="tensor(user_item_cf[10])" />
</query-profile-type>

<query-profile id="default" type="root" />

This configures a ranking feature named query(user_item_cf) with type tensor(user_item_cf[10]), which defines it as an indexed tensor with 10 elements. This is the same as the attribute, hence the dot product can be computed.
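The first-phase expression is just an inner product over the shared user_item_cf dimension. In plain Python terms (an illustrative sketch, not Vespa code):

```python
def first_phase_score(query_tensor, attribute_tensor):
    """sum(query(user_item_cf) * attribute(user_item_cf)):
    elementwise product of two indexed tensors of the same type,
    then a sum over the user_item_cf dimension."""
    # Both tensors are tensor(user_item_cf[10]), so the same length.
    assert len(query_tensor) == len(attribute_tensor)
    return sum(q * a for q, a in zip(query_tensor, attribute_tensor))
```

Documents whose latent factor points in the same direction as the user's latent factor get a larger dot product and therefore rank higher.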

Query Vespa with a tensor

Test recommendations by sending a tensor with latent factors: localhost:8080/search/?yql=select%20*%20from%20sources%20blog_post%20where%20has_user_item_cf%20=%201;&ranking=tensor&ranking.features.query(user_item_cf)=%7B%7Buser_item_cf%3A0%7D%3A0.1%2C%7Buser_item_cf%3A1%7D%3A0.1%2C%7Buser_item_cf%3A2%7D%3A0.1%2C%7Buser_item_cf%3A3%7D%3A0.1%2C%7Buser_item_cf%3A4%7D%3A0.1%2C%7Buser_item_cf%3A5%7D%3A0.1%2C%7Buser_item_cf%3A6%7D%3A0.1%2C%7Buser_item_cf%3A7%7D%3A0.1%2C%7Buser_item_cf%3A8%7D%3A0.1%2C%7Buser_item_cf%3A9%7D%3A0.1%7D

The query string, decomposed:

  • yql=select * from sources blog_post where has_user_item_cf = 1 – this selects all documents of type blog_post which have a latent factor tensor
  • restrict=blog_post – search only in blog_post documents
  • ranking=tensor – use the rank-profile tensor in blog_post.sd.
  • ranking.features.query(user_item_cf) – send the tensor as user_item_cf. As this tensor is defined in the query-profile-type, the ranking framework knows its type (i.e. dimensions) and is able to do a dot product with the attribute of the same type. The tensor before URL-encoding:

{{user_item_cf:0}:0.1,{user_item_cf:1}:0.1,{user_item_cf:2}:0.1,{user_item_cf:3}:0.1,{user_item_cf:4}:0.1,{user_item_cf:5}:0.1,{user_item_cf:6}:0.1,{user_item_cf:7}:0.1,{user_item_cf:8}:0.1,{user_item_cf:9}:0.1}

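The tensor literal can be built and percent-encoded programmatically instead of by hand. A sketch using Python's standard library (the helper name is ours):

```python
from urllib.parse import quote

def encode_query_tensor(values):
    """Render a list of floats as a Vespa mapped-literal tensor string
    {{user_item_cf:0}:0.1,...} and percent-encode it for use as the
    ranking.features.query(user_item_cf) request parameter."""
    literal = "{" + ",".join(
        "{user_item_cf:%d}:%s" % (i, v) for i, v in enumerate(values)
    ) + "}"
    return quote(literal, safe="")
```

Passing ten 0.1 values reproduces the encoded parameter value in the query above.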
Query Vespa with user id

The next step is to query Vespa by user id: look up the user profile for the user, get the tensor from it and recommend documents based on this tensor (like the query in the previous section). The user profiles are fed to Vespa in the user_item_cf field of the user document type.

In short, set up a searcher to retrieve the user profile by user id – then run the query. When the Vespa Container receives a request, it will create a Query representing it and execute