🎄Advent of Tensors 2023 🎅

Andreas Eriksen
Senior Vespa Engineer

Jo Kristian Bergum
Vespa Solutions Architect


Greetings, Vespa enthusiasts! Prepare to embark on a festive journey as we bring you the Advent of Tensors.
In this advent, we’ll be diving into the magical world of Vespa Tensors, combining the joy of the holiday season with the power of expressing distributed real-time tensor computations over evolving datasets.

🌟 What’s Vespa Tensors?

Vespa Tensors, the enchanting framework behind this festive challenge, is not your commodity vector similarity search library.
Developed by elves close to the North Pole, Vespa Tensors adds a touch of magic to your real-time big data serving use cases,
offering powerful tensor operations for all your real-time AI serving workloads. Vespa tensors go far beyond simple similarity search over a single vector representation, enabling features such as multi-vector indexing and feature computations for recommender systems, to name just a few AI-powered use cases.

You can learn more about Vespa tensors in the Vespa tensor user guide, or if you prefer a blog
post form: Computing with tensors in Vespa.
All challenges are expressed and solved in the Vespa tensor playground which runs in your browser with zero dependencies.

✨ Daily Challenges for 24 Days

For the next 24 days, we’ll unravel a new tensor coding challenge each day, designed to stretch your skills and explore the fascinating capabilities of Vespa Tensors.
Whether you’re a seasoned Vespa expert or a curious beginner, these challenges are sure to spark your interest. Each day, a new challenge will be unlocked in the table below, and the solution to the previous day’s challenge will be published.

🔍 Explore the Magic of Tensors

Discover the magic behind tensor computations and learn how Vespa Tensors can help solve complex real-time AI serving problems. From expressing gift recommendation systems to facial recognition systems for tracking who’s been nice or naughty, each challenge is a delightful exploration into the capabilities of the Vespa tensor framework.

🚀 Spread the Holiday Cheer in Code

Join us on this fun sleigh ride and spread the holiday cheer by expressing compute over data with tensor expressions.
Whether you’re coding by the fireplace or sipping hot cocoa, let the Advent of Tensors add a sprinkle of magic to your advent routine. So, grab your coding hat, put on your festive coding sweater, and get ready for 24 days of tensor computations.
Submit your solution to the form associated with each challenge, and get the chance to win exclusive Vespa swag!

PS: if you are interested in discussing the challenges with many other Vespa enthusiasts, come join us in the festive Vespa Slack space.

The ✨ 2023 Challenges ✨

Day  Challenge                                                      Playground Link  Solution Link
1    Santa’s Festive Weigh-In! 🎅                                    link
2    Santa’s Fitness Challenge! 🏋️‍♂️
3    Santa’s Grand Weight Gain Gala! 🎉
4    Distance to Dasher 🦌
5    Roaming Reindeer 🦌🦌🦌
6    Who’s been Naughty or Nice 🎁
7    The Great Reindeer Rally! 🦌
8    Let It Snow! ☃️🌨️🌍
9    Carol’s Festive 80s Playlist 🎶
10   Lost in New York 3: Taxicab distance 🏙️
11   Santa’s Sleigh Soberness Test 🍸
12   Santa’s Bag Comparison Bonanza! 🎁✨
13   Festive Jaccard Jamboree! 🎁🔍✨
14   Santa’s Embedding Retrieval System For Gift Matching 🎁🔍✨
15   Santa’s International Taxing Quest! 🌍💸
16   Santa’s Behavioral Analytics! 📊🎅
17   Santa’s Chimney Area Calculator! 🏠📏
18   Reduced to Tears 🎁🤔
19   Santa’s Face Recognition System 🔍✨
20   Margins and Mistletoe! 💰✨
21   Gift Quality Check with Chebyshev! 🎁📊
22   Celsius Conversion Quest! 🌡️🔍✨
23   Santa’s Face Recognition System – A/B Testing 🔍✨
24   Scrambling Santas 🎅🎅🎅

Parent-child joins and tensors for content recommendation

Aaron Nagao
Senior Software Engineer, Verizon Media


A real-world application of parent-child joins and tensor functions to model topic popularity for content recommendation.

Every time a user visits Yahoo.com, we build the best news stream for that user from tens of thousands of candidate articles on a variety of topics such as sports, finance, and entertainment.

To rank these articles, a key feature is the topic’s click-through rate (CTR), a real-time popularity metric aggregated over all articles of that topic. There are two reasons why topic CTR is an important feature for machine-learned ranking:

  1. It helps the cold-start problem by allowing the model to use the popularity of other articles of that topic as a prior for new articles.
  2. It reduces a high-dimensional categorical feature like a topic to a single numerical feature (its mean CTR)—this mean encoding is needed for a model to incorporate topics.

In this blog post, we describe how we model topic CTRs using two modern features of Vespa: parent-child joins and tensor functions. Vespa, the open source big data serving engine created by Yahoo, powers all of our content recommendations in real-time for millions of users at scale.

When modeling ranking features in Vespa, most features, like an article’s length or age, are simple properties of the document and are stored within the Vespa document. In contrast, a topic CTR is shared by all documents about that topic.

A simple approach would be to feed the topic CTR to every document about that topic. But this denormalized approach both duplicates data and complicates CTR updates—the updater must first query for all documents of that topic in order to feed the updated CTR to all of them.

The standard solution for this from relational databases is joins through foreign keys, and such joins are supported in Vespa as parent-child relationships.

We model our relationships by storing all topic CTRs in a single “global” document, and each of our articles has a foreign key that points to that global document, as shown in the Figure. This way, when updating CTRs we only need to update the global document and not each individual article, avoiding data duplication.

Figure: each article holds a foreign key reference to a single global document containing the topic CTRs.

When ranking articles, Vespa uses the foreign key to do a real-time join between each article and the global document to retrieve the topic CTRs. Vespa co-locates the global document on the article content node to reduce the latency of this real-time join, also represented in the Figure.

Vespa seamlessly allows us to write updated CTRs to the global document in real-time while concurrently reading the CTRs for use in article ranking.

The schema definition uses reference for this parent-child relationship:

# Parent document: a single global document holding the CTR for every topic
document globalscores {
    field topic_ctrs type tensor<float>(topic{}) {
        indexing: attribute
        attribute: fast-search
    }
}

# Child document: each article holds a foreign key reference to the parent
document article {
    field doc_topics type tensor<float>(topic{}) {
        indexing: attribute | summary
    }

    field ptr type reference<globalscores> {
        indexing: attribute
    }
}

Now that our data is modeled in Vespa, we want to use the CTRs as features in our ranking model.

One challenge is that some articles have 1 topic (so just 1 topic CTR) while some articles have 5 topics (with 5 CTRs). Since machine learning generally requires each article to have the same number of features, we take the average topic CTR and the maximum topic CTR, aggregating a differing number of CTRs into a fixed number of summary statistics.

We compute the average and maximum using Vespa’s Tensor API. A tensor is a vector with labeled dimensions, and Vespa provides API functions like sum, argmax, and * (elementwise multiplication) that operate on input tensors to compute features.

The rank profile code that computes these ranking features is:

import field ptr.topic_ctrs as global_topic_ctrs {}

rank-profile yahoo inherits default {
    # helper functions
    function AVG_CTR(weights, ctrs) {
        # weighted average CTR
        expression: sum(weights * ctrs) / sum(weights)
    }
    function MAX_CTR(weights, ctrs) {
        # weighted max, then use unweighted CTR
        expression: sum( argmax(weights * ctrs) * ctrs )
    }

    # ranking features
    function TOPIC_AVG_CTR() {
        expression: AVG_CTR(attribute(doc_topics), attribute(global_topic_ctrs))
    }
    function TOPIC_MAX_CTR() {
        expression: MAX_CTR(attribute(doc_topics), attribute(global_topic_ctrs))
    }
}

Rank Profile Walkthrough

Suppose an article is about two topics each with an associated weight:

attribute(doc_topics) = <'US': 0.7, 'Sports': 0.9>

And in the global document, the real-time CTR values for all topics are:

topic_ctrs = <'US': 0.08, 'Sports': 0.02, 'Finance': 0.05, ...>

The first import line does a real-time join between the article’s foreign key and the global document, so that the article ranking functions can reference the global CTRs as if they were stored with each article:

global_topic_ctrs = ptr.topic_ctrs = <'US': 0.08, 'Sports': 0.02, 'Finance': 0.05, ...>

For both of our features TOPIC_AVG_CTR and TOPIC_MAX_CTR, the first step uses the * elementwise multiplication operator in Vespa’s Tensor API, which effectively does a lookup of the article’s topics in the global tensor:

weights * ctrs
= attribute(doc_topics) * attribute(global_topic_ctrs)
= <'US': 0.7, 'Sports': 0.9> * <'US': 0.08, 'Sports': 0.02, 'Finance': 0.05, ...>
= <'US': 0.056, 'Sports': 0.018>

Then TOPIC_AVG_CTR computes a weighted average CTR by summing and normalizing by the weights:

TOPIC_AVG_CTR
= sum(weights * ctrs) / sum(weights)
= sum(<'US': 0.056, 'Sports': 0.018>) / sum(<'US': 0.7, 'Sports': 0.9>)
= 0.074 / 1.6
= 0.046

(Note this weighted average of 0.046 is closer to the Sports CTR=0.02 than the US CTR=0.08 because Sports had a higher topic weight.)

And TOPIC_MAX_CTR finds the CTR of the entity with the maximum weighted CTR:

argmax(weights * ctrs)
= argmax(<'US': 0.056, 'Sports': 0.018>)
= <'US': 1>

argmax(weights * ctrs) * ctrs
= <'US': 1> * <'US': 0.08, 'Sports': 0.02, 'Finance': 0.05, ...>
= <'US': 0.08>

(With a final sum to convert the mapped tensor to the scalar 0.08.)

These two examples demonstrate the expressiveness of Vespa’s Tensor API to compute useful features.

Ultimately TOPIC_AVG_CTR and TOPIC_MAX_CTR are two features computed in real-time for every article during ranking. These features can then be added to any machine-learned ranking model—Vespa supports gradient-boosted trees from XGBoost and LightGBM, and neural networks in TensorFlow and ONNX formats.

While we could have stored these topic CTRs in some external store, we would have essentially needed to fetch all of them when ranking as every topic is represented in our pool of articles. Instead, Vespa moves the computation to the data by co-locating the global feature store on the article content node, avoiding network load and reducing system complexity.

We also keep the global CTRs in-memory for even faster performance with the aptly named attribute: fast-search. Ultimately, the Vespa team optimized our use case of parent-child joins and tensors to rank 10,000 articles in just 17.5 milliseconds!

This blog post described just two of the many ranking features used for real-time content recommendation at Verizon Media, all powered by Vespa.

Computing with tensors in Vespa


Photo by Pietro Jengr on Unsplash

In computer science, a tensor is a data structure that generalizes scalars,
vectors, matrices, and higher-order structures into a single construct.

In Vespa, we have introduced a tensor formalism that differs from what one
finds in most modern machine learning frameworks. The main differences are:

  • Named dimensions
  • Unified sparse and dense tensor types
  • A small but powerful set of core functions

In this blog post, we’ll explore these and other aspects of tensors in Vespa.
We will also introduce the recently released tensor
playground, a tool to get familiar with and
explore tensors and tensor expressions in an interactive environment.

Tensors are multi-dimensional arrays of numeric values and can be viewed as a
generalization of vectors and matrices. A tensor with one dimension (a
first-order tensor) is a vector, a two-dimensional (second-order) tensor is a
matrix, and so on. A scalar value is a tensor without any dimensions.

In most frameworks, these dimensions have an implicit ordering. For instance, a
matrix has two dimensions. A multiplication between two matrices, A and B
with sizes (i,j) and (j,k), results in a matrix with size (i,k). Values
along the columns in A and the rows in B are multiplied and summed. Thus
these must have equal size. If not, for instance if B had size (k,j), B
would need a transpose to (j,k) before multiplication.

Implicitly ordered dimensions pose a problem: what do the dimensions represent?
For instance, consider loading a (monochrome) image from a file into a matrix.
It is not immediately clear which dimension represents the height or width, as
that depends on the file format. In addition, a file containing a set of color
images has two more dimensions: the number of images and the color
channels (e.g. RGB). This is called NCHW format. Sometimes it is stored as
NHWC, which is the default representation in TensorFlow. Interestingly, NCHW is
the preferred format on GPUs.

Knowing which dimension is which is essential when performing various
operations; one generally wants to rotate images by working on the height and
width dimensions, not on the channel and height dimensions. However, after a
series of data manipulations, keeping track of dimensions can quickly become
challenging.

However, the tensor itself does not contain the information needed to describe
its dimensions. As a result, practitioners often resort to comments such as the
following to document the dimension order:

import numpy
import torch

# Num x Height x Width x Channel
tensor = torch.tensor(numpy.load("images.npy"))
tensor.shape
> torch.Size([100, 64, 64, 3])

This is obviously error-prone.

In Vespa, we have taken a different approach and introduced named dimensions.
This enables a strong tensor type system. An example of a tensor type in Vespa
which holds the same data as above:

tensor(num[100], height[64], width[64], channel[3])

This type system provides more formal documentation, which makes tensors easier
for humans to work with. It also allows tensors and tensor operations to be
semantically verified: we can perform static type inference for all
computations, catching errors at compile time rather than at runtime.

Later in the post, we’ll see how this enables arbitrarily complex computation
with only a minimal, concise set of core operations.
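
As a small illustration of what this buys us (using toy sizes and made-up values, not the image tensor above), multiplying a three-dimensional tensor by a per-channel scale joins on the shared channel dimension by name, and the result type is inferred statically before anything is evaluated:

# Made-up example values; dimensions match by name, not by position
A: tensor(height[2], width[2], channel[3]):[[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]]
B: tensor(channel[3]):[0.1, 1.0, 10.0]

A * B   # inferred result type: tensor(height[2], width[2], channel[3]);
        # each cell of A is multiplied by the B value with the same channel label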

Sparse and dense tensors

The tensor as a multi-dimensional array is often considered to be dense. This
means that all combinations of dimension indexes have a value, even if that
value is 0 or NaN. However, a tensor with many such values could be
represented more efficiently as a sparse tensor, where only the non-empty values
are defined. This can lead to considerable savings in space.
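
As a quick sketch (the dimension names and values are made up for illustration), a sparse tensor in Vespa only lists the cells that actually exist:

# Only two cells are defined; every other (user, item) combination takes up no space
tensor(user{}, item{}):{ {user:alice, item:book}:1.0, {user:bob, item:sled}:3.0 }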

Unfortunately, the internal representation of sparse tensors in most frameworks
makes them incompatible with regular dense tensors. This leads to an entirely
separate set of functions operating on sparse and dense tensors, with functions
to convert between the two.

Vespa supports dense tensors, sparse tensors, and mixed tensors that contain
both dense and sparse dimensions. A dense tensor is a tensor containing only
“indexed” dimensions. An indexed dimension is indexed by integers, like an
array. The following is an example of a dense tensor containing two indexed
dimensions:

tensor(width[128], height[96])

A sparse tensor is conversely a tensor consisting of only “mapped” dimensions. A
mapped dimension is indexed by strings, like a map or hash table. An example of
a sparse tensor containing two mapped dimensions:

tensor(model_id{}, feature{})

A mixed tensor can contain both types:

tensor(model_version_id{}, first_layer_weights[64], second_layer_weights[128])

This particular example effectively works as a lookup table. By providing the
model_version_id, one can extract a dense tensor (called a dense subspace).
For instance, this is useful when a single tensor should contain the weights for
multiple versions of a model, e.g., a neural network.
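
A minimal sketch of such a lookup, with a toy dense size, made-up weights, and a hypothetical tensor name model_weights:

# A mixed tensor holding a weight vector per model version
model_weights: tensor(model_version_id{}, x[3]):{v1:[0.1, 0.2, 0.3], v2:[0.4, 0.5, 0.6]}

# Slicing on the mapped dimension extracts the dense subspace for one version
model_weights{model_version_id:"v2"}   # -> tensor(x[3]):[0.4, 0.5, 0.6]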

All in all, this enables a flexible data representation.

Tensor operations

Most frameworks that operate on and manipulate tensors have an extensive library
of functions. For instance, TensorFlow has hundreds of different operations
organized in various groups according to their function. Examples are
constructors, general manipulation, linear algebra, neural networks, and so on.
Some operations work element-wise, some on entire tensors, and some combine
tensors.

All operations have assumptions on their input tensors. For instance, a matrix
multiplication operation requires two matrix tensors with the same column/row
size. Likewise, a dot product operation requires two vectors with the same
vector length. These two operations are essentially the same: the sum of the
products of elements along a dimension. Yet two different operations are
provided: matmul and dotprod.

A two-dimensional convolution requires an input of four dimensions: the batch
size, color channels, width and height. This ordering is called NCHW, which is
the most efficient for GPU processing. However, the default in TensorFlow is
NHWC. Therefore, the operations working on these tensors need to know the
format, which must be provided by the developer as the format is not part of the
tensor.

Another example is the different operations based on tensor type. For instance,
there are two different operations for concat: one for a dense tensor and one
for sparse.

The inevitable consequence is that most frameworks get an extensive library of
operations. The problem with such large libraries is interoperability and
maintainability.

Assume you train a model on one framework and want to run inference on another.
The common way to represent the model is as a computational graph, where each
vertex is an operation. Evaluating a given graph on another system requires the
implementation of all operations in the graph. To guarantee general
interoperability between two frameworks, all operations in the original
framework must have an implementation. This becomes increasingly less feasible
as frameworks grow to have hundreds or thousands of operations.

In addition, many developers add custom operations. This further decreases
interoperability by requiring binary compatibility.

The tensor framework in Vespa, on the other hand, takes a different approach.
Tensors with named dimensions allow for a strong type system with well-founded
semantics. This makes it possible to define a small set of foundational
mathematical operations in which all other computations can be expressed.

Unary operations work on a single tensor; examples include filtering and
reduction. Binary operations combine two input tensors into an output tensor,
for instance by joining, extending, or computing some relation between them. By
combining these operations, one can express arbitrarily complex computations.

Vespa provides just 8 core operations to transform tensors:

  • tensor – construct a new tensor from a given expression
  • map – apply a function to all values
  • reduce – aggregate values along a dimension
  • rename – rename a dimension
  • slice – returns a tensor with all cells matching a partial address
  • join – the natural join between two input tensors with a custom expression applied
  • merge – merge two input tensors with a custom expression
  • concat – concatenate two tensors along a given dimension

To aid developers, Vespa additionally provides higher-level, non-core
operations, which are all implemented in terms of these core operations.
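
For example (a sketch with made-up values), a dot product between two vectors is not a separate primitive: it is simply a join with multiplication followed by a reduce with sum over the shared dimension:

A: tensor(x[3]):[1, 2, 3]
B: tensor(x[3]):[4, 5, 6]

# Dot product expressed with only core operations
reduce(join(A, B, f(a, b)(a * b)), sum, x)   # = 1*4 + 2*5 + 3*6 = 32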

This approach enables interoperability, as implementing this small set is all
that is needed to realize complete support for tensor computation. It also makes
optimization more efficient, because low-level optimizations are only required
for this set of functions, and higher-level optimizations can work on whichever
chunks of these operations are beneficial, independent of any grouping into
higher-level functions that humans happen to find meaningful.

In this section, we’ll describe Vespa’s full tensor formalism, including
tensors and tensor types, and the core set of tensor operations.

Tensor types

A tensor type consists of a value type and a set of dimensions. The value type is a numeric data type: double, float, int8, bfloat16, etc. The default value type is double.

A dimension consists of a name and a type. The type can be either indexed or mapped, and, if indexed, can optionally include a size. The number of dimensions in the tensor is called its order. A tensor without dimensions (0-order) is a single scalar value. A tensor with one dimension (first-order) is a vector. A tensor with two dimensions (second-order) is a matrix.

Note that this notion of dimensions is not to be confused with the concept of dimensions in vector space, where a vector with 3 elements can represent a vector in a 3-dimensional space.

A value in a dimension is addressed by a label. For mapped dimensions, this label is a string. For indexed dimensions this label is an integer index. Tensors with mapped dimensions only hold values for the labels that exist. Indexed dimensions must have values for all indices.

Some examples:

  • tensor() – a scalar value (double)
  • tensor<float>(x[3]) – An indexed vector with 3 floats
  • tensor(x[2], y[3]) – An indexed matrix of size 2-by-3
  • tensor(name{}) – A (sparse) map of named values
  • tensor<int8>(name{}, x[2]) – A map of named vectors of int8

Tensors

A tensor is simply a tensor type and a set of values. Values are held in
cells that are fully defined by the address given by the labels in each
dimension. A tensor has a string representation defined by the type followed
by the cell values.

Some examples:

  • tensor():3.0 – a scalar value (double)
  • tensor<float>(x[3]):[1.0, 2.0, 3.0] – An indexed vector with 3 floats
  • tensor(x[2], y[3]):[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]] – An indexed matrix of size 2-by-3
  • tensor(name{}):{ {name:foo}:2, {name:bar}:5 } – A (sparse) map of named values
  • tensor(name{}):{ foo:2, bar:5 } – Same as above with inferred dimension (single sparse)
  • tensor<int8>(name{}, x[2]):{foo:[1,2], bar:[3,4]} – A map of named vectors of int8

Tensor functions

Vespa has a small core set of just 8 tensor functions:

  • Creational: tensor
  • Unary: map, reduce, rename, slice
  • Binary: join, merge, concat

These can be combined and used to express potentially complex computation, as
some take general lambda expressions as arguments. Please refer to the tensor
function
reference
for detailed descriptions of these functions.

In the following, we’ll provide some examples to demonstrate the expressive
power of join and reduce. Each example comes with a link to the tensor
playground, an interactive environment where you can experiment with tensor
expressions. Feel free to take any of these examples and play around with them.

Vector outer product

Given two tensors representing vectors, A and B:

A: tensor(x[3]):[1,2,3]
B: tensor(y[3]):[4,5,6]

Notice that these tensors have different dimension names. We can multiply
these two tensors together, A * B, and the result is:

tensor(x[3],y[3]):[[4.0, 5.0, 6.0], [8.0, 10.0, 12.0], [12.0, 15.0, 18.0]]

This tensor has type tensor(x[3],y[3]). To see what is happening
here, note that the expression A * B is a convenience form of the underlying
expression:

join(A, B, f(a,b)(a * b))

The join function is the most used function when combining two input tensors.
As input, it takes two tensors and a lambda function (here f(a,b)(a * b)) to
define how to combine matching cells. The resulting tensor is the natural join
between the input tensors, and the cell values are defined by the lambda. The
resulting type is the union of dimensions from