Vespa at Zedge – providing personalization content to millions of iOS, Android & web users

This blog post describes Zedge’s use of Vespa for search and recommender systems to support content discovery for personalization of mobile phones (Android, iOS and Web). Zedge is now using Vespa in production to serve millions of monthly active users. See the architecture below.


What is Zedge?


Zedge’s main product is an app – Zedge Ringtones & Wallpapers – that provides wallpapers, ringtones, game recommendations and notification sounds customized for your mobile device. Zedge apps have been downloaded more than 300 million times combined for iOS and Android and are used by millions of people worldwide each month. Zedge is traded on NYSE under the ticker ZDGE.

People use Zedge apps for self-expression. Setting a wallpaper or ringtone on your mobile device is in many ways similar to selecting clothes, a hairstyle or other fashion statements. In fact, people try a wallpaper or ringtone much as they would try clothes in a dressing room before making a purchase decision: they try different wallpapers or ringtones before deciding on one they want to keep for a while.

The decision to select a wallpaper is not taken lightly, since people view and interact with their mobile device screen (and background wallpaper) hundreds of times per day.

Why Zedge considered Vespa

Zedge apps – for iOS, Android and Web – depend heavily on search and recommender services to support content discovery. These services were developed over several years and consisted of multiple subsystems – both internally developed and open source – and multiple technologies for search and recommender serving. In addition, numerous big data processing jobs built and maintained the data for content discovery serving. The time and complexity of improving the search and recommender services and their corresponding processing jobs had grown too high, so simplification was due.


Vespa seemed like a promising open source technology to consider for Zedge, in particular since it was proven in several ways within Oath (Yahoo):

  1. Scales to handle very large systems, e.g.
     - Flickr with billions of images
     - Yahoo Gemini Ads Platform with more than one hundred thousand requests per second to serve ads to 1 billion monthly active users for services such as Techcrunch, Aol, Yahoo!, Tumblr and Huffpost
  2. Runs stably and requires very little operations support – Oath has a few hundred – many of them large – Vespa-based applications requiring fewer than a handful of operations people to run smoothly
  3. Rich set of features that Zedge could gain from using:
     - Built-in tensor processing support could simplify calculation and serving of related wallpapers (images) & ringtones/notifications (audio)
     - Built-in support for TensorFlow models to simplify development and deployment of machine learning based search and recommender ranking (at that time in development, according to Oath)
     - Search Chains
  4. Help from core developers of Vespa

The Vespa pilot project

Given the content discovery technology needs and the promising characteristics of Vespa, we started a pilot project with a team of software engineers, SREs and data scientists, with the following goals:

  1. Learn about Vespa from hands-on development
  2. Create a realistic proof of concept using Vespa in a Zedge app
  3. Get initial answers to key questions about Vespa – enough to decide whether to go for it fully:
     - Which of today’s API services can it simplify and replace?
     - What are the (cloud) production costs with Vespa at Zedge’s scale? (OPEX)
     - What will maintenance and development look like with Vespa? (future CAPEX)
     - Which new (innovation) opportunities does Vespa open up?

The pilot project was successful – we developed a good proof of concept using Vespa with one of our Android apps internally and decided to start a project transferring all recommender and search serving to Vespa. Our impression after the pilot was that the main benefit was easier maintenance and development of search/recommender systems, in particular through a reduced amount of code and less complex processing jobs.

Autosuggest for search with Vespa

Since autosuggest (for search) required both low latency and high throughput, we decided it was a good first candidate to take to production with Vespa. Configuration-wise it was similar to regular search (from the pilot), but snippet generation (document summary), which requires access to the document store, was superfluous for autosuggest.

A good approach for autosuggest was to:

  1. Make all searchable autosuggest document fields (in-memory) attributes
  2. Avoid snippet generation and use of the document store by overriding the document-summary setting in the search definition to only access attributes
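As a sketch of what such a prefix query could look like against Vespa's HTTP search API (the `suggestion` field name, the `attributes-only` summary class and the exact annotation syntax are illustrative assumptions, not Zedge's actual configuration):

```python
from urllib.parse import urlencode

def build_suggest_query(prefix: str, hits: int = 10) -> str:
    """Build a Vespa HTTP search request URL for an autosuggest prefix query.

    Field and summary-class names are hypothetical placeholders.
    """
    yql = ('select * from sources * where suggestion contains '
           f'([{{"prefix": true}}]"{prefix}")')
    return "/search/?" + urlencode({"yql": yql, "hits": hits,
                                    "summary": "attributes-only"})

url = build_suggest_query("cat")
```

Restricting the summary class to attribute fields is what lets the query skip the document store entirely.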


The figure above illustrates the autosuggest architecture. When the user starts typing in the search field, we fire a query with the search prefix to the Cloudflare worker, which, in case of a cache hit, returns the result (possible queries) to the client. In case of a cache miss, the Cloudflare worker forwards the query to our Vespa instance handling autosuggest.

For the external autosuggest API we use Cloudflare Workers (supporting JavaScript on V8, and perhaps later multiple languages with WebAssembly) to handle API queries from Zedge apps in front of Vespa running in Google Cloud. This setup allows for simple close-to-user caching of autosuggest results.
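The caching flow above amounts to cache-aside logic at the edge. A minimal sketch (all names hypothetical; the real worker runs JavaScript on Cloudflare, not Python):

```python
# Cache-aside flow: answer from the edge cache on a hit,
# forward to Vespa on a miss and cache the result.
def autosuggest(prefix, cache, query_vespa):
    key = prefix.lower()
    cached = cache.get(key)
    if cached is not None:        # cache hit: answered at the edge
        return cached
    result = query_vespa(prefix)  # cache miss: forwarded to Vespa
    cache[key] = result           # cached for subsequent users
    return result

backend_calls = []
def fake_vespa(prefix):
    backend_calls.append(prefix)
    return [prefix + "s", prefix + " wallpaper"]

cache = {}
first = autosuggest("cat", cache, fake_vespa)
second = autosuggest("cat", cache, fake_vespa)  # served from cache
```

The second call never reaches the backend, which is exactly what makes close-to-user caching effective for high-throughput autosuggest traffic.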

Without going into details, we had several recommender and search services to adapt to Vespa. These services were adapted by writing custom Vespa searchers and, in some cases, search chains.

The main change compared to our old recommender and related-content services was the degree of dynamism and freshness of serving: with Vespa, more ranking signals are calculated on the fly using Vespa’s tensor support instead of being precalculated and fed into services periodically. Another benefit was that the amount of computational (big data) resources and code for recommender & related-content processing was heavily reduced.

Continuous Integration and Testing with Vespa

A main focus was testing and deployment of Vespa services with continuous integration (see figure below). We found that a combination of Jenkins (or a similar CI product or service) and Docker Compose worked nicely for testing new Vespa applications, their configurations and data (samples) before deploying to the staging cluster running Vespa on Google Cloud. This way we get a realistic test setup – with Docker Compose – that closely mirrors the production environment (even at the hostname level).


Monitoring of Vespa with Prometheus and Grafana

For monitoring we created a tool that continuously reads Vespa metrics, stores them in Prometheus (a time series database) and visualizes them with Grafana. This tool is open source; more information about Vespa metrics and monitoring is available in the Vespa documentation.



The team quickly got up to speed with Vespa thanks to its good documentation and examples, and it has run like clockwork since we started using it for real loads in production. But this was only our first step with Vespa – consolidating existing search and recommender technologies into a more homogeneous and easier-to-maintain form.

With Vespa as part of our architecture we see many possible paths for evolving our search and recommendation capabilities (e.g. machine learning based ranking such as integration with TensorFlow and ONNX).

Best regards,
Zedge Content Discovery Team

Parent-child joins and tensors for content recommendation

Aaron Nagao

Aaron Nagao

Senior Software Engineer, Verizon Media

A real-world application of parent-child joins and tensor functions to model topic popularity for content recommendation.

Every time a user visits, we build the best news stream for that user from tens of thousands of candidate articles on a variety of topics such as sports, finance, and entertainment.

To rank these articles, a key feature is the topic’s click-through rate (CTR), a real-time popularity metric aggregated over all articles of that topic. There are two reasons why topic CTR is an important feature for machine-learned ranking:

  1. It helps the cold-start problem by allowing the model to use the popularity of other articles of that topic as a prior for new articles.
  2. It reduces a high-dimensional categorical feature like a topic to a single numerical feature (its mean CTR)—this mean encoding is needed for a model to incorporate topics.

In this blog post, we describe how we model topic CTRs using two modern features of Vespa: parent-child joins and tensor functions. Vespa, the open source big data serving engine created by Yahoo, powers all of our content recommendations in real-time for millions of users at scale.

When modeling ranking features in Vespa, most features – like an article’s length or age – are simple properties of the document and are stored within the Vespa document itself. In contrast, a topic CTR is shared by all documents of that topic.

A simple approach would be to feed the topic CTR to every document about that topic. But this denormalized approach both duplicates data and complicates CTR updates—the updater must first query for all documents of that topic in order to feed the updated CTR to all of them.

The standard solution for this from relational databases is joins through foreign keys, and such joins are supported in Vespa as parent-child relationships.

We model our relationships by storing all topic CTRs in a single “global” document, and each of our articles has a foreign key that points to that global document, as shown in the Figure. This way, when updating CTRs we only need to update the global document and not each individual article, avoiding data duplication.

Global document
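The payoff of this model can be illustrated with a toy contrast of the two update paths (plain dicts here are hypothetical stand-ins for Vespa documents, not real Vespa API calls):

```python
# Denormalized: the topic CTR is copied into every article,
# so a CTR update must touch every article of that topic.
def update_denormalized(articles, topic, ctr):
    touched = 0
    for doc in articles:
        if doc["topic"] == topic:
            doc["topic_ctr"] = ctr
            touched += 1
    return touched

# Parent-child: all CTRs live in one global parent document,
# so a CTR update touches exactly one document.
def update_global(global_doc, topic, ctr):
    global_doc["topic_ctrs"][topic] = ctr
    return 1

articles = [
    {"id": 1, "topic": "Sports", "topic_ctr": 0.02},
    {"id": 2, "topic": "Sports", "topic_ctr": 0.02},
]
global_doc = {"topic_ctrs": {"Sports": 0.02, "US": 0.08}}

denorm_writes = update_denormalized(articles, "Sports", 0.03)  # touches 2 docs
global_writes = update_global(global_doc, "Sports", 0.03)      # touches 1 doc
```

With millions of articles per topic, the difference between "touch every child" and "touch one parent" is what makes real-time CTR updates practical.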

When ranking articles, Vespa uses the foreign key to do a real-time join between each article and the global document to retrieve the topic CTRs. Vespa co-locates the global document on the article content node to reduce the latency of this real-time join, also represented in the Figure.

Vespa seamlessly allows us to write updated CTRs to the global document in real-time while concurrently reading the CTRs for use in article ranking.

The schema definition uses reference for this parent-child relationship:

document globalscores {
    field topic_ctrs type tensor&lt;float&gt;(topic{}) {
        indexing: attribute
        attribute: fast-search
    }
}

document article {
    field doc_topics type tensor&lt;float&gt;(topic{}) {
        indexing: attribute | summary
    }

    field ptr type reference&lt;globalscores&gt; {
        indexing: attribute
    }
}
Now that our data is modeled in Vespa, we want to use the CTRs as features in our ranking model.

One challenge is that some articles have 1 topic (so just 1 topic CTR) while some articles have 5 topics (with 5 CTRs). Since machine learning generally requires each article to have the same number of features, we take the average topic CTR and the maximum topic CTR, aggregating a differing number of CTRs into a fixed number of summary statistics.
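In simplified form (ignoring the topic weights that the actual rank profile applies), this aggregation looks like:

```python
# Reduce a variable number of topic CTRs per article to a
# fixed pair of summary features: (average CTR, maximum CTR).
def ctr_features(topic_ctrs):
    """Collapse a non-empty list of CTRs into (average, maximum)."""
    return sum(topic_ctrs) / len(topic_ctrs), max(topic_ctrs)

one_topic = ctr_features([0.08])                            # article with 1 topic
five_topics = ctr_features([0.08, 0.02, 0.05, 0.01, 0.04])  # article with 5 topics
# Both articles now expose exactly two numeric features for the model.
```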

We compute the average and maximum using Vespa’s Tensor API. A tensor is a vector with labeled dimensions, and Vespa provides API functions like sum, argmax, and * (elementwise multiplication) that operate on input tensors to compute features.

The rank profile code that computes these ranking features is:

import field ptr.topic_ctrs as global_topic_ctrs {}

rank-profile yahoo inherits default {
    # helper functions
    function AVG_CTR(weights, ctrs) {
        # weighted average CTR
        expression: sum(weights * ctrs) / sum(weights)
    }
    function MAX_CTR(weights, ctrs) {
        # weighted max, then use unweighted CTR
        expression: sum( argmax(weights * ctrs) * ctrs )
    }

    # ranking features
    function TOPIC_AVG_CTR() {
        expression: AVG_CTR(attribute(doc_topics), attribute(global_topic_ctrs))
    }
    function TOPIC_MAX_CTR() {
        expression: MAX_CTR(attribute(doc_topics), attribute(global_topic_ctrs))
    }
}

Rank Profile Walkthrough

Suppose an article is about two topics each with an associated weight:

attribute(doc_topics) = <'US': 0.7, 'Sports': 0.9>

And in the global document, the real-time CTR values for all topics are:

topic_ctrs = <'US': 0.08, 'Sports': 0.02, 'Finance': 0.05, ...>

The first import line does a real-time join between the article’s foreign key and the global document, so that the article ranking functions can reference the global CTRs as if they were stored with each article:

global_topic_ctrs = ptr.topic_ctrs = <'US': 0.08, 'Sports': 0.02, 'Finance': 0.05, ...>

For both of our features TOPIC_AVG_CTR and TOPIC_MAX_CTR, the first step uses the * elementwise multiplication operator in Vespa’s Tensor API, which effectively does a lookup of the article’s topics in the global tensor:

weights * ctrs
= attribute(doc_topics) * attribute(global_topic_ctrs)
= <'US': 0.7, 'Sports': 0.9> * <'US': 0.08, 'Sports': 0.02, 'Finance': 0.05, ...>
= <'US': 0.056, 'Sports': 0.018>

Then TOPIC_AVG_CTR computes a weighted average CTR by summing and normalizing by the weights:

= sum(weights * ctrs) / sum(weights)
= sum(<'US': 0.056, 'Sports': 0.018>) / sum(<'US': 0.7, 'Sports': 0.9>)
= 0.074 / 1.6
= 0.046

(Note this weighted average of 0.046 is closer to the Sports CTR=0.02 than the US CTR=0.08 because Sports had a higher topic weight.)

And TOPIC_MAX_CTR finds the CTR of the entity with the maximum weighted CTR:

argmax(weights * ctrs)
= argmax(<'US': 0.056, 'Sports': 0.018>)
= <'US': 1>

argmax(weights * ctrs) * ctrs
= <'US': 1> * <'US': 0.08, 'Sports': 0.02, 'Finance': 0.05, ...>
= <'US': 0.08>

(With a final sum to convert the mapped tensor to the scalar 0.08.)
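The whole walkthrough can be replayed on plain Python dicts standing in for Vespa's mapped tensors (same numbers as the example above):

```python
weights = {"US": 0.7, "Sports": 0.9}                   # attribute(doc_topics)
ctrs = {"US": 0.08, "Sports": 0.02, "Finance": 0.05}   # imported global CTRs

# Elementwise * keeps only the labels present in both tensors.
prod = {k: weights[k] * ctrs[k] for k in weights.keys() & ctrs.keys()}

topic_avg_ctr = sum(prod.values()) / sum(weights.values())  # weighted average
best_topic = max(prod, key=prod.get)                        # argmax -> 'US'
topic_max_ctr = ctrs[best_topic]                            # unweighted CTR of argmax
```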

These two examples demonstrate the expressiveness of Vespa’s Tensor API to compute useful features.

Ultimately TOPIC_AVG_CTR and TOPIC_MAX_CTR are two features computed in real-time for every article during ranking. These features can then be added to any machine-learned ranking model—Vespa supports gradient-boosted trees from XGBoost and LightGBM, and neural networks in TensorFlow and ONNX formats.

While we could have stored these topic CTRs in some external store, we would have essentially needed to fetch all of them when ranking as every topic is represented in our pool of articles. Instead, Vespa moves the computation to the data by co-locating the global feature store on the article content node, avoiding network load and reducing system complexity.

We also keep the global CTRs in-memory for even faster performance with the aptly named attribute: fast-search. Ultimately, the Vespa team optimized our use case of parent-child joins and tensors to rank 10,000 articles in just 17.5 milliseconds!

This blog post described just two of the many ranking features used for real-time content recommendation at Verizon Media, all powered by Vespa.