Blog search application in Vespa

Update 2021-05-20:
This blog post refers to Vespa sample applications that do not exist anymore.
Please refer to the
News search and recommendation tutorial
for an updated version of text and sample applications.

Introduction

This is the first of a series of blog posts where data from WordPress.com (WP) is used to highlight how Vespa can be used to store, search and recommend blog posts. The data was made available during a Kaggle challenge to predict which blog posts someone would like based on their past behavior. It contains many ingredients that are necessary to showcase needs, challenges and possible solutions that are useful for those interested in building and deploying such applications in production.

The end goal is to build an application where:

  • Users will be able to search and manipulate the pool of blog posts available.
  • Users will get blog post recommendations from the content pool based on their interest.

This part addresses:

  • A description of the dataset used, as well as other relevant information connected to the data.
  • How to set up a basic blog post search engine using Vespa.

The next parts show how to extend this basic search engine application with machine learned models to create a blog recommendation engine.

Dataset

The dataset contains blog posts written by WP bloggers and actions, in this case ‘likes’, performed by WP readers on blog posts they have interacted with. The dataset is publicly available at Kaggle and was released during a challenge to develop algorithms to help predict which blog posts users would most likely ‘like’ if they were exposed to them. The data includes these fields per blog post:

  • post_id – unique numerical id identifying the blog post
  • date_gmt – string representing the date of blog post creation in GMT format yyyy-mm-dd hh:mm:ss
  • author – unique numerical id identifying the author of the blog post
  • url – blog post URL
  • title – blog post title
  • blog – unique numerical id identifying the blog that the blog post belongs to
  • tags – array of strings representing the tags of the blog post
  • content – body text of the blog post, in HTML format
  • categories – array of strings representing the categories the blog post was assigned to

For the user actions:

  • post_id – unique numerical id identifying the blog post
  • uid – unique numerical id identifying the user that liked post_id
  • dt – date of the interaction in GMT format yyyy-mm-dd hh:mm:ss

Downloading raw data

For the purposes of this post, it is sufficient to use the first release of training data that consists of 5 weeks of posts as well as all the ‘like’ actions that occurred during those 5 weeks.

This first release of training data is available here – once downloaded, unzip it. The 1,196,111 line trainPosts.json will be our practice document data. This file is around 5GB in size.
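
To get a quick feel for the data before doing anything else with it, here is a minimal Python sketch that prints a few of the fields listed above. It assumes trainPosts.json is in the working directory and contains one JSON object per line, as the line count above suggests:

import json

# Print id, title and tags for the first five blog posts in the dump.
with open("trainPosts.json") as f:
    for i, line in enumerate(f):
        post = json.loads(line)
        print(post["post_id"], post["title"], post["tags"])
        if i == 4:
            break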

Requirements

Indexing the full data set requires 23GB disk space. We have tested with a Docker container with 10GB RAM. We used similar settings as described in the Vespa quick start guide. As in the guide, we assume that the $VESPA_SAMPLE_APPS env variable points to the directory with your local clone of the Vespa sample apps:

$ docker run -m 10G --detach --name vespa --hostname vespa --privileged --volume $VESPA_SAMPLE_APPS:/vespa-sample-apps --publish 8080:8080 vespaengine/vespa

Searching blog posts

Functional specification:

  • Blog post title, content, tags and categories must all be searchable
  • Allow blog posts to be sorted by both relevance and date
  • Allow grouping of search results by tag or category

In terms of data, Vespa operates with the notion of documents. A document represents a single, searchable item in your system, e.g., a blog post, a photo, or a news article. Each document type must be defined in the Vespa configuration through a search definition. Think of a search definition as being similar to a table definition in a relational database; it consists of a set of fields, each with a given name, a specific type, and some optional properties.

As an example, for this simple blog post search application, we could create the document type blog_post with the following fields:

  • url – of type uri
  • title – of type string
  • content – of type string (string fields can be of any length)
  • date_gmt – of type string (to store the creation date in GMT format)

The data fed into Vespa must match the structure of the search definition, and the hits returned when searching will be in this format as well.

Application Packages

A Vespa application package is the set of configuration files and Java plugins that together define the behavior of a Vespa system: what functionality to use, the available document types, how ranking will be done and how data will be processed during feeding and indexing. The search definition, e.g., blog_post.sd, is a required part of an application package — the other required files are services.xml and hosts.xml.

The sample application blog search creates a simple but functional blog post search engine. The application package is found in src/main/application.

Services Specification

services.xml defines the services that make up the Vespa application — which services to run and how many nodes per service:

<?xml version='1.0' encoding='UTF-8'?>
<services version='1.0'>

  <container id='default' version='1.0'>
    <search/>
    <document-api/>
    <nodes>
      <node hostalias="node1"/>
    </nodes>
  </container>

  <content id='blog_post' version='1.0'>
    <search>
      <visibility-delay>1.0</visibility-delay>
    </search>
    <redundancy>1</redundancy>
    <documents>
      <document mode="index" type="blog_post"/>
    </documents>
    <nodes>
      <node hostalias="node1"/>
    </nodes>
    <engine>
      <proton>
        <searchable-copies>1</searchable-copies>
      </proton>
    </engine>
  </content>

</services>
  • <container> defines the container cluster for document, query and result processing
  • <search> sets up the search endpoint for Vespa queries. The default port is 8080.
  • <document-api> sets up the document endpoint for feeding.
  • <nodes> defines the nodes required per service. (See the reference for more on container cluster setup.)
  • <content> defines how documents are stored and searched
  • <redundancy> denotes how many copies to keep of each document.
  • <documents> assigns the document types in the search definition — the content cluster capacity can be increased by adding node elements — see elastic Vespa. (See also the reference for more on content cluster setup.)
  • <nodes> defines the hosts for the content cluster.

Deployment Specification

hosts.xml contains a list of all the hosts/nodes that are part of the application, with an alias for each of them. Here we use a single node:

<?xml version="1.0" encoding="utf-8" ?>
<hosts>
  <host name="localhost">
    <alias>node1</alias>
  </host>
</hosts>

Search Definition

The blog_post document type mentioned in src/main/application/services.xml is defined in the search definition. src/main/application/searchdefinitions/blog_post.sd contains the search definition for a document of type blog_post:

search blog_post {

    document blog_post {

        field date_gmt type string {
            indexing: summary
        }

        field language type string {
            indexing: summary
        }

        field author type string {
            indexing: summary
        }

        field url type string {
            indexing: summary
        }

        field title type string {
            indexing: summary | index
        }

        field blog type string {
            indexing: summary
        }

        field post_id type string {
            indexing: summary
        }

        field tags type array<string> {
            indexing: summary
        }

        field blogname type string {
            indexing: summary
        }

        field content type string {
            indexing: summary | index
        }

        field categories type array<string> {
            indexing: summary
        }

        field date type int {
            indexing: summary | attribute
        }

    }


    fieldset default {
        fields: title, content
    }


    rank-profile post inherits default {

        first-phase {
            expression: nativeRank(title, content)
        }

    }

}

The document element is wrapped inside another element called search. The name following these elements, here blog_post, must be exactly the same for both.

The field property indexing configures the indexing pipeline for a field, which defines how Vespa will treat input during indexing — see indexing language. Each part of the indexing pipeline is separated by the pipe character ‘|’:

  • index: creates a search index for this field.
  • attribute: stores this field in memory as an attribute, making it available for sorting, grouping and ranking.
  • summary: includes this field in the document summary returned with search results.

Deploy the Application Package

Once done with the application package, deploy the Vespa application — build and start Vespa as in the quick start. Deploy the application:

$ cd /vespa-sample-apps/blog-search
$ vespa-deploy prepare src/main/application && vespa-deploy activate

This prints that the application was activated successfully and also the checksum, timestamp and generation for this deployment (more on that later). Pointing a browser to http://localhost:8080/ApplicationStatus returns JSON-formatted information about the active application, including its checksum, timestamp and generation (and should be the same as the values when vespa-deploy activate was run). The generation will increase by 1 each time a new application is successfully deployed, and is the easiest way to verify that the correct version is active.

The Vespa node is now configured and ready for use.

Feeding Data

The data fed to Vespa must match the search definition for the document type. The data downloaded from Kaggle, contained in trainPosts.json, must be converted to a valid Vespa document format before it can be fed to Vespa. Find a parser in the utility repository. Since the full data set is unnecessarily large for the purposes of this first part of the post, we use only the first 10,000 lines of it, but feel free to load all 1.1M entries:

$ head -10000 trainPosts.json > trainPostsSmall.json
$ python parse.py trainPostsSmall.json > feed.json
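
The actual implementation of parse.py lives in the utility repository; as a hedged sketch of the conversion it performs, each raw record becomes a Vespa put operation. The id scheme with namespace blog-search is an assumption that matches the Document API example later in this post:

import json
import sys

# Hedged sketch: turn each raw Kaggle record into a Vespa put operation
# for the blog_post document type. The real parse.py may differ in detail.
def to_vespa_put(raw):
    return {
        "put": "id:blog-search:blog_post::%s" % raw["post_id"],
        "fields": {
            "post_id": str(raw["post_id"]),
            "title": raw.get("title", ""),
            "content": raw.get("content", ""),
            "url": raw.get("url", ""),
            "date_gmt": raw.get("date_gmt", ""),
            "tags": raw.get("tags", []),
            "categories": raw.get("categories", []),
        },
    }

if __name__ == "__main__":
    docs = [to_vespa_put(json.loads(line)) for line in open(sys.argv[1])]
    json.dump(docs, sys.stdout, indent=2)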

Send this to Vespa using one of the tools Vespa provides for feeding. Here we will use the Java feeding API:

$ java -jar $VESPA_HOME/lib/jars/vespa-http-client-jar-with-dependencies.jar --verbose --file feed.json --host localhost --port 8080

Note that in the sample-apps/blog-search directory, there is a file with sample data. You may also feed this file using this method.

Track feeding progress

Use the Metrics API to track number of documents indexed:

$ curl -s 'http://localhost:19112/state/v1/metrics' | tr ',' '\n' | grep -A 2 proton.doctypes.blog_post.numdocs

You can also inspect the search node state by running:

$ vespa-proton-cmd --local getState  

Fetch documents

Fetch documents by document id using the Document API:

$ curl -s 'http://localhost:8080/document/v1/blog-search/blog_post/docid/1750271' | python -m json.tool

The first query

Searching with Vespa is done using HTTP GET requests, like:

<host:port>/search/?yql=value1&param2=value2...

The only mandatory parameter is the query, using yql=<yql query>. More details can be found in the Search API.

Given the above search definition, where the fields title and content are part of the fieldset default, any document containing the word “music” in one or more of these two fields matches our query below:

$ curl -s 'http://localhost:8080/search/?yql=select+*+from+sources+*+where+default+contains+%22music%22%3B' | python -m json.tool

Looking at the output, please note:

  • The field documentid in the output and how it matches the value we assigned to each put operation when feeding data to Vespa.
  • Each hit has a property named relevance, which indicates how well the given document matches our query, using a pre-defined default ranking function. You have full control over ranking — more about ranking and ordering later. The hits are sorted by this value.
  • When multiple hits have the same relevance score their internal ordering is undefined. However, their internal ordering will not change unless the documents are re-indexed.
  • Add &tracelevel=9 to dump query parsing details.
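
The same query can also be issued programmatically. Here is a minimal Python sketch, assuming the local Docker setup above (host localhost, port 8080); the result fields used (root.fields.totalCount, relevance) are part of Vespa's default JSON result format:

import json
import urllib.parse
import urllib.request

# Run the YQL query above and print each hit's relevance and title.
yql = 'select * from sources * where default contains "music";'
url = "http://localhost:8080/search/?" + urllib.parse.urlencode({"yql": yql})
result = json.load(urllib.request.urlopen(url))
print("total hits:", result["root"]["fields"]["totalCount"])
for hit in result["root"].get("children", []):
    print(hit["relevance"], hit["fields"].get("title"))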

Other examples

yql=select+title+from+sources+*+where+title+contains+%22music%22%3B

Once more a search for the single term “music”, but this time with the explicit field title. This means that we only want to match documents that contain the word “music” in the field title. As expected, you will see fewer hits for this query than for the previous one.

yql=select+*+from+sources+*+where+default+contains+%22music%22+AND+default+contains+%22festival%22%3B

This is a query for the two terms “music” and “festival”, combined with an AND operation; it finds documents that match both terms — but not just one of them.

yql=select+*+from+sources+*+where+sddocname+contains+%22blog_post%22%3B

This is a single-term query in the special field sddocname for the value “blog_post”. This is a common and useful Vespa trick to get the number of indexed documents for a certain document type (search definition): sddocname is a special and reserved field which is always set to the name of the document type for a given document. The documents are all of type blog_post, and will therefore automatically have the field sddocname set to that value.

This means that the query above really means “Return all documents of type blog_post”, and as such all documents in the index are returned.

Vespa Meetup in Sunnyvale

WHAT: Vespa meetup with various presentations from the Vespa team.

Several Vespa developers from Norway are in Sunnyvale — use this opportunity to learn more about the open big data serving engine Vespa and meet the team behind it.

WHEN: Monday, December 4th, 6:00pm – 8:00pm PDT

WHERE: Oath/Yahoo Sunnyvale Campus
Building E, Classroom 9 & 10
700 First Avenue, Sunnyvale, CA 94089

MANDATORY REGISTRATION: https://goo.gl/forms/7kK2vlaipgsSSSH42

Agenda

6.00 pm: Welcome & Intro

6.15 pm: Vespa tips and tricks

7.00 pm: Tensors in Vespa, intro and usecases

7.45 pm: Vespa future and roadmap

7.50 pm: Q&A

This meetup is a good arena for sharing experiences, getting good tips and inside details on Vespa, and discussing and influencing the roadmap — and it is a great opportunity for the Vespa team to meet our users. Hope to see many of you!

Blog recommendation in Vespa

Update 2021-05-20:
This blog post refers to Vespa sample applications that do not exist anymore.
Please refer to the
News search and recommendation tutorial
for an updated version of text and sample applications.

Introduction

This post builds upon the previous blog search application and extends the basic search engine to include machine learned models to help us recommend blog posts to users that arrive at our application. Assume that once a user arrives, we obtain their user identification number, denoted here by user_id, and that we will send this information to Vespa and expect to obtain a recommendation list containing 100 blog posts tailored for that specific user.

Prerequisites:

Collaborative Filtering

We will start our recommendation system by implementing the collaborative filtering algorithm for implicit feedback described in (Hu et al. 2008). The data is said to be implicit because the users did not explicitly rate each blog post they have read. Instead, they have “liked” blog posts they have likely enjoyed (positive feedback) but did not have the chance to “dislike” blog posts they did not enjoy (absence of negative feedback). Because of that, implicit feedback is said to be inherently noisy: the fact that a user did not “like” a blog post might have many different reasons unrelated to their feelings about that blog post.

In terms of modeling, a big difference between explicit and implicit feedback datasets is that the ratings for the explicit feedback are typically unknown for the majority of user-item pairs and are treated as missing values and ignored by the training algorithm. For an implicit dataset, we would assume a rating of zero in case the user has not liked a blog post. To encode the fact that a value of zero could come from different reasons we will use the concept of confidence as introduced by (Hu et al. 2008), which causes the positive feedback to have a higher weight than a negative feedback.

Once we train the collaborative filtering model, we will have one vector representing a latent factor for each user and item contained in the training set. Those vectors will later be used in the Vespa ranking framework to make recommendations to a user based on the dot product between the user and document latent factors. An obvious problem with this approach is that new users and new documents will not have those latent factors available to them. This is what is called a cold start problem and will be addressed with content-based techniques described in future posts.
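
To make the dot-product scoring concrete, here is a minimal numpy sketch of the recommendation step. The vector values are placeholders; the rank of 10 matches the latent factor dimension used later in this post:

import numpy as np

# Rank blog posts for a user by the dot product between the user latent
# factor and each document latent factor; return the top scoring posts.
def recommend(user_vector, item_vectors, top_n=100):
    scores = item_vectors @ user_vector      # one dot product per post
    best = np.argsort(-scores)[:top_n]       # highest score first
    return best, scores[best]

user_vector = np.random.rand(10)             # 10 latent factors
item_vectors = np.random.rand(1000, 10)      # one row per blog post
posts, scores = recommend(user_vector, item_vectors)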

Evaluation metrics

The evaluation metric used by Kaggle for this challenge was the Mean Average Precision at 100 (MAP@100). However, since we do not have information about which blog posts the users did not like (that is, we have only positive feedback), and since we cannot observe user reactions to the recommendations we make (this is an offline evaluation, different from the usual A/B testing performed by companies that use recommendation systems), we make a similar remark as the one included in (Hu et al. 2008) and prefer recall-oriented measures. Following (Hu et al. 2008) we will use the expected percentile ranking.
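
As a hedged sketch of this metric with binary likes (held-out pairs as described in the next section): for each held-out (user, post) pair, take the percentile position of that post in the ranked list produced for the user and average over all pairs. Lower is better; random recommendations would score around 50%:

# Percentile position: 0.0 = recommended first, 1.0 = recommended last.
def expected_percentile_ranking(held_out_likes, ranked_lists):
    total, count = 0.0, 0
    for user, post in held_out_likes:
        ranking = ranked_lists[user]          # posts ordered best-first
        if post in ranking:
            total += ranking.index(post) / max(1, len(ranking) - 1)
            count += 1
    return total / count

ranked = {"u1": ["p3", "p1", "p2"]}
print(expected_percentile_ranking([("u1", "p2")], ranked))  # 1.0 (worst)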

Evaluation Framework

Generate training and test sets

In order to evaluate the gains obtained by the recommendation system when we start to improve it with more accurate algorithms, we will split the dataset we have available into training and test sets. The training set will contain document (blog post) and user action (likes) pairs as well as any information available about the documents contained in the training set. There is no additional information about the users besides the blog posts they have liked. The test set will be formed by a series of documents available to be recommended and a set of users to whom we need to make recommendations. This list of test set documents constitutes the Vespa content pool, which is the set of documents stored in Vespa that are available to be served to users. The user actions will be hidden from the test set and used later to evaluate the recommendations made by Vespa.

To create an application that more closely resembles the challenges faced by companies when building their recommendation systems, we decided to construct the training and test sets in such a way that:

  • There will be blog posts that had been liked in the training set by one set of users and that had also been liked in the test set by another set of users, even though this information will be hidden in the test set. Those cases are interesting for evaluating whether the exploitation (as opposed to exploration) component of the system is working well; that is, whether we are able to identify high quality blog posts based on the information available during training and exploit this knowledge by recommending those high quality blog posts to another set of users that might like them as well.
  • There will be blog posts in the test set that had never been seen in the training set. Those cases are interesting for evaluating how the system deals with the cold-start problem. Systems that are too biased towards exploitation will fail to recommend new and unexplored blog posts, leading to a feedback loop that will cause the system to focus on a small share of the available content.

A key challenge faced by recommender system designers is how to balance the exploitation/exploration components of their system, and our training/test set split outlined above will try to replicate this challenge in our application. Notice that this split is different from the approach taken by the Kaggle competition where the blog posts available in the test set had never been seen in the training set, which removes the exploitation component of the equation.

The Spark job uses trainPosts.json and creates the folders blog-job/training_set_ids and blog-job/test_set_ids containing files with post_id and user_id pairs:

$ cd blog-recommendation; export SPARK_LOCAL_IP="127.0.0.1"
$ spark-submit --class "com.yahoo.example.blog.BlogRecommendationApp" \
  --master local[4] ../blog-tutorial-shared/target/scala-*/blog-support*.jar \
  --task split_set --input_file ../trainPosts.json \
  --test_perc_stage1 0.05 --test_perc_stage2 0.20 --seed 123 \
  --output_path blog-job/training_and_test_indices
  • test_perc_stage1: The percentage of the blog posts that will be located only in the test set (exploration component).
  • test_perc_stage2: The percentage of the remaining (post_id, user_id) pairs that should be moved to the test set (exploitation component).
  • seed: seed value used in order to replicate results if required.

Compute user and item latent factors

Use the complete training set to compute user and item latent factors. We will leave the discussion about tuning and performance improvement of the model used to the section about model tuning and offline evaluation. Submit the Spark job to compute the user and item latent factors:

$ spark-submit --class "com.yahoo.example.blog.BlogRecommendationApp" \
  --master local[4] ../blog-tutorial-shared/target/scala-*/blog-support*.jar \
  --task collaborative_filtering \
  --input_file blog-job/training_and_test_indices/training_set_ids \
  --rank 10 --numIterations 10 --lambda 0.01 \
  --output_path blog-job/user_item_cf

Verify the vectors for the latent factors for users and posts:

$ head -1 blog-job/user_item_cf/user_features/part-00000 | python -m json.tool
{
    "user_id": 270,
    "user_item_cf": {
        "user_item_cf:0": -1.750116e-05,
        "user_item_cf:1": 9.730623e-05,
        "user_item_cf:2": 8.515047e-05,
        "user_item_cf:3": 6.9297894e-05,
        "user_item_cf:4": 7.343942e-05,
        "user_item_cf:5": -0.00017635927,
        "user_item_cf:6": 5.7642872e-05,
        "user_item_cf:7": -6.6685796e-05,
        "user_item_cf:8": 8.5506894e-05,
        "user_item_cf:9": -1.7209566e-05
    }
}
$ head -1 blog-job/user_item_cf/product_features/part-00000 | python -m json.tool
{
    "post_id": 20,
    "user_item_cf": {
        "user_item_cf:0": 0.0019320602,
        "user_item_cf:1": -0.004728486,
        "user_item_cf:2": 0.0032499845,
        "user_item_cf:3": -0.006453364,
        "user_item_cf:4": 0.0015929453,
        "user_item_cf:5": -0.00420313,
        "user_item_cf:6": 0.009350027,
        "user_item_cf:7": -0.0015649397,
        "user_item_cf:8": 0.009262732,
        "user_item_cf:9": -0.0030964287
    }
}

At this point, the vectors with latent factors can be added to posts and users.

Add vectors to search definitions using tensors

Modern machine learning applications often make use of large, multidimensional feature spaces and perform complex operations on those features, such as in large logistic regression and deep learning models. It is therefore necessary to have an expressive framework to define and evaluate ranking expressions of such complexity at scale.

Vespa comes with a Tensor framework, which unifies and generalizes scalar, vector and matrix operations, handles the sparseness inherent to most machine learning applications (most cases evaluated by the model lack values for most of the features) and allows models to be continuously updated. Additional information about the Tensor framework can be found in the tensor user guide.

We want to have those latent factors available in a Tensor representation to be used during ranking by the Tensor framework. A tensor field user_item_cf is added to blog_post.sd to hold the blog post latent factor:

field user_item_cf type tensor(user_item_cf[10]) {
	indexing: summary | attribute
	attribute: tensor(user_item_cf[10])
}

field has_user_item_cf type byte {
	indexing: summary | attribute
	attribute: fast-search
}

A new search definition user.sd defines a document type named user to hold information for users:

search user {
    document user {
        field user_id type string {
            indexing: summary | attribute
            attribute: fast-search
        }

        field has_read_items type array<string> {
            indexing: summary | attribute
        }

        field user_item_cf type tensor(user_item_cf[10]) {
            indexing: summary | attribute
            attribute: tensor(user_item_cf[10])
        }

        field has_user_item_cf type byte {
            indexing: summary | attribute
            attribute: fast-search
        }
    }
}

Where:

  • user_id: unique identifier for the user
  • user_item_cf: tensor that will hold the user latent factor
  • has_user_item_cf: flag to indicate the user has a latent factor

Join and feed data

Build and deploy the application:

Deploy the application (in the Docker container):

$ vespa-deploy prepare /vespa-sample-apps/blog-recommendation/target/application && \
  vespa-deploy activate

Wait for app to activate (200 OK):

$ curl -s --head http://localhost:8080/ApplicationStatus

The code to join the latent factors in blog-job/user_item_cf into blog_post and user documents is implemented in tutorial_feed_content_and_tensor_vespa.pig. After joining in the new fields, a Vespa feed is generated and fed to Vespa directly from Pig:

$ pig -Dvespa.feed.defaultport=8080 -Dvespa.feed.random.startup.sleep.ms=0 \
  -x local \
  -f ../blog-tutorial-shared/src/main/pig/tutorial_feed_content_and_tensor_vespa.pig \
  -param VESPA_HADOOP_JAR=../vespa-hadoop*.jar \
  -param DATA_PATH=../trainPosts.json \
  -param TEST_INDICES=blog-job/training_and_test_indices/testing_set_ids \
  -param BLOG_POST_FACTORS=blog-job/user_item_cf/product_features \
  -param USER_FACTORS=blog-job/user_item_cf/user_features \
  -param ENDPOINT=localhost

A successful data join and feed will output:

Input(s):
Successfully read 1196111 records from: "file:///Users/kraune/github/vespa-engine/sample-apps/trainPosts.json"
Successfully read 341416 records from: "file:///Users/kraune/github/vespa-engine/sample-apps/blog-recommendation/blog-job/training_and_test_indices/testing_set_ids"
Successfully read 323727 records from: "file:///Users/kraune/github/vespa-engine/sample-apps/blog-recommendation/blog-job/user_item_cf/product_features"
Successfully read 6290 records from: "file:///Users/kraune/github/vespa-engine/sample-apps/blog-recommendation/blog-job/user_item_cf/user_features"

Output(s):
Successfully stored 286237 records in: "localhost"

Sample blog post and user:

Ranking

Set up a rank function to return the best matching blog posts given some user latent factor. Rank the documents using the dot product between the user and blog post latent factors, i.e. the sum of the products of the elements of the query tensor and the blog post tensor — from blog_post.sd:

rank-profile tensor {
    first-phase {
        expression {
            sum(query(user_item_cf) * attribute(user_item_cf))
        }
    }
}

Configure the ranking framework to expect that query(user_item_cf) is a tensor, and that it is compatible with the attribute in a query profile type – see search/query-profiles/types/root.xml and search/query-profiles/default.xml:

<query-profile-type id="root" inherits="native">
    <field name="ranking.features.query(user_item_cf)" type="tensor(user_item_cf[10])" />
</query-profile-type>

<query-profile id="default" type="root" />

This configures a ranking feature named query(user_item_cf) with type tensor(user_item_cf[10]), which defines it as an indexed tensor with 10 elements. This is the same as the attribute, hence the dot product can be computed.

Query Vespa with a tensor

Test recommendations by sending a tensor with latent factors: localhost:8080/search/?yql=select%20*%20from%20sources%20blog_post%20where%20has_user_item_cf%20=%201;&ranking=tensor&ranking.features.query(user_item_cf)=%7B%7Buser_item_cf%3A0%7D%3A0.1%2C%7Buser_item_cf%3A1%7D%3A0.1%2C%7Buser_item_cf%3A2%7D%3A0.1%2C%7Buser_item_cf%3A3%7D%3A0.1%2C%7Buser_item_cf%3A4%7D%3A0.1%2C%7Buser_item_cf%3A5%7D%3A0.1%2C%7Buser_item_cf%3A6%7D%3A0.1%2C%7Buser_item_cf%3A7%7D%3A0.1%2C%7Buser_item_cf%3A8%7D%3A0.1%2C%7Buser_item_cf%3A9%7D%3A0.1%7D

The query string, decomposed:

  • yql=select * from sources blog_post where has_user_item_cf = 1 – this selects all documents of type blog_post which have a latent factor tensor
  • restrict=blog_post – search only in blog_post documents
  • ranking=tensor – use the rank-profile tensor in blog_post.sd.
  • ranking.features.query(user_item_cf) – send the tensor as user_item_cf. As this tensor is defined in the query-profile-type, the ranking framework knows its type (i.e. dimensions) and is able to do a dot product with the attribute of the same type. The tensor before URL-encoding (a sketch for building this request from Python follows the tensor below):

    {
     {user_item_cf:0}:0.1,
     {user_item_cf:1}:0.1,
     {user_item_cf:2}:0.1,
     {user_item_cf:3}:0.1,
     {user_item_cf:4}:0.1,
     {user_item_cf:5}:0.1,
     {user_item_cf:6}:0.1,
     {user_item_cf:7}:0.1,
     {user_item_cf:8}:0.1,
     {user_item_cf:9}:0.1
    }
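
For convenience, a small Python sketch that builds and URL-encodes the same request, rather than hand-writing the %7B/%7D escapes. It assumes the local endpoint used throughout this post:

import json
import urllib.parse
import urllib.request

# Build the tensor literal {{user_item_cf:0}:0.1, ...} programmatically.
cells = ",".join("{user_item_cf:%d}:0.1" % i for i in range(10))
params = {
    "yql": "select * from sources blog_post where has_user_item_cf = 1;",
    "ranking": "tensor",
    "ranking.features.query(user_item_cf)": "{" + cells + "}",
}
url = "http://localhost:8080/search/?" + urllib.parse.urlencode(params)
print(json.load(urllib.request.urlopen(url))["root"]["fields"]["totalCount"])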

Query Vespa with user id

The next step is to query Vespa by user id: look up the user profile for the user, get the tensor from it, and recommend documents based on this tensor (like the query in the previous section). The user profiles are fed to Vespa in the user_item_cf field of the user document type.

In short, set up a searcher to retrieve the user profile by user id — then run the query. When the Vespa Container receives a request, it will create a Query representing it and execute it through the configured chain of searchers.

Blog recommendation with neural network models

Update 2021-05-20:
This blog post refers to Vespa sample applications that do not exist anymore.
Please refer to the
News search and recommendation tutorial
for an updated version of text and sample applications.

Introduction

The main objective of this post is to show how to deploy neural network models in Vespa using our Tensor Framework. In fact, any model that can be represented by a series of Tensor operations can be deployed in Vespa; neural networks are just a popular example. In addition, we will introduce the multi-phase ranking model available in Vespa, which can be used to run more expensive models in a later phase, on a reduced number of documents returned by previous phases. This feature allows us to run models that would be prohibitively expensive to use if we had to run them at query-time across all the documents indexed in Vespa.

Model Training

In this section, we will define a neural network model, show how we created a suitable dataset to train the model and train the model using TensorFlow.

The neural network model

In the previous blog post, we computed latent factors for each user and each document and then used a dot-product between user and document vectors to rank the documents available for recommendation to a specific user. In this tutorial we will train a 2-layer fully connected neural network model that will take the same user (u) and document (d) latent factors as input and will output the probability of that specific user liking the document.

More technically, our previous rank function r was given by

r(u, d) = u · d

while in this tutorial it will be given by

r(u, d, θ) = f(u, d, θ)

where f represents the neural network model described below and θ is the set of neural network parameters that we need to learn from training data.

The specific form of the neural network model used here is

p = sigmoid(h1×W2+b2)
h1 = ReLU(x×W1+b1)

where x = [u, d] is the concatenation of the user and document latent factors, ReLU is the rectifier activation function, sigmoid represents the sigmoid function, and p is the output of the model, which in this case can be interpreted as the probability of user u liking blog post d. The parameters of the model are represented by θ = (W1, W2, b1, b2).
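
A minimal numpy sketch of this forward pass, with dimensions matching the constant tensors used later in this post (20 inputs = 10 user + 10 document latent factors, 40 hidden units, 1 output); the random values stand in for trained parameters:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(u, d, W1, b1, W2, b2):
    x = np.concatenate([u, d])                # x = [u, d]
    h1 = np.maximum(0.0, x @ W1 + b1)         # h1 = ReLU(x * W1 + b1)
    return sigmoid(h1 @ W2 + b2)              # p = sigmoid(h1 * W2 + b2)

u, d = np.random.rand(10), np.random.rand(10)
W1, b1 = np.random.rand(20, 40), np.random.rand(40)
W2, b2 = np.random.rand(40, 1), np.random.rand(1)
print(predict(u, d, W1, b1, W2, b2))          # probability of a "like"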

Training data

For the training dataset, we will start with the (user_id, post_id) rows from the “training_set_ids” generated previously. Then, we remove every row for which there are no latent factors for the user_id or post_id contained in that row. This gives us a dataset with only positive feedback (label = 1), since each row represents one instance of a user_id liking a post_id.

In order to train our model, we need to generate negative feedback (label = 0). So, for each row (user_id, post_id) in the current dataset we will generate N negative feedback rows by randomly sampling post_id_fake from the pool of post_id’s available in the current set, so that for each (user_id, post_id) row with label = 1 we will increase the dataset with N (user_id, post_id_fake) rows with label = 0.

Find code to generate the dataset in the utility scripts.
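
As a hedged illustration of the sampling scheme (the value of N and other details here are assumptions; the actual generation code is in the utility scripts):

import random

# For each positive (user_id, post_id) row, draw N fake post ids from the
# pool of posts in the current set to create label-0 rows.
def add_negative_samples(positive_rows, n_negative=5, seed=123):
    random.seed(seed)
    post_pool = list({post_id for _, post_id in positive_rows})
    dataset = [(u, p, 1) for u, p in positive_rows]
    for user_id, _ in positive_rows:
        for _ in range(n_negative):
            dataset.append((user_id, random.choice(post_pool), 0))
    return dataset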

Training with TensorFlow

With the training data in hand, we split it into an 80% training set and a 20% validation set and used TensorFlow to train the model. The script used can be found in the utility scripts and executed by:

$ python vespaModel.py --product_features_file_path vespa_tutorial_data/user_item_cf_cv/product.json \
                       --user_features_file_path vespa_tutorial_data/user_item_cf_cv/user.json \
                       --dataset_file_path vespa_tutorial_data/nn_model/training_set.txt

The progress of your training can be visualized using TensorBoard:

$ tensorboard --logdir runs/*/summaries/

Model deployment in Vespa

Two Phase Ranking

When a query is sent to Vespa, it will scan all documents available and select the ones (possibly all) that match the query. When the set of documents matching a query is found, Vespa must decide the order of these documents. Unless explicit sorting is used, Vespa decides this order by calculating a number for each document, the rank score, and sorts the documents by this number.

The rank score can be any function that takes as arguments parameters sent by the query, document attributes defined in search definitions, and global parameters not directly linked to query or document parameters. One example of a rank score is the output of the neural network model defined in this tutorial. The model takes the latent factor u associated with a specific user_id (query parameter), the latent factor d associated with document post_id (document attribute) and learned model parameters (global parameters not related to a specific query nor document) and returns the probability of user u liking document d.

However, even though Vespa is designed to carry out such calculations optimally, complex expressions become expensive when they must be calculated over every one of a large set of matching documents. To relieve this, Vespa can be configured to run two ranking expressions — a smaller and less accurate one on all hits during the matching phase, and a more expensive and accurate one only on the best hits during the reranking phase. In general this allows a more optimal usage of the CPU budget by dedicating more of the total CPU towards the best candidate hits.

The reranking phase, if specified, will by default be run on the 100 best hits on each search node, after matching and before information is returned upwards to the search container. The number of hits to rerank can be turned up or down as needed. Below is a toy example showing how to configure first and second phase ranking expressions in the rank profile section of search definitions where the second phase rank expression is run on the 200 best hits from first phase on each search node.

search myapp {

    …

    rank-profile default inherits default {

        first-phase {
            expression: nativeRank + query(deservesFreshness) * freshness(timestamp)
        }

        second-phase {
            expression {
                0.7 * ( 0.7*fieldMatch(title) + 0.2*fieldMatch(description) + 0.1*fieldMatch(body) ) +
                0.3 * attributeMatch(keywords)
            }
            rerank-count: 200
        }
    }
}

Constant Tensor files

Once the model has been trained in TensorFlow, export the model parameters (W1,W2,b1,b2) to the application folder as Tensors according to the Vespa Document JSON format.

The complete code to serialize the model parameters using the Vespa Tensor format can be found in the utility scripts, but the following code snippet shows how to serialize the hidden layer weights W1:

serializer.serialize_to_disk(variable_name = "W_hidden", dimension_names = ['input', 'hidden'])

Note that Vespa currently requires dimension names for all the Tensor dimensions (in this case W1 is a matrix, so it has two dimensions).

In the following section, we will use the code below in the blog_post search definition in order to be able to use the constant tensor W_hidden in our ranking expression.

    constant W_hidden {
        file: constants/W_hidden.json
        type: tensor(input[20],hidden[40])
    }

A constant tensor is data that is not specific to a given document type. In the case above we define W_hidden to be a tensor with two dimensions (matrix), where the first dimension is named input and has size 20 and second dimension is named hidden and has size 40. The data were serialized to a JSON file located at constants/W_hidden.json relative to the application package folder.
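
As a hedged sketch of what such serialization amounts to (the actual serializer is in the utility scripts), a dense matrix can be written out cell by cell in Vespa's documented constant tensor JSON form; the zeros here stand in for the trained weights:

import json
import os

# Write a matrix to Vespa's constant tensor JSON format, one cell per entry:
# {"cells": [{"address": {"input": "0", "hidden": "0"}, "value": ...}, ...]}
def serialize_matrix(W, dim_names, path):
    cells = []
    for i, row in enumerate(W):
        for j, value in enumerate(row):
            cells.append({
                "address": {dim_names[0]: str(i), dim_names[1]: str(j)},
                "value": float(value),
            })
    with open(path, "w") as f:
        json.dump({"cells": cells}, f)

# W_hidden has type tensor(input[20],hidden[40]) in the search definition.
os.makedirs("constants", exist_ok=True)
serialize_matrix([[0.0] * 40 for _ in range(20)], ["input", "hidden"],
                 "constants/W_hidden.json")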

Vespa ranking expressions

In order to evaluate the neural network model trained with TensorFlow in the previous section, we need to translate the model structure to a Vespa ranking expression to be defined in the blog_post search definition. To honor a low-latency response, we will take advantage of the Two Phase Ranking available in Vespa and define the first phase ranking to be the same ranking function used in the previous blog post, which is a dot-product between the user and document latent factors. After the documents have been sorted by the first phase ranking function, we will rerank the top 200 documents from each search node using the second phase ranking given by the neural network model presented above.

Note that we define two ranking profiles in the search definition below. This allows us to decide which ranking profile to use at query time. We define a ranking profile named tensor, which only applies the dot-product between user and document latent factors for all matching documents, and a ranking profile named nn_tensor, which reranks the top 200 documents using the neural network model discussed in the previous section.

We will walk through each part of the blog_post search definition, see blog_post.sd.

As always, we start the search definition with the following line:

    search blog_post {

We define the document type blog_post the same way we have done in the previous tutorial.

    document blog_post {

      # Field definitions
      # Examples:

      field date_gmt type string {
          indexing: summary
      }
      field language type string {
          indexing: summary
      }

      # Remaining fields as found in previous tutorial

    }

We define a ranking profile named tensor which ranks all the matching documents by the dot-product between the document latent factor and the user latent factor. This is the same ranking expression used in the previous tutorial, which includes code to retrieve the user latent factor based on the user_id sent by the query to Vespa.

    # Simpler ranking profile without
    # second-phase ranking
    rank-profile tensor {
      first-phase {
          expression {
              sum(query(user_item_cf) * attribute(user_item_cf))
          }
      }
    }

Since we want to evaluate the neural network model we have trained, we need to define where to find the model parameters (W1,W2,b1,b2). See the previous section for how to write the TensorFlow model parameters to Vespa Tensor format.

    # We need to specify the type and the location
    # of the files storing tensor values for each
    # Variable in our TensorFlow model. In this case,
    # W_hidden, b_hidden, W_final, b_final

    constant W_hidden {
        file: constants/W_hidden.json
        type: tensor(input[20],hidden[40])
    }

    constant b_hidden {
        file: constants/b_hidden.json
        type: tensor(hidden[40])
    }

    constant W_final {
        file: constants/W_final.json
        type: tensor(hidden[40], final[1])
    }

    constant b_final {
        file: constants/b_final.json
        type: tensor(final[1])
    }

Now, we specify a second rank-profile called nn_tensor that will use the same first phase as the rank-profile tensor, but will rerank the top 200 documents using the neural network model as the second phase. We refer to the Tensor Reference document for more information regarding the Tensor operations used in the code below.

    # rank profile with neural network model as
    # second phase
    rank-profile nn_tensor {

        # The input to the neural network is the
        # concatenation of the document and query vectors.

        macro nn_input() {
            expression: concat(attribute(user_item_cf), query(user_item_cf), input)
        }

        # Computes the hidden layer

        macro hidden_layer() {
            expression: relu(sum(nn_input * constant(W_hidden), input) + constant(b_hidden))
        }

        # Computes the output layer

        macro final_layer() {
            expression: sigmoid(sum(hidden_layer * constant(W_final), hidden) + constant(b_final))
        }


        # First-phase ranking:
        # Dot-product between user and document latent factors

        first-phase {
            expression: sum(query(user_item_cf) * attribute(user_item_cf))
        }

        # Second-phase ranking:
        # Neural network model based on the user and latent factors

        second-phase {
            rerank-count: 200
            expression: sum(final_layer)
        }

    }

}

Offline evaluation

We will now query Vespa and obtain 100 blog post recommendations for each user_id in our test set. Below, we query Vespa using the tensor ranking function, which contains the simpler ranking expression involving the dot-product between user and document latent factors.

pig -x local -f tutorial_compute_metric.pig \
  -param VESPA_HADOOP_JAR=vespa-hadoop.jar \
  -param TEST_INDICES=blog-job/training_and_test_indices/testing_set_ids \
  -param ENDPOINT=$(hostname):8080 \
  -param NUMBER_RECOMMENDATIONS=100 \
  -param RANKING_NAME=tensor \
  -param OUTPUT=blog-job/cf-metric

We perform the same query routine below, but now using the ranking profile nn_tensor, which reranks the top 200 documents using the neural network model.

pig -x local -f tutorial_compute_metric.pig \
  -param VESPA_HADOOP_JAR=vespa-hadoop.jar \
  -param TEST_INDICES=blog-job/training_and_test_indices/testing_set_ids \
  -param ENDPOINT=$(hostname):8080 \
  -param NUMBER_RECOMMENDATIONS=100 \
  -param RANKING_NAME=nn_tensor \
  -param OUTPUT=blog-job/cf-metric

The tutorial_compute_metric.pig script can be found in our repo.

Comparing the recommendations obtained by those two ranking profiles and our test set, we see that by deploying a more complex and accurate model in the second phase ranking, we increased the number of relevant documents (documents read by the user) retrieved from 11948 to 12804 (more than 7% increase) and those documents retrieved appeared higher up in the list of recommendations, as shown by the expected percentile ranking metric introduced in the Vespa tutorial pt. 2 which decreased from 37.1% to

Introducing TensorFlow support

In previous blog posts we have talked about Vespa’s tensor API which enables some advanced ranking capabilities. The primary use case is for machine learned ranking, where you train your models using some machine learning framework, convert the models to Vespa’s tensor format, and deploy them to Vespa. This works well, but converting trained models to Vespa form is cumbersome.

We are now happy to announce a new feature that makes this process a lot easier: TensorFlow import. With this feature you can directly deploy models you’ve trained in TensorFlow to Vespa, and use these models during ranking. This means that the models are executed in parallel over multiple threads and machines for a single query, which makes it possible to evaluate the model over any number of data items and still bound the total response time. In addition the data items to evaluate with the TensorFlow model can be selected dynamically with a query, and with a cheaper first-phase rank function if needed. Since the TensorFlow models are evaluated on the nodes storing the data, we avoid sending any data over the wire for evaluation.

In this post we’d like to introduce this new feature by discussing how it works, some assumptions behind working with TensorFlow and Vespa, and how to use the feature.

Vespa is optimized to evaluate models repeatedly over many data items (documents).  To do this efficiently, we do not evaluate the model using the TensorFlow inference engine. TensorFlow adds a non-trivial amount of overhead and instrumentation which it uses to manage potentially large scale computations. This is significant in our case, since we need to evaluate models on a micro-second scale. Hence our approach is to extract the parameters (weights) into Vespa tensors, and use the model specification in the TensorFlow graph to generate efficient Vespa tensor expressions.

Importing TensorFlow models is as simple as saving the TensorFlow model using the SavedModel API, adding those files to the Vespa application package, and referencing the model using the new TensorFlow ranking feature. For instance, if your files are in models/my_model in the application package:

first-phase {
    expression: sum(tensorflow("my_model/saved"))
}

The above expression runs the model and sums its output to a single scalar value to use in ranking. One thing you will have to provide is the input(s), or feed, to the graph. Vespa expects you to provide a macro with the same name as the input placeholder. In the macro you can specify where the input should come from, be it a parameter sent along with the query, a document field (possibly in a parent document) or a constant.

As mentioned, Vespa evaluates the imported models once per document. Depending on the requirements of the application, this can impose some natural limitations on the size and complexity of the models that can be evaluated. However, Vespa has a number of other search and rank features that can be used to reduce the search space before running the machine learned models. Typically, one would use the search and first ranking phases to select a relatively small number of candidate documents, which are then given their final rank score in the more computationally expensive second phase model evaluation.

Also note that TensorFlow import is new to Vespa, and we currently only support a subset of the TensorFlow operations. While the supported operations should suffice for many relevant use cases, there are some that are not supported yet due to potentially being too expensive to evaluate per document. For instance, convolutional networks and recurrent networks (LSTMs etc) are not supported. We are continually working to add functionality, if you find that we have some glaring omissions, please let us know.

Going forward we are focusing on further improving performance of our tensor framework for important use cases. We’ll follow up this post with one showing how the performance of evaluation in Vespa compares with TensorFlow serving. We will also add more supported frameworks and our next target is ONNX.

You can read more about this feature in the ranking with TensorFlow models in Vespa documentation. We are excited to announce the TensorFlow support, and we’re eager to hear what you are building with it.

Parent-child in Vespa

Parent-child relationships let you model hierarchical relations in your data. This blog post talks about why and how we added this feature to Vespa, and how you can use it in your own applications. We’ll show some performance numbers and discuss practical considerations.

Introduction

The shortest possible background

Traditional relational databases let you perform joins between tables. Joins enable efficient normalization of data through foreign keys, which means any distinct piece of information can be stored in one place and then referred to (often transitively), rather than duplicated everywhere it might be needed. This makes relational databases an excellent fit for a great number of applications.

However, if we require scalable, real-time data processing with millisecond latency our options become more limited. To see why, and to investigate how parent-child can help us, we’ll consider a hypothetical use case.

A grand business idea

Let’s assume we’re building a disruptive startup for serving the cutest possible cat picture advertisements imaginable. Advertisers will run multiple campaigns, each with their own set of ads. Since they will (of course) pay us for this privilege, campaigns will have an associated budget which we have to manage at serving time. In particular, we don’t want to serve ads for a campaign that has spent all its money, as that would be free advertising. We must also ensure that campaign budgets are frequently updated when their ads have been served.

Our initial, relational data model might look like this:

Advertiser:
    id: (primary key)
    company_name: string
    contact_person_email: string

Campaign:
    id: (primary key)
    advertiser_id: (foreign key to advertiser.id)
    name: string
    budget: int

Ad:
    id: (primary key)
    campaign_id: (foreign key to campaign.id)
    cuteness: float
    cat_picture_url: string

This data normalization lets us easily update the budgets for all ads in a single operation, which is important since we don’t want to serve ads for which there is no budget. We can also get the advertiser name for all individual ads transitively via their campaign.

Scaling our expectations

Since we’re expecting our startup to rapidly grow to a massive size, we want to make sure we can scale from day one. As the number of ad queries grow, we ideally want scaling up to be as simple as adding more server capacity.

Unfortunately, scaling joins beyond a single server is a significant design and engineering challenge. As a consequence, most of the new data stores released in the past decade have been of the “NoSQL” variant (which might also be called “non-relational databases”). NoSQL’s horizontal scalability is usually achieved by requiring an application developer to explicitly de-normalize all data. This removes the need for joins altogether. For our use case, we have to store budget and advertiser name across multiple document types and instances (note the duplicated fields below):

Advertiser:
    id: (primary key)
    company_name: string
    contact_person_email: string

Campaign:
    id: (primary key)
    advertiser_company_name: string
    name: string
    budget: int

Ad:
    id: (primary key)
    campaign_budget: int
    campaign_advertiser_company_name: string
    cuteness: float
    cat_picture_url: string

Now we can scale horizontally for queries, but updating the budget of a campaign requires updating all its ads. This turns an otherwise O(1) operation into O(n), and we likely have to implement this update logic ourselves as part of our application. We’ll be expecting thousands of budget updates to our cat ad campaigns per second. Multiplying this by an unknown number is likely to overload our servers or lose us money. Or both at the same time.

A pragmatic middle ground

In the middle between these two extremes of “arbitrary joins” and “no joins at all” we have parent-child relationships. These enable a subset of join functionality, but with enough restrictions that they can be implemented efficiently at scale. One core restriction is that your data relationships must be possible to represent as a directed, acyclic graph (DAG).

As it happens, this is the case with our cat picture advertisement use case; Advertiser is a parent to 0-n Campaigns, each of which in turn is a parent to 0-n Ads. Being able to represent this natively in our application would get us functionally very close to the original, relational schema.

We’ll see very shortly how this can be directly mapped to Vespa’s parent-child feature support.

Parent-child support in Vespa

Creating the data model

Vespa’s fundamental data model is that of documents. Each document belongs to a particular schema and has a user-provided unique identifier. Such a schema is known as a document type and is specified in a search definition file. A document may have an arbitrary number of fields of different types. Some of these may be indexed, some may be kept in memory, all depending on the schema. A Vespa application may contain many document types.

Here’s how the Vespa equivalent of the above denormalized schema could look (again note where we’re duplicating information):

advertiser.sd:
    search advertiser {
        document advertiser {
            field company_name type string {
                indexing: attribute | summary
            }
            field contact_person_email type string {
                indexing: summary
            }
        }
    }

campaign.sd:
    search campaign {
        document campaign {
            field advertiser_company_name type string {
                indexing: attribute | summary
            }
            field name type string {
                indexing: attribute | summary
            }
            field budget type int {
                indexing: attribute | summary
            }
        }
    }

ad.sd:
    search ad {
        document ad {
            field campaign_budget type int {
                indexing: attribute | summary
                attribute: fast-search
            }
            field campaign_advertiser_company_name type string {
                indexing: attribute | summary
            }
            field cuteness type float {
                indexing: attribute | summary
                attribute: fast-search
            }
            field cat_picture_url type string {
                indexing: attribute | summary
            }
        }
    }

Note that since all documents in Vespa must already have a unique ID, we do not need to model the primary key IDs explicitly.

We’ll now see how little it takes to change this to its normalized equivalent by using parent-child.

Parent-child support adds two new types of declared fields to Vespa: references and imported fields.

A reference field contains the unique identifier of a parent document of a given document type. It is analogous to a foreign key in a relational database, or a pointer in Java/C++. A document may contain many reference fields, with each potentially referencing entirely different documents.

We want each ad to reference its parent campaign, so we add the following to ad.sd:

    field campaign_ref type reference<campaign> {
        indexing: attribute
    }

We also add a reference from a campaign to its advertiser in campaign.sd:

    field advertiser_ref type reference<advertiser> {
        indexing: attribute
    }

Since a reference just points to a particular document, it cannot be directly used in queries. Instead, imported fields are used to access a particular field within a referenced document. Imported fields are virtual; they do not take up any space in the document itself and they cannot be directly written to by put or update operations.

Add to search campaign in campaign.sd:

    import field advertiser_ref.company_name as campaign_company_name {}

Add to search ad in ad.sd:

    import field campaign_ref.budget as ad_campaign_budget {}

You can import a parent field which itself is an imported field. This enables transitive field lookups.

Add to search ad in ad.sd:

    import field campaign_ref.campaign_company_name as ad_campaign_company_name {}

After removing the now redundant fields, our normalized schema looks like this:

advertiser.sd:
    search advertiser {
        document advertiser {
            field company_name type string {
                indexing: attribute | summary
            }
            field contact_person_email type string {
                indexing: summary
            }
        }
    }

campaign.sd:
    search campaign {
        document campaign {
            field advertiser_ref type reference<advertiser> {
                indexing: attribute
            }
            field name type string {
                indexing: attribute | summary
            }
            field budget type int {
                indexing: attribute | summary
            }
        }
        import field advertiser_ref.company_name as campaign_company_name {}
    }

ad.sd:
    search ad {
        document ad {
            field campaign_ref type reference<campaign> {
                indexing: attribute
            }
            field cuteness type float {
                indexing: attribute | summary
                attribute: fast-search
            }
            field cat_picture_url type string {
                indexing: attribute | summary
            }
        }
        import field campaign_ref.budget as ad_campaign_budget {}
        import field campaign_ref.campaign_company_name as ad_campaign_company_name {}
    }

Feeding with references

When feeding documents to Vespa, references are assigned like any other string field:

[
    {
        "put": "id:test:advertiser::acme",
        "fields": {
            "company_name": "ACME Inc. cats and rocket equipment",
            "contact_person_email": "[email protected]"
        }
    },
    {
        "put": "id:acme:campaign::catnip",
        "fields": {
            "advertiser_ref": "id:test:advertiser::acme",
            "name": "Most excellent catnip deals",
            "budget": 500
        }
    },
    {
        "put": "id:acme:ad::1",
        "fields": {
            "campaign_ref": "id:acme:campaign::catnip",
            "cuteness": 100.0,
            "cat_picture_url": "/acme/super_cute.jpg"
        }
    }
]
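
As a usage sketch, assuming the puts above are saved as feed.json and Vespa runs locally as in the quick start guide (the exact jar location may vary between installations), the documents can be fed with the vespa-http-client:

$ java -jar $VESPA_HOME/lib/jars/vespa-http-client-jar-with-dependencies.jar \
    --file feed.json --host localhost --port 8080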

We can efficiently update the budget of a single campaign, immediately affecting all its child ads:

[
    {
        "update": "id:acme:campaign::catnip",
        "fields": {
            "budget": {
                "assign": 450
            }
        }
    }
]

Querying using imported fields

You can use imported fields in queries as if they were regular fields. Here are some examples using YQL:

Find all ads that still have a budget left in their campaign:

select * from ad where ad_campaign_budget > 0

Find all ads that have less than $500 left in their budget and belong to an advertiser whose company name contains the word “ACME”:

select * from ad where ad_campaign_budget < 500 and ad_campaign_company_name contains "ACME"

Note that imported fields are not part of the default document summary, so you must add them explicitly to a separate summary if you want their values returned as part of a query result:

document-summary my_ad_summary {
    summary ad_campaign_budget type int {}
    summary ad_campaign_company_name type string {}
    summary cuteness type float {}
    summary cat_picture_url type string {}
}

Add summary=my_ad_summary as a query HTTP request parameter to use it.
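
For example, here is a sketch of a request using this summary (with the YQL left unencoded for readability):

GET /search/?yql=select * from ad where ad_campaign_budget > 0;&summary=my_ad_summary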

Global documents

One of the primary reasons why distributed, generalized joins are so hard to implement efficiently is that performing a join on node A might require looking at data that is only found on node B (or node C, or D…). Vespa gets around this problem by requiring that all documents that may be joined against are always present on every single node. This is achieved by marking parent documents as global in the services.xml declaration. Global documents are automatically distributed to all nodes in the cluster. In our use case, both advertisers and campaigns are used as parents:

<documents>
    <document mode="index" type="advertiser" global="true"/>
    <document mode="index" type="campaign" global="true"/>
    <document mode="index" type="ad"/>
</documents>

You cannot deploy an application containing reference fields pointing to non-global document types. Vespa verifies this at deployment time.

Performance

Feeding of campaign budget updates

Scenario: feed 2 million ad documents to a single-node content cluster 4 times, each time with a different ratio of ads to parent campaigns. Treat 1:1 (2 million ads, 2 million campaigns) as the baseline. Relative speedup is the reduction in total feed time needed to update the budget of all ads, accounting both for the smaller number of campaign documents to feed and for the measured put rate.

Results

  • 1 ad per campaign: 35000 campaign puts/second
  • 10 ads per campaign: 29000 campaign puts/second, 8.2x relative speedup
  • 100 ads per campaign: 19000 campaign puts/second, 54x relative speedup
  • 1000 ads per campaign: 6200 campaign puts/second, 177x relative speedup

For example, at 10 ads per campaign, updating the budget of all 2 million ads requires feeding only 200,000 campaign puts at 29,000 puts/second rather than 2 million ad puts at 35,000 puts/second: (2,000,000 / 35,000) / (200,000 / 29,000) ≈ 8.2x. Note that there is some cost associated with higher fan-outs due to internal management of parent-child mappings, so the speedup is not linear with the fan-out.

Searching on ads based on campaign budgets

Scenario: we want to search for all ads having a specific budget value. First measure with all ad budgets denormalized, then using an imported budget field from the ads’ referenced campaign documents. As with the feeding benchmark, we’ll use 1, 10, 100 and 1000 ads per campaign with a total of 2 million ads combined across all campaigns. Measure average latency over 5 runs.

In each case, the budget attribute is declared as fast-search, which means it has a B-tree index. This allows for efficient value and range searches.

Results

  • 1 ad per campaign: denormalized 0.742 ms, imported 0.818 ms, 10.2% slowdown
  • 10 ads per campaign: denormalized 0.986 ms, imported 1.186 ms, 20.2% slowdown
  • 100 ads per campaign: denormalized 0.830 ms, imported 0.958 ms, 15.4% slowdown
  • 1000 ads per campaign: denormalized 0.936 ms, imported 0.922 ms, 1.5% speedup

The observed speedup for the biggest fan-out is likely an artifact of measurement noise.

We can see that although there is generally some latency overhead when searching on an imported field instead of a denormalized one, it stays within roughly 10–20% in these measurements, a modest price for the much cheaper campaign updates shown in the feeding benchmark.

Introducing ONNX support

ONNX (Open Neural Network eXchange) is an open format for the sharing of neural network and other machine learned models between various machine learning and deep learning frameworks. As the open big data serving engine, Vespa aims to make it simple to evaluate machine learned models at serving time at scale. By adding ONNX support in Vespa in addition to our existing TensorFlow support, we’ve made it possible to evaluate models from all the commonly used ML frameworks with low latency over large amounts of data.

Introduction

With the rise of deep learning in the last few years, we’ve naturally enough seen an increase in the number of deep learning frameworks as well: TensorFlow, PyTorch/Caffe2, MxNet etc. One reason so many frameworks exist is that each has been developed and optimized around some characteristic, such as fast training on distributed systems or GPUs, or efficient evaluation on mobile devices. Previously, complex projects with non-trivial data pipelines have been unable to pick the best framework for any given subtask due to the lack of interoperability between these frameworks. ONNX is a solution to this problem.

ONNX

ONNX is an open format for AI models, and represents an effort to push open standards in AI forward. The goal is to help increase the speed of innovation in the AI community by enabling interoperability between different frameworks and thus streamlining the process of getting models from research to production.

There is one commonality between the frameworks mentioned above that enables an open format such as ONNX, and that is that they all make use of dataflow graphs in one way or another. While there are differences between each framework, they all provide APIs enabling developers to construct computational graphs and runtimes to process these graphs. Even though these graphs are conceptually similar, each framework has been a siloed stack of API, graph and runtime. The goal of ONNX is to empower developers to select the framework that works best for their project, by providing an extensible computational graph model that works as a common intermediate representation at any stage of development or deployment.

Vespa is an open source project which fits well within such an ecosystem, and we aim to make the process of deploying and serving models trained in any framework as smooth as possible. Vespa is optimized toward serving and evaluating over potentially very large datasets while still responding in real time. In contrast to other ML model serving options, Vespa can more efficiently evaluate models over many data points. As such, Vespa is an excellent choice when combining model evaluation with serving of various types of content.

Our ONNX support is quite similar to our TensorFlow support. Importing ONNX models is as simple as adding the model to the Vespa application package (under “models/”) and referencing the model using the new ONNX ranking feature:

    expression: sum(onnx("my_model.onnx"))

The above expression runs the model and sums its output to a single scalar value to use in ranking. You have to provide the inputs to the graph: Vespa expects a macro with the same name as each input tensor. In the macro you specify where the input should come from, be it a document field, a constant or a parameter sent along with the query. More information can be found in the documentation about ONNX import.
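
As a minimal sketch of how that fits together (the profile, input and attribute names below are hypothetical, not taken from the documentation), a rank profile could wire a document tensor to the model input like this:

    rank-profile my_onnx_profile {
        # Macro named after the ONNX graph's input tensor
        # ("model_input" is an assumed input name); Vespa calls it
        # to resolve that input, here from a tensor attribute.
        macro model_input() {
            expression: attribute(my_tensor_field)
        }
        first-phase {
            expression: sum(onnx("my_model.onnx"))
        }
    }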

Internally, Vespa converts the ONNX operations to Vespa’s tensor API. We do the same for TensorFlow import, so the cost of evaluating ONNX and TensorFlow models is the same. We have put a lot of effort into optimizing tensor evaluation, so evaluating neural network models can be quite efficient.

ONNX support is also quite new to Vespa, so we do not yet support all current ONNX operations. Part of the reason is that some are potentially too expensive to evaluate per document, such as convolutional neural networks and recurrent networks (LSTMs etc). ONNX also contains an extension, ONNX-ML, with additional operations for non-neural-network cases; support for this extension will come at a later point. We are continually adding functionality, so please reach out to us if there is something you would like to have added.

Going forward, we will keep improving performance and supporting more of the ONNX (and ONNX-ML) standard. You can read more about ranking with ONNX models in the Vespa documentation. We are excited to announce ONNX support. Let us know what you are building with it!

Introducing JSON queries

We recently introduced a new addition to the Search API – JSON queries. A search request can now be executed with an HTTP POST that includes the query parameters in its payload. Along with this new request format we also introduce the SELECT parameter with the sub-parameters WHERE and GROUPING, a JSON equivalent of YQL.

The new query

With the Search API’s newest addition, it is now possible to send queries with HTTP POST. The query parameters have been moved out of the URL and into the POST request body, so URL-encoding is no longer needed. You also avoid having every query appear in the access log, which can be an advantage.

This is what a GET query looks like:

GET /search/?param1=value1&param2=value2&...

The general form of the new POST query is:

POST /search/ { param1 : value1, param2 : value2, ... }

The dot-notation is gone; query parameters that share a prefix (such as ranking.profile and ranking.matchPhase.maxHits) are now nested under a common key instead.

Let’s take this query:

GET /search/?yql=select+%2A+from+sources+%2A+where+default+contains+%22bad%22%3B&ranking.queryCache=false&ranking.profile=vespaProfile&ranking.matchPhase.ascending=true&ranking.matchPhase.maxHits=15&ranking.matchPhase.diversity.minGroups=10&presentation.bolding=false&presentation.format=json&nocache=true

and write it in the new POST request-format, which will look like this:

POST /search/
{
    "yql": "select * from sources * where default contains \"bad\";",
    "ranking": {
        "queryCache": "false",
        "profile": "vespaProfile",
        "matchPhase": {
            "ascending": "true",
            "maxHits": 15,
            "diversity": {
                "minGroups": 10
            }
        }
    },
    "presentation": {
        "bolding": "false",
        "format": "json"
    },
    "nocache": true
}
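
As a usage sketch, assuming Vespa runs locally on port 8080 as in the quick start guide, such a request can be sent with curl:

$ curl -X POST -H "Content-Type: application/json" \
    --data '{"yql": "select * from sources * where default contains \"bad\";"}' \
    http://localhost:8080/search/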

With Vespa running (see Quick Start or
Blog Search Tutorial),
you can try building POST-queries with the new querybuilder GUI at http://localhost:8080/querybuilder/, which can help you build queries with e.g. autocompletion of YQL.

The Select-parameter

The SELECT-parameter is used with POST queries and is the JSON equivalent of YQL, so the two cannot be used together. If both are present, the query-parameter overwrites SELECT and decides the query’s querytree.

Where

The SQL-like syntax is gone and the tree-syntax has been enhanced. If you’re used to the query-parameter syntax you’ll feel right at home with this new language. YQL is a regular language which Vespa parses into a query-tree, and you can now build that tree directly in the WHERE-parameter with JSON. Let’s take this YQL: select * from sources * where default contains foo and rank(a contains "A", b contains "B");

You can build the equivalent query-tree with the WHERE-parameter, like this:

{
    "and" : [
        { "contains" : ["default", "foo"] },
        { "rank" : [
            { "contains" : ["a", "A"] },
            { "contains" : ["b", "B"] }
        ]}
    ]
}

This is equivalent to the YQL above.

Grouping

The grouping can now be written in JSON with explicit structure, instead of on a single line. Curly brackets replace parentheses to express the tree structure of the grouping/aggregation functions, and colons assign function arguments.

A grouping that groups first by year and then by month can be written like this:

all(group(time.year(a)) each(output(count())
    all(group(time.monthofyear(a)) each(output(count())))))

and equivalently with the new GROUPING-parameter:

"grouping" : [
    {
        "all" : {
            "group" : "time.year(a)",
            "each" : { "output" : "count()" },
            "all" : {
                "group" : "time.monthofyear(a)",
                "each" : { "output" : "count()" }
            }
        }
    }
]
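
Putting the pieces together, here is a sketch of a POST body nesting the examples above under SELECT (based on the WHERE and GROUPING sub-parameters introduced in this post):

POST /search/
{
    "select": {
        "where": {
            "contains": ["default", "foo"]
        },
        "grouping": [
            {
                "all": {
                    "group": "time.year(a)",
                    "each": { "output": "count()" }
                }
            }
        ]
    }
}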

Wrapping it up

In this post we have provided a gentle introduction to the new Vespa POST query feature and the SELECT-parameter. You can read more about writing POST queries in the Vespa documentation. More examples of POST queries can be found in the Vespa tutorials.

Please share experiences. Happy searching!

Vespa 7 is released!

This week we rolled the major version of Vespa over from 6 to 7.

The releases we make public already run a large number of high-traffic production applications on our Vespa cloud,
and the version 7 releases are no exception.

There are no new features in version 7, since we release all new features incrementally on minor versions.
Instead, the major version change marks the point where we remove legacy features that were marked as deprecated
and change some default settings.
We only do this on major version changes, as Vespa uses semantic versioning.

Before upgrading,
go through the list of changes in the release notes
to make sure your application and usage are ready.
Upgrading can be done by following the regular
live upgrade procedure.

Vespa use case: shopping

Imagine you are tasked with creating a shopping website. How would you proceed? What tools and technologies would you choose? You need a technology that allows you to create data-driven navigational views as well as search and recommend products. It should be really fast, and able to scale easily as your site grows, both in number of visitors and products. And because good search relevance and product recommendation drives sales, it should be possible to use advanced features such as machine-learned ranking to implement such features.

Vespa – the open source big data serving engine – allows you to implement all these use cases in a single backend. As it is a general engine for low latency computation, it can be hard to know where to start. To help with that, we have provided a detailed shopping use case with a sample application.

This sample application contains a fully-functional shopping-like front-end with reasonably advanced functionality right out of the box, including sample data. While this is an example of a searchable product catalog, with customization it could be used for other application types as well, such as video and social sites.

The features highlighted in this use case are:

  • Grouping – used for instance in search to aggregate the results of the query into categories, brands, item ratings and price ranges.
  • Partial update – used in liking product reviews (see the sketch after this list).
  • Custom document processors – used to intercept the feeding of product reviews to update the product itself.
  • Custom handlers and configuration – used to power the front-end of the site.
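
As a sketch of what such a partial update could look like (the document id and field name here are hypothetical, not taken from the sample application), liking a review is a single update operation rather than a full re-feed of the document:

[
    {
        "update": "id:shopping:review::review-1",
        "fields": {
            "upvotes": {
                "increment": 1
            }
        }
    }
]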

The goal is to start a new series of example applications that each showcase different features of Vespa in the context of practical applications. The use cases can be used as starting points for new applications, as they contain fully-functional Vespa application packages, including sample data for getting started quickly.


The use cases come in addition to the quick start guide, which gives a very basic introduction to getting up and running with Vespa, and the tutorial, which is much more in-depth. With the use case series we want to fill the gap between these two with something closer to the practical problems users want to solve with Vespa.

Take a look for yourself. More information can be found at https://docs.vespa.ai/en/use-case-shopping.html