Internship at Vespa | Vespa Blog

Mathias Chunnoo

Intern I, Summer of 2021

Ole-Magnus Vian Norum

Intern II, Summer of 2021


Photo by Charles Deluvio on Unsplash

Over the course of the summer, we, the interns, have gotten to explore the Vespa
engine and the workings of the company. At the start of our internship we got
an introduction to the company and the tools it uses. To get
familiar with the Vespa engine we went through the getting-started tutorial,
where we built a news recommendation system.

During our internship we got to work on many different things, but
our two main projects were to use Vespa to implement two sample
applications for searching through the Vespa documentation. These two sample
apps were named
search-as-you-type
and
search-suggestions.

Search-as-you-type

The search-as-you-type application aims to implement an interface where results
are displayed live while the user is typing in the search bar. This requires
the search to both generate hits on incomplete words and to retrieve these hits
as close to instantly as possible. A substring search would fit the need for
incomplete-word searches; however, for large corpora it would not meet the
performance requirement. Our solution instead uses n-grams (groups of n
characters) to simulate a substring-like search. The idea is to search for
n-grams and rank hits where the n-grams are bunched up together higher than
hits where the n-grams are spread throughout the document. After trying
various configurations we found that 3-grams in combination with Vespa's
nativeRank fit our needs
very well. In addition, we combined this with indexed search so that if the
search string consists of complete words, the indexed search hits rank
higher than the n-gram search hits.

schema doc {
    field gram_content type string {
        indexing: input content | index | summary
        match {
            gram
            gram-size: 3
        }
        summary: dynamic
    }
...
    document doc {
...
        field content type string {
            indexing: index | summary
            summary: dynamic
            stemming: best
        }
...
    rank-profile weighted_doc_rank inherits default {
        rank-properties {
            $contentWeight: 10.0
            $gramContentWeight: 1.0
        }
        first-phase {
            expression {
                query(contentWeight) * nativeRank(content)
                + query(gramContentWeight) * nativeRank(gram_content)
            }
        }
    }
}
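
To give an idea of how such a schema can be queried, here is a rough sketch of a query against the rank profile above, passing the rank-property weights as query features. The endpoint, host and exact YQL are assumptions, not the code from the sample app.

// Sketch: query both the indexed and the n-gram field (assumed local endpoint).
const SEARCH_URL = "http://localhost:8080/search/";

async function searchAsYouType(userInput: string) {
  const params = new URLSearchParams({
    yql: 'select * from doc where ({defaultIndex: "content"}userInput(@q)) or ({defaultIndex: "gram_content"}userInput(@q))',
    q: userInput,
    "ranking.profile": "weighted_doc_rank",
    // The weights default to the rank-properties above, but can be overridden per query:
    "ranking.features.query(contentWeight)": "10.0",
    "ranking.features.query(gramContentWeight)": "1.0",
  });
  const response = await fetch(`${SEARCH_URL}?${params}`);
  const result = await response.json();
  return result.root?.children ?? []; // the ranked hits, if any
}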

After the Vespa application was in place, we needed an actual search bar for
the search-as-you-type to take place. This was implemented by incorporating a
simple Java server for a static web page into the Vespa application and writing
some JavaScript to query the Vespa application every time a character was entered
into the search bar. In addition, a debounce function was used to avoid race
conditions caused by simultaneous query requests.
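
The debounce itself is a small piece of front-end code; a minimal sketch (reusing the searchAsYouType sketch above, with the element id, delay and render callback as assumptions) could look like this. It delays the query until the user has paused typing, so only the latest input triggers a request.

// Minimal debounce sketch: run fn only after delayMs of inactivity.
function debounce<T extends unknown[]>(fn: (...args: T) => void, delayMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Assumed usage: only query Vespa once the user has stopped typing for 150 ms.
const searchBar = document.querySelector<HTMLInputElement>("#searchbar")!; // hypothetical element id
const debouncedSearch = debounce((input: string) => {
  searchAsYouType(input).then(showResults); // showResults is a hypothetical render callback
}, 150);

searchBar.addEventListener("input", () => debouncedSearch(searchBar.value));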

Search-suggestion

The idea behind the search-suggestions application is to suggest possible search
terms to the user before they have typed out their whole query. In our
implementation these suggestions come either from the document texts the users are
searching through or from previous searches performed by other users.

The first iteration of the search-suggestions application fed new search terms
by “put”, which resulted in storing duplicates of the same terms. To
calculate relevance and single out terms, the generated hits were grouped and
counted. This was not a scalable solution, as more data for the same search
terms would result in a linear increase in both storage space and processing time
for each query. To solve this we switched to feeding by “update”: unseen terms
are added, and their “query_count” field is incremented when a previously seen
term is processed.

[
  {
    "update": "id:term:term::example",
    "create": true,
    "fields": {
      "term": {
        "assign": "example"
      },
      "corpus_count": {
        "assign": 181
      },
      "document_count": {
        "assign": 40
      }
    }
  },
  {
    "update": "id:term:term::example",
    "create": true,
    "fields": {
      "term": { "assign": "example" },
      "query_count": { "increment": 1 }
    }
  }
]
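
Programmatically, the same upsert can be sent to Vespa's /document/v1 API as a partial update with create=true. A hedged sketch (local endpoint assumed, namespace and fields as in the JSON above):

// Sketch: record a searched term by upserting it via /document/v1 (assumed local endpoint).
const DOC_API = "http://localhost:8080/document/v1/term/term/docid";

async function recordQueryTerm(term: string): Promise<void> {
  const response = await fetch(`${DOC_API}/${encodeURIComponent(term)}?create=true`, {
    method: "PUT", // a PUT with "fields" is a partial update; create=true makes it an upsert
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      fields: {
        term: { assign: term },
        query_count: { increment: 1 },
      },
    }),
  });
  if (!response.ok) {
    throw new Error(`Feeding term "${term}" failed with status ${response.status}`);
  }
}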

Since we were going to use queries written by users as suggestions, we had to
implement some form of moderation of what could be suggested. To solve this
problem we made a list of allowed terms and used a document processor to filter
out any documents containing terms not in the list. We chose to generate
the allow-list from every word used in the documentation text. This way all
relevant terms could be suggested, while anything that could be seen as
offensive or otherwise irrelevant would not come up as a suggestion: since it
was not contained in the documentation text, it would be blocked by the document
processor and never fed.
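
The document processor itself is a Java component in the Vespa application; the sketch below only illustrates the allow-list idea (names are made up), not the actual processor.

// Illustration of the allow-list check performed by the document processor.
const allowedTerms = new Set<string>(/* every word occurring in the documentation text */);

// A candidate suggestion is only fed if every word in it appears in the documentation.
function isAllowedSuggestion(suggestion: string): boolean {
  return suggestion
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .every((word) => allowedTerms.has(word));
}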

For its first iteration, the search-suggestions application used Vespa's
streaming search with prefix matching to find documents with matching
prefixes. After a presentation of the application and some discussion, we
suspected that streaming search would not scale as the number of
concurrent users increased. To test this we benchmarked the
application using fbench (benchmark results).
As suspected, the performance of streaming search decreased drastically as
the number of concurrent users increased. We decided to change the application
to use indexed prefix search, and a comparison benchmark confirmed that this
implementation scaled much better than streaming search.
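
A prefix query against the term documents can be expressed with the prefix annotation in YQL. A rough sketch (endpoint assumed, and real code should escape the user input properly):

// Sketch: fetch suggestions whose term starts with what the user has typed so far.
async function suggest(prefix: string): Promise<string[]> {
  const yql = `select * from term where term contains ({prefix: true}"${prefix}")`;
  const params = new URLSearchParams({ yql, hits: "10" });
  const response = await fetch(`http://localhost:8080/search/?${params}`);
  const result = await response.json();
  return (result.root?.children ?? []).map((hit: any) => hit.fields.term as string);
}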

As with the search-as-you-type application, we incorporated a Java server for a
static web page into the Vespa application and wrote some JavaScript for
querying suggestions on every input and showing these suggestions in a dropdown
under the search bar.

We also integrated the two sample applications we made into the existing sample
application
vespa-documentation-search,
which is deployed to Vespa Cloud. This deployment now also powers the search
suggestions in the search bar on the Vespa documentation site.

AWS Lambda

One of the goals of the search-suggestions application was to favor terms
that had previously been searched for. To accomplish this we decided to create an AWS
Lambda function which would read query logs and feed search terms from these
back into the Vespa application. The reason for this was that the query logs
were stored in AWS S3 buckets, which would make it possible to
trigger the Lambda function continuously and process future query logs. The
biggest problem we faced when writing this Lambda function was decompressing
the logs. Vespa stores its query logs compressed with zstd, and
finding a zstd library usable in an AWS Lambda context was not straightforward.
Initially a lot of time was spent learning AWS SAM and deploying a Docker image
to the Lambda function, as this would let us use native C++ libraries in the
Lambda. However, we later found a library compiled to WebAssembly that
let us decompress the files with just a Node.js Lambda function.
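
In outline, the Lambda does three things: fetch a compressed query log from S3, decompress it, and feed the extracted terms back into the search-suggestions application. The sketch below shows that shape; the zstd import is a placeholder for whatever WebAssembly zstd library is available, the log parser is hypothetical, and recordQueryTerm is the partial-update helper sketched earlier.

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
// Placeholder import: any zstd decompression library that runs in Node.js.
import { decompress } from "some-wasm-zstd-library";

// Hypothetical parser for the query log format; the real one depends on the log layout.
declare function extractSearchTerm(logLine: string): string | undefined;

const s3 = new S3Client({});

// Lambda handler, assumed to be triggered with the bucket and key of a new query log.
export async function handler(event: { bucket: string; key: string }) {
  const object = await s3.send(new GetObjectCommand({ Bucket: event.bucket, Key: event.key }));
  const compressed = Buffer.from(await object.Body!.transformToByteArray());
  const logText = Buffer.from(decompress(compressed)).toString("utf-8");

  // Each log line describes one query; pull out the search term and feed it back to Vespa.
  for (const line of logText.split("\n").filter((l) => l.length > 0)) {
    const term = extractSearchTerm(line);
    if (term) {
      await recordQueryTerm(term); // partial update with create=true, as sketched earlier
    }
  }
}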

Other projects

While working on the main projects, we also got to work on smaller side
projects. One of these was to implement a visualization of
proton’s memory usage. This view was
created using React and incorporated into the Vespa Console. We also got to
work with Vespa’s performance tests and moved some of the private performance
tests over to the public open-source repository by changing the document sets so
that they did not use private or sensitive data.

The experience at Vespa

At the start of the internship it felt a bit daunting to make our own sample
applications, given that the Vespa engine was something completely new to us.
Even after the getting-started tutorial many things were still unclear, but as we
started working on the sample applications, more and more became clear as
we dove into the documentation and previously written code in order to
build the applications from scratch.

Since there is so much you can do with Vespa, it was at times hard to find the answers
to our questions in the documentation. Not necessarily because they did not exist, but
because we did not know what to search for to find the right documents. This led us to
sometimes use Vespa's public GitHub repositories to find answers to our questions.
Even when we did not find the answers we were looking for, we never felt
lost, as there was always someone ready to help us out when we got stuck.

During our internship at Vespa we got to learn a lot about the Vespa search
engine and information retrieval in general. We have especially learned
different methods for searching with incomplete queries and for query processing.
We have also gotten a feel for what it is like to work in a software company through
daily stand-ups and presentations of our projects.

Working here has given us insight into the workflow and GitHub etiquette of a
software company. The internship has given us experience with working in a
team of developers and collaborating effectively through GitHub. We
have touched upon various technologies, from writing user interfaces in React to
writing performance tests in Ruby. We have also gotten to work with and learn
about important platforms and services like Amazon Web Services and Docker, which
are commonly used in companies but not taught in school.

Even after the internship there are still many things that we have not touched
upon or learned about, and we wish we could have explored more: gone deeper into
the lower levels of the code and the inner workings of Vespa, and learned more about
search from the experienced people working at Vespa. We grew attached to the
projects we worked on and wish we had more time to fine-tune and improve
them, to make search in the Vespa documentation even better.

We really enjoyed our stay here, with a friendly staff who have incredible
expertise in search and information retrieval and from whom we have learned a
lot. The experience at Vespa has been really pleasant and educational, and is
something that has benefited us and will keep benefiting us in the future.

Summer internship at Vespa | Vespa Blog

Erlend Solbakken Nikolaisen

Intern I, Summer of 2022


Photo by Arnold Francisca on Unsplash

Through the summer as an intern at Vespa I got the opportunity to learn new technologies and experience what it is like to work at a software company. At the start of my internship I got an introduction to the company and was told about the projects I was going to work on during the internship.

During the internship I worked on two projects. The first one was to recreate the Vespa Query Builder using React, and the second one was to create a solution for visualizing the traces made by the Vespa engine. Both of these projects have been integrated into a client.

Query Builder

The Query Builder is a tool for creating Vespa queries. It is a website that helps with creating queries by letting the user select query options from drop-down menus. The old website consisted of an HTML page with some old, hard-to-read JavaScript code and a backend handler written in Java. This made the website a standalone tool that was hard to integrate with other tools.

My assignment was to recreate the Query Builder using React, making it a pure JavaScript application. Before I started on the assignment I spent a day learning about React, and then I dove into the deep end and started creating the application, learning more as I went. The old JavaScript code was difficult to read and did not translate well to React, so much of the functionality had to be recreated from scratch.

The finished application looks very much like the old one, but since it is created in React it is much simpler to embed in other React applications. I did update the UI somewhat by adding tooltips to buttons to make the application simpler to use.

Trace Visualizer

The Trace Visualizer is supposed to make it easier to identify bottlenecks in queries. The idea is to remove the need to comb through the raw trace in search of where the problems could be. The solution consists of an application to input and transform the Vespa trace, and the third-party tool Jaeger to visualize the transformed trace.

I started by looking at and comparing several existing solutions for visualizing traces and chose to use Jaeger because it was the simplest to use and the best fit for the use case. Because Jaeger did not support the traces created by Vespa, the traces had to be transformed into a format that Jaeger could use. One of the formats Jaeger supports, and the one I used, is similar to OpenTelemetry's trace definition, with spans being the smallest unit of work (more information here: OpenTelemetry tracing).

The first iteration of the transformation tool could handle simple traces from Vespa and transform them into traces that could be imported into Jaeger. The hardest part was figuring out how to best traverse the Vespa trace to find the relevant information that Jaeger would need. Just when I thought I had found them all, the Vespa trace always seemed to have more special cases that needed to be handled differently. Vespa traces can also be much more complicated than this, and the first iteration could not handle those.

Vespa trace:
{
  "trace": {
    "children": [
      ...
      {
        "timestamp": 4,
        "message": "Invoke searcher ..."
      },
      {
        "timestamp": 5,
        "children": [
          {
            "timestamp": 5,
            "message": Invoke searcher ..."
          },
          {
            "timestamp": 6,
            "message": "Return searcher ..."
          }
        ]
      },
      {
        "timestamp": 8,
        "message": "Retunr searcher ..."
      }
      ...
      {
        "start_time": "2022-07-28 13:49:47.816 UTC",
        "trace": [
          {
            "traces": [
              {
                "timestamp_ms": 0.051936,
                "event": "Start query setup"
              }
              ...
              {
                "timestamp_ms": 1.045379,
                "event": "Complete query setup"
              }
            ]
          }
        ]
      }
    ]
  }
}

Transformed trace:

{
  "data": [
    {
      "traceID": "db187cb870b90c0ad8cc235fed504c16",
      "spans": [
        {
          "traceID": "db187cb870b90c0ad8cc235fed504c16",
          "spanID": "8182dc73c8bd68ed",
          "operationName": "default",
          "references": [],
          "startTime": 1656923873159000,
          "duration": 2000,
          "tags": [],
          "logs": [],
          "processID": "p0"
        },
        {
          "traceID": "db187cb870b90c0ad8cc235fed504c16",
          "spanID": "52bc94897ad844b6",
          "operationName": "Invoke searcher ...",
          "references": [
            {
              "refType": "CHILD_OF",
              "traceID": "db187cb870b90c0ad8cc235fed504c16",
              "spanID": "8182dc73c8bd68ed"
            }
          ],
          "startTime": 1656923873159000,
          "duration": 1,
          "tags": [],
          "logs": [],
          "processID": "p1"
        },
        ...
        {
          "traceID": "db187cb870b90c0ad8cc235fed504c16",
          "spanID": "d94b2b388d92864d",
          "operationName": "Return searcher ...",
          "references": [
            {
              "refType": "CHILD_OF",
              "traceID": "db187cb870b90c0ad8cc235fed504c16",
              "spanID": "d671eeb306d4784b"
            }
          ],
          "startTime": 1656923873159000,
          "duration": 100,
          "tags": [],
          "logs": [],
          "processID": "p7"
        }
      ]
    }
  ]
}

To make the tool capable of handling the more complicated traces, I first refactored much of the code to make it easier to work with, and then created a recursive function to handle the more complex structure that the traces could have. I also implemented better naming of the spans in the transformed trace, to make it easier to see what is happening in each span. By using a regex on the description of the work a span is doing, it is possible to find the process the work is being done on and use that as the name of the span.
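
A much simplified sketch of that recursive traversal is shown below: it walks the nested children of a Vespa trace, emits Jaeger spans with CHILD_OF references back to the parent span, and derives span names from the trace messages with a regex. The field handling in the real tool is more involved, and the message pattern and names here are only illustrative.

interface VespaTraceNode {
  timestamp?: number;          // milliseconds relative to the start of the query
  message?: string;
  children?: VespaTraceNode[];
}

interface JaegerSpan {
  traceID: string;
  spanID: string;
  operationName: string;
  references: { refType: "CHILD_OF"; traceID: string; spanID: string }[];
  startTime: number;           // microseconds since epoch
  duration: number;            // microseconds
  tags: unknown[];
  logs: unknown[];
  processID: string;
}

// Recursively flatten a Vespa trace node and its children into Jaeger spans.
function toSpans(node: VespaTraceNode, parentSpanID: string, traceID: string,
                 baseTimeUs: number, spans: JaegerSpan[]): void {
  const spanID = Math.random().toString(16).slice(2, 18); // good enough for a sketch
  spans.push({
    traceID,
    spanID,
    operationName: nameFromMessage(node.message),
    references: [{ refType: "CHILD_OF", traceID, spanID: parentSpanID }],
    startTime: baseTimeUs + (node.timestamp ?? 0) * 1000, // ms -> µs
    duration: 1,                                          // refined later from sibling timestamps
    tags: [],
    logs: [],
    processID: "p0",
  });
  for (const child of node.children ?? []) {
    toSpans(child, spanID, traceID, baseTimeUs, spans);
  }
}

// Use a regex on the trace message to name the span after the searcher/process doing the work.
function nameFromMessage(message?: string): string {
  const match = message?.match(/(?:Invoke|Return) searcher '([^']+)'/);
  return match ? match[1] : message ?? "span";
}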

Jaeger UI

There is some further work to be done with the naming of spans, as a few can get names that do not reflect the work contained in the span. The timings and durations of spans are also a bit imprecise. This imprecision is small and does not affect the use of the tool for finding bottlenecks. It arises because the Vespa trace mostly uses milliseconds for timestamps, with some parts using microseconds, while Jaeger always uses microseconds.

My experience at Vespa

At the start of my internship I was excited to find out what it would be like to work at a software company and get insight into the workflow. I felt that I was warmly welcomed and well introduced to the work environment.

At the beginning of my internship it was a bit daunting to have to learn both how the Vespa engine worked and how to use React and JavaScript. It was all completely new to me and at first felt a bit insurmountable, but I always had colleagues who were eager to help me with problems.

I really enjoyed my time working at Vespa with knowledgeable colleagues who could always help me when I was stuck and who have taught me a lot. My experience at Vespa has been very enjoyable and educational, and it has benefited me and will continue to benefit me in the future.

Summer Internship at Vespa | Vespa Blog

This summer, two young men have revolutionized the field of information retrieval! Or at least they tried… Read on for the tale of this year’s summer interns, and see the fruits of our labor in the embedder auto-training sample app.

Automatic Embedder Training with an LLM

Our main project this summer has been developing a system for automatically improving relevance for semantic search. Semantic search utilizes machine-learned text embedders trained on large amounts of annotated data to improve search relevance.

Embedders can be fine-tuned on a specific dataset to improve relevance further for the dataset in question. This requires annotated training data, which traditionally has been created by humans. However, this process is laborious and time-consuming – can it be automated?

Enter large language models! LLMs like ChatGPT have been trained on an enormous amount of data from a multitude of sources, and appear to understand a great deal about the world. Our hypothesis was that it would be possible to use an LLM to generate training data for an embedder.

Query generation

Diagram depicting the query generation pipeline

Training data for text embedders used for information retrieval consists of two parts: queries and query relevance judgments (qrels). Qrels indicate which documents are relevant for which queries, and are used both for training and to rate retrieval performance during evaluation. Our LLM of choice, ChatGPT (3.5-turbo-4k), works by being given a system prompt and a list of messages containing instructions and data. We used the system prompt to inform ChatGPT of its purpose and to provide it with rules describing how queries should be generated.

Generating queries requires a system prompt, example document-query pairs, and a document to generate queries for. Our system generates the system prompt, and optionally generates additional qrels, resulting in the three-step process illustrated by the diagram above.

In the beginning, we handcrafted system prompts while trying to get ChatGPT to generate queries similar to existing training data. After some trial and error, we found that we got better results if we specified rules describing what queries should look like. Later, we devised a way for ChatGPT to generate these rules itself, in an effort to automate the process.

Using the system prompt alone did not appear to yield great results, though. ChatGPT would often ignore the prompt and summarize the input documents instead of creating queries for them. To solve this, we used a technique called few-shot prompting. It works by essentially faking a conversation between the user and ChatGPT, showing the LLM how it’s supposed to answer. Using the aforementioned message list, we simply passed the LLM a couple of examples before showing it the document to generate queries for. This increased the quality of the output drastically at the cost of using more tokens.
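
Our actual pipeline was written in Python, but the shape of the few-shot step can be sketched as follows, using the OpenAI chat completions API. The prompt text, example pairs and model settings are placeholders rather than our actual configuration.

import OpenAI from "openai";
import type { ChatCompletionMessageParam } from "openai/resources/chat/completions";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Generate queries for one document, preceded by a couple of faked example exchanges.
async function generateQueries(
  systemPrompt: string,
  examples: { document: string; queries: string }[],
  document: string,
): Promise<string> {
  const messages: ChatCompletionMessageParam[] = [
    { role: "system", content: systemPrompt },
    // Few-shot prompting: show the model how it is supposed to answer.
    ...examples.flatMap((ex): ChatCompletionMessageParam[] => [
      { role: "user", content: ex.document },
      { role: "assistant", content: ex.queries },
    ]),
    { role: "user", content: document },
  ];
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages,
  });
  return completion.choices[0].message.content ?? "";
}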

After generating queries, we optionally generate additional qrels. This can be necessary for training if the generated queries are relevant for multiple documents in the dataset, because the training script assumes that all matched documents not in the qrels aren’t relevant. Generating qrels works by first querying Vespa with a query generated by ChatGPT, then showing the returned documents and the generated query to ChatGPT and asking it to judge whether or not each document is relevant.
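
The judging step can be sketched in the same style: run the generated query against Vespa, then ask the model for a yes/no relevance judgment per returned document. Endpoint, field names and prompt wording are assumptions, and the sketch reuses the openai client from above.

// For one generated query: fetch candidate documents from Vespa and let the LLM judge each one.
async function generateQrels(query: string): Promise<{ docId: string; relevant: boolean }[]> {
  const params = new URLSearchParams({
    yql: "select * from sources * where userQuery()",
    query,
    hits: "10",
  });
  const response = await fetch(`http://localhost:8080/search/?${params}`);
  const hits = (await response.json()).root?.children ?? [];

  const judgments: { docId: string; relevant: boolean }[] = [];
  for (const hit of hits) {
    const completion = await openai.chat.completions.create({
      model: "gpt-3.5-turbo",
      messages: [
        { role: "system", content: "Answer only yes or no." },
        { role: "user", content: `Is this document relevant to the query "${query}"?\n\n${hit.fields.text}` },
      ],
    });
    const answer = completion.choices[0].message.content?.trim().toLowerCase();
    judgments.push({ docId: hit.id, relevant: answer?.startsWith("yes") ?? false });
  }
  return judgments;
}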

Training and evaluation

We utilized SentenceTransformers for training, and we initialized from the E5 model. We started off by using scripts provided by SimLM, which got us up and running quickly, but we eventually wanted more control over our training loop.

The training script requires a list of positive (matching) documents and a list of negative (non-matching) documents for each query. The list of positive documents is given by the generated qrels. We assemble a list of negative documents for each query by querying Vespa and marking each returned document not in the qrels as a negative.
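
A sketch of that negative-mining step (endpoint, hit count and field names are assumptions; the real pipeline does this in Python):

// For one training query: everything Vespa returns that the qrels do not list as relevant
// is treated as a negative example.
async function mineNegatives(queryText: string, relevantDocIds: Set<string>): Promise<string[]> {
  const params = new URLSearchParams({
    yql: "select * from sources * where userQuery()",
    query: queryText,
    hits: "50",
  });
  const response = await fetch(`http://localhost:8080/search/?${params}`);
  const hits = (await response.json()).root?.children ?? [];
  return hits
    .map((hit: any) => hit.fields.doc_id as string) // assumed document id field
    .filter((docId: string) => !relevantDocIds.has(docId));
}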

After training we evaluated the model with trec_eval and the nDCG@10 metric. The resulting score was compared to previous trainings, and to a baseline evaluation of the model.

We encapsulated the entire training and evaluation procedure into a single Bash script that let us provide the generated queries and qrels as input, and get the evaluation of the trained model as output.

Results

The results we got were varied. We had the most successful training on the NFCorpus dataset, where we consistently got an evaluation higher than the baseline. Interestingly we initially got the highest evaluation when training on just 50 queries! We eventually figured out that this was caused by using the small version of the E5 model – using the base version of the model gave us the highest evaluation when training on 400 queries.

Training on other datasets was unfortunately unsuccessful. We tried training on both the FiQA and the NQ dataset, tweaking various parameters, but weren’t able to get an evaluation higher than their baselines.

Limitations and future work

The results we got for NFCorpus are a promising start, and previous research also shows this method to have promise. The next step is to figure out how to apply our system to datasets other than NFCorpus. There’s a wide variety of different options to try:

  • Tweaking various training parameters, e.g. number of epochs and learning rate
  • Different training methods, e.g. knowledge distillation
  • Determining query relevance with a fine-tuned cross-encoder instead of with ChatGPT-generated qrels
  • More data, both in terms of more documents and generating more queries
  • Using a different model than E5

We currently make some assumptions about the datasets we train on that don’t always hold. Firstly, we do few-shot prompting when generating queries by fetching examples from existing training data, but this system is perhaps most useful for datasets without that data. Secondly, we use the ir_datasets package to prepare and manage datasets, but ideally we’d want to fetch documents from e.g. Vespa itself.

Most of our training was done on the relatively small NFCorpus dataset because of the need to refeed all documents, after each training, to generate new embeddings. This becomes a big bottleneck on large datasets. Implementing frozen embeddings, which allows reusing document embeddings between trainings, would solve this problem.

Side quests

The easiest way to learn Vespa is to use it. Before starting on the main project, we spent some time trying out the various interactive tutorials. We also worked on various side projects which were related to the main project in some way.

Embedding service

We created a sample app to create embeddings from arbitrary text, using the various models in the Vespa model hub. This was a great way to learn about Vespa’s stateless Java components and how Vespa works in general.

Pyvespa

Pyvespa is a Python API that enables fast prototyping of Vespa applications. Pyvespa is very useful when working in Python, like we did for our machine learning experiments, but it does not support all of Vespa’s features. In addition, there were some issues with how Pyvespa handled certificates that prevented us from using Pyvespa in combination with an app deployed from the Vespa CLI.

We were encouraged to implement fixes for these problems ourselves. Our main changes were to enable Pyvespa to use existing certificates generated with the Vespa CLI, as well as adding a function to deploy an application from disk to Vespa Cloud via Pyvespa, allowing us to use all the features of Vespa from Python (this feature already existed for deploying to Docker, but not for deploying to Vespa Cloud). This was very satisfying, as well as a great learning experience.

Our experience at Vespa

We’ve learned a lot during our summer at Vespa, especially about information retrieval and working with LLMs. We’ve also learned a lot about programming and gotten great insight into the workings of a professional software company.

Contributing to an open-source project, especially such a large one as Vespa, has been very exciting. Vespa is powerful, which is awesome, but as new users, there was quite a lot to take in. The project is well documented, however, and includes a great number of sample apps and example use cases, meaning we were usually able to find out how to solve problems on our own. Whenever we got really stuck, there was always someone to ask and talk to. A big shout out to all of our colleagues, and a special thanks to Kristian Aune and Lester Solbakken for their support and daily follow-up during our internship.

Working at Vespa has been a great experience, and we’ve really enjoyed our time here.