Q&A from “The Great Search Engine Debate – Elasticsearch, Solr or Vespa?” Meetup

On January 28th, 2021, at 17:00 CET,
Charlie Hull from OpenSource Connections hosted

The Great Search Engine Debate – Elasticsearch, Solr or Vespa? –
a meetup on Haystack LIVE!,
with Anshum Gupta, VP of Apache Lucene, Josh Devins from Elastic and Jo Kristian Bergum from Vespa.

So many great questions were asked that there was no time to go through them all.
This blog post addresses the Vespa-related questions,
with quicklinks into the recording for easy access.
We have also extracted the unanswered questions from the chat log, linking to Vespa resources.
Please let us know if this is useful.
Feel free to follow up with the Vespa Team using the resources at
https://vespa.ai/support
or the Gitter live chat.
You will also find us in the #vespa channel of Relevance Slack.
You can also find Charlie’s summary post at
Solr vs Elasticsearch vs Vespa – what did we learn at The Great Search Engine Debate?.


All three speakers were asked to give a pitch and closing words,
naming three things that make them recommend their technology.
See the Vespa pitch and Vespa top three.
In summary:

  1. Vespa has a great toolbox for modern retrieval, state-of-the-art retrieval/ranking with Machine Learning
  2. Vespa’s indexing architecture allows true partial updates at scale, with high indexing volume – when combined with #1, one can have realtime updated models to make decisions in real time, on updated information
  3. Vespa’s scalability and true elastic content cluster. You don’t have to pre-determine the number of shards. You can go from 1 node to 100 nodes by just adding nodes.

Resources: ranking,
reads and writes,
elastic Vespa


Use case differentiator, I am curious if the participants could walk through:
let’s say I have an index with text for search, but also a couple dozen features I intend to use in LTR.
I want to update two of the dozen features across several billion documents because I changed my feature extraction.
How does the engine deal with this?

[ quicklink ].
This is a common and widely used Vespa use case.
Vespa supports true partial updates of attribute fields, which are in-memory: values are updated and evaluated in place.
There is no need to read the entire document, apply the update and write it to a new index segment,
as in Solr and Elasticsearch, which build on Lucene.
Vespa can do 50,000 numeric partial updates per second per node.
Ranking immediately sees the update and uses the new value in computations (search, rank, sorting, faceting).
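
As a minimal sketch of what such a partial update could look like from a client, the Python below builds an HTTP PUT request against the /document/v1 API that assigns new values to two fields and leaves the rest of the document untouched. The namespace, document type, document id and field names are hypothetical placeholders, and the request is only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request


def partial_update_request(base_url, namespace, doc_type, doc_id, new_values):
    """Build a PUT request that assigns new values to some fields of one document."""
    quoted_id = urllib.parse.quote(str(doc_id), safe="")
    url = f"{base_url}/document/v1/{namespace}/{doc_type}/docid/{quoted_id}"
    # Each field gets an "assign" operation; fields not listed are left as they are.
    body = {"fields": {name: {"assign": value} for name, value in new_values.items()}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical names: namespace "mynamespace", document type "product",
# and two LTR feature fields whose extraction logic changed.
request = partial_update_request(
    "http://localhost:8080", "mynamespace", "product", "42",
    {"feature_ctr": 0.031, "feature_freshness": 0.87},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

For several billion documents, one such operation per document would typically be streamed through a feed client rather than issued one request at a time.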

Resources: ranking,
reads and writes,
elastic Vespa


Much of the popularity around ES and Solr arises from the fact that they are very “approachable” technologies.
It’s simple for a beginner to get started indexing and searching documents at a basic level,
and most importantly, understanding and influencing which documents are returned.
My impression is that the technical entry level for Vespa is much more advanced.
Would you agree or disagree? How would you recommend starting out with Vespa?

[ quicklink ].
The Vespa Team has learned a lot from Elasticsearch on developer friendliness,
and Vespa is maybe at 80% of that ease of use. With Vespa, it’s easy to go from laptop to full cloud deployment.
Use Docker to run Vespa on your laptop.
Use Vespa application package to go from laptop to full size – it is the same config.

Resources: application packages,
getting started,
cloud.vespa.ai


I have a question regarding Vespa: How is the support for non-English languages regarding tokenizers, stemmers, etc.?
I’m especially interested in German, Russian, Polish, Czech and Hungarian.
How big would be the effort to adapt Solr / OpenNLP resources to use them with Vespa?

[ quicklink ].
Vespa integrates with Apache OpenNLP,
so Vespa supports any language that OpenNLP supports.
It’s easy to integrate with new linguistic libraries and we’ve already received CJK contributions to Vespa.

Resources: linguistics


Which search engine is best for a write-heavy application?
Based on my experience, Elasticsearch read performance is impacted when there are heavy writes.

[ quicklink ].
Vespa moved away from an indexing architecture similar to that of Elasticsearch and Solr,
in which it used small immutable index segments that were later merged.
Vespa now has a mutable in-memory index in front of immutable index segments.
All IO writes are sequential. No shards. Attribute fields are searchable and updateable in place.
Efficient use of OS buffer cache for random reads from search.
Real-time indexing with Solr and Elasticsearch creates many immutable segments,
which all need to be searched (with single-threaded execution as well),
so latency is definitely impacted more than with Vespa, which has a memory index plus a larger immutable index.

Resources: reads and writes,
attributes,
proton


“Join” is always a problem with SOLR/Elasticsearch. How does Vespa handle it?

[ quicklink ].
A scalable join is supported, implemented using parent/child relationships.
The parent is a global document, distributed across all nodes in the cluster.
Child documents access in-memory attribute fields imported from parent documents.
One can also use the stateless container: deploy a custom Java searcher and do joins on top of multiple searches.

Resources: parent-child,
Vespa overview,
attributes


Can people talk a bit more about kubernetes integrations?

[ quicklink ].
Yes, one can run Vespa on K8s.

Resources: vespa-quick-start-kubernetes


How does Vespa compare to FAISS?

[ quicklink ].
FAISS uses HNSW, like Vespa.
FAISS can only do nearest neighbor search, returning the ID of the vector, and is very fast.
In Vespa, nearest neighbor search can be combined with query filters,
unlike the Open Distro for Elasticsearch k-NN plugin,
which filters in a post-processing step after retrieving the nearest neighbors.
With a restrictive filter, like last day, post-filtering might end up with zero documents.
Vespa combines ANN search and filters.

Vespa has hybrid evaluation:
term-at-a-time (TAAT), which is much more cache friendly, and document-at-a-time (DAAT).
It can evaluate part of the query tree using TAAT,
then search in the HNSW graph using the eligible documents as an input filter.
Including a filter makes ANN a bit slower, but the value it adds makes it worth it.
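
To make the difference between filtering after and filtering before the neighbor search concrete, here is a toy Python sketch. It uses exact brute-force distance computation on five made-up documents, not HNSW and not Vespa's implementation; the document ids, vectors and the `age_days` field are invented for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def post_filter_knn(docs, query, k, keep):
    """Find the k nearest documents first, then drop those failing the filter."""
    nearest = sorted(docs, key=lambda d: euclidean(d["vec"], query))[:k]
    return [d["id"] for d in nearest if keep(d)]


def pre_filter_knn(docs, query, k, keep):
    """Apply the filter first, then find the k nearest among eligible documents."""
    eligible = [d for d in docs if keep(d)]
    nearest = sorted(eligible, key=lambda d: euclidean(d["vec"], query))[:k]
    return [d["id"] for d in nearest]


docs = [
    {"id": "a", "vec": [0.0, 0.1], "age_days": 30},
    {"id": "b", "vec": [0.1, 0.0], "age_days": 12},
    {"id": "c", "vec": [0.2, 0.2], "age_days": 45},
    {"id": "d", "vec": [3.0, 3.0], "age_days": 0},
    {"id": "e", "vec": [4.0, 4.0], "age_days": 0},
]
query = [0.0, 0.0]


def from_last_day(doc):
    """Restrictive filter: only documents less than one day old."""
    return doc["age_days"] < 1


post_filtered = post_filter_knn(docs, query, 3, from_last_day)
pre_filtered = pre_filter_knn(docs, query, 3, from_last_day)
print(post_filtered)  # [] since the three closest documents are all older than a day
print(pre_filtered)   # ['d', 'e']
```

The three closest documents all fail the filter, so post-filtering returns nothing even though two matching documents exist.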

FAISS is faster as it does not have an HTTP API and distribution layer with realtime updates –
FAISS is a library, batch oriented.

Resources: using-approximate-nearest-neighbor-search-in-real-world-applications,
approximate nearest neighbor, hnsw,
feature tuning


Since Vespa has a different approach, is there anything Vespa is learning from Elastic/Solr/Lucene?
Also the other way around, Elastic/Solr learning from Vespa?

[ quicklink ].
Both are great engines! Vespa’s toolbox is bigger.
Learned how Elasticsearch became popular:
developer friendliness, nice APIs, great support for analytics, great for handling immutable data.
Lucene has had a large developer crowd for 20 years.


If I remember correctly, FAISS or similar libraries support indexing/searching with the power of GPU,
how does this compare to Vespa/Elastic/Solr?

[ quicklink ].
Vespa is CPU only, but the team is looking at GPUs as pretrained language models grow larger.
GPUs are easier to use in indexing than in serving.
We are trying to find models that run efficiently on GPU. Vespa is written in C++,
making use of OpenBLAS and special instructions to get the most out of CPUs.

Resources: github.com/vespa-engine/vespa/issues/14406


Given large language model dominance, in 5 years, how much do we need to support manual relevance tuning operations?
Should that be our focus? Or will search engines just be initial retrieval before sending docs to eg. BERT?

[ quicklink ].
BERT and pretrained language models help machines understand text better than before,
with dramatic progress on ranking, roughly 2x BM25 on multiple information retrieval datasets.
However, search is more than just text matching and ranking; there are also signals like click models and site popularity.
In Vespa, one can rank with BERT locally on the content nodes
and combine the language model score into the LTR framework, taking other signals into account.
There are ways to use BERT that could lead to close to random ranking,
e.g. using BERT as a representation model without fine-tuning for the retrieval task,
where there are many, many negative (irrelevant) documents.

However, the good zero-shot transfer capabilities of interaction-based models
have demonstrated strong ranking accuracy on other data sets.
See Pretrained Transformers for Text Ranking: BERT and Beyond.

Resources: from-research-to-production-scaling-a-state-of-the-art-machine-learning-system


Can you speak about the history of Vespa? All top contributors work at Verizon/Yahoo.
Are you aware of prominent Vespa users beside Verizon? Who’s behind Vespa Cloud?
Is there a (larger) ecommerce shop using Vespa in production already?

[ quicklink ].
cloud.vespa.ai is run by Verizon Media.
In Verizon Media, Vespa is used for search and recommendation (including APAC e-commerce) + Gemini ad serving stack.
Vespa’s background is from Fast Search and Transfer, founded in 1997 from NTNU in Trondheim, Norway.

Resources: vespa.ai


What are your plans for growing your communities?
Where should we go to ask questions and have discussion?

[ quicklink ].
#vespa on Stack Overflow,
Gitter channel,
#vespa channel of Relevance Slack.
Asking the extended Vespa team to document use cases / blog posts.

Resources: docs.vespa.ai,
vespa.ai/support


What type of node? Helps me understand 50k/node number

Single value update assign of an int field on a c5d.2xlarge, 8 v-cpu, 16GB, 200G SSD. 49K updates/s.
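
As a back-of-the-envelope illustration of what that rate means for a bulk re-feed, the sketch below estimates the wall-clock time to update every document once. The corpus size (3 billion) and node count (10) are made-up inputs, and the estimate assumes documents are spread evenly, one update operation per document, a sustained rate of 49K updates/s per node, and no extra work from redundant copies.

```python
def hours_to_update(total_docs, nodes, updates_per_sec_per_node=49_000):
    """Estimate hours needed to apply one partial update to every document."""
    cluster_rate = nodes * updates_per_sec_per_node  # updates per second, whole cluster
    return total_docs / cluster_rate / 3600


# Hypothetical cluster: 3 billion documents on 10 content nodes.
estimate = hours_to_update(3_000_000_000, 10)
print(round(estimate, 2))  # 1.7
```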


How does Vespa handle a search query containing both dense vector and scalar fields?
I.e. internally, does it first retrieve the top-k docs and then apply the filters?

See the How does Vespa compare to FAISS? question above:
filter first, maybe using TAAT for speed, then top-k.
This ensures low latency and non-empty result sets.


Which engine supports the usage of KNN clustering together with vector similarity queries?

Vespa supports approximate nearest neighbor search using HNSW,
but can also be combined with pre-computed KNN clustering
where vectors have been assigned a cluster id at indexing time.
Using the Vespa ranking framework,
one can combine (approximate) nearest neighbor queries with any other computations.
Using tensors and operations on these, custom algorithms can be built.

Resources: tensor user guide,
approximate nearest neighbor HNSW,
ranking


Which engine would you use for real-time systems with emphasis on queries latency?

The Vespa Team has helped implement numerous applications in Verizon Media
with millisecond latency requirements and update rates in the thousands per second per node.
When the feed operation is ack’ed, the operation is visible.
There is no index refresh delay or immutable batch indexing
as in engines like Solr or Elasticsearch using the batch oriented Lucene library.
Vespa also allows using multiple searcher threads per query to scale latency versus throughput,
functionality which is not exposed in Solr or Elasticsearch.


Does Vespa support IBM ICU libraries? (language processing question as well)

Yes, used in sorting.


For what kind of problem would you recommend Elastic or Solr for (over Vespa)?

See the question above on whether Vespa is learning anything from Elastic/Solr/Lucene.

Resources: vespa-elastic-solr


Can any of the search engines beat Redis when it comes to read performance? Do we have any benchmarking?

The Vespa Team has not compared Vespa with Redis, as they are built for different use cases.
Vespa is built for Big Data Serving with distributed computations over large, mutable data sets.
Use Redis for in-memory database, cache, and message broker.


All 3 search engines rely on OS file caching for read optimizations.
How does that work in kubernetes when multiple processes/pods/containers are racing against each other for that?

The Vespa Team has not tested this specifically for K8s, and we would love to learn from the community when it has been tested!
We run multiple multi-process Docker containers on bare-metal and AWS hosts. Memory is isolated, but the IO is shared.
We hence monitor IO, but not more than that.


I’d love to read more about the TAAT/DAAT mix and ANN, I don’t follow that yet.
Any chance you can drop a code link or doc link?

See feature-tuning.
We will see if we can publish a paper or article on this subject.


With regard to GPU vs CPU this is also asking “How do you execute code on multi-arch cluster?”.
If you’re on K8s, you may just be calling out across the nodes.
Something like the nboost proxy is an interesting example

Moving computation to where the data lives is the mantra for both Vespa and the Map Reduce paradigm (Hadoop).
This allows scaling latency and throughput without moving data across the wire.
Vespa integrates with many machine learning techniques and allows,
e.g., using a pre-trained language model relevancy score in combination with other core ranking features.

How I learned Vespa by thinking in Solr


Vespa is a modern search platform that offers structured search (with a SQL-like language), inverted index-based text search, and approximate nearest neighbors (ANN) based vector search. I have been mainly interested in Vespa for its vector search capabilities. However, the last couple of times I had considered Vespa for my application, I had been put off by what seemed like a pretty steep learning curve compared with Solr and Elasticsearch, two other platforms I had successfully used in the past. I realize now that the steepness is not an illusion, but it is justified, because Vespa offers many capabilities and customization opportunities. Unfortunately, that doesn’t help when you are just looking to get it integrated into your application without having to spend too much time figuring it out.

I had a little downtime between projects recently, so I decided to finally bite the bullet and spend time learning to use Vespa. I had in mind a tiny application, a minimal viable product (MVP) if you will, that I wanted to implement using Vespa. The application covered the features I most cared about, and would provide me a template for when I would use it in a “real” project.

Going through the steps outlined in the Vespa Quick Start page,
I realized that there were some loose correspondences with Solr. I believe that thinking of Vespa concepts and operations in terms of Solr analogies helped me get my MVP up and running quickly, although it might equally have been due to the awesome support from folks on the #vespa channel on the Relevance Slack workspace. In any case, I share this insight here in the hope that it might be useful for other people who know Solr and are looking to learn Vespa. Obviously, such an approach glosses over many Vespa features, but from my experience, it is easier to learn these additional features incrementally once you have something running.

At a high level, getting a Solr platform integrated into your application involves the following steps, although not necessarily in the order provided. For example, you may want to destroy and recreate the core multiple times when designing and building your index. Similarly, you would stop and restart the server multiple times during normal operation.

  1. Installing Solr
  2. Starting the server
  3. Checking server status
  4. Creating a core
  5. Configuring the core
  6. Deploying the Configuration
  7. Populating the index
  8. Querying the index
  9. Stopping the server
  10. Deleting the core

So now let’s look at Vespa’s equivalents.

Installing Vespa

Vespa is packaged as a Docker image, so the only prerequisite software needed is docker. It is also available as a CentOS RPM, but the docker image version is recommended. Minimum hardware requirement is 6 GB RAM and about 50 GB disk. I used a t2.xlarge Amazon EC2 instance (4 vCPU, 16 GB RAM) with 100 GB disk, running Ubuntu 18.04. This was adequate for my experiments (described later in the post).

To install docker, I used the docker installation instructions for Ubuntu. Instructions for docker on other Linux distributions, as well as MacOS, are also available on the docker site.

To install Vespa, you need to clone the vespa-engine/sample-apps repository. The repository contains the installation instructions for docker to spin up a Vespa image, as well as many different types of sample applications that can be used as references to build your own.

$ git clone https://github.com/vespa-engine/sample-apps.git
$ docker run --detach --name vespa --hostname vespa-container \
       --volume "$(pwd)/sample-apps":/vespa-sample-apps \
       --publish 8080:8080 vespaengine/vespa

At this point, the Vespa image is installed and running in the docker container. The Vespa server provides a number of services that are accessible on port 8080 of the machine (on which docker is running).

Starting the Vespa Configuration Server

If you just installed Vespa using the commands above, the Vespa Configuration Server is already running in the docker container. However, if you have stopped the docker container (or the machine), then you can restart Vespa with the following command, where vespa is the container name given to docker run above.

$ docker start vespa

This starts the Vespa configuration server. Once you deploy applications into Vespa (described below), this will also start up services for applications that were deployed already.

Checking server status

Vespa takes a while to start and respond to requests after the initial start, as well as after deploying an application. This is probably because it has to populate in-memory data structures that it needs to support the different services it offers. The following command hits an internal Vespa status page that returns HTTP status code 200 (OK) when Vespa is ready to handle requests.

$ docker exec vespa bash -c \
    "curl -s --head http://localhost:19071/ApplicationStatus"

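If you would rather wait for readiness from a script than rerun curl by hand, a polling loop along the following lines could work. This is a sketch, not part of Vespa: the function names are my own, and the URL you poll is an assumption that depends on your setup. With the docker run command above only port 8080 is published to the host, so from outside the container you would most likely poll http://localhost:8080/ApplicationStatus once an application has been deployed.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code of a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but not with a success code
    except (urllib.error.URLError, OSError):
        return None  # connection refused, reset, timed out, etc.


def wait_until_ready(url, probe=http_status, attempts=60, delay_seconds=5,
                     sleep=time.sleep):
    """Poll url until it answers 200. Return True if ready, False if attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Example call (performs real HTTP requests, so it is left commented out):
# wait_until_ready("http://localhost:8080/ApplicationStatus")
```

The probe and sleep parameters exist only so the loop can be exercised without a running server.
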
Create a Vespa application

A Solr core encapsulates a single physical index, with its own configuration. The Vespa analog for a Solr core is an application. Unlike Solr cores, a Vespa application can consist of multiple physical indexes, but like a Solr core, it is governed by a single set of configuration files.

To create a Vespa application, the recommended approach is to find a sample application from the sample-apps repository (downloaded in the installation step) that is closest to your needs, and clone it as a sibling of the other applications in sample-apps. Choose a descriptive name for your project since you will be using it in subsequent requests to the Vespa server, similar to how you use the core name in Solr.

The application referenced in the Vespa Quick Start is the album-recommendation-selfhosted, so I cloned that into my application. I titled it, somewhat unimaginatively, as vespa-poc.

Configure the Vespa application

The configuration files are everything under the directory tree src/main/application in your cloned project. Unlike Solr, where about the only things you need to customize for your configuration are the managed-schema and solrconfig.xml, in case of Vespa, you have an entire directory structure of configuration files!

For those with a Java background, the directory structure is vaguely reminiscent of a Maven project. Like Maven, which relies on convention over configuration, the directory structure of the Vespa configuration serves as clues to the purpose of each of the different configuration files. Here is the directory structure for my application configuration.

src
└── main
    └── application
        ├── hosts.xml
        ├── schemas
        │   └── doc.sd
        ├── search
        │   └── query-profiles
        │       ├── default.xml
        │       └── types
        │           └── root.xml
        └── services.xml

The configuration files are as follows.

  • hosts.xml – defines the current hostname (node1). I did not make any change to this.
  • schemas/*.sd – schema files, similar to Solr’s managed-schema. Vespa requires each document type in the index to be defined using a YAML-like language in its own schema (.sd) file. In my case I have a single document type called “doc”, which contains one ID field, two text fields, and one vector field. Similar to Solr’s field properties (indexable, storable, etc), each field has attribute, summary, and index properties. In addition, the .sd file defines some rank profiles that can be referenced from queries. Here is what my doc.sd file looks like.
  • services.xml – maps the different document types in this application. The only change I made is to define a document type “doc” (as defined above).
  • search/query-profiles/* – the default.xml and types/root.xml define a variable that is used in a nearest neighbor query, which I will talk about in more detail later.

Deploy the Vespa application

Once the configuration files are created, the application can be deployed using the following command. If there are problems in the configurations, the command will report the problem and fail. It is safe to rerun the command after fixing the configuration error.

$ docker exec vespa bash -c "/opt/vespa/bin/vespa-deploy prepare \
       /vespa-sample-apps/vespa-poc/src/main/application/ && \
       /opt/vespa/bin/vespa-deploy activate"

Vespa will take a little time to respond to the first requests following this command, similar to the lag we saw after starting Vespa. We can check for Vespa readiness similarly by querying the ApplicationStatus service as described in the “Checking server status” section.

Populating the Vespa Indexes

Vespa offers a document endpoint, similar to the Solr update endpoint, to populate its indexes with documents. They must be of the document types declared in the configuration step above. The endpoint URL contains a namespace (I used the application name) and the document type name, which enables Vespa to route an incoming document to the correct index. An endpoint URL to insert a document of document type “doc” and document id 1 for the “vespa-poc” application is shown below. There is more information on the Vespa Documents page.

http://localhost:8080/document/v1/vespa-poc/doc/docid/1

The request payload is a flat dictionary of field name-value pairs, converted to JSON format, and sent to Vespa using HTTP POST (in the /document/v1 API, PUT is used for partial updates of an existing document). The docid can be any value that uniquely identifies the document; in my example I have used a (synthetic) field that is just a monotonically increasing sequence number.
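
As a rough sketch of how such a request could be put together in Python, the code below builds the endpoint URL and a POST request for one document. To my understanding, /document/v1 expects the field values nested under a top-level "fields" key, so the sketch wraps the dictionary that way; treat that, the helper names, and the shortened three-value vector as assumptions to check against the Vespa documentation and your own schema. The request is only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request


def document_url(base_url, namespace, doc_type, doc_id):
    """URL for one document in the /document/v1 API."""
    quoted_id = urllib.parse.quote(str(doc_id), safe="")
    return f"{base_url}/document/v1/{namespace}/{doc_type}/docid/{quoted_id}"


def feed_request(base_url, namespace, doc_type, doc_id, fields):
    """Build a POST request that writes a whole document."""
    return urllib.request.Request(
        document_url(base_url, namespace, doc_type, doc_id),
        data=json.dumps({"fields": fields}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Field names follow the CORD-19 example in this post; the title is a
# placeholder and the vector is a toy stand-in for a real embedding.
record = {
    "cord_uid": "xhyu5r5x",
    "doc_title": "An example title",
    "specter_embedding": {"values": [0.35, -5.33, 2.20]},
}
request = feed_request("http://localhost:8080", "vespa-poc", "doc", 1, record)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```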

Data for my MVP experiment came from the CORD-19 dataset,
a collection of scientific papers around COVID-19, provided by Allen AI.
Note that my experiment is completely different from the CORD-19 Search application,
which incidentally is a great example of what you can do with datasets such as CORD-19 and Vespa.

The version of CORD-19 I used contained around 300,000 papers and their associated SPECTER document vectors. For my index I used the paper ID, title, abstract, and the SPECTER vector. An example of the request payload is shown below, and here is the Python code to parse this information out of CORD-19, compose the payload and issue HTTP POST requests to the Vespa document endpoint.


{
    "cord_uid": "xhyu5r5x",
    "doc_title": "High red blood cell composition in clots is associated with successful recanalization during intra-arterial thrombectomy.",
    "doc_abstract": "We evaluated the composition of individual clots retrieved during intra-arterial thrombectomy in relation to recanalization success, stroke subtype, and the presence of clot signs on initial brain images …",
    "specter_embedding": {
        "values": [ 
            0.35455647110939026, 
            -5.337108612060547, 
            2.201319932937622, 
            … 
        ]
    }
}

Vespa also offers its HTTP Client,
which is a faster and more robust alternative for batch inserts. To use this, the records to be inserted need to be JSON serialized into a flat file, and the path to the flat file passed to the HTTP Client.
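
A sketch of how the flat file could be produced is shown below. It assumes the Vespa JSON feed format as I understand it: a JSON array of operations, each with a "put" key holding a document id of the form id:<namespace>:<document type>::<id> and a "fields" key holding the field values. The function name, the sequence-number ids and the two toy records are illustrative only.

```python
import json
import os
import tempfile


def write_feed_file(path, records, namespace="vespa-poc", doc_type="doc"):
    """Serialize records as a JSON array of put operations; return how many were written."""
    operations = [
        # Document ids are a running sequence number starting at 1.
        {"put": f"id:{namespace}:{doc_type}::{number}", "fields": record}
        for number, record in enumerate(records, start=1)
    ]
    with open(path, "w", encoding="utf-8") as feed_file:
        json.dump(operations, feed_file)
    return len(operations)


# Two toy records; real ones would carry the abstract and embedding as well.
records = [
    {"cord_uid": "xhyu5r5x", "doc_title": "First example title"},
    {"cord_uid": "abcd1234", "doc_title": "Second example title"},
]
feed_path = os.path.join(tempfile.mkdtemp(), "feed.json")
count = write_feed_file(feed_path, records)

with open(feed_path, encoding="utf-8") as feed_file:
    loaded = json.load(feed_file)
print(count, loaded[0]["put"])
```

For a corpus of this size, writing one operation at a time instead of building the whole list in memory may be preferable.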

Querying the Index

The Vespa query language (YQL) is very similar to the SQL query language, so there will not be much of a learning curve for most people reading this blog. The query endpoint is available at:

http://localhost:8080/search/

It accepts both HTTP GET and POST requests. The YQL specifies the fields to return, the document types to look up, and the query condition. Additional information such as the rank profile (i.e. what scoring function to use for which field), the number of results to return, offset, etc., are provided as additional fields. These fields are specified as HTTP request parameters for GET requests or sent as a JSON-formatted dictionary of name-value pairs in the request body for POST requests. The Query API page contains more information about the available parameters.
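
To illustrate the two request styles, here is a small Python sketch that builds the same query once as a GET URL and once as a POST request. The field names come from the payload example above; the helper names are my own, the sketch only handles a single alphanumeric search word, and the trailing semicolon in the YQL reflects the Vespa version I assume here, so check the Query API page for your version. Nothing is sent over the network.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "http://localhost:8080/search/"


def title_query_params(term, hits=10, offset=0, rank_profile=None):
    """Build query parameters for a single-word search on doc_title."""
    if not term.isalnum():
        raise ValueError("this sketch only handles a single alphanumeric word")
    params = {
        "yql": "select cord_uid, doc_title from sources * "
               f'where doc_title contains "{term}";',
        "hits": hits,
        "offset": offset,
    }
    if rank_profile is not None:
        params["ranking.profile"] = rank_profile
    return params


def as_get_url(params):
    """GET style: parameters are URL-encoded into the query string."""
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def as_post_request(params):
    """POST style: the same parameters as a JSON dictionary in the request body."""
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(params).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


params = title_query_params("coronavirus", hits=5)
get_url = as_get_url(params)
post_request = as_post_request(params)
print(get_url)
print(post_request.data.decode("utf-8"))
```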

Like configuration and population, queries are application dependent. In my case, my objectives were to set up:

  1. Text search – return results based on a text search on the title field.
  2. Vector search – return results based on vector search on a document’s SPECTER embedding, similar to a Solr More Like This (MLT) search.

The first objective is simple and can be achieved with a short YQL query, as shown in this code.

The second one is slightly more complicated – I use the GET endpoint to retrieve the SPECTER embedding from a given document by document ID, then use the nearestNeighbor function to retrieve its 10 nearest neighbors in the vector space, as shown in this code.

One can think of the nearestNeighbor call as roughly analogous to a Solr function query or a SQL user-defined function (UDF). It takes two parameters, the first of which is a field defined within the document type, and the second is a variable representing the query vector. The query vector needs to be defined in the query-profiles/types/root.xml file.
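
A hedged sketch of what the request parameters for such a query might look like follows. The field name specter_embedding comes from the payload example; the query tensor name (query_vector) and rank profile name (semantic_similarity) are hypothetical and would have to match what is declared in root.xml and doc.sd. The annotation syntax and the ranking.features.query(...) parameter reflect my understanding of the Vespa version current at the time and may differ in other versions. The query vector is assumed to have been fetched already, and a three-value toy vector stands in for it.

```python
import json


def nearest_neighbor_params(query_vector, field="specter_embedding",
                            query_tensor="query_vector", target_hits=10,
                            rank_profile="semantic_similarity"):
    """Build query parameters for a nearestNeighbor search on a vector field."""
    yql = (
        "select cord_uid, doc_title from sources * where "
        f'([{{"targetHits": {target_hits}}}]nearestNeighbor({field}, {query_tensor}));'
    )
    return {
        "yql": yql,
        "hits": target_hits,
        # The query vector is passed as a named ranking feature, as a tensor literal.
        f"ranking.features.query({query_tensor})": json.dumps(list(query_vector)),
        "ranking.profile": rank_profile,
    }


nn_params = nearest_neighbor_params([0.1, -0.2, 0.3])
print(nn_params["yql"])
print(nn_params["ranking.features.query(query_vector)"])
```

These parameters can then be sent to the search endpoint as a JSON body in a POST request, which avoids URL-encoding a long vector.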

Obviously, this just barely scratches the surface of the various query types that are