Q&A from “The Great Search Engine Debate – Elasticsearch, Solr or Vespa?” Meetup
On January 28th, 2021, at 17:00 CET,
Charlie Hull from OpenSource Connections hosted
The Great Search Engine Debate – Elasticsearch, Solr or Vespa? –
a meetup on Haystack LIVE!,
with Anshum Gupta, VP of Apache Lucene, Josh Devins from Elastic and Jo Kristian Bergum from Vespa.
So many great questions were asked that there was no time to go through them all.
This blog post addresses the Vespa-related questions,
with quicklinks into the recording for easy access.
We have also extracted the unanswered questions from the chat log, linking to Vespa resources.
Please let us know if this is useful.
Feel free to follow up with the Vespa Team using the resources at
https://vespa.ai/support,
Gitter live chat.
You will also find us in the #vespa channel of Relevance Slack.
You can also find Charlie’s summary post at
Solr vs Elasticsearch vs Vespa – what did we learn at The Great Search Engine Debate?.
All three speakers were asked to do a pitch and closing words.
Three things that make you recommend your technology –
see the Vespa pitch and Vespa top three –
summary:
- Vespa has a great toolbox for modern retrieval, state-of-the-art retrieval/ranking with Machine Learning
- Vespa’s indexing architecture allows true partial updates at scale, with high indexing volume – when combined with #1, one can have realtime updated models to make decisions in real time, on updated information
- Vespa’s scalability and true elastic content cluster. You don’t have to pre-determine the number of shards. Can go from 1 node to 100 nodes, just add nodes.
Resources: ranking,
reads and writes,
elastic Vespa
Use case differentiator, I am curious if the participants could walk through:
let’s say I have an index with text for search, but also a couple dozen features I intend to use in LTR.
I want to update two of the dozen features across several billion documents because I changed my feature extraction.
How does the engine deal with this?
[ quicklink ].
Common and widely used Vespa use case.
True partial updates of attribute fields which are in-memory, update and evaluate in place –
no need to read the entire document and apply the update and write it to a new index segment
like in Solr/Elasticsearch which builds on Lucene.
Vespa can do 50,000 numeric partial updates per second per node.
Ranking will immediately see the update and use value in computations (search, rank, sorting, faceting).
Resources: ranking,
reads and writes,
elastic Vespa
Much of the popularity around ES and Solr arises from the fact that they are very “approachable” technologies.
It’s simple for a beginner to get started indexing and searching documents at a basic level,
and most importantly, understanding and influencing which documents are returned.
My impression is that the technical entry level for Vespa is much more advanced.
Would you agree or disagree? How would you recommend starting out with Vespa?
[ quicklink ].
Learned a lot from Elasticsearch on developer friendliness,
maybe at 80% ease of use. With Vespa, it’s easy to go from laptop to full cloud deployment.
Use Docker to run Vespa on your laptop.
Use Vespa application package to go from laptop to full size – it is the same config.
Resources: application packages,
getting started,
cloud.vespa.ai
I have a question regarding Vespa: How is the support for non-English languages regarding tokenizers, stemmers, etc.?
I’m especially interested in German, Russian, Polish, Czech and Hungarian.
How big would be the effort to adapt Solr / OpenNLP resources to use them with Vespa?
[ quicklink ].
Vespa integrates with Apache OpenNLP,
so any language supported by it, Vespa supports it.
It’s easy to integrate with new linguistic libraries and we’ve already received CJK contributions to Vespa.
Resources: linguistics
Which search engine is best for a write-heavy application?
Based on my experience, Elasticsearch read performance is impacted when there are heavy writes.
[ quicklink ].
Vespa moved away from indexing architecture similar to Elasticsearch and Solr,
where it used small immutable index segments that were later merged.
Vespa has a mutable in-memory index in front of immutable index segments.
All IO writes are sequential. No shards. Attributes fields are searchable, in-place updateable.
Efficient use of OS buffer cache for random reads from search.
Real-time indexing with Solr and Elasticsearch creates many immutable segments
which all need to be searched (single threaded execution as well),
so latency is definitively impacted more than with Vespa which has a memory index + larger immutable index.
Resources: reads and writes,
attributes,
proton
“Join” is always a problem with SOLR/Elasticsearch. How does Vespa handle it?
[ quicklink ].
Supported scalable join is implemented using parent/child relationship.
The parent is a global document – distributed across all nodes in the cluster.
Child documents access attribute in-memory fields imported from parent documents.
Can also use the stateless container, deploy a custom Java searcher, do joins on top of multiple searches.
Resources: parent-child,
Vespa overview,
attributes
Can people talk a bit more about kubernetes integrations?
[ quicklink ].
Yes, one can run Vespa on K8s.
Resources: vespa-quick-start-kubernetes
How does Vespa compare to FAISS?
[ quicklink ].
FAISS uses HSNW like Vespa.
FAISS can only nearest neighbor search returning the ID of the vector, very fast.
In Vespa, combine with query filters,
not like the Open Distro for Elasticsearch k-NN plugin
that does post-processing step after retrieving the nearest neighbors.
With a restrictive filter, like last day, might end up with zero documents.
Vespa combines ANN search and filters.
Vespa has hybrid evaluation;
Term-at-a-time (TAAT) which is much more cache friendly, and document-at-a-time (DAAT).
Can evaluate part of the query tree using TAAT,
then search in the HNSW graph using the documents eligible as an input filter.
Including a filter makes ANN a bit slower, but the value it adds makes it worth it.
FAISS is faster as it does not have an HTTP api and distribution layer with realtime updates –
FAISS is a library, batch oriented.
Resources: using-approximate-nearest-neighbor-search-in-real-world-applications,
approximate nearest neighbor, hnsw,
feature tuning
Since Vespa has a different approach, is there anything Vespa is learning from Elastic/Solr/Lucene?
Also the other way around, Elastic/Solr learning from Vespa?
[ quicklink].
Both are great engines! Vespa’s toolbox is bigger.
Learned how Elasticsearch became popular:
developer friendliness, nice APIs, great support for analytics, great for handling immutable data.
Lucene has had a large developer crowd for 20 years.
If I remember correctly, FAISS or similar libraries support indexing/searching with the power of GPU,
how does this compare to Vespa/Elastic/Solr?
[ quicklink ].
Vespa is CPU only, but looking at GPU as pretrained language models grow larger.
GPU easier to use in indexing than serving.
We are trying to find models that run efficiently on GPU. Vespa is written in C++,
making use of OpenBLAS and special instructions to get the most out of CPUs.
Resources: github.com/vespa-engine/vespa/issues/14406
Given large language model dominance, in 5 years, how much do we need to support manual relevance tuning operations?
Should that be our focus? Or will search engines just be initial retrieval before sending docs to eg. BERT?
[ quicklink ].
BERT and pretrained language models helps machines understand text better than before,
dramatic progress on ranking, roughly 2x BM25 on multiple Information retrieval datasets.
However more than just text matching and ranking, like click models and site popularity.
In Vespa, ranking with BERT locally on the content nodes,
can combine scoring from language model into LTR framework, taking other signals into account.
There are ways to use BERT that could lead to close to random ranking,
e.g. using BERT as a representation model without fine-tuning for the retrieval task
where there are many many negative (irrelevant) documents.
However, good zero-shot transfer capabilities for interaction based models
has demonstrated strong ranking accuracy on other data sets.
See Pretrained Transformers for Text Ranking: BERT and Beyond.
Resources: from-research-to-production-scaling-a-state-of-the-art-machine-learning-system
Can you speak about the history of Vespa? All top contributors work at Verizon/Yahoo.
Are you aware of prominent Vespa users beside Verizon? Who’s behind Vespa Cloud?
Is there a (larger) ecommerce shop using Vespa in production already?
[ quicklink ].
cloud.vespa.ai is run by Verizon Media.
In Verizon Media, Vespa is used for search and recommendation (including APAC e-commerce) + Gemini ad serving stack.
Vespa’s background is from Fast Search and Transfer, founded in 1997 from NTNU in Trondheim, Norway.
Resources: vespa.ai
What are your plans for growing your communities?
Where should we go to ask questions and have discussion?
[ quicklink ].
#vespa on Stack Overflow,
Gitter channel,
#vespa channel of Relevance Slack.
Asking the extended Vespa team to document use cases / blog posts.
Resources: docs.vespa.ai,
vespa.ai/support
What type of node? Helps me understand 50k/node number
Single value update assign of an int field on a c5d.2xlarge, 8 v-cpu, 16GB, 200G SSD. 49K updates/s.
How does vespa handle search query contain both dense vector + scalar fields?
I.e. internally, it first retrieves top-k doc and then to the filters?
See the How does Vespa compare to FAISS? question above –
filter first, maybe using TAAT for speed, then top-k.
This to ensure low latency and non-empty result sets.
Which engine supports the usage of KNN clustering together with vector similarity queries?
Vespa supports approximate nearest neighbor search using HNSW,
but can also be combined with pre-computed KNN clustering
where vectors have been assigned a cluster id at indexing time.
Using the Vespa ranking framework,
one can combine (approximate) nearest neighbor queries with any other computations.
Using tensors and operations on these, custom algorithms can be built.
Resources: tensor user guide,
approximate nearest neighbor HNSW,
ranking
Which engine would you use for real-time systems with emphasis on queries latency?
The Vespa Team has helped implementation of numerous applications
with millisecond latency requirements and update rates in thousands per second per node in Verizon Media.
When the feed operation is ack’ed, the operation is visible.
There is no index refresh delay or immutable batch indexing
as in engines like Solr or Elasticsearch using the batch oriented Lucene library.
Vespa also allows using multiple searcher threads per query to scale latency versus throughput,
functionality which is not exposed in Solr or Elasticsearch.
Does Vespa support IBM ICU libraries? (language processing question as well)
Yes, used in sorting.
For what kind of problem would you recommend Elastic or Solr for (over Vespa)?
See the question above for anything Vespa is learning from Elastic/Solr/Lucene?
Resources: vespa-elastic-solr
Can any of the search engine beat Redis when it comes to read performance? Do we have any benchmarking?
The Vespa Team has not compared Vespa with Redis, as they are built for different use cases.
Vespa is built for Big Data Serving with distributed computations over large, mutable data sets.
Use Redis for in-memory database, cache, and message broker.
All 3 search engines rely on OS file caching for read optimizations.
How does that work in kubernetes when multiple processes/pods/containers are racing against each other for that?
The Vespa Team has not tested specifically for K8s and we would love to learn from the community when tested!
We run multiple Docker multi-process containers on bare-metal and AWS hosts, memory is isolated, but the IO is shared.
We hence monitor IO, but not more than that.
I’d love to read more about the TAAT/DAAT mix and ANN, I don’t follow that yet.
Any chance you can drop a code link or doc link?
See feature-tuning.
We will see if we can publish a paper or article on this subject.
With regard to GPU vs CPU this is also asking “How do you execute code on multi-arch cluster?”.
If you’re on K8s, you may just be calling out across the nodes.
Something like the nboost proxy is an interesting example
Moving computation to where the data lives is the mantra for both Vespa and the Map Reduce paradigm (Hadoop).
This allows scaling latency and throughput without moving data across the wire.
Vespa integrates with many machine learning techniques and allows,
e.g. using the pre-trained language model relevancy score in combination with other core ranking features