Introducing TensorFlow support | Vespa Blog

In previous blog posts we have talked about Vespa’s tensor API which enables some advanced ranking capabilities. The primary use case is for machine learned ranking, where you train your models using some machine learning framework, convert the models to Vespa’s tensor format, and deploy them to Vespa. This works well, but converting trained models to Vespa form is cumbersome.

We are now happy to announce a new feature that makes this process a lot easier: TensorFlow import. With this feature you can directly deploy models you’ve trained in TensorFlow to Vespa, and use these models during ranking. This means that the models are executed in parallel over multiple threads and machines for a single query, which makes it possible to evaluate the model over any number of data items and still bound the total response time. In addition the data items to evaluate with the TensorFlow model can be selected dynamically with a query, and with a cheaper first-phase rank function if needed. Since the TensorFlow models are evaluated on the nodes storing the data, we avoid sending any data over the wire for evaluation.

In this post we’d like to introduce this new feature by discussing how it works, some assumptions behind working with TensorFlow and Vespa, and how to use the feature.

Vespa is optimized to evaluate models repeatedly over many data items (documents).  To do this efficiently, we do not evaluate the model using the TensorFlow inference engine. TensorFlow adds a non-trivial amount of overhead and instrumentation which it uses to manage potentially large scale computations. This is significant in our case, since we need to evaluate models on a micro-second scale. Hence our approach is to extract the parameters (weights) into Vespa tensors, and use the model specification in the TensorFlow graph to generate efficient Vespa tensor expressions.

Importing TensorFlow models is as simple as saving the TensorFlow model using the
SavedModel API,
adding those files to the Vespa application package, and referencing the model using the new TensorFlow ranking feature.
For instance, if your files are in models/my_model in the application package:

first-phase {
    expression: sum(tensorflow("my_model/saved"))
}

The above expression runs the model and sums its output to a single scalar value to use in ranking. One thing you will have to provide is the input(s), or feed, to the graph. Vespa expects you to provide a macro with the same name as the input placeholder. In the macro you can specify where the input should come from, be it a parameter sent along with the query, a document field (possibly in a parent document) or a constant.
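
As an illustration, here is a minimal sketch of such a macro (the schema keyword is function in recent Vespa versions; older versions used macro). It assumes the model has a single input placeholder named input_tensor and that a document tensor attribute my_doc_tensor with a matching type exists:

rank-profile tf_ranking {
    # The name must match the model's input placeholder;
    # "input_tensor" and "my_doc_tensor" are illustrative names.
    function input_tensor() {
        expression: attribute(my_doc_tensor)
    }
    first-phase {
        expression: sum(tensorflow("my_model/saved"))
    }
}

To feed the model from the query instead, the macro body could read a query tensor, e.g. query(my_query_tensor), passed along with the request.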

As mentioned, Vespa evaluates the imported models once per document. Depending on the requirements of the application, this can impose some natural limitations on the size and complexity of the models that can be evaluated. However, Vespa has a number of other search and rank features that can be used to reduce the search space before running the machine learned models. Typically, one would use the search and first ranking phases to select a relatively small number of candidate documents, which are then given their final rank score in the more computationally expensive second phase model evaluation.

Also note that TensorFlow import is new to Vespa, and we currently only support a subset of the TensorFlow operations. While the supported operations should suffice for many relevant use cases, some are not supported yet because they are potentially too expensive to evaluate per document. For instance, convolutional networks and recurrent networks (LSTMs etc.) are not supported. We are continually working to add functionality, so if you find that we have some glaring omissions, please let us know.

Going forward we are focusing on further improving performance of our tensor framework for important use cases. We’ll follow up this post with one showing how the performance of evaluation in Vespa compares with TensorFlow serving. We will also add more supported frameworks and our next target is ONNX.

You can read more about this feature in the Vespa documentation on ranking with TensorFlow models. We are excited to announce the TensorFlow support, and we’re eager to hear what you are building with it.

Introducing ONNX support | Vespa Blog

ONNX (Open Neural Network eXchange) is an open format for the sharing of neural network and other machine learned models between various machine learning and deep learning frameworks. As the open big data serving engine, Vespa aims to make it simple to evaluate machine learned models at serving time at scale. By adding ONNX support in Vespa in addition to our existing TensorFlow support, we’ve made it possible to evaluate models from all the commonly used ML frameworks with low latency over large amounts of data.

Introduction

With the rise of deep learning in the last few years, we’ve naturally enough seen an increase in the number of deep learning frameworks as well: TensorFlow, PyTorch/Caffe2, MXNet etc. One reason for these different frameworks to exist is that they have been developed and optimized around some characteristic, such as fast training on distributed systems or GPUs, or efficient evaluation on mobile devices. Previously, complex projects with non-trivial data pipelines have been unable to pick the best framework for any given subtask due to lacking interoperability between these frameworks. ONNX is a solution to this problem.

ONNX

ONNX is an open format for AI models, and represents an effort to push open standards in AI forward. The goal is to help increase the speed of innovation in the AI community by enabling interoperability between different frameworks and thus streamlining the process of getting models from research to production.

There is one commonality between the frameworks mentioned above that enables an open format such as ONNX, and that is that they all make use of dataflow graphs in one way or another. While there are differences between each framework, they all provide APIs enabling developers to construct computational graphs and runtimes to process these graphs. Even though these graphs are conceptually similar, each framework has been a siloed stack of API, graph and runtime. The goal of ONNX is to empower developers to select the framework that works best for their project, by providing an extensible computational graph model that works as a common intermediate representation at any stage of development or deployment.

Vespa is an open source project which fits well within such an ecosystem, and we aim to make the process of deploying and serving models trained in any framework to production as smooth as possible. Vespa is optimized toward serving and evaluating over potentially very large datasets while still responding in real time. In contrast to other ML model serving options, Vespa can more efficiently evaluate models over many data points. As such, Vespa is an excellent choice when combining model evaluation with serving of various types of content.

Our ONNX support is quite similar to our TensorFlow support. Importing ONNX models is as simple as adding the model to the Vespa application package (under “models/”) and referencing the model using the new ONNX ranking feature:

    expression: sum(onnx("my_model.onnx"))

The above expression runs the model and sums its output to a single scalar value to use in ranking. You will have to provide the inputs to the graph. Vespa expects you to provide a macro with the same name as the input tensor. In the macro you can specify where the input should come from, be it a document field, constant or a parameter sent along with the query. More information can be found in the documentation on ONNX import.
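
As a sketch, an input macro feeding the model from a query tensor could look like the following; the input name model_input is illustrative and must match the name of your ONNX model's input, and the query tensor's type must be declared in a query profile type:

rank-profile onnx_ranking {
    # The name must match the ONNX model's input tensor; "model_input" is illustrative.
    function model_input() {
        expression: query(model_input)
    }
    first-phase {
        expression: sum(onnx("my_model.onnx"))
    }
}

The query tensor itself is then passed in the request as ranking.features.query(model_input).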

Internally, Vespa converts the ONNX operations to Vespa’s tensor API. We do the same for TensorFlow import, so the cost of evaluating ONNX and TensorFlow models is the same. We have put a lot of effort into optimizing the evaluation of tensors, and evaluating neural network models can be quite efficient.

ONNX support is also quite new to Vespa, so we do not support all current ONNX operations. Part of the reason we don’t support all operations yet is that some are potentially too expensive to evaluate per document, such as convolutional neural networks and recurrent networks (LSTMs etc.). ONNX also contains an extension, ONNX-ML, with additional operations for non-neural-network cases. Support for this extension will come at a later point. We are continually working to add functionality, so please reach out to us if there is something you would like to have added.

Going forward we are continually working on improving performance as well as supporting more of the ONNX (and ONNX-ML) standard. You can read more about ranking with ONNX models in the Vespa documentation. We are excited to announce ONNX support. Let us know what you are building with it!

Introducing JSON queries | Vespa Blog

We recently introduced a new addition to the Search API – JSON queries. The search request can now be executed with a POST request which includes the query-parameters in its payload. Along with this new request format we also introduce a new parameter, SELECT, with the sub-parameters WHERE and GROUPING, which together are the JSON equivalent of YQL.

The new query

With the Search API’s newest addition, it is now possible to send queries with HTTP POST. The query-parameters have been moved out of the URL and into a POST request body – therefore, no more URL-encoding. You also avoid getting all the queries in the log, which can be an advantage.

This is what a GET query looks like:

GET /search/?param1=value1&param2=value2&...

The general form of the new POST query is:

POST /search/ { param1 : value1, param2 : value2, ... }

The dot-notation is gone; parameters that shared a prefix are now nested under a common key instead.

Let’s take this query:

GET /search/?yql=select+%2A+from+sources+%2A+where+default+contains+%22bad%22%3B&ranking.queryCache=false&ranking.profile=vespaProfile&ranking.matchPhase.ascending=true&ranking.matchPhase.maxHits=15&ranking.matchPhase.diversity.minGroups=10&presentation.bolding=false&presentation.format=json&nocache=true

and write it in the new POST request-format, which will look like this:

POST /search/ { "yql": "select \* from sources \* where default contains \"bad\";", "ranking": { "queryCache": "false", "profile": "vespaProfile", "matchPhase": { "ascending": "true", "maxHits": 15, "diversity": { "minGroups": 10 } } }, "presentation": { "bolding": "false", "format": "json" }, "nocache": true }

With Vespa running (see Quick Start or
Blog Search Tutorial),
you can try building POST-queries with the new querybuilder GUI at http://localhost:8080/querybuilder/, which can help you build queries with e.g. autocompletion of YQL:

[Screenshot of the querybuilder GUI]

The Select-parameter

The SELECT-parameter is used with POST queries and is the JSON equivalent of YQL, so the two cannot be used together. If both are given, the YQL query-parameter overrides SELECT and decides the query’s query-tree.

Where

The SQL-like syntax is gone in favour of a tree-syntax. If you’re used to the query-parameter syntax you’ll feel right at home with this new language. YQL is parsed into a query-tree inside Vespa, and you can now build that tree directly in the WHERE-parameter with JSON. Let’s take a look at this YQL: select * from sources * where default contains foo and rank(a contains "A", b contains "B");, which creates the following query-tree:

[Image: the resulting query-tree]

You can build the tree above with the WHERE-parameter, like this:

{
    "and" : [
        { "contains" : ["default", "foo"] },
        { "rank" : [
            { "contains" : ["a", "A"] },
            { "contains" : ["b", "B"] }
        ]}
    ]
}

This is equivalent to the YQL query above.

Grouping

Grouping can now be written in JSON, with structure, instead of on a single line. Instead of parentheses, we now use curly brackets to symbolise the tree-structure between the different grouping/aggregation-functions, and colons to assign function-arguments.

A grouping, that will group first by year and then by month, can be written as such:

all(group(time.year(a)) each(output(count())
    all(group(time.monthofyear(a)) each(output(count())))))

and equivalently with the new GROUPING-parameter:

"grouping" : [
    {
        "all" : {
            "group" : "time.year(a)",
            "each" : { "output" : "count()" },
            "all" : {
                "group" : "time.monthofyear(a)",
                "each" : { "output" : "count()" },
            }
        }
    }
]
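
Putting the pieces together, both sub-parameters are nested under the select key in the request body. A sketch of a full request, reusing the WHERE and GROUPING examples above (and assuming a field a holding timestamps, as in the grouping example), could look like this:

POST /search/
{
    "select": {
        "where": {
            "contains": ["default", "foo"]
        },
        "grouping": [
            {
                "all": {
                    "group": "time.year(a)",
                    "each": { "output": "count()" }
                }
            }
        ]
    }
}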

Wrapping it up

In this post we have provided a gentle introduction to the new Vespa POST query feature and the SELECT-parameter. You can read more about writing POST queries in the Vespa documentation. More examples of the POST query can be found in the Vespa tutorials.

Please share your experiences. Happy searching!

Introducing NLP with Transformers on Vespa

It really isn’t an exaggeration to claim that the field of NLP has been
revolutionized in the last year or so by the introduction of the Transformer
and related models such as the Bidirectional Encoder Representations from
Transformers (BERT). Indeed, BERT has since its release dominated various
leaderboards on NLP related tasks such as MS
MARCO. Extending beyond research, a
growing number of companies have shown considerable interest in adopting these
models for production.

One of the reasons for this is the ease of getting started. This is in large
part due to Hugging Face and its open-source
Transformers library. With this
library it’s easy to start with any of the thousand or so pretrained base
models, and fine-tune it to a specific task such as text classification,
translation, summarization, text generation or question/answering. This is an
attractive proposition considering that some of these base models are immense,
requiring huge amounts of data and computational resources to train. The cost
of training can sometimes run into the millions of dollars. In contrast, taking
a base model and fine-tuning it requires much less effort, making powerful NLP
capabilities available to a larger community.

Recently it has also become easier to deploy and serve these models in
production. The Transformers library has added functionality to export models to
ONNX,
allowing for greater flexibility in model serving since this is largely
independent of whether the model was trained with TensorFlow or PyTorch.
We’ve been working a lot lately on being able to evaluate Transformer models in
Vespa, so in this blog post we thought we would share a bit on how we perceive
the benefits of inference on Vespa, show how to use a transformer model in
ranking with a small sample application, and discuss future directions we are
working toward.

Why Vespa?

A common approach to serve machine learned models in general is to set up a
model server and call out to this service from somewhere in your serving stack.
This is fine for tasks that evaluate a single data point for each query, for
instance classification, text generation or translation. However, for certain
application types such as search and recommendation this can become a scalability
bottleneck, as these
applications need to evaluate the model with a potentially large number of
items. One can quickly reach network saturation due to the multiplicative
effect of number of queries per second, data points per query, and
representation size.

Evaluating models on an external model server

One of the guiding principles in Vespa is to move the computation to the data
rather than the other way around. Vespa is a distributed application that
consists of a set of stateless nodes and a set of stateful content nodes which
contains the data. A query is first processed on the stateless layer before
being fanned out to the content nodes. The content nodes handle data-dependent
computation and each return their results back to the stateless layer where the
globally best results are determined.

Evaluating models on the content nodes

So when deploying a Vespa application, the machine learned models are
automatically deployed to all content nodes, and evaluated there for each
query. This alleviates the cost of query time data transportation. Also, as
Vespa takes care of distributing data to all content nodes and redistributing
elastically, one can scale up computationally by adding more content nodes thus
distributing computation as well. Additionally, this reduces system complexity
as there are fewer production services to maintain. This last point is
something which one should not discount.

One of the really unique features of Vespa is the flexibility one has to
combine results from various features and string models together. For instance,
one could use a small, fast model in an early phase, and a more complex and
computationally expensive model that only runs on the most promising
candidates. From a text search perspective that could be BM25 combined with a
Transformer model. For instance:

rank-profile bm25_and_transformer {
    first-phase {
        expression: bm25(content)
    }
    second-phase {
        rerank-count: 10
        expression: onnx("bert.onnx")
    }
}

This is an example of how to instruct Vespa to calculate the BM25 score as a
first stage and send the top 10 candidates to the BERT model. Note that this is
per content node, so with 10 content nodes, the BERT model is running
effectively on 100 data points.

Evaluation of models from different platforms such as TensorFlow, PyTorch,
XGBoost and LightGBM can be freely combined as well, even within the same
expression. To efficiently search for potential candidates one can use WAND.
Recently we’ve also added approximate nearest
neighbors,
giving the option of a highly performant nearest neighbor search which can
naturally be based on textual representation as well.

In summary, Vespa offers ease of deployment, flexibility in combining many
types of models and computations out of the box without any plugins or
extensions, efficient evaluation without moving data around and a less complex
system to maintain. This makes Vespa an attractive platform.

Ranking with Transformers

For a taste of how to use Transformer models with Vespa we’ve added a small
sample application:
https://github.com/vespa-engine/sample-apps/tree/master/transformers.
In this sample app we use the MS MARCO dataset, which contains queries,
content and relevance judgements. For the purposes of this sample, we won’t
fine-tune the model and will just use the base model as-is. Our goal is to set
up a Vespa application that indexes the documents and scores content based on a
BM25 stage followed by a Transformer stage. The sample app contains a README
that goes through all the steps, but here we’ll discuss some of the highlights.

One decision that needs to be made is which Transformer model to use. It’s
worth mentioning that large models have a significant computational cost which
has a direct impact on performance and the scalability of the application. So
to keep latency down we use a fairly small model (“nboost/pt-tinybert-msmarco”)
for this sample application. We download and export the model to ONNX using the
Transformers library. Our export script builds upon the official
convert_graph_to_onnx.py rather than using it directly because we want to use
the equivalent of the Transformer AutoModelForSequenceClassification, and the
official export script does not export the additional tensors required for the
linear transformation on top of the base model. The script puts the exported
model into the “models” directory of the Vespa application package where it
will ultimately be imported and distributed automatically to all content nodes.

We also need to create the data feed. As part of evaluating any Transformer
model, text needs to be tokenized. The tokenizer is part of the model as the
model is dependent upon stable tokenization during both training and inference.
For the purposes of this sample app, we have not implemented a tokenizer in
Vespa, meaning that we handle tokenization outside of Vespa. So in the
conversion of MS MARCO data to a Vespa feed, we also use the model’s tokenizer
to generate tokens for each piece of content. This means that when querying
Vespa, we currently need to send in the tokenized representation of the query
as well. In a follow-up post we will show how to port a tokenizer and use that
during document and query processing in Vespa.

Putting these together, we need to decide which fields to index for each piece
of content as well as how to compute each result. This means defining a
document schema which includes setting up expressions for how candidates for
retrieval should be calculated. The fields we set up for this sample
application are:

field id type string {
    indexing: summary | attribute
}
field title type string {
    indexing: index | summary
}
field url type string {
    indexing: index
}
field body type string {
    indexing: index
}
field tokens type tensor(d0[128]) {
    indexing: attribute
}

The `id`, `title`, `url` and `body` fields come directly from MS MARCO. The
`tokens` field stores the token sequence from the tokenizer mentioned above. Note
that we’ve decided upon a sequence length of 128 here to keep sizes small. The
cost of evaluating Transformer-type models is generally quadratic in relation
to the sequence length, so keeping sequences short yields significant gains in
performance. This means that we only store the first 128 tokens for each
document. The documents in MS MARCO are significantly larger than that, and a
common way of handling this is to instead index each paragraph, or perhaps even
each sentence, of every document. However, we have not done that explicitly in
this application.

We also need to define how to compute each result. Evaluating the model is
fairly easy in Vespa:

rank-profile transformer {
    first-phase {
        expression: bm25(title) + bm25(body)
    }
    second-phase {
        rerank-count: 10
        expression: onnx("rankmodel.onnx", "default", "output_1")
    }
}

The first-phase expression tells Vespa to calculate the BM25 score of the query
against the `title` and `body` fields. We use this as a first pass to avoid
evaluating the model on every document. The second-phase instructs Vespa to
evaluate `"rankmodel.onnx"` (the one exported from the
Transformers library) and calculate `"output_1"` with the top 10
candidates from the previous stage. Note that these aren’t the actual expressions
used in the sample app, where the output from the model is sent through a linear
transformation for sequence classification.

Most transformer models have three inputs: `input_ids`, `token_type_ids` and an
`attention_mask`. The first is the token sequence for input and in this case is
the combined sequence of tokens from both the query and document. When Vespa
imports the model it looks for functions with the same names as the inputs to
the model. So a simplified version of the `input_ids` function can be as follows:

# Create input sequence: CLS + query + SEP + document + SEP + 0's
function input_ids() {
    expression {
        tensor(d0[1],d1[128])(
            if (d1 == 0,
                TOKEN_CLS,   # 101
            if (d1 < input_length + 1,
                query(input){d0:0, d1:(d1-1)},
            if (d1 == input_length + 1 || d1 == 127,
                TOKEN_SEP,   # 102
            if (d1 < document_length + input_length + 2,
                attribute(tokens){d0:(d1-input_length-2)},
                TOKEN_NONE   # 0
        )))))
    }
}

This constructs the input tensor (of size 1x128) by extracting tokens from the
query or the document based on the dimension iterators. The values `input_length`
and `document_length` are themselves functions that return lengths of the input
and document respectively. Note that this function in the actual sample app is
a bit more complex to cater for documents shorter than 128 tokens. After
`input_ids` is calculated, it is fairly trivial to find the other two.
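
For illustration, simplified sketches of the other two functions could look like the following. They rely on the same layout assumptions as the simplified `input_ids` above (a CLS token, the query, a SEP, the document, a final SEP, then zero padding), and the functions in the actual sample app differ slightly:

# Segment ids: 0 for CLS + query + SEP, 1 for the document part, 0 for padding
function token_type_ids() {
    expression {
        tensor(d0[1],d1[128])(
            if (d1 < input_length + 2,
                0,
            if (d1 < input_length + document_length + 3,
                1,
                0
        )))
    }
}

# Attention mask: 1 for every real token, 0 for the zero padding at the end
function attention_mask() {
    expression {
        tensor(d0[1],d1[128])(
            if (d1 < input_length + document_length + 3, 1, 0)
        )
    }
}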

One consideration is that Vespa evaluates the model once per candidate. This
means that the latency is directly proportional to the number of candidates to
evaluate the model on. The default number of threads per query is set to 1, but
[this is easily
tuned](https://docs.vespa.ai/en/reference/services-content.html#requestthreads-persearch).
This allows for lower latency when evaluating multiple candidates. Note that
using a larger number of threads per query might have a negative impact when
handling many queries in parallel, so this is something that must be tuned on a
per application basis.
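
For reference, this is roughly where the per-search thread setting lives in services.xml; the content cluster id is a placeholder, and the elements your application already defines (redundancy, documents, nodes) are elided:

<content id="msmarco" version="1.0">
    <!-- redundancy, documents and nodes elements as already defined -->
    <engine>
        <proton>
            <tuning>
                <searchnode>
                    <requestthreads>
                        <persearch>4</persearch>
                    </requestthreads>
                </searchnode>
            </tuning>
        </proton>
    </engine>
</content>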

So, when it comes to setting up Vespa, that is basically it. In summary:

1. Put the model in the application package under a "models" directory.
2. Define a document schema.
3. Describe how to score each document.

After feeding the documents to Vespa, we are ready to query. We use the queries
in MS MARCO and tokenize them using the same tokenizer as we used for the content, resulting
in a query looking something like this:

http://localhost:8080/search/?hits=10&ranking=transformer&
yql=select+%2A+from+sources+%2A+where+content+CONTAINS+
%22what%22+or+content+CONTAINS+%22are%22+or+content+CONTAINS+
%22turtle%22+or+content+CONTAINS+%22beans%22%3B&
ranking.features.query(input)=%5B2054%2C2024%2C13170%2C13435%5D

Here, the YQL statement sets up an OR query for “what are turtle beans”. This
means Vespa will match all documents containing at least one of these terms,
and rank them according to their BM25 score. The
`ranking.features.query(input)` defines an input tensor that represents the token
sequence of this sentence, in this case `[2054, 2024, 13170, 13435]`. Both these
input parameters are URL-encoded. For this query, the top rated result (aptly
titled “What are Turtle Beans”) receives a positive class probability of 0.92
from this model.

Introducing Vespa CLI | Vespa Blog

Martin Polden

Principal Vespa Engineer


Historically, the primary methods for deploying and interacting with Vespa
applications
have been to use Vespa APIs directly or via
our Maven plugin.

While these methods are effective, neither of them are seamless. Using the APIs
typically involves copying dense terminal commands from the Vespa documentation,
and assumes that the user has access to a variety of terminal tools. The Maven
plugin assumes the user has a Java development toolchain installed and
configured, which is unnecessary for some use-cases.

We therefore decided to build an official command-line tool that supports both
self-hosted Vespa installations and Vespa Cloud, focusing on ease of use.

Vespa CLI

Vespa CLI is a zero-dependency tool built with Go, available for Linux, macOS
and Windows.

Using the initial release of Vespa CLI you can:

  • Clone our sample applications
  • Deploy your application to a Vespa installation running locally or remotely
  • Deploy your application to a dev zone in Vespa Cloud
  • Feed and query documents
  • Send custom requests with automatic authentication

To install Vespa CLI, you can use Homebrew or download a pre-built release archive for your platform.
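
For example, a sketch of both routes (check the Vespa documentation for the authoritative, up-to-date instructions):

# macOS and Linux, via Homebrew
$ brew install vespa-cli

# Or download a pre-built archive for your platform from the releases page:
# https://github.com/vespa-engine/vespa/releases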

To learn how to use Vespa CLI, check out the getting started guides in the Vespa documentation.
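
As a quick taste, a first session against a local Vespa instance might look roughly like this; the sample application name, file name and query are illustrative, and exact arguments may vary between releases:

$ vespa clone album-recommendation myapp && cd myapp
$ vespa deploy --wait 300

# Feed a document and run a query
$ vespa document put doc.json
$ vespa query "select * from music where album contains 'head'"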

Vespa CLI is open source under the same license as Vespa itself and its source
code is part of the Vespa
repository. If you
encounter problems or want to provide feedback on Vespa CLI, feel free to file
a GitHub issue.

Introducing Lucene Linguistics | Vespa Blog

This post is about an idea that was born at the Berlin Buzzwords 2023 conference and its journey towards the production-ready implementation of the new Apache Lucene-based Vespa Linguistics component.
The primary goal of the Lucene linguistics is to make it easier to migrate existing search applications from Lucene-based search engines to Vespa.
Also, it can help improve your current Vespa applications.
More on that next!

Context

Even though these days all the rage is about modern neural-vector-embeddings-based retrieval (or at least that was the sentiment at the Berlin Buzzwords conference), traditional lexical search is not going anywhere:
search applications still need tricks like filtering, faceting, phrase matching, paging, etc.
Vespa is well suited to leverage both traditional and modern techniques.

At Vinted we were working on the search application migration from Elasticsearch to Vespa.
The application over the years has grown to support multiple languages and for each we have crafted custom Elasticsearch analyzers with dictionaries for synonyms, stopwords, etc.
Vespa has a different approach towards lexical search than Elasticsearch, and we were researching ways to transfer all that accumulated knowledge without doing the “Big Bang” migration.

And here comes the part about a chat with the legend himself, Jo Kristian Bergum, on the sunny roof terrace at the Berlin Buzzwords 2023 conference.
Among other things, I asked him if it is technically possible to implement a Vespa Linguistics component on top of the Apache Lucene library.
With Jo’s encouragement, I got to work, and the same evening there was a working proof of concept.
This was huge!
It gave a promise that it is possible to convert almost any Elasticsearch analyzer into the Vespa Linguistics configuration and in this way solve one of the toughest problems for the migration project.

Show me the code!

In case you just want to get started with the Lucene Linguistics the easiest way is to explore the demo apps.
There are 4 apps:

  • Minimal: example of the bare minimum configuration that is needed to set up Lucene linguistics;
  • Advanced: demonstrates the “usual” things that can be expected when leveraging Lucene linguistics.
  • Going-Crazy: plenty of contrived features that real-world apps might require.
  • Non-Java: an app without Java code.

To learn more: read the documentation.

Architecture

The scope of the Lucene linguistics component is ONLY the tokenization of the text.
Tokenization removes any non-word characters, and splits the string into tokens on each word boundary, e.g.:

“Vespa is awesome!” => [“vespa”, “is”, “awesome”]

In the Lucene land, the Analyzer class is responsible for the tokenization.
So, the core idea for Lucene linguistics is to implement the Vespa Tokenizer interface that wraps a configurable Lucene Analyzer.

For building a configurable Lucene Analyzer there is a handy class called CustomAnalyzer.
The CustomAnalyzer.Builder has convenient methods for configuring Lucene text analysis components such as CharFilters, Tokenizers, and TokenFilters into an Analyzer.
It can be done by calling methods with signatures:

public Builder addCharFilter(String name, Map<String, String> params)
public Builder withTokenizer(String name, Map<String, String> params)
public Builder addTokenFilter(String name, Map<String, String> params)

All the parameters are of type String, so they can easily be stored in a configuration file!

When it comes to discovery of the text analysis components, it is done using the Java Service Provider Interface (SPI).
In practical terms, this means that when components are prepared in a certain way, they become available without explicit coding! You can think of them as plugins.

The trickiest bit was to configure Vespa to load resource files required for the Lucene components.
Luckily, there is a CustomAnalyzer.Builder factory method that accepts a Path parameter.
Even more luck comes from the fact that Path is the type exposed by the Vespa configuration definition language!
With all that in place, it was possible to load resource files from the application package just by providing a relative path to files.
Voila!

All that was nice, but it made simple application packages more complicated than they needed to be:
a directory with at least a dummy file was required!
The requirement stemmed from the fact that in Vespa, configuration parameters of type Path were mandatory.
This meant that if your component declared a parameter of the Path type, a value had to be provided.
Clearly, that requirement can be a bit too strict.

Luckily, the Vespa team quickly implemented a change that allowed configuration parameters of the Path type to be declared optional.
For the Lucene linguistics it meant 2 things:

  1. Base component configuration became simpler.
  2. When no path is set up, the CustomAnalyzer loads resource files from the classpath of the application package, i.e. even more flexibility in where to put resource files.

To wrap it up:
Lucene Linguistics accepts a configuration in which custom Lucene analysis components can be fully configured.

Languages and analyzers

The Lucene linguistics supports 40 languages out-of-the-box.
To customize the way the text is analyzed there are 2 options:

  1. Configure the text analysis in services.xml.
  2. Extend a Lucene Analyzer class in your application package and register it as a Component.

In case no analyzer is set up for a language, the Lucene StandardAnalyzer is used.

Lucene linguistics component configuration

It is possible to configure Lucene linguistics directly in the services.xml file.
This option works best if you’re already familiar with Lucene text analysis components.
A configuration for the English language could look something like this:

<component id="linguistics"
           class="com.yahoo.language.lucene.LuceneLinguistics"
           bundle="my-vespa-app">
  <config name="com.yahoo.language.lucene.lucene-analysis">
    <configDir>linguistics</configDir>
    <analysis>
      <item key="en">
        <tokenizer>
          <name>standard</name>
        </tokenizer>
        <tokenFilters>
          <item>
            <name>stop</name>
            <conf>
              <item key="words">en/stopwords.txt</item>
              <item key="ignoreCase">true</item>
            </conf>
          </item>
          <item>
            <name>englishMinimalStem</name>
          </item>
        </tokenFilters>
      </item>
    </analysis>
  </config>
</component>

The above analyzer uses the standard tokenizer; then the stop token filter loads stopwords from the en/stopwords.txt file, which must be placed in your application package under the linguistics directory; and finally the englishMinimalStem token filter stems the tokens.
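
To make the file locations concrete, the relevant part of such an application package could be laid out as follows (the schema file name doc.sd is just an example):

my-vespa-app/
├── services.xml
├── schemas/
│   └── doc.sd
└── linguistics/
    └── en/
        └── stopwords.txt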

Component registry

The Lucene linguistics takes in a ComponentRegistry of the Analyzer class.
This option works best for projects that contain custom Java code because your IDE will help you build an Analyzer instance.
Also, JUnit is your friend when it comes to testing.

In the example below, the SimpleAnalyzer class coming with Lucene is wrapped as a component and set to be used for the English language.

<component id="en"
           class="org.apache.lucene.analysis.core.SimpleAnalyzer"
           bundle="my-vespa-app" />

Mental model

With that many options using Lucene linguistics might seem a bit complicated.
However, the mental model is simple: priority for conflict resolution.
The priority of the analyzers in descending order is:

  1. Lucene linguistics component configuration;
  2. Components that extend the Lucene Analyzer class;
  3. Default analyzers per language;
  4. StandardAnalyzer.

This means that e.g. if both a configuration and a component are specified for a language, then an analyzer from the configuration is used because it has a higher priority.

Asymmetric tokenization

Going against the usual advice, you can achieve asymmetric tokenization for a language.
The trick is to, e.g., index with stemming turned on and query with stemming turned off.
Under the hood, any pair of Lucene analyzers can do the job.
However, it becomes your problem to set up analyzers that produce matching tokens.

Differences from Elasticsearch

Even though Lucene does the text analysis, not everything that you do in Elasticsearch is easily translatable to the Lucene Linguistics.
E.g. the multiplex token filter is just not available in Lucene.
This means that you have to implement that token filter yourself (probably by looking into how Elasticsearch implemented it).

However, Vespa has advantages over Elasticsearch when leveraging Lucene text analysis.
The big one is that you configure and deploy linguistics components with your application package.
This is a lot more flexible than maintaining an Elasticsearch plugin.
Let’s consider an example: a custom stemmer.

In Elasticsearch land you either create a plugin or (if the stemmer is generic enough) you can try to contribute it to Apache Lucene (or Elasticsearch itself), so that it transitively comes with Elasticsearch in the future.
Maintaining Elasticsearch plugins is a pain because a plugin needs to be built for each and every Elasticsearch version, and a custom installation script is needed in both production and development setups.
Also, what if you run Elasticsearch as a managed service in the cloud where custom plugins are not supported at all?

In Vespa you can do the implementation directly in your application package.
Nothing special needs to be done for deployment.
No worries (fingers-crossed) for Vespa version changes.
If your component needs to be used in many Vespa applications, your options are:

  1. Deploy your component into some maven repository
  2. Commit the prebuilt bundle file into each application under the /components directory.
    Yeah, that sounds exactly like how you do it with regular Java applications, and it is.
    Vespa Cloud also has no problems running your application package with a custom stemmer.

Summary

With the new Lucene-based Linguistics component Vespa expands its capabilities for lexical search by reaching into the vast Apache Lucene ecosystem.
Also, it is worth mentioning that people experienced with other Lucene-based search engines, such as Elasticsearch or Solr, should feel right at home pretty quickly.
The fact that the toolset and the skill-set are largely transferable lowers the barrier of adopting Vespa.
Moreover, the fact that the underlying text analysis technology is the same makes migrating the text analysis process to Vespa mostly a mechanical translation task.
Give it a try!