Vespa Product Updates, January 2019: Parent/Child, Large File Config Download, and a Simplified Feeding Interface

In last month’s Vespa update, we mentioned ONNX integration, precise transaction log pruning, grouping on maps, and improvements to streaming search performance. Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and Oath Ads Platforms. Thanks to feedback and contributions from the community, Vespa continues to evolve.

This month, we’re excited to share the following updates with you:

Parent/Child

We’ve added support for multiple levels of parent-child document references. Documents with references to parent documents can now import fields, with minimal impact on performance. This simplifies updates to parent data as no denormalization is needed and supports use cases with many-to-many relationships, like Product Search. Read more in parent-child.
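To illustrate why importing parent fields removes the need for denormalization, here is a conceptual sketch in plain Python. It is not Vespa code: the `Store` class, the `advertiser_ref` field, and the `import_field` helper are hypothetical names chosen for illustration. In Vespa itself, references and imported fields are declared in the document schema.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy in-memory model of parent and child documents (hypothetical)."""
    parents: dict = field(default_factory=dict)   # parent id -> fields
    children: dict = field(default_factory=dict)  # child id -> fields

    def import_field(self, child_id: str, ref_field: str, parent_field: str):
        """Resolve a parent field through the child's reference at read time."""
        parent_id = self.children[child_id][ref_field]
        return self.parents[parent_id][parent_field]


store = Store()
store.parents["advertiser-1"] = {"budget": 100}
# Children store only a reference to the parent, not a copy of its fields.
store.children["ad-1"] = {"title": "Shoes", "advertiser_ref": "advertiser-1"}
store.children["ad-2"] = {"title": "Boots", "advertiser_ref": "advertiser-1"}

# A single write to the parent is visible through every child that refers to it.
store.parents["advertiser-1"]["budget"] = 0
budgets = [
    store.import_field(child_id, "advertiser_ref", "budget")
    for child_id in ("ad-1", "ad-2")
]
```

With denormalized data, the same budget change would have required one update per child document.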

File URL references in application packages

Serving nodes sometimes require data files which are so large that it doesn’t make sense for them to be stored and deployed in the application package. Such files can now be included in application packages by using a URL reference. When the application is redeployed, the files are automatically downloaded and injected into the components that depend on them.

Batch feed in Java client

The new SyncFeedClient provides a simplified API for feeding batches of data with high performance using the Java HTTP client. This is convenient when feeding from systems without full streaming support, such as Kafka and DynamoDB.
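The batch-at-a-time pattern can be sketched as below. The sketch is written in Python to stay dependency-free and does not use the SyncFeedClient API: `batches`, `feed_all`, and the `send_batch` callback are hypothetical stand-ins, where `send_batch` represents a blocking call that feeds one batch and returns once every operation in it has completed.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def feed_all(
    records: Iterable[dict],
    size: int,
    send_batch: Callable[[List[dict]], bool],
) -> int:
    """Feed batch by batch, stopping at the first failed batch.

    Returns the number of documents in successfully fed batches.
    """
    fed = 0
    for batch in batches(records, size):
        if not send_batch(batch):
            break
        fed += len(batch)
    return fed


# Stand-in sender (assumption): records batch sizes instead of calling Vespa.
sent_sizes = []


def fake_send(batch: List[dict]) -> bool:
    sent_sizes.append(len(batch))
    return True


docs = ({"id": f"id:mynamespace:music::{i}", "fields": {"n": i}} for i in range(7))
total = feed_all(docs, 3, fake_send)
```

Because each call blocks until its batch is done, the caller can acknowledge or checkpoint the source after every batch, which suits sources that are read in pages rather than as a continuous stream.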

We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to see.

Vespa Product Updates, May 2019: Deploy Large Machine Learning Models, Multithreaded Disk Index Fusion, Ideal State Optimizations, and Feeding Improvements

Kristian Aune

Head of Customer Success, Vespa


In last month’s Vespa update, we mentioned tensor updates, query tracing, and coverage. Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to evolve.

For May, we’re excited to share the following feature updates with you:

Multithreaded disk index fusion

Content nodes are now able to sustain a higher feed rate by using multiple threads for disk index fusion. Read more.

Feeding improvements

Cluster-internal communications are now multithreaded out of the box, for high-throughput feeding operations. This fully utilizes a 10 Gbps network and improves utilization of high-CPU content nodes.

Ideal state optimizations

Whenever the content cluster state changes, the ideal state is calculated. This is now optimized (faster and runs less often) and state transitions like node up/down will have less impact on read and write operations. Learn more in the dynamic data distribution documentation.

Download ML models during deploy

One way to import ML models into Vespa is to place them in the models directory of the application package. Applications where models are trained frequently in some external system can instead refer to the model by URL rather than including it in the application package. This use case is now documented in deploying remote models, and solves the challenge of deploying huge models.

We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.

High performance feeding with Vespa CLI

Martin Polden

Principal Vespa Engineer


Photo by Shiro hatori on Unsplash

For a long time, vespa-feed-client has been the best option for feeding large
sets of documents to Vespa efficiently. While the client itself performs well,
it depends on a Java runtime and its installation method is rather cumbersome.
Compared to Vespa CLI, it also lacks many ease-of-use features such as automatic
configuration of authentication and endpoint discovery.

Since our initial announcement, Vespa CLI has become the standard interface for
working with Vespa applications, both for self-hosted installations and Vespa
Cloud. However, document feeding with Vespa CLI was initially limited to
single-document operations, using the vespa document command.

Having to juggle multiple tools while working with Vespa is obviously not ideal.
We therefore decided to implement a high performance feeding client inside Vespa
CLI, thus making it a universal client for Vespa.

Today we’re excited to announce this new feed client! See it in action in the
screencast below:

Performance

The new feed client is ready for most use cases. If you’re already using
vespa-feed-client and want to switch to vespa feed, we recommend comparing
the feed performance of your particular document set before making the switch.
vespa feed outputs statistics in the same format as vespa-feed-client,
making comparison easy.
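A comparison of two runs could be scripted roughly as follows. This sketch assumes that each client prints a JSON object of statistics and that the rate of successful operations appears under a key named `feeder.ok.rate`. Both the key name and the sample numbers are assumptions made for illustration, so check them against the actual output of your clients.

```python
import json

# Assumed name of the statistic holding successful operations per second.
DEFAULT_RATE_KEY = "feeder.ok.rate"


def ok_rate(stats_json: str, key: str = DEFAULT_RATE_KEY) -> float:
    """Extract the success rate from one client's JSON statistics."""
    return float(json.loads(stats_json)[key])


def relative_speedup(new_json: str, old_json: str, key: str = DEFAULT_RATE_KEY) -> float:
    """Return the new client's rate divided by the old client's rate."""
    return ok_rate(new_json, key) / ok_rate(old_json, key)


# Made-up sample output, not measured results.
old_run = '{"feeder.ok.count": 1000, "feeder.ok.rate": 2000.0}'
new_run = '{"feeder.ok.count": 1000, "feeder.ok.rate": 2500.0}'
speedup = relative_speedup(new_run, old_run)
```

A value above 1.0 means the new client fed the same document set faster in that run. Repeating the run a few times guards against noise.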

We’ve invested a lot of time into making vespa feed as performant as the old
client. In our performance tests, its current default configuration outperforms
the old client when feeding small- (10B) and medium-sized (1KB) documents, but
it still lags behind vespa-feed-client when feeding large (10KB+) documents.

Below you can see a throughput comparison (documents per second) of the two
clients when feeding two million documents at sizes 10B, 1KB and 10KB:

We’ll continue making performance improvements to the new client, so make sure
to keep your Vespa CLI installation up-to-date.

Future of the Java client

The introduction of vespa feed does not deprecate vespa-feed-client. If
you’re already using vespa-feed-client there is no immediate need to migrate
to the new client. vespa-feed-client provides both a Java library and a
command-line interface for that library, both of which will remain supported.

However, if you’d rather use Vespa CLI for all things Vespa and don’t depend on
vespa-feed-client as a Java library, we encourage you to try our new client.

Getting started

The new feed client is available in Vespa CLI as of version 8.164.
See vespa help feed for usage and the Vespa documentation for further details.
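As a starting point, a feed file can be generated with a few lines of Python. The sketch assumes that vespa feed accepts a JSON Lines file with one document operation per line. The namespace `mynamespace`, the document type `music`, the `title` field, and the file name `docs.jsonl` are placeholders, not names from any real application.

```python
import json
from pathlib import Path
from typing import Iterable


def put_operation(namespace: str, doctype: str, user_id: str, fields: dict) -> dict:
    """Build one put operation with a Vespa document ID."""
    return {"put": f"id:{namespace}:{doctype}::{user_id}", "fields": fields}


def to_jsonl(operations: Iterable[dict]) -> str:
    """Serialize operations as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(op) + "\n" for op in operations)


# Placeholder documents; replace with your own data source.
ops = [
    put_operation("mynamespace", "music", str(i), {"title": f"Song {i}"})
    for i in range(3)
]
feed_text = to_jsonl(ops)

if __name__ == "__main__":
    # Write the feed file next to the script.
    Path("docs.jsonl").write_text(feed_text, encoding="utf-8")
```

The resulting file could then be passed to the client, for example with `vespa feed docs.jsonl`. Consult `vespa help feed` for the options your version supports.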

If you’re using Homebrew, you can upgrade to the latest version using
brew upgrade vespa-cli, or you can download the latest release from our GitHub
releases page.

New to Vespa CLI? Please see our quick start guides for self-hosted Vespa or
Vespa Cloud.

Found a bug or have a feature request? Feel free to file a GitHub issue. Need
help with Vespa CLI or Vespa in general? Drop by our community Slack channel.