Seq 5 Signal Indexing Beta

TL;DR: Seq 5.0.2138-pre is our first supported beta from the 5.x series, and will be upgradable through to RTM. Seq 5 includes massive storage improvements for high-volume deployments. Download from https://getseq.net/download or pull the Docker image.

Earlier this year, we started work on Seq version 5, aiming to deliver a completely new storage engine, support for Docker and Linux, and improvements in practically every part of the app.

[Screenshot: Seq 5 - Dark Theme]

We haven't held back. Seq 5 is a major overhaul that looks great, runs just about anywhere, works more efficiently for small deployments, and is more stable and responsive for large ones.

While there are still a few features left on our backlog, we're confident that Seq 5 is stable enough for beta-level use, and from this point we'll be testing to ensure that beta installations can be upgraded smoothly to RTM when it arrives.

The biggest change in Seq 5 is at the storage layer: while past releases have focused on the performance of queries from Seq's in-memory cache, Seq 5 lays new foundations for improving longer-range queries from the disk archive through signal indexes. We hope that, through the beta, we can gather feedback and tune this feature for a wide range of usage scenarios.

What are signal indexes?

Internally, Seq is divided into two complementary storage systems: the in-memory cache, which is fast, and the disk-backed archive, which is slow.

[Diagram: Cache and Archive]

This split works because, for diagnostics, the majority of queries cover recent events: it's more common to hunt for the source of a bug that popped up last week than one that was observed last year.

As Seq is used for larger and larger event volumes, though, there's a limit to how much history can be stored in the cache. Once queries have to be served from the disk archive, the performance drop can be a burden.

While we could turn to comprehensive secondary indexes to improve archive search performance, we'd then be on the well-trodden path to new I/O bottlenecks because of write amplification. It's easy to imagine generating 10× as much secondary index data as the log data itself.

Seq 5 instead uses very lightweight page-level indexes to narrow down search queries. When you create a signal in Seq 5, Seq's new storage engine makes a map of the disk pages that contain one or more matching events. Since the map uses just one bit per signal per page, typically no more than a few megabytes of index information have to be maintained for each gigabyte of log data.
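To make the "one bit per signal per page" figure concrete, here is a back-of-the-envelope calculation. The 4 KiB page size and the signal counts are assumptions chosen for illustration; the post does not state Seq's actual page size.

```python
# Rough sizing for an index that stores one bit per signal per page.
# PAGE_SIZE_BYTES and the signal counts below are illustrative assumptions,
# not values taken from Seq's storage engine.

GIB = 1024 ** 3
MIB = 1024 ** 2
PAGE_SIZE_BYTES = 4096  # assumed page size


def index_bytes_per_gib(signal_count: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Bytes of index needed per GiB of log data at one bit per signal per page."""
    pages_per_gib = GIB // page_size
    total_bits = pages_per_gib * signal_count
    return total_bits / 8


for signals in (10, 50, 100):
    size_mib = index_bytes_per_gib(signals) / MIB
    print(f"{signals:>3} signals: about {size_mib:.1f} MiB of index per GiB of log data")
```

Under these assumptions, even 100 signals come to roughly 3 MiB of index per GiB of log data, which is in line with the "few megabytes" estimate.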

Queries in the Seq UI that are scoped to one or more signals will then read fewer disk pages, parse fewer event payloads, and spend less time evaluating filter expressions.
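The idea of a page-level signal index can be sketched in a few lines. This is a toy model, not Seq's storage engine: `Page`, `SignalIndex`, `build_index`, and `count_matches` are hypothetical names, and events are plain dictionaries.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A disk page holding a handful of events (plain dicts in this sketch)."""
    events: list


@dataclass
class SignalIndex:
    """One bit per page: bit i is set if page i holds at least one matching event."""
    bits: int = 0

    def mark(self, page_no: int) -> None:
        self.bits |= 1 << page_no

    def may_contain(self, page_no: int) -> bool:
        return bool((self.bits >> page_no) & 1)


def build_index(pages, predicate) -> SignalIndex:
    """Scan every page once and record which ones contain a match."""
    index = SignalIndex()
    for page_no, page in enumerate(pages):
        if any(predicate(e) for e in page.events):
            index.mark(page_no)
    return index


def count_matches(pages, predicate, index=None):
    """Count matching events; returns (count, pages_read)."""
    count = pages_read = 0
    for page_no, page in enumerate(pages):
        if index is not None and not index.may_contain(page_no):
            continue  # skipped without reading or parsing the page
        pages_read += 1
        count += sum(1 for e in page.events if predicate(e))
    return count, pages_read


def is_error(event) -> bool:
    return event["level"] == "Error"


pages = [
    Page([{"level": "Information"}, {"level": "Information"}]),
    Page([{"level": "Error"}, {"level": "Information"}]),
    Page([{"level": "Warning"}]),
    Page([{"level": "Error"}, {"level": "Error"}]),
]
errors_index = build_index(pages, is_error)

print(count_matches(pages, is_error))                # (3, 4): full scan reads all pages
print(count_matches(pages, is_error, errors_index))  # (3, 2): index skips two pages
```

Both scans return the same count; the indexed one simply touches fewer pages, which is where the savings in I/O, parsing, and filter evaluation come from.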

Signal index performance

To give you an idea of the difference signal indexes can make, we've compared the time it takes to run select count(*) from stream on a synthetic test data set in Seq 4.2 with the time taken by Seq 5.0.

To avoid the effects of the memory cache, we've turned it off on the test system so that all results are served from the disk archive. The test system is an i7 ThinkPad laptop with an SSD, and the numbers are best-of-three server-reported execution times (ignoring network round-trips).

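Signal     | Count   | Seq 4.2 (ms) | Seq 5.0 (ms) | Improvement
-----------|---------|--------------|--------------|------------
None       | 1509541 | 22076        | 15445        | 30%
Warnings   | 27496   | 21868        | 3588         | 84%
Errors     | 3430    | 21981        | 501          | 98%
Exceptions | 1186    | 21495        | 177          | 99%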
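The Improvement column can be reproduced from the two timing columns as the relative reduction in execution time. The short script below does that with the figures from the table; `improvement_percent` is just a helper name chosen for this sketch.

```python
# (signal, Seq 4.2 ms, Seq 5.0 ms) as reported in the table above
timings = [
    ("None", 22076, 15445),
    ("Warnings", 21868, 3588),
    ("Errors", 21981, 501),
    ("Exceptions", 21495, 177),
]


def improvement_percent(old_ms: int, new_ms: int) -> int:
    """Relative reduction in execution time, rounded to a whole percent."""
    return round((old_ms - new_ms) / old_ms * 100)


for signal, old_ms, new_ms in timings:
    speedup = old_ms / new_ms
    print(f"{signal:<10} {improvement_percent(old_ms, new_ms)}% less time, {speedup:.1f}x faster")
```

Expressed as a speedup factor rather than a percentage, the gap between the unindexed and indexed cases becomes even easier to see.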

As you can see, querying the full event stream is already faster in Seq 5.0, but once the query is scoped down to Warnings, the signal index kicks in and what takes Seq 4.2 about 22 seconds to compute needs less than 4 seconds in Seq 5.0.

The gains for Errors and Exceptions are even more noticeable, with both getting well into sub-second response times.

The neat thing about choosing signals for Seq's second-level indexing strategy (after the primary timestamp index) is that they map very well to how Seq is used in practice. If you activate the Errors, Production, and Web front end signals together, the combination of the indexes behind those signals will cut down the search time in proportion to how much the signals themselves intersect.
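One way to picture combining signal indexes is a bitwise AND over per-signal page bitmaps: only pages flagged by every active signal need to be read. The bitmaps and the `pages_to_read` helper below are hypothetical and only illustrate the principle.

```python
def pages_to_read(*signal_bitmaps: int) -> list:
    """Page numbers whose bit is set in every active signal's bitmap."""
    if not signal_bitmaps:
        raise ValueError("at least one signal bitmap is required")
    combined = signal_bitmaps[0]
    for bitmap in signal_bitmaps[1:]:
        combined &= bitmap
    return [i for i in range(combined.bit_length()) if (combined >> i) & 1]


# Hypothetical bitmaps over 8 pages; bit i (counting from the right) is page i.
errors = 0b10110110
production = 0b11100101
web_front_end = 0b01100110

print(pages_to_read(errors))                             # [1, 2, 4, 5, 7]
print(pages_to_read(errors, production, web_front_end))  # [2, 5]
```

Because the bitmaps work at page granularity, a page that survives the intersection may still hold events that match only some of the signals, so the filter expressions are still evaluated on the events in the remaining pages.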

When do signal indexes kick in?

Seq stores events in 7-day extents, and each extent is tied to the storage engine that created it. Existing data stored using Seq 4 won't be indexed, as it will continue using the ESENT storage engine from that version. Only new extents created after the upgrade to Seq 5 will be indexed, so in practice, it may take some time for the effects of the new version to become evident.
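The following sketch shows why the benefit appears gradually. It rests on two assumptions that the post does not confirm: that extent boundaries are aligned to a fixed epoch, and that an extent counts as "created after the upgrade" when it starts at or after the upgrade time. The dates are made up.

```python
from datetime import datetime, timedelta, timezone

EXTENT_LENGTH = timedelta(days=7)
# Assumed alignment point for extent boundaries; purely illustrative.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def extent_start(ts: datetime) -> datetime:
    """Start of the 7-day extent containing ts, under the assumed alignment."""
    whole_extents = (ts - EPOCH) // EXTENT_LENGTH
    return EPOCH + whole_extents * EXTENT_LENGTH


def is_indexed(event_time: datetime, upgraded_at: datetime) -> bool:
    """True if the event falls in an extent that starts at or after the upgrade.

    The extent that was already open when the upgrade happened is treated as
    belonging to the old storage engine (an assumption of this sketch).
    """
    return extent_start(event_time) >= upgraded_at


# Hypothetical upgrade time and event timestamps.
upgraded_at = datetime(2018, 10, 10, 9, 0, tzinfo=timezone.utc)
same_week = datetime(2018, 10, 10, 12, 0, tzinfo=timezone.utc)
next_week = datetime(2018, 10, 12, 12, 0, tzinfo=timezone.utc)

print(is_indexed(same_week, upgraded_at))  # False: extent began before the upgrade
print(is_indexed(next_week, upgraded_at))  # True: first extent created afterwards
```

Under these assumptions, events written in the first days after an upgrade can still land in an unindexed extent, and queries that reach back before the upgrade will always include unindexed data.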

What else is in the beta?

Signal indexing - and the new native storage engine that drives it - is one of the headline features of the release, but there's a stack of other things you should check out in the beta. Here are a few more to try:

  • seqcli - ingest, search, and administer using the new command-line client, included in the Windows installer, and downloadable for macOS and Linux; run seqcli help for a list of available commands
  • Fine-grained API key permissions and personal API keys - check this out by creating a new key in the API keys screen, and by clicking your username and selecting API keys
  • Manageability improvements - see the ingested data rate per API key, and the RAM used by Seq apps; permalinks no longer interfere with disk space reclamation, and more
  • New light and dark themes - click your username, choose Theme, and pick the color scheme that works best for your viewing conditions

Release Schedule

It's likely that further betas will follow as we finish the last few features, and squash any bugs that show up. We're targeting this quarter for a stable release. If you're in a position to try today's beta, we'd love to have your feedback.

Getting the beta

Windows installers are at https://getseq.net/download (bottom of the page). For instructions on using the Docker image, read the quick-start documentation. Existing Seq installations from version 3.0 on will upgrade in-place: just click through the MSI and you're done.

Happy logging!

nblumhardt
