Watches & Compaction

Watch vs Watcher

Note the distinction between a Watch and a Watcher:

  • Watch: a long-lived subscription to key/key-range changes that streams ordered updates (puts/deletes) in real time as they are committed.
    • Watch events are only delivered for Records at or below the committed_revision — tentative Records above the committed_revision are never visible to Watchers.
  • Watcher: a process connected to the etcd API with one or more Watches.
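To make the distinction concrete, here is a minimal Python sketch. The `Event`, `Watch` and `Watcher` classes and the `deliverable` helper are hypothetical illustrations, not Netsy's actual types. The sketch assumes a Watch covers either a single key or a half-open key range [start_key, end_key), as in etcd, and it only models the visibility rule: an event is delivered when it matches the Watch and its revision is at or below committed_revision.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    revision: int
    key: str
    kind: str  # "put" or "delete"


@dataclass
class Watch:
    # Hypothetical: a single key (end_key is None) or a half-open range [start_key, end_key).
    start_key: str
    end_key: Optional[str]
    start_revision: int

    def matches(self, key: str) -> bool:
        if self.end_key is None:
            return key == self.start_key
        return self.start_key <= key < self.end_key


@dataclass
class Watcher:
    # A connected etcd API client (for example a kube-apiserver) holding one or more Watches.
    name: str
    watches: List[Watch] = field(default_factory=list)


def deliverable(watch: Watch, log: List[Event], committed_revision: int) -> List[Event]:
    """Events this Watch may see: matching its key or range, at or after its
    start revision, and never above committed_revision (tentative Records stay hidden)."""
    return [
        e
        for e in log
        if watch.matches(e.key) and watch.start_revision <= e.revision <= committed_revision
    ]
```

With `committed_revision=2`, a Watch on key `"a"` never receives an event at revision 3, even if that Record already exists tentatively on the Node.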

Each Netsy Node can have its own independent set of Watchers, each holding multiple Watches.

For example, a kube-apiserver connected to a Node is a Watcher, and each kubectl client connected to that kube-apiserver can have an active Watch.

Minimum Watch Revisions

Each Watcher and Watch is tracked in memory on each Node. Critically, when a new Watch is created, the Node must calculate the minimum revision for that Watch.

The Peer API of each Node exposes an endpoint through which the Primary can query the minimum revision across all of that Node's Watches. This value is critical input for Compaction.

If a Node has no active Watches, it returns its current committed_revision as the minimum revision.
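A minimal sketch of this per-Node bookkeeping, assuming each Watch is registered together with the oldest revision it may still need (the `min_revision` argument to `add`). `WatchRegistry` and its methods are hypothetical names, not Netsy's API; `min_revision()` is the value a Peer API endpoint could report to the Primary, falling back to committed_revision when no Watches are active.

```python
from __future__ import annotations

import threading
from typing import Dict


class WatchRegistry:
    """Hypothetical in-memory registry of one Node's active Watches."""

    def __init__(self, committed_revision: int) -> None:
        self._lock = threading.Lock()
        self._next_id = 1
        self._watches: Dict[int, int] = {}  # watch_id -> minimum revision it needs
        self.committed_revision = committed_revision  # advanced as Records commit

    def add(self, min_revision: int) -> int:
        """Register a new Watch together with its minimum revision."""
        with self._lock:
            watch_id = self._next_id
            self._next_id += 1
            self._watches[watch_id] = min_revision
            return watch_id

    def remove(self, watch_id: int) -> None:
        with self._lock:
            self._watches.pop(watch_id, None)

    def min_revision(self) -> int:
        """What a Peer API endpoint would report to the Primary."""
        with self._lock:
            if not self._watches:
                return self.committed_revision  # no active Watches
            return min(self._watches.values())
```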

Each Node also persists the latest accepted Compaction Revision locally. Restarting Replicas can also restore watch-admission gating from the Primary’s Initial message on the Follow stream, without waiting for the next compaction cycle.
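One way a restarting Replica could combine these two sources is to enforce the higher of the locally persisted Compaction Revision and the one carried by the Primary's Initial message, since an accepted Compaction Revision only ever increases. `InitialMessage` and `restore_admission_floor` are hypothetical; the real shape of the Initial message is not described here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InitialMessage:
    # Hypothetical shape: only the field this sketch needs.
    compaction_revision: int


def restore_admission_floor(
    local_revision: Optional[int], initial: Optional[InitialMessage]
) -> int:
    """Floor a restarting Replica enforces before accepting new Watches."""
    candidates = [0]  # nothing compacted yet
    if local_revision is not None:
        candidates.append(local_revision)  # from the local compactions table
    if initial is not None:
        candidates.append(initial.compaction_revision)  # from the Follow stream
    return max(candidates)
```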

What is Compaction?

Compaction is the process of removing historical data from the KV Data store.

Due to the nature of etcd’s API design, every create/update/delete operation writes a new record:

  • Create “example” key with value “example1”
    • KV Data will now have revision 1 example=example1.
  • Update “example” key with value “example2”
    • KV Data will now have revision 1 example=example1, and revision 2 example=example2.
  • Delete “example” key.
    • KV Data will now have revision 1 example=example1, and revision 2 example=example2, and revision 3 example (record deleted).

If no Watch still tracks the first or second revision, those revisions can be safely removed from the KV Data store.
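The walk-through above can be modelled as an append-only log in a few lines of Python. This is an illustrative sketch (`Record`, `write` and `compactable` are hypothetical names): every operation, including the delete, appends a new revision, and `compactable` lists the revisions that fall below the oldest revision any Watch still needs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    revision: int
    key: str
    value: Optional[str]  # None marks a delete (tombstone)


def write(log: List[Record], key: str, value: Optional[str]) -> int:
    """Append a new Record; existing Records are never modified.
    Revisions are global and increase by one per write in this sketch."""
    revision = len(log) + 1
    log.append(Record(revision, key, value))
    return revision


def compactable(log: List[Record], oldest_watched_revision: int) -> List[int]:
    """Revisions strictly below the oldest revision any Watch still needs."""
    return [r.revision for r in log if r.revision < oldest_watched_revision]


log: List[Record] = []
write(log, "example", "example1")  # revision 1
write(log, "example", "example2")  # revision 2
write(log, "example", None)        # revision 3 (delete)
```

If the oldest active Watch needs revision 3, `compactable(log, 3)` returns `[1, 2]`.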

How Compaction Works in Netsy

The current Primary can periodically schedule Compaction across all Nodes.

To do this, it queries each Node's minimum Watch revision via the Peer API and then takes the global minimum across all Nodes. The Compaction Revision is set to one below this global minimum, so that every revision up to and including the Compaction Revision is considered safe to “compact” (matching etcd’s inclusive semantics).

  • If a Node cannot be successfully queried for the min revision, the Compaction process ends early and awaits its next scheduled occurrence.
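The Primary's side of this calculation can be sketched as follows, including the check (described next) that the new revision must advance past the previous Compaction Revision. `plan_compaction`, `PeerQueryError` and the `query_min_revision` callback are hypothetical stand-ins for the Peer API call; returning `None` means "skip this cycle".

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional


class PeerQueryError(Exception):
    """Hypothetical: a Node's Peer API could not be queried."""


def plan_compaction(
    nodes: Iterable[str],
    query_min_revision: Callable[[str], int],
    previous_compaction_revision: int,
) -> Optional[int]:
    """Return the next Compaction Revision, or None to skip this cycle."""
    node_minimums = []
    for node in nodes:
        try:
            node_minimums.append(query_min_revision(node))
        except PeerQueryError:
            return None  # any unreachable Node ends this run early
    if not node_minimums:
        return None
    # One below the global minimum, so every revision <= the result is safe (inclusive).
    compaction_revision = min(node_minimums) - 1
    if compaction_revision <= previous_compaction_revision:
        return None  # nothing new to compact
    return compaction_revision
```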

Once the Compaction Revision has been identified, if it is greater than the previous Compaction Revision:

  1. The Primary sends a notice to every Node that the new minimum revision will be this “compaction revision”. Each Node must, under a single lock, atomically: first raise its watch-admission floor to the compaction revision (blocking any new Watch requests for revisions below it), then validate that no existing active Watch has a revision below the compaction revision. If validation fails, the Node rolls back the floor and rejects the notice. The Node confirms only if both steps succeed. Any Node that fails to confirm is retried once; if it still fails, the Compaction process exits until the next interval.

  2. Once the notice has been accepted cluster-wide, the Primary sends a logical Compact message on the Follow streams. On receiving this message, each Node persists the Compaction Revision into its local compactions table and keeps the watch-admission gate in place durably. If a restarting Node or a newly elected Primary finds this table empty during startup or preflight, it must seed it, before accepting new Watches or writes, from the Compaction Revision implied by the contiguous run of existing records rows that already have compacted_at set.
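Step 1 on a single Node, plus the Primary's retry-once loop, can be sketched like this. `AdmissionGate`, `CompactedError`, `open_watch`, `handle_notice` and `broadcast_notice` are hypothetical names; a Watch's revision is assumed to be the oldest revision it still needs, and the `send` callback stands in for whatever Peer API call delivers the notice.

```python
from __future__ import annotations

import threading
from typing import Callable, Dict, Iterable


class CompactedError(Exception):
    """Raised when a new Watch asks for a revision below the admission floor."""


class AdmissionGate:
    """Hypothetical per-Node watch-admission state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next_id = 1
        self.floor = 0  # new Watches may not start below this revision
        self.active: Dict[int, int] = {}  # watch_id -> revision the Watch still needs

    def open_watch(self, revision: int) -> int:
        with self._lock:
            if revision < self.floor:
                raise CompactedError(f"revision {revision} is below floor {self.floor}")
            watch_id = self._next_id
            self._next_id += 1
            self.active[watch_id] = revision
            return watch_id

    def close_watch(self, watch_id: int) -> None:
        with self._lock:
            self.active.pop(watch_id, None)

    def handle_notice(self, compaction_revision: int) -> bool:
        """Raise the floor, then validate existing Watches, all under one lock."""
        with self._lock:
            previous_floor = self.floor
            self.floor = max(self.floor, compaction_revision)  # 1. block new low Watches
            if any(rev < compaction_revision for rev in self.active.values()):
                self.floor = previous_floor  # 2. validation failed: roll back, reject
                return False
            return True  # both steps succeeded: confirm


def broadcast_notice(
    nodes: Iterable[str], send: Callable[[str, int], bool], compaction_revision: int
) -> bool:
    """Primary side: each Node gets one retry; a second failure ends this cycle."""
    for node in nodes:
        if not send(node, compaction_revision) and not send(node, compaction_revision):
            return False
    return True
```

Doing the raise and the validation under the same lock closes the window in which a new low-revision Watch could slip in between the check and the floor update.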

Each Node then enqueues an async compaction task, which sets the compacted_at timestamp and sets value to NULL for every record that is not already compacted and whose revision is at or below the compacted_revision (inclusive, matching etcd semantics).

Note that unlike etcd, Netsy does not remove the record entirely, only the value blob.
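Assuming a SQLite schema along the lines below (the records and compactions table names and the revision, value and compacted_at columns come from the text; the rest of the layout and the `compact`, `implied_compaction_revision` and `seed_compactions_if_empty` helpers are hypothetical), the compaction task reduces to a single idempotent UPDATE, and the seeding rule from step 2 becomes a query for the highest compacted revision below the first uncompacted one. Rows are kept; only value is cleared.

```python
import sqlite3
import time

# Hypothetical schema; only the table and column names come from the text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS records (
    revision     INTEGER PRIMARY KEY,
    key          TEXT NOT NULL,
    value        BLOB,
    compacted_at INTEGER          -- unix seconds in this sketch; NULL = not compacted
);
CREATE TABLE IF NOT EXISTS compactions (revision INTEGER NOT NULL);
"""


def compact(conn: sqlite3.Connection, compacted_revision: int) -> int:
    """Clear values at or below compacted_revision (inclusive); rows are kept."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE records SET value = NULL, compacted_at = ? "
            "WHERE revision <= ? AND compacted_at IS NULL",
            (int(time.time()), compacted_revision),
        )
    return cur.rowcount  # records newly compacted; 0 on a repeat run


def implied_compaction_revision(conn: sqlite3.Connection) -> int:
    """Highest revision R such that every record at or below R has compacted_at set."""
    (first_uncompacted,) = conn.execute(
        "SELECT MIN(revision) FROM records WHERE compacted_at IS NULL"
    ).fetchone()
    sql = "SELECT COALESCE(MAX(revision), 0) FROM records WHERE compacted_at IS NOT NULL"
    params = ()
    if first_uncompacted is not None:
        sql += " AND revision < ?"
        params = (first_uncompacted,)
    return conn.execute(sql, params).fetchone()[0]


def seed_compactions_if_empty(conn: sqlite3.Connection) -> None:
    """Startup/preflight: seed the compactions table before serving Watches or writes."""
    with conn:
        (count,) = conn.execute("SELECT COUNT(*) FROM compactions").fetchone()
        if count == 0:
            conn.execute(
                "INSERT INTO compactions (revision) VALUES (?)",
                (implied_compaction_revision(conn),),
            )
```

Because the UPDATE skips rows whose compacted_at is already set, re-running the task for the same revision is harmless.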

Compaction & Snapshots

Because of the design of this compaction mechanism, all snapshots created afterwards are effectively compacted: they retain the full history of revisions, but without the overhead of (often large) values. The process itself does not produce any new records/chunks.

Compaction and snapshot creation can safely run concurrently because SQLite’s WAL mode provides snapshot isolation — a snapshot read sees a consistent point-in-time view regardless of concurrent compaction writes. No additional locking between the two processes is required.
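The snapshot-isolation claim can be checked with Python's built-in sqlite3 module: a reader that opened a transaction before a compaction UPDATE keeps seeing the old value until it ends that transaction. The single-table schema and the `snapshot_vs_compaction` helper are illustrative only, and WAL mode requires an on-disk database file rather than `:memory:`.

```python
import os
import sqlite3
import tempfile


def snapshot_vs_compaction(path: str):
    # isolation_level=None: autocommit, so BEGIN/COMMIT below are explicit.
    writer = sqlite3.connect(path, isolation_level=None)
    reader = sqlite3.connect(path, isolation_level=None)
    try:
        mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        writer.execute(
            "CREATE TABLE records (revision INTEGER PRIMARY KEY, value BLOB, compacted_at INTEGER)"
        )
        writer.execute("INSERT INTO records VALUES (1, x'01', NULL)")

        query = "SELECT value FROM records WHERE revision = 1"
        reader.execute("BEGIN")  # deferred: the read snapshot starts at the first SELECT
        before = reader.execute(query).fetchone()[0]

        # Compaction commits while the reader's transaction is still open.
        writer.execute("UPDATE records SET value = NULL, compacted_at = 1 WHERE revision <= 1")
        during = reader.execute(query).fetchone()[0]

        reader.execute("COMMIT")  # end the snapshot
        after = reader.execute(query).fetchone()[0]
        return mode, before, during, after
    finally:
        reader.close()
        writer.close()


with tempfile.TemporaryDirectory() as tmp:
    print(snapshot_vs_compaction(os.path.join(tmp, "netsy-demo.db")))
    # → ('wal', b'\x01', b'\x01', None)
```

The writer is never blocked by the open read transaction, and the reader only observes the NULL value once it starts a new transaction, which is the behaviour the paragraph relies on.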