My notes on the whitepaper "SILK: Preventing Latency Spikes in LSM Key-Value Stores".
SILK is an I/O scheduler designed to reduce latency spikes in LSM-based key-value stores.
Many LSM KV stores exhibit high tail latencies.
Root cause: Interference between client writes, flushes, and compactions.
For example: A burst of client writes triggers a burst of flushes. If a number of compactions are running at the same time, the flushes have to share the limited I/O bandwidth with the compactions and become slow. → The in-memory component fills up and blocks further writes. → Latency spikes.
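A back-of-the-envelope sketch of this cascade in Python; the drive bandwidth, compaction load, write rate, and memtable budget below are made-up illustrative numbers, not figures from the paper:

```python
# Illustrative numbers (assumptions, not figures from the paper): when concurrent
# compactions consume most of the drive, flushes fall behind the incoming write
# rate, the memtable backlog grows, and writes eventually stall.

drive_bw_mb_s = 300        # total drive I/O bandwidth
compaction_bw_mb_s = 250   # bandwidth consumed by concurrent compactions
client_write_mb_s = 120    # incoming client write rate during the burst
memtable_budget_mb = 512   # budget for the in-memory component

flush_bw_mb_s = drive_bw_mb_s - compaction_bw_mb_s        # 50 MB/s left for flushes
backlog_growth_mb_s = client_write_mb_s - flush_bw_mb_s   # 70 MB/s of unflushed data
seconds_until_stall = memtable_budget_mb / backlog_growth_mb_s

print(f"flush bandwidth: {flush_bw_mb_s} MB/s")
print(f"backlog grows at: {backlog_growth_mb_s} MB/s")
print(f"writes stall after ~{seconds_until_stall:.1f} s")  # ~7.3 s
```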
Simply capping the bandwidth available to internal operations doesn't solve the problem of flushes getting too little bandwidth and can, in fact, exacerbate it in the long run. → It postpones compactions, increasing the likelihood that, at some point, many compactions have to run at the same time.
Main takeaway: Not all internal operations are equal. Internal operations on the lower levels of the tree (i.e., closer to the memtable) are critical because failing to complete them in a timely fashion may result in stalling client operations.
Solutions:
Opportunistically allocating I/O bandwidth to internal operations.
In production workloads, the load of client-facing operations typically varies over time: SILK continuously monitors the bandwidth used by client operations and allocates the available leftover I/O bandwidth to internal operations. For example, allocating less I/O bandwidth to compactions on higher levels during peak client load, and exploiting transient low-load periods to boost the processing of internal operations.
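A minimal sketch of what this opportunistic allocation could look like, assuming a periodic monitoring loop and a rate limiter for internal operations; the constants and function names are my own illustration, not SILK's actual RocksDB implementation:

```python
# Sketch of opportunistic allocation: periodically measure client I/O bandwidth
# and hand the leftover to internal operations, keeping a small floor so that
# flushes and L0->L1 compactions never starve. Constants and names are assumptions.

TOTAL_BW_MB_S = 300      # total drive bandwidth available to the store (assumption)
MIN_INTERNAL_MB_S = 20   # floor reserved for critical internal operations (assumption)

def rebalance(measured_client_bw_mb_s: float) -> float:
    """Return the I/O bandwidth to grant to internal operations right now."""
    leftover = TOTAL_BW_MB_S - measured_client_bw_mb_s
    return max(MIN_INTERNAL_MB_S, leftover)

print(rebalance(250))  # peak client load: internal ops throttled to 50 MB/s
print(rebalance(40))   # low-load period: internal ops get 260 MB/s to catch up
```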
Prioritizing internal operations at the lower levels of the tree: Execution of flushes and compactions from L0 to L1 should be prioritized. SILK splits internal operations into 3 categories (from highest to lowest priority) with respect to the effect they have on client latencies (see the sketch after this list):
Flushes, ensuring that they complete quickly and make room in memory to absorb incoming updates.
L0 to L1 compactions, ensuring that L0 doesn’t reach its full capacity, so that flushes can proceed.
Compactions on the higher levels (L1 to L2 and beyond), because while they maintain the structure of the LSM tree, their timely execution doesn't significantly affect client operation latencies in the short term.
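A small sketch of these three priority classes; the enum and helper below are my own illustration, only the ordering (flushes highest, then L0→L1 compactions, then deeper compactions) comes from the paper:

```python
from enum import IntEnum
from typing import Optional

class OpPriority(IntEnum):
    HIGHER_LEVEL_COMPACTION = 0  # L1->L2 and deeper: lowest priority
    L0_TO_L1_COMPACTION = 1      # keeps L0 from filling up so flushes can proceed
    FLUSH = 2                    # highest priority: keeps the memtable from filling up

def classify(op_kind: str, src_level: Optional[int] = None) -> OpPriority:
    """Map an internal operation to the priority class it should be scheduled in."""
    if op_kind == "flush":
        return OpPriority.FLUSH
    if op_kind == "compaction" and src_level == 0:
        return OpPriority.L0_TO_L1_COMPACTION
    return OpPriority.HIGHER_LEVEL_COMPACTION

assert classify("flush") > classify("compaction", 0) > classify("compaction", 3)
```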
Preempting compactions: A new compaction algorithm that allows internal operations on the lower levels of the tree to preempt compactions on the higher levels.
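A rough sketch of the preemption idea, assuming the higher-level compaction is split into small work units that check a resume flag between units; the Event-based flag and the merge_and_write helper are hypothetical, not SILK's actual mechanism inside RocksDB:

```python
import threading

# A higher-level compaction yields between work units whenever a flush or
# L0->L1 compaction needs the I/O bandwidth.

resume = threading.Event()
resume.set()  # initially, low-priority compactions may run

def merge_and_write(chunk):
    """Placeholder for the merge + I/O work on one unit of a compaction."""
    ...

def higher_level_compaction(chunks):
    for chunk in chunks:
        resume.wait()            # block here while preempted
        merge_and_write(chunk)

def on_critical_internal_op_start():
    resume.clear()               # preempt ongoing higher-level compactions

def on_critical_internal_op_done():
    resume.set()                 # let the paused compactions continue
```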
Typical recommendation: Set the number of compaction threads equal to the total number of cores. However, the authors find that the number of compaction threads should instead depend on the total drive I/O bandwidth and the amount of I/O bandwidth required by individual compaction operations.
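A quick sketch of sizing the thread pool from bandwidth rather than core count; all numbers below are illustrative assumptions, not measurements from the paper:

```python
# Sizing the compaction thread pool from bandwidth instead of core count.

drive_bw_mb_s = 400           # total drive I/O bandwidth the store can use
flush_reserve_mb_s = 100      # bandwidth kept aside for flushes and client I/O
per_compaction_bw_mb_s = 50   # bandwidth a single compaction typically consumes

compaction_threads = max(1, (drive_bw_mb_s - flush_reserve_mb_s) // per_compaction_bw_mb_s)
print(compaction_threads)     # 6, regardless of how many cores the machine has
```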