May 11, 2026 · 6 min read
Age-Based Live Event Caching
Netflix’s stream of the Tyson-Paul fight in November 2024 peaked at 65 million concurrent viewers (Rayburn, 2023). Many of those viewers rewound their stream. Rewinding by even a few minutes requests segments that receive relatively little traffic compared with the live edge. A standard LRU cache reads that sparse access as a signal to evict, adding latency for those viewers.
The gap between what viewers want and what edge servers preserve is the problem a recent Netflix patent addresses (Newton, 2026).
What Is the Live Caching Problem?
Standard CDN edge caching works well for video-on-demand. Popular films accumulate requests. Niche titles gradually drop out of cache. The access pattern drives retention. LRU aligns cache incentives with actual demand (Hasslinger et al., 2023).
Live events break this alignment. During a live stream, nearly all viewers watch the live edge. Two-minute-old segments receive a fraction of the traffic that two-second-old segments do. LRU interprets sparse access as low value and evicts accordingly. When a viewer scrubs backward (rewinds), the edge server has nothing to serve. It must fetch from origin or a mid-tier cache, adding latency, loading the upstream link, and degrading the experience for concurrent live viewers (Liu, Lynch, and Newton, 2025).
The obvious fix is to keep every segment of every active live event in a protected high-priority cache on every edge server until each event ends. That solves scrubbing, but creates a different problem: memory. A distribution center handling many concurrent live streams, each at multiple bitrates and resolutions, needs enormous cache memory to hold all segments simultaneously. That requirement scales with the product of stream count, rendition count, and event duration.
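To see how quickly that product grows, here is a back-of-envelope calculation. Every number below is an illustrative assumption, not a figure from the patent or from Netflix.

```python
# Cache needed to protect every segment of every active live event on a
# single edge server. All numbers are illustrative assumptions.
concurrent_events = 30       # simultaneous live streams
renditions_per_event = 12    # bitrate x resolution ladder per source feed
event_hours = 4              # average event duration so far
avg_bitrate_mbps = 5         # average bitrate across the ladder

bytes_per_rendition = avg_bitrate_mbps * 1_000_000 / 8 * event_hours * 3600
total_tb = concurrent_events * renditions_per_event * bytes_per_rendition / 1e12
print(f"{total_tb:.1f} TB of protected cache")   # ~3.2 TB with these numbers
```

Terabytes of protected, unevictable cache per edge server, growing as the event runs, is the cost the patent is trying to avoid.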
The Assignment Approach
Netflix’s patent US 12,621,504 B2, filed April 6, 2023 and issued May 5, 2026, describes a third path (Newton, 2026).
The core mechanism is assignment. Each live event downloadable (the patent’s term for a specific encoded rendition, a combination of bitrate, resolution, and source feed) is assigned to exactly one edge server per distribution center. The assignment uses a consistent hash on the downloadable’s identifier, so every edge server independently computes the same assignment without runtime coordination.
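The patent specifies a consistent hash over the downloadable’s identifier but not a particular construction. The sketch below uses rendezvous (highest-random-weight) hashing as one way to get the same coordination-free property; the server names and identifier format are invented for illustration.

```python
import hashlib

def assigned_server(downloadable_id: str, servers: list[str]) -> str:
    """Deterministically pick the one edge server that owns the full
    history of this downloadable. Every server evaluates the same pure
    function over the same inputs, so no runtime coordination is needed."""
    def weight(server: str) -> int:
        digest = hashlib.sha256(f"{server}:{downloadable_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(servers, key=weight)

servers = ["edge-01", "edge-02", "edge-03", "edge-04"]
print(assigned_server("event-9371/1080p/6000kbps/feed-a", servers))
```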
The assigned edge server caches every downloaded segment of its assigned downloadable in a high-priority list. Nothing in that list gets evicted while the live event is active. That server becomes the reliable source of full event history for that stream.
Edge servers not assigned to a given downloadable operate differently. They maintain a cutoff threshold, defaulting to five minutes. Segments younger than the threshold are near-live and go into a high-priority list. Segments older than the threshold move to a low-priority list that can be pruned when free cache falls below a floor.
The result is a two-tier model. Recent segments are available at every edge server in the distribution center. The full event history lives on exactly one server.
Segment Age as a Control Variable
Age is the difference between a segment’s creation time and the current time. Creation time comes from the last-modified header the origin server returns when sending the segment to the CDN (Newton, 2026). This makes the calculation stateless and consistent across all edge servers without inter-server communication.
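A minimal sketch of that age calculation, assuming the last-modified value arrives as a standard HTTP date string:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def segment_age_seconds(last_modified: str) -> float:
    """Age = current time minus the creation time reported by the origin's
    last-modified header. Each edge server computes this locally, so no
    inter-server communication is required."""
    created = parsedate_to_datetime(last_modified)
    return (datetime.now(timezone.utc) - created).total_seconds()

print(segment_age_seconds("Mon, 11 May 2026 18:04:05 GMT"))
```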
When an unassigned edge server receives a segment, it checks the age against the cutoff threshold. Near-live segments go to the head of the high-priority list. Segments already past the threshold go to the head of the low-priority list. When a cached high-priority segment later crosses the threshold, it moves to the head of the low-priority list. The low-priority list is pruned tail-first when free cache falls below a floor.
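The sketch below models that two-list structure on an unassigned edge server. OrderedDicts stand in for the priority lists; the free-cache floor and the value shapes are invented for illustration.

```python
import time
from collections import OrderedDict

CUTOFF_SECONDS = 5 * 60          # the patent's default near-live threshold
FREE_CACHE_FLOOR = 1 * 1024**3   # illustrative pruning floor, not from the patent

class UnassignedEdgeCache:
    """Two-list cache on an edge server NOT assigned to a downloadable."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        # OrderedDicts keep insertion order; the "head" in the prose maps
        # to the most recently used end of the dict here.
        self.high = OrderedDict()  # near-live segments, never pruned
        self.low = OrderedDict()   # aged-out segments, prunable

    def on_segment_received(self, key: str, data: bytes, created_at: float):
        """Classify an incoming segment by age against the cutoff."""
        age = time.time() - created_at
        target = self.high if age < CUTOFF_SECONDS else self.low
        target[key] = (data, created_at)
        self.used += len(data)
        self.prune()

    def demote_aged(self):
        """Move high-priority segments that crossed the threshold to the
        head of the low-priority list."""
        now = time.time()
        for key in [k for k, (_, t) in self.high.items() if now - t >= CUTOFF_SECONDS]:
            self.low[key] = self.high.pop(key)

    def prune(self):
        """Evict from the tail of the low-priority list while free cache
        sits below the floor; high-priority segments are never touched."""
        while self.capacity - self.used < FREE_CACHE_FLOOR and self.low:
            _, (data, _) = self.low.popitem(last=False)  # least recently used
            self.used -= len(data)
```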
When a client requests a cached segment from an unassigned edge server, the server re-evaluates whether the segment is still near-live. If still near-live, it moves to the head of the high-priority list. If not, to the head of the low-priority list. This request-triggered reclassification lets frequently accessed older segments persist in cache past the cutoff threshold. The patent applies LRU ordering to the high-priority list.
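Continuing the same hypothetical sketch, request handling re-checks the age and reorders the lists, which is what keeps a frequently scrubbed-to segment away from the prune tail:

```python
def on_segment_requested(cache: UnassignedEdgeCache, key: str) -> bytes:
    """Request-triggered reclassification: re-evaluate whether the segment
    is still near-live and move it to the head of the matching list. The
    lists behave as LRU queues, so repeated requests keep an older segment
    far from the eviction tail."""
    source = cache.high if key in cache.high else cache.low
    data, created_at = source.pop(key)
    still_near_live = (time.time() - created_at) < CUTOFF_SECONDS
    target = cache.high if still_near_live else cache.low
    target[key] = (data, created_at)
    return data
```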
Segments of assigned downloadables are never classified by age. They stay in the high-priority list regardless of age or access frequency.
Common Mistakes
Using the same cache policy for live and VOD content. LRU works for VOD. It fails for live events where access recency and retention value diverge sharply. Running both content types through the same eviction policy means one is served poorly. Netflix runs separate caching applications for each on the same edge server (Newton, 2026).
Ignoring the memory math of uniform high-priority caching. Protecting all segments on every edge server scales with event count, rendition count, and event duration simultaneously. At thirty concurrent streams across multiple bitrates, that cost is not negligible.
Treating scrubbing as a secondary feature. Netflix sheds DVR read load first when the origin is under pressure (Liu, Lynch, and Newton, 2025). That is an intentional load shedding policy, not an architectural flaw. The patent never allows the assigned server to evict segments during an active event. When edge cache is reliable, the origin receives fewer DVR requests under load.
Fixing the cutoff threshold globally. The patent’s default is five minutes. But a short-form event and a six-hour broadcast have different scrubbing needs, and one threshold serves neither.
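One remedy is to derive the cutoff from the event’s expected length. A hypothetical heuristic, with categories and numbers that are illustrative rather than from the patent:

```python
def cutoff_seconds(expected_duration_minutes: int) -> int:
    """Scale the near-live window with how far behind viewers can fall."""
    if expected_duration_minutes <= 30:    # short-form event
        return 2 * 60
    if expected_duration_minutes <= 180:   # typical match or fight card
        return 5 * 60                      # the patent's default
    return 15 * 60                         # multi-hour broadcast
```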
Leaving the assigned server as a single point of failure. If the assigned server becomes unavailable, full DVR rewind fails until the event ends and VOD segments are available from origin. That failure mode deserves explicit attention in reliability planning.
Put It Into Practice
Find a storage tier or data pipeline in your system where retention is uniform but demand is not. Determine the age past which queries become rare. Identify which node handles the most historical reads for each series, by consistent hashing or load balancer affinity. Designate that node explicitly as the owner of full depth, extend its retention, and configure the others to drop data past the threshold. That is the assignment approach, applied outside the CDN.
As an example, Prometheus stores data with the same retention window on every instance by default. Recent samples are queried constantly. Samples older than a few hours are queried rarely. The correction mirrors what the patent formalizes: assign one instance per metric series to own full retention, configure the rest to serve only a short window. Thanos and Mimir support this through store tier separation and consistent routing (Wilkie, 2019; Pracucci and Dimitrov, 2023).
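A sketch of how that ownership could be expressed, reusing the rendezvous-style hash from earlier; the instance names, retention values, and routing function are assumptions for illustration, not Thanos or Mimir configuration.

```python
import hashlib

INSTANCES = ["prom-a", "prom-b", "prom-c"]   # hypothetical instance names
OWNER_RETENTION = "30d"                      # full depth on the owner
DEFAULT_RETENTION = "6h"                     # near-live window everywhere else

def retention_for(instance: str, series_key: str) -> str:
    """Exactly one instance owns full retention for a series; the rest
    keep only the short window, mirroring the patent's assignment idea."""
    def weight(candidate: str) -> int:
        digest = hashlib.sha256(f"{candidate}:{series_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    owner = max(INSTANCES, key=weight)
    return OWNER_RETENTION if instance == owner else DEFAULT_RETENTION

print(retention_for("prom-a", 'http_requests_total{job="api"}'))
```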
References
Hasslinger, Gerhard, Mahshid Okhovatzadeh, Konstantinos Ntougias, Frank Hasslinger, and Oliver Hohlfeld (2023). “An overview of analysis methods and evaluation results for caching strategies.” Computer Networks, 228: 109583. https://arxiv.org/abs/2308.02875
Liu, Xiaomei, Joseph Lynch, and Christopher Newton (2025). “Netflix Live Origin.” Netflix Technology Blog, December. https://netflixtechblog.com/netflix-live-origin-41f1b0ad5371
Newton, Christopher Alan (2026). Techniques for Caching Media Content When Streaming Live Events. U.S. Patent 12,621,504 B2. Filed April 6, 2023. Issued May 5, 2026. https://patents.justia.com/patent/12621504
Pracucci, Marco and Dimitar Dimitrov (2023). “Breaking the memory barrier: How Grafana Mimir’s store-gateway overcame out-of-memory errors.” Grafana Labs Blog, July 6. https://grafana.com/blog/2023/07/06/breaking-the-memory-barrier-how-grafana-mimirs-store-gateway-overcame-out-of-memory-errors/
Rayburn, Dan (2023). “A List of the Largest Live Streaming Events in History and How They Are Measured.” Streaming Media Blog, October 16 (updated April 1, 2025). https://www.streamingmediablog.com/2023/10/largestlivestreaminghistory.html
Wilkie, Tom (2019). “[PromCon Recap] Two Households, Both Alike in Dignity: Cortex and Thanos.” Grafana Labs Blog, November 21. https://grafana.com/blog/2019/11/21/promcon-recap-two-households-both-alike-in-dignity-cortex-and-thanos/
Changelog
2026-05-11 Initial publication.