HTTP/2 CONNECT Upgraded stream bypasses H2 flow-control backpressure, causing unbounded memory growth (OOM) #4049

@abbshr

Description

Version

hyper: 1.8.0, 1.8.1, 1.9.0

Platform

Linux x86_64 (production Kubernetes cluster, but platform-independent bug)

Summary

Description

UpgradedSendStreamTask::tick() in src/proto/h2/upgrade.rs has a logic error where break 'capacity (L98) allows send_data() to be called even when h2 flow control capacity is zero. This completely bypasses the max_send_buffer_size backpressure mechanism, causing unbounded memory growth in the h2 per-stream send buffer when the downstream TCP write is slow or blocked. In production, this manifests as rapid OOM (165MB → 8GB+ in ~2 minutes) for HTTP/2 CONNECT tunnel proxies.

Architecture Overview (v1.8.0+)

PR #3967 refactored H2Upgraded to eliminate unsafe transmute code. The new architecture introduces:

  1. H2Upgraded::poll_write — writes data into an mpsc::channel(1) (the "bridge")
  2. UpgradedSendStreamTask::tick() — a separate spawned task that reads from the channel and calls h2::SendStream::send_data()

The backpressure contract is supposed to be:

  • mpsc::channel(1) provides 1 slot of buffering between poll_write and the send task
  • tick() checks h2 flow control capacity via poll_capacity() before calling send_data()
  • When capacity is 0, tick() should suspend (return Poll::Pending), which leaves the channel slot occupied, causing poll_write to also return Poll::Pending
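That contract can be demonstrated in isolation. In the sketch below, std's sync_channel(1) stands in for tokio's mpsc::channel(1): a bounded channel only exerts backpressure while its single slot stays occupied, which is exactly what the third bullet relies on.

```rust
// Sketch of the intended contract, using std's sync_channel(1) as a
// stand-in for tokio's mpsc::channel(1): the writer stays blocked exactly
// as long as the send task leaves the single slot occupied.
use std::sync::mpsc::sync_channel;

fn main() {
    let (tx, rx) = sync_channel::<Vec<u8>>(1);

    // First write fills the one slot.
    tx.try_send(vec![0u8; 8]).expect("slot was free");

    // While the send task has not drained the slot, further writes fail
    // (the async equivalent of poll_write returning Poll::Pending).
    assert!(tx.try_send(vec![0u8; 8]).is_err());

    // Only once the send task consumes the slot may the writer continue.
    rx.recv().expect("one buffered chunk");
    assert!(tx.try_send(vec![0u8; 8]).is_ok());
}
```

The bug breaks the middle step: the send task drains the slot even when h2 has no capacity, so the writer is never blocked.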

The Bug

In UpgradedSendStreamTask::tick() (src/proto/h2/upgrade.rs, L68-133):

fn tick(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Result<(), crate::Error>> {
    let mut me = self.project();

    loop {
        me.h2_tx.reserve_capacity(1);

        if me.h2_tx.capacity() == 0 {
            'capacity: loop {
                match me.h2_tx.poll_capacity(cx) {
                    Poll::Ready(Some(Ok(0))) => {}
                    Poll::Ready(Some(Ok(_))) => break,
                    Poll::Ready(Some(Err(e))) => {
                        return Poll::Ready(Err(crate::Error::new_body_write(e)))
                    }
                    Poll::Ready(None) => {
                        return Poll::Ready(Err(crate::Error::new_body_write(
                            "send stream capacity unexpectedly closed",
                        )));
                    }
                    Poll::Pending => break 'capacity, // <-- BUG: should be `return Poll::Pending`
                }
            }
        }

        match me.h2_tx.poll_reset(cx) { ... }

        match me.rx.as_mut().poll_next(cx) {    // <-- continues to drain data from channel
            Poll::Ready(Some(cursor)) => {
                me.h2_tx
                    .send_data(SendBuf::Cursor(cursor), false) // <-- sends with 0 capacity!
                    .map_err(crate::Error::new_body_write)?;
            }
            ...
        }
    }
}

The critical error is at L98: Poll::Pending => break 'capacity.

break 'capacity only exits the inner 'capacity loop, but the outer loop { ... } continues execution. The code proceeds to:

  1. poll_reset(cx) — typically returns Pending (no reset) and registers the waker
  2. rx.poll_next(cx) — reads data from the mpsc channel (almost always Ready, because the channel has capacity 1 and the writer — H2Upgraded::poll_write — eagerly fills it)
  3. send_data() — sends the data despite zero h2 flow control capacity

Why send_data() Accepts Data Without Capacity

h2::SendStream::send_data() does not check or enforce flow control capacity. Looking at h2-0.4.13/src/proto/streams/prioritize.rs L145-222:

pub fn send_data<B>(
    &mut self,
    frame: frame::Data<B>,
    buffer: &mut Buffer<Frame<B>>,
    stream: &mut store::Ptr,
    counts: &mut Counts,
    task: &mut Option<Waker>,
) -> Result<(), UserError>
where
    B: Buf,
{
    let sz = frame.payload().remaining();
    // ...

    // Unconditionally increases buffered data counter
    stream.buffered_send_data += sz as usize;

    // Implicitly requests more capacity if needed
    if (stream.requested_send_capacity as usize) < stream.buffered_send_data {
        stream.requested_send_capacity = ...;
        self.try_assign_capacity(stream);
    }

    // If no flow control window available, just buffers the frame (no error!)
    if stream.send_flow.available() > 0 || stream.buffered_send_data == 0 {
        self.queue_frame(frame.into(), buffer, stream, task);
    } else {
        stream.pending_send.push_back(buffer, frame.into()); // <-- buffered indefinitely
    }

    Ok(()) // <-- always succeeds
}

The capacity check is done exclusively by poll_capacity(), which returns the value from stream.capacity(max_buffer_size):

// h2-0.4.13/src/proto/streams/stream.rs L275-279
pub fn capacity(&self, max_buffer_size: usize) -> WindowSize {
    let available = self.send_flow.available().as_size() as usize;
    let buffered = self.buffered_send_data;
    available.min(max_buffer_size).saturating_sub(buffered) as WindowSize
}

When buffered_send_data >= max_buffer_size, capacity() returns 0, and poll_capacity() returns Pending. This is the only backpressure mechanism — and hyper's break 'capacity bypasses it entirely.
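Plugging numbers into that formula makes the clamp visible. The sketch below is a standalone re-implementation for illustration (usize in place of h2's WindowSize, and 1 MiB assumed as the max_send_buffer_size):

```rust
// Standalone re-implementation of the capacity() arithmetic quoted above,
// for illustration only.
fn capacity(available: usize, max_buffer_size: usize, buffered: usize) -> usize {
    available.min(max_buffer_size).saturating_sub(buffered)
}

fn main() {
    const MIB: usize = 1024 * 1024;
    let max = MIB; // assumed max_send_buffer_size for this example

    // Nothing buffered yet: poll_capacity() reports up to max_buffer_size.
    assert_eq!(capacity(2 * MIB, max, 0), MIB);

    // Buffer limit reached: capacity is 0 and poll_capacity() stays Pending.
    assert_eq!(capacity(2 * MIB, max, MIB), 0);

    // send_data() past the limit keeps accepting data; capacity merely
    // saturates at 0, so the only brake is honoring Pending, which the
    // `break 'capacity` path skips.
    assert_eq!(capacity(2 * MIB, max, 8 * MIB), 0);
}
```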

The Feedback Loop

Once the bug triggers, a destructive feedback loop forms:

TCP write to downstream blocks (slow client / network congestion)
  → h2 connection cannot flush DATA frames to TCP
  → h2 stream flow control window fills up
  → poll_capacity() returns Pending (capacity == 0)
  → tick() hits `break 'capacity` (BUG: should return Pending)
  → tick() continues to rx.poll_next() → gets data from channel
  → tick() calls send_data() → data goes into unbounded h2 per-stream buffer
  → channel slot is now empty → poll_write returns Ready
  → upstream TCP read continues (no backpressure!)
  → more data read from upstream → sent to channel → repeat
  → h2 send buffer grows WITHOUT BOUND
  → memory consumption: upstream_bandwidth × time_blocked → OOM

Comparison with Correct Implementation (hyper ≤ v1.7.0)

In hyper v1.0.0 through v1.7.0, H2Upgraded::poll_write directly interacts with h2 SendStream without any intermediate channel or spawned task:

// hyper v1.6.0, src/proto/h2/mod.rs L317-352
impl<B> Write for H2Upgraded<B>
where
    B: Buf,
{
    fn poll_write(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &[u8],
    ) -> Poll<Result<usize, std::io::Error>> {
        if buf.is_empty() {
            return Poll::Ready(Ok(0));
        }
        self.send_stream.reserve_capacity(buf.len());

        let cnt = match ready!(self.send_stream.poll_capacity(cx)) {
            //         ^^^^^^ -- KEY: `ready!()` macro returns Poll::Pending if not ready
            None => Some(0),
            Some(Ok(cnt)) => self
                .send_stream
                .write(&buf[..cnt], false)
                .ok()
                .map(|()| cnt),
            Some(Err(_)) => None,
        };
        // ...
    }
}

Why the Old Implementation Is Correct

  1. ready!() macro: When poll_capacity() returns Poll::Pending, the ready!() macro immediately returns Poll::Pending from poll_write. This is the correct behavior — it tells the caller (the bidirectional copy loop) to stop reading from upstream until h2 has capacity to send.
  2. Direct flow control: There is no intermediate buffer between the caller and h2. The caller writes directly into h2's SendStream::write(), which only accepts exactly as many bytes as poll_capacity() reported available.
  3. Tight backpressure coupling: The poll_write caller, h2 flow control, and TCP write are all in the same task, with no channel indirection that could break the backpressure chain.
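The load-bearing piece is ready!(): it is nothing more than an early return of Poll::Pending, which is precisely the step the break 'capacity path loses. A toy model (names here are illustrative, not hyper's):

```rust
// Toy model of the old poll_write: `ready!` (std::task::ready!) expands to
// an early `return Poll::Pending`, so a Pending poll_capacity() suspends
// the whole write path.
use std::task::{ready, Poll};

fn poll_write_model(poll_capacity: Poll<usize>, buf_len: usize) -> Poll<usize> {
    // If capacity is Pending, we return Pending right here: the copy loop
    // stops reading upstream until h2 wakes us with fresh capacity.
    let cnt = ready!(poll_capacity);
    Poll::Ready(cnt.min(buf_len))
}

fn main() {
    assert_eq!(poll_write_model(Poll::Pending, 8192), Poll::Pending);
    assert_eq!(poll_write_model(Poll::Ready(4096), 8192), Poll::Ready(4096));
}
```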

Why the New Implementation Broke

The refactoring in PR #3967 decoupled the IO path into two separate tasks connected by an mpsc::channel(1):

[poll_write caller] --mpsc::channel(1)--> [UpgradedSendStreamTask] --h2 SendStream--> [TCP]

The intent was that mpsc::channel(1) would naturally provide backpressure: when the send task can't flush to h2, the channel stays full, causing poll_write to return Pending. However, the break 'capacity bug means the send task always drains the channel regardless of h2 capacity, so the channel is effectively always empty, and poll_write is effectively never blocked.

How to Reproduce

Minimal Reproduction Setup

  1. An HTTP/2 server that accepts CONNECT requests and upgrades them (the proxy)
  2. An upstream TCP target that sends data rapidly
  3. A downstream HTTP/2 client whose TCP connection becomes slow/blocked

Reproduction Steps

  1. Establish an HTTP/2 CONNECT tunnel through the proxy
  2. Have the upstream target send data at high rate (e.g., dd if=/dev/urandom | nc)
  3. Throttle or block the downstream client's TCP read (e.g., using tc netem or simply not reading from the client socket)
  4. Observe: the proxy's memory will grow linearly with upstream_rate × time_blocked, without any bound

Reproduction Pseudo-Code

// 1. Start a hyper HTTP/2 server with CONNECT support (using default config)
let builder = http2::Builder::new(TokioExecutor::new());
// ... serve_connection with a handler that connects upstream and does bidirectional copy

// 2. Upstream: rapidly send data
let upstream = TcpListener::bind("127.0.0.1:9000").await?;
tokio::spawn(async move {
    let (mut stream, _) = upstream.accept().await.unwrap();
    let data = vec![0u8; 8192];
    loop {
        stream.write_all(&data).await.unwrap();
    }
});

// 3. Client: establish H2 CONNECT, then stop reading
let (mut send_request, conn) = hyper::client::conn::http2::Builder::new(TokioExecutor::new())
    .handshake(io)
    .await?;

let req = Request::connect("127.0.0.1:9000").body(Empty::new())?;
let res = send_request.send_request(req).await?;
let upgraded = hyper::upgrade::on(res).await?;

// 4. Simply stop reading from `upgraded` — simulate slow/blocked downstream
tokio::time::sleep(Duration::from_secs(120)).await;
// After 120s, observe proxy memory: should have grown by upstream_rate * 120s

Observable Symptoms

  • Memory: grows linearly over time (rate = upstream data throughput)
  • CPU: elevated due to tick() hot-looping (it re-polls on every channel receive)
  • Network: inbound traffic sustained, outbound traffic drops to near zero
  • Tokio tasks: stable count (no new tasks spawned — the bug is within a single task's loop)

Suggested Fix

Replace break 'capacity with return Poll::Pending at L98 of src/proto/h2/upgrade.rs:

// Before (BUG):
Poll::Pending => break 'capacity,

// After (FIX):
Poll::Pending => return Poll::Pending,

This ensures that when h2 has no flow control capacity, the entire tick() function returns Pending, the mpsc channel slot stays occupied, and H2Upgraded::poll_write also returns Pending — restoring the backpressure chain.
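A toy control-flow model (not hyper's code) makes the one-line difference visible: with capacity at zero, the buggy path drains and buffers every queued chunk, while the fixed path buffers none.

```rust
// Toy model of tick()'s control flow; `fixed` selects between the
// `break 'capacity` behavior (false) and `return Poll::Pending` (true).
// Returns how many chunks end up in the unbounded h2 send buffer.
fn tick_model(capacity: usize, queued_chunks: usize, fixed: bool) -> usize {
    let mut buffered = 0;
    for _ in 0..queued_chunks {
        if capacity == 0 {
            if fixed {
                return buffered; // FIX: suspend; the channel slot stays full
            }
            // BUG: `break 'capacity` falls through and drains the channel anyway
        }
        buffered += 1; // send_data() always succeeds, so data piles up
    }
    buffered
}

fn main() {
    // Zero capacity, buggy path: everything queued gets buffered unboundedly.
    assert_eq!(tick_model(0, 1000, false), 1000);
    // Zero capacity, fixed path: nothing is buffered; backpressure holds.
    assert_eq!(tick_model(0, 1000, true), 0);
}
```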

Verification

After the fix, the backpressure chain becomes:

h2 flow control full → poll_capacity returns Pending
  → tick() returns Pending → channel slot stays full
  → poll_write returns Pending → copy loop stops reading upstream
  → memory stays bounded at O(buffer_size) per tunnel

Workaround

  1. Downgrade to hyper v1.7.0: The old implementation has correct backpressure. However, it relies on unsafe transmute with Neutered<B> (a repr(transparent) uninhabited type), which may break with future Rust compiler versions (see rust-lang/rust#147588).
  2. Memory limits with OOM monitoring: Ensure resource limits are set and OOM events trigger alerts and automatic restarts. This does not prevent the bug but limits blast radius.


Expected Behavior

poll_capacity() backpressure is effective: no data is pushed into the h2 send buffer while the h2 send window capacity is 0.

Actual Behavior

Unbounded memory growth in the h2 per-stream send buffer, leading to OOM.

Additional Context

Production Incident

Environment: Kubernetes cluster, Linux x86_64
Time: 21:15:30 ~ 21:16:09
Component: HTTP/2 CONNECT tunnel proxy (server side)

Timeline

| Time | NIC Inbound (target→proxy) | NIC Outbound (proxy→client) | RSS Memory | Notes |
|---|---|---|---|---|
| 21:15:30 | 705 Mbps | 705 Mbps | 165 MB | Inbound/outbound perfectly balanced, all normal |
| 21:15:30+ | Climbing | Slowing, falling behind inbound | Rising fast | Backpressure chain breaks: downstream h2 send rate can't keep up, delta accumulates in memory |
| 21:15:45 | 1.36 Gbps | 892 Mbps | - | Delta reaches 468 Mbps (≈58.5 MB/s accumulating into the h2 send buffer) |
| 21:16:00 | 1.55 Gbps | 89 Mbps | 4.51 GB | Outbound nearly collapsed (severe downstream TCP congestion); inbound still at 1.55 Gbps, all buffered in memory |
| 21:16:09 | - | - | > 8 GB | RSS exceeds the 8Gi limit, OOMKilled |

Key Observations

  • From 21:15:30 (balanced) to 21:16:09 (OOM), the entire incident lasted only 39 seconds
  • Memory grew from 165MB to 8GB+, delta ≈ 7.8GB, average accumulation rate ≈ 200 MB/s
  • Two proxy instances OOMed sequentially (resource limit: memory: 8Gi)

Key Metrics During Incident

| Metric | Observation |
|---|---|
| Tunnel count | < 500 (no spike) |
| TCP connections | < 2000 (no spike) |
| Tokio task count | ~6000, no change (no new tasks spawned) |
| NIC inbound | Climbed from 705 Mbps to 1.55 Gbps |
| NIC outbound | Climbed slowly to 892 Mbps, then collapsed to 89 Mbps |
| CPU | Elevated (tick() hot loop consuming CPU) |

Precise Correlation with Bug Mechanism

The timeline above closely matches the theoretical model of the break 'capacity backpressure bypass:

Phase 1: Backpressure Chain Breaks (21:15:30 ~ 21:15:45)

At 21:15:30, inbound/outbound are balanced (705 Mbps), meaning h2 flow control and TCP write capacity could fully absorb upstream data. Then upstream traffic climbs while downstream TCP begins to congest (possibly client slowdown or network jitter), and h2 flow control windows gradually fill up. Under correct backpressure, poll_capacity() returning Pending should stop upstream reads, causing inbound traffic to also drop. But break 'capacity causes tick() to keep draining the channel and calling send_data(), so upstream reads continue unblocked and inbound traffic keeps climbing.

Phase 2: Vicious Cycle Acceleration (21:15:45 ~ 21:16:00)

The inbound-outbound delta widens from 468 Mbps to 1.46 Gbps (1.55 - 0.089). Outbound plummets from 892 Mbps to 89 Mbps, indicating severe downstream TCP congestion (TCP send buffer full, window shrinking). Yet inbound traffic keeps rising — precisely because the backpressure chain is completely broken: poll_write always returns Ready, so the copy loop reads from upstream TCP at full speed. All net inbound data accumulates in h2 per-stream send buffers.

Phase 3: OOM (21:16:00 ~ 21:16:09)

In the 9 seconds from 21:16:00 to 21:16:09, inbound ≈ 1.55 Gbps (193.75 MB/s), net accumulation rate ≈ (1.55 - 0.089) Gbps ≈ 182 MB/s. 9 seconds of accumulation ≈ 1.64 GB. Adding the 4.51 GB already accumulated by 21:16:00: 4.51 + 1.64 = 6.15 GB. Accounting for jemalloc arena overhead and fragmentation amplification (typically 1.2-1.5x), actual RSS reaches 7.4 ~ 9.2 GB, precisely hitting the 8Gi OOM threshold.
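The Phase 3 arithmetic can be checked back-of-the-envelope; all inputs come from the timeline above, and the 1.2-1.5x allocator factor is the assumption already stated:

```rust
// Back-of-the-envelope check of the Phase 3 numbers.
fn main() {
    let inbound_gbps = 1.55_f64;
    let outbound_gbps = 0.089_f64;

    // Net accumulation rate: Gbit/s -> MB/s (x1000 for Mbit, divide by 8 for bytes).
    let mb_per_s = (inbound_gbps - outbound_gbps) * 1000.0 / 8.0;
    assert!((mb_per_s - 182.6).abs() < 1.0); // ~182 MB/s

    // 9 more seconds of accumulation on top of the 4.51 GB already buffered.
    let total_gb = 4.51 + mb_per_s * 9.0 / 1000.0;
    assert!((total_gb - 6.15).abs() < 0.05); // ~6.15 GB

    // A 1.2x-1.5x allocator overhead brackets the 8 GiB OOM threshold.
    assert!(total_gb * 1.2 < 8.0 && total_gb * 1.5 > 8.0);
}
```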

Why 500 Tunnels Can Cause 8GB OOM

Without the bug, each tunnel's memory is bounded: ~30KB (2×8KB buffers + h2 overhead).
With the bug, each affected tunnel's memory is unbounded: upstream_bandwidth × time_blocked.

Only a simple combination of conditions is needed: fast upstream target (e.g., CDN, large file download) + slow downstream client (e.g., GC pause, network jitter, processing delay). In this incident, peak inbound traffic reached 1.55 Gbps, and at this rate, 39 seconds was enough to fill 8GB of memory.

Even a single tunnel with a fast upstream and blocked downstream can cause OOM.

Labels

C-bug (Category: bug. Something is wrong. This is bad!)