HTTP/2 CONNECT Upgraded stream bypasses H2 flow-control backpressure, causing unbounded memory growth (OOM) #4049

@abbshr

Description

Version

hyper: 1.8.0, 1.8.1, 1.9.0

Platform

Linux x86_64 (production Kubernetes cluster, but platform-independent bug)

Summary

Description

UpgradedSendStreamTask::tick() in src/proto/h2/upgrade.rs has a logic error where break 'capacity (L98) allows send_data() to be called even when h2 flow control capacity is zero. This completely bypasses the max_send_buffer_size backpressure mechanism, causing unbounded memory growth in the h2 per-stream send buffer when the downstream TCP write is slow or blocked. In production, this manifests as rapid OOM (165MB → 8GB+ in ~2 minutes) for HTTP/2 CONNECT tunnel proxies.

Architecture Overview (v1.8.0+)

PR #3967 refactored H2Upgraded to eliminate unsafe transmute code. The new architecture introduces:

  1. H2Upgraded::poll_write — writes data into an mpsc::channel(1) (the "bridge")
  2. UpgradedSendStreamTask::tick() — a separate spawned task that reads from the channel and calls h2::SendStream::send_data()

The backpressure contract is supposed to be:

  • mpsc::channel(1) provides 1 slot of buffering between poll_write and the send task
  • tick() checks h2 flow control capacity via poll_capacity() before calling send_data()
  • When capacity is 0, tick() should suspend (return Poll::Pending), which leaves the channel slot occupied, causing poll_write to also return Poll::Pending
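That contract can be demonstrated in isolation. In the sketch below, std's sync_channel(1) stands in for tokio's mpsc::channel(1): a bounded channel only exerts backpressure while its single slot stays occupied, which is exactly what the third bullet relies on.

```rust
// Sketch of the intended contract, using std's sync_channel(1) as a
// stand-in for tokio's mpsc::channel(1): the writer stays blocked exactly
// as long as the send task leaves the single slot occupied.
use std::sync::mpsc::sync_channel;

fn main() {
    let (tx, rx) = sync_channel::<Vec<u8>>(1);

    // First write fills the one slot.
    tx.try_send(vec![0u8; 8]).expect("slot was free");

    // While the send task has not drained the slot, further writes fail
    // (the async equivalent of poll_write returning Poll::Pending).
    assert!(tx.try_send(vec![0u8; 8]).is_err());

    // Only once the send task consumes the slot may the writer continue.
    rx.recv().expect("one buffered chunk");
    assert!(tx.try_send(vec![0u8; 8]).is_ok());
}
```

The bug breaks the middle step: the send task drains the slot even when h2 has no capacity, so the writer is never blocked.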

The Bug

In UpgradedSendStreamTask::tick() (src/proto/h2/upgrade.rs, L68-133):

fn tick(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Result<(), crate::Error>> {
    let mut me = self.project();

    loop {
        me.h2_tx.reserve_capacity(1);

        if me.h2_tx.capacity() == 0 {
            'capacity: loop {
                match me.h2_tx.poll_capacity(cx) {
                    Poll::Ready(Some(Ok(0))) => {}
                    Poll::Ready(Some(Ok(_))) => break,
                    Poll::Ready(Some(Err(e))) => {
                        return Poll::Ready(Err(crate::Error::new_body_write(e)))
                    }
                    Poll::Ready(None) => {
                        return Poll::Ready(Err(crate::Error::new_body_write(
                            "send stream capacity unexpectedly closed",
                        )));
                    }
                    Poll::Pending => break 'capacity, // <-- BUG: should be `return Poll::Pending`
                }
            }
        }

        match me.h2_tx.poll_reset(cx) { ... }

        match me.rx.as_mut().poll_next(cx) {    // <-- continues to drain data from channel
            Poll::Ready(Some(cursor)) => {
                me.h2_tx
                    .send_data(SendBuf::Cursor(cursor), false) // <-- sends with 0 capacity!
                    .map_err(crate::Error::new_body_write)?;
            }
            ...
        }
    }
}

The critical error is at L98: Poll::Pending => break 'capacity.

break 'capacity only exits the inner 'capacity loop, but the outer loop { ... } continues execution. The code proceeds to:

  1. poll_reset(cx) — typically returns Pending (no reset) and registers the waker
  2. rx.poll_next(cx) — reads data from the mpsc channel (almost always Ready, because the channel has capacity 1 and the writer — H2Upgraded::poll_write — eagerly fills it)
  3. send_data() — sends the data despite zero h2 flow control capacity

Why send_data() Accepts Data Without Capacity

h2::SendStream::send_data() does not check or enforce flow control capacity. Looking at h2-0.4.13/src/proto/streams/prioritize.rs L145-222:

pub fn send_data<B>(
    &mut self,
    frame: frame::Data<B>,
    buffer: &mut Buffer<Frame<B>>,
    stream: &mut store::Ptr,
    counts: &mut Counts,
    task: &mut Option<Waker>,
) -> Result<(), UserError>
where
    B: Buf,
{
    let sz = frame.payload().remaining();
    // ...

    // Unconditionally increases buffered data counter
    stream.buffered_send_data += sz as usize;

    // Implicitly requests more capacity if needed
    if (stream.requested_send_capacity as usize) < stream.buffered_send_data {
        stream.requested_send_capacity = ...;
        self.try_assign_capacity(stream);
    }

    // If no flow control window available, just buffers the frame (no error!)
    if stream.send_flow.available() > 0 || stream.buffered_send_data == 0 {
        self.queue_frame(frame.into(), buffer, stream, task);
    } else {
        stream.pending_send.push_back(buffer, frame.into()); // <-- buffered indefinitely
    }

    Ok(()) // <-- always succeeds
}

The capacity check is done exclusively by poll_capacity(), which returns the value from stream.capacity(max_buffer_size):

// h2-0.4.13/src/proto/streams/stream.rs L275-279
pub fn capacity(&self, max_buffer_size: usize) -> WindowSize {
    let available = self.send_flow.available().as_size() as usize;
    let buffered = self.buffered_send_data;
    available.min(max_buffer_size).saturating_sub(buffered) as WindowSize
}

When buffered_send_data >= max_buffer_size, capacity() returns 0, and poll_capacity() returns Pending. This is the only backpressure mechanism — and hyper's break 'capacity bypasses it entirely.
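Plugging numbers into that formula makes the clamp visible. The sketch below is a standalone re-implementation for illustration (usize in place of h2's WindowSize, and 1 MiB assumed as the max_send_buffer_size):

```rust
// Standalone re-implementation of the capacity() arithmetic quoted above,
// for illustration only.
fn capacity(available: usize, max_buffer_size: usize, buffered: usize) -> usize {
    available.min(max_buffer_size).saturating_sub(buffered)
}

fn main() {
    const MIB: usize = 1024 * 1024;
    let max = MIB; // assumed max_send_buffer_size for this example

    // Nothing buffered yet: poll_capacity() reports up to max_buffer_size.
    assert_eq!(capacity(2 * MIB, max, 0), MIB);

    // Buffer limit reached: capacity is 0 and poll_capacity() stays Pending.
    assert_eq!(capacity(2 * MIB, max, MIB), 0);

    // send_data() past the limit keeps accepting data; capacity merely
    // saturates at 0, so the only brake is honoring Pending, which the
    // `break 'capacity` path skips.
    assert_eq!(capacity(2 * MIB, max, 8 * MIB), 0);
}
```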

The Feedback Loop

Once the bug triggers, a destructive feedback loop forms:

TCP write to downstream blocks (slow client / network congestion)
  → h2 connection cannot flush DATA frames to TCP
  → h2 stream flow control window fills up
  → poll_capacity() returns Pending (capacity == 0)
  → tick() hits `break 'capacity` (BUG: should return Pending)
  → tick() continues to rx.poll_next() → gets data from channel
  → tick() calls send_data() → data goes into unbounded h2 per-stream buffer
  → channel slot is now empty → poll_write returns Ready
  → upstream TCP read continues (no backpressure!)
  → more data read from upstream → sent to channel → repeat
  → h2 send buffer grows WITHOUT BOUND
  → memory consumption: upstream_bandwidth × time_blocked → OOM

Comparison with Correct Implementation (hyper ≤ v1.7.0)

In hyper v1.0.0 through v1.7.0, H2Upgraded::poll_write directly interacts with h2 SendStream without any intermediate channel or spawned task:

// hyper v1.6.0, src/proto/h2/mod.rs L317-352
impl<B> Write for H2Upgraded<B>
where
    B: Buf,
{
    fn poll_write(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &[u8],
    ) -> Poll<Result<usize, std::io::Error>> {
        if buf.is_empty() {
            return Poll::Ready(Ok(0));
        }
        self.send_stream.reserve_capacity(buf.len());

        let cnt = match ready!(self.send_stream.poll_capacity(cx)) {
            //         ^^^^^^ -- KEY: `ready!()` macro returns Poll::Pending if not ready
            None => Some(0),
            Some(Ok(cnt)) => self
                .send_stream
                .write(&buf[..cnt], false)
                .ok()
                .map(|()| cnt),
            Some(Err(_)) => None,
        };
        // ...
    }
}

Why the Old Implementation Is Correct

  1. ready!() macro: When poll_capacity() returns Poll::Pending, the ready!() macro immediately returns Poll::Pending from poll_write. This is the correct behavior — it tells the caller (the bidirectional copy loop) to stop reading from upstream until h2 has capacity to send.
  2. Direct flow control: There is no intermediate buffer between the caller and h2. The caller writes directly into h2's SendStream::write(), which only accepts exactly as many bytes as poll_capacity() reported available.
  3. Tight backpressure coupling: The poll_write caller, h2 flow control, and TCP write are all in the same task, with no channel indirection that could break the backpressure chain.
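The load-bearing piece is ready!(): it is nothing more than an early return of Poll::Pending, which is precisely the step the break 'capacity path loses. A toy model (names here are illustrative, not hyper's):

```rust
// Toy model of the old poll_write: `ready!` (std::task::ready!) expands to
// an early `return Poll::Pending`, so a Pending poll_capacity() suspends
// the whole write path.
use std::task::{ready, Poll};

fn poll_write_model(poll_capacity: Poll<usize>, buf_len: usize) -> Poll<usize> {
    // If capacity is Pending, we return Pending right here: the copy loop
    // stops reading upstream until h2 wakes us with fresh capacity.
    let cnt = ready!(poll_capacity);
    Poll::Ready(cnt.min(buf_len))
}

fn main() {
    assert_eq!(poll_write_model(Poll::Pending, 8192), Poll::Pending);
    assert_eq!(poll_write_model(Poll::Ready(4096), 8192), Poll::Ready(4096));
}
```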

Why the New Implementation Broke

The refactoring in PR #3967 decoupled the IO path into two separate tasks connected by an mpsc::channel(1):

[poll_write caller] --mpsc::channel(1)--> [UpgradedSendStreamTask] --h2 SendStream--> [TCP]

The intent was that mpsc::channel(1) would naturally provide backpressure: when the send task can't flush to h2, the channel stays full, causing poll_write to return Pending. However, the break 'capacity bug means the send task always drains the channel regardless of h2 capacity, so the channel is effectively always empty, and poll_write is effectively never blocked.

How to Reproduce

Minimal Reproduction Setup

  1. An HTTP/2 server that accepts CONNECT requests and upgrades them (the proxy)
  2. An upstream TCP target that sends data rapidly
  3. A downstream HTTP/2 client whose TCP connection becomes slow/blocked

Reproduction Steps

  1. Establish an HTTP/2 CONNECT tunnel through the proxy
  2. Have the upstream target send data at high rate (e.g., dd if=/dev/urandom | nc)
  3. Throttle or block the downstream client's TCP read (e.g., using tc netem or simply not reading from the client socket)
  4. Observe: the proxy's memory will grow linearly with upstream_rate × time_blocked, without any bound

Reproduction Pseudo-Code

// 1. Start a hyper HTTP/2 server with CONNECT support (using default config)
let builder = http2::Builder::new(TokioExecutor::new());
// ... serve_connection with a handler that connects upstream and does bidirectional copy

// 2. Upstream: rapidly send data
let upstream = TcpListener::bind("127.0.0.1:9000").await?;
tokio::spawn(async move {
    let (mut stream, _) = upstream.accept().await.unwrap();
    let data = vec![0u8; 8192];
    loop {
        stream.write_all(&data).await.unwrap();
    }
});

// 3. Client: establish H2 CONNECT, then stop reading
let (mut send_request, conn) = hyper::client::conn::http2::Builder::new(TokioExecutor::new())
    .handshake(io)
    .await?;

let req = Request::connect("127.0.0.1:9000").body(Empty::new())?;
let res = send_request.send_request(req).await?;
let upgraded = hyper::upgrade::on(res).await?;

// 4. Simply stop reading from `upgraded` — simulate slow/blocked downstream
tokio::time::sleep(Duration::from_secs(120)).await;
// After 120s, observe proxy memory: should have grown by upstream_rate * 120s

Observable Symptoms

  • Memory: grows linearly over time (rate = upstream data throughput)
  • CPU: elevated due to tick() hot-looping (it re-polls on every channel receive)
  • Network: inbound traffic sustained, outbound traffic drops to near zero
  • Tokio tasks: stable count (no new tasks spawned — the bug is within a single task's loop)

Suggested Fix

Replace break 'capacity with return Poll::Pending at L98 of src/proto/h2/upgrade.rs:

// Before (BUG):
Poll::Pending => break 'capacity,

// After (FIX):
Poll::Pending => return Poll::Pending,

This ensures that when h2 has no flow control capacity, the entire tick() function returns Pending, the mpsc channel slot stays occupied, and H2Upgraded::poll_write also returns Pending — restoring the backpressure chain.
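A toy control-flow model (not hyper's code) makes the one-line difference visible: with capacity at zero, the buggy path drains and buffers every queued chunk, while the fixed path buffers none.

```rust
// Toy model of tick()'s control flow; `fixed` selects between the
// `break 'capacity` behavior (false) and `return Poll::Pending` (true).
// Returns how many chunks end up in the unbounded h2 send buffer.
fn tick_model(capacity: usize, queued_chunks: usize, fixed: bool) -> usize {
    let mut buffered = 0;
    for _ in 0..queued_chunks {
        if capacity == 0 {
            if fixed {
                return buffered; // FIX: suspend; the channel slot stays full
            }
            // BUG: `break 'capacity` falls through and drains the channel anyway
        }
        buffered += 1; // send_data() always succeeds, so data piles up
    }
    buffered
}

fn main() {
    // Zero capacity, buggy path: everything queued gets buffered unboundedly.
    assert_eq!(tick_model(0, 1000, false), 1000);
    // Zero capacity, fixed path: nothing is buffered; backpressure holds.
    assert_eq!(tick_model(0, 1000, true), 0);
}
```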

Verification

After the fix, the backpressure chain becomes:

h2 flow control full → poll_capacity returns Pending
  → tick() returns Pending → channel slot stays full
  → poll_write returns Pending → copy loop stops reading upstream
  → memory stays bounded at O(buffer_size) per tunnel

Workaround

  1. Downgrade to hyper v1.7.0: The old implementation has correct backpressure. However, it relies on unsafe transmute with Neutered<B> (a repr(transparent) uninhabited type), which may break with future Rust compiler versions (see rust-lang/rust#147588).
  2. Memory limits with OOM monitoring: Ensure resource limits are set and OOM events trigger alerts and automatic restarts. This does not prevent the bug but limits blast radius.


Expected Behavior

poll_capacity() backpressure is effective: no data is pushed into the h2 send buffer while the h2 send window capacity is 0.

Actual Behavior

Unbounded memory growth in the h2 per-stream send buffer, leading to OOM.

Additional Context

Production Incident

Environment: Kubernetes cluster, Linux x86_64
Time: 21:15:30 ~ 21:16:09
Component: HTTP/2 CONNECT tunnel proxy (server side)

Timeline

| Time | NIC Inbound (target→proxy) | NIC Outbound (proxy→client) | RSS Memory | Notes |
|---|---|---|---|---|
| 21:15:30 | 705 Mbps | 705 Mbps | 165 MB | Inbound/outbound perfectly balanced, all normal |
| 21:15:30+ | Climbing | Slowing, falling behind inbound | Rising fast | Backpressure chain breaks: downstream h2 send rate can't keep up, delta accumulates in memory |
| 21:15:45 | 1.36 Gbps | 892 Mbps | - | Delta reaches 468 Mbps (≈58.5 MB/s accumulating into the h2 send buffer) |
| 21:16:00 | 1.55 Gbps | 89 Mbps | 4.51 GB | Outbound nearly collapsed (severe downstream TCP congestion); inbound still at 1.55 Gbps, all buffered in memory |
| 21:16:09 | - | - | > 8 GB | RSS exceeds the 8Gi limit, OOMKilled |

Key Observations

  • From 21:15:30 (balanced) to 21:16:09 (OOM), the entire incident lasted only 39 seconds
  • Memory grew from 165MB to 8GB+, delta ≈ 7.8GB, average accumulation rate ≈ 200 MB/s
  • Two proxy instances OOMed sequentially (resource limit: memory: 8Gi)

Key Metrics During Incident

| Metric | Observation |
|---|---|
| Tunnel count | < 500 (no spike) |
| TCP connections | < 2000 (no spike) |
| Tokio task count | ~6000, no change (no new tasks spawned) |
| NIC inbound | Climbed from 705 Mbps to 1.55 Gbps |
| NIC outbound | Climbed slowly to 892 Mbps, then collapsed to 89 Mbps |
| CPU | Elevated (tick() hot loop consuming CPU) |

Precise Correlation with Bug Mechanism

The timeline above closely matches the theoretical model of the break 'capacity backpressure bypass:

Phase 1: Backpressure Chain Breaks (21:15:30 ~ 21:15:45)

At 21:15:30, inbound/outbound are balanced (705 Mbps), meaning h2 flow control and TCP write capacity could fully absorb upstream data. Then upstream traffic climbs while downstream TCP begins to congest (possibly client slowdown or network jitter), and h2 flow control windows gradually fill up. Under correct backpressure, poll_capacity() returning Pending should stop upstream reads, causing inbound traffic to also drop. But break 'capacity causes tick() to keep draining the channel and calling send_data(), so upstream reads continue unblocked and inbound traffic keeps climbing.

Phase 2: Vicious Cycle Acceleration (21:15:45 ~ 21:16:00)

The inbound-outbound delta widens from 468 Mbps to 1.46 Gbps (1.55 - 0.089). Outbound plummets from 892 Mbps to 89 Mbps, indicating severe downstream TCP congestion (TCP send buffer full, window shrinking). Yet inbound traffic keeps rising — precisely because the backpressure chain is completely broken: poll_write always returns Ready, so the copy loop reads from upstream TCP at full speed. All net inbound data accumulates in h2 per-stream send buffers.

Phase 3: OOM (21:16:00 ~ 21:16:09)

In the 9 seconds from 21:16:00 to 21:16:09, inbound ≈ 1.55 Gbps (193.75 MB/s), net accumulation rate ≈ (1.55 - 0.089) Gbps ≈ 182 MB/s. 9 seconds of accumulation ≈ 1.64 GB. Adding the 4.51 GB already accumulated by 21:16:00: 4.51 + 1.64 = 6.15 GB. Accounting for jemalloc arena overhead and fragmentation amplification (typically 1.2-1.5x), actual RSS reaches 7.4 ~ 9.2 GB, precisely hitting the 8Gi OOM threshold.
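The Phase 3 arithmetic can be checked back-of-the-envelope; all inputs come from the timeline above, and the 1.2-1.5x allocator factor is the assumption already stated:

```rust
// Back-of-the-envelope check of the Phase 3 numbers.
fn main() {
    let inbound_gbps = 1.55_f64;
    let outbound_gbps = 0.089_f64;

    // Net accumulation rate: Gbit/s -> MB/s (x1000 for Mbit, divide by 8 for bytes).
    let mb_per_s = (inbound_gbps - outbound_gbps) * 1000.0 / 8.0;
    assert!((mb_per_s - 182.6).abs() < 1.0); // ~182 MB/s

    // 9 more seconds of accumulation on top of the 4.51 GB already buffered.
    let total_gb = 4.51 + mb_per_s * 9.0 / 1000.0;
    assert!((total_gb - 6.15).abs() < 0.05); // ~6.15 GB

    // A 1.2x-1.5x allocator overhead brackets the 8 GiB OOM threshold.
    assert!(total_gb * 1.2 < 8.0 && total_gb * 1.5 > 8.0);
}
```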

Why 500 Tunnels Can Cause 8GB OOM

Without the bug, each tunnel's memory is bounded: ~30KB (2×8KB buffers + h2 overhead).
With the bug, each affected tunnel's memory is unbounded: upstream_bandwidth × time_blocked.

Only a simple combination of conditions is needed: fast upstream target (e.g., CDN, large file download) + slow downstream client (e.g., GC pause, network jitter, processing delay). In this incident, peak inbound traffic reached 1.55 Gbps, and at this rate, 39 seconds was enough to fill 8GB of memory.

Even a single tunnel with a fast upstream and blocked downstream can cause OOM.

Labels

C-bug (Category: bug. Something is wrong. This is bad!)