AwsChunkedStream: unbounded std::stringstream memory growth causes OOM #3732

@Shekaboy

Describe the bug

AwsChunkedStream::BufferedRead() uses a std::stringstream (m_chunkingStream) as an intermediate buffer. On every call, it reads a 64KB chunk from the source stream and appends the chunked-encoded data via writeChunk(). The std::stringstream's underlying std::string never reclaims memory behind the read pointer — clear() only resets error/EOF flags, not the buffer.

For a single PutObject of size N, the stringstream's backing string grows monotonically to ~N bytes. At ~4GB, std::string::reserve() attempts to double to 8GB+1 (0x200000001), which exceeds the maximum supported allocation size (0x200000000), causing SIGABRT / std::bad_alloc.
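The sizes quoted above can be checked with a few lines of arithmetic. This is only a sketch: `doubledRequest` is a hypothetical helper, and the "double the capacity, plus one byte for the terminator" growth rule is an assumption inferred from the numbers in the report, not taken from the standard library source.

```cpp
#include <cstdint>

constexpr std::uint64_t kGiB = 1ULL << 30;

// Values copied from the AddressSanitizer report in this issue.
constexpr std::uint64_t kMaxSupported = 0x200000000ULL;
constexpr std::uint64_t kRequested = 0x200000001ULL;

// Assumed growth rule (hypothetical): double the current capacity and add
// one byte for the null terminator.
constexpr std::uint64_t doubledRequest(std::uint64_t capacity) {
    return 2 * capacity + 1;
}

// The limit is exactly 8 GiB, and doubling a 4 GiB buffer lands one byte over it.
static_assert(kMaxSupported == 8 * kGiB, "limit is 8 GiB");
static_assert(doubledRequest(4 * kGiB) == kRequested, "4 GiB doubled, plus one");
static_assert(kRequested > kMaxSupported, "request exceeds the limit");
```

Under that assumption, the failure point is reached once the backing string holds about 4 GiB and needs to grow again.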

Root cause location: AwsChunkedStream.h, BufferedRead() and writeChunk()

The clear() calls on lines 82, 93 only reset stream state flags (eofbit/failbit). They do not call str("") to release the underlying buffer. The pptr (write pointer) advances on every writeChunk() but consumed data is never freed.
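The difference between clear() and str("") can be reproduced outside the SDK with a plain std::stringstream. The sketch below is illustrative only: `peakBufferedBytes` is a hypothetical helper, and it measures the logical contents reported by str(), not the allocator-level capacity (whether capacity is returned to the allocator is implementation-dependent).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Repeatedly writes a chunk into a stringstream and reads it back out.
// Returns the largest buffer content size seen right after a write.
// With resetBuffer == false only clear() is called, as in the current code.
inline std::size_t peakBufferedBytes(std::size_t rounds, std::size_t chunkSize,
                                     bool resetBuffer) {
    std::stringstream ss;
    const std::string payload(chunkSize, 'x');
    std::string sink(chunkSize, '\0');
    std::size_t peak = 0;

    for (std::size_t i = 0; i < rounds; ++i) {
        ss.write(payload.data(), static_cast<std::streamsize>(payload.size()));
        // str() returns the whole underlying sequence, including bytes
        // that were already consumed through the read pointer.
        peak = std::max(peak, ss.str().size());

        ss.read(&sink[0], static_cast<std::streamsize>(sink.size()));
        assert(static_cast<std::size_t>(ss.gcount()) == chunkSize);

        if (resetBuffer) {
            ss.str("");  // discard the consumed contents
        }
        ss.clear();      // only resets eofbit/failbit/badbit
    }
    return peak;
}
```

With eight rounds of 1024 bytes, the clear()-only variant peaks at 8192 buffered bytes, while the variant that also calls str("") stays at 1024.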

Expected Behavior

Memory usage during chunked upload should be O(1), bounded by the chunk size (64 KiB, i.e. 65,536 bytes), not the total upload size.

Current Behavior

Memory grows linearly with bytes uploaded. A 5GB single PutObject consumes ~10GB of RAM for the stringstream buffer alone. At ~4GB the allocation fails:

==12345==ERROR: AddressSanitizer: requested allocation size 0x200000001 exceeds maximum supported size of 0x200000000
#0 in operator new(unsigned long)
#1 in std::__cxx11::basic_string::reserve(unsigned long)
#2 in std::__cxx11::basic_string::_M_replace_aux(unsigned long, unsigned long, unsigned long, char)
#3 in Aws::Utils::Stream::AwsChunkedStream<65536ul>::writeChunk(unsigned long)
#4 in Aws::Utils::Stream::AwsChunkedStream<65536ul>::BufferedRead(char*, unsigned long)
#5 in Aws::Http::CurlHttpClient::MakeRequest() [ReadBody]

Reproduction Steps

1. Use S3Client (CURL-based, not CRT) with SDK >= v1.11.486
2. Issue a single PutObject for an object > ~2GB
3. Set a checksum algorithm (or rely on the new default CRC trailing checksum from v1.11.486+)
4. Observe monotonic memory growth proportional to bytes uploaded

Aws::S3::Model::PutObjectRequest request;
request.SetBucket("my-bucket");
request.SetKey("large-object");
request.SetBody(largeStream); // > 2GB, non-seekable
request.SetChecksumAlgorithm(Aws::S3::Model::ChecksumAlgorithm::CRC32C);
// OR just rely on the SDK default (crc64nvme since v1.11.486)

auto outcome = s3Client.PutObject(request);
// → OOM crash during upload

Possible Solution

When the chunking stream is fully drained, reset the stringstream buffer before writing the next chunk:

size_t BufferedRead(char *dst, size_t amountToRead) {
  assert(dst != nullptr);

  bool chunkingStreamEmpty =
      (m_chunkingStream->peek() == EOF || m_chunkingStream->eof()) &&
      !m_chunkingStream->bad();

  if (chunkingStreamEmpty && m_stream->good()) {
    // Reset the stringstream buffer so consumed data is discarded.
    // clear() only resets the state flags; str("") empties the buffer itself.
    m_chunkingStream->str("");
    m_chunkingStream->clear();

    m_stream->read(m_data.GetUnderlyingData(), DataBufferSize);
    size_t bytesRead = static_cast<size_t>(m_stream->gcount());
    writeChunk(bytesRead);

    if ((m_stream->peek() == EOF || m_stream->eof()) && !m_stream->bad()) {
      writeTrailerToUnderlyingStream();
    }
  }
  // ... rest unchanged

This bounds memory at roughly 128 KiB (one 64 KiB read buffer plus one stringstream buffer of 64 KiB and chunk-framing overhead).
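To check the bound, the reset-on-drain pattern can be modeled with standard streams only. This is a minimal sketch, not the SDK implementation: `ChunkedReaderSketch`, `peakStagedBytes`, and `peakStagedBytesFor` are hypothetical names, the framing is a simplified "hex size, CRLF, data, CRLF" layout, and the trailer and checksum are omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the SDK class; illustrative only.
template <std::size_t ChunkSize>
class ChunkedReaderSketch {
public:
    explicit ChunkedReaderSketch(std::istream& source)
        : m_source(source), m_data(ChunkSize) {}

    std::size_t bufferedRead(char* dst, std::size_t amountToRead) {
        const bool stagingEmpty =
            (m_staging.peek() == std::char_traits<char>::eof() || m_staging.eof()) &&
            !m_staging.bad();

        if (stagingEmpty && m_source.good()) {
            // Discard everything already consumed before staging a new chunk.
            m_staging.str("");
            m_staging.clear();

            m_source.read(m_data.data(), static_cast<std::streamsize>(ChunkSize));
            writeChunk(static_cast<std::size_t>(m_source.gcount()));

            // str() copies the buffer; acceptable for a measurement sketch.
            m_peak = std::max(m_peak, m_staging.str().size());
        }

        m_staging.read(dst, static_cast<std::streamsize>(amountToRead));
        return static_cast<std::size_t>(m_staging.gcount());
    }

    std::size_t peakStagedBytes() const { return m_peak; }

private:
    // Simplified framing: "<hex size>\r\n<data>\r\n". Trailer omitted.
    void writeChunk(std::size_t bytes) {
        if (bytes == 0) return;
        m_staging << std::hex << bytes << "\r\n";
        m_staging.write(m_data.data(), static_cast<std::streamsize>(bytes));
        m_staging << "\r\n";
    }

    std::istream& m_source;
    std::vector<char> m_data;
    std::stringstream m_staging;
    std::size_t m_peak = 0;
};

// Drains a source of totalBytes through a 1024-byte chunker and reports
// the largest staged buffer observed.
inline std::size_t peakStagedBytesFor(std::size_t totalBytes) {
    std::istringstream source(std::string(totalBytes, 'x'));
    ChunkedReaderSketch<1024> reader(source);
    char out[256];
    while (reader.bufferedRead(out, sizeof(out)) > 0) {
    }
    return reader.peakStagedBytes();
}
```

In this model the staged buffer peaks at one framed chunk (1024 data bytes plus 7 framing bytes) whether the source is 64 KiB or 1 MiB, which is the O(1) behavior the fix aims for.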

Additional Information/Context

Pre-v1.11.486: This code path was unreachable because MD5 checksums were computed up front and sent in a header by default, so the aws-chunked encoding was never activated for PutObject.

v1.11.486+ ([#3253]): Default S3 integrity changed to CRC trailing checksums, activating AwsChunkedStream for all PutObject calls.

PR [#3635]
This issue specifically affects users doing single large PutObject (not multipart) via the CURL-based S3Client. S3CrtClient and TransferManager users are unaffected because they split into small parts.

AWS CPP SDK version used

1.11.638

Compiler and Version used

GCC 11

Operating System and version

Linux

Metadata

    Labels

    bug: This issue is a bug.
    response-requested: Waiting on additional info and feedback. Will move to "closing-soon" in 10 days.
