Description
Describe the bug
AwsChunkedStream::BufferedRead() uses a std::stringstream (m_chunkingStream) as an intermediate buffer. On every call, it reads a 64KB chunk from the source stream and appends the chunked-encoded data via writeChunk(). The std::stringstream's underlying std::string never reclaims memory behind the read pointer — clear() only resets error/EOF flags, not the buffer.
For a single PutObject of size N, the stringstream's backing string grows monotonically to ~N bytes. Once the buffer holds ~4GB, the next growth step doubles it, so std::string::reserve() requests 8GB+1 bytes (0x200000001). That exceeds AddressSanitizer's maximum supported allocation size (0x200000000), causing SIGABRT / std::bad_alloc.
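The reported request size can be reproduced with simple arithmetic. The sketch below assumes a growth policy of "double the capacity, plus one byte for the terminating NUL", which is how libstdc++ appears to behave here. The helper name nextAllocationBytes and both constants are illustrative, not SDK or standard-library names.

```cpp
#include <cstdint>

// Assumed growth model: new allocation = 2 * current capacity + 1 (NUL byte).
// Illustrative only; not taken from the SDK or from the standard library.
constexpr std::uint64_t nextAllocationBytes(std::uint64_t currentCapacity) {
    return 2 * currentCapacity + 1;
}

constexpr std::uint64_t kFourGiB = 0x100000000ULL;    // buffer size at failure
constexpr std::uint64_t kAsanLimit = 0x200000000ULL;  // limit quoted in the ASan report

// The next request matches the size in the report and is one byte over the limit.
static_assert(nextAllocationBytes(kFourGiB) == 0x200000001ULL,
              "matches the reported allocation size");
static_assert(nextAllocationBytes(kFourGiB) > kAsanLimit,
              "exceeds the reported maximum by one byte");
```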
Root cause location: AwsChunkedStream.h, BufferedRead() and writeChunk()
The clear() calls on lines 82, 93 only reset stream state flags (eofbit/failbit). They do not call str("") to release the underlying buffer. The pptr (write pointer) advances on every writeChunk() but consumed data is never freed.
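The difference between clear() and str("") can be checked in isolation. The following is a minimal standalone sketch, not SDK code. writeAndDrain and retainedBytes are hypothetical helper names, and the 1 KiB chunk size is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Writes `payload` into `ss`, then reads the same number of bytes back out,
// so the stream ends up fully drained.
inline void writeAndDrain(std::stringstream& ss, const std::string& payload) {
    ss << payload;
    std::string sink(payload.size(), '\0');
    ss.read(&sink[0], static_cast<std::streamsize>(sink.size()));
    assert(static_cast<std::size_t>(ss.gcount()) == payload.size());
}

// Returns how many bytes the stream still holds after `rounds` write/drain
// cycles. With resetBuffer == false only clear() is called between cycles,
// which mirrors the behaviour described in this report.
inline std::size_t retainedBytes(std::size_t rounds, bool resetBuffer) {
    std::stringstream ss;
    const std::string chunk(1024, 'x');
    for (std::size_t i = 0; i < rounds; ++i) {
        if (resetBuffer) {
            ss.str("");  // discard everything buffered so far
        }
        ss.clear();      // resets eofbit/failbit only; contents are untouched
        writeAndDrain(ss, chunk);
    }
    return ss.str().size();
}
```

With 8 rounds, retainedBytes(8, false) reports 8192 bytes still held by the stream even though every byte was already read, while retainedBytes(8, true) reports 1024.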
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
Memory usage during chunked upload should be O(1) — bounded by the chunk size (~65KB), not the total upload size.
Current Behavior
Memory grows linearly with bytes uploaded. A 5GB single PutObject consumes ~10GB of RAM for the stringstream buffer alone. Under AddressSanitizer, the allocation fails once ~4GB has been buffered:
==12345==ERROR: AddressSanitizer: requested allocation size 0x200000001 exceeds maximum supported size of 0x200000000
#0 in operator new(unsigned long)
#1 in std::__cxx11::basic_string::reserve(unsigned long)
#2 in std::__cxx11::basic_string::_M_replace_aux(unsigned long, unsigned long, unsigned long, char)
#3 in Aws::Utils::Stream::AwsChunkedStream<65536ul>::writeChunk(unsigned long)
#4 in Aws::Utils::Stream::AwsChunkedStream<65536ul>::BufferedRead(char*, unsigned long)
#5 in Aws::Http::CurlHttpClient::MakeRequest() [ReadBody]
Reproduction Steps
1. Use S3Client (CURL-based, not CRT) with SDK >= v1.11.486
2. Issue a single PutObject for an object > ~2GB
3. Set a checksum algorithm (or rely on the new default CRC trailing checksum from v1.11.486+)
4. Observe monotonic memory growth proportional to bytes uploaded
Aws::S3::Model::PutObjectRequest request;
request.SetBucket("my-bucket");
request.SetKey("large-object");
request.SetBody(largeStream); // > 2GB, non-seekable
request.SetChecksumAlgorithm(Aws::S3::Model::ChecksumAlgorithm::CRC32C);
// OR just rely on the SDK default (crc64nvme since v1.11.486)
auto outcome = s3Client.PutObject(request);
// → OOM crash during upload
Possible Solution
Before writing a new chunk when the stream is fully drained, reset the stringstream buffer:
size_t BufferedRead(char *dst, size_t amountToRead) {
  assert(dst != nullptr);
  // The chunking stream is drained when nothing is left to read from it.
  bool chunkingStreamEmpty =
      (m_chunkingStream->peek() == EOF || m_chunkingStream->eof()) &&
      !m_chunkingStream->bad();
  if (chunkingStreamEmpty && m_stream->good()) {
    // CRITICAL: Reset the stringstream buffer before writing the next chunk.
    // clear() only resets flags; str("") discards the consumed contents,
    // so the buffer stops growing with every chunk.
    m_chunkingStream->str("");
    m_chunkingStream->clear();
    m_stream->read(m_data.GetUnderlyingData(), DataBufferSize);
    size_t bytesRead = static_cast<size_t>(m_stream->gcount());
    writeChunk(bytesRead);
    // Append the trailer once the source stream is exhausted.
    if ((m_stream->peek() == EOF || m_stream->eof()) && !m_stream->bad()) {
      writeTrailerToUnderlyingStream();
    }
  }
  // ... rest unchanged
This bounds memory at ~130KB (one 65KB read buffer + one 65KB+overhead stringstream buffer).
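The bound can be checked with a simplified model of the proposed fix. This is a sketch under stated assumptions, not the SDK implementation: BoundedChunker, Read, and PeakStagingBytes are hypothetical names, and the framing is reduced to a hex size line, the data, and a zero-length terminator. The real aws-chunked format also carries the trailing checksum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the SDK class (illustrative only).
template <std::size_t ChunkSize>
class BoundedChunker {
public:
    explicit BoundedChunker(std::istream& source)
        : m_source(source), m_data(ChunkSize) {}

    // Copies up to `amount` encoded bytes into `dst`; returns 0 when done.
    std::size_t Read(char* dst, std::size_t amount) {
        assert(dst != nullptr);
        const bool drained =
            m_chunking.peek() == std::char_traits<char>::eof();
        if (drained && !m_finished) {
            m_chunking.str("");  // discard consumed data before refilling
            m_chunking.clear();  // then reset the eof/fail flags
            refill();
        }
        m_chunking.read(dst, static_cast<std::streamsize>(amount));
        return static_cast<std::size_t>(m_chunking.gcount());
    }

    // Largest number of bytes ever staged at once in the chunking stream.
    std::size_t PeakStagingBytes() const { return m_peak; }

private:
    void refill() {
        m_source.read(m_data.data(), static_cast<std::streamsize>(ChunkSize));
        const std::size_t got = static_cast<std::size_t>(m_source.gcount());
        if (got > 0) {
            m_chunking << std::hex << got << "\r\n";
            m_chunking.write(m_data.data(), static_cast<std::streamsize>(got));
            m_chunking << "\r\n";
        }
        if (got < ChunkSize) {  // short read: the source is exhausted
            m_chunking << "0\r\n\r\n";
            m_finished = true;
        }
        const std::streamoff staged = m_chunking.tellp();
        m_peak = std::max(m_peak, static_cast<std::size_t>(staged));
    }

    std::istream& m_source;
    std::vector<char> m_data;      // fixed-size read buffer
    std::stringstream m_chunking;  // staging buffer, reset on every refill
    std::size_t m_peak = 0;
    bool m_finished = false;
};
```

For a 165-byte source and a ChunkSize of 16, draining with 7-byte reads yields 235 encoded bytes while PeakStagingBytes() stays at 22 (one 16-byte chunk plus 6 bytes of framing), independent of the source length.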
Additional Information/Context
Pre-v1.11.486: This code path was unreachable because MD5 checksums were computed in the header by default. The aws-chunked encoding was never activated for PutObject.
v1.11.486+ ([#3253]): Default S3 integrity changed to CRC trailing checksums, activating AwsChunkedStream for all PutObject calls.
PR [#3635]
This issue specifically affects users doing single large PutObject (not multipart) via the CURL-based S3Client. S3CrtClient and TransferManager users are unaffected because they split into small parts.
AWS CPP SDK version used
1.11.638
Compiler and Version used
GCC 11
Operating System and version
Linux