Worked Copilot suggestion from #13098
Component: iocore/net — TLS cert lookup
Files:
src/iocore/net/SSLCertLookup.cc
src/iocore/net/P_SSLCertLookup.h
Summary
SSLCertContext::operator= (src/iocore/net/SSLCertLookup.cc:245-256) mutates this->ctx and the four sibling fields (opt, userconfig, keyblock, ctx_type) without acquiring this->ctx_mutex. The class's documented contract — established by getCtx() / setCtx() (src/iocore/net/SSLCertLookup.cc:259-271) and by the comment in P_SSLCertLookup.h:122 ("Threadsafe Functions to get and set shared SSL_CTX pointer") — is that ctx is protected by ctx_mutex. operator= violates that invariant: a thread copy-assigning to *cc races against any concurrent cc->getCtx() / cc->setCtx() reader or writer, because the assigner takes a lock only on other.ctx_mutex, never on this->ctx_mutex.
The same gap is present (less severely) for opt, userconfig, keyblock, ctx_type: those fields are read directly throughout the codebase without any lock, and operator= writes them with no synchronization at all.
Why this matters
Published SSLCertContext instances are mutated post-publication today. The cert-secret reload path calls setCtx() on entries returned by lookup->find(...):
src/iocore/net/SSLUtils.cc:1704
src/iocore/net/SSLUtils.cc:1729
There is currently no in-tree call site that invokes operator= on a published SSLCertContext (insertion goes through SSLContextStorage::store() → ctx_store.push_back(cc) at src/iocore/net/SSLCertLookup.cc:437-441, which uses the copy constructor, not operator=, even on vector reallocation). So the bug is latent. But the class invariant is wrong as stated: a future change that assigns to a published entry — e.g., a partial-reload variant that swaps a slot in-place rather than rebuilding the whole SSLCertLookup — would silently introduce a data race that current readers via getCtx() would not be protected against.
Reproduction sketch (conceptual — not currently triggerable in tree)
SSLCertContext *cc = lookup->find(name); // published entry
// Thread A:
auto p = cc->getCtx(); // lock_guard(ctx_mutex), reads ctx
// Thread B:
*cc = SSLCertContext(new_ctx, ...); // takes other.ctx_mutex,
// then writes cc->ctx with NO lock on cc->ctx_mutex
Thread A and Thread B race on cc->ctx (a std::shared_ptr). Tearing during the assignment can leave cc->ctx observably half-updated to A; Thread A may also load a refcount/control-block pair that disagree, leading to use-after-free or double-decrement on the underlying SSL_CTX.
Possible Fix
Lock both other.ctx_mutex and this->ctx_mutex in operator=, using std::lock to acquire them in a deadlock-free order, and move all field assignments inside the locked section so the assignment is atomic from any reader's perspective. The self-assignment guard (&other != this) must remain to avoid double-locking the same mutex.
Apply this patch to src/iocore/net/SSLCertLookup.cc:245-256:
SSLCertContext &
SSLCertContext::operator=(SSLCertContext const &other)
{
if (&other != this) {
std::unique_lock<std::mutex> other_lock(other.ctx_mutex, std::defer_lock);
std::unique_lock<std::mutex> this_lock(this->ctx_mutex, std::defer_lock);
std::lock(other_lock, this_lock);
this->opt = other.opt;
this->userconfig = other.userconfig;
this->keyblock = other.keyblock;
this->ctx_type = other.ctx_type;
this->ctx = other.ctx;
}
return *this;
}
Note: the copy constructor at src/iocore/net/SSLCertLookup.cc:234-242 is not affected — this is not yet observable during construction, so locking this->ctx_mutex is unnecessary. Locking other.ctx_mutex is sufficient and is already done.
Worked Copilot suggestion from #13098
Component:
iocore/net— TLS cert lookupFiles:
src/iocore/net/SSLCertLookup.ccsrc/iocore/net/P_SSLCertLookup.hSummary
SSLCertContext::operator=(src/iocore/net/SSLCertLookup.cc:245-256) mutatesthis->ctxand the four sibling fields (opt,userconfig,keyblock,ctx_type) without acquiringthis->ctx_mutex. The class's documented contract — established bygetCtx()/setCtx()(src/iocore/net/SSLCertLookup.cc:259-271) and by the comment inP_SSLCertLookup.h:122("Threadsafe Functions to get and set shared SSL_CTX pointer") — is thatctxis protected byctx_mutex.operator=violates that invariant: a thread copy-assigning to*ccraces against any concurrentcc->getCtx()/cc->setCtx()reader or writer, because the assigner takes a lock only onother.ctx_mutex, never onthis->ctx_mutex.The same gap is present (less severely) for
opt,userconfig,keyblock,ctx_type: those fields are read directly throughout the codebase without any lock, andoperator=writes them with no synchronization at all.Why this matters
Published
SSLCertContextinstances are mutated post-publication today. The cert-secret reload path callssetCtx()on entries returned bylookup->find(...):src/iocore/net/SSLUtils.cc:1704src/iocore/net/SSLUtils.cc:1729There is currently no in-tree call site that invokes
operator=on a publishedSSLCertContext(insertion goes throughSSLContextStorage::store()→ctx_store.push_back(cc)atsrc/iocore/net/SSLCertLookup.cc:437-441, which uses the copy constructor, notoperator=, even on vector reallocation). So the bug is latent. But the class invariant is wrong as stated: a future change that assigns to a published entry — e.g., a partial-reload variant that swaps a slot in-place rather than rebuilding the wholeSSLCertLookup— would silently introduce a data race that current readers viagetCtx()would not be protected against.Reproduction sketch (conceptual — not currently triggerable in tree)
Thread A and Thread B race on
cc->ctx(astd::shared_ptr). Tearing during the assignment can leavecc->ctxobservably half-updated to A; Thread A may also load a refcount/control-block pair that disagree, leading to use-after-free or double-decrement on the underlyingSSL_CTX.Possible Fix
Lock both
other.ctx_mutexandthis->ctx_mutexinoperator=, usingstd::lockto acquire them in a deadlock-free order, and move all field assignments inside the locked section so the assignment is atomic from any reader's perspective. The self-assignment guard (&other != this) must remain to avoid double-locking the same mutex.Apply this patch to
src/iocore/net/SSLCertLookup.cc:245-256:Note: the copy constructor at
src/iocore/net/SSLCertLookup.cc:234-242is not affected —thisis not yet observable during construction, so lockingthis->ctx_mutexis unnecessary. Lockingother.ctx_mutexis sufficient and is already done.