fix(transaction): prevent concurrent map access in monitor#5309
fix(transaction): prevent concurrent map access in monitor#5309mfw78 wants to merge 2 commits intoethersphere:masterfrom
Conversation
52325e9 to
6396e4b
Compare
6396e4b to
a17e3a9
Compare
| if receipt != nil { | ||
| // if we have a receipt we have a confirmation | ||
| confirmedNonces[nonceGroup] = receipt | ||
| confirmedNonces[nonce] = receipt |
There was a problem hiding this comment.
I think we could stop checking other txHashes here, for this nonce, because we found the receipt?
| continue | ||
| } | ||
|
|
||
| oldNonce, err := tm.backend.NonceAt(tm.ctx, tm.sender, new(big.Int).SetUint64(block-tm.cancellationDepth)) |
There was a problem hiding this comment.
I do not see the reason to call tm.backend.NonceAt multiple times in the loop, this should be lazy loaded (we should only fetch oldNonce once and only if there are unconfirmed nonces to check). Maybe we can use this PR to improve this?
| // if we have a receipt we have a confirmation | ||
| confirmedNonces[nonceGroup] = receipt | ||
| confirmedNonces[nonce] = receipt | ||
| break // found receipt for this nonce, no need to check other tx hashes |
There was a problem hiding this comment.
@mfw78 we discussed this change and figured that this change, being orthogonal to the original PR should be reverted. Ideally let's just stick to the concurrent map access related changes. Thanks!
|
I would also suggest that the other changes should be reverted. This component is critical and I'm not sure we have unit test coverage to see that these extra changes are inconsequential. I'd suggest to stick to the original changeset. |
Can you kindly confirm if by this you mean to just simply revert to a17e3a9? |
Checklist
Description
Fixes a race condition in
pkg/transaction/monitor.gothat causesfatal error: concurrent map iteration and map writepanics.The
checkPending()function was iterating overwatchesByNoncewithout holding the lock, whileWatchTransaction()could concurrently modify the map. Classic Go map concurrency footgun. 🔫Solution: Snapshot-based approach that minimizes lock contention:
TransactionReceipt,NonceAt)This avoids blocking
WatchTransaction()callers during potentially slow (100-500ms) RPC operations.Open API Spec Version Changes (if applicable)
N/A
Motivation and Context
Discovered while running Bee nodes in a Kurtosis test environment. The panic occurred during chequebook deployment when transaction monitoring was active.
Related Issue (Optional)
N/A
Screenshots (if appropriate):
N/A