Go: use shared SSA library (codeql.ssa.Ssa)#21554
Conversation
c83c654 to
a899767
Compare
|
@copilot The last commit contains test changes which need investigation. There are some tests that show we have lost some results for SSA definitions of capture variables. And there are some query tests which show lost results, which I believe are because of capture variables as well. Please investigate and fix the SSA implemention so that the query tests pass. It is okay if there are test differences relating to the renaming of data flow nodes or SSA classes. |
Fixed in 5a786ac. Two issues were causing the lost results:
|
286611e to
1cb1d47
Compare
|
@copilot Please update the PR description and then investigate the change in test output in the last commit and explain why it happens and whether it is a sign that something should be fixed. |
There was a problem hiding this comment.
Pull request overview
Migrates Go’s SSA implementation from a bespoke library to the shared codeql.ssa.Ssa library, aligning Go with other CodeQL languages and updating the Go pack’s SSA adapters, public SSA API, and test baselines accordingly.
Changes:
- Added the
codeql/ssadependency and introduced a Go CFG adapter (BasicBlocks::Cfg) for the shared SSA library. - Rewrote Go’s SSA implementation to use
codeql.ssa.Ssaand updated the public Go SSA API wrappers. - Updated numerous Go query/library test expectation files and added new SSA closure test cases.
Show a summary per file
| File | Description |
|---|---|
| go/ql/test/query-tests/Security/CWE-601/BadRedirectCheck/BadRedirectCheck.expected | Updates expected dataflow/SSA node labels to the new SSA definition formatting. |
| go/ql/test/query-tests/Security/CWE-347/MissingJwtSignatureCheck.expected | Updates expected edges/nodes to reference SSA definitions using the new formatting. |
| go/ql/test/query-tests/Security/CWE-326/InsufficientKeySize.expected | Refreshes expected SSA definition labels. |
| go/ql/test/query-tests/Security/CWE-322/InsecureHostKeyCallback.expected | Refreshes expected SSA/capture-related labels in provenance edges. |
| go/ql/test/query-tests/Security/CWE-312/CleartextLogging.expected | Large baseline update reflecting new SSA definition label formatting across many paths. |
| go/ql/test/query-tests/Security/CWE-079/StoredXss.expected | Updates expected SSA definition labels. |
| go/ql/test/query-tests/Security/CWE-078/CommandInjection.expected | Updates expected SSA definition labels in edges/nodes. |
| go/ql/test/query-tests/Security/CWE-022/ZipSlip.expected | Updates expected SSA definition labels. |
| go/ql/test/query-tests/Security/CWE-022/UnsafeUnzipSymlink.expected | Updates expected SSA definition labels. |
| go/ql/test/query-tests/InconsistentCode/UnhandledCloseWritableHandle/UnhandledCloseWritableHandle.expected | Updates expected SSA/capture provenance labels. |
| go/ql/test/query-tests/InconsistentCode/MissingErrorCheck/MissingErrorCheck.expected | Updates expected SSA definition labels for result/err. |
| go/ql/test/library-tests/semmle/go/frameworks/Yaml/tests.expected | Records model-test failures related to SSA definition label formatting (should be resolved). |
| go/ql/test/library-tests/semmle/go/frameworks/Twirp/RequestForgery.expected | Updates expected SSA/capture-related labels and return-node labels. |
| go/ql/test/library-tests/semmle/go/frameworks/gqlgen/gqlgen.expected | Records model-test failures related to SSA definition label formatting (should be resolved). |
| go/ql/test/library-tests/semmle/go/frameworks/GoMicro/LogInjection.expected | Updates expected SSA definition labels for request-derived values. |
| go/ql/test/library-tests/semmle/go/frameworks/GoMicro/gomicro.expected | Records model-test failures related to SSA definition label formatting (should be resolved). |
| go/ql/test/library-tests/semmle/go/frameworks/GoKit/RemoteFlowSources.expected | Records model-test failures related to SSA definition label formatting (should be resolved). |
| go/ql/test/library-tests/semmle/go/frameworks/Beego/CleartextLogging.expected | Updates expected SSA definition labels in taint flow baselines. |
| go/ql/test/library-tests/semmle/go/dataflow/SSA/VarUses.expected | Updates SSA-related expected outputs for variable uses (including closure cases). |
| go/ql/test/library-tests/semmle/go/dataflow/SSA/VarDefs.expected | Updates SSA-related expected outputs for variable definitions (including closure cases). |
| go/ql/test/library-tests/semmle/go/dataflow/SSA/SsaWithFields.expected | Updates SSA-with-fields baseline to new “SSA def/phi” formatting. |
| go/ql/test/library-tests/semmle/go/dataflow/SSA/SsaDefinition.expected | Updates SSA definition rendering baseline. |
| go/ql/test/library-tests/semmle/go/dataflow/SSA/main.go | Adds new closure read/update cases to exercise SSA liveness/capture behavior. |
| go/ql/test/library-tests/semmle/go/dataflow/SSA/DefUse.expected | Updates def-use baseline to the new SSA definition/phi formatting. |
| go/ql/test/library-tests/semmle/go/dataflow/PromotedFields/LocalFlowStep.expected | Updates baseline SSA definition labels for promoted-field dataflow steps. |
| go/ql/test/library-tests/semmle/go/dataflow/GlobalValueNumbering/GlobalValueNumber.expected | Updates baseline SSA definition labels for GVN-related steps. |
| go/ql/test/library-tests/semmle/go/dataflow/FunctionInputsAndOutputs/FunctionInput_getExitNode.expected | Updates baseline SSA labels for parameters/receivers at function exits. |
| go/ql/test/library-tests/semmle/go/dataflow/FunctionInputsAndOutputs/FunctionInput_getEntryNode.expected | Updates baseline SSA labels for results/locals at function entry. |
| go/ql/test/library-tests/semmle/go/dataflow/FlowSteps/LocalFlowStep.expected | Updates baseline SSA labels used in local flow-step testing. |
| go/ql/test/library-tests/semmle/go/dataflow/ExternalValueFlow/srcs.expected | Updates baseline SSA labels for external value-flow sources. |
| go/ql/test/library-tests/semmle/go/dataflow/ExternalTaintFlow/srcs.expected | Updates baseline SSA labels for external taint-flow sources. |
| go/ql/test/library-tests/semmle/go/concepts/Regexp/RegexpPattern.expected | Updates baseline SSA definition label for regexp pattern definition. |
| go/ql/test/experimental/Unsafe/WrongUsageOfUnsafe.expected | Updates baseline SSA labels for unsafe usage provenance. |
| go/ql/test/example-tests/snippets/typeinfo.expected | Updates snippet baseline SSA labels for req definition. |
| go/ql/lib/semmle/go/dataflow/SsaImpl.qll | Replaces Go’s bespoke SSA impl with a shared-SSA-based adapter and exported predicates/types. |
| go/ql/lib/semmle/go/dataflow/SSA.qll | Updates public SSA API wrapper classes to extend shared SSA definitions and adjusts labeling behavior. |
| go/ql/lib/semmle/go/controlflow/BasicBlocks.qll | Adds a CfgSig-compatible CFG view module for shared SSA integration. |
| go/ql/lib/qlpack.yml | Adds codeql/ssa: ${workspace} dependency to the Go library pack. |
| go/ql/lib/change-notes/2026-03-23-use-shared-ssa.md | Adds a Go pack change note documenting the SSA migration and potential edge-case differences. |
Copilot's findings
- Files reviewed: 40/40 changed files
- Comments generated: 5
1cb1d47 to
c4b1ff2
Compare
|
@copilot Please update the PR description and then investigate the change in test output in the last commit and explain why it happens and whether it is a sign that something should be fixed. If it is reasonably clear that it should be fixed and how to fix it then go ahead and fix it. |
Updated the PR description. Investigation of the last commit's test diffThe two removed entries in func test3() (res int) {
res = 7 // L34
defer func() { res = 8 }()
return 9 // L38 (implicitly: res = 9; return)
}
So the change in behaviour is: dead writes to captured variables are now pruned the same way as dead writes to non‑captured variables. I think this is an acceptable precision improvement rather than something to fix:
I therefore haven't reverted the diff. Happy to revisit if you'd like the old conservative behaviour preserved exactly. |
Co-authored-by: owen-mc <62447351+owen-mc@users.noreply.github.com> Agent-Logs-Url: https://github.com/github/codeql/sessions/b400ebd5-4095-401e-8811-fb550600b3c4
- Extend synthetic uncertain reads to function exits of any function that writes a captured variable, not just the declaring function. This ensures writes to captured variables inside closures remain live (matching the old `v.isCaptured()` liveness shortcut). - Uncomment toString overrides for SsaExplicitDefinition, SsaVariableCapture, SsaPhiNode, and SsaVariable to restore original output formats. - Revert test expected files to pre-test-changes state matching the correct toString formats and capture variable results. Agent-Logs-Url: https://github.com/github/codeql/sessions/6dbf9d42-b2e2-42a2-984b-8ea31df4e633 Co-authored-by: owen-mc <62447351+owen-mc@users.noreply.github.com>
c4b1ff2 to
99538f0
Compare
| */ | ||
| private predicate mayCapture(ReachableBasicBlock bb, int i, SsaSourceVariable v) { | ||
| cached | ||
| predicate mayCapture(BasicBlock bb, int i, SsaSourceVariable v) { |
There was a problem hiding this comment.
This predicate is a bit weird to me - it bunches together uncertain updates associated with a call that may potentially update a captured variable, and the entry definitions in capturing functions. The latter are expected to be certain writes, but I'll have to dig through the shared lib to see if it makes a differences that they're treated as uncertain here. Given that things seem to work then it might not be a problem.
There's also some naming here that confuses me: To me, the thing that captures a variable is a closure, i.e. a nested function, while here the predicate columns essentially reference a CfgNode. I think perhaps mayUpdateCapturedVariable would have been a more precise predicate name.
There was a problem hiding this comment.
I don't fully understand it myself. I'm mostly following what was done before. I asked copilot and it said this:
Here's my read on your concern — and I think the uncertain classification is actually correct, not a quirk that merely "happens to work":
Why lumping them is fine
At an entry node, "certain" vs "uncertain" collapses to nearly the same thing for liveness.
The only behavioral teeth a certain write has over an uncertain one is that it kills liveness of prior definitions (see the comment at Ssa.qll:353-362: "uncertain writes have no impact on liveness"). An entry node has no predecessor and nothing prior in the same scope to kill, so making it certain would buy you no extra precision here.The uncertainty is the point for the captured-variable case.
An uncertain write generates a pseudo-read of its prior definition and exposes it via getPriorDefinition() / uncertainWriteDefinitionInput. For a closure-entry capture, the "real" value lives in the enclosing scope, not in this function's SSA graph. Modeling the entry as uncertain gives the dataflow layer an input slot through which the captured value flows into the closure. If you modeled it as a certain write, the SSA def would be a fresh, opaque value that overrides whatever was captured — which would actually break flow of the captured value into the closure body. So certain would be the wrong choice, not just a less-tidy one.This is the same pattern C# uses for implicit captured-variable entry definitions (Ssa.qll:189, and the ref-argument analogy in the variableWrite doc at Ssa.qll:27-32).
- The two branches really are the same concept.
Both branches of mayUpdateCapturedVariable describe "a point where v's value is (re)established from outside this function's local control flow" — once at entry (initial capture), and again after each call (the closure may have mutated it non-locally). Treating both as uncertain writes whose prior value may survive is the consistent and correct framing.So, is it a problem?
I don't think so. The "things seem to work" is load-bearing here in a good way: the entry-capture-as-uncertain is the mechanism that lets captured values flow into closures, so if it were genuinely mis-modeled you'd see missing capture flow, not silent over-approximation. If you want belt-and-suspenders confidence, the thing worth confirming is that the Go dataflow capture step consumes getPriorDefinition() / treats these entry UncertainWriteDefinitions as the entry point for captured values (rather than relying on something else) — that's the only place a regression from the old hand-rolled SSA would surface.
There was a problem hiding this comment.
I disagree with copilot here.
| * We add a synthetic uncertain read at the exit node of every function | ||
| * that references a captured variable `v`. This ensures that definitions | ||
| * of captured variables are included in the SSA graph even when the | ||
| * variable is not locally read in that function scope (but may be read | ||
| * by another function sharing the same closure). |
There was a problem hiding this comment.
Why? I don't understand why that's necessary.
There was a problem hiding this comment.
To elaborate: I would not expect SSA to connect a write in one callable with a read in another, since we cannot in general prove that the write was the single assignment flowing to that read.
There was a problem hiding this comment.
Again, I'm mostly preserving what was there before. I am happy to improve it in a follow-up, to make things work more like other languages. I put your question to copilot and it said:
You're right on the principle, and the good news is the SSA library is not doing what you'd object to — the synthetic exit read is purely a liveness-preservation hack, and the connection stays intra-procedural.
What actually forces the synthetic read
An SSA write definition only gets created if the variable is live after the write — see Ssa.qll:464-467:
Liveness is driven by reads, and at the liveness level reads count regardless of the certain flag — Ssa.qll:344: variableRead(bb, i, v, _) and k = Read().
So: a function F that writes a captured variable v but never reads v locally would have that write classified as dead and pruned — there'd be no SSA definition node at all. The synthetic uncertain read at F's exit makes v live at exit, which keeps the write live, which keeps the TWriteDef in the graph.
Why this isn't cross-callable SSA
The synthetic read is added at bb.getScope() = F's own exit node. Everything it does is within F's SSA graph: it links F's write to F's exit, intra-procedurally. SSA never reaches from F into another function G. The library is per-scope and stays that way.
The actual "does this write flow to that read in another closure?" question — exactly the thing you say SSA can't prove — is not answered here. It's deferred to the variable-capture dataflow layer, which uses these retained SSA definitions/reads as the endpoints it stitches together (uncertainly). The SSA library's only job is to make sure those endpoints exist as nodes so the capture dataflow has something to attach to.
Why uncertain (and why it matches your intuition)
The read is certain = false precisely because, from F's local perspective, it's not a real use — nobody in F reads v at exit. Marking it uncertain keeps v live (so the def survives) without asserting a genuine local use that would otherwise show up in the certain-read–only relations (e.g. the AdjacentSsaRefs / firstUse machinery at Ssa.qll:594 and Ssa.qll:733 gate on variableRead(..., true)). So it deliberately does not manufacture a use-use edge that would imply "this single write is the value read here."
So the mental model is:
SSA (intra-procedural): keep captured-variable defs alive via synthetic exit reads, so they appear as nodes. No cross-callable linking.
Variable-capture dataflow (inter-procedural): does the actual, necessarily-uncertain matching of writes in one closure to reads in another.
aschackmull
left a comment
There was a problem hiding this comment.
Overall LGTM - this is a great improvement, but there's plenty more to potentially do as followup: The shared SSA lib also contains a more opinionated submodule MakeSsa, which can be used to replace most of the Go-specific class definitions. Doing so will also handle caching. See #20761, #21744, and #21762 for how this was done in Java and C#.
For now, I think we should just remove the change note and do the minor simplification that I pointed out in go/ql/lib/semmle/go/controlflow/BasicBlocks.qll
| } | ||
|
|
||
| private module BbImpl = BB::Make<Location, Input>; | ||
| module Cfg = BB::Make<Location, Input>; |
|
CI seems to be stuck so I've merged in |
|
I just remembered I should run QA before merging. I'll start it now. |
Go was the last language with its own bespoke SSA implementation. This migrates it to the shared
codeql.ssa.Ssalibrary, consistent with C#, Java, Ruby, Rust, and Swift.Key changes
go/ql/lib/qlpack.yml: Addcodeql/ssa: ${workspace}dependency.BasicBlocks.qll: Addmodule Cfg implements BB::CfgSig<Location>— the CFG adapter required by the shared library.SsaImpl.qll(complete rewrite):defAt,useAt,mayCapture,assignedThroughClosure).SsaInput implements SsaImplCommon::InputSig<Location, BasicBlock>feedingvariableWrite(certain = explicit def, uncertain = capture) andvariableRead.v.isCaptured()liveness shortcut for the last write reaching the exit of each such function.SsaImplCommon::Make<Location, BasicBlocks::Cfg, SsaInput> as Impl.Definition,WriteDefinition,UncertainWriteDefinition,PhiNode) and cached predicates (ssaDefReachesRead,phiHasInputFromBlock,firstUse,adjacentUseUse).SSA.qll(updated public API):SsaVariable,SsaDefinitionnow extend the sharedDefinitioninstead of a local newtype.SsaExplicitDefinition extends SsaDefinition, WriteDefinitionSsaVariableCapture extends SsaImplicitDefinition, UncertainWriteDefinitionSsaPhiNode extends SsaPseudoDefinition, PhiNodedefinesAt(bb, idx, v)compatibility wrapper (no external callers) and the now-redundantgetBasicBlock()override (it'sfinalin the shared library).toString()format changeThe shared library's default formats are used:
"SSA def(x)"for explicit/capture definitions and"SSA phi(x)"for phi nodes (replacing Go's previous"definition of x","capture variable x", and"x = phi(def@L:C, ...)"). The preparatory commit "Preemptively changetoString()for SSA classes" regenerated all affected.expectedfiles to use the new formats before the migration, so per-test diffs are limited to format updates.Note on the test diff in the last commit
The last commit ("Test changes to investigate") removes two GVN entries for
test3()inGlobalValueNumbering/GlobalValueNumber.expected:main.go:33:15:33:17 | SSA def(res) | main.go:10:7:10:7 | 0— this write is dead because it is overwritten byres = 7on the line below. Previously allSsaNodes for captured variables were never pruned for liveness. Note that the line| main.go:33:15:33:17 | zero value for res | main.go:10:7:10:7 | 0 |at the same location is kept because it corresponds to anInstructionNodeinstead of anSsaNode, and they are not pruned based on liveness.main.go:34:2:34:4 | SSA def(res) | main.go:34:8:34:8 | 7— the writeres = 7is dead: it is overwritten by the implicitres = 9fromreturn 9before any local read, and the deferred closure body only runs afterreturn 9, so the value 7 is never observed anywhere. The old SSA kept this def alive solely because of the conservativev.isCaptured()shortcut. The shared library uniformly prunes truly dead writes — note thattest2.res = 5is already pruned even on the old SSA, becauseresthere is not captured. The new behaviour is therefore more uniform and slightly more precise; no downstream query result changes were observed.This is consistent with how the shared SSA library is used in other languages (C#, Java, Ruby, Rust, Swift) and is not a regression worth working around.
Original prompt
Created from VS Code.
📍 Connect Copilot coding agent with Jira, Azure Boards or Linear to delegate work to Copilot in one click without leaving your project management tool.