Skip to content

Python: Track instance attributes through type tracking#21969

Merged
owen-mc merged 17 commits into
mainfrom
copilot/investigate-missing-alerts
Jun 17, 2026
Merged

Python: Track instance attributes through type tracking#21969
owen-mc merged 17 commits into
mainfrom
copilot/investigate-missing-alerts

Conversation

Copilot AI commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Type-tracking-based modeling (Concepts, API graphs) misses values stored in a self attribute and read from another method. A typical symptom is a PEP 249 connection assigned to self._conn in one method not being recognized as a connection when used later, so downstream modeling such as SQL-execution sinks can be missed.

Changes

  • TypeTrackingImpl.qll: Add localFieldStep, a no-call-graph type-tracking level step that models flow through self.attr across instance methods on the same class.
    • This complements the existing attribute store/read steps by relating writes to self.attr with later reads of the same attribute on the same class.
    • The step intentionally applies across instance methods in general, not just __init__, matching the existing instance-insensitive and order-insensitive treatment used for similar field tracking in Ruby and JavaScript.
  • hdbcli/pep249.py: Add regression coverage for a connection stored in self._conn and used both directly and via a getter.
  • attribute_tests.py: Update expectations to reflect tracking from values stored on self into later self.foo reads inside instance methods, while leaving external instance.foo reads unchanged.
  • Experimental call-graph tests: Update inline expectations and the corresponding .expected file to reflect the improved type-tracker results.
  • Change note: Add a Python library change note describing the additional analysis coverage.

Notes for reviewers

  • The new step is deliberately an over-approximation for instance attributes on the same class; it does not distinguish different instances and does not require a write to be ordered before a read.
  • External attribute reads such as instance.foo are still not modeled by this change.

Original prompt

This test has missing alerts when the object returned by connect() is stored in a class field (attribute). I believe the problem lies in execute(DataFlow::TypeTracker t). Investigate. This could be a missing step in type tracking, in which case we should fix it in general. There should be a store step into the attribute _conn in init (see attributeStoreStep in python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll) and then a read step from that attribute in get_connection (see attributeReadStep in python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll). Another option is that we are misusing type tracking in this case. Please investigate and explain the cause to me, with suggestions for how to fix.

The user has attached the following file paths as relevant context:

  • python/ql/test/library-tests/frameworks/hdbcli/pep249.py
  • python/ql/lib/semmle/python/frameworks/PEP249.qll
  • .github/instructions/ql-files.instructions.md

Copilot AI changed the title [WIP] Investigate missing alerts when storing connect() result Document missing PEP249 SQL alerts when connection is stored in a self attribute Jun 11, 2026
Copilot AI requested a review from owen-mc June 11, 2026 05:50
Copilot AI changed the title Document missing PEP249 SQL alerts when connection is stored in a self attribute Track instance attributes set in __init__ through type tracking Jun 11, 2026
Copilot stopped work on behalf of owen-mc due to an error June 11, 2026 12:13
Copilot stopped work on behalf of owen-mc due to an error June 11, 2026 12:14
Use a field level step like JS and Ruby.
@owen-mc owen-mc force-pushed the copilot/investigate-missing-alerts branch from 6ccc564 to 73bc2d7 Compare June 11, 2026 12:53
@owen-mc owen-mc changed the title Track instance attributes set in __init__ through type tracking Track instance attributes through type tracking Jun 11, 2026
@owen-mc owen-mc marked this pull request as ready for review June 11, 2026 20:18
@owen-mc owen-mc requested a review from a team as a code owner June 11, 2026 20:18
Copilot AI review requested due to automatic review settings June 11, 2026 20:18

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves Python type-tracking so that values stored into instance attributes (for example in __init__) can be recovered when those attributes are read in other methods, enabling downstream modeling (Concepts / API graphs) to recognize types such as PEP 249 connections when used outside the constructor.

Changes:

  • Extend type-tracking with a no-call-graph “local field” level step to connect attribute writes (self.attr = ...) to later reads (self.attr) across methods on the same class.
  • Add/extend Python library tests for PEP249 (hdbcli) and type-tracking attribute flows to validate the new behavior.
  • Update experimental inline call-graph expected output to reflect improved type-tracker results through instance attributes.
Show a summary per file
File Description
python/ql/lib/semmle/python/dataflow/new/internal/TypeTrackingImpl.qll Adds a new type-tracking level step intended to model flow through instance attributes across methods.
python/ql/test/library-tests/frameworks/hdbcli/pep249.py Adds a regression test for a DB-API connection stored on self._conn and used in other methods.
python/ql/test/library-tests/dataflow/typetracking/attribute_tests.py Updates expectations for tracking reads of attributes assigned in __init__.
python/ql/test/experimental/library-tests/CallGraph/code/class_attr_assign.py Updates inline expectations to reflect type tracking through instance attributes.
python/ql/test/experimental/library-tests/CallGraph/InlineCallGraphTest.expected Updates expected output to reflect the improved type-tracker behavior.

Copilot's findings

  • Files reviewed: 6/6 changed files
  • Comments generated: 2

@owen-mc owen-mc force-pushed the copilot/investigate-missing-alerts branch from 7817cfe to 3bad60c Compare June 12, 2026 12:47
@owen-mc owen-mc changed the title Track instance attributes through type tracking Python: Track instance attributes through type tracking Jun 12, 2026
@owen-mc owen-mc force-pushed the copilot/investigate-missing-alerts branch from 3bad60c to 9c65082 Compare June 14, 2026 23:15
@owen-mc owen-mc requested a review from yoff June 16, 2026 09:58
yoff
yoff previously approved these changes Jun 16, 2026

@yoff yoff left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine and performance does not seem problematic. The first few new alerts look good. The file-not-closed ones seem a bit more involved and unclear. If the rest are fine, I am happy for this to be merged.

*/
private predicate selfAttrRef(Class cls, string attr, DataFlowPublic::AttrRef ref) {
exists(Function method, Name selfUse |
method = cls.getAMethod() and

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about super classes? I see we take them into account for the ts/js implementation. Is there a reason why we don't model them here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll implement that. I was about to re-run DCA anyway.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I only implemented it for direct subclasses or superclasses, not mix-ins. We can always generalise it later if we find we need it.

*/
private predicate instanceAttrRead(Class cls, string attr, DataFlowPublic::AttrRead read) {
read.getObject() = DataFlowDispatch::classInstanceTracker(cls) and
read.mayHaveAttributeName(attr)

@BazookaMusic BazookaMusic Jun 17, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work nicely with @propertydecorators?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not properly, but I think that's consistent with what the python library already does, from what I can see.

@owen-mc

owen-mc commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

I went through the new results from the first DCA run in more detail. The new alerts for py/file-not-closed and py/should-use-with are FPs caused by this PR, and I have fixed them. In both cases we don't want to track through attribute reads. I've introduced a shared mechanism to exclude these results, which can be reused in future. Some other new results I looked at were FPs but ones we'd expect to find, which it doesn't make sense to fix as part of this PR. (In one case we now track taint through a local sanitizer, and the fix is to recognise that kind of sanitizer. In another we incorrectly identify some latitudes and longitudes as possibly sensitive information.)

@yoff yoff left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look reasonable. Approved bearing DCA not raising any alarms.

Comment on lines +390 to +406
/**
* Holds if `nodeFrom` is written to attribute `self.attr` in an instance method of one
* class, and `nodeTo` reads attribute `self.attr` in an instance method of a different
* class that is related to it by inheritance (one is a transitive superclass of the
* other).
*
* This is the cross-hierarchy counterpart of `localFieldStep`: at runtime the receiver
* of both methods may be an instance of the more-derived class, whose behaviour is made
* up of the methods it declares together with those inherited from all of its ancestors.
* It therefore models the common pattern of a base class storing `self.attr` that a
* subclass reads, and vice versa. Resolving the superclass relationship depends on the
* call graph (via `getADirectSuperclass`), so this step is reported as `levelStepCall`
* rather than `levelStepNoCall`.
*
* Like `localFieldStep`, this is an over-approximation: it is both instance-insensitive
* and order-insensitive.
*/

import python
private import semmle.python.ApiGraphs
private import semmle.python.dataflow.new.DataFlow
@owen-mc owen-mc merged commit e618883 into main Jun 17, 2026
20 checks passed
@owen-mc owen-mc deleted the copilot/investigate-missing-alerts branch June 17, 2026 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants