Python: Track instance attributes through type tracking#21969
Conversation
__init__ through type tracking
Use a field level step like JS and Ruby.
6ccc564 to
73bc2d7
Compare
__init__ through type trackingThere was a problem hiding this comment.
Pull request overview
This PR improves Python type-tracking so that values stored into instance attributes (for example in __init__) can be recovered when those attributes are read in other methods, enabling downstream modeling (Concepts / API graphs) to recognize types such as PEP 249 connections when used outside the constructor.
Changes:
- Extend type-tracking with a no-call-graph “local field” level step to connect attribute writes (
self.attr = ...) to later reads (self.attr) across methods on the same class. - Add/extend Python library tests for PEP249 (
hdbcli) and type-tracking attribute flows to validate the new behavior. - Update experimental inline call-graph expected output to reflect improved type-tracker results through instance attributes.
Show a summary per file
| File | Description |
|---|---|
| python/ql/lib/semmle/python/dataflow/new/internal/TypeTrackingImpl.qll | Adds a new type-tracking level step intended to model flow through instance attributes across methods. |
| python/ql/test/library-tests/frameworks/hdbcli/pep249.py | Adds a regression test for a DB-API connection stored on self._conn and used in other methods. |
| python/ql/test/library-tests/dataflow/typetracking/attribute_tests.py | Updates expectations for tracking reads of attributes assigned in __init__. |
| python/ql/test/experimental/library-tests/CallGraph/code/class_attr_assign.py | Updates inline expectations to reflect type tracking through instance attributes. |
| python/ql/test/experimental/library-tests/CallGraph/InlineCallGraphTest.expected | Updates expected output to reflect the improved type-tracker behavior. |
Copilot's findings
- Files reviewed: 6/6 changed files
- Comments generated: 2
7817cfe to
3bad60c
Compare
3bad60c to
9c65082
Compare
yoff
left a comment
There was a problem hiding this comment.
This looks fine and performance does not seem problematic. The first few new alerts look good. The file-not-closed ones seem a bit more involved and unclear. If the rest are fine, I am happy for this to be merged.
| */ | ||
| private predicate selfAttrRef(Class cls, string attr, DataFlowPublic::AttrRef ref) { | ||
| exists(Function method, Name selfUse | | ||
| method = cls.getAMethod() and |
There was a problem hiding this comment.
How about super classes? I see we take them into account for the ts/js implementation. Is there a reason why we don't model them here?
There was a problem hiding this comment.
Good point. I'll implement that. I was about to re-run DCA anyway.
There was a problem hiding this comment.
Note that I only implemented it for direct subclasses or superclasses, not mix-ins. We can always generalise it later if we find we need it.
| */ | ||
| private predicate instanceAttrRead(Class cls, string attr, DataFlowPublic::AttrRead read) { | ||
| read.getObject() = DataFlowDispatch::classInstanceTracker(cls) and | ||
| read.mayHaveAttributeName(attr) |
There was a problem hiding this comment.
Does this work nicely with @propertydecorators?
There was a problem hiding this comment.
Not properly, but I think that's consistent with what the python library already does, from what I can see.
|
I went through the new results from the first DCA run in more detail. The new alerts for |
yoff
left a comment
There was a problem hiding this comment.
Code changes look reasonable. Approved bearing DCA not raising any alarms.
| /** | ||
| * Holds if `nodeFrom` is written to attribute `self.attr` in an instance method of one | ||
| * class, and `nodeTo` reads attribute `self.attr` in an instance method of a different | ||
| * class that is related to it by inheritance (one is a transitive superclass of the | ||
| * other). | ||
| * | ||
| * This is the cross-hierarchy counterpart of `localFieldStep`: at runtime the receiver | ||
| * of both methods may be an instance of the more-derived class, whose behaviour is made | ||
| * up of the methods it declares together with those inherited from all of its ancestors. | ||
| * It therefore models the common pattern of a base class storing `self.attr` that a | ||
| * subclass reads, and vice versa. Resolving the superclass relationship depends on the | ||
| * call graph (via `getADirectSuperclass`), so this step is reported as `levelStepCall` | ||
| * rather than `levelStepNoCall`. | ||
| * | ||
| * Like `localFieldStep`, this is an over-approximation: it is both instance-insensitive | ||
| * and order-insensitive. | ||
| */ |
|
|
||
| import python | ||
| private import semmle.python.ApiGraphs | ||
| private import semmle.python.dataflow.new.DataFlow |
Type-tracking-based modeling (Concepts, API graphs) misses values stored in a
selfattribute and read from another method. A typical symptom is a PEP 249 connection assigned toself._connin one method not being recognized as a connection when used later, so downstream modeling such as SQL-execution sinks can be missed.Changes
TypeTrackingImpl.qll: AddlocalFieldStep, a no-call-graph type-tracking level step that models flow throughself.attracross instance methods on the same class.self.attrwith later reads of the same attribute on the same class.__init__, matching the existing instance-insensitive and order-insensitive treatment used for similar field tracking in Ruby and JavaScript.hdbcli/pep249.py: Add regression coverage for a connection stored inself._connand used both directly and via a getter.attribute_tests.py: Update expectations to reflect tracking from values stored onselfinto laterself.fooreads inside instance methods, while leaving externalinstance.fooreads unchanged..expectedfile to reflect the improved type-tracker results.Notes for reviewers
instance.fooare still not modeled by this change.Original prompt