-
Notifications
You must be signed in to change notification settings - Fork 2k
Python: Track instance attributes through type tracking #21969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
7795884
a4585d8
73bc2d7
befb557
913dcb1
d389ea4
434a994
9c65082
8909694
199fd86
d721446
415857c
ea7510b
47c2c9e
dd61dd2
1f9899d
c7c1eca
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| --- | ||
| category: minorAnalysis | ||
| --- | ||
| * Python type tracking now follows values stored in instance attributes such as `self.attr` across instance methods, including across a class hierarchy (for example, a value stored on `self.attr` in a base class and read in a subclass, or vice versa). As a result, analysis is more likely to recognize user-defined objects that are stored on `self` and used later in other methods, which may produce additional results. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| /** | ||
| * Provides a parameterized module for identifying values that instance-attribute type tracking | ||
| * has re-exposed (laundered) out of an instance attribute, rather than freshly created. | ||
| */ | ||
|
|
||
| private import python | ||
| private import semmle.python.dataflow.new.DataFlow | ||
|
|
||
| /** Holds if `node` should be treated as an instance source. */ | ||
| signature predicate instanceNodeSig(DataFlow::Node node); | ||
|
|
||
| /** | ||
| * Provides a predicate for identifying values that instance-attribute type tracking has | ||
| * re-exposed (laundered) out of an instance attribute, rather than freshly created. | ||
| * | ||
| * Instance-attribute type tracking can flow a value into an instance attribute and back out at | ||
| * a later attribute read, for example `BufferedRWPair.reader` or `FileIO.fileno` returning | ||
| * `self._fd`. Such a re-exposed value is owned by the enclosing instance and is not a fresh | ||
| * resource; queries that reason about resource creation or lifetime should not treat it as one. | ||
| * | ||
| * The parameter `isInstance` defines which nodes count as instance sources (typically the result | ||
| * of a class- or resource-instance type tracker). | ||
| */ | ||
| module ReExposedInstance<instanceNodeSig/1 isInstance> { | ||
| /** | ||
| * Holds if `read` is an attribute read that re-exposes an instance held in an instance | ||
| * attribute. | ||
| */ | ||
| private predicate launderedAttrRead(DataFlow::AttrRead read) { isInstance(read) } | ||
|
|
||
| /** Type tracking forward from an attribute read that re-exposes an instance held in a field. */ | ||
| private DataFlow::TypeTrackingNode launderedInstance(DataFlow::TypeTracker t) { | ||
| t.start() and | ||
| launderedAttrRead(result) | ||
| or | ||
| exists(DataFlow::TypeTracker t2 | result = launderedInstance(t2).track(t2, t)) | ||
| } | ||
|
|
||
| /** | ||
| * Holds if `node` is a value that has been re-exposed (laundered) out of an instance attribute, | ||
| * rather than being a freshly created instance. | ||
| */ | ||
| predicate isReExposed(DataFlow::Node node) { | ||
| launderedInstance(DataFlow::TypeTracker::end()).flowsTo(node) | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -167,11 +167,17 @@ | |
| } | ||
|
|
||
| /** Holds if there is a level step from `nodeFrom` to `nodeTo`, which may depend on the call graph. */ | ||
| predicate levelStepCall(Node nodeFrom, LocalSourceNode nodeTo) { none() } | ||
| predicate levelStepCall(Node nodeFrom, LocalSourceNode nodeTo) { | ||
| instanceFieldStep(nodeFrom, nodeTo) | ||
| or | ||
| inheritedFieldStep(nodeFrom, nodeTo) | ||
| } | ||
|
|
||
| /** Holds if there is a level step from `nodeFrom` to `nodeTo`, which does not depend on the call graph. */ | ||
| predicate levelStepNoCall(Node nodeFrom, LocalSourceNode nodeTo) { | ||
| TypeTrackerSummaryFlow::levelStepNoCall(nodeFrom, nodeTo) | ||
| or | ||
| localFieldStep(nodeFrom, nodeTo) | ||
| } | ||
|
|
||
| /** | ||
|
|
@@ -317,6 +323,133 @@ | |
| ) | ||
| } | ||
|
|
||
| /** | ||
| * Holds if `ref` accesses attribute `attr` of `self`, where `self` is the first | ||
| * parameter of an instance method of `cls` (i.e. an access of the form `self.attr`). | ||
| * | ||
| * Static methods and class methods are excluded, since their first parameter is not a | ||
| * `self` instance reference. | ||
| */ | ||
| private predicate selfAttrRef(Class cls, string attr, DataFlowPublic::AttrRef ref) { | ||
| exists(Function method, Name selfUse | | ||
| method = cls.getAMethod() and | ||
| not DataFlowDispatch::isStaticmethod(method) and | ||
| not DataFlowDispatch::isClassmethod(method) and | ||
| selfUse.getVariable() = method.getArg(0).(Name).getVariable() and | ||
| ref.getObject().asCfgNode().getNode() = selfUse and | ||
| ref.mayHaveAttributeName(attr) | ||
| ) | ||
| } | ||
|
|
||
| /** | ||
| * Holds if `read` reads attribute `attr` from an instance of `cls`, where the instance | ||
| * is referred to from outside the methods of `cls` (i.e. an access of the form | ||
| * `instance.attr`, where `instance` is a reference to an instance of `cls`). | ||
| * | ||
| * This complements `selfAttrRef`, which only handles `self.attr` accesses inside the | ||
| * methods of `cls`. Unlike `selfAttrRef`, this depends on the call graph (via | ||
| * `classInstanceTracker`), so steps using it must be reported as `levelStepCall`. | ||
| */ | ||
| private predicate instanceAttrRead(Class cls, string attr, DataFlowPublic::AttrRead read) { | ||
| read.getObject() = DataFlowDispatch::classInstanceTracker(cls) and | ||
| read.mayHaveAttributeName(attr) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this work nicely with
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not properly, but I think that's consistent with what the python library already does, from what I can see. |
||
| } | ||
|
|
||
| /** | ||
| * Holds if `nodeFrom` is written to attribute `self.attr` in some instance method of a | ||
| * class, and `nodeTo` reads attribute `self.attr` in some (possibly different) instance | ||
| * method of the same class. | ||
| * | ||
| * This models flow through instance attributes (`self.foo`): a value stored into | ||
| * `self.foo` in one method can be read from `self.foo` in another method. Type-tracking | ||
| * handles the store and read steps via `AttrWrite`/`AttrRead`, but on its own it cannot | ||
| * relate the `self` of the writing method to the `self` of the reading method. Following | ||
| * the approach used for Ruby and JavaScript, we model this directly as a level step from | ||
| * the written value to the read reference, for any pair of methods on the class (not | ||
| * just from `__init__`). | ||
| * | ||
| * Flow across the class hierarchy (a write in one class observed in a method inherited | ||
| * from, or contributed by, a related class) is handled separately by | ||
| * `inheritedFieldStep`, because resolving superclasses depends on the call graph and so | ||
| * cannot appear in this call-graph-independent step. | ||
| * | ||
| * This is an over-approximation: it is instance-insensitive (it does not distinguish | ||
|
owen-mc marked this conversation as resolved.
|
||
| * between different instances of the same class) and order-insensitive (it does not | ||
| * require the write to happen before the read), matching the precision of | ||
| * instance-attribute handling for Ruby and JavaScript. | ||
| */ | ||
| private predicate localFieldStep(Node nodeFrom, LocalSourceNode nodeTo) { | ||
| exists(Class cls, string attr, DataFlowPublic::AttrWrite write, DataFlowPublic::AttrRead read | | ||
| selfAttrRef(cls, attr, write) and | ||
| nodeFrom = write.getValue() and | ||
| selfAttrRef(cls, attr, read) and | ||
| nodeTo = read | ||
| ) | ||
| } | ||
|
|
||
| /** | ||
| * Holds if `nodeFrom` is written to attribute `self.attr` in an instance method of one | ||
| * class, and `nodeTo` reads attribute `self.attr` in an instance method of a different | ||
| * class that is related to it by inheritance (one is a transitive superclass of the | ||
| * other). | ||
| * | ||
| * This is the cross-hierarchy counterpart of `localFieldStep`: at runtime the receiver | ||
| * of both methods may be an instance of the more-derived class, whose behaviour is made | ||
| * up of the methods it declares together with those inherited from all of its ancestors. | ||
| * It therefore models the common pattern of a base class storing `self.attr` that a | ||
| * subclass reads, and vice versa. Resolving the superclass relationship depends on the | ||
| * call graph (via `getADirectSuperclass`), so this step is reported as `levelStepCall` | ||
| * rather than `levelStepNoCall`. | ||
| * | ||
| * Like `localFieldStep`, this is an over-approximation: it is both instance-insensitive | ||
| * and order-insensitive. | ||
| */ | ||
Check warningCode scanning / CodeQL Misspelling Warning
This comment contains the non-US spelling 'behaviour', which should instead be 'behavior'.
|
||
|
Comment on lines
+390
to
+406
|
||
| private predicate inheritedFieldStep(Node nodeFrom, LocalSourceNode nodeTo) { | ||
| exists( | ||
| Class writeCls, Class readCls, string attr, DataFlowPublic::AttrWrite write, | ||
| DataFlowPublic::AttrRead read | ||
| | | ||
| selfAttrRef(writeCls, attr, write) and | ||
| nodeFrom = write.getValue() and | ||
| selfAttrRef(readCls, attr, read) and | ||
| nodeTo = read and | ||
| writeCls != readCls and | ||
| ( | ||
| writeCls = DataFlowDispatch::getADirectSuperclass*(readCls) | ||
| or | ||
| readCls = DataFlowDispatch::getADirectSuperclass*(writeCls) | ||
| ) | ||
| ) | ||
| } | ||
|
|
||
| /** | ||
| * Holds if `nodeFrom` is written to attribute `self.attr` in some instance method of a | ||
| * class, and `nodeTo` reads attribute `attr` from an instance of that class (or a | ||
| * subclass of it) outside its methods (e.g. `instance.attr`). | ||
| * | ||
| * This is the cross-instance counterpart of `localFieldStep`: it relates a write of | ||
| * `self.attr` inside a class to a read of `attr` on a reference to an instance of that | ||
| * class or one of its subclasses. Identifying instances relies on the call graph (via | ||
| * `classInstanceTracker`), so this step is reported as `levelStepCall` rather than | ||
| * `levelStepNoCall`. The write may occur in the instance's own class or in any of its | ||
| * superclasses, since those methods are inherited. | ||
| * | ||
| * Like `localFieldStep`, this is an over-approximation: it is both instance-insensitive | ||
| * and order-insensitive. | ||
| */ | ||
| private predicate instanceFieldStep(Node nodeFrom, LocalSourceNode nodeTo) { | ||
| exists( | ||
| Class writeCls, Class instanceCls, string attr, DataFlowPublic::AttrWrite write, | ||
| DataFlowPublic::AttrRead read | ||
| | | ||
| selfAttrRef(writeCls, attr, write) and | ||
| nodeFrom = write.getValue() and | ||
| instanceAttrRead(instanceCls, attr, read) and | ||
| nodeTo = read and | ||
| writeCls = DataFlowDispatch::getADirectSuperclass*(instanceCls) | ||
| ) | ||
| } | ||
|
|
||
| /** | ||
| * Holds if data can flow from `node1` to `node2` in a way that discards call contexts. | ||
| */ | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| --- | ||
| category: minorAnalysis | ||
| --- | ||
| * The `py/modification-of-locals` query no longer flags modifications of a `locals()` dictionary that has been passed out of the scope in which `locals()` was called (for example, by passing it to another function or storing it in an instance attribute). In such cases the dictionary is used as an ordinary mapping and modifying it is meaningful, so these were false positives. The "modification has no effect" claim only applies within the scope that called `locals()`, which is now the only case reported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about super classes? I see we take them into account for the ts/js implementation. Is there a reason why we don't model them here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'll implement that. I was about to re-run DCA anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I only implemented it for direct subclasses or superclasses, not mix-ins. We can always generalise it later if we find we need it.