Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
category: minorAnalysis
---
* Python type tracking now follows values stored in instance attributes such as `self.attr` across instance methods, including across a class hierarchy (for example, a value stored on `self.attr` in a base class and read in a subclass, or vice versa). As a result, analysis is more likely to recognize user-defined objects that are stored on `self` and used later in other methods, which may produce additional results.
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
/**
* Provides a parameterized module for identifying values that instance-attribute type tracking
* has re-exposed (laundered) out of an instance attribute, rather than freshly created.
*/

private import python
private import semmle.python.dataflow.new.DataFlow

/** Holds if `node` should be treated as an instance source. */
signature predicate instanceNodeSig(DataFlow::Node node);

/**
* Provides a predicate for identifying values that instance-attribute type tracking has
* re-exposed (laundered) out of an instance attribute, rather than freshly created.
*
* Instance-attribute type tracking can flow a value into an instance attribute and back out at
* a later attribute read, for example `BufferedRWPair.reader` or `FileIO.fileno` returning
* `self._fd`. Such a re-exposed value is owned by the enclosing instance and is not a fresh
* resource; queries that reason about resource creation or lifetime should not treat it as one.
*
* The parameter `isInstance` defines which nodes count as instance sources (typically the result
* of a class- or resource-instance type tracker).
*/
module ReExposedInstance<instanceNodeSig/1 isInstance> {
/**
* Holds if `read` is an attribute read that re-exposes an instance held in an instance
* attribute.
*/
private predicate launderedAttrRead(DataFlow::AttrRead read) { isInstance(read) }

/** Type tracking forward from an attribute read that re-exposes an instance held in a field. */
private DataFlow::TypeTrackingNode launderedInstance(DataFlow::TypeTracker t) {
t.start() and
launderedAttrRead(result)
or
exists(DataFlow::TypeTracker t2 | result = launderedInstance(t2).track(t2, t))
}

/**
* Holds if `node` is a value that has been re-exposed (laundered) out of an instance attribute,
* rather than being a freshly created instance.
*/
predicate isReExposed(DataFlow::Node node) {
launderedInstance(DataFlow::TypeTracker::end()).flowsTo(node)
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -167,11 +167,17 @@
}

/** Holds if there is a level step from `nodeFrom` to `nodeTo`, which may depend on the call graph. */
predicate levelStepCall(Node nodeFrom, LocalSourceNode nodeTo) { none() }
predicate levelStepCall(Node nodeFrom, LocalSourceNode nodeTo) {
instanceFieldStep(nodeFrom, nodeTo)
or
inheritedFieldStep(nodeFrom, nodeTo)
}

/** Holds if there is a level step from `nodeFrom` to `nodeTo`, which does not depend on the call graph. */
predicate levelStepNoCall(Node nodeFrom, LocalSourceNode nodeTo) {
TypeTrackerSummaryFlow::levelStepNoCall(nodeFrom, nodeTo)
or
localFieldStep(nodeFrom, nodeTo)
}

/**
Expand Down Expand Up @@ -317,6 +323,133 @@
)
}

/**
* Holds if `ref` accesses attribute `attr` of `self`, where `self` is the first
* parameter of an instance method of `cls` (i.e. an access of the form `self.attr`).
*
* Static methods and class methods are excluded, since their first parameter is not a
* `self` instance reference.
*/
private predicate selfAttrRef(Class cls, string attr, DataFlowPublic::AttrRef ref) {
exists(Function method, Name selfUse |
method = cls.getAMethod() and

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about super classes? I see we take them into account for the ts/js implementation. Is there a reason why we don't model them here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll implement that. I was about to re-run DCA anyway.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I only implemented it for direct subclasses or superclasses, not mix-ins. We can always generalise it later if we find we need it.

not DataFlowDispatch::isStaticmethod(method) and
not DataFlowDispatch::isClassmethod(method) and
selfUse.getVariable() = method.getArg(0).(Name).getVariable() and
ref.getObject().asCfgNode().getNode() = selfUse and
ref.mayHaveAttributeName(attr)
)
}

/**
* Holds if `read` reads attribute `attr` from an instance of `cls`, where the instance
* is referred to from outside the methods of `cls` (i.e. an access of the form
* `instance.attr`, where `instance` is a reference to an instance of `cls`).
*
* This complements `selfAttrRef`, which only handles `self.attr` accesses inside the
* methods of `cls`. Unlike `selfAttrRef`, this depends on the call graph (via
* `classInstanceTracker`), so steps using it must be reported as `levelStepCall`.
*/
private predicate instanceAttrRead(Class cls, string attr, DataFlowPublic::AttrRead read) {
read.getObject() = DataFlowDispatch::classInstanceTracker(cls) and
read.mayHaveAttributeName(attr)

@BazookaMusic BazookaMusic Jun 17, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work nicely with @propertydecorators?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not properly, but I think that's consistent with what the python library already does, from what I can see.

}

/**
* Holds if `nodeFrom` is written to attribute `self.attr` in some instance method of a
* class, and `nodeTo` reads attribute `self.attr` in some (possibly different) instance
* method of the same class.
*
* This models flow through instance attributes (`self.foo`): a value stored into
* `self.foo` in one method can be read from `self.foo` in another method. Type-tracking
* handles the store and read steps via `AttrWrite`/`AttrRead`, but on its own it cannot
* relate the `self` of the writing method to the `self` of the reading method. Following
* the approach used for Ruby and JavaScript, we model this directly as a level step from
* the written value to the read reference, for any pair of methods on the class (not
* just from `__init__`).
*
* Flow across the class hierarchy (a write in one class observed in a method inherited
* from, or contributed by, a related class) is handled separately by
* `inheritedFieldStep`, because resolving superclasses depends on the call graph and so
* cannot appear in this call-graph-independent step.
*
* This is an over-approximation: it is instance-insensitive (it does not distinguish
Comment thread
owen-mc marked this conversation as resolved.
* between different instances of the same class) and order-insensitive (it does not
* require the write to happen before the read), matching the precision of
* instance-attribute handling for Ruby and JavaScript.
*/
private predicate localFieldStep(Node nodeFrom, LocalSourceNode nodeTo) {
exists(Class cls, string attr, DataFlowPublic::AttrWrite write, DataFlowPublic::AttrRead read |
selfAttrRef(cls, attr, write) and
nodeFrom = write.getValue() and
selfAttrRef(cls, attr, read) and
nodeTo = read
)
}

/**
* Holds if `nodeFrom` is written to attribute `self.attr` in an instance method of one
* class, and `nodeTo` reads attribute `self.attr` in an instance method of a different
* class that is related to it by inheritance (one is a transitive superclass of the
* other).
*
* This is the cross-hierarchy counterpart of `localFieldStep`: at runtime the receiver
* of both methods may be an instance of the more-derived class, whose behaviour is made
* up of the methods it declares together with those inherited from all of its ancestors.
* It therefore models the common pattern of a base class storing `self.attr` that a
* subclass reads, and vice versa. Resolving the superclass relationship depends on the
* call graph (via `getADirectSuperclass`), so this step is reported as `levelStepCall`
* rather than `levelStepNoCall`.
*
* Like `localFieldStep`, this is an over-approximation: it is both instance-insensitive
* and order-insensitive.
*/

Check warning

Code scanning / CodeQL

Misspelling Warning

This comment contains the non-US spelling 'behaviour', which should instead be 'behavior'.
Comment on lines +390 to +406
private predicate inheritedFieldStep(Node nodeFrom, LocalSourceNode nodeTo) {
exists(
Class writeCls, Class readCls, string attr, DataFlowPublic::AttrWrite write,
DataFlowPublic::AttrRead read
|
selfAttrRef(writeCls, attr, write) and
nodeFrom = write.getValue() and
selfAttrRef(readCls, attr, read) and
nodeTo = read and
writeCls != readCls and
(
writeCls = DataFlowDispatch::getADirectSuperclass*(readCls)
or
readCls = DataFlowDispatch::getADirectSuperclass*(writeCls)
)
)
}

/**
* Holds if `nodeFrom` is written to attribute `self.attr` in some instance method of a
* class, and `nodeTo` reads attribute `attr` from an instance of that class (or a
* subclass of it) outside its methods (e.g. `instance.attr`).
*
* This is the cross-instance counterpart of `localFieldStep`: it relates a write of
* `self.attr` inside a class to a read of `attr` on a reference to an instance of that
* class or one of its subclasses. Identifying instances relies on the call graph (via
* `classInstanceTracker`), so this step is reported as `levelStepCall` rather than
* `levelStepNoCall`. The write may occur in the instance's own class or in any of its
* superclasses, since those methods are inherited.
*
* Like `localFieldStep`, this is an over-approximation: it is both instance-insensitive
* and order-insensitive.
*/
private predicate instanceFieldStep(Node nodeFrom, LocalSourceNode nodeTo) {
exists(
Class writeCls, Class instanceCls, string attr, DataFlowPublic::AttrWrite write,
DataFlowPublic::AttrRead read
|
selfAttrRef(writeCls, attr, write) and
nodeFrom = write.getValue() and
instanceAttrRead(instanceCls, attr, read) and
nodeTo = read and
writeCls = DataFlowDispatch::getADirectSuperclass*(instanceCls)
)
}

/**
* Holds if data can flow from `node1` to `node2` in a way that discards call contexts.
*/
Expand Down
19 changes: 18 additions & 1 deletion python/ql/src/Resources/FileNotAlwaysClosedQuery.qll
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
import python
import semmle.python.dataflow.new.internal.DataFlowDispatch
import semmle.python.ApiGraphs
private import semmle.python.dataflow.new.internal.ReExposedInstance

/** A CFG node where a file is opened. */
abstract class FileOpenSource extends DataFlow::CfgNode { }
Expand All @@ -21,12 +22,28 @@ private DataFlow::TypeTrackingNode fileOpenInstance(DataFlow::TypeTracker t) {
exists(DataFlow::TypeTracker t2 | result = fileOpenInstance(t2).track(t2, t))
}

/**
* Holds if `node` is tracked to be an instance of an open file object.
*/
private predicate fileInstanceNode(DataFlow::Node node) {
fileOpenInstance(DataFlow::TypeTracker::end()).flowsTo(node)
}

private module FileReExposed = ReExposedInstance<fileInstanceNode/1>;

/**
* A call that returns an instance of an open file object.
* This includes calls to methods that transitively call `open` or similar.
*/
class FileOpen extends DataFlow::CallCfgNode {
FileOpen() { fileOpenInstance(DataFlow::TypeTracker::end()).flowsTo(this) }
FileOpen() {
fileOpenInstance(DataFlow::TypeTracker::end()).flowsTo(this) and
// Don't treat an accessor that merely re-exposes a file held in an instance attribute
// (e.g. `FileIO.fileno` returning `self._fd`) as opening a new file. Such flow is
// introduced by instance-attribute type tracking; the underlying open is tracked at its
// real creation site.
not FileReExposed::isReExposed(this)
}
}

/** A call that may wrap a file object in a wrapper class or `os.fdopen`. */
Expand Down
12 changes: 11 additions & 1 deletion python/ql/src/Statements/ModificationOfLocals.ql
Original file line number Diff line number Diff line change
Expand Up @@ -13,9 +13,19 @@

import python
private import semmle.python.ApiGraphs
private import semmle.python.dataflow.new.DataFlow

Check warning

Code scanning / CodeQL

Redundant import Warning

Redundant import, the module is already imported inside
semmle.python.ApiGraphs
.

predicate originIsLocals(ControlFlowNode n) {
API::builtin("locals").getReturn().getAValueReachableFromSource().asCfgNode() = n
// Only consider the `locals()` dictionary within the scope that called `locals()`.
// Once the dictionary is passed to another scope (e.g. as an argument or via an
// instance attribute) it is just an ordinary mapping, and modifying it is both
// meaningful and effective. Restricting to local (intraprocedural) flow ensures we
// only report modifications in the scope where the `locals()` gotcha actually applies.
exists(DataFlow::LocalSourceNode src, DataFlow::Node use |
src = API::builtin("locals").getReturn().asSource() and
src.flowsTo(use) and
use.asCfgNode() = n
)
}

predicate modification_of_locals(ControlFlowNode f) {
Expand Down
16 changes: 14 additions & 2 deletions python/ql/src/Statements/ShouldUseWithStatement.ql
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,9 @@
*/

import python
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.internal.DataFlowDispatch
private import semmle.python.dataflow.new.internal.ReExposedInstance

predicate calls_close(Call c) { exists(Attribute a | c.getFunc() = a and a.getName() = "close") }

Expand All @@ -23,11 +25,21 @@ predicate only_stmt_in_finally(Try t, Call c) {
)
}

from Call close, Try t, Class cls
/** Holds if `node` is tracked to be an instance of some class. */
private predicate classInstanceNode(DataFlow::Node node) { node = classInstanceTracker(_) }

private module ClassReExposed = ReExposedInstance<classInstanceNode/1>;

from Call close, Try t, Class cls, DataFlow::Node closeTarget
where
only_stmt_in_finally(t, close) and
calls_close(close) and
classInstanceTracker(cls).asExpr() = close.getFunc().(Attribute).getObject() and
closeTarget.asExpr() = close.getFunc().(Attribute).getObject() and
closeTarget = classInstanceTracker(cls) and
// Don't report closing a resource that is held in an instance attribute (e.g. `self.reader`).
// Such flow is introduced by instance-attribute type tracking; the object's lifetime is tied
// to the enclosing instance and cannot be expressed with a `with` statement.
not ClassReExposed::isReExposed(closeTarget) and
DuckTyping::isContextManager(cls)
select close,
"Instance of context-manager class $@ is closed in a finally block. Consider using 'with' statement.",
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
category: minorAnalysis
---
* The `py/modification-of-locals` query no longer flags modifications of a `locals()` dictionary that has been passed out of the scope in which `locals()` was called (for example, by passing it to another function or storing it in an instance attribute). In such cases the dictionary is used as an ordinary mapping and modifying it is meaningful, so these were false positives. The "modification has no effect" claim only applies within the scope that called `locals()`, which is now the only case reported.
Original file line number Diff line number Diff line change
@@ -1,9 +1,6 @@
testFailures
debug_callableNotUnique
pointsTo_found_typeTracker_notFound
| code/class_attr_assign.py:10:9:10:27 | ControlFlowNode for Attribute() | my_func |
| code/class_attr_assign.py:11:9:11:25 | ControlFlowNode for Attribute() | my_func |
| code/class_attr_assign.py:26:9:26:25 | ControlFlowNode for Attribute() | DummyObject.method |
| code/class_super.py:50:1:50:6 | ControlFlowNode for Attribute() | outside_def |
| code/conditional_in_argument.py:18:5:18:11 | ControlFlowNode for Attribute() | X.bar |
| code/func_defined_outside_class.py:21:1:21:11 | ControlFlowNode for Attribute() | A.foo |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,8 @@ def __init__(self, func):
self.direct_ref = my_func

def later(self):
self.indirect_ref() # $ pt=my_func MISSING: tt=my_func
self.direct_ref() # $ pt=my_func MISSING: tt=my_func
self.indirect_ref() # $ pt=my_func tt=my_func
self.direct_ref() # $ pt=my_func tt=my_func

foo = Foo(my_func) # $ tt=Foo.__init__
foo.later() # $ pt,tt=Foo.later
Expand All @@ -23,7 +23,7 @@ def __init__(self):
self.obj = DummyObject()

def later(self):
self.obj.method() # $ pt=DummyObject.method MISSING: tt=DummyObject.method
self.obj.method() # $ pt=DummyObject.method tt=DummyObject.method


bar = Bar(my_func) # $ tt=Bar.__init__
Expand Down
Loading
Loading