diff --git a/.ai/skills/audit-skill-md/SKILL.md b/.ai/skills/audit-skill-md/SKILL.md
new file mode 100644
index 000000000..30e1a90fd
--- /dev/null
+++ b/.ai/skills/audit-skill-md/SKILL.md
@@ -0,0 +1,284 @@

---
name: audit-skill-md
description: Audit the user-facing skill at skills/datafusion_python/SKILL.md against the current public Python API. Find new APIs that should be documented, stale mentions of removed/renamed APIs, examples that drifted from current idiomatic style, and places that need a "requires datafusion-python NN or newer" note. Run after upstream syncs and before each release.
argument-hint: [scope] (e.g., "session-context", "dataframe", "expr", "functions", "patterns", "pitfalls", "version-notes", "all")
---

# Audit `skills/datafusion_python/SKILL.md`

You are auditing the user-facing skill at
[`skills/datafusion_python/SKILL.md`](../../skills/datafusion_python/SKILL.md)
against the current state of the Python API. The skill is the source of truth
for how AI coding assistants are taught to write `datafusion-python` code, so
it must match what the project actually ships. This skill identifies gaps
caused by upstream syncs, refactors, or renames, and (if asked) applies the
edits directly to `SKILL.md`.

The skill is most usefully run **after** the `check-upstream` step of an
upstream sync (see `dev/release/upstream-sync.md`) — once any new APIs are
exposed, this skill makes sure they get documented.

## What the skill covers

The user-facing `SKILL.md` documents these public surfaces. This list is not
exhaustive — if a new top-level area is added (e.g., a new `Catalog` API
exposed at the package root), include it.

| Surface | Module | Sections in SKILL.md |
|---|---|---|
| `SessionContext` | `python/datafusion/context.py` | "Data Loading" |
| `DataFrame` | `python/datafusion/dataframe.py` | "DataFrame Operations Quick Reference", "Executing and Collecting Results", "Idiomatic Patterns" |
| `Expr` | `python/datafusion/expr.py` | "Expression Building", "Common Pitfalls" |
| `functions` | `python/datafusion/functions.py` | "Available Functions (Categorized)", scattered uses throughout |
| Top-level helpers (`col`, `lit`, `WindowFrame`, ...) | `python/datafusion/__init__.py` | "Import Conventions", "Core Abstractions" |

## Scope argument

The user may specify a scope via `$ARGUMENTS` to limit the audit. If no scope
is given or `all` is specified, audit every area.

| Scope | Audit target |
|---|---|
| `session-context` | `SessionContext` methods and the "Data Loading" section |
| `dataframe` | `DataFrame` methods and the operations / executing / patterns sections |
| `expr` | `Expr` methods/operators and the "Expression Building" section |
| `functions` | `functions.py` `__all__` and the "Available Functions (Categorized)" section |
| `patterns` | "Idiomatic Patterns" section — confirm patterns still match recommended style |
| `pitfalls` | "Common Pitfalls" — confirm each pitfall still reproduces, drop ones fixed upstream |
| `version-notes` | Cross-check version annotations (see below) |
| `all` | Everything above |

## Inputs to read

Before producing the report:

1. `skills/datafusion_python/SKILL.md` — the document being audited.
2. The relevant Python module(s) for the chosen scope. Public surface is the
   `__all__` list (where defined) plus `class` and `def` symbols not prefixed
   with `_` (see the sketch after this list).
3. `Cargo.toml` (root) for the current `datafusion-python` version — read
   the `version` field under `[workspace.package]` (format `NN.0.0`). The
   major version always matches the upstream `datafusion` crate, so a
   single `datafusion-python` version expresses both.
   `python/datafusion/__init__.py`'s `__version__` is the same value
   exposed at runtime.
4. Recent commits touching the relevant module(s) for context on what
   changed since the last sync:
   ```bash
   git log --oneline -- python/datafusion/dataframe.py | head -20
   ```
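The "public surface" notion from item 2 can be computed mechanically. A
minimal sketch, assuming the package is importable in the audit environment;
`public_surface` is a hypothetical helper written for this audit, not part of
the project:

```python
# Sketch only: compute the public surface described in item 2 -- the module's
# __all__ where defined, plus class/def symbols not prefixed with "_".
# `public_surface` is a hypothetical helper, not project code.
import inspect
from types import ModuleType


def public_surface(module: ModuleType) -> set[str]:
    """__all__ (where defined) plus public class and function names."""
    names = set(getattr(module, "__all__", []))
    names |= {
        name
        for name, obj in vars(module).items()
        if not name.startswith("_")
        and (inspect.isclass(obj) or inspect.isroutine(obj))
    }
    return names


if __name__ == "__main__":
    from datafusion import functions

    print(sorted(public_surface(functions)))
```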
## What to look for

Walk through each scoped area and flag four kinds of issues.

### 1. New APIs not mentioned

For each public symbol in the module's `__all__` (or each public class
method), check whether it appears anywhere in `SKILL.md`. A symbol is
"covered" if it shows up in:

- A code block (the strongest signal — it's demonstrated).
- The "Available Functions (Categorized)" list.
- The SQL-to-DataFrame Reference table.

**Decide whether each missing symbol deserves an entry.** Not every public
symbol belongs in `SKILL.md` — the skill is curated for the patterns users
hit daily, not an exhaustive API reference. Use these heuristics:

- **Add it** if it replaces or supersedes something already in the skill
  (e.g., a new operation that is the idiomatic alternative to a documented
  workaround).
- **Add it** if it fits a category already present (a new aggregate function
  goes in the aggregate list; a new join type goes in the joining section).
- **Add it** if it changes how a documented pattern should be written.
- **Skip it** if it is genuinely niche / advanced / experimental.
- **Skip it** if it is internal plumbing exposed for FFI but not user-facing.

When you flag a missing symbol, include a one-line proposed insertion point
(which section / which table row) so a reviewer can decide quickly.

### 2. Stale mentions

For each function name, method name, or import shown in `SKILL.md`, verify it
still exists in the current API:

- Function names mentioned in prose or in the categorized list should appear
  in `python/datafusion/functions.py`'s `__all__`.
- Method calls in code blocks should resolve against the current class.
- Imports (`from datafusion import ...`) should succeed against the current
  `__init__.py`.

A quick way to check that the imports still resolve:

```bash
python -c "from datafusion import SessionContext, col, lit; from datafusion import functions as F; print('ok')"
```

For each stale mention, propose either:
- a rename to the current name, or
- removal if the API is gone with no replacement.
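A rough mechanical first pass over the categorized list is possible before
the manual checks above. A sketch, assuming the repository layout shown
earlier; the backtick heuristic over-matches (methods, keywords, and
parameter names will surface too), so the output is a triage list, not a
verdict:

```python
# Sketch only: flag backticked lowercase identifiers in SKILL.md that do not
# resolve in datafusion.functions.__all__. Heuristic output needs manual
# triage; prose mentions and method calls still need the checks above.
import re
from pathlib import Path

from datafusion import functions

skill_text = Path("skills/datafusion_python/SKILL.md").read_text()
mentioned = set(re.findall(r"`([a-z][a-z0-9_]*)`", skill_text))

for name in sorted(mentioned - set(functions.__all__)):
    print(f"not in functions.__all__ (triage): {name}")
```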
### 3. Examples that drifted from idiomatic style

The skill teaches a Pythonic style: prefer plain strings to `col(...)` when a
column reference is all you need; prefer raw Python values to `lit(...)`
where auto-wrapping applies. Recent refactors (see the `make-pythonic`
skill) keep moving more functions toward accepting native types.

For each code example in `SKILL.md`, check:

- Does it use `lit(value)` where a raw value would work? Comparison RHS,
  arithmetic with a column, etc. all auto-wrap. (Reserve `lit()` for the
  cases listed in pitfall #2.)
- Does it use `col("name")` where a plain string would work? `select(...)`,
  `aggregate([keys], ...)`, `sort(...)`, `sort_by(...)` all accept plain
  name strings.
- Do `functions.py` calls match the current pythonic signature for that
  function? If `make-pythonic` recently changed a signature (e.g.,
  `repeat(string, n: Expr | int)`), the example should pass `3` rather than
  `lit(3)`.
- Does any example use a deprecated or removed parameter name?

For drift, propose the updated snippet. If the change is purely stylistic
and the older form still works, mark the suggestion as **non-blocking**.

### 4. Missing or stale version notes

When an API depends on a specific version, the skill should say so —
otherwise an agent referencing the skill in an older project will write
code that fails at import or at runtime.

`datafusion-python` shares its major version number with the upstream
`datafusion` crate (e.g., `datafusion-python 53.x` tracks upstream
`datafusion 53`). Always express version requirements in terms of
`datafusion-python` only — there is no need to call out upstream and
package versions separately.

Add a version note when:

- A method or function shown in the skill was added in a specific release
  (e.g., a new `DataFrame` method that didn't exist before 53).
- A breaking change altered behavior in a specific release (signature
  change, default-value change, new required argument).
- A pitfall was fixed in a specific release. Either annotate the pitfall
  block with "fixed in datafusion-python NN, kept here for users on older
  versions" or remove it once the supported floor moves past that version.

Format for version notes (inline, italicized):

```markdown
*Requires datafusion-python 53 or newer.*
```

For each missing/stale version note, propose the exact line and where it
belongs.

## How to discover changes since the last audit

If the user supplies a previous version or commit SHA where the audit was
last run, diff against it:

```bash
# Public-API-relevant changes since <SHA>
git log --oneline <SHA>..HEAD -- python/datafusion/

# Which signatures actually moved
git diff <SHA>..HEAD -- python/datafusion/functions.py | grep '^[+-]def '
```

If no prior audit point is given, fall back to "since the last upstream
sync" by inspecting commits that touch `Cargo.toml`'s `datafusion` pin:

```bash
git log --oneline -- Cargo.toml | grep -i datafusion | head -5
```

## Output Format

Produce a report grouped by scope. Each finding is one bullet with a
proposed action, so a maintainer can review the list quickly and apply
edits in order.

````
## SKILL.md Audit (scope: <scope>)

Audited against:
- skills/datafusion_python/SKILL.md @ <commit>
- datafusion-python <version>

### New APIs to cover
- `DataFrame.foo()` — added in datafusion-python 53. Insert in "DataFrame Operations Quick Reference" under <subsection>.
  Proposed snippet:
  ```python
  df.foo(...)
  ```

### Stale mentions
- "old_function_name" referenced in the categorized list (line N) — renamed to "new_function_name". Replace.

### Drifted examples
- "Filtering" section, `df.filter(col("a") > lit(10))` — drop `lit(10)`, auto-wrap applies. (non-blocking)
- "Aggregation" section, `df.aggregate([col("region")], ...)` — pass `"region"` as a plain string per "Projection" guidance.

### Version notes
- `DataFrame.foo()` block needs *Requires datafusion-python 53 or newer.*
- "Common Pitfalls" #N — fixed in datafusion-python 53; remove the pitfall and update the SQL-to-DataFrame row to no longer flag the workaround.

### No-change confirmed
- `SessionContext` data-loading section — all entries match current API.
````
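The `datafusion-python <version>` line in the report header can be filled
from the root `Cargo.toml`, per "Inputs to read" item 3. A minimal sketch,
assuming Python 3.11+ for the standard-library `tomllib` (older interpreters
can use the third-party `tomli` package the same way):

```python
# Sketch only: pull the datafusion-python version for the report header from
# the root Cargo.toml ("Inputs to read", item 3). Assumes Python 3.11+.
import tomllib
from pathlib import Path

manifest = tomllib.loads(Path("Cargo.toml").read_text())
version = manifest["workspace"]["package"]["version"]  # e.g. "54.0.0"
major = version.split(".")[0]
print(f"datafusion-python {major} ({version})")
```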
+``` + +If asked to apply the changes, edit `skills/datafusion_python/SKILL.md` +directly with `Edit` tool calls, one finding at a time, and re-run the +relevant doctest sanity check at the end: + +```bash +pytest --doctest-modules python/datafusion -q +``` + +## What NOT to flag + +- **Internal helpers / underscored names.** Private symbols are not part of + the user-facing surface. +- **Functions intentionally omitted.** Niche / advanced APIs (custom + catalogs, raw FFI plumbing, low-level execution plan accessors) live in + the API reference, not the skill. If an omission was deliberate and a + comment / commit explains why, leave it out. +- **Style nits inside explanatory prose.** The skill mixes example code and + prose; only enforce the pythonic style on actual code blocks. +- **Function-by-function coverage of every `functions.py` symbol.** The + "Available Functions (Categorized)" list is curated by category, not + exhaustive. Adding a single new aggregate to the aggregate list is + enough — the user follows the pointer to the API reference for the rest. + +## Coordination with other skills + +- Run `/check-upstream` first to expose any missing upstream APIs into the + Python layer. Without that, this skill cannot recommend documenting + something that is not yet exposed. +- Run `/make-pythonic` before this skill if a Pythonic-signature pass is + planned for a release — that way this skill can update examples to the + final signature in one shot rather than churning them twice. +- The order during an upstream sync (PR 3 of `dev/release/upstream-sync.md`) + is therefore: `/check-upstream` → `/make-pythonic` (optional) → + `/audit-skill-md`. diff --git a/dev/release/README.md b/dev/release/README.md index 4833be55a..1b02ca576 100644 --- a/dev/release/README.md +++ b/dev/release/README.md @@ -33,6 +33,12 @@ release branch without blocking ongoing development in the `main` branch. We can cherry-pick commits from the `main` branch into `branch-53` as needed and then create new patch releases from that branch. +## Upstream Sync + +Between releases the `main` branch is periodically synced to a newer upstream `apache/datafusion` version. This is +broken into a three-PR workflow (bump + fix breakage, consolidate transitive deps, fill API and documentation gaps). +See [`upstream-sync.md`](upstream-sync.md) for the full process. + ## Detailed Guide ### Pre-requisites @@ -53,6 +59,9 @@ You will also need access to the [datafusion](https://test.pypi.org/project/data Before creating a new release: - We need to ensure that the main branch does not have any GitHub dependencies +- Confirm the upstream sync workflow in [`upstream-sync.md`](upstream-sync.md) has been completed for this release cycle + (crate bump + breakage fixes, transitive dependency consolidation, and the `/check-upstream` and `/audit-skill-md` + passes). Any gaps surfaced by those skills should land before the release branch is cut. - a PR should be created and merged to update the major version number of the project - A new release branch should be created, such as `branch-53` - It is best to push this branch to the apache repository rather than a personal fork in case patch releases are required. 
diff --git a/dev/release/upstream-sync.md b/dev/release/upstream-sync.md
new file mode 100644
index 000000000..dc05b4195
--- /dev/null
+++ b/dev/release/upstream-sync.md
@@ -0,0 +1,170 @@

# Upstream Sync Process

This document describes how to sync `datafusion-python` to a new version of the
upstream `apache/datafusion` Rust crates. This is a recurring task: between
official releases the `main` branch tracks DataFusion via crates.io or GitHub
dependencies, and we periodically bump those dependencies to pick up new
features and bug fixes.

The work is broken into **three sequential PRs** rather than landing as one
large change. Splitting reviews along these lines keeps each PR focused, makes
breakage easier to bisect, and lets reviewers concentrate on one concern at a
time.

## PR 1: Bump DataFusion crate dependencies and fix breakage

**Goal:** update the upstream `datafusion` crate version and make the project
build, test, and lint cleanly against it.

1. In the root `Cargo.toml`, update:
   - `[workspace.package].version` to the new major (the `datafusion-python`
     major tracks the upstream `datafusion` major, so a 53→54 bump moves
     this from `53.0.0` to `54.0.0`), and
   - every `datafusion` / `datafusion-*` entry in `[workspace.dependencies]`
     to the same new major.

   Per-crate manifests under `crates/` inherit these pins via
   `workspace = true` and need no edit.
2. Update `Cargo.lock` for the datafusion family only — leave unrelated
   transitives at their current pins so PR 2 can address them deliberately.
   List every `datafusion-*` workspace dependency with `-p`:

   ```bash
   cargo update \
     -p datafusion \
     -p datafusion-substrait \
     -p datafusion-proto \
     -p datafusion-ffi \
     -p datafusion-catalog \
     -p datafusion-common \
     -p datafusion-functions-aggregate \
     -p datafusion-functions-window \
     -p datafusion-expr
   ```

   Or pin exact versions with `--precise`, one crate at a time:

   ```bash
   cargo update -p datafusion --precise 54.0.0
   # repeat for each datafusion-* sibling
   ```

   A bare `cargo update` would refresh every transitive crate and blur the
   diff between PR 1 and PR 2.
3. Run the standard build and test commands and address compilation errors,
   API renames, signature changes, and behavior changes:
   - `cargo build`
   - `cargo test`
   - `pytest`
   - `pre-commit run --all-files`
4. Fix only what's needed to restore green CI. Resist the urge to bundle
   unrelated cleanups — those belong in their own PR.
5. If a breaking change in upstream requires a user-facing API change in
   `datafusion-python`, add the `api change` label and document the change
   in the PR description so it surfaces in the changelog.

**Reference PRs:** [#1311](https://github.com/apache/datafusion-python/pull/1311)
(DF51), [#1337](https://github.com/apache/datafusion-python/pull/1337) (DF52).

## PR 2: Consolidate transitive dependencies

**Goal:** after the upstream bump, the dependency tree may have multiple
versions of the same transitive crate (for example, two `arrow` versions, two
`object_store` versions). Reconcile these so we ship a single coherent set.

1. Inspect the lockfile for duplicates (see the sketch after this list):
   ```bash
   cargo tree --duplicates
   ```
2. For each duplicate that matters (Arrow, `object_store`, `parquet`,
   `tokio`, `arrow-flight`, etc.), update our direct dependency declarations
   in `Cargo.toml` to versions compatible with what upstream DataFusion now
   pulls in. The goal is one version of each ecosystem-critical crate.
3. Re-run `cargo update`, then the full test matrix. Some duplicates are
   benign (small leaf crates with no FFI surface) and can be left alone if
   reconciliation would force a much larger change. Use judgment.
4. If consolidating forces a behavioral change visible to users (for example,
   a newer `pyarrow`-compatible Arrow version), call it out in the PR
   description.
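To make the duplicate list easier to review, the `cargo tree --duplicates`
output can be summarized as one line per crate. A sketch, not part of the
repo's tooling; the `--prefix none` flag and the `name vX.Y.Z` line shape are
assumptions about cargo's output format:

```python
# Sketch only: summarize `cargo tree --duplicates` as "crate: versions" so
# the reconciliation targets in step 2 are visible at a glance. Assumes
# cargo prints "name vX.Y.Z" lines under --prefix none; adjust as needed.
import re
import subprocess
from collections import defaultdict

out = subprocess.run(
    ["cargo", "tree", "--duplicates", "--prefix", "none"],
    capture_output=True,
    text=True,
    check=True,
).stdout

versions: defaultdict[str, set[str]] = defaultdict(set)
for line in out.splitlines():
    match = re.match(r"([A-Za-z0-9_-]+) v(\S+)", line)
    if match:
        versions[match.group(1)].add(match.group(2))

for crate, vers in sorted(versions.items()):
    if len(vers) > 1:  # dependents listed with a single version are noise
        print(f"{crate}: {', '.join(sorted(vers))}")
```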
Keeping this work separate from PR 1 means PR 1 stays a "make it compile"
review and PR 2 stays a "tidy the dependency graph" review.

## PR 3: Fill API and documentation gaps

**Goal:** with the upstream version locked in, identify new APIs that landed
upstream and decide whether to expose them, and update agent-facing
documentation so it still matches the surface we ship.

1. Run the `check-upstream` skill (`.ai/skills/check-upstream/SKILL.md`) to
   diff the upstream Rust API against what's exposed in
   `python/datafusion/`. The skill covers scalar/aggregate/window/table
   functions, `DataFrame` methods, `SessionContext` methods, and FFI types.
   Invoke it from the assistant with `/check-upstream` (optionally scoped to
   one area, e.g. `/check-upstream scalar functions`).
2. For each gap, decide whether to:
   - Expose it now (small, obvious additions can land in this PR).
   - File a tracking issue (anything non-trivial — separate PR per feature
     keeps reviews focused).
   - Skip it (internal-only or already covered by an existing API; record
     the decision in the "Evaluated and not requiring exposure" sections of
     the skill so future runs don't re-flag it).
3. (Optional) Run the `make-pythonic` skill
   (`.ai/skills/make-pythonic/SKILL.md`) over any newly exposed APIs to
   align signatures with the project's Pythonic style (accepting plain
   strings for column names, raw Python values where auto-wrapping
   applies, etc.). Invoke it from the assistant with `/make-pythonic`.
   Running this *before* the audit step means examples in `SKILL.md` get
   updated to the final signature in one pass instead of churning twice.
   Larger reshapes still belong in their own PR.
4. Run the `audit-skill-md` skill (`.ai/skills/audit-skill-md/SKILL.md`) to
   cross-reference the user-facing skill at
   [`skills/datafusion_python/SKILL.md`](../../skills/datafusion_python/SKILL.md)
   against the current public API. The skill flags stale function names,
   missing newly exposed APIs, examples that drifted from idiomatic style,
   and missing version notes. Invoke it from the assistant with
   `/audit-skill-md` (optionally scoped, e.g. `/audit-skill-md dataframe`).
   Apply the resulting edits to `SKILL.md` and to the relevant RST pages
   under `docs/source/user-guide/common-operations/`.
5. If new aggregate or window functions were exposed in step 2, also update:
   - `docs/source/user-guide/common-operations/aggregations.rst`
   - `docs/source/user-guide/common-operations/windows.rst`

## Why three PRs

- **Bisectable.** If a regression appears, `git bisect` lands on the
  responsible PR (compile fix, dependency consolidation, or API addition)
  rather than a single mega-commit.
- **Reviewable.** Each PR has a single concern. Reviewers reading PR 1 don't
  need to also reason about whether new APIs are well-named.
- **Skippable.** Some upstream syncs are pure version bumps with no new APIs
  worth exposing. PR 3 can be empty or merged as a no-op if the audit comes
  back clean.

## Related documents

- [`README.md`](README.md) — the broader release process (this sync work
  feeds into the next official release).
+- [`.ai/skills/check-upstream/SKILL.md`](../../.ai/skills/check-upstream/SKILL.md) + — API coverage audit. +- [`skills/datafusion_python/SKILL.md`](../../skills/datafusion_python/SKILL.md) + — user-facing agent guide kept in sync via PR 3.