Merged
19 changes: 18 additions & 1 deletion DEVLOG.md
@@ -181,6 +181,22 @@ Decisions outgrow this file, split them into `DECISIONS.md` (or `docs/adr/`).

---

+## 2026-06-03 - Phase 4 final demo readiness pass
+
+### Result - branch/docs/notebooks now point at the integrated final-demo branch
+
+- **What changed:** aligned Phase 4 references from the earlier split branches to the integrated
+`feature/phase4-demo` branch, where PR-A/PR-B/PR-C are now present together.
+- **Demo fix:** `scripts/run_demo.py` launches Gradio with `allowed_paths` for
+`outputs/layout/crops`, so Colab Drive-resident layout crop PNGs can be displayed in the
+gallery without Gradio's `InvalidPathError`.
+- **Naming cleanup:** replaced inflated wrap-up wording with "final demo", "final integration", or
+"Phase 4 summary" to keep the project description practical.
+- **Scope hygiene:** raw data and generated machine artifacts remain gitignored under `data/` and
+`outputs/`; committed report artifacts stay under `reports/`.
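
The demo fix above can be sketched as follows. This is a minimal illustration, assuming a Gradio version whose `launch()` accepts `allowed_paths`; the helper names `crop_allowed_paths` and `launch_demo` and the `repo_root` argument are hypothetical and are not taken from `scripts/run_demo.py`.

```python
from pathlib import Path


def crop_allowed_paths(repo_root: str) -> list:
    """Return the directories Gradio may serve files from (hypothetical helper)."""
    crops = Path(repo_root) / "outputs" / "layout" / "crops"
    # Use an absolute path so the result does not depend on the working directory.
    return [str(crops.resolve())]


def launch_demo(demo, repo_root: str) -> None:
    """Launch an already-built Gradio app with the crop directory allowed."""
    # `allowed_paths` is a real `launch()` parameter; the rest is illustrative.
    demo.launch(allowed_paths=crop_allowed_paths(repo_root))
```

Passing the directory rather than individual PNGs keeps the launch call stable as new crops are generated.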

+---
+
## 2026-06-03 - Phase 4 eval-summary backbone (PR-A)

### Result - one summary aggregated from the per-phase artifacts; report numbers never hand-copied
@@ -203,7 +219,8 @@ Decisions outgrow this file, split them into `DECISIONS.md` (or `docs/adr/`).
relevant chunk per question, `src/eval_retrieval.py`); a missing artifact degrades to
`{"available": false}` rather than failing.
- **Result:** full `pytest` green (246, +10). Headline echoes: FUNSD `test_50.qa_links` F1 0.727;
-QA `gt_markdown` answer_exact 0.675. PR-B (report) and PR-C (Gradio demo) follow.
+QA `gt_markdown` answer_exact 0.675. PR-B (report) and PR-C (Gradio demo) later landed on the
+integrated Phase 4 demo branch.
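
The "missing artifact degrades to `{"available": false}`" behaviour described in this entry can be sketched as below. This is a hedged illustration only: `load_json_or_none` and `summarize_part` are hypothetical names, and the real summarizers in `src/phase4_summary.py` may select and normalize fields differently.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def load_json_or_none(path: Path) -> Optional[Dict[str, Any]]:
    """IO layer (hypothetical): return parsed JSON, or None when the file is absent."""
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def summarize_part(metrics: Optional[Dict[str, Any]], keys: List[str]) -> Dict[str, Any]:
    """Pure summarizer (hypothetical): dict in, dict out, no file access."""
    if metrics is None:
        # A missing artifact is reported, not raised.
        return {"available": False}
    picked = {k: metrics[k] for k in keys if k in metrics}
    return {"available": True, **picked}
```

Keeping the file read separate from the summarizer is what lets the tests run on synthetic dicts without touching Drive.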

---

9 changes: 5 additions & 4 deletions PLAN.md
@@ -545,15 +545,16 @@ Implementation details:
**Phases 0 through 3 are complete and merged** (v1 = table-only RAG; Phase 2 = DocLayNet
layout-crop integration; Phase 3 = FUNSD relation baseline, both merged to `main`
2026-06-03). **Phase 4 (full demo + evaluation + report) is in progress** on
-`feature/phase4-demo-eval-report`; PR-A (the eval-summary backbone) has landed.
+`feature/phase4-demo`; PR-A/PR-B/PR-C are implemented on the branch and ready for final review.

-Phase 4 PR-A delivered (capstone summary backbone; see `docs/phase4_brief.md`):
+Phase 4 PR-A delivered (Phase 4 summary backbone; see `docs/phase4_brief.md`):
`src/phase4_summary.py` (pure per-phase summarizers + inline layout-CSV aggregation + markdown
render), `scripts/build_phase4_summary.py` (writes `outputs/evaluation/phase4_summary.json` and
the committed `reports/phase4_metrics.md`), `tests/test_phase4_summary.py` (10 synthetic tests).
Report numbers are generated from the summary (never hand-copied), guarded by a no-drift gate.
-Next: PR-B (`reports/final_report.md` + `notebooks/07_final_report.ipynb`) and PR-C
-(`scripts/run_demo.py` + `notebooks/06_demo.ipynb`, key-optional Gradio demo).
+PR-B (`reports/final_report.md` + `notebooks/07_final_report.ipynb`) and PR-C
+(`scripts/run_demo.py` + `notebooks/06_demo.ipynb`, key-optional Gradio demo) are now present on
+the integrated Phase 4 demo branch.
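
The "no-drift gate" mentioned above can be pictured as a check that the committed metrics file equals a fresh render of the summary. The sketch below is an assumption about how such a gate could work, not the project's implementation: `render_metrics_md` is a toy renderer and `has_drift` is a hypothetical name.

```python
from pathlib import Path
from typing import Any, Dict


def render_metrics_md(summary: Dict[str, Any]) -> str:
    """Toy deterministic renderer: one table row per summary part."""
    lines = ["| part | available |", "| --- | --- |"]
    for name in sorted(summary):  # sorted keys keep the output stable across runs
        lines.append(f"| {name} | {bool(summary[name].get('available'))} |")
    return "\n".join(lines) + "\n"


def has_drift(summary: Dict[str, Any], committed_md: Path) -> bool:
    """True when the committed file is missing or differs from a fresh render."""
    if not committed_md.is_file():
        return True
    return committed_md.read_text(encoding="utf-8") != render_metrics_md(summary)
```

A test or CI step could fail when `has_drift` returns `True`, which forces the report numbers to be regenerated instead of edited by hand.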

Phase 3 V1 delivered (annotation-only deterministic relation baseline; see
`docs/phase3_brief.md`): `src/funsd_extraction.py` (parse + dedupe + per-answer-argmax
2 changes: 1 addition & 1 deletion README.md
@@ -58,6 +58,6 @@ pipeline); and Phase 3 FUNSD relation-linking baseline (annotation-only determin
predictor, held-out `test_50.qa_links` F1 0.727).

Current phase: Phase 4 (full demo + evaluation + report) is in progress on
-`feature/phase4-demo-eval-report` — a capstone that aggregates the per-phase metrics into
+`feature/phase4-demo` — a final integration phase that aggregates the per-phase metrics into
one summary, a key-optional Gradio demo, and a written report. See [PLAN.md](PLAN.md) for
the phase roadmap.
20 changes: 10 additions & 10 deletions docs/phase4_brief.md
@@ -1,17 +1,16 @@
-# Phase 4 — Demo + Eval Summary + Final Report (capstone)
+# Phase 4 — Final Demo + Eval Summary + Final Report

> Implementation brief for Phase 4. Committed in the repo (travels with `git pull` to Colab) so
> the references to it in `DEVLOG.md` and the `src/phase4_summary.py` /
-> `scripts/build_phase4_summary.py` docstrings resolve. Status: PR-A (the eval-summary backbone)
-> implemented on `feature/phase4-demo-eval-report` — `src/phase4_summary.py`,
-> `scripts/build_phase4_summary.py`, `tests/test_phase4_summary.py`, and the generated
-> `reports/phase4_metrics.md`. PR-B (report) and PR-C (demo) follow.
+> `scripts/build_phase4_summary.py` docstrings resolve. Status: PR-A/PR-B/PR-C are implemented
+> on `feature/phase4-demo` — summary backbone, generated metrics, final report, report notebook,
+> key-optional Gradio demo, and demo notebook.

## Context

Phases 0-3 are merged to `main` (FinTabNet.c table topology + OCR content + table-only RAG +
-DocLayNet layout + FUNSD relations). Phase 4 is the **capstone**: make the work presentable,
-reportable, and reproducible. It is explicitly **not new research** — it assembles the existing
+DocLayNet layout + FUNSD relations). Phase 4 is the **final integration**: make the work
+presentable, reportable, and reproducible. It is explicitly **not new research** — it assembles the existing
deterministic/custom metrics into one summary, a Gradio demo, and a written report.
GriTS/Ragas/DeepEval are future work.

@@ -97,8 +96,9 @@ cross-encoder reranker / learned query routing; live PDF -> pipeline; HF Spaces
## Build order (TDD) + PR boundaries
- **PR-A (core, done):** tests -> `src/phase4_summary.py` -> `scripts/build_phase4_summary.py` ->
generated `reports/phase4_metrics.md`; + README/DEVLOG/PLAN docs.
-- **PR-B (report):** `reports/final_report.md` + `notebooks/07_final_report.ipynb`.
-- **PR-C (demo):** `scripts/run_demo.py` + `notebooks/06_demo.ipynb`.
+- **PR-B (report, done):** `reports/final_report.md` + `notebooks/07_final_report.ipynb`.
+- **PR-C (demo, done):** `scripts/run_demo.py` + `notebooks/06_demo.ipynb`.

## Branch
-`feature/phase4-demo-eval-report` cut from the latest `origin/main` after `git fetch`.
+`feature/phase4-demo` integrates PR-A/PR-B/PR-C and was cut from the latest `origin/main` after
+`git fetch`.
2 changes: 1 addition & 1 deletion notebooks/06_demo.ipynb
@@ -45,7 +45,7 @@
"import os\n",
"\n",
"REPO = '/content/FinDocStructRAG'\n",
-"BRANCH = 'feature/phase4-demo' # PR-C; flip to 'main' after merge\n",
+"BRANCH = 'feature/phase4-demo' # integrated Phase 4 branch; flip to 'main' after merge\n",
"\n",
"if not os.path.isdir(f'{REPO}/.git'):\n",
" !git clone --quiet https://github.com/AD2000X/FinDocStructRAG.git {REPO}\n",
6 changes: 3 additions & 3 deletions notebooks/07_final_report.ipynb
@@ -6,7 +6,7 @@
"source": [
"# Phase 4 - Final report (Colab runner)\n",
"\n",
-"Runner only: mount Drive, pull the Phase 4 branch, regenerate the capstone summary from the staged\n",
+"Runner only: mount Drive, pull the Phase 4 branch, regenerate the Phase 4 summary from the staged\n",
"evaluation artifacts, then render the final report and the generated metrics table inline. Logic\n",
"lives in `src/` and `scripts/`, not in this notebook (P1/P2).\n",
"\n",
@@ -43,7 +43,7 @@
"import os\n",
"\n",
"REPO = '/content/FinDocStructRAG'\n",
-"BRANCH = 'feature/phase4-report' # PR-B; flip to 'main' after merge\n",
+"BRANCH = 'feature/phase4-demo' # integrated Phase 4 branch; flip to 'main' after merge\n",
"\n",
"if not os.path.isdir(f'{REPO}/.git'):\n",
" !git clone --quiet https://github.com/AD2000X/FinDocStructRAG.git {REPO}\n",
@@ -79,7 +79,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"## Step 1 - build the capstone summary\n",
+"## Step 1 - build the Phase 4 summary\n",
"\n",
"Aggregates the per-phase artifacts on Drive into `outputs/evaluation/phase4_summary.json` and the\n",
"committed `reports/phase4_metrics.md`. Re-running is idempotent (no-drift)."
6 changes: 3 additions & 3 deletions reports/final_report.md
@@ -2,7 +2,7 @@

A layout-aware pipeline for extracting structured tables from financial-report PDFs and
answering questions over them, plus a standalone form relation-linking baseline. This report
-is the Phase 4 capstone: it states what was built, how it was evaluated, and what the results
+is the Phase 4 final report: it states what was built, how it was evaluated, and what the results
mean. **All metric numbers are generated** by `scripts/build_phase4_summary.py` into
`reports/phase4_metrics.md` and are never hand-copied into this prose;
`notebooks/07_final_report.ipynb` renders that generated table inline beneath this report.
@@ -100,7 +100,7 @@ per-phase notebooks (`notebooks/01`-`05`) are the runners for steps 1-6.
5. **Phase 2 layout.** `run_layout_batch.py` -> `eval_layout_iou.py --require-table-gt` (pos) and
`--exclude-table-gt` (neg) -> `smoke_structure.py`.
6. **Phase 3 relations.** `evaluate_funsd.py`.
-7. **Capstone summary.** `python scripts/build_phase4_summary.py` ->
+7. **Phase 4 summary.** `python scripts/build_phase4_summary.py` ->
`reports/phase4_metrics.md` + `outputs/evaluation/phase4_summary.json` (this report reads the
former).
-8. **Demo.** `python scripts/run_demo.py` (key-optional Gradio; PR-C).
+8. **Demo.** `python scripts/run_demo.py` (key-optional Gradio final demo).
4 changes: 2 additions & 2 deletions scripts/build_phase4_summary.py
@@ -1,5 +1,5 @@
#!/usr/bin/env python3
-"""Build the Phase 4 capstone summary from the per-phase evaluation artifacts.
+"""Build the Phase 4 summary from the per-phase evaluation artifacts.

Reads the five metrics JSONs + the three Phase 2 layout CSVs from outputs/, aggregates them with
the pure helpers in src/phase4_summary.py, and writes:
@@ -50,7 +50,7 @@ def _layout_part(layout_dir: Path):


 def main() -> None:
-    ap = argparse.ArgumentParser(description="Build the Phase 4 capstone summary.")
+    ap = argparse.ArgumentParser(description="Build the Phase 4 summary.")
     ap.add_argument("--run-id", default="mvp_rand",
                     help="run-id suffix of the Phase 1A/1B deliverable artifacts")
     args = ap.parse_args()
8 changes: 4 additions & 4 deletions scripts/run_demo.py
@@ -1,5 +1,5 @@
#!/usr/bin/env python3
-"""Phase 4 demo: artifact-backed Gradio app for the FinDocStructRAG capstone.
+"""Phase 4 final demo: artifact-backed Gradio app for FinDocStructRAG.

Serves the already-produced evaluation artifacts (metrics, table outputs, layout crops, FUNSD
results) and does live BM25 retrieval + (optional) grounded answer generation over the existing
@@ -238,7 +238,7 @@ def funsd_view() -> str:

 def overview_view() -> str:
     summary = _load_json(config.EVALUATION / "phase4_summary.json")
-    parts = ["## Capstone overview", ""]
+    parts = ["## Project overview", ""]
     if summary:
         parts.append("**Artifact availability:** " + ", ".join(
             f"{name}={'OK' if part.get('available') else 'MISSING'}" for name, part in summary.items()))
@@ -277,8 +277,8 @@ def main() -> None:
     pages = list_layout_pages()
     answer_gen = "enabled" if HAS_KEY else "disabled (no OPENROUTER_API_KEY)"

-    with gr.Blocks(title="FinDocStructRAG capstone demo") as demo:
-        gr.Markdown(f"# FinDocStructRAG - capstone demo\n"
+    with gr.Blocks(title="FinDocStructRAG final demo") as demo:
+        gr.Markdown(f"# FinDocStructRAG - final demo\n"
                     f"Artifact-backed. Retrieval: {', '.join(RETRIEVAL_METHODS)}. "
                     f"Answer generation: {answer_gen}.")

2 changes: 1 addition & 1 deletion src/phase4_summary.py
@@ -1,4 +1,4 @@
-"""Phase 4 capstone: aggregate the per-phase evaluation artifacts into one summary.
+"""Phase 4 summary: aggregate the per-phase evaluation artifacts into one summary.

Pure helpers only - no file IO, no Drive, no gradio. Each summarizer takes an already-loaded
metrics dict (the per-phase evaluation JSON) or parsed CSV rows (layout) and returns a normalized
2 changes: 1 addition & 1 deletion tests/test_phase4_summary.py
@@ -1,4 +1,4 @@
-"""Phase 4 capstone summary tests (CPU, synthetic) - Phase 4.
+"""Phase 4 summary tests (CPU, synthetic) - Phase 4.

The summarizers take already-loaded metrics dicts (the per-phase evaluation JSONs) or parsed
CSV rows (layout) and return normalized summary dicts; no file IO, no Drive, no gradio is