Merged
10 changes: 4 additions & 6 deletions PLAN.md
@@ -542,19 +542,17 @@ Implementation details:

## 7. Next steps

-**Phases 0 through 3 are complete and merged** (v1 = table-only RAG; Phase 2 = DocLayNet
-layout-crop integration; Phase 3 = FUNSD relation baseline, both merged to `main`
-2026-06-03). **Phase 4 (full demo + evaluation + report) is in progress** on
-`feature/phase4-demo`; PR-A/PR-B/PR-C are implemented on the branch and ready for final review.
+**Phases 0 through 4 are complete and merged to `main`** (v1 = table-only RAG; Phase 2 =
+DocLayNet layout-crop integration; Phase 3 = FUNSD relation baseline; Phase 4 = final
+integration demo + eval summary + report). Phase 4 merged via PR #25 on 2026-06-03.

Phase 4 PR-A delivered (Phase 4 summary backbone; see `docs/phase4_brief.md`):
`src/phase4_summary.py` (pure per-phase summarizers + inline layout-CSV aggregation + markdown
render), `scripts/build_phase4_summary.py` (writes `outputs/evaluation/phase4_summary.json` and
the committed `reports/phase4_metrics.md`), `tests/test_phase4_summary.py` (10 synthetic tests).
Report numbers are generated from the summary (never hand-copied), guarded by a no-drift gate.
PR-B (`reports/final_report.md` + `notebooks/07_final_report.ipynb`) and PR-C
-(`scripts/run_demo.py` + `notebooks/06_demo.ipynb`, key-optional Gradio demo) are now present on
-the integrated Phase 4 demo branch.
+(`scripts/run_demo.py` + `notebooks/06_demo.ipynb`, key-optional Gradio demo) are on `main`.
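
Reviewer note: the "no-drift gate" mentioned for PR-A can be pictured roughly as follows. This is a minimal sketch, not the repository's code. It assumes the gate re-renders the markdown from the summary and compares it with the committed file. The names `render_metrics_markdown` and `is_drift_free`, the table layout, and the `qa_links_f1` key are hypothetical.

```python
def render_metrics_markdown(summary: dict) -> str:
    """Hypothetical renderer: one markdown table row per (phase, metric) pair."""
    lines = ["| phase | metric | value |", "| --- | --- | --- |"]
    for phase in sorted(summary):
        for metric in sorted(summary[phase]):
            value = summary[phase][metric]
            lines.append(f"| {phase} | {metric} | {value:.3f} |")
    return "\n".join(lines) + "\n"


def is_drift_free(summary: dict, committed_markdown: str) -> bool:
    """True when the committed report text equals a fresh render of the summary."""
    return committed_markdown == render_metrics_markdown(summary)


if __name__ == "__main__":
    # Illustrative summary; the key name is an assumption, the value is the
    # Phase 3 F1 quoted in the README status text.
    summary = {"phase3": {"qa_links_f1": 0.727}}
    committed = render_metrics_markdown(summary)
    print(is_drift_free(summary, committed))                           # fresh render matches
    print(is_drift_free(summary, committed.replace("0.727", "0.73")))  # hand-edited copy does not
```

In a test suite, the same comparison would presumably read `outputs/evaluation/phase4_summary.json` and `reports/phase4_metrics.md` from disk and fail when they disagree.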

Phase 3 V1 delivered (annotation-only deterministic relation baseline; see
`docs/phase3_brief.md`): `src/funsd_extraction.py` (parse + dedupe + per-answer-argmax
19 changes: 13 additions & 6 deletions README.md
@@ -47,17 +47,24 @@ pytest

## Status

-**Phases 0 through 3 are complete and merged to `main`.** Delivered: the repo foundation;
+**Phases 0 through 4 are complete and merged to `main`.** Delivered: the repo foundation;
Phase 1A table topology (TATR grid derivation, spanning-cell mapping, grid validation,
occupancy-aware HTML parsing); Phase 1B OCR content extraction (word-to-cell assignment,
financial number normalization, content metrics); Phase 1C table-only RAG (BM25 + dense
BGE cosine + RRF retrieval, one chunk per table, single-provider grounded answer
generation, GT-filled vs OCR-filled corpora scored separately); Phase 2 DocLayNet
layout-crop integration (page-level region detection -> table crop -> the Phase 1A/1B
pipeline); and Phase 3 FUNSD relation-linking baseline (annotation-only deterministic
-predictor, held-out `test_50.qa_links` F1 0.727).
+predictor, held-out `test_50.qa_links` F1 0.727); and Phase 4 final integration — one
+generated evaluation summary, an artifact-backed key-optional Gradio demo, and a written
+report (no new research; GriTS / Ragas / DeepEval are future work).
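
Reviewer note: the status text names "BM25 + dense BGE cosine + RRF retrieval" without showing the fusion step. Reciprocal rank fusion can be sketched as below. This is a generic illustration, not the project's implementation. The function name, the table IDs, and the constant `k = 60` (a commonly used default) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per ID, rank starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_ranking = ["table_1", "table_2", "table_3"]   # hypothetical BM25 order
    dense_ranking = ["table_2", "table_3", "table_1"]  # hypothetical dense (cosine) order
    print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
```

Because only ranks enter the score, BM25 scores and cosine similarities never need to be put on a common scale, which is the usual reason for choosing RRF over score averaging.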

-Current phase: Phase 4 (full demo + evaluation + report) is in progress on
-`feature/phase4-demo` — a final integration phase that aggregates the per-phase metrics into
-one summary, a key-optional Gradio demo, and a written report. See [PLAN.md](PLAN.md) for
-the phase roadmap.
+Entry points:
+- `python scripts/build_phase4_summary.py` -> `outputs/evaluation/phase4_summary.json` +
+the committed `reports/phase4_metrics.md` (generated metrics; never hand-copied).
+- `reports/final_report.md` / `notebooks/07_final_report.ipynb` — the final report.
+- `python scripts/run_demo.py` / `notebooks/06_demo.ipynb` — the Gradio demo (launches with
+no API key: BM25 retrieval + metrics + artifact views; answer generation needs
+`OPENROUTER_API_KEY`).
+
+See [PLAN.md](PLAN.md) for the phase roadmap.
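
Reviewer note: the "key-optional" behaviour described in the entry points (the demo launches without a key, and answer generation needs `OPENROUTER_API_KEY`) could be gated along these lines. This is a hedged sketch only. The function `demo_capabilities` and the capability names are hypothetical and do not reflect how `scripts/run_demo.py` is actually structured.

```python
import os
from typing import Dict, Mapping, Optional


def demo_capabilities(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Report which demo features are usable given the environment variables."""
    env = os.environ if env is None else env
    has_key = bool(env.get("OPENROUTER_API_KEY", "").strip())
    return {
        "bm25_retrieval": True,        # assumed to need no API key
        "metrics": True,               # read from generated artifacts
        "artifact_views": True,        # read from generated artifacts
        "answer_generation": has_key,  # only with OPENROUTER_API_KEY set
    }


if __name__ == "__main__":
    for feature, enabled in demo_capabilities().items():
        print(f"{feature}: {'enabled' if enabled else 'disabled (set OPENROUTER_API_KEY)'}")
```

Checking the key once at startup and passing the result to the UI lets the key-free features work while the answer panel shows a clear "disabled" state rather than failing at request time.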
2 changes: 1 addition & 1 deletion notebooks/06_demo.ipynb
@@ -45,7 +45,7 @@
"import os\n",
"\n",
"REPO = '/content/FinDocStructRAG'\n",
-"BRANCH = 'feature/phase4-demo' # integrated Phase 4 branch; flip to 'main' after merge\n",
+"BRANCH = 'main' # Phase 4 merged; tracks main\n",
"\n",
"if not os.path.isdir(f'{REPO}/.git'):\n",
" !git clone --quiet https://github.com/AD2000X/FinDocStructRAG.git {REPO}\n",
2 changes: 1 addition & 1 deletion notebooks/07_final_report.ipynb
@@ -43,7 +43,7 @@
"import os\n",
"\n",
"REPO = '/content/FinDocStructRAG'\n",
-"BRANCH = 'feature/phase4-demo' # integrated Phase 4 branch; flip to 'main' after merge\n",
+"BRANCH = 'main' # Phase 4 merged; tracks main\n",
"\n",
"if not os.path.isdir(f'{REPO}/.git'):\n",
" !git clone --quiet https://github.com/AD2000X/FinDocStructRAG.git {REPO}\n",