Merged
69 changes: 39 additions & 30 deletions docs/prd-deterministic-audit-cli.md
@@ -1,9 +1,9 @@
# Product Requirements Document: Deterministic GEO/SEO Audit CLI + Skill Wrapper

Status: Current implementation baseline plus v1.5 roadmap
Status: Current implementation baseline plus developer repo-audit roadmap
Date: 2026-05-18
Repository: openclaw-geo-seo-audit-skill
Current CLI package: openclaw-geo-seo-audit@0.2.0
Current CLI package: openclaw-geo-seo-audit@0.2.0; main includes unreleased repo audit mode
Primary users: technical SEOs, growth teams, content strategists, frontend engineers, agency auditors, and AI agents running OpenClaw skills

## 1. Summary
@@ -19,21 +19,21 @@ The production direction remains a deterministic SEO/GEO readiness auditor. Actu

### Current baseline

As of `openclaw-geo-seo-audit@0.2.0`, the repository contains a working deterministic CLI and OpenClaw skill wrapper. The CLI can audit local HTML, live URLs, URL lists, sitemap-seeded crawls, and bounded same-origin crawls. It emits JSON and Markdown, imports supplied ranking/performance evidence, evaluates deterministic page and site rules, and includes restricted-mode guardrails for untrusted targets.
As of `openclaw-geo-seo-audit@0.2.0` plus the merged repo-audit branch, the repository contains a working deterministic CLI and OpenClaw skill wrapper. The CLI can audit local HTML, live URLs, URL lists, sitemap-seeded crawls, bounded same-origin crawls, static output repositories, and explicit preview-server repositories. It emits JSON and Markdown, imports supplied ranking/performance evidence, evaluates deterministic page and site rules, includes repo evidence for source-repository audits, and includes restricted-mode guardrails for untrusted targets.

The next approved product target is v1.5 repo-to-audit mode: the CLI should inspect a website source repository, detect how to build or preview it, audit the generated site through the existing evidence engine, and add source-level findings where deterministic.
The next approved product target is developer-focused repo audit completion: make repository audits easier to run in local development and CI by adding explicit build support, route-list/config parity, richer framework fixtures, and deterministic source-level findings without overclaiming rankings.

## 2. Problem

The current repository contains a working deterministic CLI, an OpenClaw skill wrapper, a Google Search Central citation corpus, report templates, fixtures, golden-output tests, and release workflows. The remaining problem is no longer whether deterministic auditing exists; it is how to stabilize the current beta and extend it to source-repository audits without overclaiming ranking or AI-answer measurement.
The current repository contains a working deterministic CLI, an OpenClaw skill wrapper, a Google Search Central citation corpus, report templates, fixtures, golden-output tests, release workflows, and initial source-repository audit mode. The remaining problem is no longer whether deterministic auditing exists; it is how to make developer repo audits practical enough for repeated local and CI use without overclaiming ranking or AI-answer measurement.

Current remaining gaps:

- The CLI audits live URLs, local files, localhost apps, URL lists, and bounded crawls, but it does not yet treat a source repository as a first-class audit target.
- Release documentation and changelog entries must stay aligned with the implemented `0.2.0` baseline before publishing.
- The CLI can audit static output and explicit preview-server repositories, but it does not yet support explicit build commands, route-list parity, repo config files, or framework-specific fixture coverage beyond generic static/npm preview workflows.
- Release documentation and changelog entries must stay aligned with the implemented `0.2.0` baseline plus unreleased repo-audit work before publishing the next package version.
- Some rule IDs exist in the taxonomy before full trigger coverage, especially deeper entity clarity, hidden text risk, duplicate content clusters, and structured-data visible-content mismatch.
- Ranking and GEO visibility measurement still depends on supplied exports. API-backed Search Console, SERP provider, and AI-answer probes are future integrations.
- The product needs a repo-to-audit mode that can safely build or preview common web apps, crawl the generated site, and connect source-level evidence to rendered output.
- The product needs developer-optimized repo audit workflows that can safely build or preview common web apps, crawl generated output, connect source-level evidence to rendered output, and produce CI-friendly failure semantics.

Users need a tool that can inspect a site from discovery through report generation, identify concrete implementation issues, distinguish known evidence from uncertainty, and produce actionable tasks with cited rationale.

@@ -160,9 +160,9 @@ The skill should not be the source of deterministic analysis.

### 8.3 Repo-To-Audit Layer

The repo-to-audit layer is the approved v1.5 extension. It should inspect a website source repository, detect supported framework and package-manager signals, choose a static-output or preview-server path, run bounded commands, and feed the generated site into the existing CLI audit engine.
The repo-to-audit layer is now an initial implemented extension. It inspects a website source repository, detects framework and package-manager signals, chooses a static-output or explicit preview-server path, runs bounded preview commands when supplied, and feeds generated or served pages into the existing CLI audit engine.

This layer should add a `repo` evidence section to the JSON output. Source-level evidence must remain separate from rendered-page evidence so reports can distinguish build/configuration problems from observed website output.
This layer adds a `repo` evidence section to the JSON output. Source-level evidence must remain separate from rendered-page evidence so reports can distinguish build/configuration problems from observed website output.
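
A minimal sketch of how a report could keep source-level evidence apart from rendered-page evidence. It is illustrative only: the field names (`path`, `framework`, `package_manager`, `audit_path`, `source_findings`), the `pages` key, and the finding ID are assumptions, not the CLI's actual JSON schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RepoEvidence:
    """Hypothetical shape for the `repo` evidence section."""
    path: str
    framework: Optional[str] = None
    package_manager: Optional[str] = None
    audit_path: str = "static-output"  # or "preview-server"
    source_findings: list = field(default_factory=list)


def build_report(pages: list, repo: RepoEvidence) -> dict:
    # Rendered-page evidence and repo evidence live under separate keys,
    # so a report can tell build/configuration problems from observed output.
    return {"pages": pages, "repo": asdict(repo)}


report = build_report(
    pages=[{"url": "/", "findings": []}],
    repo=RepoEvidence(
        path=".",
        framework="vite",
        source_findings=[{"id": "repo.static-output.empty"}],  # hypothetical ID
    ),
)
print(json.dumps(report, indent=2))
```

Keeping `repo` as a sibling of the page evidence, rather than merging its findings into each page, is one way to preserve the separation this section requires.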

## 9. CLI Commands

@@ -231,32 +231,33 @@ Required behavior:

- Print rule purpose, severity logic, evidence inputs, recommendation text, and source citations.

### 9.5 Detect Repo (Planned)
### 9.5 Detect Repo

```bash
openclaw-geo-seo-audit detect-repo .
```

Planned behavior:
Implemented behavior:

- Inspect a repository path.
- Report detected framework, package manager, likely build command, likely preview command, static output candidates, route sources, and confidence.
- Avoid executing repository scripts.
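
The detection behavior above can be sketched as a read-only inspection of files in the repository. This is an assumption-laden illustration, not the shipped `detect-repo` logic: the `detect_repo` function, the report keys, and the candidate directory list are hypothetical. The code only reads `package.json` and checks for lockfiles and directories; it never runs a script.

```python
import json
import tempfile
from pathlib import Path

# Lockfile names are the standard ones written by each package manager.
LOCKFILES = {"pnpm-lock.yaml": "pnpm", "yarn.lock": "yarn", "package-lock.json": "npm"}
FRAMEWORK_DEPS = ["next", "astro", "vite"]  # checked in this order
STATIC_DIRS = ["dist", "out", "build"]      # hypothetical candidate list


def detect_repo(repo_path) -> dict:
    root = Path(repo_path)
    report = {
        "framework": None,
        "package_manager": None,
        "likely_build_script": None,
        "likely_preview_script": None,
        "static_output_candidates": [d for d in STATIC_DIRS if (root / d).is_dir()],
    }
    for name, manager in LOCKFILES.items():
        if (root / name).is_file():
            report["package_manager"] = manager
            break
    manifest = root / "package.json"
    if manifest.is_file():
        data = json.loads(manifest.read_text(encoding="utf-8"))
        deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
        report["framework"] = next((f for f in FRAMEWORK_DEPS if f in deps), None)
        scripts = data.get("scripts", {})
        # Script names are reported as strings only; nothing is executed.
        report["likely_build_script"] = "build" if "build" in scripts else None
        report["likely_preview_script"] = "preview" if "preview" in scripts else None
    return report


# Usage against a throwaway fixture directory.
with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp)
    manifest = {
        "devDependencies": {"vite": "^5.0.0"},
        "scripts": {"build": "vite build", "preview": "vite preview"},
    }
    (fixture / "package.json").write_text(json.dumps(manifest), encoding="utf-8")
    (fixture / "pnpm-lock.yaml").write_text("", encoding="utf-8")
    (fixture / "dist").mkdir()
    report = detect_repo(fixture)
```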

### 9.6 Audit Repo (Planned)
### 9.6 Audit Repo

```bash
openclaw-geo-seo-audit audit-repo .
```

Planned behavior:
Implemented behavior:

- Inspect a repository path.
- Use explicit commands when supplied.
- Use conservative auto-detection only when framework signals are clear.
- Build or preview the app within configured timeouts.
- Use detected static output when available unless explicit preview options are supplied.
- Use explicit `--preview-command` and `--preview-url` when supplied.
- Wait for preview startup within configured timeouts and stop the preview process after the audit.
- Crawl generated output with the existing audit engine.
- Emit existing page/site evidence plus a `repo` evidence section.
- Emit repo source findings for missing audit paths, missing static directories, empty static outputs, and unreachable preview servers.
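
The preview-server steps above (start the command, wait for startup within a timeout, audit, then stop the process) can be sketched as follows. This is a hedged illustration rather than the CLI's implementation: `http_probe`, `wait_until_ready`, `audit_with_preview`, and their parameters are hypothetical names, and the sketch raises `TimeoutError` where a real implementation would presumably record an unreachable-preview finding.

```python
import subprocess
import time
import urllib.error
import urllib.request
from typing import Callable, Sequence


def http_probe(url: str) -> bool:
    """Return True once the preview URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except urllib.error.HTTPError:
        return True  # the server is up even if this path returns an error status
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(url: str, timeout_s: float, interval_s: float = 0.25,
                     probe: Callable[[str], bool] = http_probe) -> bool:
    """Poll the URL until it responds or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def audit_with_preview(command: Sequence[str], url: str,
                       audit: Callable[[str], object], timeout_s: float = 30.0,
                       probe: Callable[[str], bool] = http_probe):
    """Run `audit(url)` while the preview command is alive, then stop it."""
    proc = subprocess.Popen(command)
    try:
        if not wait_until_ready(url, timeout_s, probe=probe):
            raise TimeoutError(f"preview not reachable at {url} within {timeout_s}s")
        return audit(url)
    finally:
        # Always stop the preview process, even when the audit fails.
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
```

Putting the shutdown in a `finally` block means a failed or timed-out audit does not leave a preview server running, which matters for repeated CI runs.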

## 10. Configuration

@@ -837,18 +838,26 @@ Deferred integrations:
- Configured AI-answer visibility probes.
- Optional Lighthouse execution.

### Milestone 7: Repo-To-Audit Mode - Planned v1.5
### Milestone 7: Repo-To-Audit Mode - Initial baseline merged

Planned:
Delivered on main after `0.2.0`:

- `detect-repo <path>` framework and package-manager detection.
- `audit-repo <path>` source-repository audit workflow.
- Static-output and preview-server audit paths.
- Bounded local command execution.
- Route discovery from generated output, sitemap, framework conventions, or supplied route list.
- Bounded explicit preview command execution.
- Route discovery from generated static output.
- `repo` evidence section in JSON output.
- Source-level findings that remain separate from rendered-page findings.

Remaining developer-focused repo audit work:

- Explicit build command support.
- Route-list support for repository audits.
- Repo config support for repeatable CI workflows.
- Vite, Next.js, and Astro fixture coverage.
- Source-level findings for generated sitemap/robots availability, framework metadata usage, route discoverability, and rendered/source mismatches where deterministic.

## 20. Risks and Mitigations

### Risk: Overclaiming ranking ability
@@ -887,11 +896,11 @@ Resolved for `0.2.0`:
- Raw Google source corpus remains a repository asset. The CLI package ships source code and compact citation behavior, not the full raw corpus.
- CI severity gating is configured with explicit `--fail-on P0|P1|P2|P3`.
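
As an illustration of the severity gate, the sketch below fails a run when any finding is at or above the `--fail-on` threshold. It assumes P0 is the most severe level and P3 the least, which the text above does not state, and `should_fail` is a hypothetical helper rather than the CLI's own function.

```python
SEVERITY_ORDER = ["P0", "P1", "P2", "P3"]  # assumed: most severe first


def should_fail(finding_severities, fail_on: str) -> bool:
    """Return True when any finding is at least as severe as `fail_on`."""
    threshold = SEVERITY_ORDER.index(fail_on)
    return any(SEVERITY_ORDER.index(s) <= threshold for s in finding_severities)


# A CI wrapper could map the result to a process exit code.
exit_code = 1 if should_fail(["P2", "P3"], fail_on="P1") else 0
```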

Planning questions for repo-to-audit mode:
Planning questions for developer repo audit completion:

- Which fixture framework should be implemented first: Vite, Next.js, or Astro?
- Should the first `audit-repo` release require explicit build and preview commands, or allow high-confidence auto-detected commands?
- Should repo-to-audit internals start as one module or split detection, process management, route discovery, and source evidence into separate modules?
- Should explicit build support run before static route discovery by default, or only when `--build-command` is supplied?
- Should repo audit configuration live inside `audit.config.json` or use a separate `repo-audit.config.json`?
- Which source-level findings provide the highest value for developer CI without requiring brittle source parsing?

## 22. Release Stabilization And v1.5 Readiness Checklist

@@ -903,11 +912,11 @@ Before publishing or tagging `0.2.0`:
- Confirm readiness language remains separate from measured ranking or AI-answer visibility claims.
- Push and merge the guardrail branch through the repository review workflow.

Before starting repo-to-audit mode:
Before starting developer repo audit completion:

- Choose the first fixture framework for source-repository audits.
- Decide when explicit build and preview commands are required versus when high-confidence auto-detected commands are allowed.
- Define module boundaries for repo detection, process management, route discovery, source evidence, and unified output.
- Decide whether Vite is the first framework fixture for build-and-preview coverage.
- Decide whether repo audit config extends `audit.config.json` or remains command-line only for one more phase.
- Define the first deterministic source-level findings and keep them separate from rendered-page evidence.
- Keep repo-to-audit implementation separate from external API integrations.

## 23. Implemented Baseline And v1.5 Scope
@@ -925,7 +934,7 @@ The implemented `0.2.0` baseline includes:
- JSON and Markdown output.
- Fixture tests and golden outputs.

The approved v1.5 scope adds repo-to-audit mode after `0.2.0` release stabilization. Repo-to-audit mode should be treated as a focused extension of the current CLI rather than a replacement for URL, local app, static file, or URL-list audits.
The merged repo-to-audit baseline adds source-repository audits after `0.2.0` release stabilization. Repo-to-audit mode remains a focused extension of the current CLI rather than a replacement for URL, local app, static file, or URL-list audits. The next phase should optimize this mode for developer local and CI workflows.

The roadmap still defers:
