Merged
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,21 @@
# Changelog

## Unreleased - 2026-05-18

- Added Phase B repository audit mode so the CLI can inspect a source repository and audit either detected static output or an explicit preview server.
- Added `detect-repo [path]` to report repository metadata including package manager, framework signal, build command, preview command, static output directory, and discovered route sources.
- Added `audit-repo <path>` with `--static-dir`, `--preview-command`, `--preview-url`, preview startup timeout, crawl limits, security mode, JSON output, and Markdown output support.
- Added static output route discovery for HTML builds, including deterministic route normalization for root pages, nested `index.html` routes, and extension routes.
- Added repo-aware audit orchestration with optional `repo` evidence in JSON and Markdown reports, plus source findings for missing audit paths, missing static directories, empty static outputs, and unreachable preview servers.
- Added managed preview process handling with startup polling, preflight checks for already-running URLs, process-group shutdown, repeated-stop safety, early-exit errors, and capped stdout/stderr capture.
- Hardened preview probing so restricted security mode uses the same guarded fetch path as audits and rejects private-network preview URLs before spawning commands.
- Added packaged CLI source-map support so installed-package audits retain top-level source citations instead of silently emitting an empty `sources` array.
- Added release-gate coverage for packed CLI contents and an installed-style packed tarball smoke check that verifies source citations are present.
- Added repo fixture projects and golden summary coverage for static output audits and preview-server audits.
- Updated README, skill wrapper guidance, and skill validation so repository audit mode is documented while keeping ranking claims limited to supplied evidence.
- Preserved explicit preview precedence over auto-detected static output so callers can audit live preview servers even when a stale `dist` directory exists.
- Expanded the test suite to cover repo detection, static route discovery, repo audit orchestration, preview lifecycle behavior, CLI validation, report/schema compatibility, packaging, and release-gate hardening.
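
The static route normalization described above can be sketched roughly as follows. This is a minimal illustration, not the CLI's actual code: `routeForHtmlFile` and `discoverRoutes` are hypothetical names, and treating "extension routes" as paths that keep their `.html` suffix is an assumption.

```javascript
// Hypothetical helper: maps an HTML file path (relative to the static
// output directory) to a route. Not taken from the CLI source.
const routeForHtmlFile = (relativePath) => {
  // Normalize Windows separators and strip a leading "./" or "/".
  const posixPath = relativePath.replace(/\\/g, "/").replace(/^\.?\//, "");
  // Root page: dist/index.html maps to "/".
  if (posixPath === "index.html") return "/";
  // Nested index route: dist/about/index.html maps to "/about/".
  if (posixPath.endsWith("/index.html")) {
    return `/${posixPath.slice(0, -"index.html".length)}`;
  }
  // Extension route (assumed): dist/docs/guide.html maps to "/docs/guide.html".
  return `/${posixPath}`;
};

// De-duplicate and sort so the same directory always yields the same order.
const discoverRoutes = (htmlFiles) =>
  [...new Set(htmlFiles.map(routeForHtmlFile))].sort();

console.log(discoverRoutes(["about/index.html", "index.html", "docs/guide.html"]));
```

Sorting the de-duplicated routes is one simple way to keep the output deterministic regardless of filesystem traversal order.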

## 0.2.0 - 2026-05-18

- Added the deterministic `openclaw-geo-seo-audit` CLI package with `audit`, `snapshot`, `validate-config`, and `explain-rule` commands.
7 changes: 6 additions & 1 deletion README.md
@@ -37,9 +37,14 @@ npm run cli -- audit https://example.com --url-list urls.txt --markdown audit-re
npm run cli -- audit https://example.com --mode full --max-pages 25 --max-depth 2 --respect-robots true --sitemap https://example.com/sitemap.xml
npm run cli -- audit https://example.com --mode full --security restricted --timeout-ms 15000 --max-html-bytes 2000000
npm run cli -- audit https://example.com --mode full --fail-on P1 --out audit.json --markdown audit.md
npm run cli -- detect-repo .
npm run cli -- audit-repo . --static-dir dist --out repo-audit.json --markdown repo-audit.md
npm run cli -- audit-repo . --preview-command "npm run preview -- --host 127.0.0.1" --preview-url http://127.0.0.1:4173 --max-pages 25
```

The current `audit` command collects single-page, supplied URL-list, or bounded same-origin crawl evidence, can read `audit.config.json`, can seed from a sitemap, can enforce robots.txt, can filter crawls with include/exclude patterns, evaluates deterministic page and site rules, and can write JSON or Markdown. Extracted page evidence includes metadata, canonicals, hreflang, favicon and site-name signals, preview directives, headings, links, image inventory, JSON-LD blocks, schema types, author/date signals, and internal/external link counts. Browser rendering is available when Playwright is installed or when a renderer is injected by code; otherwise the CLI records rendering as unavailable. The `detect-repo [path]` command reports repository framework, package-manager, route, and build-output signals and defaults to the current directory when no path is supplied. The `audit-repo` command exits 2 when repo source findings are present.

`audit-repo` is intended for source repository audits. In the first repo-to-audit release, static output directories and explicit preview commands are supported. Framework and package-manager signals are reported by `detect-repo`, but the CLI does not automatically install dependencies or run inferred framework scripts.
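
As a rough sketch of how an audit source might be chosen from these options, the snippet below lets an explicit preview URL take precedence over any static directory. The function name `chooseAuditSource`, the option names, and the returned shapes are hypothetical. Ranking an explicit `--static-dir` below an explicit preview is also an assumption; the changelog only states that explicit preview wins over auto-detected static output.

```javascript
// Hypothetical option names mirroring the CLI flags; the real internals may differ.
const chooseAuditSource = ({ previewCommand, previewUrl, staticDir, detectedStaticDir }) => {
  // An explicit preview URL wins, even if a (possibly stale) build directory exists.
  if (previewUrl) {
    return { kind: "preview", url: previewUrl, command: previewCommand ?? null };
  }
  // Next, an explicitly supplied static directory (assumed ordering).
  if (staticDir) return { kind: "static", dir: staticDir, explicit: true };
  // Finally, fall back to whatever repository detection found.
  if (detectedStaticDir) return { kind: "static", dir: detectedStaticDir, explicit: false };
  // Nothing to audit: the caller would report this as a source finding.
  return { kind: "none" };
};

console.log(
  chooseAuditSource({
    previewCommand: "npm run preview -- --host 127.0.0.1",
    previewUrl: "http://127.0.0.1:4173",
    detectedStaticDir: "dist",
  }).kind,
);
```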

For untrusted live-site audits or hosted wrappers, use `--security restricted`. Restricted mode blocks local page targets and private-network HTTP targets, requires guarded manual redirects before fetches, disables Playwright URL rendering, and applies request timeouts and response/file byte caps. Supplied URL-list and integration files are still allowed as bounded evidence inputs. Use the default `local` mode for trusted local HTML files or localhost development servers. Restricted mode is a CLI guardrail, not a replacement for hosted network egress controls.
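
To make the restricted-mode target check more concrete, here is a minimal sketch of a guard that rejects local page targets and literal private-network hosts. It is an illustration under stated assumptions, not the CLI's guarded fetch path: `isPrivateIPv4` and `isBlockedTarget` are hypothetical names, and the sketch does not resolve DNS, follow redirects, or cover every IPv6 range.

```javascript
// Hypothetical helper: true for loopback, RFC 1918, link-local, and 0.x IPv4 literals.
const isPrivateIPv4 = (hostname) => {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(hostname);
  if (!match) return false;
  const [a, b] = match.slice(1).map(Number);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
};

// Hypothetical guard: returns true when a target should be refused in restricted mode.
const isBlockedTarget = (target) => {
  let url;
  try {
    url = new URL(target);
  } catch {
    // Not an absolute URL, so it is treated as a local page target.
    return true;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (host === "[::1]") return true; // IPv6 loopback as reported by the URL parser
  return isPrivateIPv4(host);
};

console.log(isBlockedTarget("http://127.0.0.1:4173"), isBlockedTarget("https://example.com"));
```

A hostname check like this only covers the first request. The guarded manual redirects mentioned above matter because a public URL can redirect to a private address.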

8 changes: 8 additions & 0 deletions examples/fixture-repos/npm-preview/package.json
@@ -0,0 +1,8 @@
{
  "name": "openclaw-preview-fixture",
  "private": true,
  "type": "module",
  "scripts": {
    "preview": "node server.mjs"
  }
}
24 changes: 24 additions & 0 deletions examples/fixture-repos/npm-preview/server.mjs
@@ -0,0 +1,24 @@
import http from "node:http";
import fs from "node:fs";
import path from "node:path";

const port = Number(process.argv[2] || process.env.PORT || 4173);
const root = path.join(process.cwd(), "site");

const fileFor = (urlPath) => {
  if (urlPath === "/") return path.join(root, "index.html");
  return path.join(root, urlPath.replace(/^\//, ""));
};

const server = http.createServer((request, response) => {
  const filePath = fileFor(new URL(request.url, `http://127.0.0.1:${port}`).pathname);
  if (!filePath.startsWith(root) || !fs.existsSync(filePath)) {
    response.writeHead(404, { "content-type": "text/plain" });
    response.end("not found");
    return;
  }
  response.writeHead(200, { "content-type": "text/html" });
  response.end(fs.readFileSync(filePath, "utf8"));
});

server.listen(port, "127.0.0.1");
11 changes: 11 additions & 0 deletions examples/fixture-repos/npm-preview/site/about.html
@@ -0,0 +1,11 @@
<!doctype html>
<html lang="en">
<head>
<title>Preview Fixture About</title>
<meta name="description" content="Preview fixture about page for repo audits.">
</head>
<body>
<h1>Preview Fixture About</h1>
<p>This about page proves preview crawls can discover linked routes.</p>
</body>
</html>
12 changes: 12 additions & 0 deletions examples/fixture-repos/npm-preview/site/index.html
@@ -0,0 +1,12 @@
<!doctype html>
<html lang="en">
<head>
<title>Preview Fixture Home</title>
<meta name="description" content="Preview fixture homepage for repo audits.">
</head>
<body>
<h1>Preview Fixture Home</h1>
<p>This page is served by an explicit preview command during repo audit tests.</p>
<a href="/about.html">About</a>
</body>
</html>
12 changes: 12 additions & 0 deletions examples/fixture-repos/static-basic/dist/about/index.html
@@ -0,0 +1,12 @@
<!doctype html>
<html lang="en">
<head>
<title>About Static Basic</title>
<meta name="description" content="About page for static repo audit fixture.">
<link rel="canonical" href="https://example.test/about/">
</head>
<body>
<h1>About Static Basic</h1>
<p>The about page gives the fixture enough internal structure for route discovery.</p>
</body>
</html>
13 changes: 13 additions & 0 deletions examples/fixture-repos/static-basic/dist/index.html
@@ -0,0 +1,13 @@
<!doctype html>
<html lang="en">
<head>
<title>Static Basic Home</title>
<meta name="description" content="Static fixture homepage for repo audits.">
<link rel="canonical" href="https://example.test/">
</head>
<body>
<h1>Static Basic Home</h1>
<p>This static fixture explains a deterministic source repository audit workflow.</p>
<a href="/about/">About</a>
</body>
</html>
3 changes: 3 additions & 0 deletions examples/fixture-repos/static-basic/dist/robots.txt
@@ -0,0 +1,3 @@
User-agent: *
Allow: /
Sitemap: https://example.test/sitemap.xml
5 changes: 5 additions & 0 deletions examples/fixture-repos/static-basic/dist/sitemap.xml
@@ -0,0 +1,5 @@
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://example.test/</loc></url>
<url><loc>https://example.test/about/</loc></url>
</urlset>
15 changes: 15 additions & 0 deletions examples/golden/repo-static-summary.json
@@ -0,0 +1,15 @@
{
  "repo": {
    "detectedFramework": "generic-static",
    "packageManager": null,
    "staticDirRelative": "dist",
    "routeSources": [
      { "type": "static_html", "route": "/" },
      { "type": "static_html", "route": "/about/" }
    ]
  },
  "pageCount": 2,
  "pageTitles": ["Static Basic Home", "About Static Basic"],
  "sourceFindingIds": [],
  "evidenceGapIds": ["ranking.integrations_missing"]
}
4 changes: 4 additions & 0 deletions packages/cli/src/audit-output-schema.mjs
@@ -27,6 +27,7 @@ export const auditOutputSchema = {
    findings: { type: "array" },
    evidenceGaps: { type: "array" },
    sources: { type: "array" },
    repo: { type: "object" },
  },
};

@@ -70,6 +71,9 @@ export const validateAuditOutput = (audit) => {
  }

  if ("pages" in audit && !Array.isArray(audit.pages)) errors.push("pages must be an array");
  if ("repo" in audit && (!audit.repo || typeof audit.repo !== "object" || Array.isArray(audit.repo))) {
    errors.push("repo must be an object");
  }
  if ("findings" in audit && !Array.isArray(audit.findings)) {
    errors.push("findings must be an array");
  } else {
66 changes: 44 additions & 22 deletions packages/cli/src/audit.mjs
@@ -13,13 +13,21 @@ import { isHttpUrl } from "./url-utils.mjs";
 const toolVersion = "0.2.0";

 const readSourceMap = () => {
-  try {
-    const file = new URL("../../../skill/geo-seo-audit/source-map.json", import.meta.url);
-    const sourceMap = JSON.parse(fs.readFileSync(file, "utf8"));
-    return Object.entries(sourceMap).map(([id, url]) => ({ id, url }));
-  } catch {
-    return [];
+  const candidates = [
+    new URL("./source-map.json", import.meta.url),
+    new URL("../../../skill/geo-seo-audit/source-map.json", import.meta.url),
+  ];
+
+  for (const file of candidates) {
+    try {
+      const sourceMap = JSON.parse(fs.readFileSync(file, "utf8"));
+      return Object.entries(sourceMap).map(([id, url]) => ({ id, url }));
+    } catch {
+      // Try the next source-map location.
+    }
   }
+
+  return [];
 };

 const originFor = (target) => {
@@ -51,23 +59,36 @@ const crawlSettings = (config) => ({
 });

 const readUrlList = (config) => {
+  const normalizeEntries = (entries, baseDir) =>
+    entries
+      .map((line) => line.trim())
+      .filter((line) => line && !line.startsWith("#"))
+      .map((line) => {
+        if (isHttpUrl(line)) return line;
+        if (path.isAbsolute(line) && fs.existsSync(line)) return line;
+        if (isHttpUrl(config.target)) return new URL(line, config.target).href;
+        if (path.isAbsolute(line)) return line;
+        return path.resolve(baseDir, line);
+      });
+
+  if (Array.isArray(config.urlListEntries)) {
+    return normalizeEntries(
+      config.urlListEntries.map((entry) => String(entry)),
+      process.cwd(),
+    );
+  }
   if (!config.urlList) return [];
   const baseDir = path.dirname(config.urlList);
   const limits = resolveLimits(config.limits);
-  return readTextFileLimited(config.urlList, {
-    security: config.security,
-    allowRestricted: true,
-    limits,
-    maxBytes: limits.maxFileBytes,
-  })
-    .split(/\r?\n/)
-    .map((line) => line.trim())
-    .filter((line) => line && !line.startsWith("#"))
-    .map((line) => {
-      if (isHttpUrl(line)) return line;
-      if (isHttpUrl(config.target)) return new URL(line, config.target).href;
-      return path.resolve(baseDir, line);
-    });
+  return normalizeEntries(
+    readTextFileLimited(config.urlList, {
+      security: config.security,
+      allowRestricted: true,
+      limits,
+      maxBytes: limits.maxFileBytes,
+    }).split(/\r?\n/),
+    baseDir,
+  );
 };

 const collectUrlList = async (config) => {
@@ -95,7 +116,8 @@ export const runAudit = async (config) => {
   const startedAt = new Date().toISOString();
   const settings = crawlSettings(config);
   const shouldCrawl = isHttpUrl(config.target) && (settings.mode === "full" || settings.mode === "sample");
-  const crawlResult = config.urlList
+  const hasUrlList = config.urlList || Array.isArray(config.urlListEntries);
+  const crawlResult = hasUrlList
     ? await collectUrlList(config)
     : shouldCrawl
       ? await crawlSite(config)
@@ -156,7 +178,7 @@ export const runAudit = async (config) => {
       robots: crawlResult.robots,
       sitemaps: crawlResult.sitemaps,
       skipped: crawlResult.skipped,
-      notes: config.urlList
+      notes: hasUrlList
         ? ["Audit output contains supplied URL-list evidence."]
         : shouldCrawl
           ? ["Audit output contains bounded same-origin crawl evidence."]