Query Agent doc rework — with review fixes#437
Merged
Follow-up fixes to the Query Agent doc rework (PR #410):

- Link validators: update the crawl seed path from `/agents` to `/query-agent` in validate-links-pr.js (CI-wired) and validate-links-branch.js. The old seed no longer exists in the static build, so the new ~30-page /query-agent tree was never being crawled.
- Indexability: point the stale `/agents/query/tutorial-ecommerce` reference at `/query-agent/recipes/query-agent-ecommerce-assistant` in both test_docs_indexability.py and README-INDEXABILITY.md.
- Terminology: rename "How Weaviate Agents work" -> "How the Query Agent works" and drop the now-redundant "AI agents framework" resource link.
- installation.md: remove the stray JS/TS install sentence from the Python section (a dedicated JS/TS section already follows).
- Restore trailing newlines on .gitignore and versions-config.json.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Orca Security Scan Summary
| Status | Check | Issues by priority | |
|---|---|---|---|
| | Infrastructure as Code | | View in Orca |
| | SAST | | View in Orca |
| | Secrets | | View in Orca |
| | Vulnerabilities | | View in Orca |
…inks

validate-links-pr.js previously seeded only 3 sections, so /deploy, /engram, and /contributor-guide were never crawled (linkinator only recurses under each seed's own path). Add a seed per top-level docs section so the link check actually covers the whole site.

Expanding coverage surfaced pre-existing broken links, now fixed so the validator passes:

- /deploy: 5 relative links missing the `.md` extension resolved one directory too deep under the dev build's trailing slashes (export.md -> backups; hardening-eks.md -> authentication/backups/monitoring). Added the extensions.
- query-agent + engram OG images: ~40 pages referenced og/query-agent.jpg and og/engram.png, which didn't exist. Repointed frontmatter to the committed images at og/docs/query-agent.png and og/docs/engram.png.
- link-validator.js: broaden the Medium ignore to cover subdomains (e.g. xsreality.medium.com, which 403s bots like the apex domain).

Also removes the unused validate-links-branch.js (no workflow/script references it).

Verified: `yarn build-dev` is clean and the validator now reports only the GOOGLE_CONTAINER_ID=None GTM placeholder 404s, which are a local-only artifact (set in CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
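The coverage gap described above follows from prefix-scoped recursion: a crawler that only follows links under each seed's own path never reaches a section that has no seed. The sketch below models just that rule. The helper names (`isUnderSeed`, `uncoveredSections`) and the section and seed lists are illustrative assumptions; it does not call linkinator and is not the code in validate-links-pr.js.

```javascript
// Illustrative only: models "recurse only under each seed's own path".
// Helper names and the lists below are assumptions, not project code.

// True when `pathname` is the seed itself or nested beneath it.
function isUnderSeed(pathname, seed) {
  const root = seed.replace(/\/+$/, ''); // normalize away trailing slashes
  return pathname === root || pathname.startsWith(root + '/');
}

// Sections that none of the seeds would reach.
function uncoveredSections(sections, seeds) {
  return sections.filter(
    (section) => !seeds.some((seed) => isUnderSeed(section, seed))
  );
}

// Section names taken from the PR text; the partial seed list is hypothetical.
const sections = ['/weaviate', '/query-agent', '/deploy', '/engram', '/contributor-guide'];
const partialSeeds = ['/weaviate', '/query-agent'];

console.log(uncoveredSections(sections, partialSeeds)); // sections with no seed
console.log(uncoveredSections(sections, sections));     // empty once every section is seeded
```

A check like this could run before the crawl to fail fast when a new top-level section is added without a matching seed.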
The PR link check failed on external domains that 403/429 automated requests, not on real broken links:

- huggingface.co: 429 Too Many Requests (69 model/dataset links; the expanded crawl volume tips it into rate-limiting)
- docs.aws.amazon.com: 403 (surfaced by the new /deploy coverage)
- ieeexplore.ieee.org, db-engines.com: 403

Added them to domainsToIgnore, consistent with existing entries like github.com (429) and medium.com (403). These pages exist but block crawlers, so they can't be validated automatically.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
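A domain ignore list that also covers subdomains can be sketched as a hostname suffix match. The list name `domainsToIgnore` comes from the commit message; the `shouldIgnore` helper and its matching rule are assumptions for illustration, not the actual logic in link-validator.js.

```javascript
// Hypothetical sketch: the list name mirrors the PR, the matcher is an assumption.
const domainsToIgnore = [
  'github.com',
  'medium.com',
  'huggingface.co',
  'docs.aws.amazon.com',
  'ieeexplore.ieee.org',
  'db-engines.com',
];

// True when the link's host is an ignored domain or any subdomain of one.
function shouldIgnore(link, domains = domainsToIgnore) {
  let hostname;
  try {
    hostname = new URL(link).hostname.toLowerCase();
  } catch {
    return false; // not an absolute URL, so leave it to the validator
  }
  // Match on a dot boundary so "notmedium.com" is not treated as "medium.com".
  return domains.some(
    (domain) => hostname === domain || hostname.endsWith('.' + domain)
  );
}

console.log(shouldIgnore('https://xsreality.medium.com/some-post')); // true
console.log(shouldIgnore('https://notmedium.com/some-post'));        // false
```

Matching on the parsed hostname rather than on the raw URL string avoids skipping links that merely mention an ignored domain in their path or query.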
The /weaviate ANN-benchmark pages link to the TEXMEX/SIFT dataset server at corpus-texmex.irisa.fr, which times out at the network level from CI (status 0). It belongs to the same class of flaky external hosts as the other ignored domains.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What's being changed

Follow-up fixes from a code review of the Query Agent doc rework (the `/docs/agents` → `/docs/query-agent` move, already on `main`), plus link-validator coverage improvements. Two commits.

Commit 1: review fixes

- `validate-links-pr.js` (CI-wired) seeded crawling from `/agents`, which no longer exists post-move, so the new `/query-agent` tree was never crawled. Updated to `/query-agent`.
- Pointed the stale `/agents/query/tutorial-ecommerce` reference to `/query-agent/recipes/query-agent-ecommerce-assistant` in `tests/test_docs_indexability.py` and `README-INDEXABILITY.md`.
- Renamed `How Weaviate Agents work` → `How the Query Agent works`; removed a redundant resource link; trimmed a stray JS/TS install sentence; restored trailing newlines on `.gitignore` and `versions-config.json`.

Commit 2: validator covers all sections + fixes the breakage it surfaced

- `validate-links-pr.js` only seeded 3 sections, and linkinator only recurses under each seed's path, so `/deploy`, `/engram`, and `/contributor-guide` were never link-checked. Added a seed per top-level docs section.
- `/deploy` (pre-existing): 5 relative links were missing the `.md` extension and resolved one directory too deep under the dev build's trailing slashes (`export.md` → backups; `hardening-eks.md` → authentication/backups/monitoring). Added the extensions.
- OG images: ~40 pages referenced `og/query-agent.jpg` / `og/engram.png`, which didn't exist. Repointed frontmatter to the committed images at `og/docs/query-agent.png` and `og/docs/engram.png`.
- Broadened the Medium ignore to cover subdomains (e.g. `xsreality.medium.com`, which 403s bots like the apex domain).
- Removed the unused `validate-links-branch.js` (nothing references it).

Testing

- `yarn build-dev` passes; the only remaining Docusaurus warnings are pre-existing on `main` (the `/weaviate/quickstart?import=` links).
- Ran `validate-links-pr.js` across all six sections: the only "broken" links left are the `GOOGLE_CONTAINER_ID=None` GTM placeholder 404s, which are a local-build artifact (the env var is set in CI). All real broken links are fixed.

🤖 Generated with Claude Code
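The `/deploy` fix above rests on standard relative-URL resolution: when a page is served with a trailing slash, its last path segment counts as a directory, so an extensionless relative link resolves one level deeper than intended. A minimal sketch using Node's built-in `URL` class follows. The page paths and the `localhost` origin are hypothetical examples, not the actual routes touched in this PR.

```javascript
// Hypothetical page URLs; only the resolution rule itself is standard.
const origin = 'http://localhost:3000';

// Page served with a trailing slash: "export/" is treated as a directory,
// so the relative link is appended beneath it.
const withSlash = new URL('backups', origin + '/deploy/configuration/export/');

// Same page without the trailing slash: "export" is treated as a file,
// so the relative link replaces it.
const withoutSlash = new URL('backups', origin + '/deploy/configuration/export');

console.log(withSlash.pathname);    // /deploy/configuration/export/backups
console.log(withoutSlash.pathname); // /deploy/configuration/backups
```

Linking to the `.md` file sidesteps this, because Docusaurus rewrites Markdown file links to the target document's route at build time rather than leaving them to the browser to resolve.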