
Query Agent doc rework — with review fixes #437

Merged: g-despot merged 4 commits into main from agents-rework-review-fixes on Jun 3, 2026

@g-despot g-despot commented Jun 3, 2026

What's being changed

Follow-up fixes from a code review of the Query Agent doc rework (the /docs/agents → /docs/query-agent move, already on main), plus link-validator coverage improvements. Two commits.

Commit 1 — review fixes

  • Link validator seed: validate-links-pr.js (CI-wired) seeded crawling from /agents, which no longer exists post-move, so the new /query-agent tree was never crawled. Updated to /query-agent.
  • Indexability: repointed the stale /agents/query/tutorial-ecommerce reference to /query-agent/recipes/query-agent-ecommerce-assistant in tests/test_docs_indexability.py and README-INDEXABILITY.md.
  • Terminology / content: renamed "How Weaviate Agents work" to "How the Query Agent works"; removed a redundant resource link; trimmed a stray JS/TS install sentence; restored trailing newlines on .gitignore and versions-config.json.

Commit 2 — validator covers all sections + fixes the breakage it surfaced

  • Full section coverage: validate-links-pr.js only seeded 3 sections, and linkinator only recurses under each seed's path — so /deploy, /engram, and /contributor-guide were never link-checked. Added a seed per top-level docs section.
  • /deploy (pre-existing): 5 relative links were missing the .md extension and resolved one directory too deep under the dev build's trailing slashes (export.md → backups; hardening-eks.md → authentication/backups/monitoring). Added the extensions.
  • OG images: ~40 query-agent + engram pages referenced og/query-agent.jpg / og/engram.png, which didn't exist. Repointed frontmatter to the committed images at og/docs/query-agent.png and og/docs/engram.png.
  • Medium ignore: broadened to cover subdomains (e.g. xsreality.medium.com, which 403s bots like the apex domain).
  • Removed the unused validate-links-branch.js (nothing references it).
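
The coverage gap described in the first bullet can be illustrated with a minimal sketch. It assumes, as stated above, that the crawler only follows links under each seed's own path. The base URL, the example page, the shortened "old" seed list, and both function names are hypothetical and are not taken from validate-links-pr.js; the section list contains only sections named in this PR, so it may be incomplete.

```javascript
// Hypothetical sketch: one crawl seed per top-level docs section.
// Section list, base URL, and function names are assumptions for illustration.
const sections = ["/weaviate", "/query-agent", "/deploy", "/engram", "/contributor-guide"];

// Build one absolute seed URL (with trailing slash) per section.
function buildSeeds(baseUrl, sectionPaths) {
  return sectionPaths.map((p) => new URL(p + "/", baseUrl).href);
}

// Under a path-scoped crawl, a page is reachable only if some seed is a prefix of its URL.
function isCovered(pageUrl, seeds) {
  return seeds.some((seed) => pageUrl.startsWith(seed));
}

const base = "http://localhost:3000";
// Stand-in for the previous, shorter seed list (the real one is not shown in this PR).
const oldSeeds = buildSeeds(base, ["/weaviate", "/agents"]);
const newSeeds = buildSeeds(base, sections);
const page = "http://localhost:3000/deploy/configuration/backups/";

console.log(isCovered(page, oldSeeds)); // false
console.log(isCovered(page, newSeeds)); // true
```

With a seed per section, adding a new top-level section only requires adding one entry to the list.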

Testing

  • yarn build-dev passes; the only remaining Docusaurus warnings are pre-existing on main (the /weaviate/quickstart ?import= links).
  • Ran validate-links-pr.js across all six sections: the only "broken" links left are the GOOGLE_CONTAINER_ID=None GTM placeholder 404s, which are a local-build artifact (the env var is set in CI). All real broken links are fixed.

🤖 Generated with Claude Code

Follow-up fixes to the Query Agent doc rework (PR #410):

- Link validators: update the crawl seed path from `/agents` to
  `/query-agent` in validate-links-pr.js (CI-wired) and
  validate-links-branch.js. The old seed no longer exists in the static
  build, so the new ~30-page /query-agent tree was never being crawled.
- Indexability: point the stale `/agents/query/tutorial-ecommerce`
  reference at `/query-agent/recipes/query-agent-ecommerce-assistant`
  in both test_docs_indexability.py and README-INDEXABILITY.md.
- Terminology: rename "How Weaviate Agents work" -> "How the Query Agent
  works" and drop the now-redundant "AI agents framework" resource link.
- installation.md: remove the stray JS/TS install sentence from the
  Python section (a dedicated JS/TS section already follows).
- Restore trailing newlines on .gitignore and versions-config.json.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@orca-security-eu orca-security-eu Bot left a comment


Orca Security Scan Summary

| Status | Check | Issues by priority |
| --- | --- | --- |
| Passed | Infrastructure as Code | high 0, medium 0, low 0, info 0 |
| Passed | SAST | high 0, medium 0, low 0, info 0 |
| Passed | Secrets | high 0, medium 0, low 0, info 0 |
| Passed | Vulnerabilities | high 0, medium 0, low 0, info 0 |

g-despot and others added 3 commits June 3, 2026 14:15
…inks

validate-links-pr.js previously seeded only 3 sections, so /deploy,
/engram, and /contributor-guide were never crawled (linkinator only
recurses under each seed's own path). Add a seed per top-level docs
section so the link check actually covers the whole site.

Expanding coverage surfaced pre-existing broken links, now fixed so the
validator passes:

- /deploy: 5 relative links missing the `.md` extension resolved one
  directory too deep under the dev build's trailing slashes
  (export.md -> backups; hardening-eks.md -> authentication/backups/
  monitoring). Added the extensions.
- query-agent + engram OG images: ~40 pages referenced og/query-agent.jpg
  and og/engram.png, which didn't exist. Repointed frontmatter to the
  committed images at og/docs/query-agent.png and og/docs/engram.png.
- link-validator.js: broaden the Medium ignore to cover subdomains
  (e.g. xsreality.medium.com, which 403s bots like the apex domain).
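
The "one directory too deep" behavior in the /deploy item follows from standard relative URL resolution, which the sketch below reproduces with the WHATWG `URL` class. The host and page paths are hypothetical examples, not the actual /deploy paths.

```javascript
// Hypothetical page and link names; only the resolution rule (WHATWG URL) is standard.
const relativeLink = "backups"; // a link written without the .md extension

// The dev build serves the page with a trailing slash, so the page acts as a directory.
const withSlash = new URL(relativeLink, "http://localhost:3000/deploy/configuration/export/");

// Without the trailing slash, the same link resolves against the parent directory.
const withoutSlash = new URL(relativeLink, "http://localhost:3000/deploy/configuration/export");

console.log(withSlash.pathname);    // /deploy/configuration/export/backups
console.log(withoutSlash.pathname); // /deploy/configuration/backups
```

As far as we understand Docusaurus, a link that includes the `.md` extension is resolved to the target document at build time, so the result no longer depends on the trailing slash.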

Also removes the unused validate-links-branch.js (no workflow/script
references it).

Verified: `yarn build-dev` is clean and the validator now reports only
the GOOGLE_CONTAINER_ID=None GTM placeholder 404s, which are a local-only
artifact (set in CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The PR link check failed on external domains that 403/429 automated
requests, not on real broken links:

- huggingface.co — 429 Too Many Requests (69 model/dataset links; the
  expanded crawl volume tips it into rate-limiting)
- docs.aws.amazon.com — 403 (surfaced by the new /deploy coverage)
- ieeexplore.ieee.org, db-engines.com — 403

Added them to domainsToIgnore, consistent with existing entries like
github.com (429) and medium.com (403). These pages exist but block
crawlers, so they can't be validated automatically.
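
A minimal sketch of how an ignore list that also matches subdomains might work. The `domainsToIgnore` name and the domains come from this PR; the `shouldIgnore` function and its matching rule are assumptions, not the actual link-validator.js code.

```javascript
// Hypothetical sketch of a domain ignore check; not the actual link-validator.js code.
const domainsToIgnore = [
  "github.com",
  "medium.com",
  "huggingface.co",
  "docs.aws.amazon.com",
  "ieeexplore.ieee.org",
  "db-engines.com",
];

// Returns true when the link's host is an ignored domain or one of its subdomains.
function shouldIgnore(link, ignored) {
  let host;
  try {
    host = new URL(link).hostname;
  } catch {
    return false; // not an absolute URL, so leave it to the checker
  }
  // The "." prefix keeps look-alike hosts such as notmedium.com from matching.
  return ignored.some((d) => host === d || host.endsWith("." + d));
}

console.log(shouldIgnore("https://xsreality.medium.com/some-post", domainsToIgnore)); // true
console.log(shouldIgnore("https://notmedium.com/some-post", domainsToIgnore));        // false
```

Matching on the parsed hostname rather than on a substring of the URL avoids skipping unrelated sites whose URLs merely contain an ignored name.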

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The /weaviate ANN-benchmark pages link to the TEXMEX/SIFT dataset server
at corpus-texmex.irisa.fr, where requests from CI time out at the network
level (status 0). This is the same class of flaky external host as the
other ignored domains.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@g-despot g-despot merged commit e148c16 into main Jun 3, 2026
13 checks passed
@g-despot g-despot deleted the agents-rework-review-fixes branch June 3, 2026 17:12