feat: add llms.txt and LLM-friendly markdown docs generation #497
takaebato wants to merge 4 commits into apache:master from
Just a gentle follow-up on this PR.

Hi @plainheart, sorry for the ping.
Pull request overview
Adds an LLM-friendly, static Markdown documentation output to the ECharts doc build, including per-language llms.txt indexes and Markdown conversions of the existing documents/*-parts/*.json content.
Changes:
- Add `build/build-llms.js` to convert built part JSON (HTML descriptions) into Markdown and generate per-language `llms.txt`.
- Integrate the new Markdown generation step into `build/build-doc.js` after the main doc build.
- Add `turndown` + `turndown-plugin-gfm` dev dependencies and ignore generated `public/*/llms*` outputs.
Reviewed changes
Copilot reviewed 3 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| package.json | Adds Turndown dependencies for HTML→Markdown conversion. |
| package-lock.json | Locks new Turndown-related transitive dependencies. |
| build/build-llms.js | New converter/generator for llms-documents Markdown + llms.txt indexes. |
| build/build-doc.js | Runs LLM-doc generation as part of the standard doc build (non-watch mode). |
| .gitignore | Ignores generated public/{en,zh}/llms.txt and llms-documents/. |
Pull request overview
Copilot reviewed 3 out of 5 changed files in this pull request and generated 3 comments.
```js
function formatPropertyEntry(key, entry, typeInfo, linkResolver) {
    const heading = '#'.repeat(Math.min(key.split('.').length + 1, MAX_HEADING_DEPTH)) + ' ' + key;
    const meta = [
        typeInfo && typeInfo.type && `- **Type**: \`${typeInfo.type}\``,
        typeInfo && typeInfo.default != null && `- **Default**: \`${typeInfo.default}\``
    ].filter(Boolean);
    const body = entry.desc ? htmlToMd(linkResolver(entry.desc)) : '';
    return [heading, ...meta, ...(body ? ['', body] : []), ''];
}

function jsonToMd(data, typeMap, baseName, linkResolver) {
    const lines = Object.entries(data).flatMap(([key, entry]) => {
        const fullKey = baseName ? `${baseName}.${key}` : key;
        return formatPropertyEntry(key, entry, typeMap[fullKey], linkResolver);
        // ... (rest of jsonToMd not shown in the review excerpt)
```
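To make the heading-depth and metadata rules concrete, the sketch below runs the `formatPropertyEntry` logic from the excerpt on sample input without any dependencies. It assumes `MAX_HEADING_DEPTH = 6` (the constant's value is not shown in the excerpt), swaps turndown for a hypothetical tag-stripping `htmlToMd` stand-in, and uses made-up sample values; it is an illustration, not the PR's code.

```javascript
// Assumption: MAX_HEADING_DEPTH is not shown in the review excerpt;
// 6 (the deepest ATX heading level in Markdown) is used for illustration.
const MAX_HEADING_DEPTH = 6;

// Hypothetical stand-in for htmlToMd: the PR converts HTML with turndown,
// this only strips tags so the sketch runs without dependencies.
function htmlToMd(html) {
    return html.replace(/<[^>]+>/g, '').trim();
}

// Identity resolver for the demo; the real resolver rewrites internal hrefs.
const identityResolver = (html) => html;

// Same logic as formatPropertyEntry in the excerpt above.
function formatPropertyEntry(key, entry, typeInfo, linkResolver) {
    // Heading level grows with the number of dot-separated path segments.
    const heading = '#'.repeat(Math.min(key.split('.').length + 1, MAX_HEADING_DEPTH)) + ' ' + key;
    const meta = [
        typeInfo && typeInfo.type && `- **Type**: \`${typeInfo.type}\``,
        typeInfo && typeInfo.default != null && `- **Default**: \`${typeInfo.default}\``
    ].filter(Boolean);
    const body = entry.desc ? htmlToMd(linkResolver(entry.desc)) : '';
    return [heading, ...meta, ...(body ? ['', body] : []), ''];
}

// Made-up sample values, not taken from the ECharts schema.
const lines = formatPropertyEntry(
    'textStyle.color',
    { desc: '<p>Text color.</p>' },
    { type: 'Color', default: "'#000'" },
    identityResolver
);
console.log(lines.join('\n'));
```

A two-segment key such as `textStyle.color` becomes a level-3 heading, and the depth is capped at `MAX_HEADING_DEPTH` for deeply nested paths.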
Generated headings can include dots (e.g. `### textStyle.color`), but rewritten link fragments also preserve dots (e.g. `...md#textStyle.color`). In common Markdown renderers (including GFM), heading IDs are slugified and punctuation like `.` is removed or normalized, so these fragments often won't navigate to the intended section. Consider either slugifying fragments to match the heading-ID algorithm you target, or emitting explicit anchors/IDs for each property so that `#axisLabel.interval` / `#textStyle.color` work reliably.
Suggested change:

```diff
+function escapeHtmlAttribute(value) {
+    return String(value)
+        .replace(/&/g, '&amp;')
+        .replace(/"/g, '&quot;')
+        .replace(/</g, '&lt;')
+        .replace(/>/g, '&gt;');
+}
-function formatPropertyEntry(key, entry, typeInfo, linkResolver) {
+function formatPropertyEntry(key, fullKey, entry, typeInfo, linkResolver) {
     const heading = '#'.repeat(Math.min(key.split('.').length + 1, MAX_HEADING_DEPTH)) + ' ' + key;
+    const anchor = escapeHtmlAttribute(fullKey);
     const meta = [
         typeInfo && typeInfo.type && `- **Type**: \`${typeInfo.type}\``,
         typeInfo && typeInfo.default != null && `- **Default**: \`${typeInfo.default}\``
     ].filter(Boolean);
     const body = entry.desc ? htmlToMd(linkResolver(entry.desc)) : '';
-    return [heading, ...meta, ...(body ? ['', body] : []), ''];
+    return [`<a id="${anchor}" name="${anchor}"></a>`, heading, ...meta, ...(body ? ['', body] : []), ''];
 }
 function jsonToMd(data, typeMap, baseName, linkResolver) {
     const lines = Object.entries(data).flatMap(([key, entry]) => {
         const fullKey = baseName ? `${baseName}.${key}` : key;
-        return formatPropertyEntry(key, entry, typeMap[fullKey], linkResolver);
+        return formatPropertyEntry(key, fullKey, entry, typeMap[fullKey], linkResolver);
```
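The mismatch described in this comment can be seen with a rough, ASCII-only approximation of GitHub-style heading slugs. The function name `approximateGfmSlug` is hypothetical, and this is not the exact algorithm GitHub uses; it only shows why a fragment like `#textStyle.color` would not match a slugified heading ID.

```javascript
// Rough approximation of GitHub-style heading slugs, for illustration only.
// The real algorithm (e.g. the github-slugger package) handles Unicode
// letters and punctuation more precisely and suffixes duplicate slugs.
function approximateGfmSlug(headingText) {
    return headingText
        .trim()
        .toLowerCase()
        .replace(/[^a-z0-9 _-]/g, '') // drop punctuation such as '.'
        .replace(/ /g, '-');          // spaces become hyphens
}

for (const key of ['textStyle.color', 'axisLabel.interval']) {
    const slug = approximateGfmSlug(key);
    // The link fragment keeps the dot, but the generated heading ID does not.
    console.log(`#${key} -> heading id "${slug}" (match: ${slug === key})`);
}
```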
This seems like a trade-off.
Since this is primarily LLM-facing documentation rather than browser-rendered content, I think it's better to keep the Markdown clean and not add explicit HTML anchors.
The heading text preserves the original property path (e.g. `### textStyle.color`), which should be sufficient for LLM consumption.
Pull request overview
Copilot reviewed 3 out of 5 changed files in this pull request and generated 1 comment.
```js
    const heading = '#'.repeat(Math.min(key.split('.').length + 1, MAX_HEADING_DEPTH)) + ' ' + key;
    const meta = [
        typeInfo && typeInfo.type && `- **Type**: \`${typeInfo.type}\``,
        typeInfo && typeInfo.default != null && `- **Default**: \`${typeInfo.default}\``
```
When rendering metadata, `typeInfo.default != null` will suppress a captured default value of `null`. If you change default capture to include `null`, also update this check to test for presence (e.g. `'default' in typeInfo`) so the output includes `` - **Default**: `null` `` when appropriate.
Suggested change:

```diff
-        typeInfo && typeInfo.default != null && `- **Default**: \`${typeInfo.default}\``
+        typeInfo && Object.prototype.hasOwnProperty.call(typeInfo, 'default') && `- **Default**: \`${typeInfo.default}\``
```
In the source schema, both cases (the `default` field being absent, and `default: null`, e.g. `= null` in the source md file) are displayed as `...` (no default shown) on the website.
The `!= null` check intentionally matches this behavior, filtering out both `undefined` and `null` to stay consistent with the website.
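The difference between the two checks is easy to see side by side: `!= null` treats an absent `default` and an explicit `default: null` the same way, while still keeping falsy but meaningful values such as `0` or `''`. The helper names and sample cases below are made up for illustration.

```javascript
// Hypothetical helper names for the two checks discussed in this thread.
const looseCheck = (typeInfo) => typeInfo.default != null;
const ownPropertyCheck = (typeInfo) =>
    Object.prototype.hasOwnProperty.call(typeInfo, 'default');

// Made-up typeInfo shapes covering the interesting cases.
const cases = {
    absent: {},
    explicitNull: { default: null },
    zero: { default: 0 },
    emptyString: { default: '' }
};

for (const [name, typeInfo] of Object.entries(cases)) {
    // Prints: case name, result of `!= null`, result of hasOwnProperty.
    console.log(name, looseCheck(typeInfo), ownPropertyCheck(typeInfo));
}
```

`absent` and `explicitNull` both fail the loose check, so neither produces a default line, while `0` and `''` still pass it.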
Addressed all feedback from the Copilot review. See individual replies on each comment for details.
Summary
Add llms.txt and LLM-friendly markdown documentation generation to the ECharts doc build pipeline. Since the current SPA-based docs are difficult for AI agents to access via web fetch, this provides static Markdown alternatives.
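For readers unfamiliar with the format, the sketch below shows one way a per-language index could be assembled, following the llms.txt proposal's layout (H1 title, blockquote summary, H2 sections of Markdown links). The function name `buildLlmsTxt`, its input shape, and the sample entries are assumptions for illustration; the index this PR generates may be organized differently.

```javascript
// Minimal sketch of an llms.txt index in the llms.txt proposal's layout:
// an H1 title, a blockquote summary, then H2 sections listing Markdown links.
// buildLlmsTxt and its input shape are hypothetical, not the PR's code.
function buildLlmsTxt({ title, summary, sections }) {
    const out = [`# ${title}`, '', `> ${summary}`, ''];
    for (const [sectionName, files] of Object.entries(sections)) {
        out.push(`## ${sectionName}`, '');
        for (const file of files) {
            out.push(`- [${file.name}](${file.path})`);
        }
        out.push('');
    }
    return out.join('\n');
}

// Sample values for illustration only.
const index = buildLlmsTxt({
    title: 'Apache ECharts Documentation',
    summary: 'Markdown versions of the ECharts option, API and tutorial docs.',
    sections: {
        Option: [
            { name: 'title', path: 'llms-documents/option-parts/option.title.md' }
        ]
    }
});
console.log(index);
```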
- Add `build/build-llms.js`, which mechanically converts `documents/*-parts/*.json` to `llms-documents/*-parts/*.md` by converting HTML descriptions to Markdown via turndown, with type/default info extracted from the full schema JSONs (`option.json`, `api.json`, etc.)
- Integrate into `build/build-doc.js` (runs after the main doc build)
- Generate an `llms.txt` index file per language (en/zh) listing all available documentation
- Add `turndown` and `turndown-plugin-gfm` as dependencies

Output structure
Output examples
`en/llms.txt`
`llms-documents/option-parts/option.title.md` (excerpt)
Link resolution
Internal links in HTML (`href="#property.path"` and `href="(option|api|tutorial).html#..."`) are resolved to relative `.md` file paths before turndown conversion, on a best-effort basis. Out of ~22,600 total links in the source, ~12,000 internal links are resolved to `.md` file paths (~99% of internal links), ~10,600 external links are preserved as-is, and ~50 links with non-standard formats (e.g. a missing `#` prefix) are left unresolved.
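The link rewriting described above can be sketched roughly as follows. The mapping rule (the first path segment selects the part file, the remainder becomes the fragment, cross-document links point into a sibling `*-parts/` directory) and the name `resolveHref` are assumptions inferred from the example paths in this PR, not the PR's actual implementation.

```javascript
// Hypothetical sketch of best-effort internal link resolution.
// Assumptions (not confirmed by the PR): the first segment of a property
// path names the part file, e.g. "#title.textStyle.color" ->
// "option.title.md#textStyle.color", and the linking page is in option-parts/.
function resolveHref(href, currentDoc = 'option') {
    // External links are preserved as-is.
    if (/^[a-z]+:\/\//i.test(href)) return href;

    // "#path" (same doc) or "(option|api|tutorial).html#path" (cross-doc).
    const m = href.match(/^(?:(option|api|tutorial)\.html)?#(.+)$/);
    if (!m) return href; // non-standard format (e.g. missing '#'): left unresolved

    const doc = m[1] || currentDoc;
    const [part, ...rest] = m[2].split('.');
    const dir = doc === currentDoc ? '' : `../${doc}-parts/`;
    const fragment = rest.length ? `#${rest.join('.')}` : '';
    return `${dir}${doc}.${part}.md${fragment}`;
}

console.log(resolveHref('#title.textStyle.color'));
console.log(resolveHref('api.html#echarts.init'));
console.log(resolveHref('https://echarts.apache.org/'));
console.log(resolveHref('title.text'));
```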