feat(cli): capture-video on-demand fetcher + capture pipeline robustness #1447
Open
ukimsanov wants to merge 1 commit into
This was referenced Jun 14, 2026
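```typescript
import { writeFileSync } from "node:fs";
// `flag: "wx"` = exclusive create: throws EEXIST if outPath exists.
// Race-free check-and-create in one syscall.
try {
  writeFileSync(outPath, buf, { flag: "wx" });
} catch (err) {
  throw err; // EEXIST here means outPath already existed and was not touched
}
```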
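A minimal sketch of how the exclusive-create write can be wrapped so that a pre-existing file is reported rather than overwritten. `writeExclusive` is a hypothetical helper for illustration, not code from this PR; what the real command does on `EEXIST` is not shown in the excerpt.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper (not from the PR): returns false instead of throwing
// when outPath already exists, and rethrows any other filesystem error.
function writeExclusive(outPath: string, buf: Uint8Array): boolean {
  try {
    // "wx" opens with O_CREAT | O_EXCL, so the existence check and the
    // create are a single open() call: no other writer can slip in between.
    writeFileSync(outPath, buf, { flag: "wx" });
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false;
    throw err;
  }
}

// Usage: the second write to the same path is refused, not applied.
const dir = mkdtempSync(join(tmpdir(), "capture-video-"));
const target = join(dir, "clip.mp4");
const first = writeExclusive(target, new TextEncoder().encode("first"));
const second = writeExclusive(target, new TextEncoder().encode("second"));
console.log(first, second, readFileSync(target, "utf8")); // true false first
```

A separate `existsSync` check followed by a plain write would leave a window where two concurrent downloads both see "missing" and both write; the `O_EXCL` open closes that window.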
For the hyperframes.dev website-to-video flow. Real-AI-test runs against
heygen.com, huly.io, and heygen-showcase surfaced two gaps: (1) capture's
logo / asset-captioning signals missed modern React/Tailwind builds; and
(2) there was no CLI surface to pull the videos the manifest references.
New command:
• `hyperframes capture-video <project>` — on-demand downloader for
entries in capture/extracted/video-manifest.json. Capture writes the
manifest + preview PNGs but skips the mp4s; this pulls one entry by
`--index N` (matched against the entry's `index` field, NOT array
offset — gaps are possible when a preview screenshot fails). SSRF-safe
via safeFetch, 250 MB cap, content-type whitelist, race-free
exclusive-create write. Layout-aware (handles both standalone capture
and W2H project layouts).
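The index-field lookup can be sketched as follows. This is an illustration only: the `VideoManifestEntry` shape (just `index` and `url`), the exact-match `--url` lookup, and the error messages are assumptions, not the PR's actual `pickManifestEntry`.

```typescript
// Hypothetical manifest shape: real video-manifest.json entries may carry
// more fields than these two.
interface VideoManifestEntry {
  index: number; // id assigned at capture time, NOT the array position
  url: string;
}

interface PickOptions {
  index?: number; // --index N
  url?: string; // --url U
}

function pickManifestEntry(
  entries: VideoManifestEntry[],
  opts: PickOptions,
): VideoManifestEntry {
  // --index takes priority when both flags are given.
  if (opts.index !== undefined) {
    const wanted = opts.index;
    if (!Number.isInteger(wanted) || wanted < 0) {
      throw new Error(`--index must be a non-negative integer, got ${wanted}`);
    }
    // Match on the entry's own `index` field: gaps from failed previews
    // mean entries[wanted] may be a different video or out of range.
    const hit = entries.find((e) => e.index === wanted);
    if (!hit) {
      const known = entries.map((e) => e.index).join(", ");
      throw new Error(`no manifest entry with index ${wanted} (have: ${known})`);
    }
    return hit;
  }
  if (opts.url !== undefined) {
    const wantedUrl = opts.url;
    const hit = entries.find((e) => e.url === wantedUrl);
    if (!hit) throw new Error(`URL not found in manifest: ${wantedUrl}`);
    return hit;
  }
  throw new Error("pass --index N or --url U");
}

// Usage: the preview for index 1 failed, so the manifest skips it.
const manifest: VideoManifestEntry[] = [
  { index: 0, url: "https://example.com/a.mp4" },
  { index: 2, url: "https://example.com/c.mp4" },
];
console.log(pickManifestEntry(manifest, { index: 2 }).url); // https://example.com/c.mp4
```

With array-offset semantics, `--index 2` would be out of range here even though the manifest clearly lists an entry with `index: 2`.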
Capture pipeline fixes:
• Structural logo signals (assetCataloger + tokenExtractor): inBanner /
inHomeLink / matchesTitleBrand. Class-substring alone caught 0/32 SVGs
on heygen.com — modern builds don't put 'logo' / 'brand' in any
className.
• Content-hash SVG slugs (assetDownloader): `svg-<8char-sha1>.svg` —
label-derived slugs mis-attributed partner-logo carousels
(heygen-logo.svg actually contained Google, hubspot-logo.svg contained
Trivago, etc.). Content-hash names are invariant by construction.
• SVG → PNG rasterization before Gemini Vision (contentExtractor): the
raw-SVG-as-text path was hallucinating wordmarks (VIVIENNE for HubSpot,
'wrestling' for Workday). Adds polarity detection so a white-glyph SVG
flattened to a blank PNG gets inverted before captioning. LOGO tag in
asset-descriptions.md when structural signals fire (independent of
Gemini key presence).
• Double-escape \/ inside the page.evaluate template literal in
assetCataloger + tokenExtractor: the original `/^https?:\/\/.../`
collapsed to `/` mid-template and threw `Unexpected token ^`. Capture
was 100% blocked on this until the escape was fixed.
• `asset-descriptions.md` header branches on Gemini-key presence with
an explicit 'Vision OFF — catalog-derived descriptions' warning.
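A minimal sketch of the `svg-<8char-sha1>.svg` naming described above, assuming the hash is taken over the raw SVG markup as downloaded (whether `assetDownloader` normalizes the markup first is not stated). `svgSlug` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Name an SVG by a hash of its own markup, so the filename can only change
// when the content does. A page label such as "heygen-logo" plays no part.
function svgSlug(svgMarkup: string): string {
  const digest = createHash("sha1").update(svgMarkup, "utf8").digest("hex");
  return `svg-${digest.slice(0, 8)}.svg`; // first 8 hex chars = 32 bits
}

// Usage: a carousel where the page labels don't match the logos inside.
const carousel = [
  { label: "heygen-logo", markup: '<svg viewBox="0 0 10 10"><circle r="4"/></svg>' },
  { label: "hubspot-logo", markup: '<svg viewBox="0 0 10 10"><rect width="8" height="8"/></svg>' },
];
for (const { label, markup } of carousel) {
  console.log(`${label} -> ${svgSlug(markup)}`);
}
```

Because the label never reaches the name, a misleading label can no longer produce a misleading filename; the trade-off is that names carry no meaning, which is why the captioning step has to describe each file. Under the raw-markup assumption, two copies of the same logo that differ only in whitespace would still get different names.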
New lint rule:
• `lintMissingLocalAsset` (cli/utils/lintProject): scans <video> / <img>
/ <source> src for local files that don't exist in the project.
Empirically the most common sub-agent mistake across multi-URL runs
(~5+ per run). Uses `resolveExistingLocalAsset` so the existence check
matches the bundler's notion of 'resolves'. Masks comment / style /
script ranges before scanning so a literal `<img src=missing.png>`
inside a tutorial comment isn't reported.
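The mask-then-scan approach can be sketched roughly as below. This is a regex approximation under stated assumptions, not the PR's implementation: `findMissingLocalAssets` and its `exists` callback are hypothetical, and the callback only stands in for `resolveExistingLocalAsset`, whose signature the PR text does not show.

```typescript
// Comments, <style> and <script> bodies are blanked out (newlines kept) so a
// literal <img src=...> inside them is never reported and offsets still line up.
// One alternation scanned left to right: a "<!--" inside a script is
// swallowed by the earlier script match instead of starting a comment.
const IGNORED_RANGES_RE =
  /<!--[\s\S]*?-->|<style\b[\s\S]*?<\/style\s*>|<script\b[\s\S]*?<\/script\s*>/gi;

// src on <video>, <img> or <source>; double-quoted, single-quoted or bare.
// "\ssrc\s*=" skips data-src and srcset.
const SRC_RE =
  /<(?:video|img|source)\b[^>]*?\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/gi;

// A URL scheme (http:, data:, blob:, ...), "//host/..." or a bare fragment
// is not a local file.
const NON_LOCAL_RE = /^(?:[a-z][a-z0-9+.-]*:|\/\/|#)/i;

interface MissingAsset {
  src: string;
  line: number; // 1-based
}

function maskIgnoredRanges(html: string): string {
  return html.replace(IGNORED_RANGES_RE, (m) => m.replace(/[^\n]/g, " "));
}

// Hypothetical: `exists` is an assumed stand-in for the project's resolver.
function findMissingLocalAssets(
  html: string,
  exists: (relPath: string) => boolean,
): MissingAsset[] {
  const masked = maskIgnoredRanges(html);
  const missing: MissingAsset[] = [];
  for (const m of masked.matchAll(SRC_RE)) {
    const src = (m[1] ?? m[2] ?? m[3] ?? "").trim();
    if (src === "" || NON_LOCAL_RE.test(src)) continue;
    const path = src.split(/[?#]/)[0] ?? src; // "clip.mp4?v=2" -> "clip.mp4"
    if (!exists(path)) {
      const offset = m.index ?? 0;
      missing.push({ src, line: masked.slice(0, offset).split("\n").length });
    }
  }
  return missing;
}

// Usage: only the video on line 3 is both local and absent.
const page = [
  '<!-- tutorial: <img src="missing.png"> -->',
  '<img src="assets/logo.png" alt="logo">',
  '<video src="clips/hero.mp4" muted></video>',
  '<img src="https://cdn.example.com/x.png">',
].join("\n");
const onDisk = new Set(["assets/logo.png"]);
console.log(findMissingLocalAssets(page, (p) => onDisk.has(p)));
// expected: a single entry for "clips/hero.mp4" on line 3
```

Blanking with spaces rather than deleting keeps every character offset unchanged, so line numbers computed on the masked text still point at the right place in the original file.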
Tests: 17 new for capture-video (safeFilename decoding/sanitization,
VIDEO_CONTENT_TYPE_RE accept/reject, pickManifestEntry index-field lookup
with gaps, URL-mismatch + bad-index rejection, --index over --url
priority); 70 cases under lintProject.test.ts covering the new rule and
existing rules.
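For readers unfamiliar with the helpers named in the test list, here is one plausible shape for them. Both are hypothetical sketches, not the PR's code: the accepted `video/*` subtypes, the character whitelist, the 120-character cap, and the `video.mp4` fallback are all assumptions.

```typescript
// Hypothetical whitelist: a few video/* subtypes, optionally followed by
// parameters such as "; codecs=...". The PR's actual list is not shown.
const VIDEO_CONTENT_TYPE_RE = /^video\/(?:mp4|webm|quicktime)\s*(?:;|$)/i;

// Hypothetical: last path segment of the URL, %-decoded, then reduced to a
// conservative character set so the result can never name another path.
function safeFilename(rawUrl: string, fallback = "video.mp4"): string {
  let decoded = "";
  try {
    const last = new URL(rawUrl).pathname.split("/").pop() ?? "";
    decoded = decodeURIComponent(last); // "Hero%20Reel.mp4" -> "Hero Reel.mp4"
  } catch {
    return fallback; // unparsable URL or malformed %-escape
  }
  const cleaned = decoded
    .replace(/[^A-Za-z0-9._-]/g, "_") // separators, spaces, control chars -> "_"
    .replace(/^\.+/, "") // no ".", "..", or hidden dotfiles
    .slice(0, 120); // stay well under common filename length limits
  return cleaned === "" ? fallback : cleaned;
}

console.log(VIDEO_CONTENT_TYPE_RE.test("video/mp4; codecs=avc1")); // true
console.log(VIDEO_CONTENT_TYPE_RE.test("text/html; charset=utf-8")); // false
console.log(safeFilename("https://cdn.example.com/media/Hero%20Reel.mp4?v=3")); // Hero_Reel.mp4
console.log(safeFilename("https://x.test/a/..%2F..%2Fetc%2Fpasswd")); // _.._etc_passwd
```

Decoding happens before sanitizing on purpose: an encoded `%2F` only becomes a real `/` after `decodeURIComponent`, so sanitizing first would let it through.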
Sibling PRs in this stack:
• #PR_A1 — fix(producer): __dirname ESM banner shim
• #PR_A2 — fix(core/lint): findRootTag masks comment/style/script
Context
For the hyperframes.dev website-to-video flow. Real-AI-test runs against heygen.com, huly.io, and heygen-showcase surfaced two gaps: (1) capture's logo / asset-captioning signals missed modern React/Tailwind builds; and (2) there was no CLI surface to pull the videos the manifest references.
Summary
New command
`hyperframes capture-video <project>`: on-demand downloader for entries in `capture/extracted/video-manifest.json`. Capture writes the manifest + preview PNGs but skips the mp4s; this pulls one entry by `--index N` (matched against the entry's `index` field, NOT array offset; the manifest can have gaps when a preview screenshot fails). SSRF-safe via `safeFetch`, 250 MB cap, content-type whitelist, race-free exclusive-create write. Layout-aware (standalone capture + W2H project layouts).
Capture pipeline fixes
• Structural logo signals (`assetCataloger` + `tokenExtractor`): `inBanner` / `inHomeLink` / `matchesTitleBrand`. Class-substring alone caught 0/32 SVGs on heygen.com; modern builds don't put "logo" or "brand" in any className.
• Content-hash SVG slugs (`assetDownloader`): `svg-<8char-sha1>.svg`. Label-derived slugs mis-attributed partner-logo carousels (heygen-logo.svg actually contained Google, hubspot-logo.svg contained Trivago).
• SVG → PNG rasterization before Gemini Vision (`contentExtractor`): raw-SVG-as-text was hallucinating wordmarks (VIVIENNE for HubSpot). Polarity detection inverts white-glyph SVGs that flatten to blank PNGs before captioning.
• Double-escape `\/` inside the `page.evaluate` template literal (`assetCataloger` + `tokenExtractor`): the original `/^https?:\/\/…/` collapsed to `/` mid-template and threw `Unexpected token ^`. Capture was 100% blocked on this until fixed.
• `asset-descriptions.md` header branches on Gemini-key presence with an explicit "Vision OFF — catalog-derived" warning.
New lint rule
`lintMissingLocalAsset` (`cli/utils/lintProject.ts`): scans `<video>` / `<img>` / `<source>` `src` for local files that don't exist in the project. Empirically the most common sub-agent mistake across multi-URL runs (~5+ per run). Masks comment / style / script ranges before scanning.
Tests
• `capture-video`: `safeFilename` decoding/sanitization, `VIDEO_CONTENT_TYPE_RE` accept/reject, `pickManifestEntry` index-field lookup with gaps, URL-mismatch + bad-index rejection, `--index` over `--url` priority
• `lintProject.test.ts` covering the new rule + existing rules
Test plan
• `bun run --filter @hyperframes/cli typecheck` passes
• `bun run --filter @hyperframes/cli test` passes
• `hyperframes capture <url>` → `capture-video <dir> --list` → `--index 0` downloads cleanly
• `capture-video` passes `hyperframes lint` (`id` attribute present)