Plasmate as a lightweight scraping backend - no Chrome needed #1055
Open
Labels
enhancement (New feature or request)
Description
ScrapeGraph-AI currently uses Chrome/Playwright for fetching pages. For many use cases (especially content extraction and data scraping from server-rendered pages), the full Chrome rendering pipeline is overkill.
Plasmate is an open-source browser engine (Rust, Apache 2.0) that parses HTML and outputs structured semantic content. No rendering, no GPU, no 300MB Chrome process.
For scraping workflows:
- 30MB memory instead of 300MB per instance
- 16.6x fewer tokens per page (saves LLM costs in AI-powered extraction)
- Works as a single binary: pip install plasmate
Could work as an alternative Fetcher for static pages, with Chrome as fallback for SPAs.
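A rough sketch of what the static-first, Chrome-as-fallback idea could look like. Everything here is an assumption for illustration: `FallbackFetcher`, `primary`, `fallback`, and the `min_text_chars` heuristic are hypothetical names, not part of ScrapeGraph-AI or Plasmate. The two fetchers are plain callables so that neither library's real API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A fetcher is any callable that takes a URL and returns page content as text.
Fetcher = Callable[[str], str]


@dataclass
class FallbackFetcher:
    """Try a lightweight fetcher first; fall back to a full browser if needed."""

    primary: Fetcher           # hypothetical: a wrapper around a lightweight engine
    fallback: Fetcher          # hypothetical: the existing Chrome/Playwright path
    min_text_chars: int = 200  # assumed heuristic: below this, treat the page as an unrendered SPA shell

    def needs_fallback(self, content: str) -> bool:
        # Very little extracted content usually means the page relies on client-side rendering.
        return len(content.strip()) < self.min_text_chars

    def fetch(self, url: str) -> Tuple[str, str]:
        """Return (content, which_fetcher_was_used)."""
        try:
            content = self.primary(url)
        except Exception:
            # Any failure in the lightweight path is treated as "no content".
            content = ""
        if self.needs_fallback(content):
            return self.fallback(url), "fallback"
        return content, "primary"
```

Usage would be along the lines of `FallbackFetcher(primary=light_fetch, fallback=chrome_fetch).fetch(url)`, where both functions are supplied by the caller. A length threshold is only one possible signal; a real integration might check for an empty app root element or a missing main content node instead.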
Not a sales pitch - it's free and open source. Just think it could be useful for the project.