Plasmate as a lightweight scraping backend - no Chrome needed #1055

@dbhurley

ScrapeGraph-AI currently uses Chrome/Playwright for fetching pages. For many use cases (especially content extraction and data scraping from server-rendered pages), the full Chrome rendering pipeline is overkill.

Plasmate is an open-source browser engine (Rust, Apache 2.0) that parses HTML and outputs structured semantic content. No rendering, no GPU, no 300MB Chrome process.

For scraping workflows:

  • About 30 MB of memory per instance instead of about 300 MB
  • 16.6x fewer tokens per page (lowers LLM costs in AI-powered extraction)
  • Ships as a single binary, installable with pip install plasmate

It could work as an alternative fetcher for static pages, with Chrome as the fallback for SPAs.
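To make the fallback idea concrete, here is a rough sketch of the routing logic only. It does not use any Plasmate or ScrapeGraph-AI API: `light_fetch` and `browser_fetch` are hypothetical callables standing in for the lightweight backend and the Chrome/Playwright backend, and the "very little visible text" heuristic with its 200-character threshold is an assumption for illustration, not something either project prescribes.

```python
from html.parser import HTMLParser
from typing import Callable, List, Tuple


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a reader would see, joined with single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_client_rendered(html: str, min_chars: int = 200) -> bool:
    # Assumed heuristic: an SPA shell usually carries almost no visible text.
    return len(visible_text(html)) < min_chars


def fetch_with_fallback(
    url: str,
    light_fetch: Callable[[str], str],    # hypothetical lightweight backend
    browser_fetch: Callable[[str], str],  # hypothetical Chrome/Playwright backend
    min_chars: int = 200,
) -> Tuple[str, str]:
    """Try the lightweight fetcher first; fall back to the browser if needed.

    Returns (backend_used, html), where backend_used is "light" or "browser".
    """
    try:
        html = light_fetch(url)
    except Exception:
        # Any failure in the lightweight path falls through to the browser.
        return "browser", browser_fetch(url)
    if looks_client_rendered(html, min_chars):
        return "browser", browser_fetch(url)
    return "light", html
```

A real integration would probably want a better SPA signal than text length (for example a per-domain allowlist), but the shape stays the same: the cheap path runs first and Chrome only starts when the cheap result looks empty.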

This is not a sales pitch: it's free and open source. I just think it could be useful for the project.

https://github.com/plasmate-labs/plasmate

Labels: enhancement (New feature or request)