Consider removing Rust dependencies from the default install #893

@vdusek

Context

The Python SDK currently pulls in a Rust-based dependency (impit) by default. Users have reported install-time friction caused by this native dependency. It is not blocking, but it is a source of papercuts (e.g., environments without a Rust toolchain, slower installs, more failure modes on exotic platforms).

A representative user report (source):

User uses the Python SDK and manages scraping via the Apify API. He raised an issue with unexpected Rust dependencies surfacing in recent Python SDK versions — not a blocking issue, but a source of setup friction.

How we got here

About a year ago, we made Impit the default HTTP client in Crawlee so that crawls are stealthy out of the box. Because the SDK depends on Crawlee, Impit became a transitive dependency. We then also switched the Apify API client from HTTPX to Impit to avoid shipping two HTTP clients (HTTP clients are heavy).

Possible direction

  • Extract the shared base used by both Crawlee and the SDK into a standalone package (e.g. apify-shared): storages, storage clients, event managers, service locator, and maybe more.
  • Keep Impit as the default in Crawlee (stealth out of the box stays).
  • Switch the SDK and Apify API client back to a Python HTTP client (HTTPX), so a plain pip install apify does not require a Rust toolchain.
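As a rough illustration of how an install without native dependencies could coexist with Impit, the sketch below picks whichever HTTP client package is importable at runtime. This is only a sketch under assumptions: the helper name pick_http_backend and the preference order are hypothetical and are not part of the SDK, Crawlee, or the Apify API client, and the direction above proposes a plain switch to HTTPX rather than a runtime fallback.

```python
from importlib.util import find_spec


def pick_http_backend(preferred=("httpx", "impit")):
    """Return the name of the first importable HTTP client package.

    Hypothetical helper: the function name and the default preference
    order are illustrative assumptions, not the SDK's actual behaviour.
    """
    for name in preferred:
        # find_spec returns None when a top-level package is not installed,
        # so nothing is imported (and no native code is loaded) during the check.
        if find_spec(name) is not None:
            return name
    raise ImportError(f"none of {preferred} is installed")
```

With a check like this, code that only talks to the Apify API could work with the pure-Python client alone, while Crawlee-based projects that already install Impit would keep using it.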

Trade-off

Apify Actors based on Crawlee would end up shipping two HTTP clients (Impit for crawl traffic, HTTPX for Apify API traffic), making those images/installs larger. SDK-only users (no Crawlee) would benefit the most.

When

Not urgent. Worth revisiting later.

Labels

  • solutioning: The issue is not being implemented but only analyzed and planned.
  • t-tooling: Issues with this label are in the ownership of the tooling team.
