diff --git a/packages/actor-scraper/cheerio-scraper/README.md b/packages/actor-scraper/cheerio-scraper/README.md
index 8370736d..7b332997 100644
--- a/packages/actor-scraper/cheerio-scraper/README.md
+++ b/packages/actor-scraper/cheerio-scraper/README.md
@@ -1,16 +1,39 @@
-Cheerio Scraper is a ready-made solution for crawling websites using plain HTTP requests. It retrieves the HTML pages, parses them using the [Cheerio](https://cheerio.js.org) Node.js library and lets you extract any data from them. Fast.
+## What is Cheerio Scraper?
-Cheerio is a server-side version of the popular [jQuery](https://jquery.com) library. It does not require a
-browser but instead constructs a DOM from an HTML string. It then provides the user an API to work with that DOM.
+It's a fast, server-side scraper that pulls plain HTML over HTTP and parses it with [Cheerio](https://cheerio.js.org) — the server-side equivalent of [jQuery](https://jquery.com). No browser, no client-side JavaScript: just the raw HTML response and a familiar selector API. With Cheerio Scraper, you can:
-Cheerio Scraper is ideal for scraping web pages that do not rely on client-side JavaScript to serve their content and can be up to 20 times faster than using a full-browser solution such as Puppeteer.
+⚡ Run **up to 20× faster** than full-browser scrapers — no Chrome to spin up
-If you're unfamiliar with web scraping or web development in general,
-you might prefer to start with [**Scraping with Web Scraper**](https://docs.apify.com/tutorials/apify-scrapers/web-scraper) tutorial from the Apify documentation and then continue with [**Scraping with Cheerio Scraper**](https://docs.apify.com/tutorials/apify-scrapers/cheerio-scraper), a tutorial which will walk you through all the steps and provide a number of examples.
+🧩 Use **jQuery-style selectors** via [Cheerio](https://cheerio.js.org) to extract any data from the parsed DOM
+
+🔗 **Crawl recursively** with Link selector, Glob Patterns, and Pseudo-URLs — pagination, sitemaps, full-site crawls
+
+🛠 Write a **custom page function** in JavaScript to extract data and steer the crawl
+
+📦 Export results as **JSON, CSV, XML, Excel, or HTML**, or pull them via the [Apify API](https://docs.apify.com/api/v2)
+
+🔌 Plug into **Make, Zapier, webhooks, MCP servers**, and the rest of [Apify's integrations](https://apify.com/integrations)
+
+🪪 Fork the [**open-source Actor**](https://github.com/apify/actor-scraper/tree/master/packages/actor-scraper/cheerio-scraper) on GitHub, or build your own with Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler)
+
+Cheerio Scraper is built for technical users comfortable with [jQuery](https://jquery.com) and Cheerio, and works on **static HTML pages**. For pages that render content with client-side JavaScript, reach for a browser-based scraper instead — [**Web Scraper**](https://apify.com/apify/web-scraper), [**Puppeteer Scraper**](https://apify.com/apify/puppeteer-scraper), or [**Playwright Scraper**](https://apify.com/apify/playwright-scraper). If you're not a developer, [**AI Web Scraper**](https://apify.com/apify/ai-web-scraper) lets you describe what to extract in plain English — no page function required. To learn Cheerio Scraper step by step, follow the [**Scraping with Cheerio Scraper**](https://docs.apify.com/academy/apify-scrapers/cheerio-scraper) tutorial in the Apify Academy.
## Cost of usage
-You can find the average usage cost for this Actor on the [pricing page](https://apify.com/pricing) under the `Which plan do I need?` section. Cheerio Scraper is equivalent to `Simple HTML pages` while Web Scraper, Puppeteer Scraper and Playwright Scraper are equivalent to `Full web pages`. These cost estimates are based on averages and might be lower or higher depending on how heavy the pages you scrape are.
+Cheerio Scraper is billed by [platform usage](https://apify.com/pricing) (compute units, storage operations, data transfer) rather than a flat per-result fee, so the exact cost of a run is hard to predict. It depends on **how many pages you crawl**, **how complex your page function is**, **how many links each page produces**, **page size**, **proxy choice**, and **memory allocation**. Treat the numbers below as illustrative samples, not a guaranteed price. For your own use case, run a small test first and extrapolate.
+
+For a quick orientation, the [pricing page](https://apify.com/pricing) lists average estimates under `Which plan do I need?`. Cheerio Scraper is equivalent to `Simple HTML pages`; Web Scraper, Puppeteer Scraper and Playwright Scraper are equivalent to `Full web pages`.
+
+### Sample runs
+
+Both samples below crawled the same site ([`docs.apify.com`](https://docs.apify.com)) on default settings (1024 MB memory, Apify Proxy, concurrency 50). They differ in page-function complexity and crawl size.
+
+| Sample | Pages | Page function | Runtime | Compute units | Total cost |
+|-----------------------------------------|------:|------------------------------------------------------------------------------------------|---------:|--------------:|-----------:|
+| Lightweight (title, h1, meta description) | 237 | 4 selectors | 3 min 15 s | 0.054 CU | **$0.024** |
+| Heavier (all h2/h3, internal link list, code-block count, word count) | 485 | 8+ selectors plus body word count | 6 min 38 s | 0.111 CU | **$0.048** |
+
+Both samples worked out to roughly **$0.0001 per result** (~$0.10 per 1,000 results) on this site. Cost is dominated by compute units (~46%) and request-queue writes (~40%). On heavier sites — large pages, long link graphs, residential proxy, slower responses — the per-result figure can be several times higher, so use these numbers as a starting point only.
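
The per-result figure can be derived from the sample table and scaled to a planned crawl, in the spirit of "run a small test first and extrapolate". The following is a minimal sketch, not an official pricing formula: the helper names `costPerResult` and `estimateCost` are hypothetical, it assumes one result per crawled page, and linear scaling is an assumption that is unlikely to hold on heavier sites.

```javascript
// Figures copied from the sample table above.
const samples = [
    { name: 'Lightweight', pages: 237, totalCostUsd: 0.024 },
    { name: 'Heavier', pages: 485, totalCostUsd: 0.048 },
];

// Cost of a single result, assuming one result per crawled page.
function costPerResult({ pages, totalCostUsd }) {
    return totalCostUsd / pages;
}

// Linear extrapolation (an assumption): cost grows in proportion to page count.
function estimateCost(sample, plannedPages) {
    return costPerResult(sample) * plannedPages;
}

for (const sample of samples) {
    const perResult = costPerResult(sample).toFixed(6);
    const perThousand = estimateCost(sample, 1000).toFixed(3);
    console.log(`${sample.name}: $${perResult} per result, $${perThousand} per 1,000 results`);
}
```

Replace the sample figures with those from your own test run before relying on the estimate.
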
## Usage
@@ -36,32 +59,6 @@ Cheerio Scraper has a number of advanced configuration settings to improve perfo
Under the hood, Cheerio Scraper is built using the [`CheerioCrawler`](https://crawlee.dev/api/cheerio-crawler/class/CheerioCrawler) class
from Crawlee. If you'd like to learn more about the inner workings of the scraper, see the respective documentation.
-## Content types
-
-By default, Cheerio Scraper only processes web pages with the `text/html`, `application/json`, `application/xml`, `application/xhtml+xml` MIME content types (as reported by the `Content-Type` HTTP header),
-and skips pages with other content types.
-If you want the crawler to process other content types,
-use the **Additional MIME types** (`additionalMimeTypes`) input option.
-
-Note that while the default `Accept` HTTP header will allow any content type to be received,
-HTML and XML are preferred over JSON and other types. Thus, if you're allowing additional MIME
-types, and you're still receiving invalid responses, be sure to override the `Accept`
-HTTP header setting in the requests from the scraper,
-either in [**Start URLs**](#start-urls), [**Pseudo URLs**](#pseudo-urls) or in the **Prepare request function**.
-
-The web pages with various content types are parsed differently and
-thus the `context` parameter of the [**Page function**](#page-function) will have different values:
-
-| **Content types** | [`context.body`](#body-stringbuffer) | [`context.$`](#-function) | [`context.json`](#json-object) |
-| ------------------------------------------------------- | ------------------------------------ | ------------------------- | ------------------------------ |
-| `text/html`, `application/xhtml+xml`, `application/xml` | `String` | `Function` | `null` |
-| `application/json` | `String` | `null` | `Object` |
-| Other | `Buffer` | `null` | `null` |
-
-The `Content-Type` HTTP header of the web page is parsed using the
-content-type NPM package
-and the result is stored in the [`context.contentType`](#contenttype-object) object.
-
## Limitations
The Actor does not employ a full-featured web browser such as Chromium or Firefox, so it will not be sufficient for web pages that render their content dynamically using client-side JavaScript. To scrape such sites, you might prefer to use [**Web Scraper**](https://apify.com/apify/web-scraper) (`apify/web-scraper`), which loads pages in a full browser and renders dynamic content.
@@ -70,11 +67,13 @@ Since Cheerio Scraper's **Page function** is executed in the context of the serv
[**Puppeteer Scraper**](https://apify.com/apify/puppeteer-scraper) (`apify/puppeteer-scraper`). If you prefer Firefox and/or [Playwright](https://github.com/microsoft/playwright), check out [**Playwright Scraper**](https://apify.com/apify/playwright-scraper) (`apify/playwright-scraper`). For even more flexibility and control, you might develop a new Actor from scratch in Node.js using [Apify SDK](https://sdk.apify.com/) and [Crawlee](https://crawlee.dev).
In the [**Page function**](#page-function) and **Prepare request function**,
-you can only use NPM modules that are already installed in this Actor.
+you can only use npm modules that are already installed in this Actor.
If you require other modules for your scraping, you'll need to develop a completely new Actor.
You can use the [`CheerioCrawler`](https://crawlee.dev/api/cheerio-crawler/class/CheerioCrawler) class
from Crawlee to get most of the functionality of Cheerio Scraper out of the box.
+Don't know how to code a page function? The [**AI Web Scraper**](https://apify.com/apify/ai-web-scraper) lets you describe what to extract in plain English instead — no JavaScript required.
+
## Input configuration
As input, Cheerio Scraper Actor accepts a number of configurations. These can be entered either manually in the user interface in [Apify Console](https://console.apify.com), or programmatically in a JSON object using the [Apify API](https://apify.com/docs/api/v2#/reference/actors/run-collection/run-actor). For a complete list of input fields and their types, please visit the [Input](https://apify.com/apify/cheerio-scraper/input-schema) tab.
@@ -91,7 +90,7 @@ Optionally, each URL can be associated with custom user data - a JSON object tha
your JavaScript code in the [**Page function**](#page-function) under `context.request.userData`.
This is useful for determining which start URL is currently loaded, in order to perform some page-specific actions. For example, when crawling an online store, you might want to perform different actions on a page listing the products vs. a product detail page. For details, see the [**Web scraping tutorial**](https://docs.apify.com/tutorials/apify-scrapers/getting-started#the-start-url) in the Apify documentation.
-
+Cheerio Scraper uses an Apify [request queue](https://docs.apify.com/platform/storage/request-queue) to track the URLs it has loaded and the URLs it still needs to load. Each request is identified by a `uniqueKey` — by default the request URL, with the URL fragment (`#...`) stripped unless the **Keep URL fragments** option is enabled. Requests whose `uniqueKey` has already been seen are skipped, so the same page isn't loaded twice. You can override `uniqueKey` per request when calling `context.enqueueRequest()` from the page function — useful when you need to scrape the same URL multiple times with different `userData`.
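
A minimal sketch of overriding `uniqueKey` so that one URL is scraped several times with different `userData`. The currency scenario and the key format are hypothetical and chosen only for illustration. The sketch assumes `context.enqueueRequest()` accepts a request object with `url`, `uniqueKey`, and `userData` fields.

```javascript
// Page function sketch: re-enqueue the current URL once per currency.
async function pageFunction(context) {
    const { request, enqueueRequest } = context;
    const { currency } = request.userData || {};

    // Variant requests carry a currency; return data for that variant and stop.
    if (currency) {
        return { url: request.url, currency };
    }

    for (const code of ['USD', 'EUR']) {
        await enqueueRequest({
            url: request.url,
            // Hypothetical key format. Any string that differs per variant works;
            // with the default key (the URL) these requests would be skipped as duplicates.
            uniqueKey: `${request.url}|currency=${code}`,
            userData: { currency: code },
        });
    }
}
```

Each variant then reaches the page function with its own `context.request.userData`, even though the URL is identical.
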
### Link selector
@@ -154,6 +153,33 @@ Note that you don't need to use the **Pseudo-URLs** setting at all,
because you can completely control which pages the scraper will access by calling `await context.enqueueRequest()`
from the **[Page function](#page-function)**.
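
A minimal sketch of steering the crawl entirely from the page function. The filtering rule (same origin, path under `/docs/`) is a hypothetical example. The sketch assumes `context.$` is the Cheerio function for the loaded page and that `context.request.loadedUrl` holds the final URL after redirects, falling back to `request.url` otherwise.

```javascript
// Page function sketch: enqueue only the links that match a custom rule.
async function pageFunction(context) {
    const { $, request, enqueueRequest } = context;

    // Collect the href attribute of every link on the page.
    const hrefs = $('a[href]').map((i, el) => $(el).attr('href')).get();

    const base = request.loadedUrl || request.url;
    const baseOrigin = new URL(base).origin;

    for (const href of hrefs) {
        let url;
        try {
            // Resolve relative links against the page URL.
            url = new URL(href, base);
        } catch {
            continue; // Skip hrefs that cannot be parsed as URLs.
        }
        // Hypothetical rule: stay on the same origin and under /docs/.
        if (url.origin !== baseOrigin) continue;
        if (!url.pathname.startsWith('/docs/')) continue;

        await enqueueRequest({ url: url.href });
    }

    return { url: request.url, title: $('title').text() };
}
```

Links that resolve to an already-seen URL are deduplicated by the request queue, so the function does not need to track visited pages itself.
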
+### Content types
+
+By default, Cheerio Scraper only processes web pages with the `text/html`, `application/json`, `application/xml`, and `application/xhtml+xml` MIME content types (as reported by the `Content-Type` HTTP header),
+and skips pages with other content types. This is an edge-case setting that most users won't need to change. The most common reason to do so is when paginating through endpoints that return non-default content types (for example, a JSON API that drives the listing pages).
+
+If you want the crawler to process other content types,
+use the **Additional MIME types** (`additionalMimeTypes`) input option.
+
+Note that while the default `Accept` HTTP header allows any content type to be received,
+HTML and XML are preferred over JSON and other types. Thus, if you allow additional MIME
+types and still receive invalid responses, override the `Accept`
+HTTP header in the scraper's requests,
+either in [**Start URLs**](#start-urls), [**Pseudo-URLs**](#pseudo-urls), or in the **Prepare request function**.
+
+Web pages with different content types are parsed differently, so the fields of
+the `context` parameter of the [**Page function**](#page-function) hold different values:
+
+| **Content types** | [`context.body`](#body-stringbuffer) | [`context.$`](#-function) | [`context.json`](#json-object) |
+| ------------------------------------------------------- | ------------------------------------ | ------------------------- | ------------------------------ |
+| `text/html`, `application/xhtml+xml`, `application/xml` | `String` | `Function` | `null` |
+| `application/json` | `String` | `null` | `Object` |
+| Other | `Buffer` | `null` | `null` |
+
+The `Content-Type` HTTP header of the web page is parsed using the
+`content-type` npm package
+and the result is stored in the [`context.contentType`](#contenttype-object) object.
+
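The table above can be turned into a page function that branches on whichever field is populated. This is a minimal sketch: the shape of the returned objects is hypothetical, and it assumes `context.contentType` exposes the MIME type as a `type` property.

```javascript
// Page function sketch: handle each row of the content-type table.
async function pageFunction(context) {
    const { request, body, json, $, contentType } = context;

    // application/json: `json` holds the parsed value and `$` is null.
    if (json) {
        return { url: request.url, kind: 'json', keys: Object.keys(json) };
    }

    // text/html, application/xhtml+xml, application/xml: `$` is the Cheerio function.
    if ($) {
        return { url: request.url, kind: 'markup', title: $('title').text() };
    }

    // Any other type allowed through additionalMimeTypes: `body` is a Buffer.
    return {
        url: request.url,
        kind: 'binary',
        mimeType: contentType.type,
        byteLength: body.length,
    };
}
```

Checking `json` and `$` rather than comparing MIME strings keeps the branching aligned with how the scraper itself populates the context.
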
### Page function
The **Page function** (`pageFunction`) field contains a single JavaScript function that enables the user to extract data from the web page, access its DOM, add new URLs to the request queue, and otherwise control Cheerio Scraper's operation.
@@ -566,6 +592,54 @@ For more information, see [Datasets](https://docs.apify.com/storage#dataset) in
or the [Get dataset items](https://docs.apify.com/api/v2#/reference/datasets/item-collection)
endpoint in Apify API reference.
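
A minimal sketch of calling the Get dataset items endpoint directly over HTTP. The helper names `buildDatasetItemsUrl` and `fetchDatasetItems` are hypothetical, the dataset ID and API token are placeholders you must supply, and the code assumes Node.js 18 or newer for the global `fetch`.

```javascript
// Build the URL of the "Get dataset items" endpoint for a given dataset.
function buildDatasetItemsUrl(datasetId, { format = 'json', clean = true, limit } = {}) {
    const url = new URL(`https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`);
    url.searchParams.set('format', format);
    if (clean) url.searchParams.set('clean', 'true');
    if (limit !== undefined) url.searchParams.set('limit', String(limit));
    return url.href;
}

// Download the items; the token is sent as a Bearer token in the Authorization header.
async function fetchDatasetItems(datasetId, token, options) {
    const response = await fetch(buildDatasetItemsUrl(datasetId, options), {
        headers: { Authorization: `Bearer ${token}` },
    });
    if (!response.ok) {
        throw new Error(`Apify API responded with HTTP ${response.status}`);
    }
    // Parsing as JSON is only appropriate when `format` is 'json'.
    return response.json();
}
```

For example, `await fetchDatasetItems(datasetId, process.env.APIFY_TOKEN, { limit: 100 })` would request the first 100 items, assuming the token is stored in an `APIFY_TOKEN` environment variable.
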
+## Integrations
+
+Cheerio Scraper can be connected with almost any cloud service or web app thanks to [integrations on the Apify platform](https://apify.com/integrations). You can integrate with Make, Zapier, ChatGPT, Slack, Airbyte, GitHub, Google Sheets, Asana, Google Drive, Keboola, MCP Servers, and more.
+
+You can also use [webhooks](https://docs.apify.com/integrations/webhooks) to carry out an action whenever an event occurs, e.g., get a notification whenever a Cheerio Scraper run successfully finishes.
+
+## FAQ
+
+### How do I build a page function?
+
+The fastest way is the step-by-step [**Scraping with Cheerio Scraper**](https://docs.apify.com/academy/apify-scrapers/cheerio-scraper) tutorial in the Apify Academy. It walks you through selecting elements with Cheerio, returning data, and following links.
+
+If you'd rather skip the page function entirely, try the [**AI Web Scraper**](https://apify.com/apify/ai-web-scraper) — you describe what to extract in plain English and the Actor handles the rest.
+
+### When should I use Puppeteer or Playwright instead of Cheerio?
+
+Use **Cheerio Scraper** for static HTML — it's faster and cheaper because no browser is involved. Cheerio only sees the raw HTML response, so it can't reach content rendered by client-side JavaScript (single-page apps, infinite scroll, lazy-loaded content). Puppeteer- and Playwright-based scrapers run a real browser, so they handle dynamic content, click and scroll interactions, and login flows that Cheerio can't.
+
+The two libraries are similar; the main difference is browser support. **Puppeteer** primarily targets Chrome and Chromium, while **Playwright** also supports Firefox and WebKit. On Apify, you can choose:
+
+- [**Web Scraper**](https://apify.com/apify/web-scraper) — the simplest browser-based scraper, runs in the browser context, uses Puppeteer under the hood.
+- [**Puppeteer Scraper**](https://apify.com/apify/puppeteer-scraper) — lower-level control over the Puppeteer library.
+- [**Playwright Scraper**](https://apify.com/apify/playwright-scraper) — same idea, with Playwright.
+
+### Can I build my own Actor with Cheerio?
+
+Yes. The Cheerio Scraper Actor is open source — [view the source on Apify](https://apify.com/apify/cheerio-scraper/source-code) or [fork it on GitHub](https://github.com/apify/actor-scraper/tree/master/packages/actor-scraper/cheerio-scraper) to adjust it to your needs. Or build a custom Actor from scratch — start from one of the [Cheerio-based Apify Actor templates](https://apify.com/templates?search=cheerio) and use Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler) for full control over the crawl, with [Cheerio](https://cheerio.js.org)'s parsing API and Apify's platform features.
+
+### Can I export Cheerio Scraper data using the Apify API?
+
+Yes. The Apify API gives you programmatic access to your runs and datasets. To access the API using Node.js, use the `apify-client` [npm package](https://apify.com/apify/cheerio-scraper/api/javascript). To access the API using Python, use the `apify-client` [PyPI package](https://apify.com/apify/cheerio-scraper/api/python). Check out the [Apify API reference](https://docs.apify.com/api/v2) docs or click on the [API tab](https://apify.com/apify/cheerio-scraper/api) for code examples.
+
+### Can I use Cheerio Scraper through an MCP server?
+
+Yes. With Apify's [MCP server](https://apify.com/apify/cheerio-scraper/api/mcp) you can run Cheerio Scraper inside AI agent workflows from clients like Claude Desktop and LibreChat, or build your own. See the [MCP tab](https://apify.com/apify/cheerio-scraper/api/mcp) for setup details.
+
+### Do I need proxies to use Cheerio Scraper?
+
+You usually do, especially for sites with anti-scraping protections. Cheerio Scraper integrates with [Apify Proxy](https://apify.com/proxy): datacenter proxies are included in the Free plan; residential proxies are available on paid plans. Configure them under [**Proxy configuration**](#proxy-configuration).
+
+### Is it legal to scrape with Cheerio Scraper?
+
+Cheerio Scraper extracts whatever the target site serves over public HTTP — your responsibility is to scrape ethically and respect the site's terms of service, `robots.txt`, and applicable law. You should not scrape personal data unless you have a legitimate reason to do so. Read more on the [legality of web scraping](https://blog.apify.com/is-web-scraping-legal/) and [ethical scraping](https://blog.apify.com/what-is-ethical-web-scraping-and-how-do-you-do-it/).
+
+### What if Cheerio Scraper is not working?
+
+We're always working on improving the performance of our Actors. If you've got technical feedback or found a bug, please create an issue on the Actor's [Issues tab](https://apify.com/apify/cheerio-scraper/issues/open).
+
## Additional resources
Congratulations! You've learned how Cheerio Scraper works.
@@ -579,11 +653,6 @@ You might also want to see these other resources:
Apify's basic tool for web crawling and scraping. It uses a full Chrome browser to render dynamic content.
A similar web scraping Actor to Puppeteer Scraper, but is simpler to use and only runs in the context of the browser.
Uses the [Puppeteer](https://github.com/GoogleChrome/puppeteer) library.
-- **Puppeteer Scraper** ([apify/puppeteer-scraper](https://apify.com/apify/puppeteer-scraper)) -
- An Actor similar to Web Scraper, which provides lower-level control of the underlying
- [Puppeteer](https://github.com/GoogleChrome/puppeteer) library and the ability to use server-side libraries.
-- **Playwright Scraper** ([apify/playwright-scraper](https://apify.com/apify/playwright-scraper)) -
- A similar web scraping Actor to Puppeteer Scraper, but using the [Playwright](https://github.com/microsoft/playwright) library instead.
- [Actors documentation](https://docs.apify.com/actors) -
Documentation for the Apify Actors cloud computing platform.
- [Apify SDK documentation](https://sdk.apify.com) - Learn more about the tools required to run your own Apify Actors.