From a908b730fd997aabfc4d1b52317a3a861462e5cb Mon Sep 17 00:00:00 2001
From: Marcel Rebro <marcel.rebro@apify.com>
Date: Wed, 6 May 2026 19:00:34 +0200
Subject: [PATCH 1/8] docs(cheerio-scraper): rewrite README per PMM brief
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Implements items #2, #3, #4, #5, #6, #7 from Fabian's PMM brief:

- Adds an inline redirect for non-technical users (AI Web Scraper) and
  a link to the Academy tutorial.
- Rewrites "Cost of usage" with two sample test runs against
  docs.apify.com (light vs heavier page function) and a clear caveat
  that exact cost depends on site complexity, page function, link
  graph, proxy, and memory.
- Moves "Content types" under "Input configuration" (right before the
  page function section).
- Adds AI Web Scraper mention to "Limitations".
- Adds an "Integrations" section (Zapier, Make, Apify API).
- Adds an "FAQ" section (page function, Puppeteer vs Cheerio,
  Playwright vs Cheerio, build your own with Crawlee).

Items #1 (new "What is Cheerio Scraper?" H2 framing) and #8
(janitorial — Node.js version staleness, Web Scraper "Puppeteer-only"
description) are still pending SME input and intentionally not in this
draft.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../actor-scraper/cheerio-scraper/README.md   | 103 +++++++++++++-----
 1 file changed, 74 insertions(+), 29 deletions(-)

diff --git a/packages/actor-scraper/cheerio-scraper/README.md b/packages/actor-scraper/cheerio-scraper/README.md
index 8370736d..137d3f18 100644
--- a/packages/actor-scraper/cheerio-scraper/README.md
+++ b/packages/actor-scraper/cheerio-scraper/README.md
@@ -5,12 +5,24 @@ browser but instead constructs a DOM from an HTML string. It then provides the u
 
 Cheerio Scraper is ideal for scraping web pages that do not rely on client-side JavaScript to serve their content and can be up to 20 times faster than using a full-browser solution such as Puppeteer.
 
-If you're unfamiliar with web scraping or web development in general,
-you might prefer to start with [**Scraping with Web Scraper**](https://docs.apify.com/tutorials/apify-scrapers/web-scraper) tutorial from the Apify documentation and then continue with [**Scraping with Cheerio Scraper**](https://docs.apify.com/tutorials/apify-scrapers/cheerio-scraper), a tutorial which will walk you through all the steps and provide a number of examples.
+Cheerio Scraper is built for technical users comfortable with [jQuery](https://jquery.com) and [Cheerio](https://cheerio.js.org). If you're not a developer, you'll likely have a better experience with [**AI Web Scraper**](https://apify.com/apify/ai-web-scraper) — describe what you want to extract in plain English, no page function required. If you'd like to learn how Cheerio Scraper works step by step, follow the [**Scraping with Cheerio Scraper**](https://docs.apify.com/academy/apify-scrapers/cheerio-scraper) tutorial in the Apify Academy.
 
 ## Cost of usage
 
-You can find the average usage cost for this Actor on the [pricing page](https://apify.com/pricing) under the `Which plan do I need?` section. Cheerio Scraper is equivalent to `Simple HTML pages` while Web Scraper, Puppeteer Scraper and Playwright Scraper are equivalent to `Full web pages`. These cost estimates are based on averages and might be lower or higher depending on how heavy the pages you scrape are.
+Cheerio Scraper is billed by [platform usage](https://apify.com/pricing) (compute units, storage operations, data transfer) rather than a flat per-result fee, so the exact cost of a run is hard to predict. It depends on **how many pages you crawl**, **how rich your page function is**, **how many links each page produces**, **page size**, **proxy choice**, and **memory allocation**. Treat the numbers below as illustrative samples, not a guaranteed price — for your own use case, run a small test first and extrapolate.
+
+For a quick orientation, the [pricing page](https://apify.com/pricing) lists average estimates under `Which plan do I need?`. Cheerio Scraper is equivalent to `Simple HTML pages`; Web Scraper, Puppeteer Scraper and Playwright Scraper are equivalent to `Full web pages`.
+
+### Sample runs
+
+Both samples below crawled the same site ([`docs.apify.com`](https://docs.apify.com)) on default settings (1024 MB memory, Apify Proxy, concurrency 50). They differ in page-function complexity and crawl size.
+
+| Sample                                                                | Pages | Page function                     |    Runtime | Compute units | Total cost |
+|-----------------------------------------------------------------------|------:|-----------------------------------|-----------:|--------------:|-----------:|
+| Lightweight (title, h1, meta description)                             |   237 | 4 selectors                       | 3 min 15 s |      0.054 CU | **$0.024** |
+| Heavier (all h2/h3, internal link list, code-block count, word count) |   485 | 8+ selectors plus body word count | 6 min 38 s |      0.111 CU | **$0.048** |
+
+Both samples worked out to roughly **$0.0001 per result** (~$0.10 per 1,000 results) on this site. Cost is dominated by compute units (~46%) and request-queue writes (~40%). On heavier sites — large pages, long link graphs, residential proxy, slower responses — the per-result figure can be several times higher, so use these numbers as a starting point only.
 
 ## Usage
 
@@ -36,32 +48,6 @@ Cheerio Scraper has a number of advanced configuration settings to improve perfo
 Under the hood, Cheerio Scraper is built using the [`CheerioCrawler`](https://crawlee.dev/api/cheerio-crawler/class/CheerioCrawler) class
 from Crawlee. If you'd like to learn more about the inner workings of the scraper, see the respective documentation.
 
-## Content types
-
-By default, Cheerio Scraper only processes web pages with the `text/html`, `application/json`, `application/xml`, `application/xhtml+xml` MIME content types (as reported by the `Content-Type` HTTP header),
-and skips pages with other content types.
-If you want the crawler to process other content types,
-use the **Additional MIME types** (`additionalMimeTypes`) input option.
-
-Note that while the default `Accept` HTTP header will allow any content type to be received,
-HTML and XML are preferred over JSON and other types. Thus, if you're allowing additional MIME
-types, and you're still receiving invalid responses, be sure to override the `Accept`
-HTTP header setting in the requests from the scraper,
-either in [**Start URLs**](#start-urls), [**Pseudo URLs**](#pseudo-urls) or in the **Prepare request function**.
-
-The web pages with various content types are parsed differently and
-thus the `context` parameter of the [**Page function**](#page-function) will have different values:
-
-| **Content types**                                       | [`context.body`](#body-stringbuffer) | [`context.$`](#-function) | [`context.json`](#json-object) |
-| ------------------------------------------------------- | ------------------------------------ | ------------------------- | ------------------------------ |
-| `text/html`, `application/xhtml+xml`, `application/xml` | `String`                             | `Function`                | `null`                         |
-| `application/json`                                      | `String`                             | `null`                    | `Object`                       |
-| Other                                                   | `Buffer`                             | `null`                    | `null`                         |
-
-The `Content-Type` HTTP header of the web page is parsed using the
-<a href="https://www.npmjs.com/package/content-type" target="_blank">content-type</a> NPM package
-and the result is stored in the [`context.contentType`](#contenttype-object) object.
-
 ## Limitations
 
 The Actor does not employ a full-featured web browser such as Chromium or Firefox, so it will not be sufficient for web pages that render their content dynamically using client-side JavaScript. To scrape such sites, you might prefer to use [**Web Scraper**](https://apify.com/apify/web-scraper) (`apify/web-scraper`), which loads pages in a full browser and renders dynamic content.
@@ -75,6 +61,8 @@ If you require other modules for your scraping, you'll need to develop a complet
 You can use the [`CheerioCrawler`](https://crawlee.dev/api/cheerio-crawler/class/CheerioCrawler) class
 from Crawlee to get most of the functionality of Cheerio Scraper out of the box.
 
+Don't know how to code a page function? The [**AI Web Scraper**](https://apify.com/apify/ai-web-scraper) lets you describe what to extract in plain English instead — no JavaScript required.
+
 ## Input configuration
 
 As input, Cheerio Scraper Actor accepts a number of configurations. These can be entered either manually in the user interface in [Apify Console](https://console.apify.com), or programmatically in a JSON object using the [Apify API](https://apify.com/docs/api/v2#/reference/actors/run-collection/run-actor). For a complete list of input fields and their types, please visit the [Input](https://apify.com/apify/cheerio-scraper/input-schema) tab.
@@ -154,6 +142,33 @@ Note that you don't need to use the **Pseudo-URLs** setting at all,
 because you can completely control which pages the scraper will access by calling `await context.enqueueRequest()`
 from the **[Page function](#page-function)**.
 
+### Content types
+
+By default, Cheerio Scraper only processes web pages with the `text/html`, `application/json`, `application/xml`, `application/xhtml+xml` MIME content types (as reported by the `Content-Type` HTTP header),
+and skips pages with other content types. This is an edge-case setting — most users won't need to change it. The most common reason to do so is when paginating through endpoints that return non-default content types (for example, a JSON API that drives the listing pages).
+
+If you want the crawler to process other content types,
+use the **Additional MIME types** (`additionalMimeTypes`) input option.
+
+Note that while the default `Accept` HTTP header will allow any content type to be received,
+HTML and XML are preferred over JSON and other types. Thus, if you're allowing additional MIME
+types, and you're still receiving invalid responses, be sure to override the `Accept`
+HTTP header setting in the requests from the scraper,
+either in [**Start URLs**](#start-urls), [**Pseudo URLs**](#pseudo-urls) or in the **Prepare request function**.
+
+The web pages with various content types are parsed differently and
+thus the `context` parameter of the [**Page function**](#page-function) will have different values:
+
+| **Content types**                                       | [`context.body`](#body-stringbuffer) | [`context.$`](#-function) | [`context.json`](#json-object) |
+| ------------------------------------------------------- | ------------------------------------ | ------------------------- | ------------------------------ |
+| `text/html`, `application/xhtml+xml`, `application/xml` | `String`                             | `Function`                | `null`                         |
+| `application/json`                                      | `String`                             | `null`                    | `Object`                       |
+| Other                                                   | `Buffer`                             | `null`                    | `null`                         |
+
+The `Content-Type` HTTP header of the web page is parsed using the
+<a href="https://www.npmjs.com/package/content-type" target="_blank">content-type</a> NPM package
+and the result is stored in the [`context.contentType`](#contenttype-object) object.
+
 ### Page function
 
 The **Page function** (`pageFunction`) field contains a single JavaScript function that enables the user to extract data from the web page, access its DOM, add new URLs to the request queue, and otherwise control Cheerio Scraper's operation.
@@ -566,6 +581,36 @@ For more information, see [Datasets](https://docs.apify.com/storage#dataset) in
 or the [Get dataset items](https://docs.apify.com/api/v2#/reference/datasets/item-collection)
 endpoint in Apify API reference.
 
+## Integrations
+
+Cheerio Scraper plugs into the rest of your stack through Apify's integrations layer. The most common ways to wire it up:
+
+- **[Zapier](https://apify.com/integrations/zapier)** — trigger runs and route scraped data to thousands of Zapier-compatible apps (Google Sheets, Airtable, Slack, HubSpot, and more) without writing code.
+- **[Make](https://apify.com/integrations/make)** — build no-code automations that start a Cheerio Scraper run, transform the dataset, and forward results to other services.
+- **[Apify API](https://docs.apify.com/api/v2)** — call the Actor programmatically, pass input as JSON, and pull results from the dataset. Ideal for embedding scraping into your own backend.
+
+For the full list, see [Apify integrations](https://docs.apify.com/platform/integrations).
+
+## FAQ
+
+### How do I build a page function?
+
+The fastest way is the step-by-step [**Scraping with Cheerio Scraper**](https://docs.apify.com/academy/apify-scrapers/cheerio-scraper) tutorial in the Apify Academy. It walks you through selecting elements with Cheerio, returning data, and following links.
+
+If you'd rather skip the page function entirely, try the [**AI Web Scraper**](https://apify.com/apify/ai-web-scraper) — you describe what to extract in plain English and the Actor handles the rest.
+
+### When should I use Puppeteer Scraper instead of Cheerio Scraper?
+
+Use **Cheerio Scraper** for static HTML pages — it's faster and cheaper because it doesn't run a browser. Use [**Puppeteer Scraper**](https://apify.com/apify/puppeteer-scraper) (or the simpler [**Web Scraper**](https://apify.com/apify/web-scraper)) when the content you need is rendered by client-side JavaScript and isn't present in the raw HTML response.
+
+### When should I use Playwright Scraper instead of Cheerio Scraper?
+
+Same trade-off as above: if the page needs a real browser to render its content, reach for [**Playwright Scraper**](https://apify.com/apify/playwright-scraper). Playwright also has stronger support for Firefox and WebKit than Puppeteer if your target site behaves differently across browsers.
+
+### Can I build my own Actor with Cheerio?
+
+Yes. Cheerio Scraper is open source — see the [source on GitHub](https://github.com/apify/actor-scraper/tree/master/packages/actor-scraper/cheerio-scraper). To build a custom Actor with the same engine, use Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler) class directly — you get full control over the crawl while keeping Cheerio's parsing and Apify's platform features.
+
 ## Additional resources
 
 Congratulations! You've learned how Cheerio Scraper works.

From 36fb789729be9aa829bd5b95511fd3816631ef3f Mon Sep 17 00:00:00 2001
From: Marcel Rebro <marcel.rebro@apify.com>
Date: Wed, 6 May 2026 19:07:23 +0200
Subject: [PATCH 2/8] docs(cheerio-scraper): align README with Apify-owned
 house pattern

Applies precedents established in Marcel's prior Actor README updates
(Facebook Groups/Reviews/Posts, Instagram Comment Scraper):

- Rewrites "What is Cheerio Scraper?" as a single H2 with a short
  lead, emoji feature bullets, and a closing audience/use-case
  paragraph (replaces the previous multi-paragraph academic intro,
  while keeping the technical-audience framing and the inline
  AI Web Scraper redirect).
- Moves the top-level "Integrations" H2 into the FAQ as a
  sub-question with the standard service list and webhooks line,
  matching the house pattern.
- Adds the standard FAQ sub-questions used across all Apify-owned
  scraper READMEs: API access, MCP server, proxies, legality, and
  "not working?".

Items #1 (overlap with existing "What is Cheerio Scraper?" framing)
and #5 (trailing "What is Cheerio web scraper?" mention) are
reconciled as a single section, per house pattern. Items #1 and #8
janitorial cleanups are still pending SME input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../actor-scraper/cheerio-scraper/README.md   | 57 ++++++++++++++-----
 1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/packages/actor-scraper/cheerio-scraper/README.md b/packages/actor-scraper/cheerio-scraper/README.md
index 137d3f18..197243b5 100644
--- a/packages/actor-scraper/cheerio-scraper/README.md
+++ b/packages/actor-scraper/cheerio-scraper/README.md
@@ -1,11 +1,22 @@
-Cheerio Scraper is a ready-made solution for crawling websites using plain HTTP requests. It retrieves the HTML pages, parses them using the [Cheerio](https://cheerio.js.org) Node.js library and lets you extract any data from them. Fast.
+## What is Cheerio Scraper?
 
-Cheerio is a server-side version of the popular [jQuery](https://jquery.com) library. It does not require a
-browser but instead constructs a DOM from an HTML string. It then provides the user an API to work with that DOM.
+Cheerio Scraper is a fast, server-side scraper that pulls plain HTML over HTTP and parses it with [Cheerio](https://cheerio.js.org), the server-side equivalent of [jQuery](https://jquery.com). No browser, no client-side JavaScript: just the raw HTML response and a familiar selector API. With Cheerio Scraper, you can:
 
-Cheerio Scraper is ideal for scraping web pages that do not rely on client-side JavaScript to serve their content and can be up to 20 times faster than using a full-browser solution such as Puppeteer.
+⚡ Run **up to 20× faster** than full-browser scrapers — no Chrome to spin up
 
-Cheerio Scraper is built for technical users comfortable with [jQuery](https://jquery.com) and [Cheerio](https://cheerio.js.org). If you're not a developer, you'll likely have a better experience with [**AI Web Scraper**](https://apify.com/apify/ai-web-scraper) — describe what you want to extract in plain English, no page function required. If you'd like to learn how Cheerio Scraper works step by step, follow the [**Scraping with Cheerio Scraper**](https://docs.apify.com/academy/apify-scrapers/cheerio-scraper) tutorial in the Apify Academy.
+🧩 Use **jQuery-style selectors** via [Cheerio](https://cheerio.js.org) to extract any data from the parsed DOM
+
+🔗 **Crawl recursively** with Link selector, Glob Patterns, and Pseudo-URLs — pagination, sitemaps, full-site crawls
+
+🛠 Write a **custom page function** in JavaScript with full access to Cheerio, the request, the response, the dataset, and the request queue
+
+📦 Export results as **JSON, CSV, XML, Excel, or HTML**, or pull them via the [Apify API](https://docs.apify.com/api/v2)
+
+🔌 Plug into **Make, Zapier, webhooks, MCP servers**, and the rest of [Apify's integrations](https://apify.com/integrations)
+
+🪪 **Open source** — see the [source on GitHub](https://github.com/apify/actor-scraper/tree/master/packages/actor-scraper/cheerio-scraper), or build your own with Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler)
+
+Cheerio Scraper is built for technical users comfortable with [jQuery](https://jquery.com) and Cheerio. If you're not a developer, you'll likely have a better experience with [**AI Web Scraper**](https://apify.com/apify/ai-web-scraper) — describe what you want to extract in plain English, no page function required. To learn how Cheerio Scraper works step by step, follow the [**Scraping with Cheerio Scraper**](https://docs.apify.com/academy/apify-scrapers/cheerio-scraper) tutorial in the Apify Academy.
 
 ## Cost of usage
 
@@ -581,16 +592,6 @@ For more information, see [Datasets](https://docs.apify.com/storage#dataset) in
 or the [Get dataset items](https://docs.apify.com/api/v2#/reference/datasets/item-collection)
 endpoint in Apify API reference.
 
-## Integrations
-
-Cheerio Scraper plugs into the rest of your stack through Apify's integrations layer. The most common ways to wire it up:
-
-- **[Zapier](https://apify.com/integrations/zapier)** — trigger runs and route scraped data to thousands of Zapier-compatible apps (Google Sheets, Airtable, Slack, HubSpot, and more) without writing code.
-- **[Make](https://apify.com/integrations/make)** — build no-code automations that start a Cheerio Scraper run, transform the dataset, and forward results to other services.
-- **[Apify API](https://docs.apify.com/api/v2)** — call the Actor programmatically, pass input as JSON, and pull results from the dataset. Ideal for embedding scraping into your own backend.
-
-For the full list, see [Apify integrations](https://docs.apify.com/platform/integrations).
-
 ## FAQ
 
 ### How do I build a page function?
@@ -611,6 +612,32 @@ Same trade-off as above: if the page needs a real browser to render its content,
 
 Yes. Cheerio Scraper is open source — see the [source on GitHub](https://github.com/apify/actor-scraper/tree/master/packages/actor-scraper/cheerio-scraper). To build a custom Actor with the same engine, use Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler) class directly — you get full control over the crawl while keeping Cheerio's parsing and Apify's platform features.
 
+### Can I export Cheerio Scraper data using the Apify API?
+
+Yes. The Apify API gives you programmatic access to your runs and datasets. To access the API using Node.js, use the `apify-client` [NPM package](https://apify.com/apify/cheerio-scraper/api/javascript). To access the API using Python, use the `apify-client` [PyPI package](https://apify.com/apify/cheerio-scraper/api/python). Check out the [Apify API reference](https://docs.apify.com/api/v2) docs or click on the [API tab](https://apify.com/apify/cheerio-scraper/api) for code examples.
+
+### Can I use Cheerio Scraper through an MCP server?
+
+Yes. With Apify's [MCP server](https://apify.com/apify/cheerio-scraper/api/mcp) you can run Cheerio Scraper inside AI agent workflows from clients like Claude Desktop and LibreChat, or build your own. See the [MCP tab](https://apify.com/apify/cheerio-scraper/api/mcp) for setup details.
+
+### Do I need proxies to use Cheerio Scraper?
+
+You usually do, especially for sites with anti-scraping protections. Cheerio Scraper integrates with [Apify Proxy](https://apify.com/proxy): datacenter proxies are included in the Free plan; residential proxies are available on paid plans. Configure them under [**Proxy configuration**](#proxy-configuration).
+
+### Can I integrate Cheerio Scraper with other apps?
+
+Yes. Cheerio Scraper can be connected with almost any cloud service or web app thanks to [integrations on the Apify platform](https://apify.com/integrations). You can integrate with Make, Zapier, ChatGPT, Slack, Airbyte, GitHub, Google Sheets, Asana, Google Drive, Keboola, MCP Servers, and more.
+
+You can also use [webhooks](https://docs.apify.com/integrations/webhooks) to carry out an action whenever an event occurs, e.g., get a notification whenever a Cheerio Scraper run successfully finishes.
+
+### Is it legal to scrape with Cheerio Scraper?
+
+Cheerio Scraper extracts whatever the target site serves over public HTTP — your responsibility is to scrape ethically and respect the site's terms of service, `robots.txt`, and applicable law. You should not scrape personal data unless you have a legitimate reason to do so. Read more on the [legality of web scraping](https://blog.apify.com/is-web-scraping-legal/) and [ethical scraping](https://blog.apify.com/what-is-ethical-web-scraping-and-how-do-you-do-it/).
+
+### Cheerio Scraper is not working?
+
+We're always working on improving the performance of our Actors. If you've got technical feedback or found a bug, please create an issue on the Actor's [Issues tab](https://apify.com/apify/cheerio-scraper/issues/open).
+
 ## Additional resources
 
 Congratulations! You've learned how Cheerio Scraper works.

From d2506992b219c4c0e6228faa447a62b9b8658513 Mon Sep 17 00:00:00 2001
From: Marcel Rebro <marcel.rebro@apify.com>
Date: Wed, 6 May 2026 19:12:39 +0200
Subject: [PATCH 3/8] docs(cheerio-scraper): describe request queue and
 uniqueKey behavior
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replaces the long-standing "TODO: Describe how the queue works,
unique key etc. plus link" placeholder with a short paragraph that
covers what the request queue is, how uniqueKey deduplication works,
the URL-fragment stripping default, and how to override uniqueKey
from enqueueRequest() in the page function.

The second pre-existing TODO (lines 425-426 about
prepareRequestFunction) is intentionally left in place — it depends
on whether the feature still exists in the current build, which is
a question for the dev team.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 packages/actor-scraper/cheerio-scraper/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/packages/actor-scraper/cheerio-scraper/README.md b/packages/actor-scraper/cheerio-scraper/README.md
index 197243b5..1f74fcb5 100644
--- a/packages/actor-scraper/cheerio-scraper/README.md
+++ b/packages/actor-scraper/cheerio-scraper/README.md
@@ -90,7 +90,7 @@ Optionally, each URL can be associated with custom user data - a JSON object tha
 your JavaScript code in the [**Page function**](#page-function) under `context.request.userData`.
 This is useful for determining which start URL is currently loaded, in order to perform some page-specific actions. For example, when crawling an online store, you might want to perform different actions on a page listing the products vs. a product detail page. For details, see the [**Web scraping tutorial**](https://docs.apify.com/tutorials/apify-scrapers/getting-started#the-start-url) in the Apify documentation.
 
-<!-- TODO: Describe how the queue works, unique key etc. plus link -->
+Cheerio Scraper uses an Apify [request queue](https://docs.apify.com/platform/storage/request-queue) to track the URLs it has loaded and the URLs it still needs to load. Each request is identified by a `uniqueKey` — by default the request URL, with the URL fragment (`#...`) stripped unless the **Keep URL fragments** option is enabled. Requests whose `uniqueKey` has already been seen are skipped, so the same page isn't loaded twice. You can override `uniqueKey` per request when calling `context.enqueueRequest()` from the page function — useful when you need to scrape the same URL multiple times with different `userData`.
 
 ### Link selector
 

From a8b8d6c841dbec05a7ba431adf2527dd72bd76cc Mon Sep 17 00:00:00 2001
From: Marcel Rebro <marcel.rebro@apify.com>
Date: Wed, 6 May 2026 19:22:30 +0200
Subject: [PATCH 4/8] docs(cheerio-scraper): promote Integrations to top-level
 section

Moves the integrations content out of the FAQ ("Can I integrate
Cheerio Scraper with other apps?") and into a dedicated
`## Integrations` H2 placed between `## Results` and `## FAQ`. Same
copy, just promoted from a sub-question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 packages/actor-scraper/cheerio-scraper/README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/packages/actor-scraper/cheerio-scraper/README.md b/packages/actor-scraper/cheerio-scraper/README.md
index 1f74fcb5..b1181ae8 100644
--- a/packages/actor-scraper/cheerio-scraper/README.md
+++ b/packages/actor-scraper/cheerio-scraper/README.md
@@ -592,6 +592,12 @@ For more information, see [Datasets](https://docs.apify.com/storage#dataset) in
 or the [Get dataset items](https://docs.apify.com/api/v2#/reference/datasets/item-collection)
 endpoint in Apify API reference.
 
+## Integrations
+
+Cheerio Scraper can be connected with almost any cloud service or web app thanks to [integrations on the Apify platform](https://apify.com/integrations). You can integrate with Make, Zapier, ChatGPT, Slack, Airbyte, GitHub, Google Sheets, Asana, Google Drive, Keboola, MCP Servers, and more.
+
+You can also use [webhooks](https://docs.apify.com/integrations/webhooks) to carry out an action whenever an event occurs, e.g., get a notification whenever a Cheerio Scraper run successfully finishes.
+
 ## FAQ
 
 ### How do I build a page function?
@@ -624,12 +630,6 @@ Yes. With Apify's [MCP server](https://apify.com/apify/cheerio-scraper/api/mcp)
 
 You usually do, especially for sites with anti-scraping protections. Cheerio Scraper integrates with [Apify Proxy](https://apify.com/proxy): datacenter proxies are included in the Free plan; residential proxies are available on paid plans. Configure them under [**Proxy configuration**](#proxy-configuration).
 
-### Can I integrate Cheerio Scraper with other apps?
-
-Yes. Cheerio Scraper can be connected with almost any cloud service or web app thanks to [integrations on the Apify platform](https://apify.com/integrations). You can integrate with Make, Zapier, ChatGPT, Slack, Airbyte, GitHub, Google Sheets, Asana, Google Drive, Keboola, MCP Servers, and more.
-
-You can also use [webhooks](https://docs.apify.com/integrations/webhooks) to carry out an action whenever an event occurs, e.g., get a notification whenever a Cheerio Scraper run successfully finishes.
-
 ### Is it legal to scrape with Cheerio Scraper?
 
 Cheerio Scraper extracts whatever the target site serves over public HTTP — your responsibility is to scrape ethically and respect the site's terms of service, `robots.txt`, and applicable law. You should not scrape personal data unless you have a legitimate reason to do so. Read more on the [legality of web scraping](https://blog.apify.com/is-web-scraping-legal/) and [ethical scraping](https://blog.apify.com/what-is-ethical-web-scraping-and-how-do-you-do-it/).

From 625cb5c636c068e2881826eb9426cf5ec0220b9a Mon Sep 17 00:00:00 2001
From: Marcel Rebro <marcel.rebro@apify.com>
Date: Wed, 6 May 2026 19:39:09 +0200
Subject: [PATCH 5/8] docs(cheerio-scraper): apply brief's literal FAQ Q2/Q3
 link targets
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Realigns FAQ Q2 ("Puppeteer vs Cheerio?") and Q3 ("Playwright vs
Cheerio?") with the PMM brief's literal instructions, now that the
Actor/library relationship is verified:

- Web Scraper uses Puppeteer (confirmed via package.json + README
  internal references in
  repos/actor-scraper/packages/actor-scraper/web-scraper/), so the
  brief's instruction to link Web Scraper for the "Puppeteer vs
  Cheerio?" question is internally consistent.
- Q2 now links Web Scraper as the primary on-ramp (per brief), with
  Puppeteer Scraper mentioned as the lower-level option.
- Headings switched to library-level phrasing ("Puppeteer instead of
  Cheerio") to match the brief's wording for both Q2 and Q3.
- Q3 simplified — there's only one Playwright-based Actor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 packages/actor-scraper/cheerio-scraper/README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/packages/actor-scraper/cheerio-scraper/README.md b/packages/actor-scraper/cheerio-scraper/README.md
index b1181ae8..cc82965c 100644
--- a/packages/actor-scraper/cheerio-scraper/README.md
+++ b/packages/actor-scraper/cheerio-scraper/README.md
@@ -606,13 +606,13 @@ The fastest way is the step-by-step [**Scraping with Cheerio Scraper**](https://
 
 If you'd rather skip the page function entirely, try the [**AI Web Scraper**](https://apify.com/apify/ai-web-scraper) — you describe what to extract in plain English and the Actor handles the rest.
 
-### When should I use Puppeteer Scraper instead of Cheerio Scraper?
+### When should I use Puppeteer instead of Cheerio?
 
-Use **Cheerio Scraper** for static HTML pages — it's faster and cheaper because it doesn't run a browser. Use [**Puppeteer Scraper**](https://apify.com/apify/puppeteer-scraper) (or the simpler [**Web Scraper**](https://apify.com/apify/web-scraper)) when the content you need is rendered by client-side JavaScript and isn't present in the raw HTML response.
+Use **Cheerio Scraper** for static HTML — it's faster and cheaper because no browser is involved. When the content needs client-side JavaScript to render, reach for a Puppeteer-based browser scraper: [**Web Scraper**](https://apify.com/apify/web-scraper) is the easy on-ramp (simpler API, runs in the browser context), while [**Puppeteer Scraper**](https://apify.com/apify/puppeteer-scraper) gives you lower-level control over the Puppeteer library directly.
 
-### When should I use Playwright Scraper instead of Cheerio Scraper?
+### When should I use Playwright instead of Cheerio?
 
-Same trade-off as above: if the page needs a real browser to render its content, reach for [**Playwright Scraper**](https://apify.com/apify/playwright-scraper). Playwright also has stronger support for Firefox and WebKit than Puppeteer if your target site behaves differently across browsers.
+If the page needs a real browser to render its content — and you want stronger support for Firefox and WebKit than Puppeteer offers — reach for [**Playwright Scraper**](https://apify.com/apify/playwright-scraper).
 
 ### Can I build my own Actor with Cheerio?
 

From 149aa09936f0d94c545b22ff017380f60ad141f9 Mon Sep 17 00:00:00 2001
From: Marcel Rebro <marcel.rebro@apify.com>
Date: Wed, 6 May 2026 19:43:28 +0200
Subject: [PATCH 6/8] docs(cheerio-scraper): link Cheerio (not Cheerio Scraper
 source) in FAQ Q4
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The FAQ question "Can I build my own Actor with Cheerio?" is asking
about building a custom Actor on top of the Cheerio library — not
about forking Cheerio Scraper. Swap the Cheerio Scraper source link
for a link to cheerio.js.org so the answer points at what the user
actually needs. Cheerio Scraper's open-source link is still
referenced in the intro section's "Open source" bullet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 packages/actor-scraper/cheerio-scraper/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/packages/actor-scraper/cheerio-scraper/README.md b/packages/actor-scraper/cheerio-scraper/README.md
index cc82965c..6124e5dc 100644
--- a/packages/actor-scraper/cheerio-scraper/README.md
+++ b/packages/actor-scraper/cheerio-scraper/README.md
@@ -616,7 +616,7 @@ If the page needs a real browser to render its content — and you want stronger
 
 ### Can I build my own Actor with Cheerio?
 
-Yes. Cheerio Scraper is open source — see the [source on GitHub](https://github.com/apify/actor-scraper/tree/master/packages/actor-scraper/cheerio-scraper). To build a custom Actor with the same engine, use Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler) class directly — you get full control over the crawl while keeping Cheerio's parsing and Apify's platform features.
+Yes. Build a custom Actor on top of [Cheerio](https://cheerio.js.org) using Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler) class — you get full control over the crawl while keeping Cheerio's parsing API and Apify's platform features.
 
 ### Can I export Cheerio Scraper data using the Apify API?
 

From 53e90a3c4eb814ab2f3a0c6ab18e706006367d33 Mon Sep 17 00:00:00 2001
From: Marcel Rebro <marcel.rebro@apify.com>
Date: Thu, 7 May 2026 13:06:49 +0200
Subject: [PATCH 7/8] docs(cheerio-scraper): mention forking the Actor in FAQ
 Q4

Fabian flagged that the open-source/fork path matters for users
asking "Can I build my own Actor with Cheerio?". Restores the link
to Cheerio Scraper's source as the primary "fork and adjust" path,
keeping the build-from-scratch route (Crawlee + Cheerio) as the
secondary option.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 packages/actor-scraper/cheerio-scraper/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/packages/actor-scraper/cheerio-scraper/README.md b/packages/actor-scraper/cheerio-scraper/README.md
index 6124e5dc..3da32644 100644
--- a/packages/actor-scraper/cheerio-scraper/README.md
+++ b/packages/actor-scraper/cheerio-scraper/README.md
@@ -616,7 +616,7 @@ If the page needs a real browser to render its content — and you want stronger
 
 ### Can I build my own Actor with Cheerio?
 
-Yes. Build a custom Actor on top of [Cheerio](https://cheerio.js.org) using Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler) class — you get full control over the crawl while keeping Cheerio's parsing API and Apify's platform features.
+Yes. The Cheerio Scraper Actor is open source — you can [fork it](https://github.com/apify/actor-scraper/tree/master/packages/actor-scraper/cheerio-scraper) and adjust it to your needs. Or build a custom Actor from scratch on top of [Cheerio](https://cheerio.js.org) using Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler) class — you get full control over the crawl with Cheerio's parsing API and Apify's platform features.
 
 ### Can I export Cheerio Scraper data using the Apify API?
 

From e478220ea96835263e9e323ddd3cb9d731828603 Mon Sep 17 00:00:00 2001
From: Marcel Rebro <marcel.rebro@apify.com>
Date: Thu, 7 May 2026 16:28:55 +0200
Subject: [PATCH 8/8] docs(cheerio-scraper): apply Fabian's review feedback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Round of edits after PMM review session:

- Intro emoji bullets: drop Apify-jargon list from the page-function
  bullet ("the request, the response, the dataset, the request queue")
  and shorten to "extract data and steer the crawl". Verb-led "Fork"
  bullet links the GitHub repo (where forking is a real action).
- Intro closing paragraph: add JS-heavy redirect to Web/Puppeteer/
  Playwright Scraper, in addition to the existing AI Web Scraper
  redirect for non-developers.
- FAQ: combine "Puppeteer vs Cheerio" and "Playwright vs Cheerio"
  into a single entry. Lead with the advantage of browser-based
  scrapers over Cheerio (dynamic content, interactions, login flows),
  then quickly cover the Puppeteer vs Playwright difference (browser
  support).
- FAQ "Can I build my own Actor with Cheerio?": split source-code
  links by intent — view on Apify, fork on GitHub. Replace generic
  templates link with the Cheerio-filtered query.
- Additional resources: drop the Puppeteer Scraper and Playwright
  Scraper bullets — they're already covered in the combined FAQ entry
  and the intro closing paragraph.
- Lowercase npm in 3 places per Apify docs convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../actor-scraper/cheerio-scraper/README.md   | 29 +++++++++----------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/packages/actor-scraper/cheerio-scraper/README.md b/packages/actor-scraper/cheerio-scraper/README.md
index 3da32644..7b332997 100644
--- a/packages/actor-scraper/cheerio-scraper/README.md
+++ b/packages/actor-scraper/cheerio-scraper/README.md
@@ -8,15 +8,15 @@ It's a fast, server-side scraper that pulls plain HTML over HTTP and parses it w
 
 🔗 **Crawl recursively** with Link selector, Glob Patterns, and Pseudo-URLs — pagination, sitemaps, full-site crawls
 
-🛠 Write a **custom page function** in JavaScript with full access to Cheerio, the request, the response, the dataset, and the request queue
+🛠 Write a **custom page function** in JavaScript to extract data and steer the crawl
 
 📦 Export results as **JSON, CSV, XML, Excel, or HTML**, or pull them via the [Apify API](https://docs.apify.com/api/v2)
 
 🔌 Plug into **Make, Zapier, webhooks, MCP servers**, and the rest of [Apify's integrations](https://apify.com/integrations)
 
-🪪 **Open source** — see the [source on GitHub](https://github.com/apify/actor-scraper/tree/master/packages/actor-scraper/cheerio-scraper), or build your own with Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler)
+🪪 Fork the [**open-source Actor**](https://github.com/apify/actor-scraper/tree/master/packages/actor-scraper/cheerio-scraper) on GitHub, or build your own with Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler)
 
-Cheerio Scraper is built for technical users comfortable with [jQuery](https://jquery.com) and Cheerio. If you're not a developer, you'll likely have a better experience with [**AI Web Scraper**](https://apify.com/apify/ai-web-scraper) — describe what you want to extract in plain English, no page function required. To learn how Cheerio Scraper works step by step, follow the [**Scraping with Cheerio Scraper**](https://docs.apify.com/academy/apify-scrapers/cheerio-scraper) tutorial in the Apify Academy.
+Cheerio Scraper is built for technical users comfortable with [jQuery](https://jquery.com) and Cheerio, and works on **static HTML pages**. For pages that render content with client-side JavaScript, reach for a browser-based scraper instead — [**Web Scraper**](https://apify.com/apify/web-scraper), [**Puppeteer Scraper**](https://apify.com/apify/puppeteer-scraper), or [**Playwright Scraper**](https://apify.com/apify/playwright-scraper). If you're not a developer, [**AI Web Scraper**](https://apify.com/apify/ai-web-scraper) lets you describe what to extract in plain English — no page function required. To learn Cheerio Scraper step by step, follow the [**Scraping with Cheerio Scraper**](https://docs.apify.com/academy/apify-scrapers/cheerio-scraper) tutorial in the Apify Academy.
 
 ## Cost of usage
 
@@ -67,7 +67,7 @@ Since Cheerio Scraper's **Page function** is executed in the context of the serv
 [**Puppeteer Scraper**](https://apify.com/apify/puppeteer-scraper) (`apify/puppeteer-scraper`). If you prefer Firefox and/or [Playwright](https://github.com/microsoft/playwright), check out [**Playwright Scraper**](https://apify.com/apify/playwright-scraper) (`apify/playwright-scraper`). For even more flexibility and control, you might develop a new Actor from scratch in Node.js using [Apify SDK](https://sdk.apify.com/) and [Crawlee](https://crawlee.dev).
 
 In the [**Page function**](#page-function) and **Prepare request function**,
-you can only use NPM modules that are already installed in this Actor.
+you can only use npm modules that are already installed in this Actor.
 If you require other modules for your scraping, you'll need to develop a completely new Actor.
 You can use the [`CheerioCrawler`](https://crawlee.dev/api/cheerio-crawler/class/CheerioCrawler) class
 from Crawlee to get most of the functionality of Cheerio Scraper out of the box.
@@ -177,7 +177,7 @@ thus the `context` parameter of the [**Page function**](#page-function) will hav
 | Other                                                   | `Buffer`                             | `null`                    | `null`                         |
 
 The `Content-Type` HTTP header of the web page is parsed using the
-<a href="https://www.npmjs.com/package/content-type" target="_blank">content-type</a> NPM package
+<a href="https://www.npmjs.com/package/content-type" target="_blank">content-type</a> npm package
 and the result is stored in the [`context.contentType`](#contenttype-object) object.
 
 ### Page function
@@ -606,21 +606,23 @@ The fastest way is the step-by-step [**Scraping with Cheerio Scraper**](https://
 
 If you'd rather skip the page function entirely, try the [**AI Web Scraper**](https://apify.com/apify/ai-web-scraper) — you describe what to extract in plain English and the Actor handles the rest.
 
-### When should I use Puppeteer instead of Cheerio?
+### When should I use Puppeteer or Playwright instead of Cheerio?
 
-Use **Cheerio Scraper** for static HTML — it's faster and cheaper because no browser is involved. When the content needs client-side JavaScript to render, reach for a Puppeteer-based browser scraper: [**Web Scraper**](https://apify.com/apify/web-scraper) is the easy on-ramp (simpler API, runs in the browser context), while [**Puppeteer Scraper**](https://apify.com/apify/puppeteer-scraper) gives you lower-level control over the Puppeteer library directly.
+Use **Cheerio Scraper** for static HTML — it's faster and cheaper because no browser is involved. Cheerio only sees the raw HTML response, so it can't reach content rendered by client-side JavaScript (single-page apps, infinite scroll, lazy-loaded content). Puppeteer- and Playwright-based scrapers run a real browser, so they handle dynamic content, click and scroll interactions, and login flows that Cheerio can't.
 
-### When should I use Playwright instead of Cheerio?
+The two libraries are similar; the main difference is browser support. **Puppeteer** is built primarily around Chrome and Chromium, with Firefox support added more recently. **Playwright** supports Chromium, Firefox, and WebKit. On Apify, you can choose:
 
-If the page needs a real browser to render its content — and you want stronger support for Firefox and WebKit than Puppeteer offers — reach for [**Playwright Scraper**](https://apify.com/apify/playwright-scraper).
+- [**Web Scraper**](https://apify.com/apify/web-scraper) — the simplest browser-based scraper, runs in the browser context, uses Puppeteer under the hood.
+- [**Puppeteer Scraper**](https://apify.com/apify/puppeteer-scraper) — lower-level control over the Puppeteer library.
+- [**Playwright Scraper**](https://apify.com/apify/playwright-scraper) — same idea, with Playwright.
 
 ### Can I build my own Actor with Cheerio?
 
-Yes. The Cheerio Scraper Actor is open source — you can [fork it](https://github.com/apify/actor-scraper/tree/master/packages/actor-scraper/cheerio-scraper) and adjust it to your needs. Or build a custom Actor from scratch on top of [Cheerio](https://cheerio.js.org) using Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler) class — you get full control over the crawl with Cheerio's parsing API and Apify's platform features.
+Yes. The Cheerio Scraper Actor is open source — [view the source on Apify](https://apify.com/apify/cheerio-scraper/source-code) or [fork it on GitHub](https://github.com/apify/actor-scraper/tree/master/packages/actor-scraper/cheerio-scraper) to adjust it to your needs. Or build a custom Actor from scratch — start from one of the [Cheerio-based Apify Actor templates](https://apify.com/templates?search=cheerio) and use Crawlee's [`CheerioCrawler`](https://crawlee.dev/js/api/cheerio-crawler) for full control over the crawl, with [Cheerio](https://cheerio.js.org)'s parsing API and Apify's platform features.
 
 ### Can I export Cheerio Scraper data using the Apify API?
 
-Yes. The Apify API gives you programmatic access to your runs and datasets. To access the API using Node.js, use the `apify-client` [NPM package](https://apify.com/apify/cheerio-scraper/api/javascript). To access the API using Python, use the `apify-client` [PyPI package](https://apify.com/apify/cheerio-scraper/api/python). Check out the [Apify API reference](https://docs.apify.com/api/v2) docs or click on the [API tab](https://apify.com/apify/cheerio-scraper/api) for code examples.
+Yes. The Apify API gives you programmatic access to your runs and datasets. To access the API using Node.js, use the `apify-client` [npm package](https://apify.com/apify/cheerio-scraper/api/javascript). To access the API using Python, use the `apify-client` [PyPI package](https://apify.com/apify/cheerio-scraper/api/python). Check out the [Apify API reference](https://docs.apify.com/api/v2) docs or click on the [API tab](https://apify.com/apify/cheerio-scraper/api) for code examples.
 
 ### Can I use Cheerio Scraper through an MCP server?
 
@@ -651,11 +653,6 @@ You might also want to see these other resources:
   Apify's basic tool for web crawling and scraping. It uses a full Chrome browser to render dynamic content.
   A similar web scraping Actor to Puppeteer Scraper, but is simpler to use and only runs in the context of the browser.
   Uses the [Puppeteer](https://github.com/GoogleChrome/puppeteer) library.
-- **Puppeteer Scraper** ([apify/puppeteer-scraper](https://apify.com/apify/puppeteer-scraper)) -
-  An Actor similar to Web Scraper, which provides lower-level control of the underlying
-  [Puppeteer](https://github.com/GoogleChrome/puppeteer) library and the ability to use server-side libraries.
-- **Playwright Scraper** ([apify/playwright-scraper](https://apify.com/apify/playwright-scraper)) -
-  A similar web scraping Actor to Puppeteer Scraper, but using the [Playwright](https://github.com/microsoft/playwright) library instead.
 - [Actors documentation](https://docs.apify.com/actors) -
   Documentation for the Apify Actors cloud computing platform.
 - [Apify SDK documentation](https://sdk.apify.com) - Learn more about the tools required to run your own Apify Actors.