Lightweight toolbox to build reusable "scrapers":
- Declare a Request class annotated with the PHP attribute #[Scraper(...)].
- Provide the corresponding Api class (replace "Request" with "Api" in the name), which extends \Scraper\Scraper\Api\AbstractApi and implements execute().
- Use \Scraper\Scraper\Client with an HttpClientInterface to execute the request and retrieve the deserialized object.

Install with Composer:

    composer require rem42/scraper "^3.0"

The package centralizes the following logic:
- A Request (under src/Request/) defines the necessary data and exposes getters used in path variables.
- The attribute #[\Scraper\Scraper\Attribute\Scraper(...)] (on the Request) describes method, scheme, host, path.
- \Scraper\Scraper\Client::send() reads this attribute (via ExtractAttribute), builds the HTTP options (headers, query, body, json, auth) according to the interfaces implemented by the Request, then performs the HTTP call.
- The matching Api class (e.g. FooApi) is instantiated and its execute() method returns the final object/array/string.
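
The flow above can be sketched in plain PHP, without the package itself. This is only an illustration of the technique (reading a class attribute through reflection, then building options from the interfaces a request implements), not the package's actual code. The Endpoint attribute, the HasHeaders and HasQuery interfaces and their method names are all hypothetical stand-ins; the real attribute and interfaces live in the package and may differ.

```php
<?php

// Hypothetical stand-in for the package's Scraper attribute.
#[Attribute(Attribute::TARGET_CLASS)]
final class Endpoint
{
    public function __construct(
        public string $method,
        public string $scheme,
        public string $host,
        public string $path,
    ) {}
}

// Hypothetical option interfaces; the real ones are in src/Request/
// and their method names may be different.
interface HasHeaders { public function getHeaders(): array; }
interface HasQuery { public function getQuery(): array; }

#[Endpoint(method: 'GET', scheme: 'https', host: 'example.com', path: '/items')]
final class ListItemsRequest implements HasQuery
{
    public function getQuery(): array { return ['page' => 1]; }
}

// Read the attribute declared on the request's class via reflection.
function readEndpoint(object $request): Endpoint
{
    $attributes = (new ReflectionClass($request))->getAttributes(Endpoint::class);
    if ($attributes === []) {
        throw new LogicException(sprintf('%s has no Endpoint attribute.', $request::class));
    }

    return $attributes[0]->newInstance();
}

// Collect HTTP options only for the interfaces the request implements.
function buildOptions(object $request): array
{
    $options = [];
    if ($request instanceof HasHeaders) {
        $options['headers'] = $request->getHeaders();
    }
    if ($request instanceof HasQuery) {
        $options['query'] = $request->getQuery();
    }

    return $options;
}

$request  = new ListItemsRequest();
$endpoint = readEndpoint($request);
$url      = sprintf('%s://%s%s', $endpoint->scheme, $endpoint->host, $endpoint->path);
$options  = buildOptions($request);
```

Here `$url` and `$options` are what would be handed to an HTTP client; `headers` and `query` are standard Symfony HttpClient option keys.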

Schematic example (adapt to your autoload/imports). The examples rely on use imports:

use Symfony\Component\HttpClient\HttpClient;
use Scraper\Scraper\Client;
use Scraper\Scraper\Request\ScraperRequest;
use Scraper\Scraper\Attribute\Scraper;
use Scraper\Scraper\Attribute\Method;
use Scraper\Scraper\Attribute\Scheme;
use Scraper\Scraper\Api\AbstractApi;

#[Scraper(
    method: Method::GET,
    scheme: Scheme::HTTPS,
    host: 'example.com',
    path: '/items/{id}'
)]
class ItemRequest extends ScraperRequest
{
    public function __construct(private string $id) {}

    // Called to fill the {id} placeholder in the path.
    public function getId(): string { return $this->id; }
}

// Provide a matching Api: ItemApi extends AbstractApi
$http = HttpClient::create();
$client = new Client($http);
$result = $client->send(new ItemRequest('42'));

- PSR-4 root namespace: Scraper\\Scraper\\ -> src/ (see composer.json).
- Naming convention: XRequest -> XApi (Client performs this replacement automatically using reflection).
- In the path attribute, variables {name} are replaced by calling getName() on the Request instance (see src/Attribute/ExtractAttribute.php).
- Implement the interfaces in src/Request/ to enable options: RequestHeaders, RequestQuery, RequestBody, RequestBodyJson, RequestAuthBearer, RequestAuthBasic.
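
The two conventions above (path variables and Request/Api naming) can be illustrated with a small standalone sketch. It does not reproduce the code in src/Attribute/ExtractAttribute.php or src/Client.php; expandPath() and apiClassFor() are hypothetical helpers, the App\Request namespace is made up, and the assumption that every occurrence of "Request" in the class name is swapped (including a namespace segment) is a guess at the behaviour, not a confirmed detail.

```php
<?php

// Plain stand-in for a request class; it does not extend the package's base class.
final class ItemRequest
{
    public function __construct(private string $id) {}

    public function getId(): string { return $this->id; }
}

// Replace each {name} placeholder with the value returned by getName().
function expandPath(string $path, object $request): string
{
    return preg_replace_callback(
        '/\{(\w+)\}/',
        static function (array $m) use ($request): string {
            $getter = 'get' . ucfirst($m[1]);
            if (!method_exists($request, $getter)) {
                throw new LogicException("Missing getter {$getter}() for path variable {$m[1]}.");
            }

            return (string) $request->$getter();
        },
        $path
    ) ?? $path;
}

// Derive the Api class name from the Request class name.
// Assumption: all occurrences of "Request" are replaced, so a
// Request namespace segment becomes Api as well.
function apiClassFor(string $requestClass): string
{
    return str_replace('Request', 'Api', $requestClass);
}

$path     = expandPath('/items/{id}', new ItemRequest('42'));
$apiClass = apiClassFor('App\Request\ItemRequest');
```

With these inputs, `$path` is `/items/42` and `$apiClass` is `App\Api\ItemApi`. A getter that is missing fails loudly instead of leaving the placeholder in the URL, which is a design choice of this sketch.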

- Run unit tests:
    composer run unit-test
    # or
    ./vendor/bin/phpunit
- Static analysis (phpstan):
    composer run static-analysis
- Check / apply coding style (php-cs-fixer):
    composer run code-style-check
    composer run code-style-fix

composer.json requires php: ^8.4. The code uses enums and recent type features, so PHP 8.4 or later is needed.

- Agent helper file: AGENTS.md (tips, patterns, commands). See packages/scraper/AGENTS.md.
- Key code points: src/Client.php, src/Attribute/ExtractAttribute.php, src/Factory/SerializerFactory.php.

- rem42/scraper-allocine
- rem42/scraper-colissimo
- rem42/scraper-deezer
- rem42/scraper-giantbomb
- rem42/scraper-jeuxvideo
- rem42/scraper-prestashop
- rem42/scraper-shopify
- rem42/scraper-tmdb
- rem42/scraper-tnt

See AGENTS.md for rules and patterns to follow. For PRs: tests must be green and the code must pass phpstan at the highest level.