Negative pending request count when using RedisRequestQueueClient #1873

@bdm83

Background

I am using a Sitemap crawler with the Redis storage client to manage the request queue.

Bug

During a long-running job, pending_request_count went negative, which prevented the crawler from terminating.

The Redis state after the job stalled:

LLEN  request_queues:<name>:queue           = 0
HLEN  request_queues:<name>:in_progress     = 0
SCARD request_queues:<name>:pending_set     = 0
JSON.GET ...:metadata $.pending_request_count   = -6
JSON.GET ...:metadata $.total_request_count     = 89582
JSON.GET ...:metadata $.handled_request_count   = 89588   # > total
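The reported values can be sanity-checked with a small sketch like the one below. The helper `counter_violations` is hypothetical (it is not part of crawlee), and the invariant `pending == total - handled` is an assumption about how the metadata counters are meant to relate.

```python
def counter_violations(total: int, handled: int, pending: int) -> list[str]:
    """Return the (assumed) counter invariants that the given values break."""
    problems = []
    if pending < 0:
        problems.append("pending_request_count is negative")
    if handled > total:
        problems.append("handled_request_count exceeds total_request_count")
    # Assumed invariant: every request is either handled or still pending.
    if pending != total - handled:
        problems.append("pending_request_count != total - handled")
    return problems


# Values copied from the Redis state above.
reported = counter_violations(total=89582, handled=89588, pending=-6)
for problem in reported:
    print(problem)
```

Under that assumption, the counters are still consistent with each other (89582 - 89588 = -6). That would fit handled being incremented and pending decremented six extra times, rather than one counter drifting independently.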

Because pending_request_count is -6, RedisRequestQueueClient.is_empty() returns False:

return metadata.pending_request_count == 0

As a result, crawlee keeps querying Redis endlessly, every 1-3 ms on each connection.
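To illustrate why the equality check never recovers once the counter has gone negative, here is a minimal sketch comparing it with a hypothetical defensive variant. Both functions are stand-ins written for this example, not crawlee's code, and the `<= 0` variant would only mask the symptom rather than fix the underlying double-count.

```python
def is_empty_strict(pending_request_count: int) -> bool:
    # Mirrors the comparison quoted above: only exactly zero counts as empty.
    return pending_request_count == 0


def is_empty_defensive(pending_request_count: int) -> bool:
    # Hypothetical workaround: treat a negative counter as empty too.
    return pending_request_count <= 0


strict_result = is_empty_strict(-6)
defensive_result = is_empty_defensive(-6)
print(strict_result, defensive_result)
```

With the strict check, nothing further decrements or resets the counter once the queue is drained, so the crawler has no way to reach the terminating state on its own.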

Possible cause

mark_request_as_handled guards on hexists(in_progress, unique_key), but that check runs in a separate round-trip from the pipeline that applies delta_handled_request_count=+1 and delta_pending_request_count=-1. If two coroutines race on the same unique_key (or a retry/error path calls it twice), both hexists checks can pass before either pipeline executes, and the counter deltas are applied twice.
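The suspected race can be reproduced without Redis using a small in-memory simulation. Everything in this sketch is hypothetical: `FakeQueue` stands in for the Redis-backed queue, `asyncio.sleep(0)` stands in for the network round-trip between the check and the update, and the "guarded" variant assumes the fix is to make the removal itself the guard (in Redis terms, for example acting on the return value of HDEL, or doing the check and the update in one Lua script).

```python
import asyncio


class FakeQueue:
    """In-memory stand-in for the in_progress hash and metadata counters."""

    def __init__(self, keys: list[str]) -> None:
        self.in_progress = set(keys)
        self.pending = len(keys)
        self.handled = 0

    async def racy_mark_handled(self, key: str) -> bool:
        # Check first (like HEXISTS) ...
        if key not in self.in_progress:
            return False
        # ... then yield, as a separate round-trip would.
        await asyncio.sleep(0)
        self.in_progress.discard(key)
        self.handled += 1
        self.pending -= 1
        return True

    async def guarded_mark_handled(self, key: str) -> bool:
        await asyncio.sleep(0)  # simulated round-trip before the atomic step
        # Check and removal happen with no await in between, so only one
        # caller can win; the loser leaves the counters untouched.
        if key not in self.in_progress:
            return False
        self.in_progress.remove(key)
        self.handled += 1
        self.pending -= 1
        return True


async def run(method_name: str) -> tuple[int, int]:
    queue = FakeQueue(["a"])
    method = getattr(queue, method_name)
    # Two coroutines mark the same unique key as handled.
    await asyncio.gather(method("a"), method("a"))
    return queue.pending, queue.handled


racy = asyncio.run(run("racy_mark_handled"))
guarded = asyncio.run(run("guarded_mark_handled"))
print("racy (pending, handled):", racy)
print("guarded (pending, handled):", guarded)
```

In the racy variant both callers pass the existence check before either update runs, so a single request leaves pending at -1 and handled at 2. In the guarded variant only the caller that actually removes the key touches the counters.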

Labels: bug, t-tooling