Feature Request: Integrated Audio-to-Text CAPTCHA Solver using Whisper

Hi,

I am an active user of SeleniumBase and have developed an efficient, lightweight solution for bypassing reCAPTCHA challenges that I would like to propose as an integration.

The Solution:
Instead of using heavy computer vision models, I have implemented an audio-to-text solver built on the lightest and fastest version of the Whisper model.

How it works: It switches to the audio challenge and uses the lightweight Whisper model to transcribe the spoken prompt.

Performance: It achieves a 99% success rate on the first attempt; the remaining 1% of challenges are resolved on the second attempt.

Latency: Transcription takes only 1.0 to 1.5 seconds, which is fast enough for large-scale automation and scraping pipelines.

Demo:
A video demonstration is available here: https://www.youtube.com/watch?v=MVS8HyvTkS0

Proposed Integration:
I would love to contribute this logic to SeleniumBase so that other users can handle CAPTCHA blocks natively, without needing heavy external dependencies.

I have the core logic ready. If this aligns with your project goals, I would be happy to open a pull request. Looking forward to your feedback!

Feature Request: Integrated Audio-to-Text CAPTCHA Solver using Whisper #4369
