Feature Request: Integrated Audio-to-Text CAPTCHA Solver using Whisper #4369

@Adilnawaz8881212

Hi,

I am an active SeleniumBase user and have developed an efficient, lightweight solution for getting past reCAPTCHA challenges, which I would like to propose as an integration.

The Solution:
Instead of using heavy computer vision models, I have implemented an audio-to-text solver built on the lightest and fastest variant of the Whisper model.

How it works: It switches to the audio challenge and uses the lightweight Whisper model to transcribe the spoken prompt.

Performance: It succeeds on the first attempt about 99% of the time; the remaining 1% of challenges are resolved on the second attempt.

Latency: Transcription takes only 1.0 to 1.5 seconds, which is fast enough for large-scale automation and scraping pipelines.
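
For anyone who wants to check a latency figure like this on their own hardware, a minimal timing harness might look like the sketch below. The names `measure_latency` and `fake_transcribe` are hypothetical and used only for illustration; the stand-in function would be swapped for a real transcription call, and nothing here is part of SeleniumBase or Whisper.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def measure_latency(
    transcribe: Callable[[str], str], audio_paths: Iterable[str]
) -> Dict[str, float]:
    """Time each call to `transcribe` and summarize the latencies in seconds."""
    durations = []
    for path in audio_paths:
        start = time.perf_counter()
        transcribe(path)  # the result is ignored; only the elapsed time matters
        durations.append(time.perf_counter() - start)
    if not durations:
        raise ValueError("audio_paths must contain at least one file")
    return {
        "runs": len(durations),
        "mean_s": statistics.mean(durations),
        "max_s": max(durations),
    }


def fake_transcribe(path: str) -> str:
    """Hypothetical stand-in for a real speech-to-text call."""
    time.sleep(0.01)  # simulate a small amount of work
    return "placeholder text"


if __name__ == "__main__":
    # The file names are placeholders; the stand-in never opens them.
    print(measure_latency(fake_transcribe, ["a.wav", "b.wav", "c.wav"]))
```

Reporting the maximum alongside the mean helps show whether slow outliers are hiding behind a good average.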

Demo:
You can watch the video: https://www.youtube.com/watch?v=MVS8HyvTkS0

Proposed Integration:
I would love to contribute this logic to SeleniumBase so that other users can handle CAPTCHA blocks natively, without heavy external dependencies.

I have the core logic ready. If this aligns with your project goals, I would be happy to open a pull request. Looking forward to your feedback!
