Neural net CAPTCHA solver. MobileNetV2 + OpenCV grid splitting. CLI and Python API. Built for the hell of it.

captcha-solver-ai

Takes a reCAPTCHA image grid, splits it into cells, classifies each cell with a MobileNetV2 classifier pre-trained on ImageNet, and reports which cells match the prompt. It can also solve challenges on live pages driven by Playwright.

Install

pip install captcha-solver-ai

With Playwright support (for solving CAPTCHAs on live pages):

pip install captcha-solver-ai[browser]
playwright install chromium

CLI

# Solve a CAPTCHA grid image
captcha-solver solve captcha.png --prompt "traffic lights"
captcha-solver solve captcha.png --prompt "buses" --grid 4 --verbose

# Classify any image
captcha-solver classify photo.png --top 10

# Pre-download the model (~13MB, auto-downloads on first use anyway)
captcha-solver download-model

Output:

Prompt: traffic lights
Grid:   3x3
Match:  [0, 3, 6]

[X] [ ] [ ]
[X] [ ] [ ]
[X] [ ] [ ]

Found 3 matching cell(s)
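
The sample output is consistent with row-major cell numbering: index 0 is the top-left cell, and indices increase left to right, then top to bottom. Here is a minimal sketch of that mapping. The helper name render_grid is hypothetical and is not part of the package; the real display comes from the CLI or from result.grid_display().

```python
def render_grid(matching_cells, grid_size=3):
    """Draw a grid as text, marking matched cell indices with X.

    Hypothetical helper for illustration; assumes row-major numbering.
    """
    matched = set(matching_cells)
    rows = []
    for row in range(grid_size):
        cells = []
        for col in range(grid_size):
            # Row-major: left to right within a row, then down to the next row
            index = row * grid_size + col
            cells.append("[X]" if index in matched else "[ ]")
        rows.append(" ".join(cells))
    return "\n".join(rows)


print(render_grid([0, 3, 6]))
```

Under this numbering, cells 0, 3, and 6 form the left-hand column, which is what the grid above shows.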

Python API

from captcha_solver import CaptchaSolver

solver = CaptchaSolver()

# From an image file
result = solver.solve_file("captcha.png", prompt="Select all images with traffic lights")
print(result.matching_cells)  # [0, 3, 6]
print(result.grid_display())

# From a numpy array (OpenCV)
import cv2
img = cv2.imread("captcha.png")
result = solver.solve(img, prompt="buses", grid_size=3)

# From raw bytes
with open("captcha.png", "rb") as f:
    result = solver.solve_bytes(f.read(), prompt="bicycles")

# Check result
if result.solved:
    print(f"Found matches at cells: {result.matching_cells}")

Lower-level API

from captcha_solver import split_grid, classify_cells, classify_image
import cv2

# Split a grid image into cells
img = cv2.imread("captcha.png")
cells = split_grid(img, grid_size=3)  # returns 9 cell images

# Classify cells against a prompt
results = classify_cells(cells, prompt="traffic lights")
for r in results:
    print(f"Cell {r['index']}: match={r['match']}, confidence={r['target_max_prob']:.1%}")

# Classify a single image (raw ImageNet predictions)
preds = classify_image(img, top_k=5)
for class_idx, prob in preds:
    print(f"Class {class_idx}: {prob:.1%}")
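
To make the splitting step concrete, here is a rough sketch of how a grid image could be divided into equal cells. It is an assumption about the general approach, not the package's actual split_grid implementation, and the name cell_boxes is hypothetical.

```python
def cell_boxes(height, width, grid_size=3):
    """Return (top, bottom, left, right) pixel bounds for each cell, row-major.

    Hypothetical helper; integer division spreads any leftover pixels
    across cells so the boxes tile the image with no gaps.
    """
    boxes = []
    for row in range(grid_size):
        top = row * height // grid_size
        bottom = (row + 1) * height // grid_size
        for col in range(grid_size):
            left = col * width // grid_size
            right = (col + 1) * width // grid_size
            boxes.append((top, bottom, left, right))
    return boxes


# With an OpenCV/NumPy image `img` of shape (height, width, channels),
# the boxes can be used as slices:
#   cells = [img[t:b, l:r] for t, b, l, r in cell_boxes(*img.shape[:2])]
```

A real grid may have borders or padding between cells, which this sketch ignores.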

Playwright (live CAPTCHA solving)

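import asyncio

from playwright.async_api import async_playwright
from captcha_solver import CaptchaSolver

solver = CaptchaSolver()

async def main():
    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://some-page-with-recaptcha.com")

        # Truthy when the challenge was passed
        solved = await solver.solve_on_page(page)
        if solved:
            print("CAPTCHA solved!")

        await browser.close()

# "async with" is only valid inside a coroutine, so run everything via asyncio
asyncio.run(main())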

The solve_on_page method handles the full flow: clicks the checkbox with human-like mouse movement, and if Google serves an image challenge it screenshots the grid, classifies each cell, clicks the matches, and hits verify.

Supported CAPTCHA categories

22 categories are mapped to ImageNet classes, including:

traffic lights, buses, bicycles, motorcycles, cars, taxis, bridges, boats, ships, airplanes, trains, trucks, fire hydrants, parking meters, stairs, mountains, palm trees, tractors
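
The API examples pass both full prompts ("Select all images with traffic lights") and bare keywords ("buses"), so some step has to reduce a prompt to a category. Below is a simple sketch of one way to do that with substring matching. It is an assumption for illustration, not the package's actual logic; find_category is a hypothetical name, and CATEGORIES holds only the names listed above.

```python
# Category names copied from the list above
CATEGORIES = [
    "traffic lights", "buses", "bicycles", "motorcycles", "cars", "taxis",
    "bridges", "boats", "ships", "airplanes", "trains", "trucks",
    "fire hydrants", "parking meters", "stairs", "mountains", "palm trees",
    "tractors",
]


def find_category(prompt, categories=CATEGORIES):
    """Return the first category keyword found in the prompt, or None.

    Hypothetical helper: plain case-insensitive substring matching.
    Longer keywords are tried first so multi-word names take priority.
    """
    text = prompt.lower()
    for keyword in sorted(categories, key=len, reverse=True):
        if keyword in text:
            return keyword
    return None


print(find_category("Select all images with traffic lights"))
print(find_category("Select all squares with crosswalks"))
```

A prompt that uses a singular form ("a bus") or an unlisted category returns None here; handling those would need extra rules.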

Will it solve every CAPTCHA? No. Google sometimes serves categories that don't map well to ImageNet, and image challenges can require multiple rounds. But it handles the common ones.

How it works

  1. MobileNetV2 (pre-trained on ImageNet, 1000 classes) runs as an ONNX model (~13MB)
  2. OpenCV splits the CAPTCHA grid into individual cells
  3. Each cell is resized to 224x224, normalized, and fed through the network
  4. Top-10 predictions are checked against a mapping of CAPTCHA keywords to ImageNet class indices
  5. Cells where a target class appears in the top-10 or exceeds 5% probability are marked as matches
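
Steps 3 to 5 can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the package's code: the function names are hypothetical, the mean and standard deviation are the values commonly used for ImageNet-trained models (the package may normalize differently), and the top-10 and 5% values are taken from the steps above.

```python
import math

# Commonly used ImageNet per-channel statistics (assumed, not confirmed for this package)
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def softmax(logits):
    """Convert raw network outputs to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_match(probs, target_classes, top_k=10, min_prob=0.05):
    """Apply the rule from step 5: a cell matches if any target class
    is among the top_k predictions or has probability above min_prob."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = set(ranked[:top_k])
    targets = set(target_classes)
    in_top_k = bool(top & targets)
    above_threshold = any(probs[i] > min_prob for i in targets)
    return in_top_k or above_threshold
```

The two conditions overlap but are not identical: a class can sit just outside the top 10 and still exceed 5%, and a class can rank in the top 10 with a very small probability when one prediction dominates.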

The model auto-downloads on first use and is cached at ~/.captcha_solver/.

License

MIT
