Neural net CAPTCHA solver. MobileNetV2 + OpenCV grid splitting. CLI and Python API. Built for the hell of it.
captcha-solver-ai
Takes a reCAPTCHA image grid, splits it into cells, classifies each cell with a MobileNetV2 ImageNet classifier, and tells you which ones match the prompt. Also works live on Playwright pages.
Install
pip install captcha-solver-ai
With Playwright support (for solving CAPTCHAs on live pages):
pip install captcha-solver-ai[browser]
playwright install chromium
CLI
# Solve a CAPTCHA grid image
captcha-solver solve captcha.png --prompt "traffic lights"
captcha-solver solve captcha.png --prompt "buses" --grid 4 --verbose
# Classify any image
captcha-solver classify photo.png --top 10
# Pre-download the model (~13MB, auto-downloads on first use anyway)
captcha-solver download-model
Output:
Prompt: traffic lights
Grid: 3x3
Match: [0, 3, 6]
[X] [ ] [ ]
[X] [ ] [ ]
[X] [ ] [ ]
Found 3 matching cell(s)
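The sample output above is consistent with row-major cell numbering (index 0 is the top-left cell, counting left to right, then top to bottom). A minimal sketch of that mapping follows; `render_grid` is a hypothetical helper written for illustration, not the package's own `grid_display()`.

```python
def render_grid(matching_cells, grid_size=3):
    """Render row-major cell indices as rows of [X] / [ ] markers."""
    matches = set(matching_cells)
    rows = []
    for row in range(grid_size):
        cells = []
        for col in range(grid_size):
            index = row * grid_size + col  # row-major: left to right, top to bottom
            cells.append("[X]" if index in matches else "[ ]")
        rows.append(" ".join(cells))
    return "\n".join(rows)


print(render_grid([0, 3, 6]))
```

With a 3x3 grid, indices 0, 3 and 6 all fall in column 0, which is why the left column is marked.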
Python API
from captcha_solver import CaptchaSolver
solver = CaptchaSolver()
# From an image file
result = solver.solve_file("captcha.png", prompt="Select all images with traffic lights")
print(result.matching_cells) # [0, 3, 6]
print(result.grid_display())
# From a numpy array (OpenCV)
import cv2
img = cv2.imread("captcha.png")
result = solver.solve(img, prompt="buses", grid_size=3)
# From raw bytes
with open("captcha.png", "rb") as f:
    result = solver.solve_bytes(f.read(), prompt="bicycles")
# Check result
if result.solved:
    print(f"Found matches at cells: {result.matching_cells}")
Lower-level API
from captcha_solver import split_grid, classify_cells, classify_image
import cv2
# Split a grid image into cells
img = cv2.imread("captcha.png")
cells = split_grid(img, grid_size=3) # returns 9 cell images
# Classify cells against a prompt
results = classify_cells(cells, prompt="traffic lights")
for r in results:
    print(f"Cell {r['index']}: match={r['match']}, confidence={r['target_max_prob']:.1%}")
# Classify a single image (raw ImageNet predictions)
preds = classify_image(img, top_k=5)
for class_idx, prob in preds:
    print(f"Class {class_idx}: {prob:.1%}")
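For intuition, splitting a grid image into cells can be done with plain array slicing. The sketch below assumes the image is an evenly divided square grid with no borders between cells. `split_grid_sketch` is a hypothetical stand-in, and the package's real `split_grid` may work differently.

```python
import numpy as np


def split_grid_sketch(image, grid_size=3):
    """Cut an H x W x C image into grid_size**2 cells in row-major order."""
    height, width = image.shape[:2]
    # Integer cell boundaries; linspace spreads leftover pixels across cells
    # when the image size is not an exact multiple of grid_size.
    ys = np.linspace(0, height, grid_size + 1).astype(int)
    xs = np.linspace(0, width, grid_size + 1).astype(int)
    cells = []
    for r in range(grid_size):
        for c in range(grid_size):
            cells.append(image[ys[r]:ys[r + 1], xs[c]:xs[c + 1]])
    return cells


demo = np.zeros((300, 300, 3), dtype=np.uint8)  # stand-in for cv2.imread(...)
cells = split_grid_sketch(demo, grid_size=3)
print(len(cells), cells[0].shape)
```

The cells are views into the original array, so copy them before modifying if the source image must stay untouched.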
Playwright (live CAPTCHA solving)
import asyncio
from playwright.async_api import async_playwright
from captcha_solver import CaptchaSolver
solver = CaptchaSolver()
async def main():
    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://some-page-with-recaptcha.com")
        solved = await solver.solve_on_page(page)
        if solved:
            print("CAPTCHA solved!")
        await browser.close()
asyncio.run(main())
The solve_on_page method handles the full flow: clicks the checkbox with human-like mouse movement, and if Google serves an image challenge it screenshots the grid, classifies each cell, clicks the matches, and hits verify.
Supported CAPTCHA categories
22 categories are mapped to ImageNet classes, including:
traffic lights, buses, bicycles, motorcycles, cars, taxis, bridges, boats, ships, airplanes, trains, trucks, fire hydrants, parking meters, stairs, mountains, palm trees, tractors
Will it solve every CAPTCHA? No. Google sometimes serves categories that don't map well to ImageNet, and image challenges can require multiple rounds. But it handles the common ones.
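To make the keyword mapping concrete, here is a minimal sketch of a prompt-to-class lookup. The table is an illustrative subset rather than the package's real mapping, `target_classes` is a hypothetical helper, and the class indices follow the usual ImageNet-1k ordering as best I know it, so check them against the label file you actually use.

```python
# Illustrative subset only; the package's real keyword table is not shown here.
# Indices assume the standard ImageNet-1k class ordering.
KEYWORD_TO_CLASSES = {
    "traffic light": [920],   # traffic light
    "bus": [779, 874, 654],   # school bus, trolleybus, minibus
    "taxi": [468],            # cab
    "parking meter": [704],   # parking meter
    "tractor": [866],         # tractor
}


def target_classes(prompt):
    """Return ImageNet class indices for the first keyword found in the prompt."""
    text = prompt.lower()
    for keyword, class_ids in KEYWORD_TO_CLASSES.items():
        if keyword in text:  # substring match, so "buses" matches "bus"
            return class_ids
    return []  # category not covered by the mapping


print(target_classes("Select all images with traffic lights"))
print(target_classes("crosswalks"))
```

An empty result corresponds to the case described above, where the prompted category has no good ImageNet counterpart.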
How it works
- MobileNetV2 (pre-trained on ImageNet, 1000 classes) runs as an ONNX model (~13MB)
- OpenCV splits the CAPTCHA grid into individual cells
- Each cell is resized to 224x224, normalized, and fed through the network
- Top-10 predictions are checked against a mapping of CAPTCHA keywords to ImageNet class indices
- Cells where a target class appears in the top-10 or exceeds 5% probability are marked as matches
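The normalization and matching steps above can be sketched in NumPy. This assumes the conventional ImageNet mean and standard deviation and an NCHW input layout, neither of which the description states, and the helper names are hypothetical. The 224x224 resize (for example with `cv2.resize`) is left out to keep the example dependency-light.

```python
import numpy as np

# Conventional ImageNet statistics; an assumption, not confirmed by the package.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize(cell_rgb):
    """Turn a 224x224x3 uint8 RGB cell into a 1x3x224x224 float32 tensor."""
    x = cell_rgb.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return x.transpose(2, 0, 1)[np.newaxis, ...]  # HWC -> NCHW


def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()


def is_match(probs, target_ids, top_k=10, min_prob=0.05):
    """Match if a target class is in the top-k or its probability exceeds min_prob."""
    top = {int(i) for i in np.argsort(probs)[::-1][:top_k]}
    in_top_k = any(t in top for t in target_ids)
    above_threshold = max(float(probs[t]) for t in target_ids) > min_prob
    return bool(in_top_k or above_threshold)


# Fake logits: class 920 dominates, every other class has a distinct lower score.
logits = -0.01 * np.arange(1000, dtype=np.float64)
logits[920] = 5.0
probs = softmax(logits)
print(is_match(probs, [920]), is_match(probs, [404]))
```

The two conditions are not redundant: with 1000 classes, a class can exceed 5% probability and still rank below tenth place.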
The model auto-downloads on first use and is cached at ~/.captcha_solver/.
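To check whether the model is already cached, something like the following could work. It assumes the cached file ends in `.onnx`, since the exact file name is not documented here, and `cached_models` is a hypothetical helper.

```python
from pathlib import Path


def cached_models(cache_dir=None):
    """List .onnx files in the cache directory (default: ~/.captcha_solver)."""
    root = Path(cache_dir) if cache_dir else Path.home() / ".captcha_solver"
    if not root.is_dir():
        return []  # nothing downloaded yet
    return sorted(p.name for p in root.glob("*.onnx"))


print(cached_models())
```

An empty list means the next call that needs the model will trigger the download.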
License
MIT