Industrial-grade slider CAPTCHA recognition system based on deep learning

Industrial-Grade Slider CAPTCHA Recognition System

A high-precision slider CAPTCHA recognition solution based on deep learning. It uses an improved CenterNet architecture and reaches 80% accuracy (within a 7px tolerance) on a test set of 100 real CAPTCHAs.

📑 Table of Contents

📋 Project Overview
- 🎯 Core Features
- 🖼️ Recognition Performance Demo
🚀 Quick Start
📊 Data Generation Process
🏗️ Network Architecture
📈 Performance Metrics
🛠️ Main Features
⚠️ Disclaimer
📁 Project Structure
🔧 Tech Stack
📞 Contact

📋 Project Overview

This project is an industrial-grade slider CAPTCHA recognition system that uses deep learning to overcome the accuracy bottleneck of traditional template-matching algorithms. The model is trained on 354,024 synthetic CAPTCHA images and uses a lightweight CNN (about 3.5M parameters) that keeps precision high while supporting real-time inference.

🎯 Core Features

High-Precision Recognition: 80% accuracy with 7px error tolerance, 73% with 5px on real CAPTCHAs
Real-Time Inference: GPU inference 1.30ms (RTX 5090), CPU inference 5.21ms (AMD Ryzen 9 9950X), supporting real-time applications
Lightweight Architecture: Only 3.5M parameters, model file approximately 36MB
Industrial-Grade Design: Complete data generation, training, and evaluation pipeline
Sub-pixel Precision: Achieves sub-pixel level localization using CenterNet offset mechanism

🖼️ Recognition Performance Demo

Real CAPTCHA Dataset Recognition Results

Figure: Recognition results on real website CAPTCHAs, with red circles marking gap positions and blue circles marking slider positions

Test Dataset Recognition Results

Figure: Recognition results on synthetic test set, demonstrating the model's adaptability to different shapes and lighting conditions

🚀 Quick Start

Requirements

# Python 3.8+
pip install -r requirements.txt

Installation

Install via pip

pip install sider-captcha-solver

Basic Usage

After installing with pip, import the package and use it directly:

1. Basic Prediction - Get Sliding Distance

from sider_captcha_solver import CaptchaPredictor

# Initialize predictor
predictor = CaptchaPredictor(
    model_path='best',  # Use built-in best model, or specify custom model path
    device='auto'       # Auto-select GPU/CPU
)

# Predict single image
result = predictor.predict('path/to/captcha.png')

# Get sliding distance
if result['slider_x'] and result['gap_x']:
    sliding_distance = result['gap_x'] - result['slider_x']
    print(f"Sliding distance: {sliding_distance:.2f} px")
    print(f"Gap position: ({result['gap_x']:.2f}, {result['gap_y']:.2f})")
    print(f"Slider position: ({result['slider_x']:.2f}, {result['slider_y']:.2f})")
else:
    print("Detection failed")

2. Batch Processing - Process Multiple Images

from sider_captcha_solver import CaptchaPredictor
import glob
import os

# Initialize predictor
predictor = CaptchaPredictor(model_path='best', device='auto')

# Batch process CAPTCHAs
captcha_folder = 'path/to/captchas'

for img_path in glob.glob(os.path.join(captcha_folder, '*.png')):
    result = predictor.predict(img_path)

    if result['slider_x'] and result['gap_x']:
        distance = result['gap_x'] - result['slider_x']
        confidence = (result['slider_confidence'] + result['gap_confidence']) / 2
        print(f"{os.path.basename(img_path)}: Slide {distance:.1f} px (Confidence: {confidence:.3f})")
    else:
        print(f"{os.path.basename(img_path)}: Detection failed")

3. Visualization and Debugging

from sider_captcha_solver import CaptchaPredictor
import matplotlib.pyplot as plt

# Initialize predictor
predictor = CaptchaPredictor(model_path='best', device='auto')

# Test image path
test_image = 'path/to/captcha.png'

# Generate and save prediction visualization
predictor.visualize_prediction(
    test_image,
    save_path='prediction_result.png',  # Save path
    show=True                           # Display window
)

# Generate heatmap visualization (view model internal activations)
predictor.visualize_heatmaps(
    test_image,
    save_path='heatmap_result.png',    # Save 4-panel heatmap
    show=True
)

# Compare different threshold effects
thresholds = [0.0, 0.1, 0.3, 0.5]
fig, axes = plt.subplots(1, len(thresholds), figsize=(15, 4))

for idx, threshold in enumerate(thresholds):
    # Create predictor with different thresholds
    pred = CaptchaPredictor(model_path='best', hm_threshold=threshold)
    result = pred.predict(test_image)

    # Visualize to subplot
    ax = axes[idx]
    img = plt.imread(test_image)
    ax.imshow(img)
    ax.set_title(f'Threshold={threshold}')

    if result['slider_x'] and result['gap_x']:
        ax.plot(result['slider_x'], result['slider_y'], 'bo', markersize=10)
        ax.plot(result['gap_x'], result['gap_y'], 'ro', markersize=10)
    ax.axis('off')

plt.tight_layout()
plt.savefig('threshold_comparison.png')
plt.show()

4. Complete Production Environment Example

from sider_captcha_solver import CaptchaPredictor
import logging
import time
from typing import Optional, Dict

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class CaptchaSolver:
    """Production environment CAPTCHA solver wrapper"""

    def __init__(self, model_path: str = 'best', device: str = 'auto'):
        self.predictor = CaptchaPredictor(
            model_path=model_path,
            device=device,
            hm_threshold=0.1  # Balance accuracy and recall
        )
        logger.info(f"CAPTCHA solver initialized, device: {device}")

    def solve(self, image_path: str, max_retries: int = 3) -> Optional[Dict]:
        """
        Solve CAPTCHA with retry mechanism

        Args:
            image_path: CAPTCHA image path
            max_retries: Maximum retry attempts

        Returns:
            Dictionary containing sliding distance and confidence, None on failure
        """
        for attempt in range(max_retries):
            try:
                # Record start time
                start_time = time.time()

                # Execute prediction
                result = self.predictor.predict(image_path)

                # Calculate elapsed time
                elapsed_time = (time.time() - start_time) * 1000

                # Check result validity
                if result['slider_x'] and result['gap_x']:
                    sliding_distance = result['gap_x'] - result['slider_x']
                    confidence = (result['slider_confidence'] + result['gap_confidence']) / 2

                    logger.info(f"Solve success: distance={sliding_distance:.1f}px, "
                              f"confidence={confidence:.3f}, time={elapsed_time:.1f}ms")

                    return {
                        'success': True,
                        'sliding_distance': sliding_distance,
                        'confidence': confidence,
                        'elapsed_ms': elapsed_time,
                        'details': result
                    }
                else:
                    logger.warning(f"Attempt {attempt + 1} failed: no valid result detected")

            except Exception as e:
                logger.error(f"Attempt {attempt + 1} error: {str(e)}")

            # Brief delay if not last attempt
            if attempt < max_retries - 1:
                time.sleep(0.1)

        logger.error(f"Solve failed: reached maximum retries {max_retries}")
        return None

# Usage example
if __name__ == "__main__":
    solver = CaptchaSolver()

    # Solve single CAPTCHA
    result = solver.solve('path/to/captcha.png')

    if result and result['success']:
        print(f"Sliding distance: {result['sliding_distance']:.1f} px")
        print(f"Confidence: {result['confidence']:.3f}")
        print(f"Processing time: {result['elapsed_ms']:.1f} ms")
    else:
        print("CAPTCHA solving failed")

Advanced Features

1. Custom Model and Configuration

from sider_captcha_solver import CaptchaPredictor
import torch

# Use your own trained model
custom_predictor = CaptchaPredictor(
    model_path='path/to/your_trained_model.pth',
    device='cuda:0',    # Specify GPU
    hm_threshold=0.15   # Adjust based on model characteristics
)

# Check model info
if torch.cuda.is_available():
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM usage: {torch.cuda.memory_allocated(0) / 1024**2:.1f} MB")

# Predict
result = custom_predictor.predict('captcha.png')

2. Performance Benchmarking

from sider_captcha_solver import CaptchaPredictor
import time
import numpy as np

# Initialize predictor
predictor = CaptchaPredictor(model_path='best', device='auto')

# Test image list
test_images = ['captcha1.png', 'captcha2.png', 'captcha3.png']

# Warm up (first inference is slower)
_ = predictor.predict(test_images[0])

# Performance test (perf_counter is a monotonic, high-resolution clock)
times = []
for _ in range(10):  # Test each image 10 times
    for img_path in test_images:
        start = time.perf_counter()
        result = predictor.predict(img_path)
        elapsed = (time.perf_counter() - start) * 1000  # Convert to milliseconds
        times.append(elapsed)

# Statistics
print(f"Average inference time: {np.mean(times):.2f} ms")
print(f"Fastest: {np.min(times):.2f} ms")
print(f"Slowest: {np.max(times):.2f} ms")
print(f"Std deviation: {np.std(times):.2f} ms")
print(f"FPS: {1000 / np.mean(times):.1f}")

📊 Data Generation Process

1. Data Collection

Background images were downloaded from Pixabay across 10 categories, including Minecraft, Pixel Food, Block Public Square, Block Illustration, Backgrounds, Buildings, Nature, Anime Cityscape, and Abstract Geometric Art. Each category contributes up to 200 images, for a total of approximately 2,000 raw images.

2. CAPTCHA Generation Logic

Raw Images (2000+) → Resize(320×160) → Puzzle Generation
                                        ↓
                              11 shapes × 3 sizes × 4 positions
                                        ↓
                              132 CAPTCHAs per original image
                                        ↓
                              Total: 354,024 training images generated
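
The counts in the diagram can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this section; the constant names are illustrative. Dividing the stated total by the per-image count gives the number of source images the total implies, which is somewhat higher than the rounded "2000+" figure.

```python
# Combinatorics of the generation pipeline, using only figures quoted in this section.
SHAPES = 11      # 5 regular puzzle shapes + 6 special shapes
SIZES = 3        # random puzzle sizes per shape
POSITIONS = 4    # gap positions per size

per_image = SHAPES * SIZES * POSITIONS
total_generated = 354_024

# Number of source images implied by the stated total.
implied_sources, remainder = divmod(total_generated, per_image)

print(per_image)        # 132
print(implied_sources)  # 2682
print(remainder)        # 0
```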

Puzzle Shape Design:

5 regular puzzle shapes (combinations of convex, concave, and flat edges)
6 special shapes (circle, square, triangle, hexagon, pentagon, star)

Random Parameters:

Puzzle size: 40-70 pixels (3 random sizes)
Position distribution: the gap is placed at an x-coordinate beyond the slider width + 10px so that the gap and slider do not overlap
Lighting effects: Randomly added lighting variations for robustness
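
As a rough illustration of the constraints above, the following sketch samples one puzzle configuration. It is not the project's generator: the function name, the uniform sampling, the slider sitting at the left edge, and the slider and gap sharing a vertical position are all assumptions made for the example.

```python
import random

IMG_W, IMG_H = 320, 160  # image size after resizing (from the pipeline above)

def sample_puzzle_params(rng: random.Random) -> dict:
    """Sample one hypothetical (size, slider, gap) configuration."""
    size = rng.randint(40, 70)   # puzzle size in pixels
    half = size // 2
    # Assumption: the slider sits at the left edge of the image.
    slider_x = half
    # The gap's left edge must lie beyond slider width + 10 px,
    # and the gap must stay fully inside the image.
    min_gap_x = size + 10 + half
    max_gap_x = IMG_W - half
    gap_x = rng.randint(min_gap_x, max_gap_x)
    # Assumption: slider and gap share the same vertical centre.
    y = rng.randint(half, IMG_H - half)
    return {"size": size, "slider_x": slider_x, "gap_x": gap_x, "y": y}

rng = random.Random(0)
params = sample_puzzle_params(rng)
print(params)
```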

3. Dataset Split

Training set: 90% (split by original images to avoid data leakage)
Test set: 10% (Test Set 1)
Real CAPTCHA test set: 100 NetEase Yidun CAPTCHAs (Test Set 2)
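
Splitting by original image means that all CAPTCHAs derived from one background land in the same subset. A minimal sketch of that idea follows, assuming the source image ID is the `Pic{XXXX}` prefix of each filename (the naming format listed under Project Structure). The function name, seed, and toy filenames are hypothetical and not taken from the project's `split_dataset.py`.

```python
import random
from collections import defaultdict

def split_by_source(filenames, test_fraction=0.1, seed=42):
    """Split CAPTCHA files so that no source image appears in both sets."""
    groups = defaultdict(list)
    for name in filenames:
        source_id = name.split("_", 1)[0]   # e.g. "Pic0001"
        groups[source_id].append(name)

    # Shuffle the source IDs (not the files) with a fixed seed.
    source_ids = sorted(groups)
    random.Random(seed).shuffle(source_ids)

    n_test = max(1, round(len(source_ids) * test_fraction))
    test_ids = set(source_ids[:n_test])

    train = [f for sid in source_ids if sid not in test_ids for f in groups[sid]]
    test = [f for sid in source_ids if sid in test_ids for f in groups[sid]]
    return train, test

# Toy example: 10 source images with 3 CAPTCHAs each (hypothetical names).
files = [f"Pic{i:04d}_Bgx{100 + j}Bgy50_Sdx20Sdy50_{j:08x}.png"
         for i in range(10) for j in range(3)]
train_files, test_files = split_by_source(files)
print(len(train_files), len(test_files))  # 27 3
```

Because the shuffle operates on source IDs, a background seen during training can never show up in the test set under a different puzzle shape or position.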

🏗️ Network Architecture

Model Structure

Input (3×160×320)
    │
    ├─ Stem Conv (3×3, stride=2) ──────→ 32×80×160
    │
    ├─ ResBlock Stage-1 (×2, stride=2) ─→ 64×40×80
    │
    ├─ ResBlock Stage-2 (×2, stride=2) ─→ 128×20×40
    │
    ├─ ResBlock Stage-3 (×2, stride=2) ─→ 256×10×20
    │
    ├─ Neck (1×1 Conv) ─────────────────→ 128×10×20
    │
    ├─ UpConv-1 (3×3, stride=2) ────────→ 64×20×40
    │
    ├─ UpConv-2 (3×3, stride=2) ────────→ 64×40×80
    │
    └─┬─ Gap Detection Head
      │    ├─ Heatmap (1×40×80)
      │    └─ Offset (2×40×80)
      │
      └─ Piece Detection Head
           ├─ Heatmap (1×40×80)
           └─ Offset (2×40×80)
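
The tensor shapes in the diagram follow from the stride of each stage. The sketch below traces only the spatial arithmetic (it builds no layers), treating each stride-2 convolution as halving and each UpConv as doubling the height and width; the function name is illustrative.

```python
def trace_shapes(height=160, width=320):
    """Follow the spatial size through each stage of the diagram above."""
    layers = [
        # (name, output channels, scale factor applied to H and W)
        ("Stem Conv", 32, 0.5),
        ("ResBlock Stage-1", 64, 0.5),
        ("ResBlock Stage-2", 128, 0.5),
        ("ResBlock Stage-3", 256, 0.5),
        ("Neck", 128, 1.0),
        ("UpConv-1", 64, 2.0),
        ("UpConv-2", 64, 2.0),
    ]
    shapes = []
    h, w = height, width
    for name, channels, scale in layers:
        h, w = int(h * scale), int(w * scale)
        shapes.append((name, channels, h, w))
    return shapes

for name, c, h, w in trace_shapes():
    print(f"{name:<18} {c}x{h}x{w}")
```

The last stage ends at 64×40×80, so the heads operate at one quarter of the input resolution in each dimension.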

Key Design Elements

Backbone: ResNet18-Lite, removed global pooling and fully connected layers
Detection Heads: Dual-branch CenterNet design, detecting gap and slider centers separately
Loss Function: Focal Loss (heatmap) + L1 Loss (offset regression)
Downsampling Rate: 4x, output resolution 80×40 (width × height)
Activation: ReLU (except output layers)
Normalization: BatchNorm
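
To show how a heatmap and an offset map combine into a sub-pixel coordinate, here is a minimal NumPy sketch of standard CenterNet-style decoding. It is not the project's `CaptchaPredictor` code: the function name, the (dx, dy) channel order, and offsets being expressed in heatmap cells are assumptions.

```python
import numpy as np

STRIDE = 4  # downsampling rate: 160x320 input -> 40x80 heatmap

def decode_center(heatmap, offset, threshold=0.1):
    """Decode one keypoint from a heatmap (H, W) and an offset map (2, H, W).

    Returns (x, y, confidence) in input-image pixels, or None when the peak
    score is below `threshold`.
    """
    # Location of the strongest response on the coarse grid.
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    score = float(heatmap[row, col])
    if score < threshold:
        return None
    # Assumption: channel 0 is the x offset, channel 1 the y offset.
    dx, dy = offset[0, row, col], offset[1, row, col]
    # Refine the grid cell with the offset, then map back to input pixels.
    x = (col + dx) * STRIDE
    y = (row + dy) * STRIDE
    return float(x), float(y), score

# Synthetic example: a single peak at grid cell (row=20, col=30).
heatmap = np.zeros((40, 80), dtype=np.float32)
offset = np.zeros((2, 40, 80), dtype=np.float32)
heatmap[20, 30] = 0.9
offset[:, 20, 30] = (0.25, 0.5)

x, y, confidence = decode_center(heatmap, offset)
print(x, y)  # 121.0 82.0
```

Without the offset, the coordinates could only fall on multiples of 4 pixels; the offset term is what recovers the fractional position inside each cell.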

Model Parameters

Component	Parameters	Description
Backbone	~3.0M	ResNet18-Lite
Neck + UpConv	~0.3M	Feature fusion
Detection Heads	~0.2M	Dual-branch heads
Total	~3.5M	FP32 ~36MB
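
As a sanity check on the table, the component counts can be summed and converted to storage size. The sketch below assumes 4 bytes per FP32 parameter; the helper name is illustrative. The raw weights come to roughly 14 MB, so the ~36MB file size presumably covers more than the weights alone, though the source does not say what else is stored.

```python
def fp32_size_mb(num_params: float) -> float:
    """Size of `num_params` 32-bit floats in megabytes (10**6 bytes)."""
    return num_params * 4 / 1e6

# Approximate parameter counts from the table above.
components = {"Backbone": 3.0e6, "Neck + UpConv": 0.3e6, "Detection Heads": 0.2e6}
total_params = sum(components.values())

print(f"{total_params / 1e6:.1f}M parameters")            # 3.5M parameters
print(f"{fp32_size_mb(total_params):.1f} MB of weights")  # 14.0 MB of weights
```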

📈 Performance Metrics

Accuracy (Based on Sliding Distance Error)

Dataset	5px Threshold	7px Threshold	Best Epoch
Test Set (Synthetic)	99.4%	99.4%	16
Real CAPTCHAs	73%	80%	15/16
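
The accuracy figures count a prediction as correct when its sliding-distance error falls within the pixel threshold. A minimal sketch of that metric is shown below; the function name and sample values are hypothetical, and treating the threshold as inclusive (error <= tolerance) is an assumption.

```python
def hit_rate(predicted, actual, tolerance_px):
    """Fraction of predictions whose absolute error is within tolerance."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    hits = sum(abs(p - a) <= tolerance_px for p, a in zip(predicted, actual))
    return hits / len(predicted)

# Hypothetical sliding distances in pixels (not data from the project).
predicted = [120.0, 98.5, 143.2, 77.0, 160.4]
actual = [118.0, 104.0, 143.0, 85.0, 154.0]

print(hit_rate(predicted, actual, 5))  # 0.4
print(hit_rate(predicted, actual, 7))  # 0.8
```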

Inference Performance

Hardware	Inference Time (single image)	FPS	Inference Time (batch of 32)
RTX 5090	1.30ms	771	11.31ms
AMD Ryzen 9 9950X	5.21ms	192	144.89ms
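
The FPS column and the batch timings are related by simple arithmetic, sketched below with illustrative helper names. Recomputing from the rounded latencies gives about 769 FPS for the GPU and 192 for the CPU; the small gap to the table's 771 is likely because the table was computed from unrounded timings. The batch timings correspond to roughly 2,800 images per second on the GPU and about 220 on the CPU.

```python
def fps_from_latency(latency_ms: float) -> float:
    """Images per second when processing one image at a time."""
    return 1000.0 / latency_ms

def batch_throughput(batch_ms: float, batch_size: int = 32) -> float:
    """Images per second when processing a full batch at once."""
    return batch_size * 1000.0 / batch_ms

# (hardware, single-image latency in ms, batch-of-32 latency in ms) from the table.
rows = [("RTX 5090", 1.30, 11.31), ("AMD Ryzen 9 9950X", 5.21, 144.89)]

for name, single_ms, batch_ms in rows:
    print(f"{name}: {fps_from_latency(single_ms):.0f} FPS single, "
          f"{batch_throughput(batch_ms):.0f} images/s batched")
```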

Mean Absolute Error (MAE)

Test set: Slider 0.30px, Gap 1.14px
Real CAPTCHAs: Slider 2.84px, Gap 9.98px

🛠️ Main Features

1. Data Generation

Auto-download Pixabay images
Batch generate slider CAPTCHAs
Support multiple puzzle shapes

2. Model Training

Automatic learning rate scheduling
Training process visualization

3. Inference Deployment

Support batch prediction
REST API interface
Heatmap visualization support

4. Evaluation Analysis

Training curve analysis

⚠️ Disclaimer

This project is for learning and research purposes only. Commercial or illegal use is prohibited.

This project aims to promote academic research in computer vision and deep learning
Users must comply with relevant laws and regulations, and must not use this project to bypass website security mechanisms
Any legal liability arising from the use of this project shall be borne by the user
Please do not use this project for any behavior that may harm others' interests

📁 Project Structure

Sider_CAPTCHA_Solver/
│
├── configs/                       # Configuration files
│   └── config.yaml               # Project configuration
│
├── data/                          # Data directory
│   ├── captchas/                  # Generated CAPTCHAs (354,024 images)
│   │   └── Pic*.png              # Format: Pic{XXXX}_Bgx{X}Bgy{Y}_Sdx{X}Sdy{Y}_{hash}.png
│   ├── raw_images/                # Raw images (2000 images)
│   ├── real_captchas/             # Real CAPTCHA test set
│   │   └── annotated/             # Annotated data (100 images)
│   ├── annotations.json           # Training set annotations
│   ├── test_annotations.json      # Test set annotations
│   ├── generation_stats.json      # Generation statistics
│   └── dataset_split_stats.json   # Dataset split statistics
│
├── logs/                          # Log files
│   ├── training_accuracy_curves_all.png    # Training accuracy curves
│   ├── accuracy_comparison.png             # Test set vs real data comparison
│   ├── training_analysis_report.txt        # Training analysis report
│   ├── training_accuracy_results.csv       # Accuracy CSV data
│   ├── training_accuracy_results.json      # Accuracy JSON data
│   ├── evaluation_*.log                    # Evaluation logs
│   ├── training_log.txt                    # Training log
│   └── benchmark_results_*.json            # Performance benchmark results
│
├── results/                       # Evaluation results
│   └── best_model_evaluation/     # Best model evaluation
│       ├── test_dataset/          # Test set results
│       │   ├── evaluation_results.json     # Evaluation metrics
│       │   └── visualizations/             # Visualizations (100 images)
│       ├── real_captchas/         # Real CAPTCHA results
│       │   ├── evaluation_results.json     # Evaluation metrics
│       │   └── visualizations/             # Visualizations (50 images)
│       └── summary_report.json    # Summary report
│
├── scripts/                       # Core scripts
│   ├── annotation/                # Annotation tools
│   │   ├── annotate_captchas_matplotlib.py  # Matplotlib annotation UI
│   │   └── annotate_captchas_web.py         # Web annotation UI
│   │
│   ├── data_generation/           # Data generation scripts
│   │   ├── geometry_generator.py  # Geometry shape generator
│   │   └── puzzle_background_generator.py   # Puzzle background generator
│   │
│   ├── training/                  # Training related
│   │   ├── train.py              # Main training script
│   │   ├── dataset.py            # PyTorch dataset class
│   │   └── analyze_training.py   # Training analysis tool
│   │
│   ├── inference/                 # Inference related
│   │   └── predict.py            # Prediction interface (CaptchaPredictor class)
│   │
│   ├── evaluation/                # Evaluation scripts
│   │   └── evaluate_model.py      # Comprehensive evaluation tool (multi-mode support)
│   │
│   ├── download_images.py         # Pixabay image downloader
│   ├── generate_captchas.py       # Batch CAPTCHA generator
│   └── split_dataset.py           # Dataset splitting script
│
├── src/                          # Source code
│   ├── __init__.py
│   │
│   ├── checkpoints/               # Model weights
│   │   ├── best_model.pth         # Best model (highest accuracy)
│   │   ├── checkpoint_epoch_0001.pth ~ checkpoint_epoch_0020.pth  # Epoch checkpoints
│   │   ├── latest_checkpoint.pth  # Latest checkpoint
│   │   ├── training_log_*.txt     # Training logs
│   │   └── logs/                  # TensorBoard logs
│   │       └── events.out.tfevents.*
│   │
│   ├── captcha_generator/         # CAPTCHA generation module
│   │   ├── __init__.py
│   │   ├── batch_generator.py    # Batch generator
│   │   ├── lighting_effects.py   # Lighting effects
│   │   ├── simple_puzzle_generator.py  # Puzzle generator
│   │   └── slider_effects.py     # Slider effects
│   │
│   ├── data_collection/           # Data collection module
│   │   ├── __init__.py
│   │   └── pixabay_downloader.py # Pixabay downloader
│   │
│   ├── models/                   # Model definitions
│   │   ├── __init__.py
│   │   ├── captcha_solver.py     # CaptchaSolver main model
│   │   ├── centernet_heads.py    # CenterNet detection heads
│   │   ├── losses.py             # Loss functions (Focal Loss + L1)
│   │   └── resnet18_lite.py      # ResNet18-Lite backbone
│   │
│   └── utils/                    # Utility functions
│       ├── __init__.py
│       └── logger.py             # Logging utilities
│
├── tests/                        # Test scripts
│   ├── benchmark_inference.py     # Inference performance benchmark
│   ├── merge_real_captchas.py     # Real CAPTCHA merge tool
│   ├── test_all_puzzle_shapes.py  # All puzzle shapes test
│   ├── test_captcha_generation.py # CAPTCHA generation test
│   ├── test_darkness_levels.py    # Brightness level test
│   ├── test_distance_error_visualization.py  # Distance error visualization
│   ├── test_generate_captchas.py  # Generation function test
│   ├── test_model_architecture.py # Model architecture test
│   ├── test_real_captchas.py     # Real CAPTCHA test
│   └── test_slider_effects.py    # Slider effects test
│
├── outputs/                      # Test output files
│   └── *.png                     # Various test result images
│
├── api_example.py                # API usage examples
├── requirements.txt              # Dependencies
├── README.md                     # English documentation
└── README_zh.md                  # Chinese documentation

🔧 Tech Stack

Deep Learning Framework: PyTorch 2.0+
Image Processing: OpenCV, Pillow
Data Processing: NumPy, Pandas
Visualization: Matplotlib, Seaborn
Web Framework: FastAPI
Others: tqdm, requests, psutil

📞 Contact

For questions or suggestions, please submit an Issue or Pull Request.

This project is licensed under the MIT License and is intended for learning and research purposes only.

Release history

1.0.3: Jul 27, 2025
1.0.2: Jul 23, 2025
1.0.1: Jul 23, 2025 (this version)
1.0.0: Jul 23, 2025

