Advanced Scrapy framework with multi-engine support

Project description

Advanced Scrapy framework with multi-engine support and intelligent proxy management.

Features

  • Multi-Engine Support: HTTP (curl_cffi), Camoufox, Undetected Chrome
  • Intelligent Proxy Management: API, file, database providers with auto-rotation
  • JSON Configuration: Zero-code spider setup
  • Advanced Anti-Detection: Human-like behaviors and stealth features
  • Flexible Data Extraction: CSS, XPath, JSON, derived fields
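
The "auto-rotation" idea in the proxy feature above can be illustrated with a small round-robin pool. This is a minimal sketch under stated assumptions, not crawlerforge's actual implementation or API: the class name `RoundRobinProxyPool`, its methods, and the `max_failures` threshold are all hypothetical.

```python
from collections import deque


class RoundRobinProxyPool:
    """Hypothetical illustration of proxy auto-rotation (not crawlerforge's API)."""

    def __init__(self, proxies, max_failures=2):
        self._queue = deque(proxies)              # rotation order
        self._failures = {p: 0 for p in proxies}  # failure count per proxy
        self._max_failures = max_failures         # assumed eviction threshold

    def next_proxy(self):
        """Return the proxy at the front, then move it to the back."""
        if not self._queue:
            raise RuntimeError("no healthy proxies left")
        proxy = self._queue[0]
        self._queue.rotate(-1)
        return proxy

    def report_failure(self, proxy):
        """Count a failure and evict the proxy once it reaches the threshold."""
        if proxy not in self._failures:
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures and proxy in self._queue:
            self._queue.remove(proxy)
```

A caller would ask for `next_proxy()` before each request and call `report_failure()` when a request through that proxy fails. Whether the proxy list comes from an API, a file, or a database only changes how the initial list is loaded.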

Installation

# Basic installation
pip install crawlerforge

# With browser support (quoted so shells such as zsh do not expand the brackets)
pip install "crawlerforge[browser]"

# With all features
pip install "crawlerforge[all]"

Quick Start

# Generate configuration
crawlerforge genconfig --template ecommerce --output config.json

# Run spider
crawlerforge crawl myspider -c config.json -o products.json

Example Configuration

{
  "engine": "camoufox",
  "start_url": ["https://example.com/sitemap.xml"],
  "products_list_selector": ".product",
  "fields": {
    "name": {"type": "text", "tags": [".title::text"], "required": true},
    "price": {"type": "price", "tags": [".price::text"], "required": true}
  }
}
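
Before running a spider, it can help to check that a configuration file has the shape shown above. The following is a rough sketch that assumes only the keys visible in the example (`engine`, `start_url`, `fields`, and each field's `type` and `tags`). The function `validate_config` is hypothetical and not part of crawlerforge, and the real schema may accept more keys or different types.

```python
import json


def validate_config(config):
    """Return a list of problems found in a config dict (empty list means OK).

    Hypothetical helper: it checks only the keys shown in the example above.
    """
    problems = []
    if not isinstance(config.get("engine"), str):
        problems.append("engine must be a string")

    start = config.get("start_url")
    if not (isinstance(start, list) and start
            and all(isinstance(u, str) for u in start)):
        problems.append("start_url must be a non-empty list of URL strings")

    fields = config.get("fields")
    if not isinstance(fields, dict) or not fields:
        problems.append("fields must be a non-empty object")
        return problems

    for name, spec in fields.items():
        if not isinstance(spec, dict):
            problems.append(f"field {name!r} must be an object")
            continue
        if not isinstance(spec.get("type"), str):
            problems.append(f"field {name!r} needs a string 'type'")
        tags = spec.get("tags")
        if not (isinstance(tags, list) and tags):
            problems.append(f"field {name!r} needs a non-empty 'tags' list")
    return problems


# The example configuration from above, parsed from JSON text.
EXAMPLE_CONFIG = json.loads("""
{
  "engine": "camoufox",
  "start_url": ["https://example.com/sitemap.xml"],
  "products_list_selector": ".product",
  "fields": {
    "name": {"type": "text", "tags": [".title::text"], "required": true},
    "price": {"type": "price", "tags": [".price::text"], "required": true}
  }
}
""")

if __name__ == "__main__":
    print(validate_config(EXAMPLE_CONFIG))
```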

Documentation

Full documentation and examples are available on GitHub: https://github.com/fabiocantone/crawlerforge

License

MIT License

Commands for publication:

1. Install build tools

pip install build twine

2. Build package

python -m build

3. Upload to TestPyPI (testing)

python -m twine upload --repository testpypi dist/*

4. Upload to PyPI (production)

python -m twine upload dist/*

5. Install from PyPI

pip install crawlerforge

6. Install from GitHub (development)

pip install git+https://github.com/fabiocantone/crawlerforge.git

Download files

Source Distribution

crawlerforge-1.0.1.tar.gz (30.9 kB view details)

Uploaded Source

Built Distribution

crawlerforge-1.0.1-py3-none-any.whl (25.8 kB view details)

Uploaded Python 3

File details

Details for the file crawlerforge-1.0.1.tar.gz.

File metadata

  • Download URL: crawlerforge-1.0.1.tar.gz
  • Upload date:
  • Size: 30.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for crawlerforge-1.0.1.tar.gz
Algorithm     Hash digest
SHA256        7f4d4eb5dc0b2b15cda692879cdcf5c7fcb309335ea2a2535ff72512a6c31c98
MD5           9e2744101eae95049602773f750ab47a
BLAKE2b-256   f5628a69a1708ddb36640b0bc529723e295c4d2306b0354b3e22569bacc08092
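
A downloaded file can be checked against a published SHA256 digest with Python's standard `hashlib` module. The sketch below is illustrative. The helper names `sha256_of` and `matches_published` are hypothetical, and the file path in the usage note assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Return True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("crawlerforge-1.0.1.tar.gz", "7f4d4eb5dc0b2b15cda692879cdcf5c7fcb309335ea2a2535ff72512a6c31c98")` should return `True` for an unmodified copy of the source distribution.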

File details

Details for the file crawlerforge-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: crawlerforge-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for crawlerforge-1.0.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        18fb54a745f4f99ad4d5b836dc90bee9043de0fe72385817e1a9b1163d694252
MD5           2652f4f1d78cb00033d0b59216c02fec
BLAKE2b-256   7c541b28a74433f7d08d80086c5d733384b72c75a10e1058289861d1bc669c86
