Browserless crawler base: curl_cffi transport, TLS/AIA, identity, proxy, captcha, BaseCrawler/BaseParser.

Project description

crawlerkit-core

A standalone, browserless crawler base (crawlerkit.core) that bundles a fingerprinted curl_cffi transport, per-host TLS with AIA repair and .pfx client certificates, browserforge identities (the User-Agent is snapped to the impersonate target), proxy providers, a pluggable captcha registry, an error taxonomy with retry and rotation, and the BaseCrawler.flow() / BaseParser.parse() hooks. It has no dependencies outside PyPI, and parse() returns a type you define rather than one the library dictates.
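The description names two extension points, BaseCrawler.flow() and BaseParser.parse(), and says parse() returns your own type. As a rough illustration of that split (a template-method pattern), here is a self-contained sketch. SketchCrawler, SketchParser, FakeResponse, run() and the canned data are hypothetical stand-ins, not crawlerkit's classes or signatures; those are covered by the api reference under docs/.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, List, TypeVar

T = TypeVar("T")  # the caller's own result type


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a fetched page (not crawlerkit's RawResponse)."""
    url: str
    status: int
    text: str


class SketchParser(ABC, Generic[T]):
    @abstractmethod
    def parse(self, response: FakeResponse) -> T:
        """Turn one raw response into the caller's own type."""


class SketchCrawler(ABC, Generic[T]):
    def __init__(self, parser: SketchParser[T]) -> None:
        self.parser = parser

    @abstractmethod
    def flow(self) -> List[FakeResponse]:
        """Subclass hook: perform the requests and return raw responses."""

    def run(self) -> List[T]:
        # Template method: the base class calls flow(), then parse() on each response.
        return [self.parser.parse(resp) for resp in self.flow()]


@dataclass
class Quote:
    text: str


class QuoteParser(SketchParser[Quote]):
    def parse(self, response: FakeResponse) -> Quote:
        return Quote(text=response.text.strip())


class CannedCrawler(SketchCrawler[Quote]):
    def flow(self) -> List[FakeResponse]:
        # A real crawler would fetch over HTTP here; canned data keeps the sketch offline.
        return [FakeResponse("https://example.com/q/1", 200, "  Be yourself.  ")]


if __name__ == "__main__":
    print(CannedCrawler(QuoteParser()).run())
```

The generic parameter T captures the point the description makes: the base classes never fix the parsed type, so Quote here could be any class you define.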

Install

pip install crawlerkit-core

Use

from crawlerkit.core import BaseCrawler, BaseParser, RawResponse, Transport, Profile
from crawlerkit.core.captcha import default_registry, McaptchaPowSolver, mcaptcha_hint
from crawlerkit.core.proxy import StaticProxyProvider, BrightDataProxyProvider
from crawlerkit.core.errors import BlockedError, TransientError, raise_for_block
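The errors import points at the retry-and-rotation taxonomy from the description: some failures mean "you were blocked, change proxy or identity", others mean "just try again". Below is a minimal sketch of that distinction, assuming a fetch callable that returns (status, body) and a non-empty list of proxy names. The exception classes, sketch_raise_for_block with its status/body rules, and fetch_with_rotation are all hypothetical; they are not crawlerkit's BlockedError, TransientError or raise_for_block.

```python
import itertools
from typing import Callable, Iterable, Optional, Tuple


class SketchBlockedError(Exception):
    """Hypothetical: the site refused this identity or proxy; rotate before retrying."""


class SketchTransientError(Exception):
    """Hypothetical: a temporary failure such as a 5xx; retry without rotating."""


def sketch_raise_for_block(status: int, body: str) -> None:
    # Hypothetical classification rules; crawlerkit's raise_for_block may use others.
    if status in (403, 429) or "captcha" in body.lower():
        raise SketchBlockedError(f"blocked (status {status})")
    if status >= 500:
        raise SketchTransientError(f"server error (status {status})")


def fetch_with_rotation(
    fetch: Callable[[str], Tuple[int, str]],
    proxies: Iterable[str],
    max_attempts: int = 4,
) -> Tuple[str, str]:
    """Call fetch(proxy) until it succeeds; rotate the proxy only on a block."""
    pool = itertools.cycle(proxies)  # proxies must be non-empty
    proxy = next(pool)
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            status, body = fetch(proxy)
            sketch_raise_for_block(status, body)
            return proxy, body
        except SketchBlockedError as exc:
            last_error = exc
            proxy = next(pool)  # blocked: move to the next proxy in the pool
        except SketchTransientError as exc:
            last_error = exc  # transient: retry on the same proxy
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    canned = {"proxy-a": (403, "Access denied"), "proxy-b": (200, "<html>ok</html>")}
    print(fetch_with_rotation(lambda p: canned[p], ["proxy-a", "proxy-b"]))
```

A production version would likely add backoff between attempts; tenacity is among the listed dependencies, but how crawlerkit wires it up is not shown on this page.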

HTTP is curl_cffi only — requests is never used. Deps: curl_cffi, browserforge, cryptography, certifi, selectolax, lxml, beautifulsoup4, weasyprint, structlog, tenacity.

Logging

Logging is opt-in and off by default — crawlerkit emits nothing unless you ask. Set enable_logs = True on your crawler or parser to turn on structlog events:

class MyCrawler(BaseCrawler):
    enable_logs = True   # default is False
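The enable_logs switch is a class attribute that gates every event. A minimal sketch of that opt-in pattern follows. It assumes nothing about crawlerkit internals: the OptInLogging mixin and its log_event method are hypothetical, and the standard-library logging module stands in for structlog so the sketch runs without extra installs.

```python
import logging
from typing import Any


class OptInLogging:
    """Hypothetical mixin: log events only when the subclass sets enable_logs = True."""

    enable_logs = False  # mirrors the README's default: silent unless opted in

    def log_event(self, event: str, **fields: Any) -> None:
        if not self.enable_logs:
            return  # opted out: emit nothing at all
        # One logger per class; handlers and levels are left to the application.
        logging.getLogger(type(self).__name__).info("%s %s", event, fields)


class QuietCrawler(OptInLogging):
    pass


class ChattyCrawler(OptInLogging):
    enable_logs = True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    QuietCrawler().log_event("fetch", url="https://example.com")   # emits nothing
    ChattyCrawler().log_event("fetch", url="https://example.com")  # logged at INFO
```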

Build a crawler: GETTING_STARTED.md. Run the demos: examples/ (quotes.py — a full crawl+parse; fingerprint_demo.py — identity proof). Reference: docs/ (identity, transport-tls, proxy, captcha, cracking-govbr-turnstile, errors, api). License: MIT.


Download files

Source Distribution

crawlerkit_core-0.2.0.tar.gz (22.8 kB)

Uploaded Source

Built Distribution

crawlerkit_core-0.2.0-py3-none-any.whl (25.5 kB)

Uploaded Python 3

File details

Details for the file crawlerkit_core-0.2.0.tar.gz.

File metadata

  • Download URL: crawlerkit_core-0.2.0.tar.gz
  • Upload date:
  • Size: 22.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for crawlerkit_core-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       ecca5dac5baaf94d51348217d9e4404ebacd557f266add06a970041de4a20a51
MD5          f73e6f7621c605b873b1a3c8396b8a0c
BLAKE2b-256  2d633707aa8cf1c50123271ab3f321a4935848615fe604d67c5035efc66c6609
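To check a downloaded file against the published digests, hash it locally and compare. A short sketch using only hashlib; the helper names are illustrative, and the expected value is the sdist SHA256 listed in the table.

```python
import hashlib
from pathlib import Path

# SHA256 published for crawlerkit_core-0.2.0.tar.gz.
EXPECTED_SDIST_SHA256 = "ecca5dac5baaf94d51348217d9e4404ebacd557f266add06a970041de4a20a51"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """True if the file's digest matches the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    sdist = Path("crawlerkit_core-0.2.0.tar.gz")
    if sdist.exists():
        print("OK" if verify_download(sdist, EXPECTED_SDIST_SHA256) else "MISMATCH")
```

pip can also enforce this itself: in hash-checking mode, a requirements line such as crawlerkit-core==0.2.0 --hash=sha256:<digest> makes pip refuse any file whose hash does not match.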

File details

Details for the file crawlerkit_core-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for crawlerkit_core-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       9e4b7bbb11f37dea653f4c179cd54bf5978d8fd0443692aa2481cd8106ea641c
MD5          e6d197e13dff441c359add61d8f2c2c3
BLAKE2b-256  cfc6e965209b723ae81586cf5f0290e613fad13bcb079bb6f72f7258f67df789
