Browserless crawler base: curl_cffi transport, TLS/AIA, identity, proxy, captcha, BaseCrawler/BaseParser.

crawlerkit-core

A standalone, browserless crawler base (crawlerkit.core). It provides a fingerprinted curl_cffi transport, per-host TLS with AIA repair and .pfx client certificates, a browserforge identity whose User-Agent is snapped to the impersonate target, proxy providers, a pluggable captcha registry, an error taxonomy with retry and rotation, and the BaseCrawler.flow() / BaseParser.parse() hooks. Every dependency is installable from PyPI, and parse() returns your own type, not one the library dictates.

Install

pip install crawlerkit-core

Use

from crawlerkit.core import BaseCrawler, BaseParser, RawResponse, Transport, Profile
from crawlerkit.core.captcha import default_registry, McaptchaPowSolver, mcaptcha_hint
from crawlerkit.core.proxy import StaticProxyProvider, BrightDataProxyProvider
from crawlerkit.core.errors import BlockedError, TransientError, raise_for_block
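
The imports above do not show how the two hooks fit together. What follows is a minimal, self-contained sketch of the pattern only, not the library's API: it does not import crawlerkit, and every name in it (`FakeResponse`, `SketchCrawler`, `SketchParser`, `Quote`, `run`) is a hypothetical stand-in. The real `BaseCrawler.flow()` and `BaseParser.parse()` signatures may differ; GETTING_STARTED.md is the place to check them.

```python
from dataclasses import dataclass
from typing import Generic, Iterator, List, TypeVar

T = TypeVar("T")


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a raw HTTP response (not crawlerkit's RawResponse)."""
    url: str
    text: str


class SketchParser(Generic[T]):
    """Hypothetical parser base: each subclass picks its own return type T."""

    def parse(self, response: FakeResponse) -> T:
        raise NotImplementedError


class SketchCrawler:
    """Hypothetical crawler base: flow() yields the responses to be parsed."""

    def flow(self) -> Iterator[FakeResponse]:
        raise NotImplementedError


@dataclass
class Quote:
    """A user-defined result type; the base classes know nothing about it."""
    author: str
    text: str


class QuoteParser(SketchParser[Quote]):
    def parse(self, response: FakeResponse) -> Quote:
        # Toy "author|text" format; a real parser would walk HTML instead.
        author, text = response.text.split("|", 1)
        return Quote(author=author.strip(), text=text.strip())


class QuoteCrawler(SketchCrawler):
    def flow(self) -> Iterator[FakeResponse]:
        # A real flow() would issue requests; canned data keeps this offline.
        yield FakeResponse("https://example.invalid/1", "Ada | Hello")
        yield FakeResponse("https://example.invalid/2", "Alan | World")


def run(crawler: SketchCrawler, parser: SketchParser[T]) -> List[T]:
    """Feed every response produced by flow() into parse()."""
    return [parser.parse(resp) for resp in crawler.flow()]


quotes = run(QuoteCrawler(), QuoteParser())
```

The point of the generic parameter is that the result type (`Quote` here) belongs to the caller, which is the property the description claims for `parse()`.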

All HTTP goes through curl_cffi; requests is never used. Dependencies: curl_cffi, browserforge, cryptography, certifi, selectolax, lxml, beautifulsoup4, weasyprint, structlog, tenacity.
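
The description mentions an error taxonomy with retry and rotation. One way to picture that idea is the stdlib-only sketch below. It is an assumption about the general shape, not the library's implementation: `SketchTransientError`, `SketchBlockedError`, `fetch_with_rotation`, and `flaky_fetch` are hypothetical names, and the real `TransientError` / `BlockedError` handling (and any use of tenacity, which is listed among the dependencies) may work differently. Backoff delays are left out to keep the sketch short.

```python
import itertools
from typing import Callable, Iterable, Optional


class SketchTransientError(Exception):
    """Hypothetical: a failure worth retrying on the same proxy (e.g. a timeout)."""


class SketchBlockedError(Exception):
    """Hypothetical: the target refused this identity; rotate before retrying."""


def fetch_with_rotation(
    fetch: Callable[[str, str], str],
    url: str,
    proxies: Iterable[str],
    max_attempts: int = 4,
) -> str:
    """Call fetch(url, proxy); retry transient errors, rotate the proxy on blocks."""
    pool = itertools.cycle(proxies)
    proxy = next(pool)
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return fetch(url, proxy)
        except SketchTransientError as exc:
            last_error = exc      # keep the same proxy and try again
        except SketchBlockedError as exc:
            last_error = exc
            proxy = next(pool)    # blocked: move on to the next proxy
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


def flaky_fetch(url: str, proxy: str) -> str:
    # Toy fetcher: proxy "p1" is always blocked, any other proxy succeeds.
    if proxy == "p1":
        raise SketchBlockedError(proxy)
    return f"{url} via {proxy}"


result = fetch_with_rotation(flaky_fetch, "https://example.invalid/", ["p1", "p2"])
```

Separating the two exception types is what lets the loop decide between retrying as-is and changing identity first; a single catch-all error would force one policy on both cases.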

To build a crawler, see GETTING_STARTED.md. The demos are in examples/: quotes.py is a full crawl and parse, and fingerprint_demo.py is an identity proof. Reference documentation is in docs/ (identity, transport-tls, proxy, captcha, cracking-govbr-turnstile, errors, api). License: MIT.

Source Distribution

crawlerkit_core-0.1.0.tar.gz (21.7 kB view details)

Uploaded Source

Built Distribution

crawlerkit_core-0.1.0-py3-none-any.whl (24.6 kB view details)

Uploaded Python 3

File details

Details for the file crawlerkit_core-0.1.0.tar.gz.

File metadata

  • Download URL: crawlerkit_core-0.1.0.tar.gz
  • Upload date:
  • Size: 21.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for crawlerkit_core-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1b88563933978ce92e797173391e40e2bcb1b26dc4c6e14d39356a49d0791e71
MD5 a2cd8f96248e3d5f8dd23ce2e117fa4e
BLAKE2b-256 32fd63ba393fde2cd06dba60ddf93fd8ce30ef2dc5870984e4bf33a1cb3682e1

File details

Details for the file crawlerkit_core-0.1.0-py3-none-any.whl.

File hashes

Hashes for crawlerkit_core-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3a48de4472cf52bad79c3897310814586bef4a3b95ffed17405d1daedd305253
MD5 d47a1037dfc9a6812e534c96a2e5f54c
BLAKE2b-256 977bc96c8af31eec40155f7f38e65134f2b20b974df887735aa20d88264f10cd
