
Open-source web crawling framework with specialized crawler agents.

WebCreeper: Crawl. Extract. Discover.

WebCreeper is an open-source crawling framework built around agents. Each agent is a crawler specialized for a specific task, and all agents share core crawling primitives from creeper_core.

Agent Model

  • Agents are modular crawler units with clear responsibilities.
  • Each agent can expose its own settings and extraction behavior.
  • Shared infrastructure (robots handling, retries, rate limits, hooks, policies) lives in the core.
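
The split described above, with agents owning extraction and the core owning crawl mechanics, might look roughly like the following. This is an illustrative sketch, not WebCreeper's API: `CoreCrawler`, `TitleAgent`, and the injected `fetch` callable are hypothetical names, and the real interfaces in creeper_core may differ.

```python
import time
import urllib.robotparser
from typing import Callable, Optional


class CoreCrawler:
    """Hypothetical shared core: robots check, retries, and a simple rate limit."""

    def __init__(self, fetch: Callable[[str], str], robots_txt: str = "",
                 user_agent: str = "*", max_retries: int = 2,
                 min_delay: float = 0.0):
        self.fetch = fetch              # injected so the sketch needs no network
        self.user_agent = user_agent
        self.max_retries = max_retries
        self.min_delay = min_delay      # minimum seconds between requests
        self._last_request = float("-inf")
        self.robots = urllib.robotparser.RobotFileParser()
        self.robots.parse(robots_txt.splitlines())

    def get(self, url: str) -> Optional[str]:
        # Policy check first: disallowed URLs are never fetched.
        if not self.robots.can_fetch(self.user_agent, url):
            return None
        for attempt in range(self.max_retries + 1):
            # Rate limit: wait until min_delay has passed since the last request.
            wait = self.min_delay - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
            self._last_request = time.monotonic()
            try:
                return self.fetch(url)
            except OSError:
                # Retry transient I/O errors; re-raise after the final attempt.
                if attempt == self.max_retries:
                    raise
        return None


class TitleAgent:
    """Hypothetical agent: owns extraction, delegates fetching to the core."""

    def __init__(self, core: CoreCrawler):
        self.core = core

    def run(self, url: str) -> Optional[str]:
        html = self.core.get(url)
        if html is None:
            return None
        start, end = html.find("<title>"), html.find("</title>")
        if start == -1 or end == -1:
            return None
        return html[start + len("<title>"):end].strip()
```

Under this layout, a second agent would reuse `CoreCrawler` unchanged and supply only its own `run` logic.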

This makes it easy to:

  • Start simple with one agent.
  • Add new agents without rewriting crawl infrastructure.
  • Compose custom extraction logic through callbacks and hooks.
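
As a rough illustration of composing extraction logic through hooks, the sketch below chains callbacks on a named event. `HookRegistry`, the `page_fetched` event name, and the page dictionary shape are assumptions made for this example, not WebCreeper's actual hook API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class HookRegistry:
    """Hypothetical registry: hooks for an event run in registration order."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Callable[[Any], Any]]] = defaultdict(list)

    def on(self, event: str) -> Callable:
        # Decorator that registers a function as a hook for `event`.
        def decorator(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
            self._hooks[event].append(func)
            return func
        return decorator

    def emit(self, event: str, payload: Any) -> Any:
        # Each hook receives the previous hook's return value.
        for hook in self._hooks.get(event, []):
            payload = hook(payload)
        return payload


hooks = HookRegistry()


@hooks.on("page_fetched")
def add_word_count(page: dict) -> dict:
    return {**page, "words": len(page["text"].split())}


@hooks.on("page_fetched")
def tag_long_pages(page: dict) -> dict:
    # Relies on the "words" field added by the previous hook.
    return {**page, "long": page["words"] > 3}
```

Calling `hooks.emit("page_fetched", {"url": ..., "text": ...})` returns the page enriched by both hooks. An event with no registered hooks returns the payload unchanged.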

Agent Selection

Use this table to choose the right agent.

Agent | When To Use It | Documentation
Atlas | Crawl website structure, build link graphs, and run custom per-page extraction callbacks/hooks. | docs/agents/atlas.md
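
To make "link graph" and "per-page extraction callback" concrete, here is a minimal standard-library sketch of a same-site breadth-first crawl. It does not use Atlas or reflect its API: `build_link_graph`, `LinkParser`, and the `fetch` and `on_page` parameters are hypothetical names for illustration; the Atlas documentation describes the real usage.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def build_link_graph(start_url: str,
                     fetch: Callable[[str], Optional[str]],
                     on_page: Optional[Callable[[str, str], None]] = None,
                     max_pages: int = 50) -> Dict[str, List[str]]:
    """Breadth-first crawl of one site; returns {page URL: [same-site links]}."""
    site = urlparse(start_url).netloc
    graph: Dict[str, List[str]] = {}
    queue = deque([start_url])
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        if url in graph:
            continue
        html = fetch(url)
        if html is None:          # page unavailable: record it with no edges
            graph[url] = []
            continue
        parser = LinkParser()
        parser.feed(html)
        targets: List[str] = []
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == site and target not in targets:
                targets.append(target)
        graph[url] = targets
        if on_page is not None:
            on_page(url, html)    # custom per-page extraction callback
        queue.extend(t for t in targets if t not in graph)
    return graph
```

Passing a dictionary's `get` method as `fetch` lets the function run against in-memory pages, which is convenient for testing extraction callbacks without network access.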

Agent-specific setup and code examples are documented on each agent's page.

Documentation

  • Installation and project docs index: docs/README.md
  • Agent docs index: docs/agents/README.md

License

MIT. See LICENSE.

Download files

  • Source Distribution: webcreeper-0.2.0.tar.gz (16.8 kB), uploaded as Source
  • Built Distribution: webcreeper-0.2.0-py3-none-any.whl (15.4 kB), uploaded for Python 3
File details

Details for the file webcreeper-0.2.0.tar.gz.

File metadata

  • Size: 16.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing: Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for webcreeper-0.2.0.tar.gz

Algorithm | Hash digest
SHA256 | 5870a11da60907df84632a5c90b7da6fc566ea668a475c6d5eb76cd961314ca7
MD5 | 93fe2f78f1db688450ab2780665db9c7
BLAKE2b-256 | a8ee8b061cc315257328a06f31aab919ff2c755f00ba32db42cf92b293270c82

Provenance

An attestation bundle was made for webcreeper-0.2.0.tar.gz.

Publisher: publish.yml on Y-Elsayed/WebCreeper

File details

Details for the file webcreeper-0.2.0-py3-none-any.whl.

File metadata

  • Size: 15.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing: Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for webcreeper-0.2.0-py3-none-any.whl

Algorithm | Hash digest
SHA256 | e01b4ac118db3fa7166c68de5d7b097fa7bd07ef66cf253d95646ada12a623af
MD5 | 3b1aa063a1ebdbd807993a91e0438422
BLAKE2b-256 | 8664f9918d5b66c7abe89256920b8193807bf60050f92af186d6131919cd2912

Provenance

An attestation bundle was made for webcreeper-0.2.0-py3-none-any.whl.

Publisher: publish.yml on Y-Elsayed/WebCreeper
