🕷️ A lightweight, generic parallel runner for custom scrapers
spidur 🕷️
🕷️ spidur is a tiny, hackable framework for running custom scrapers in parallel.
- No business logic
- Just a base class + registry + runner
- Multiprocessing + async friendly
✨ Features
- Zero assumptions — bring your own scraper code.
- Base class for scrapers — implement 2 methods and you’re done.
- Parallel execution — run across all CPU cores.
- OSS-style — small, clean, and easy to hack.
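To make the "base class + registry + runner" idea concrete, here is a minimal, self-contained sketch of that pattern using only the standard library. It is not spidur's actual implementation: `SketchScraper`, `REGISTRY`, `register`, `run_one`, and `run_all` are hypothetical names chosen for illustration, and the real package may be organised differently.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical registry: maps a target name to a scraper class.
REGISTRY = {}


class SketchScraper:
    """Stand-in base class; subclasses implement the two async methods."""

    async def discover_urls(self, start_url):
        raise NotImplementedError

    async def scrape_page(self, url):
        raise NotImplementedError

    async def fetch(self, start_url):
        urls = await self.discover_urls(start_url)
        return [await self.scrape_page(u) for u in urls]


def register(name, scraper_cls):
    """Record a scraper class under a name so workers can look it up."""
    REGISTRY[name] = scraper_cls


def run_one(job):
    """Worker entry point: build the scraper and drive its coroutine."""
    name, start_url = job
    scraper = REGISTRY[name]()
    return asyncio.run(scraper.fetch(start_url))


def run_all(jobs, use_processes=True):
    """Run each (name, start_url) job in a pool, keeping input order."""
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls() as pool:
        return list(pool.map(run_one, jobs))


class DemoScraper(SketchScraper):
    async def discover_urls(self, start_url):
        return [f"{start_url}/1", f"{start_url}/2"]

    async def scrape_page(self, url):
        return {"url": url, "data": "demo"}


# Registering at import time means freshly started worker processes,
# which re-import this module, see the same registry contents.
register("demo", DemoScraper)
```

With the default `use_processes=True`, call `run_all` from under an `if __name__ == "__main__":` guard so worker processes can import the module safely. Passing `use_processes=False` runs the same jobs on threads, which can be convenient in tests.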
📦 Install
pip install spidur
Or add it to a Poetry project:
poetry add spidur
Quickstart
from spidur.base import Target, Scraper
from spidur.factory import ScraperFactory
from spidur.runner import Runner


class MyScraper(Scraper):
    async def discover_urls(self, page, known, overwrite=False):
        # Return the URLs to scrape; a real scraper would inspect `page`.
        return ["http://example.com/1", "http://example.com/2"]

    async def scrape_page(self, page, url):
        # Return whatever record you want to collect for one URL.
        return {"url": url, "data": "demo"}

    async def fetch(self, known, overwrite=False):
        # Discover the URLs, then scrape them one by one.
        urls = await self.discover_urls(None, known, overwrite)
        return [await self.scrape_page(None, u) for u in urls]


# register scraper
ScraperFactory.register("example", MyScraper)

# run
target = Target(name="example", start_url="http://example.com")
results = Runner.run([target], seen=set(), overwrite=True)
print(results)
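The `fetch` method in the quickstart awaits each page in turn. If the pages are independent, the same method could scrape them concurrently with `asyncio.gather`. The sketch below is self-contained and does not import spidur: `GatherScraper` is a hypothetical stand-in that mirrors the quickstart's method names, the `asyncio.sleep` call only simulates network latency, and the way `known` and `overwrite` filter URLs is an assumption about their intent rather than documented behaviour.

```python
import asyncio


class GatherScraper:
    """Hypothetical stand-in mirroring the quickstart's method names."""

    async def discover_urls(self, page, known, overwrite=False):
        urls = ["http://example.com/1", "http://example.com/2"]
        # Assumed semantics: skip URLs already in `known` unless overwriting.
        return [u for u in urls if overwrite or u not in known]

    async def scrape_page(self, page, url):
        await asyncio.sleep(0.01)  # simulated network delay
        return {"url": url, "data": "demo"}

    async def fetch(self, known, overwrite=False):
        urls = await self.discover_urls(None, known, overwrite)
        # gather runs the page scrapes concurrently and keeps input order.
        return await asyncio.gather(*(self.scrape_page(None, u) for u in urls))


results = asyncio.run(GatherScraper().fetch(known={"http://example.com/1"}))
print(results)
```

Because `gather` returns results in the order the coroutines were passed in, callers can still pair each result with its URL by position.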
Tests
poetry install
poetry run pytest
File details
Details for the file spidur-0.1.0.tar.gz.
File metadata
- Download URL: spidur-0.1.0.tar.gz
- Upload date:
- Size: 3.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.0 CPython/3.12.6 Darwin/24.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ed194e56f8564326012081b594e8ae6ab50035ed8224f0047995393b833bd272 |
| MD5 | 651100fff692a8ee30b09af87c8f81b4 |
| BLAKE2b-256 | 8e32ada10c27166c2c9ceb7a80ec59c9dfc701cfa37896bcad63dfcfb7df2d33 |
File details
Details for the file spidur-0.1.0-py3-none-any.whl.
File metadata
- Download URL: spidur-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.0 CPython/3.12.6 Darwin/24.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f9f12f03fb99095899f4ab99d4b72835c69ab91dcd30333b6903257d45380a9 |
| MD5 | c2e0e6600f016b0aa66376d76f3e4cc5 |
| BLAKE2b-256 | 89e3a24976d6d93b9738a1628971abc60a23a109f2f675b4f9d5e1cc20ebaca0 |