The compact web crawling toolkit
myrmex
Unlike full-featured frameworks, myrmex does not implement an entire scraping pipeline. Instead, it focuses exclusively on core crawling functionality. Higher-level scraping logic is left to the specific implementation of your scraper.
If you're looking for a complete scraping framework, consider Scrapy.
myrmex provides a minimal interface through two crawler classes — Crawler and TorCrawler — for regular HTTP crawling and Tor-based anonymous crawling, respectively.
Key Capabilities
- Asynchronous context management for automatic resource handling
- Built on aiohttp for HTTP requests
- Executes synchronous operations using the native asyncio thread pool (non-blocking)
- Functional-style error handling via Result
- Configurable per-operation timeouts for robust request management
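Two of the capabilities above, running synchronous work on the asyncio thread pool and bounding each operation with a timeout, can be illustrated with the standard library alone. The sketch below is not taken from myrmex's source; `slow_parse` and `run_with_timeout` are hypothetical names used only for illustration.

```python
import asyncio
import time
from typing import Optional


def slow_parse(payload: str) -> int:
    """Blocking stand-in for synchronous work (hypothetical helper)."""
    time.sleep(0.05)  # simulate a blocking call
    return len(payload)


async def run_with_timeout(payload: str, timeout: float) -> Optional[int]:
    """Run slow_parse on the default thread pool; return None if it exceeds `timeout` seconds."""
    try:
        # to_thread keeps the event loop free while the blocking call runs.
        return await asyncio.wait_for(asyncio.to_thread(slow_parse, payload), timeout)
    except asyncio.TimeoutError:
        # The worker thread is not interrupted; only the await is abandoned.
        return None


async def main() -> None:
    print(await run_with_timeout("<html></html>", timeout=1.0))
    print(await run_with_timeout("<html></html>", timeout=0.001))


if __name__ == "__main__":
    asyncio.run(main())
```

The first call has time to finish and prints the payload length; the second gives up almost immediately and prints `None`.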
Installation
Install via pip:
pip install myrmex
Or using uv:
uv add myrmex
Please note that the following libraries will be installed alongside myrmex:
- aiohttp – for HTTP requests
- aiohttp-socks – for SOCKS5 proxy support
- stem – for Tor control port integration
- result – for functional-style error handling
Configuration
Crawler accepts the following options:
| Parameter | Type | Default | Description |
|---|---|---|---|
| timeout | int | 10 | Timeout (in seconds) for HTTP requests. |
| headers | dict | None | HTTP headers to include with each request. |
…and TorCrawler accepts the following options during initialization:
| Parameter | Type | Default | Description |
|---|---|---|---|
| address | str | None | SOCKS5 proxy address for routing traffic through Tor. |
| password | str | None | Control port password for authenticating with the Tor proxy. |
| timeout | int | 10 | Timeout (in seconds) for HTTP requests. |
| headers | dict | None | HTTP headers to include with each request. |
Usage Example
The example below demonstrates how to fetch your current IP address over the Tor network:
import asyncio

from myrmex import TorCrawler

async def main():
    async with TorCrawler("socks5h://127.0.0.1:9050", password="password") as crawler:
        await crawler.rotate_ip()  # optional: rotates IP before request
        result = await crawler.fetch("http://httpbin.org/ip")
        if result.is_ok():
            print("Current IP:", result.unwrap())

asyncio.run(main())
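The example above only handles the success branch. The sketch below shows one way to cover both outcomes without letting an exception escape. So that it runs without any dependencies, it uses small stand-in `Ok` and `Err` classes that mimic only the `is_ok()` and `unwrap()` calls shown above. The `error` attribute and the `describe` helper are assumptions made for illustration, since this README does not document what a failed fetch carries.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Ok:
    """Stand-in for a successful result (hypothetical, for illustration only)."""
    value: str

    def is_ok(self) -> bool:
        return True

    def unwrap(self) -> str:
        return self.value


@dataclass
class Err:
    """Stand-in for a failed result; the `error` field is an assumption."""
    error: str

    def is_ok(self) -> bool:
        return False

    def unwrap(self) -> str:
        raise RuntimeError(f"called unwrap() on an error: {self.error}")


Result = Union[Ok, Err]


def describe(result: Result) -> str:
    """Turn a result into a log line, checking is_ok() before unwrapping."""
    if result.is_ok():
        return f"Current IP: {result.unwrap()}"
    return f"Request failed: {result.error}"


print(describe(Ok("203.0.113.7")))
print(describe(Err("timeout")))
```

Checking `is_ok()` before calling `unwrap()` keeps the failure path explicit, which is the main point of returning a result value instead of raising.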
Tor Setup
Because TorCrawler routes its traffic through the Tor network, make sure a Tor instance is configured and running before you use it.
Update your torrc configuration file with the following:
SocksPort 0.0.0.0:9050
ControlPort 0.0.0.0:9051
HashedControlPassword ***
To generate a hashed password:
tor --hash-password your_password
Start Tor manually in the background:
tor &
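Before running a crawler, it can help to confirm that Tor is actually listening. The sketch below is a plain TCP reachability check using only the standard library. It is not part of myrmex, `port_is_open` is a hypothetical helper, and the host `127.0.0.1` assumes Tor runs on the same machine. The port numbers come from the torrc snippet above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the torrc snippet; the local host is an assumption.
    for name, port in (("SocksPort", 9050), ("ControlPort", 9051)):
        state = "reachable" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} {port}: {state}")
```

A successful connection only shows that something is listening on the port; it does not verify the control port password.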
File details
Details for the file myrmex-0.1.1.tar.gz.
File metadata
- Download URL: myrmex-0.1.1.tar.gz
- Upload date:
- Size: 4.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.4.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a47fcd8f94bfe1f238ee4efba100b8f9695a4ce59544fc78533044cfda703c1f |
| MD5 | 5bc2217727c0e65d5d0ea4d0a1dbb670 |
| BLAKE2b-256 | e34a2b64e544e999066a279e77ebe91f80f34074f216cbb2ed455572cf97ebd9 |
File details
Details for the file myrmex-0.1.1-py3-none-any.whl.
File metadata
- Download URL: myrmex-0.1.1-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.4.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c383158817783cf7d9ec9976d0c006faa5b2faa23d1b187c79129572385451ba |
| MD5 | 5d719e975746393f48b7b766331a5b36 |
| BLAKE2b-256 | 28a32f153852b3ce407ef446b49cf413f1a724cfc58d446d486b13f1062efad6 |