myrmex

The compact web crawling toolkit.

Unlike full-featured frameworks, myrmex does not implement an entire scraping pipeline. Instead, it focuses exclusively on core crawling functionality. Higher-level scraping logic is left to the specific implementation of your scraper.

If you're looking for a complete scraping framework, consider Scrapy.

myrmex provides a minimal interface through two crawler classes — Crawler and TorCrawler — for regular HTTP crawling and Tor-based anonymous crawling, respectively.

Key Capabilities

  • Asynchronous context management for automatic resource handling
  • Built on aiohttp for HTTP requests
  • Offloads blocking synchronous operations to asyncio's default thread pool, keeping the event loop responsive
  • Functional-style error handling via Result
  • Configurable per-operation timeouts for robust request management
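The thread-pool capability above can be illustrated with plain asyncio. This is a generic sketch of the pattern, not myrmex's internal code: a blocking call is handed to `loop.run_in_executor`, so the event loop keeps serving other tasks while it runs.

```python
import asyncio
import time

def blocking_parse(html: str) -> int:
    # Stand-in for synchronous, blocking work (e.g. heavy parsing).
    time.sleep(0.1)
    return len(html)

async def main() -> int:
    loop = asyncio.get_running_loop()
    # Passing None selects asyncio's default ThreadPoolExecutor;
    # the event loop is free while blocking_parse runs in a thread.
    return await loop.run_in_executor(None, blocking_parse, "<html></html>")

print(asyncio.run(main()))  # → 13
```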

Installation

Install via pip:

pip install myrmex

Or using uv:

uv add myrmex

Note that the following dependencies are installed alongside myrmex:

  • aiohttp – for HTTP requests
  • aiohttp-socks – for SOCKS5 proxy support
  • stem – for Tor control port integration
  • result – for functional-style error handling
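The result package mirrors Rust's Result type: a fetch either yields an Ok wrapping the value or an Err wrapping the failure, and the caller branches explicitly instead of catching exceptions. The tiny stand-in below sketches that pattern with stdlib only — Ok and Err here are simplified illustrations, not the real package's classes.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")

@dataclass
class Ok(Generic[T]):
    value: T
    def is_ok(self) -> bool:
        return True
    def unwrap(self) -> T:
        return self.value

@dataclass
class Err(Generic[E]):
    error: E
    def is_ok(self) -> bool:
        return False
    def unwrap_err(self) -> E:
        return self.error

Result = Union[Ok, Err]

def parse_port(text: str) -> Result:
    # Success and failure are both ordinary return values.
    try:
        return Ok(int(text))
    except ValueError:
        return Err(f"not a number: {text!r}")

print(parse_port("9050").is_ok())  # → True
print(parse_port("abc").is_ok())   # → False
```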

Configuration

Crawler accepts the following options:

Parameter  Type  Default  Description
---------  ----  -------  -----------
timeout    int   10       Timeout (in seconds) for HTTP requests.
headers    dict  None     HTTP headers to include with each request.

TorCrawler accepts the following options during initialization:

Parameter  Type  Default  Description
---------  ----  -------  -----------
address    str   None     SOCKS5 proxy address for routing traffic through Tor.
password   str   None     Control port password for authenticating with the Tor proxy.
timeout    int   10       Timeout (in seconds) for HTTP requests.
headers    dict  None     HTTP headers to include with each request.
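The timeout option bounds each individual request. Its behaviour resembles asyncio's own per-operation timeout, sketched below with stdlib only — how myrmex enforces the bound internally is an assumption, not taken from its source.

```python
import asyncio

async def slow_fetch() -> str:
    # Stand-in for a request that exceeds the configured timeout.
    await asyncio.sleep(5)
    return "response"

async def main() -> str:
    try:
        # Cancel the operation if it runs longer than 0.1 seconds.
        return await asyncio.wait_for(slow_fetch(), timeout=0.1)
    except asyncio.TimeoutError:
        return "timed out"

print(asyncio.run(main()))  # → timed out
```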

Usage Example

The example below demonstrates how to fetch your current IP address over the Tor network:

import asyncio
from myrmex import TorCrawler

async def main():
    async with TorCrawler("socks5h://127.0.0.1:9050", password="password") as crawler:
        await crawler.rotate_ip()  # optional: rotates IP before request
        result = await crawler.fetch("http://httpbin.org/ip")
        if result.is_ok():
            print("Current IP:", result.unwrap())

asyncio.run(main())

Tor Setup

TorCrawler depends on the Tor network, so make sure a configured and running Tor instance is available before using it.

Update your torrc configuration file with the following:

SocksPort 0.0.0.0:9050
ControlPort 0.0.0.0:9051
HashedControlPassword ***

To generate a hashed password:

tor --hash-password your_password

Start Tor manually in the background:

tor &
