High-scale Medium scraper: request abstraction and HTML->Markdown parser

Medium Scraper

A high-scale, async Medium scraper with a request abstraction layer and an HTML-to-Markdown parser. Discover Medium articles and convert them to clean Markdown through the web interface.
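To give a feel for what HTML-to-Markdown conversion involves, here is a minimal sketch built on Python's standard-library `html.parser`. It is not the package's parser: the class name `MiniMarkdownParser`, the helper `html_to_markdown`, and the small set of tags handled are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class MiniMarkdownParser(HTMLParser):
    """Hypothetical, illustrative converter; not medium-scraper's own parser."""

    # Markdown markers for a few inline tags and heading levels.
    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*", "code": "`"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.parts = []   # output fragments, joined at the end
        self.href = None  # target of the link currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.parts.append(self.HEADINGS[tag])
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "a":
            self.href = dict(attrs).get("href", "")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            # Block elements end with a blank line.
            self.parts.append("\n\n")
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "a":
            self.parts.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html: str) -> str:
    """Convert a small HTML fragment to Markdown text."""
    parser = MiniMarkdownParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

A real article page also needs handling for lists, images, code blocks, and nested markup, which this sketch leaves out.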

🌐 Web Interface (Recommended)

The easiest way to use Medium Scraper is through our web interface:

# Install web dependencies (quote the extra so shells such as zsh do not expand the brackets)
pip install "medium-scraper[web]"

# Run web server
cd web && python app.py

Then open your browser to http://localhost:8000

Docker Deployment (Easiest Setup)

cd web && docker build -t medium-scraper-web . && docker run -p 8000:8000 medium-scraper-web

Features

Intuitive GUI for scraping Medium articles
Real-time progress tracking via WebSocket
Download results as ZIP files
Job history and persistent storage
Multiple request modes:
- Decodo API: Smart managed scraping (requires Decodo API key)
- Custom Proxies: Bring your own proxy list
- Proxyless: Direct requests with your IP

🔧 Core Features

All components share these powerful features:

Async/await support for high-performance operations
Multiple request backends: Direct requests, custom proxies, or Decodo Scraper API
Intelligent caching with configurable backends
Progress tracking with callbacks
Concurrent processing with rate limiting
Robust error handling and retries
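The concurrency, retry, and progress-callback ideas in this list can be sketched with plain `asyncio`. This is a generic illustration, not the library's implementation: `fetch_with_retries`, `fetch_all`, and their parameters are hypothetical names. The semaphore only bounds the number of in-flight requests, which is a coarse form of rate limiting.

```python
import asyncio


async def fetch_with_retries(fetch, url, retries=3, base_delay=0.01):
    """Call `fetch(url)`, retrying with exponential backoff on failure."""
    for attempt in range(retries + 1):
        try:
            return await fetch(url)
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)


async def fetch_all(fetch, urls, max_concurrency=5, on_progress=None):
    """Fetch all URLs with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    done = 0

    async def worker(url):
        nonlocal done
        async with semaphore:
            result = await fetch_with_retries(fetch, url)
        done += 1
        if on_progress is not None:
            on_progress(done, len(urls))  # report (completed, total)
        return result

    # gather() returns results in the same order as `urls`.
    return await asyncio.gather(*(worker(u) for u in urls))
```

Here `fetch` is any coroutine function that takes a URL and returns a response, so the same wrapper works regardless of which request backend sits underneath.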

CLI Tool

Interactive prompts with rich formatting
Progress bars and detailed statistics
Multiple output formats (JSON, Markdown files)
Proxy support and custom configurations
Tag pagination and article filtering

Core Library

These features are also available when using the library programmatically. See our Library Documentation for details.

🛠️ Installation Options

Basic Installation

pip install medium-scraper

📚 Request Senders

The library supports multiple request backends:

RequestsRequestSender: Standard requests library (works with custom proxies or proxyless)
DecodoScraperRequestSender: Advanced scraping with Decodo API (requires API key)
CachedRequestSender: Adds caching to any sender

Choose the appropriate sender based on your needs:

from medium_scraper import RequestsRequestSender, DecodoScraperRequestSender

# For simple use cases (proxyless or with custom proxies)
sender = RequestsRequestSender()

# For advanced scraping with Decodo (requires API key from https://decodo.com)
sender = DecodoScraperRequestSender(api_key="your-decodo-api-key")
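The idea of adding caching to any sender is a wrapper pattern, which the following self-contained sketch illustrates. It does not use the package's classes: `CountingSender`, `CachingSender`, and the async `get(url)` method are hypothetical stand-ins, since the actual sender interface is not shown here.

```python
import asyncio


class CountingSender:
    """Hypothetical stand-in for a real sender; counts how often it is hit."""

    def __init__(self):
        self.calls = 0

    async def get(self, url: str) -> str:
        self.calls += 1
        return f"<html>{url}</html>"


class CachingSender:
    """Hypothetical wrapper that memoizes responses by URL in memory."""

    def __init__(self, inner):
        self.inner = inner
        self._cache = {}

    async def get(self, url: str) -> str:
        # Only delegate to the wrapped sender on a cache miss.
        if url not in self._cache:
            self._cache[url] = await self.inner.get(url)
        return self._cache[url]


async def demo():
    inner = CountingSender()
    sender = CachingSender(inner)
    first = await sender.get("https://medium.com/tag/python")
    second = await sender.get("https://medium.com/tag/python")
    # Both responses match, and the inner sender was called only once.
    return first == second, inner.calls
```

Because the wrapper exposes the same method as the sender it wraps, calling code does not need to know whether caching is enabled.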

🔄 Async Usage

The library is designed to be fully async:

import asyncio
from medium_scraper import MediumExplorer

async def main():
    explorer = MediumExplorer()
    articles = await explorer.get_tag_articles("python", limit=5)
    
    for article in articles:
        print(f"Title: {article.title}")
        print(f"URL: {article.url}")

asyncio.run(main())

📚 Library Usage

For programmatic usage of the core library, please refer to our Library Documentation which provides detailed examples.

🖥️ CLI Tool

The command-line interface offers powerful scraping capabilities. See our CLI Documentation for comprehensive usage instructions.

📖 Documentation

For more detailed information about each component, see the documentation in the docs folder.

Release history

0.1.1 (Aug 26, 2025)
0.1.0 (Aug 26, 2025, this version)

Download files

Source Distribution

medium_scraper-0.1.0.tar.gz (21.6 kB), uploaded Aug 26, 2025

Built Distribution

medium_scraper-0.1.0-py3-none-any.whl (20.8 kB), uploaded Aug 26, 2025, Python 3

File details

Details for the file medium_scraper-0.1.0.tar.gz.

File metadata

Download URL: medium_scraper-0.1.0.tar.gz
Upload date: Aug 26, 2025
Size: 21.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.11

File hashes

Hashes for medium_scraper-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`16d851cd62014a5cc33acbb4f4427067f0541930911b17f4fb960a1db5feb47a`
MD5	`508edea2286dec19d136da3835b412e8`
BLAKE2b-256	`a3acdaf1b0f6ee1528346cb6a6330c5d207ac4af06c1a793486a3faf0196f04a`
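A downloaded file can be checked against a published SHA256 digest with Python's standard `hashlib`. The sketch below is one way to do that; the helper names `sha256_of` and `matches_published_hash` are illustrative and not part of the package.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

To use it, pass the local path of the downloaded archive together with the SHA256 value from the table for that file.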

File details

Details for the file medium_scraper-0.1.0-py3-none-any.whl.

File metadata

Download URL: medium_scraper-0.1.0-py3-none-any.whl
Upload date: Aug 26, 2025
Size: 20.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.11

File hashes

Hashes for medium_scraper-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`31e8aa6071c94332df6b0b17a0270bc7a842bfd6f3727585dd73cd77dbb3085a`
MD5	`a761213f4d3ad2b3069c6b5511f06ac9`
BLAKE2b-256	`9ee6cf3abbabb43afbfd8e652e4742b3a2504aadc25d56f53159d28a702eab46`
