High-scale Medium scraper: request abstraction and HTML->Markdown parser

Medium Scraper

A high-scale, async Medium scraper with request abstraction and HTML-to-Markdown parser. Quickly discover and convert Medium articles to clean Markdown with our intuitive web interface.

🌐 Web Interface (Recommended)

The easiest way to use Medium Scraper is through our web interface:

# Install web dependencies (the quotes keep shells such as zsh from expanding the brackets)
pip install "medium-scraper[web]"

# Run web server
cd web && python app.py

Then open your browser to http://localhost:8000

Docker Deployment (Easiest Setup)

cd web && docker build -t medium-scraper-web . && docker run -p 8000:8000 medium-scraper-web

Features

  • Intuitive GUI for scraping Medium articles
  • Real-time progress tracking via WebSocket
  • Download results as ZIP files
  • Job history and persistent storage
  • Multiple request modes:
    • Decodo API: Smart managed scraping (requires Decodo API key)
    • Custom Proxies: Bring your own proxy list
    • Proxyless: Direct requests with your IP

🔧 Core Features

All components share these powerful features:

  • Async/await support for high-performance operations
  • Multiple request backends: Direct requests, custom proxies, or Decodo Scraper API
  • Intelligent caching with configurable backends
  • Progress tracking with callbacks
  • Concurrent processing with rate limiting
  • Robust error handling and retries
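As a rough illustration of the last two bullets, the sketch below caps the number of in-flight requests with a semaphore and retries failures with exponential backoff. It is not the library's implementation: `with_retries`, `fetch_all`, and the `fetch` callable are hypothetical names, and a semaphore limits concurrency rather than requests per second.

```python
import asyncio
import random
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await make_call(), retrying failures with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ... plus a little jitter.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            await asyncio.sleep(delay)
    raise ValueError("attempts must be at least 1")


async def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[T]],
    max_concurrency: int = 5,
) -> List[T]:
    """Fetch every URL with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> T:
        async with semaphore:
            return await with_retries(lambda: fetch(url))

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(fetch_one(url) for url in urls))
```

Any coroutine function that takes a URL can be passed as `fetch`, so the same helper works regardless of which request backend is behind it.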

CLI Tool

  • Interactive prompts with rich formatting
  • Progress bars and detailed statistics
  • Multiple output formats (JSON, Markdown files)
  • Proxy support and custom configurations
  • Tag pagination and article filtering

Core Library

These features are also available when using the library programmatically. See our Library Documentation for details.

🛠️ Installation Options

Basic Installation

pip install medium-scraper

📚 Request Senders

The library supports multiple request backends:

  1. RequestsRequestSender: Standard requests library (works with custom proxies or proxyless)
  2. DecodoScraperRequestSender: Advanced scraping with Decodo API (requires API key)
  3. CachedRequestSender: Adds caching to any sender

Choose the appropriate sender based on your needs:

from medium_scraper import RequestsRequestSender, DecodoScraperRequestSender

# For simple use cases (proxyless or with custom proxies)
sender = RequestsRequestSender()

# For advanced scraping with Decodo (requires API key from https://decodo.com)
sender = DecodoScraperRequestSender(api_key="your-decodo-api-key")
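
The README does not show how CachedRequestSender is constructed, so the sketch below only illustrates the general idea of wrapping a sender with a cache. `CachingFetcher` and its `get` method are hypothetical and are not part of medium_scraper; the wrapped "sender" is assumed to be a plain function from URL to response body.

```python
from typing import Callable, Dict


class CachingFetcher:
    """Hypothetical caching wrapper; NOT the library's CachedRequestSender."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # the wrapped "sender"
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        """Return the cached body for url, calling the wrapped sender only on a miss."""
        if url in self._cache:
            self.hits += 1
            return self._cache[url]
        self.misses += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body


# Usage with a fake sender that never touches the network.
fetcher = CachingFetcher(lambda url: f"<html>{url}</html>")
fetcher.get("https://medium.com/tag/python")  # miss: calls the wrapped sender
fetcher.get("https://medium.com/tag/python")  # hit: served from the cache
print(fetcher.hits, fetcher.misses)  # prints "1 1"
```

Because the wrapper only depends on the callable it is given, the same pattern applies to a direct, proxied, or API-backed sender.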

🔄 Async Usage

The library is designed to be fully async:

import asyncio
from medium_scraper import MediumExplorer

async def main():
    explorer = MediumExplorer()
    articles = await explorer.get_tag_articles("python", limit=5)
    
    for article in articles:
        print(f"Title: {article.title}")
        print(f"URL: {article.url}")

asyncio.run(main())
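
Because `get_tag_articles` is awaitable, several tags can in principle be queried concurrently. The sketch below assumes that an explorer tolerates concurrent calls, which the README does not confirm. `articles_for_tags`, `FakeArticle`, and `FakeExplorer` are hypothetical helpers; the fake explorer mirrors only the method name and the `title` and `url` attributes used in the example above so the code runs offline.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


async def articles_for_tags(explorer: Any, tags: Iterable[str], limit: int = 5) -> List[Any]:
    """Query several tags concurrently and drop articles already seen under an earlier tag."""
    per_tag = await asyncio.gather(
        *(explorer.get_tag_articles(tag, limit=limit) for tag in tags)
    )
    unique: Dict[str, Any] = {}
    for articles in per_tag:
        for article in articles:
            unique.setdefault(article.url, article)  # first occurrence wins
    return list(unique.values())


@dataclass
class FakeArticle:
    title: str
    url: str


class FakeExplorer:
    """Hypothetical offline stand-in; a real MediumExplorer would go here instead."""

    async def get_tag_articles(self, tag: str, limit: int = 5) -> List[FakeArticle]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        shared = FakeArticle("Shared post", "https://example.com/shared")
        own = FakeArticle(f"About {tag}", f"https://example.com/{tag}")
        return [shared, own][:limit]


articles = asyncio.run(articles_for_tags(FakeExplorer(), ["python", "asyncio"]))
for article in articles:
    print(article.title, article.url)
```

Deduplicating by URL matters here because the same article can be listed under more than one tag.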

📚 Library Usage

For programmatic usage of the core library, please refer to our Library Documentation which provides detailed examples.

🖥️ CLI Tool

The command-line interface offers powerful scraping capabilities. See our CLI Documentation for comprehensive usage instructions.

📖 Documentation

For more detailed information about each component, see the documentation in the docs folder.
