High-scale Medium scraper: request abstraction and HTML->Markdown parser

Medium Scraper

A high-scale, async Medium scraper with request abstraction and HTML-to-Markdown parser. Quickly discover and convert Medium articles to clean Markdown with our intuitive web interface.

🌐 Web Interface (Recommended)

The easiest way to use Medium Scraper is through our web interface:

# Install web dependencies (quotes keep shells such as zsh from expanding the brackets)
pip install "medium-scraper[web]"

# Run web server
cd web && python app.py

Then open http://localhost:8000 in your browser.

Docker Deployment (Easiest Setup)

cd web && docker build -t medium-scraper-web . && docker run -p 8000:8000 medium-scraper-web

Features

  • Intuitive GUI for scraping Medium articles
  • Real-time progress tracking via WebSocket
  • Download results as ZIP files
  • Job history and persistent storage
  • Multiple request modes:
    • Decodo API: Smart managed scraping (requires Decodo API key)
    • Custom Proxies: Bring your own proxy list
    • Proxyless: Direct requests with your IP

🔧 Core Features

All components share these powerful features:

  • Async/await support for high-performance operations
  • Multiple request backends: Direct requests, custom proxies, or Decodo Scraper API
  • Intelligent caching with configurable backends
  • Progress tracking with callbacks
  • Concurrent processing with rate limiting
  • Robust error handling and retries
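
As an illustration of how concurrent processing, retries, and progress callbacks can fit together, here is a minimal sketch using only the standard library. It is not the library's implementation: `with_retries`, `fetch_all`, and all of their parameters are hypothetical names chosen for this example. Note that the semaphore caps the number of requests in flight, which is a simple stand-in for rate limiting rather than a true requests-per-second limit.

```python
import asyncio
import random
from typing import Awaitable, Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await make_call(), retrying failures with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 2^attempt, plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("attempts must be at least 1")


async def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[T]],
    max_concurrency: int = 5,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> List[T]:
    """Fetch every URL with at most max_concurrency requests in flight."""
    url_list = list(urls)
    semaphore = asyncio.Semaphore(max_concurrency)
    done = 0

    async def worker(url: str) -> T:
        nonlocal done
        async with semaphore:
            result = await with_retries(lambda: fetch(url))
        done += 1
        if on_progress is not None:
            on_progress(done, len(url_list))  # (completed, total)
        return result

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(u) for u in url_list))
```

Here `fetch` is any coroutine function that takes a URL and returns a result, so the same helper works with a direct HTTP client or a proxy-backed one.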

CLI Tool

  • Interactive prompts with rich formatting
  • Progress bars and detailed statistics
  • Multiple output formats (JSON, Markdown files)
  • Proxy support and custom configurations
  • Tag pagination and article filtering

Core Library

These features are also available when using the library programmatically. See our Library Documentation for details.

🛠️ Installation Options

Basic Installation

pip install medium-scraper

📚 Request Senders

The library supports multiple request backends:

  1. RequestsRequestSender: Standard requests library (works with custom proxies or proxyless)
  2. DecodoScraperRequestSender: Advanced scraping with Decodo API (requires API key)
  3. CachedRequestSender: Adds caching to any sender

Choose the appropriate sender based on your needs:

from medium_scraper import RequestsRequestSender, DecodoScraperRequestSender

# For simple use cases (proxyless or with custom proxies)
sender = RequestsRequestSender()

# For advanced scraping with Decodo (requires API key from https://decodo.com)
sender = DecodoScraperRequestSender(api_key="your-decodo-api-key")
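
The constructor and method signatures of CachedRequestSender are not shown here, so the following is only a sketch of the general idea of adding caching to any sender. The `Sender` protocol, its `get` method, and the `InMemoryCachedSender` class are all hypothetical and may not match the library's actual interface.

```python
import asyncio
from typing import Dict, Protocol


class Sender(Protocol):
    """Hypothetical sender interface; the library's real one may differ."""

    async def get(self, url: str) -> str: ...


class InMemoryCachedSender:
    """Wraps any sender and serves repeated URLs from an in-memory dict."""

    def __init__(self, inner: Sender) -> None:
        self._inner = inner
        self._cache: Dict[str, str] = {}

    async def get(self, url: str) -> str:
        if url not in self._cache:
            # Cache miss: delegate to the wrapped sender and remember the body.
            self._cache[url] = await self._inner.get(url)
        return self._cache[url]
```

Because the wrapper exposes the same `get` method as the sender it wraps, it can be swapped in wherever a plain sender is accepted, which is the design that makes caching composable with either backend.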

🔄 Async Usage

The library is designed to be fully async:

import asyncio
from medium_scraper import MediumExplorer

async def main():
    explorer = MediumExplorer()
    articles = await explorer.get_tag_articles("python", limit=5)
    
    for article in articles:
        print(f"Title: {article.title}")
        print(f"URL: {article.url}")

asyncio.run(main())
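
Because `get_tag_articles` is a coroutine, several tags can be requested at once with `asyncio.gather`. The helper below is a small sketch; `collect_tags` is a hypothetical name, and it assumes only that the explorer object has the `get_tag_articles(tag, limit=...)` method used in the example above. Whether a single explorer instance is safe to share across concurrent calls is also an assumption.

```python
import asyncio
from typing import Any, Dict, List, Sequence


async def collect_tags(
    explorer: Any, tags: Sequence[str], limit: int = 5
) -> Dict[str, List[Any]]:
    """Fetch several tags concurrently and return {tag: articles}."""
    results = await asyncio.gather(
        *(explorer.get_tag_articles(tag, limit=limit) for tag in tags)
    )
    # gather keeps input order, so results line up with tags.
    return dict(zip(tags, results))
```

Inside an async function, a call such as `await collect_tags(MediumExplorer(), ["python", "rust"])` would then return one list of articles per tag.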

📚 Library Usage

For programmatic use of the core library, see our Library Documentation, which provides detailed examples.

🖥️ CLI Tool

The command-line interface offers powerful scraping capabilities. See our CLI Documentation for comprehensive usage instructions.

📖 Documentation

For more detailed information about each component, see the documentation in the docs folder.
