High-scale Medium scraper: request abstraction and HTML->Markdown parser
Medium Scraper
A high-scale, async Medium scraper with request abstraction and HTML-to-Markdown parser. Quickly discover and convert Medium articles to clean Markdown with our intuitive web interface.
🌐 Web Interface (Recommended)
The easiest way to use Medium Scraper is through our web interface:
# Install web dependencies
pip install "medium-scraper[web]"
# Run web server
cd web && python app.py
Then open your browser to http://localhost:8000
Docker Deployment (Easiest Setup)
cd web && docker build -t medium-scraper-web . && docker run -p 8000:8000 medium-scraper-web
Features
- Intuitive GUI for scraping Medium articles
- Real-time progress tracking via WebSocket
- Download results as ZIP files
- Job history and persistent storage
- Multiple request modes:
  - Decodo API: Smart managed scraping (requires Decodo API key)
  - Custom Proxies: Bring your own proxy list
  - Proxyless: Direct requests with your IP
🔧 Core Features
All components share these powerful features:
- Async/await support for high-performance operations
- Multiple request backends: Direct requests, custom proxies, or Decodo Scraper API
- Intelligent caching with configurable backends
- Progress tracking with callbacks
- Concurrent processing with rate limiting
- Robust error handling and retries
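To make "concurrent processing" and "retries" concrete, here is a minimal sketch of the general pattern using only the standard library. It is not the library's implementation: `fetch_with_retries`, `fetch_all`, and all parameter names are hypothetical, and the semaphore caps the number of in-flight requests rather than enforcing a true requests-per-second limit.

```python
import asyncio
import random
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def fetch_with_retries(
    fetch: Callable[[str], Awaitable[T]],
    url: str,
    retries: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Call fetch(url), retrying with exponential backoff plus jitter."""
    for attempt in range(retries):
        try:
            return await fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            await asyncio.sleep(delay)
    raise ValueError("retries must be at least 1")


async def fetch_all(
    fetch: Callable[[str], Awaitable[T]],
    urls: Iterable[str],
    max_concurrency: int = 5,
    retries: int = 3,
    base_delay: float = 0.5,
) -> List[T]:
    """Fetch every URL, keeping at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> T:
        async with semaphore:
            return await fetch_with_retries(fetch, url, retries, base_delay)

    # gather preserves the order of the input URLs in its result list
    return list(await asyncio.gather(*(bounded(url) for url in urls)))
```

Any coroutine function that takes a URL can be passed as `fetch`, so the same wrapper would apply to a direct, proxied, or API-backed request.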
CLI Tool
- Interactive prompts with rich formatting
- Progress bars and detailed statistics
- Multiple output formats (JSON, Markdown files)
- Proxy support and custom configurations
- Tag pagination and article filtering
Core Library
These features are also available when using the library programmatically. See our Library Documentation for details.
🛠️ Installation Options
Basic Installation
pip install medium-scraper
📚 Request Senders
The library supports multiple request backends:
- RequestsRequestSender: Standard requests library (works with custom proxies or proxyless)
- DecodoScraperRequestSender: Advanced scraping with Decodo API (requires API key)
- CachedRequestSender: Adds caching to any sender
Choose the appropriate sender based on your needs:
from medium_scraper import RequestsRequestSender, DecodoScraperRequestSender
# For simple use cases (proxyless or with custom proxies)
sender = RequestsRequestSender()
# For advanced scraping with Decodo (requires API key from https://decodo.com)
sender = DecodoScraperRequestSender(api_key="your-decodo-api-key")
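The idea behind a sender that "adds caching to any sender" can be illustrated with a small wrapper. This is a sketch of the decorator pattern only, not the actual `CachedRequestSender`: the `Sender` protocol, its `get` method, `InMemoryCachingSender`, and `FakeSender` are all hypothetical names invented for this example, and the library's real interface may differ.

```python
import asyncio
from typing import Dict, Protocol, Tuple


class Sender(Protocol):
    """Hypothetical sender interface (assumption, not the library's API)."""

    async def get(self, url: str) -> str: ...


class InMemoryCachingSender:
    """Wraps any Sender and serves repeated URLs from an in-memory dict."""

    def __init__(self, inner: Sender) -> None:
        self._inner = inner
        self._cache: Dict[str, str] = {}

    async def get(self, url: str) -> str:
        if url not in self._cache:
            # Cache miss: delegate to the wrapped sender and remember the body
            self._cache[url] = await self._inner.get(url)
        return self._cache[url]


class FakeSender:
    """Stand-in backend that counts calls instead of touching the network."""

    def __init__(self) -> None:
        self.calls = 0

    async def get(self, url: str) -> str:
        self.calls += 1
        return f"<html>{url}</html>"


async def demo() -> Tuple[bool, int]:
    backend = FakeSender()
    sender = InMemoryCachingSender(backend)
    first = await sender.get("https://medium.com/tag/python")
    second = await sender.get("https://medium.com/tag/python")
    # The second call is answered from the cache, so the backend ran once
    return first == second, backend.calls
```

Because the wrapper exposes the same method as the sender it wraps, calling code does not need to know whether caching is enabled.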
🔄 Async Usage
The library is designed to be fully async:
import asyncio
from medium_scraper import MediumExplorer
async def main():
    explorer = MediumExplorer()
    articles = await explorer.get_tag_articles("python", limit=5)
    for article in articles:
        print(f"Title: {article.title}")
        print(f"URL: {article.url}")

asyncio.run(main())
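Since `get_tag_articles` is awaitable, several tags can be queried concurrently. The helper below is a sketch under assumptions: `get_articles_for_tags` is a hypothetical name, and it relies only on the `get_tag_articles(tag, limit=...)` call shown in the example above, so it accepts any object that provides that method.

```python
import asyncio
from typing import Any, Dict, Iterable, List


async def get_articles_for_tags(
    explorer: Any, tags: Iterable[str], limit: int = 5
) -> Dict[str, List[Any]]:
    """Fetch articles for several tags concurrently, keyed by tag."""
    tag_list = list(tags)
    # Start one get_tag_articles call per tag and wait for all of them
    results = await asyncio.gather(
        *(explorer.get_tag_articles(tag, limit=limit) for tag in tag_list)
    )
    # gather returns results in input order, so zip pairs each tag correctly
    return dict(zip(tag_list, results))
```

Inside a coroutine, this could be called as `await get_articles_for_tags(MediumExplorer(), ["python", "rust"])`, assuming the explorer tolerates concurrent calls on one instance.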
📚 Library Usage
For programmatic use of the core library, see our Library Documentation, which provides detailed examples.
🖥️ CLI Tool
The command-line interface offers powerful scraping capabilities. See our CLI Documentation for comprehensive usage instructions.
📖 Documentation
For more detailed information about each component, see the documentation in the docs folder.