Crawl4Weibo
中文文档 | English
Crawl4Weibo is a ready-to-use Python library for scraping Weibo (微博). It simulates mobile-client requests, handles common anti-scraping strategies, and returns structured data models, making it ideal for data collection, analysis, and monitoring scenarios.
✨ Features
- No Cookie Required: Runs without cookies, automatically initializes session with mobile User-Agent
- Built-in 432 Protection: Handles Weibo's anti-scraping responses (HTTP 432) with an exponential-backoff retry mechanism
- Unified Proxy Pool Management: Supports both dynamic and static IP proxy pools with configurable TTL, polling strategies, and automatic cleanup
- Standardized Data Models: Clean `User` and `Post` data models, with recursive access to reposted content (see the sketch after this list)
- Long Text Expansion & More: Expand truncated long posts, plus keyword search, user list fetching, and batch pagination
- Image Download Utilities: Download images from single posts, batches, or entire pages with duplicate file detection
- Unified Logging & Error Types: Quickly locate network, parsing, or authentication issues
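The data models make reposts straightforward to traverse. Here is a minimal sketch of walking a repost chain, assuming the `Post` model exposes the reposted original under a `retweeted_status` attribute (an assumed name, not confirmed by this page):

```python
from crawl4weibo import WeiboClient

client = WeiboClient()
post = client.get_post_by_bid("Q6FyDtbQc")

# Walk down the repost chain; `retweeted_status` is an assumed attribute name
while post is not None:
    print(post.text[:50])
    post = getattr(post, "retweeted_status", None)
```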
Installation
```bash
pip install crawl4weibo
```

Or use the faster uv:

```bash
uv pip install crawl4weibo
```
Quick Start
```python
from crawl4weibo import WeiboClient

client = WeiboClient()
uid = "2656274875"

# Get user information
user = client.get_user_by_uid(uid)
print(f"{user.screen_name} - Followers: {user.followers_count}")

# Get user posts (with long text expansion)
posts = client.get_user_posts(uid, page=1, expand=True)
for post in posts[:3]:
    print(f"{post.text[:50]}... - Likes: {post.attitudes_count}")

# Search users ("新浪" = Sina)
users = client.search_users("新浪")
for user in users[:3]:
    print(f"{user.screen_name} - Followers: {user.followers_count}")

# Search posts ("人工智能" = artificial intelligence)
results = client.search_posts("人工智能", page=1)
print(f"Found {len(results)} results")
```
For more examples, see examples/simple_example.py.
Run the example:
```bash
# Clone the repository first
python examples/simple_example.py

# Or using uv
uv run python examples/simple_example.py
```
Image Download Example
```python
from crawl4weibo import WeiboClient

client = WeiboClient()

# Method 1: Download images from a single post
post = client.get_post_by_bid("Q6FyDtbQc")
if post.pic_urls:
    results = client.download_post_images(
        post,
        download_dir="./downloads",
        subdir="single_post"
    )
    print(f"Successfully downloaded {sum(1 for p in results.values() if p)} images")

# Method 2: Batch download images from user posts
posts = client.get_user_posts("2656274875", page=1)
results = client.download_posts_images(
    posts[:3],  # Download images from the first 3 posts
    download_dir="./downloads"
)

# Method 3: Download images from multiple pages of user posts
results = client.download_user_posts_images(
    uid="2656274875",
    pages=2,  # Download from the first 2 pages
    download_dir="./downloads"
)
```
For more usage details, see examples/download_images_example.py.
Run the example:
```bash
python examples/download_images_example.py
```
Proxy Pool Configuration Example
```python
from crawl4weibo import WeiboClient

# Method 1: Use a dynamic proxy API (pooling mode - default)
client = WeiboClient(
    proxy_api_url="http://api.proxy.com/get?format=json",
    dynamic_proxy_ttl=300,         # Dynamic proxy TTL in seconds
    proxy_pool_size=10,            # Proxy pool capacity
    proxy_fetch_strategy="random"  # "random" or "round_robin"
)

# Method 2: One-time proxy mode (for single-use IP providers)
client = WeiboClient(
    proxy_api_url="http://api.proxy.com/get",
    use_once_proxy=True,
)
# Efficient: uses all returned IPs before fetching a new batch

# Method 3: Manually add static proxies
client = WeiboClient()
client.add_proxy("http://1.2.3.4:8080", ttl=600)  # Expires after 600 seconds
client.add_proxy("http://5.6.7.8:8080")           # Never expires

# Method 4: Mix dynamic and static proxies
client = WeiboClient(
    proxy_api_url="http://api.proxy.com/get",
    proxy_pool_size=20
)
client.add_proxy("http://1.2.3.4:8080", ttl=None)  # Static entry in a dynamic pool

# Method 5: Custom parser (adapt to different proxy providers)
def custom_parser(data):
    return [f"http://{data['result']['ip']}:{data['result']['port']}"]

client = WeiboClient(
    proxy_api_url="http://custom-api.com/proxy",
    proxy_api_parser=custom_parser
)

# Per-request control of proxy usage
user = client.get_user_by_uid("2656274875", use_proxy=False)  # Bypass the pool
posts = client.get_user_posts("2656274875", page=1)           # Uses the pool
```
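A custom parser can also adapt providers that return a whole batch of IPs at once. Here is a sketch assuming a hypothetical JSON response shaped like `{"data": [{"ip": "1.2.3.4", "port": 8080}, ...]}`:

```python
from crawl4weibo import WeiboClient

def batch_parser(data):
    # Assumed response shape: {"data": [{"ip": "1.2.3.4", "port": 8080}, ...]}
    return [f"http://{item['ip']}:{item['port']}" for item in data["data"]]

client = WeiboClient(
    proxy_api_url="http://custom-api.com/proxy?count=10",  # Hypothetical endpoint
    proxy_api_parser=batch_parser,
    proxy_pool_size=10,
)
```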
API Overview
- `get_user_by_uid(uid)`: Get user profile and statistics
- `get_user_posts(uid, page=1, expand=False)`: Fetch user timeline posts, with optional long-text expansion
- `get_post_by_bid(bid)`: Get full content and media info for a single post
- `search_users(query, page=1, count=10)` / `search_posts(query, page=1)`: Keyword search
- `download_post_images(post, ...)`, `download_user_posts_images(uid, pages=2, ...)`: Download image assets
- Unified exceptions: `NetworkError`, `RateLimitError`, `UserNotFoundError`, etc., for business-level error handling (see the sketch below)
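A minimal sketch of catching these exception types around a call, assuming they are importable from the top-level `crawl4weibo` package:

```python
from crawl4weibo import (
    WeiboClient,
    NetworkError,
    RateLimitError,
    UserNotFoundError,
)

client = WeiboClient()
try:
    user = client.get_user_by_uid("2656274875")
except UserNotFoundError:
    print("No such user")
except RateLimitError:
    print("Hit anti-scraping protection; retry later or add proxies")
except NetworkError as exc:
    print(f"Network problem: {exc}")
```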
Development & Testing
```bash
uv sync --dev                              # Install dev dependencies
uv run pytest                              # Run all tests (unit/integration/slow markers)
uv run ruff check crawl4weibo --fix        # Lint, applying autofixes
uv run ruff format crawl4weibo             # Format the codebase
uv run python examples/simple_example.py   # Smoke-test the example
```
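Because the suite is tagged with unit/integration/slow markers, subsets can be selected with pytest's standard `-m` flag (marker names assumed to match the comment above):

```bash
uv run pytest -m unit          # Only unit tests
uv run pytest -m "not slow"    # Everything except slow tests
```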
For project structure, contribution guidelines, and more workflows, see docs/DEVELOPMENT.md and AGENTS.md.
License
MIT License
Download files
- Source Distribution: crawl4weibo-0.2.2.tar.gz
- Built Distribution: crawl4weibo-0.2.2-py3-none-any.whl
File details

Details for the file crawl4weibo-0.2.2.tar.gz.

File metadata

- Download URL: crawl4weibo-0.2.2.tar.gz
- Size: 117.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.14

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d2288d5000d8efc0258b6e0925b8ce9252f7206b78060a416d4511a63444356b` |
| MD5 | `bf4f647205457ea1a102fa8ded08a68a` |
| BLAKE2b-256 | `059dc29c313b595d2b03ac75b14b7d119d308db9a31d2d7c74d68aaf3eb7ca58` |
File details

Details for the file crawl4weibo-0.2.2-py3-none-any.whl.

File metadata

- Download URL: crawl4weibo-0.2.2-py3-none-any.whl
- Size: 22.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.14

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d8464b3a16f123632e031589047a7e7aad85faa8240545eafebb6dcde9bb70f4` |
| MD5 | `063aaf2dc60da8f0b6101e8bf4141718` |
| BLAKE2b-256 | `69f0694cf10d3cec44d1ae0236e9cacf480430c23b6339194c539a14853a2aaa` |