A professional Weibo crawler library

Crawl4Weibo


Crawl4Weibo is a ready-to-use Python library for scraping Weibo (微博). It simulates mobile requests, handles common anti-scraping strategies, and returns structured data models, which makes it a good fit for data collection, analysis, and monitoring.

✨ Features

  • No Cookie Required: Runs without cookies and automatically initializes a session with a mobile User-Agent
  • Built-in 432 Protection: Retries requests blocked by Weibo's 432 anti-scraping response, using exponential backoff
  • Unified Proxy Pool Management: Supports both dynamic and static IP proxy pools with configurable TTL, selection strategies (random or round-robin), and automatic cleanup of expired proxies
  • Standardized Data Models: Clean User and Post data models with recursive access to reposted content
  • Long Text Expansion and Search: Expands truncated long posts and supports keyword search, user list fetching, and batch pagination
  • Image Download Utilities: Download images from single posts, batches, or entire pages with duplicate file detection
  • Unified Logging & Error Types: Quickly locate network, parsing, or authentication issues
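
To make the exponential backoff mentioned in the feature list concrete, here is a minimal generic sketch. It is not the library's internal implementation: the `RateLimited` exception, the `retry_with_backoff` helper, and all parameter names and defaults are hypothetical and only illustrate the technique.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a blocked (432-style) response."""


def retry_with_backoff(func, max_retries=4, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call func(); on RateLimited, wait base_delay * 2**attempt (capped) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries, surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the defaults above, the waits grow roughly as 1 s, 2 s, 4 s, 8 s before the error is re-raised. The `sleep` argument is injectable so the helper can be exercised without real delays.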

Installation

pip install crawl4weibo

Or use the faster uv:

uv pip install crawl4weibo

Quick Start

from crawl4weibo import WeiboClient

client = WeiboClient()
uid = "2656274875"

# Get user information
user = client.get_user_by_uid(uid)
print(f"{user.screen_name} - Followers: {user.followers_count}")

# Get user posts (with long text expansion)
posts = client.get_user_posts(uid, page=1, expand=True)
for post in posts[:3]:
    print(f"{post.text[:50]}... - Likes: {post.attitudes_count}")

# Search users
users = client.search_users("新浪")
for user in users[:3]:
    print(f"{user.screen_name} - Followers: {user.followers_count}")

# Search posts
results = client.search_posts("人工智能", page=1)
print(f"Found {len(results)} results")
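
The calls above fetch one page at a time. A small helper can walk several pages, as in the sketch below. `collect_pages` is a hypothetical helper, not part of crawl4weibo, and it assumes that a page past the end comes back empty, which the project description does not confirm.

```python
def collect_pages(fetch_page, max_pages=5):
    """Call fetch_page(page) for page = 1..max_pages, stopping at the first empty page."""
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # assumed end-of-results signal
        items.extend(batch)
    return items


# Offline stand-in for a real page fetcher, so the sketch runs without network access.
fake_pages = {1: ["post-a", "post-b"], 2: ["post-c"]}
print(collect_pages(lambda page: fake_pages.get(page, [])))
```

With a real client, the fetcher could be `lambda page: client.get_user_posts(uid, page=page)`, using the `client` and `uid` from the Quick Start.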

For more examples, see examples/simple_example.py.

Run the example:

# Clone the repository first
python examples/simple_example.py

# Or using uv
uv run python examples/simple_example.py

Image Download Example

from crawl4weibo import WeiboClient

client = WeiboClient()

# Method 1: Download images from a single post
post = client.get_post_by_bid("Q6FyDtbQc")
if post.pic_urls:
    results = client.download_post_images(
        post,
        download_dir="./downloads",
        subdir="single_post"
    )
    print(f"Successfully downloaded {sum(1 for p in results.values() if p)} images")

# Method 2: Batch download images from user posts
posts = client.get_user_posts("2656274875", page=1)
results = client.download_posts_images(
    posts[:3],  # Download images from first 3 posts
    download_dir="./downloads"
)

# Method 3: Download images from multiple pages of user posts
results = client.download_user_posts_images(
    uid="2656274875",
    pages=2,  # Download from first 2 pages
    download_dir="./downloads"
)
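
One simple way to detect duplicate files is to derive the local file name from the image URL and skip any URL whose file already exists. The sketch below shows that idea only. `split_new_and_existing` and the example URLs are hypothetical, and the library's own duplicate detection may work differently.

```python
import os
import tempfile
from urllib.parse import urlparse


def split_new_and_existing(pic_urls, target_dir):
    """Return (to_fetch, already_there): URL -> path still to download, and URLs already on disk."""
    to_fetch, already_there = {}, []
    for url in pic_urls:
        # Use the last path segment of the URL (query string excluded) as the file name.
        filename = os.path.basename(urlparse(url).path)
        path = os.path.join(target_dir, filename)
        if os.path.exists(path):
            already_there.append(url)
        else:
            to_fetch[url] = path
    return to_fetch, already_there


# Offline demo with made-up URLs: a.jpg is already present, b.jpg is not.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.jpg"), "wb").close()
    urls = ["https://example.com/large/a.jpg", "https://example.com/large/b.jpg"]
    to_fetch, already_there = split_new_and_existing(urls, tmp)
    print(sorted(to_fetch), already_there)
```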

For more usage details, see examples/download_images_example.py.

Run the example:

python examples/download_images_example.py

Proxy Pool Configuration Example

from crawl4weibo import WeiboClient

# Method 1: Use dynamic proxy API
client = WeiboClient(
    proxy_api_url="http://api.proxy.com/get?format=json",
    dynamic_proxy_ttl=300,      # Dynamic proxy TTL in seconds
    proxy_pool_size=10,         # Proxy pool capacity
    proxy_fetch_strategy="random"  # random or round_robin
)

# Method 2: Manually add static proxies
client = WeiboClient()
client.add_proxy("http://1.2.3.4:8080", ttl=600)  # With TTL
client.add_proxy("http://5.6.7.8:8080")  # Never expires

# Method 3: Mix dynamic and static proxies
client = WeiboClient(
    proxy_api_url="http://api.proxy.com/get",
    proxy_pool_size=20
)
client.add_proxy("http://1.2.3.4:8080", ttl=None)

# Method 4: Custom parser (adapt to different proxy providers)
def custom_parser(data):
    return f"http://{data['result']['ip']}:{data['result']['port']}"

client = WeiboClient(
    proxy_api_url="http://custom-api.com/proxy",
    proxy_api_parser=custom_parser
)

# Flexible control of proxy usage per request
user = client.get_user_by_uid("2656274875", use_proxy=False)
posts = client.get_user_posts("2656274875", page=1)  # Uses proxy
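
To show how a per-proxy TTL, the two fetch strategies, and automatic cleanup can fit together, here is a toy pool. It is an illustration under assumptions, not the library's proxy manager: the `ProxyPool` class, its methods, and the injectable `clock` are all hypothetical.

```python
import random
import time


class ProxyPool:
    """Toy pool: each proxy has an optional expiry; expired entries are dropped on access."""

    def __init__(self, strategy="random", clock=time.monotonic):
        self._proxies = {}  # proxy URL -> expiry timestamp, or None for "never expires"
        self._strategy = strategy
        self._clock = clock
        self._cursor = 0  # position for round-robin selection

    def add(self, url, ttl=None):
        """Register a proxy; ttl is in seconds, None means it never expires."""
        self._proxies[url] = None if ttl is None else self._clock() + ttl

    def _cleanup(self):
        now = self._clock()
        self._proxies = {
            url: expiry for url, expiry in self._proxies.items()
            if expiry is None or expiry > now
        }

    def get(self):
        """Return a live proxy URL, or None if the pool is empty."""
        self._cleanup()
        if not self._proxies:
            return None
        urls = list(self._proxies)
        if self._strategy == "round_robin":
            url = urls[self._cursor % len(urls)]
            self._cursor += 1
            return url
        return random.choice(urls)
```

Cleaning up lazily inside `get()` keeps the sketch free of background threads; a proxy added with `ttl=600` simply stops being returned once 600 seconds have passed.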

API Overview

  • get_user_by_uid(uid): Get user profile and statistics
  • get_user_posts(uid, page=1, expand=False): Fetch user timeline posts with optional long text expansion
  • get_post_by_bid(bid): Get full content and media info for a single post
  • search_users(query, page=1, count=10) / search_posts(query, page=1): Keyword search
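  • download_post_images(post, ...), download_posts_images(posts, ...), download_user_posts_images(uid, pages=2, ...): Download images from a single post, a batch of posts, or several pages of a user's posts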
  • Unified Exceptions: NetworkError, RateLimitError, UserNotFoundError, etc., for business-level error handling
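
The exception types listed above allow a caller to react differently to each failure. The sketch below shows the pattern with local stand-in classes, because this page does not show where the real exceptions are imported from or how they relate to each other. The stand-ins and the `safe_get_user` helper are hypothetical.

```python
# Local stand-ins that reuse the exception names from the list above.
# In real code, import the library's own classes instead.
class NetworkError(Exception):
    pass


class RateLimitError(Exception):
    pass


class UserNotFoundError(Exception):
    pass


def safe_get_user(get_user, uid):
    """Return (user, None) on success or (None, reason) for a known failure."""
    try:
        return get_user(uid), None
    except UserNotFoundError:
        return None, "not_found"     # permanent: skip this uid
    except RateLimitError:
        return None, "rate_limited"  # temporary: slow down, then retry
    except NetworkError:
        return None, "network"       # temporary: check connectivity or proxy
```

With a real client, `get_user` would be `client.get_user_by_uid`.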

Development & Testing

uv sync --dev                # Install dev dependencies
uv run pytest                # Run all tests (includes unit/integration/slow markers)
uv run ruff check crawl4weibo --fix
uv run ruff format crawl4weibo
uv run python examples/simple_example.py

For project structure, contribution guidelines, and more workflows, see docs/DEVELOPMENT.md and AGENTS.md.

License

MIT License
