Official Python SDK for ScrapeBadger - Async web scraping APIs for Twitter and more

Project description

ScrapeBadger Python SDK

The official Python SDK for ScrapeBadger - async web scraping APIs for Twitter and more.

Features

  • Async-first design - Built with asyncio for high-performance concurrent scraping
  • Type-safe - Full type hints and Pydantic models for all API responses
  • Automatic pagination - Iterator methods for seamless pagination through large datasets
  • Smart rate limit handling - Reads API rate limit headers and automatically throttles pagination to avoid hitting limits
  • Resilient retries - 10 automatic retries with exponential backoff on 502/503/504 errors, logging a warning on each retry
  • Comprehensive coverage - Access to 37+ Twitter endpoints (tweets, users, lists, communities, trends, geo)

Installation

pip install scrapebadger

Or with uv:

uv add scrapebadger

Quick Start

import asyncio
from scrapebadger import ScrapeBadger

async def main():
    async with ScrapeBadger(api_key="your-api-key") as client:
        # Get a user profile
        user = await client.twitter.users.get_by_username("elonmusk")
        print(f"{user.name} has {user.followers_count:,} followers")

        # Search tweets
        tweets = await client.twitter.tweets.search("python programming")
        for tweet in tweets.data:
            print(f"@{tweet.username}: {tweet.text[:100]}...")

asyncio.run(main())

Authentication

Get your API key from scrapebadger.com and pass it to the client:

from scrapebadger import ScrapeBadger

client = ScrapeBadger(api_key="sb_live_xxxxxxxxxxxxx")

You can also set the SCRAPEBADGER_API_KEY environment variable:

export SCRAPEBADGER_API_KEY="sb_live_xxxxxxxxxxxxx"

Usage Examples

Twitter Users

async with ScrapeBadger(api_key="your-key") as client:
    # Get user by username
    user = await client.twitter.users.get_by_username("elonmusk")
    print(f"{user.name} (@{user.username})")
    print(f"Followers: {user.followers_count:,}")
    print(f"Following: {user.following_count:,}")
    print(f"Bio: {user.description}")

    # Get user by ID
    user = await client.twitter.users.get_by_id("44196397")

    # Get extended "About" information
    about = await client.twitter.users.get_about("elonmusk")
    print(f"Account based in: {about.account_based_in}")
    print(f"Username changes: {about.username_changes}")

Twitter Tweets

async with ScrapeBadger(api_key="your-key") as client:
    # Get a single tweet
    tweet = await client.twitter.tweets.get_by_id("1234567890")
    print(f"@{tweet.username}: {tweet.text}")
    print(f"Likes: {tweet.favorite_count:,}, Retweets: {tweet.retweet_count:,}")

    # Get multiple tweets
    tweets = await client.twitter.tweets.get_by_ids([
        "1234567890",
        "0987654321"
    ])

    # Search tweets
    from scrapebadger.twitter import QueryType

    results = await client.twitter.tweets.search(
        "python programming",
        query_type=QueryType.LATEST  # TOP, LATEST, or MEDIA
    )

    # Get user's timeline
    tweets = await client.twitter.tweets.get_user_tweets("elonmusk")

Automatic Pagination

All paginated endpoints support both manual pagination and automatic iteration:

async with ScrapeBadger(api_key="your-key") as client:
    # Manual pagination
    followers = await client.twitter.users.get_followers("elonmusk")
    for user in followers.data:
        print(f"@{user.username}")

    if followers.has_more:
        more = await client.twitter.users.get_followers(
            "elonmusk",
            cursor=followers.next_cursor
        )

    # Automatic pagination with async iterator
    async for follower in client.twitter.users.get_followers_all(
        "elonmusk",
        max_items=1000  # Optional limit
    ):
        print(f"@{follower.username}")

    # Collect all results into a list
    all_followers = [
        user async for user in client.twitter.users.get_followers_all(
            "elonmusk",
            max_pages=10
        )
    ]

Twitter Lists

async with ScrapeBadger(api_key="your-key") as client:
    # Search for lists
    lists = await client.twitter.lists.search("tech leaders")
    for lst in lists.data:
        print(f"{lst.name}: {lst.member_count} members")

    # Get list details
    lst = await client.twitter.lists.get_detail("123456")

    # Get list tweets
    tweets = await client.twitter.lists.get_tweets("123456")

    # Get list members
    members = await client.twitter.lists.get_members("123456")

Twitter Communities

async with ScrapeBadger(api_key="your-key") as client:
    from scrapebadger.twitter import CommunityTweetType

    # Search communities
    communities = await client.twitter.communities.search("python developers")

    # Get community details
    community = await client.twitter.communities.get_detail("123456")
    print(f"{community.name}: {community.member_count:,} members")
    print(f"Rules: {len(community.rules or [])}")

    # Get community tweets
    tweets = await client.twitter.communities.get_tweets(
        "123456",
        tweet_type=CommunityTweetType.LATEST
    )

    # Get members
    members = await client.twitter.communities.get_members("123456")

Trending Topics

async with ScrapeBadger(api_key="your-key") as client:
    from scrapebadger.twitter import TrendCategory

    # Get global trends
    trends = await client.twitter.trends.get_trends()
    for trend in trends.data:
        count = f"{trend.tweet_count:,}" if trend.tweet_count else "N/A"
        print(f"{trend.name}: {count} tweets")

    # Get trends by category
    news = await client.twitter.trends.get_trends(category=TrendCategory.NEWS)
    sports = await client.twitter.trends.get_trends(category=TrendCategory.SPORTS)

    # Get trends for a specific location (WOEID)
    us_trends = await client.twitter.trends.get_place_trends(23424977)  # US
    print(f"Trends in {us_trends.name}:")
    for trend in us_trends.trends:
        print(f"  - {trend.name}")

    # Get available trend locations
    locations = await client.twitter.trends.get_available_locations()
    us_cities = [loc for loc in locations.data if loc.country_code == "US"]

Geographic Places

async with ScrapeBadger(api_key="your-key") as client:
    # Search places by name
    places = await client.twitter.geo.search(query="San Francisco")
    for place in places.data:
        print(f"{place.full_name} ({place.place_type})")

    # Search by coordinates
    places = await client.twitter.geo.search(
        lat=37.7749,
        long=-122.4194,
        granularity="city"
    )

    # Get place details
    place = await client.twitter.geo.get_detail("5a110d312052166f")

Twitter Streams (Real-Time Monitoring)

Monitor Twitter accounts in real-time with WebSocket delivery:

import asyncio
from scrapebadger import ScrapeBadger

async def main():
    async with ScrapeBadger(api_key="your-key") as client:
        # Create a stream monitor
        monitor = await client.twitter.stream.create_monitor(
            name="Tech CEOs",
            usernames=["elonmusk", "sama", "naval"],
            poll_interval_seconds=5.0,
        )
        print(f"Monitor '{monitor.name}' created (tier: {monitor.pricing_tier})")
        print(f"Estimated cost: {monitor.estimated_credits_per_hour:.0f} credits/hour")

        # List monitors
        result = await client.twitter.stream.list_monitors(status="active")
        print(f"{result.total} active monitors")

        # Stream tweets via WebSocket
        async with client.twitter.stream.connect(reconnect=True) as events:
            async for event in events:
                if event.type == "tweet":
                    print(f"@{event.author_username}: {event.tweet.text}")
                    print(f"  Detected in {event.latency_ms}ms")
                elif event.type == "connected":
                    print(f"Connected (id: {event.connection_id})")

        # Pause/resume/delete
        await client.twitter.stream.pause_monitor(monitor.id)
        await client.twitter.stream.delete_monitor(monitor.id)

asyncio.run(main())

Webhook Verification

Verify incoming webhook signatures in your receiver (FastAPI shown as an example framework):

import json

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

from scrapebadger.twitter.stream import verify_webhook_signature

app = FastAPI()

@app.post("/webhook")
async def handle_webhook(request: Request):
    signature = request.headers["x-scrapebadger-signature"]
    body = await request.body()
    if not verify_webhook_signature("your-secret", body, signature):
        return JSONResponse({"error": "Invalid signature"}, status_code=401)
    event = json.loads(body)
    # Process the event...
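Under the hood, this kind of check is typically an HMAC comparison. Here is a stdlib-only sketch of the idea (the exact scheme, HMAC-SHA256 over the raw body with a hex digest, is an assumption here; prefer the SDK's verify_webhook_signature in practice):

```python
import hashlib
import hmac

def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Compare the expected HMAC-SHA256 hex digest in constant time."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

# Simulate a signed delivery, then verify it.
sig = hmac.new(b"your-secret", b'{"type": "tweet"}', hashlib.sha256).hexdigest()
print(verify_signature("your-secret", b'{"type": "tweet"}', sig))  # True
```

hmac.compare_digest avoids the timing side channel that a plain `==` comparison would introduce.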

Error Handling

The SDK provides specific exception types for different error scenarios:

from scrapebadger import (
    ScrapeBadger,
    ScrapeBadgerError,
    AuthenticationError,
    RateLimitError,
    InsufficientCreditsError,
    NotFoundError,
    ValidationError,
    ServerError,
)

async with ScrapeBadger(api_key="your-key") as client:
    try:
        user = await client.twitter.users.get_by_username("elonmusk")
    except AuthenticationError:
        print("Invalid API key")
    except RateLimitError as e:
        print(f"Rate limited. Retry after {e.retry_after} seconds")
        print(f"Limit: {e.limit}, Remaining: {e.remaining}")
    except InsufficientCreditsError:
        print("Out of credits! Purchase more at scrapebadger.com")
    except NotFoundError:
        print("User not found")
    except ValidationError as e:
        print(f"Invalid parameters: {e}")
    except ServerError:
        print("Server error, try again later")
    except ScrapeBadgerError as e:
        print(f"API error: {e}")

Configuration

Custom Timeout and Retries

from scrapebadger import ScrapeBadger

client = ScrapeBadger(
    api_key="your-key",
    timeout=120.0,      # Request timeout in seconds (default: 300)
    max_retries=5,      # Retry attempts (default: 10)
)

Advanced Configuration

from scrapebadger import ScrapeBadger
from scrapebadger._internal import ClientConfig

config = ClientConfig(
    api_key="your-key",
    base_url="https://scrapebadger.com",
    timeout=300.0,
    connect_timeout=10.0,
    max_retries=10,
    retry_on_status=(502, 503, 504),
    headers={"X-Custom-Header": "value"},
)

client = ScrapeBadger(config=config)

Retry Behavior

The SDK automatically retries requests that fail with 502, 503, or 504 status codes using exponential backoff (1s, 2s, 4s, 8s, ...). Each retry logs a warning:

⚠ 503 Service Unavailable — retrying in 4s (attempt 3/10)

To see these warnings, configure Python logging:

import logging
logging.basicConfig(level=logging.WARNING)
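The backoff schedule described above (1s, 2s, 4s, 8s, ...) can be sketched as follows. This is an illustration of the doubling pattern, not the SDK's internal code, and the 60-second cap is an assumption for the sketch:

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay in seconds before retry number `attempt` (1-indexed), doubling each time."""
    return min(base * 2 ** (attempt - 1), cap)

print([backoff_delay(n) for n in range(1, 6)])  # [1.0, 2.0, 4.0, 8.0, 16.0]
```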

Rate Limit Aware Pagination

When using *_all pagination methods, the SDK reads X-RateLimit-Remaining and X-RateLimit-Reset headers from each response. When remaining requests drop below 20% of your tier's limit, pagination automatically slows down to spread requests across the remaining window — preventing 429 errors. A warning is logged when throttling activates:

⚠ Rate limit: 25/300 remaining (resets in 42s), throttling pagination to ~0.6 req/s

This works transparently with all tier levels (Free: 60/min, Basic: 300/min, Pro: 1000/min, Enterprise: 5000/min).
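The throttled rate in the warning above is just the remaining request budget spread evenly over the reset window. A minimal sketch of that arithmetic (a hypothetical helper, not the SDK's internals):

```python
def throttled_rate(remaining: int, reset_in_s: float) -> float:
    """Spread the remaining request budget evenly across the reset window."""
    return remaining / max(reset_in_s, 1.0)

# Matches the example warning: 25 requests left, window resets in 42s.
print(f"~{throttled_rate(25, 42):.1f} req/s")  # ~0.6 req/s
```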

API Reference

Twitter Endpoints

  • Tweets: get_by_id, get_by_ids, search, search_all, get_user_tweets, get_user_tweets_all, get_replies, get_retweeters, get_favoriters, get_similar
  • Users: get_by_id, get_by_username, get_about, search, search_all, get_followers, get_followers_all, get_following, get_following_all, get_follower_ids, get_following_ids, get_latest_followers, get_latest_following, get_verified_followers, get_followers_you_know, get_subscriptions, get_highlights
  • Lists: get_detail, search, get_tweets, get_tweets_all, get_members, get_members_all, get_subscribers, get_my_lists
  • Communities: get_detail, search, get_tweets, get_tweets_all, get_members, get_moderators, search_tweets, get_timeline
  • Trends: get_trends, get_place_trends, get_available_locations
  • Geo: get_detail, search
  • Streams: create_monitor, list_monitors, get_monitor, update_monitor, pause_monitor, resume_monitor, delete_monitor, list_delivery_logs, list_billing_logs, connect

Response Models

All responses use strongly-typed Pydantic models:

  • Tweet - Tweet data with text, metrics, media, polls, etc.
  • User - User profile with bio, metrics, verification status
  • UserAbout - Extended user information
  • List - Twitter list details
  • Community - Community with rules and admin info
  • Trend - Trending topic
  • Place - Geographic place
  • PaginatedResponse[T] - Wrapper for paginated results
  • StreamMonitor - Stream monitor configuration and status
  • StreamMonitorList - Paginated list of monitors
  • TweetEvent - Real-time tweet delivery event with latency
  • ConnectedEvent, PingEvent, ErrorEvent - WebSocket lifecycle events
  • DeliveryLog, BillingLog - Audit log records

See the full API documentation for complete details.

Development

Setup

# Clone the repository
git clone https://github.com/scrape-badger/scrapebadger-python.git
cd scrapebadger-python

# Install dependencies with uv
uv sync --dev

# Install pre-commit hooks
uv run pre-commit install

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src/scrapebadger --cov-report=html

# Run specific tests
uv run pytest tests/test_client.py -v

Code Quality

# Lint
uv run ruff check src/ tests/

# Format
uv run ruff format src/ tests/

# Type check
uv run mypy src/

# All checks
uv run ruff check src/ tests/ && uv run ruff format --check src/ tests/ && uv run mypy src/

Contributing

Contributions are welcome! Please read our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests and linting (uv run pytest && uv run ruff check)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support


Made with ❤️ by ScrapeBadger

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapebadger-0.3.1.tar.gz (43.8 kB)

Uploaded Source

Built Distribution

scrapebadger-0.3.1-py3-none-any.whl (55.5 kB)

Uploaded Python 3

File details

Details for the file scrapebadger-0.3.1.tar.gz.

File metadata

  • Download URL: scrapebadger-0.3.1.tar.gz
  • Upload date:
  • Size: 43.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scrapebadger-0.3.1.tar.gz

  • SHA256: 3cec051824de9a26a631064f2d924469ebb3f08fb6cfcca118bf2c0ac13455e7
  • MD5: d136e142e9292d2bfb702d0723d99617
  • BLAKE2b-256: 7dee35789aeb64eec97153b46a3b169a6e86ef6fd9fd5e002632e8102bd90a53

Provenance

The following attestation bundles were made for scrapebadger-0.3.1.tar.gz:

Publisher: publish.yml on scrape-badger/scrapebadger-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file scrapebadger-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: scrapebadger-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 55.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for scrapebadger-0.3.1-py3-none-any.whl

  • SHA256: 23cf719efe808dd3e8ade82cbf69abad1861f4e5fa1fae20d1e4fcf2929acde1
  • MD5: a271a81fcd496afb513ca702d1b0c21b
  • BLAKE2b-256: af9332c889e8be0b6eb34efd319bcbd79ef2ce2f866d9c3689a64a5dd51a2bb3

Provenance

The following attestation bundles were made for scrapebadger-0.3.1-py3-none-any.whl:

Publisher: publish.yml on scrape-badger/scrapebadger-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
