
API-only tweet search scraping (Twitter/X web GraphQL)


Scweet v5 — Full Documentation

Scweet is an API-only Twitter/X scraper built on the web GraphQL endpoints. It handles account pooling, rate limiting, cooldowns, resume, and output persistence — all backed by a local SQLite database.

pip install -U Scweet
from Scweet import Scweet, ScweetConfig, ScweetDB, configure_logging
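
A minimal end-to-end run, assuming you have a cookies.json file as described under Account Setup below:

from Scweet import Scweet

s = Scweet(cookies_file="cookies.json")            # provisions accounts into scweet_state.db
tweets = s.search("python programming", limit=50)  # defaults to the last 7 days
print(len(tweets))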

Account Setup

Scweet needs Twitter/X account cookies to make authenticated API requests.

Getting your cookies

  1. Log into Twitter/X in your browser
  2. Open DevTools (F12) > Application > Cookies > https://x.com
  3. Copy auth_token and ct0 values

Option A: cookies.json (recommended)

Create a cookies.json file:

[
  {
    "username": "your_account",
    "cookies": { "auth_token": "...", "ct0": "..." }
  }
]
s = Scweet(cookies_file="cookies.json")

For multiple accounts (enables concurrent scraping):

[
  { "username": "account1", "cookies": { "auth_token": "...", "ct0": "..." } },
  { "username": "account2", "cookies": { "auth_token": "...", "ct0": "..." } }
]
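
With more than one account provisioned, you can raise concurrency so work is spread across the pool. A short sketch, assuming the two-account cookies.json above:

from Scweet import Scweet, ScweetConfig

s = Scweet(
    cookies_file="cookies.json",
    config=ScweetConfig(concurrency=2),  # e.g. one worker per account
)
tweets = s.search("python", limit=200)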

Option B: auth_token (quickest)

If you only have an auth_token, Scweet bootstraps ct0 automatically:

s = Scweet(auth_token="YOUR_AUTH_TOKEN")

Option C: inline cookies

Pass cookies directly — useful for scripts and one-off runs:

# Single account
s = Scweet(cookies={"auth_token": "...", "ct0": "..."})

# Multiple accounts
s = Scweet(cookies=[
    {"auth_token": "tok1", "ct0": "ct0_1"},
    {"auth_token": "tok2", "ct0": "ct0_2"},
])

How provisioning works

When you create a Scweet instance, it provisions your accounts into a local SQLite database (scweet_state.db by default). This means your cookies are imported, validated, and stored so Scweet can manage them — tracking rate limits, cooldowns, daily caps, and lease state across requests.

This happens automatically on init (provision=True by default). You provide cookies once, and Scweet handles the rest. If an account already exists in the DB (matched by username or auth_token), it's updated rather than duplicated.
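
To confirm what was provisioned, you can inspect the same database with ScweetDB (covered under Account Management below). A quick check, assuming the default scweet_state.db path:

from Scweet import Scweet, ScweetDB

s = Scweet(cookies_file="cookies.json")        # accounts are provisioned here
print(ScweetDB("scweet_state.db").accounts_summary())
# e.g. {"total": 2, "eligible": 2, ...}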

Reuse existing DB

Because accounts are persisted in SQLite, you don't need to provide cookies every time. After the first run, you can just point to the DB:

s = Scweet(db_path="scweet_state.db")

This reuses your previously provisioned accounts with all their state (daily counters, cooldowns, etc.) intact.

You can also skip provisioning entirely if you only want to work with accounts already in the DB:

s = Scweet(db_path="scweet_state.db", provision=False)

Controlling Limits

Every method that paginates (search, get_profile_tweets, get_followers, get_following) accepts a limit parameter — the maximum number of items to collect in that call. If omitted (None), scraping continues until results are exhausted or account daily caps are hit.

Always set a limit to avoid burning through your account quota unexpectedly:

tweets = s.search("python", limit=200)              # stop after 200 tweets
tweets = s.get_profile_tweets(["elonmusk"], limit=100)
users  = s.get_followers(["elonmusk"], limit=500)
users  = s.get_following(["OpenAI"], limit=500)

There are two layers of limits:

Layer Where What it controls
Per-call limit Method argument Max items returned by a single call
Account daily caps ScweetConfig Max API requests / tweets per account per UTC day

The per-call limit is what you should set in normal usage. The account daily caps (daily_requests_limit, daily_tweets_limit) are safety nets that protect your account from over-use across all calls in a day — see Rate Limiting in the config reference.

get_user_info does not paginate (one API call per user), so it has no limit parameter.
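
A sketch combining both layers: a conservative per-call limit plus tighter account daily caps via ScweetConfig (both fields are listed under Rate Limiting in the config reference):

from Scweet import Scweet, ScweetConfig

s = Scweet(
    cookies_file="cookies.json",
    config=ScweetConfig(
        daily_requests_limit=20,   # safety net: max API requests per account per UTC day
        daily_tweets_limit=400,    # safety net: max tweets per account per UTC day
    ),
)
tweets = s.search("python", limit=200)   # per-call cap; daily caps apply across all calls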


Search API

Basic search

s = Scweet(cookies_file="cookies.json")

# Defaults to last 7 days if since/until omitted
tweets = s.search("python programming", limit=100)

# Explicit date range
tweets = s.search("python programming", since="2024-01-01", until="2024-02-01", limit=500)
print(f"Found {len(tweets)} tweets")

Both since and until are optional. since defaults to 7 days ago, until defaults to today.

Structured filters

All filters are optional and merge with the query string:

tweets = s.search(
    since="2024-01-01",
    from_users=["elonmusk"],
    min_likes=100,
    has_images=True,
    lang="en",
    limit=200,
)

Combining a query string with filters:

tweets = s.search(
    "AI tools",
    since="2024-01-01",
    from_users=["OpenAI"],
    min_likes=50,
    limit=100,
)

Available filters

Parameter Type Description
all_words list[str] All words must appear (AND)
any_words list[str] Any word can appear (OR)
exact_phrases list[str] Exact phrase match
exclude_words list[str] Exclude tweets with these words
hashtags_any list[str] Match any of these hashtags
hashtags_exclude list[str] Exclude these hashtags
from_users list[str] Tweets from these users
to_users list[str] Tweets to these users
mentioning_users list[str] Tweets mentioning these users
tweet_type str all, originals_only, replies_only, exclude_replies
verified_only bool Verified accounts only
blue_verified_only bool Blue verified only
has_images bool Must contain images
has_videos bool Must contain videos
has_links bool Must contain links
has_mentions bool Must contain mentions
has_hashtags bool Must contain hashtags
min_likes int Minimum likes
min_replies int Minimum replies
min_retweets int Minimum retweets
place str Place filter
geocode str Geocode filter (e.g., "37.7749,-122.4194,10km")
near str Near location
within str Within radius (e.g., "15mi")
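
As an illustration, several of the filters above combined in a single call (parameter names are taken from the table; the values are arbitrary):

tweets = s.search(
    since="2024-01-01",
    hashtags_any=["python", "django"],
    exclude_words=["hiring"],
    tweet_type="originals_only",
    min_retweets=10,
    lang="en",
    limit=100,
)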

Standard parameters

Parameter Type Default Description
query str "" Raw query string (Twitter search operators)
since str 7 days ago Start date (YYYY-MM-DD)
until str today End date (YYYY-MM-DD)
lang str None Language filter (e.g., "en")
display_type str "Top" "Top" or "Latest"
limit int None Max tweets to collect. None = no cap (scrapes until exhausted). Recommended to always set.
max_empty_pages int config value Stop after N consecutive empty pages
resume bool False Resume from last checkpoint
save bool False Save results to disk
save_format str config value "csv", "json", or "both"

Async variant

tweets = await s.asearch("query", since="2024-01-01", limit=100)

Profile Tweets

Fetch tweets from user timelines:

tweets = s.get_profile_tweets(["elonmusk", "OpenAI"], limit=100)

# With options
tweets = s.get_profile_tweets(
    ["elonmusk"],
    limit=500,
    max_empty_pages=2,
    save=True,
    save_format="json",
)
Parameter Type Default Description
users list[str] required Usernames or profile URLs
limit int None Max tweets to collect. None = no cap. Recommended to always set.
max_empty_pages int config value Stop after N consecutive empty pages
resume bool False Resume from last checkpoint
save bool False Save results to disk
save_format str config value "csv", "json", or "both"

Async: await s.aget_profile_tweets(["elonmusk"], limit=100)


Followers / Following

# Followers
users = s.get_followers(["elonmusk"], limit=1000)

# Following
users = s.get_following(["OpenAI"], limit=500)
Parameter Type Default Description
users list[str] required Usernames or profile URLs
limit int None Max users to collect. None = no cap. Recommended to always set.
max_empty_pages int config value Stop after N consecutive empty pages
resume bool False Resume from last checkpoint
raw_json bool False Include full Twitter user payload under raw key
save bool False Save results to disk
save_format str config value "csv", "json", or "both"

raw_json option

By default, follower/following records contain curated fields. With raw_json=True, each record includes the full Twitter user payload under a raw key (CSV output stays curated regardless):

users = s.get_followers(["elonmusk"], limit=100, raw_json=True)
# users[0]["raw"] contains the full GraphQL user object

Async: await s.aget_followers(["elonmusk"], limit=500) / await s.aget_following(["OpenAI"], limit=500)


User Info

Fetch profile information for one or more users:

profiles = s.get_user_info(["elonmusk", "OpenAI"])
# Returns list of dicts with profile fields
Parameter Type Default Description
users list[str] required Usernames or profile URLs
save bool False Save results to disk
save_format str config value "csv", "json", or "both"

Async: await s.aget_user_info(["elonmusk"])


Saving Results

By default, results are returned in-memory only (save=False). To persist to disk:

# Save as CSV (default format)
tweets = s.search("query", since="2024-01-01", limit=200, save=True)

# Save as JSON
tweets = s.search("query", since="2024-01-01", limit=200, save=True, save_format="json")

# Save both CSV and JSON
tweets = s.search("query", since="2024-01-01", limit=200, save=True, save_format="both")

Output files are written to the save_dir directory (default: "outputs"). File names are based on the operation type (search.csv, profile_tweets.json, followers.csv, etc.).

The default format can be set globally via ScweetConfig(save_format="json").
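
For example, to change the output directory and default format globally (save_dir and save_format are listed in the config reference):

from Scweet import Scweet, ScweetConfig

s = Scweet(
    cookies_file="cookies.json",
    config=ScweetConfig(save_dir="data/twitter", save_format="json"),
)
tweets = s.search("python", since="2024-01-01", limit=200, save=True)
# written under data/twitter/ with a name based on the operation type (e.g. a search file)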


Resume Interrupted Searches

Resume a search from where it left off using SQLite cursor checkpoints:

# First run — gets interrupted or completes partially
tweets = s.search("query", since="2024-01-01", until="2024-06-01", limit=1000)

# Resume — picks up from last saved checkpoint
tweets = s.search("query", since="2024-01-01", until="2024-06-01", limit=1000, resume=True)

Resume works by matching a hash of the query parameters. The same since, until, query, lang, and display_type must be provided to resume correctly.
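
A common pattern is to wrap a long run and pick it up later with identical parameters, a sketch:

params = dict(since="2024-01-01", until="2024-06-01", lang="en", limit=5000)

try:
    tweets = s.search("python", **params)          # long first run
except KeyboardInterrupt:
    pass

# Later, or after a crash: same query and parameters, plus resume=True
tweets = s.search("python", **params, resume=True)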


Configuration Reference

All fields have sensible defaults. Override with ScweetConfig:

from Scweet import Scweet, ScweetConfig

s = Scweet(
    cookies_file="cookies.json",
    config=ScweetConfig(
        concurrency=3,
        proxy="http://user:pass@host:port",
        min_delay_s=2.0,
    ),
)

Core

Field Type Default Description
db_path str "scweet_state.db" SQLite state file path
proxy str | dict | None None HTTP proxy for API calls
concurrency int 5 Number of parallel workers

Output

Field Type Default Description
save_dir str "outputs" Default output directory
save_format str "csv" Default format: "csv", "json", or "both"

HTTP Tuning

Field Type Default Description
api_http_mode str "auto" HTTP mode: "auto", "async", "sync"
api_http_impersonate str | None None Browser impersonation target for curl_cffi
api_user_agent str | None None Custom User-Agent string

Rate Limiting

Field Type Default Description
daily_requests_limit int 30 Max API requests per account per day
daily_tweets_limit int 600 Max tweets per account per day
max_empty_pages int 1 Stop after N consecutive empty result pages
api_page_size int 20 Tweets per API page (1-100)
min_delay_s float 2.0 Minimum delay between requests
requests_per_min int 30 Rate limit per account per minute

Advanced

Field Type Default Description
enable_wal bool True SQLite WAL mode
busy_timeout_ms int 5000 SQLite busy timeout
lease_ttl_s int 120 Account lease time-to-live
lease_heartbeat_s float 30.0 Heartbeat interval for active leases
cooldown_default_s float 120.0 Default cooldown after rate limit
transient_cooldown_s float 120.0 Cooldown for transient errors (e.g., 404/stale query IDs)
auth_cooldown_s float 2592000.0 Cooldown for auth failures (30 days)
cooldown_jitter_s float 10.0 Random jitter added to cooldowns
task_retry_base_s int 1 Base delay for task retry backoff
task_retry_max_s int 30 Max delay for task retry backoff
max_task_attempts int 3 Max retry attempts per task
max_fallback_attempts int 3 Max fallback attempts on failure
max_account_switches int 2 Max account switches per task
scheduler_min_interval_s int 300 Minimum time interval split (seconds)
n_splits int 5 Number of time interval splits for search
priority int 1 Task priority
strict bool False Raise exceptions instead of returning empty
proxy_check_on_lease bool True Verify proxy connectivity before leasing
proxy_check_url str "https://x.com/robots.txt" URL for proxy check
proxy_check_timeout_s float 10.0 Timeout for proxy check
profile_timeline_allow_anonymous bool False Allow anonymous profile timeline requests

Manifest (Query IDs)

Field Type Default Description
manifest_url str | None None Remote manifest URL for query IDs
manifest_ttl_s int 3600 Cache TTL for remote manifest
manifest_update_on_init bool False Fetch remote manifest on init
manifest_scrape_on_init bool False Scrape fresh query IDs from X on init

Auto-Updating Query IDs

Twitter/X rotates GraphQL query IDs periodically. When IDs go stale, requests return 404. Scweet ships with default IDs that work at release time, but you can auto-fetch fresh ones:

s = Scweet(
    cookies_file="cookies.json",
    config=ScweetConfig(manifest_scrape_on_init=True),
)

This fetches the current main.js bundle from X on init and extracts the latest query IDs. It adds a few seconds to startup but ensures your requests use current IDs.
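
If startup latency matters, an alternative sketch is to point Scweet at a remote manifest and refresh it on init (manifest_url, manifest_ttl_s, and manifest_update_on_init are listed in the config reference; the URL here is a placeholder):

from Scweet import Scweet, ScweetConfig

s = Scweet(
    cookies_file="cookies.json",
    config=ScweetConfig(
        manifest_url="https://example.com/scweet-query-ids.json",  # placeholder URL
        manifest_ttl_s=3600,
        manifest_update_on_init=True,
    ),
)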


Account Management (ScweetDB)

ScweetDB provides direct access to the SQLite state for account inspection and management:

from Scweet import ScweetDB

db = ScweetDB("scweet_state.db")

accounts_summary()

summary = db.accounts_summary()
# {"db_path": "...", "total": 5, "eligible": 3, "unusable": 1, "cooling_down": 1, ...}

list_accounts()

accounts = db.list_accounts(limit=10, eligible_only=True)
# Returns list of account dicts with redacted secrets (fingerprints only)

# Include cookie keys
accounts = db.list_accounts(include_cookies=True)

# Reveal full secrets (use with caution)
accounts = db.list_accounts(reveal_secrets=True)

get_account(username)

account = db.get_account("my_account")

repair_account(username)

Reset cooldowns, clear leases, and optionally refresh auth tokens:

result = db.repair_account("my_account")
# {"updated": 1, "changes": ["cooldown_cleared", "lease_cleared", ...], ...}

# Force token refresh even if auth material looks valid
result = db.repair_account("my_account", force_refresh=True)

reset_account_cooldowns()

# Reset all account cooldowns
db.reset_account_cooldowns()

# Reset specific accounts
db.reset_account_cooldowns(usernames=["account1", "account2"])

# Include unusable accounts (reactivates them)
db.reset_account_cooldowns(include_unusable=True)

clear_leases()

# Clear expired leases only (safe)
db.clear_leases(expired_only=True)

# Clear all leases
db.clear_leases(expired_only=False)

reset_daily_counters()

db.reset_daily_counters()

Other methods

  • delete_account(username) — Remove an account from the pool.
  • set_account_proxy(username, proxy) — Set or clear a per-account proxy override.
  • mark_account_unusable(username) — Mark an account as unusable (won't be leased).
  • import_accounts_from_sources(...) — Import accounts from files/cookies into the DB.
  • collapse_duplicates_by_auth_token(dry_run=True) — Find and merge duplicate accounts.
  • get_checkpoint(query_hash) / clear_checkpoint(query_hash) / clear_all_checkpoints() — Manage resume checkpoints.
  • list_runs(limit=50) / last_run() / runs_summary() — Inspect run history.
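
A short sketch exercising a few of the calls listed above (the usernames and proxy are placeholders):

db.set_account_proxy("account1", "http://user:pass@host:port")  # per-account proxy override
db.mark_account_unusable("account2")                            # exclude from leasing
db.collapse_duplicates_by_auth_token(dry_run=True)              # preview duplicate merges
print(db.runs_summary())                                        # run history overview
db.clear_all_checkpoints()                                      # drop all resume checkpoints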

Logging

Scweet uses Python's logging module. By default, no output is produced (NullHandler). Opt in with configure_logging():

from Scweet import configure_logging

# Simple — high-level flow logs
configure_logging(profile="simple", level="INFO", force=True)

# Detailed — includes per-request API logs and file/line info
configure_logging(profile="detailed", level="DEBUG", force=True)
Parameter Type Default Description
level str | int "INFO" Log level
profile str "simple" "simple" or "detailed"
force bool False Replace existing handlers
stream TextIO sys.stdout Output stream
fmt str | None None Custom format string
show_api_http bool | None None Show per-request API logs
api_level str | int | None None Override api_engine log level
transaction_level str | int | None None Override transaction log level

Async Usage

All public methods have async variants. Use them in async contexts:

import asyncio
from Scweet import Scweet

async def main():
    s = Scweet(cookies_file="cookies.json")

    tweets = await s.asearch("query", since="2024-01-01", limit=100)
    profiles = await s.aget_user_info(["elonmusk"])
    followers = await s.aget_followers(["elonmusk"], limit=500)

asyncio.run(main())

The sync methods (search, get_followers, etc.) wrap their async counterparts with asyncio.run(), so they cannot be called from within an already-running event loop.

No close needed

Scweet and ScweetDB don't require explicit closing. HTTP sessions are created and closed per-request internally, and SQLite connections are scoped per-operation. You can create a Scweet instance, use it, and let it go out of scope — no resource leaks.


Error Handling

Strict vs non-strict mode

By default (strict=False), all methods return [] when something goes wrong — no accounts available, network errors, proxy failures, etc. Errors are logged but don't crash your code:

s = Scweet(cookies_file="cookies.json")
tweets = s.search("query")  # Returns [] on any error

With strict=True, exceptions are raised instead, so you can handle them explicitly:

from Scweet import Scweet, ScweetConfig, AccountPoolExhausted, NetworkError

s = Scweet(
    cookies_file="cookies.json",
    config=ScweetConfig(strict=True),
)

try:
    tweets = s.search("query")
except AccountPoolExhausted:
    print("No accounts available — check cooldowns or add more accounts")
except NetworkError:
    print("Network issue — check your connection or proxy")

This applies to all methods: search, get_profile_tweets, get_followers, get_following, and get_user_info.

Exception hierarchy

All Scweet exceptions inherit from ScweetError, so you can catch everything with a single handler:

ScweetError                          # Base — catch-all
  AccountPoolExhausted               # No eligible accounts (all cooled down / at daily limits)
  EngineError                        # Engine-level runtime error
    RunFailed                        # Run completed but couldn't produce results
      NetworkError                   # Network/connectivity failure
      ProxyError                     # Proxy misconfiguration or connectivity failure

All exceptions are importable from the top-level package:

from Scweet import ScweetError, AccountPoolExhausted, RunFailed, NetworkError, ProxyError, EngineError

Migration from v4

v4 → v5
Scweet.from_sources(...) → Scweet(cookies_file=...)
scweet.scrape(words=["bitcoin"], ...) → s.search("bitcoin", ...)
scweet.ascrape(...) → s.asearch(...)
scweet.profile_tweets(usernames=[...]) → s.get_profile_tweets([...])
scweet.get_user_information(usernames=[...]) → s.get_user_info([...])
ScweetConfig.from_sources(overrides={...}) → ScweetConfig(field=value)
Nested config (pool.concurrency) → Flat config (concurrency)
from Scweet.scweet import Scweet → from Scweet import Scweet
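
In code, the mapping above looks roughly like this (the v4 lines are shown as comments and abbreviated; exact v4 keyword arguments are elided):

# v4
#   from Scweet.scweet import Scweet
#   scweet = Scweet.from_sources(...)
#   tweets = scweet.scrape(words=["bitcoin"], since="2024-01-01", ...)

# v5
from Scweet import Scweet

s = Scweet(cookies_file="cookies.json")
tweets = s.search("bitcoin", since="2024-01-01", limit=100)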

MIT License
