
Scrape comments and replies from Threads by keyword

Threads Comment Scraper

A keyword-based CLI scraper for Threads comments and replies — no account required.

Requires Python 3.10+. License: MIT.


Features

  • Keyword search: search Threads by any keyword and collect all matching comments and replies
  • No account required: session tokens are fetched automatically via a headless browser on the first run
  • Auto token refresh: treats 3 consecutive HTTP 403 responses as expired tokens and silently refreshes them via headless Chromium
  • Text cleaning: removes URLs, @mentions, #hashtags, emoji, and normalizes whitespace
  • Deduplication: skips posts that were already scraped; scraped posts are tracked across restarts in a checkpoint file
  • Resume support: interrupted scrapes continue from where they left off
  • Configurable via CLI: limit, output file, delay range, minimum comment length, checkpoint toggle
  • CSV output with columns: post_code, post_id, post_text, comment_id, comment_text, username, like_count, reply_count, timestamp, keyword, type
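The text-cleaning step can be approximated with a few regular expressions. The sketch below is an assumption about how such cleaning might work, not the package's actual code: clean_text and keep_comment are hypothetical helpers, the emoji pattern only covers the common Unicode emoji and symbol blocks, and applying --min-length to the already-cleaned text is itself an assumption.

```python
import re

# Hypothetical helpers sketching the cleaning steps; the patterns are assumptions.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Rough emoji coverage: the main emoji/pictograph blocks, misc symbols and dingbats,
# plus the variation selector and zero-width joiner used inside emoji sequences.
EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove URLs, @mentions, #hashtags and emoji, then normalize whitespace."""
    # URLs go first so any '#' or '@' inside a link is removed with the link.
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    # Collapse runs of spaces, tabs and newlines into single spaces.
    return WHITESPACE_RE.sub(" ", text).strip()


def keep_comment(text: str, min_length: int = 10) -> bool:
    """Assumed --min-length check, applied to the cleaned text."""
    return len(text) >= min_length
```

Replacing each match with a space (rather than an empty string) keeps neighboring words from fusing together; the final whitespace pass then removes the extra spaces.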

Installation

pip install threads-comment-scraper
playwright install chromium

Usage

# Scrape by inline keywords
threads-scraper --keywords "politik indonesia,pilkada"

# Use a keywords file
threads-scraper --keywords-file keywords.txt

# With additional options
threads-scraper --keywords-file keywords.txt \
  --output data.csv \
  --limit 5000 \
  --delay-min 2 \
  --delay-max 5 \
  --min-length 15

keywords.txt format

Put one keyword per line. Lines starting with # are treated as comments and ignored.

# Politik
politik indonesia
pilkada

# Ekonomi
ekonomi indonesia
bbm naik
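A minimal sketch of reading keywords in both supported forms. It assumes blank lines are skipped as well (the README only states that # lines are ignored) and that leading spaces before # do not matter; parse_keywords_text, parse_keywords_arg and load_keywords_file are hypothetical names, not the package's API.

```python
from pathlib import Path


def parse_keywords_text(text: str) -> list[str]:
    """One keyword per line; skip '#' comment lines and (assumed) blank lines."""
    keywords = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keywords.append(line)
    return keywords


def parse_keywords_arg(value: str) -> list[str]:
    """Split a --keywords value on commas, trimming spaces and dropping empties."""
    return [k.strip() for k in value.split(",") if k.strip()]


def load_keywords_file(path: str) -> list[str]:
    """Read a keywords file (assumed UTF-8) and parse it."""
    return parse_keywords_text(Path(path).read_text(encoding="utf-8"))
```

Applied to the example file above, this yields the four keywords "politik indonesia", "pilkada", "ekonomi indonesia" and "bbm naik"; multi-word keywords stay intact because only newlines (or commas, for --keywords) act as separators.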

CLI reference

| Argument | Default | Description |
|---|---|---|
| --keywords | (none) | Comma-separated keyword string |
| --keywords-file | (none) | Path to a .txt file, one keyword per line |
| --limit | unlimited | Maximum total number of comments to collect |
| --output | output.csv | Output CSV file path |
| --min-length | 10 | Minimum number of characters per comment |
| --delay-min | 2.0 | Minimum seconds between requests |
| --delay-max | 5.0 | Maximum seconds between requests |
| --no-checkpoint | off | Disable resume behavior and start fresh |
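Resume support and deduplication rely on a checkpoint file whose name and format the README does not document. The sketch below assumes a JSON list of post codes in a hypothetical checkpoint.json; the Checkpoint class is illustrative only. Passing enabled=False models --no-checkpoint, under the assumption that saved state is then neither read nor written.

```python
import json
from pathlib import Path


class Checkpoint:
    """Hypothetical checkpoint that remembers scraped post codes in a JSON file.

    The file name and JSON layout are assumptions, not the package's real format.
    """

    def __init__(self, path: str = "checkpoint.json", enabled: bool = True) -> None:
        self.path = Path(path)
        self.enabled = enabled
        self.done: set[str] = set()
        # enabled=False models --no-checkpoint: any saved state is ignored
        # (assumption: nothing is written either, see mark()).
        if enabled and self.path.exists():
            self.done = set(json.loads(self.path.read_text(encoding="utf-8")))

    def seen(self, post_code: str) -> bool:
        """True if this post was already scraped (the deduplication check)."""
        return post_code in self.done

    def mark(self, post_code: str) -> None:
        """Record a finished post and persist the set when checkpointing is on."""
        self.done.add(post_code)
        if self.enabled:
            # Write to a temporary file, then swap it in, so an interrupted
            # write is unlikely to leave a truncated checkpoint behind.
            tmp = self.path.with_name(self.path.name + ".tmp")
            tmp.write_text(json.dumps(sorted(self.done)), encoding="utf-8")
            tmp.replace(self.path)
```

In a scrape loop, a post would be skipped when seen(code) is true and passed to mark(code) once its comments are saved, which is what lets an interrupted run pick up where it stopped.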

Output CSV columns

| Column | Description |
|---|---|
| post_code | Shortcode of the original post, taken from its URL (e.g., DYeZUeiElWy) |
| post_id | Numeric media ID used by the GraphQL API |
| post_text | Text of the top-level post being replied to |
| comment_id | Numeric ID of the comment or reply |
| comment_text | Cleaned comment or reply text |
| username | Poster's Threads username |
| like_count | Number of likes on the comment |
| reply_count | Number of direct replies to the comment |
| timestamp | Unix timestamp of the comment |
| keyword | The search keyword that found this post |
| type | comment (top-level) or reply |
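Because the output is a plain CSV file, Python's standard csv module is enough for a first look at the data. This sketch assumes UTF-8 encoding and the column names listed above; load_rows, count_by_keyword and most_liked are hypothetical helpers. csv.DictReader returns every value as a string, so numeric columns such as like_count need an explicit int() cast.

```python
import csv
from collections import Counter
from typing import Iterable


def load_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse CSV lines (an open file or a list of strings) into dicts keyed by column."""
    return list(csv.DictReader(lines))


def count_by_keyword(rows: list[dict[str, str]]) -> dict[str, Counter]:
    """Count rows per search keyword, split by the 'type' column (comment or reply)."""
    counts: dict[str, Counter] = {}
    for row in rows:
        counts.setdefault(row["keyword"], Counter())[row["type"]] += 1
    return counts


def most_liked(rows: list[dict[str, str]], n: int = 5) -> list[dict[str, str]]:
    """Return the n rows with the most likes; CSV values are strings, so cast to int."""
    return sorted(rows, key=lambda r: int(r["like_count"]), reverse=True)[:n]
```

To analyze a real run, pass an open file to load_rows, for example a file opened with open("output.csv", newline="", encoding="utf-8").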

Notes

  • For educational and research purposes only
  • Respect Threads' Terms of Service
  • Use reasonable delays (--delay-min, --delay-max) to avoid overloading servers
  • The first run launches a headless browser to capture fresh session tokens; this is normal and takes about 10 seconds
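The request pacing and token refresh described in this README (a random delay between --delay-min and --delay-max, and a refresh after 3 consecutive 403 responses) can be sketched as a small retry loop. fetch and refresh_tokens are hypothetical callbacks standing in for the scraper's HTTP request and its headless-browser token capture; max_attempts is an added safety cap that the README does not mention.

```python
import random
import time
from typing import Callable


def fetch_with_refresh(
    fetch: Callable[[], int],
    refresh_tokens: Callable[[], None],
    delay_min: float = 2.0,
    delay_max: float = 5.0,
    max_403: int = 3,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch() until it returns a non-403 status or max_attempts is reached.

    fetch returns an HTTP status code; both callbacks are hypothetical stand-ins.
    """
    consecutive_403 = 0
    status = 403
    for _ in range(max_attempts):
        # Random pause before each request, like --delay-min/--delay-max.
        sleep(random.uniform(delay_min, delay_max))
        status = fetch()
        if status != 403:
            return status
        consecutive_403 += 1
        if consecutive_403 >= max_403:
            # Several 403s in a row: assume the session tokens expired.
            refresh_tokens()
            consecutive_403 = 0
    return status
```

Injecting sleep as a parameter keeps the loop testable without real waiting; randomizing the delay spreads requests out instead of hitting the server at a fixed rhythm.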

Credit

Made by @galihkjaya and @Nathaniel7

Download files

Source Distribution

threads_comment_scraper-0.1.1.tar.gz (11.6 kB)

Uploaded Source

Built Distribution

threads_comment_scraper-0.1.1-py3-none-any.whl (14.1 kB)

Uploaded Python 3

File details

Details for the file threads_comment_scraper-0.1.1.tar.gz.

File metadata

  • Download URL: threads_comment_scraper-0.1.1.tar.gz
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.1

File hashes

Hashes for threads_comment_scraper-0.1.1.tar.gz
Algorithm Hash digest
SHA256 c9795a023d3853f3a0d9c28402e400c47dfc3981158ebe1a4b67295fc4ccff5b
MD5 f2f59754c67a519bdd22b8f091a0b11e
BLAKE2b-256 2d043e441fa0946fdc72e403cd6aad74d9fa6f176066ec20b1011bac80d8e8af

File details

Details for the file threads_comment_scraper-0.1.1-py3-none-any.whl.

File hashes

Hashes for threads_comment_scraper-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7ce327ff78c43655b57c91f683ff8209674aa97622cfa4d934e791597fd36ab2
MD5 77d838f61817c3eed580fcf828634008
BLAKE2b-256 faae6d5601ecc72d192116f06388f0000a2e02b098ed4a16abfe8cd9f25b214a
