Scrape comments and replies from Threads by keyword

Project description

threadscraper

A keyword-based CLI scraper for Threads comments and replies — no account required.

Requires Python 3.10+. License: MIT.

Features

Keyword search — search Threads by any keyword and collect all matching comments and replies
No account required — session tokens are fetched automatically via headless browser on first run
Auto token refresh — detects expired tokens after 3 consecutive 403s and silently refreshes via headless Chromium
Text cleaning — removes URLs, @mentions, #hashtags, emoji, and normalizes whitespace
Deduplication — skips posts already scraped, tracked across restarts via checkpoint file
Resume support — interrupted scrapes continue from where they left off
Configurable via CLI — limit, output file, delay range, minimum comment length, checkpoint toggle
CSV output with columns: post_code, post_id, post_text, comment_id, comment_text, username, like_count, reply_count, timestamp, keyword, type
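The text-cleaning feature above can be pictured with a short sketch. This is not the package's actual code: the function name `clean_text`, the regular expressions, and the emoji ranges are all assumptions chosen for illustration, and the emoji pattern is deliberately rough rather than exhaustive.

```python
import re

# All patterns below are illustrative assumptions, not the package's own rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@[\w.]+")
HASHTAG_RE = re.compile(r"#\w+")
# Rough emoji coverage: common pictograph blocks, dingbats, flags, joiners.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "\U0001F1E6-\U0001F1FF"  # regional indicator (flag) letters
    "\U0000FE0F\U0000200D"   # variation selector and zero-width joiner
    "]+"
)


def clean_text(text: str) -> str:
    """Strip URLs, @mentions, #hashtags and emoji, then collapse whitespace."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    # split() with no arguments drops leading/trailing and repeated whitespace.
    return " ".join(text.split())


cleaned = clean_text("Hello @user.name check https://example.com #tag  \U0001F389 world")
```

Replacing each match with a space (rather than an empty string) keeps neighbouring words from fusing together before the final whitespace pass.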

Installation

pip install threads-comment-scraper
playwright install chromium

Usage

# Scrape by inline keywords
threads-scraper --keywords "politik indonesia,pilkada"

# Use a keywords file
threads-scraper --keywords-file keywords.txt

# With all options
threads-scraper --keywords-file keywords.txt \
  --output data.csv \
  --limit 5000 \
  --delay-min 2 \
  --delay-max 5 \
  --min-length 15

keywords.txt format

Lines starting with # are treated as comments and ignored.

# Politik
politik indonesia
pilkada

# Ekonomi
ekonomi indonesia
bbm naik
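A keywords file in this format is simple to parse. The sketch below shows one way to do it; `parse_keywords` and `load_keywords` are hypothetical helper names, and skipping blank lines and trimming surrounding whitespace are assumptions (the description above only states that lines starting with # are ignored).

```python
from pathlib import Path


def parse_keywords(text: str) -> list[str]:
    """Return one keyword per non-empty line, ignoring lines that start with '#'."""
    keywords = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines are skipped along with '#' comment lines.
        if not line or line.startswith("#"):
            continue
        keywords.append(line)
    return keywords


def load_keywords(path: str) -> list[str]:
    """Read a keywords file from disk (UTF-8 is assumed) and parse it."""
    return parse_keywords(Path(path).read_text(encoding="utf-8"))


example = "# Politik\npolitik indonesia\npilkada\n\n# Ekonomi\nekonomi indonesia\nbbm naik\n"
keywords = parse_keywords(example)
```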

CLI reference

Argument	Default	Description
`--keywords`	—	Comma-separated keyword string
`--keywords-file`	—	Path to `.txt` file, one keyword per line
`--limit`	unlimited	Maximum total comments to collect
`--output`	`output.csv`	Output CSV file path
`--min-length`	`10`	Minimum character count per comment
`--delay-min`	`2.0`	Minimum seconds between requests
`--delay-max`	`5.0`	Maximum seconds between requests
`--no-checkpoint`	off	Disable resume behavior (start fresh)
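For readers who want to see how the table above maps onto a parser, here is a minimal `argparse` sketch that mirrors the documented flags and defaults. It is a reconstruction from the table only, not the package's source; `build_parser` is a hypothetical name, and representing "unlimited" as `None` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose flags and defaults follow the CLI reference table."""
    parser = argparse.ArgumentParser(prog="threads-scraper")
    parser.add_argument("--keywords", help="Comma-separated keyword string")
    parser.add_argument("--keywords-file", help="Path to .txt file, one keyword per line")
    # Assumption: no limit is modelled as None.
    parser.add_argument("--limit", type=int, default=None, help="Maximum total comments")
    parser.add_argument("--output", default="output.csv", help="Output CSV file path")
    parser.add_argument("--min-length", type=int, default=10, help="Minimum characters per comment")
    parser.add_argument("--delay-min", type=float, default=2.0, help="Minimum seconds between requests")
    parser.add_argument("--delay-max", type=float, default=5.0, help="Maximum seconds between requests")
    parser.add_argument("--no-checkpoint", action="store_true", help="Disable resume behavior")
    return parser


args = build_parser().parse_args(["--keywords", "politik indonesia,pilkada", "--limit", "5000"])
# Split the inline keyword string on commas, dropping empty entries.
keywords = [k.strip() for k in args.keywords.split(",") if k.strip()]
```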

Output CSV columns

Column	Description
`post_code`	Original post shortcode from the URL (e.g. `DYeZUeiElWy`)
`post_id`	Numeric media ID used by the GraphQL API
`post_text`	Text of the top-level post being replied to
`comment_id`	Numeric ID of the comment or reply
`comment_text`	Cleaned comment/reply text
`username`	Poster's Threads username
`like_count`	Number of likes on the comment
`reply_count`	Number of direct replies to the comment
`timestamp`	Unix timestamp of the comment
`keyword`	The search keyword that found this post
`type`	`comment` (top-level) or `reply`
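Once a scrape has finished, the CSV can be loaded with the standard library. The sketch below assumes the file has a header row with the column names listed above, that the counts are integers, and that `timestamp` is in whole seconds; `parse_rows` is a hypothetical helper, and the sample row is made up purely to exercise the code.

```python
import csv
import io
from datetime import datetime, timezone


def parse_rows(lines):
    """Parse CSV lines into dicts, converting counts to int and timestamp to UTC datetime."""
    rows = []
    for row in csv.DictReader(lines):
        row["like_count"] = int(row["like_count"])
        row["reply_count"] = int(row["reply_count"])
        # Assumption: the Unix timestamp is expressed in seconds.
        row["timestamp"] = datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc)
        rows.append(row)
    return rows


# Made-up sample data, for illustration only.
SAMPLE = (
    "post_code,post_id,post_text,comment_id,comment_text,username,"
    "like_count,reply_count,timestamp,keyword,type\n"
    "DYeZUeiElWy,1,example post,2,example comment,example_user,"
    "3,0,1700000000,pilkada,comment\n"
)
rows = parse_rows(io.StringIO(SAMPLE))
top_level = [r for r in rows if r["type"] == "comment"]
```

With a real output file, pass an open file object instead of the in-memory sample, for example `open("output.csv", newline="", encoding="utf-8")`.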

Notes

For educational and research purposes only
Respect Threads' Terms of Service
Use reasonable delays (--delay-min, --delay-max) to avoid overloading servers
The first run launches a headless browser to capture fresh session tokens — this is normal and takes ~10 seconds
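The token handling mentioned in these notes and under Features (a refresh after 3 consecutive 403 responses) can be pictured as a small counter. The sketch below is an assumption about the general shape of such logic, not the package's implementation: `TokenGuard` is a hypothetical class, and the `refresh` callback stands in for whatever actually launches the headless browser.

```python
from typing import Callable


class TokenGuard:
    """Track consecutive HTTP 403 responses and trigger a refresh at a threshold."""

    def __init__(self, refresh: Callable[[], None], threshold: int = 3) -> None:
        self.refresh = refresh          # hypothetical callback that fetches new tokens
        self.threshold = threshold      # 3 matches the number quoted in the feature list
        self.consecutive_403 = 0

    def record(self, status_code: int) -> bool:
        """Record a response status; return True if a refresh was triggered."""
        if status_code != 403:
            # Any other status breaks the streak.
            self.consecutive_403 = 0
            return False
        self.consecutive_403 += 1
        if self.consecutive_403 >= self.threshold:
            self.refresh()
            self.consecutive_403 = 0
            return True
        return False


refresh_calls = []
guard = TokenGuard(refresh=lambda: refresh_calls.append("refreshed"))
results = [guard.record(code) for code in (403, 403, 200, 403, 403, 403)]
```

Resetting the counter on any non-403 response is what makes the check "consecutive": two 403s followed by a success do not count toward the threshold.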

Credit

Made by @galihkjaya

Project details

Release history

0.1.1: May 19, 2026 (this version)
0.1.0: May 19, 2026

Download files

Source Distribution

threads_comment_scraper-0.1.0.tar.gz (11.4 kB)

Uploaded May 19, 2026 Source

Built Distribution

threads_comment_scraper-0.1.0-py3-none-any.whl (14.0 kB)

Uploaded May 19, 2026 Python 3

File details

Details for the file threads_comment_scraper-0.1.0.tar.gz.

File metadata

Download URL: threads_comment_scraper-0.1.0.tar.gz
Upload date: May 19, 2026
Size: 11.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.1

File hashes

Hashes for threads_comment_scraper-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`774fadff986d5527df147c283ac6093192a4e532c5e0a05f2b2cd7aab0b4f185`
MD5	`d462e1058bcb103105ef996c81045972`
BLAKE2b-256	`c07ced86b74dc2be4f127e6c40973c05a7b916dfd21a2162d546a6baf7271804`
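A downloaded file can be checked against a published SHA256 digest with Python's `hashlib`. This is a generic sketch, not something the package provides; `sha256_of` and `matches_digest` are hypothetical helper names, and the local file path is whatever you saved the download as.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_digest("threads_comment_scraper-0.1.0.tar.gz", "<SHA256 from the table above>")` should return True for an intact download.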

File details

Details for the file threads_comment_scraper-0.1.0-py3-none-any.whl.

File metadata

Download URL: threads_comment_scraper-0.1.0-py3-none-any.whl
Upload date: May 19, 2026
Size: 14.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.1

File hashes

Hashes for threads_comment_scraper-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8f572a659eea7e0d055bc307de90429b2dad58c4c1ac9eda4bee734dab94098f`
MD5	`df5cfef95e3018819f6448627a24b6b6`
BLAKE2b-256	`d8ea097cf0a48f7753810ef8c684b9a40cefa840f66ca1961d8fa9fefe334b7d`

threads-comment-scraper 0.1.0
