Zero-auth Reddit scraper CLI and library

Scrapi Reddit

Scrapi Reddit is a zero-auth toolkit for scraping public Reddit listings. Use the CLI for quick data pulls, or import the library to integrate pagination, comment harvesting, and CSV exports into your own workflows. The scraper reads Reddit's public JSON endpoints, so it needs neither an API key nor OAuth setup.

Features

Listing coverage: Scrape subreddit posts (with their comments), the front page, r/popular (geo-aware), r/all, user activity, or custom listing URLs without OAuth.
Search mode: Run keyword searches (site-wide or scoped to a subreddit) with custom type filters, sort orders, and time windows.
Comment controls: Toggle comment collection per post with resumable runs that reuse cached JSON and persist to CSV.
Post deep dives: Target individual posts to download full comment trees on demand.
Resilient fetching: Automatic pagination, exponential backoff for rate limits, and structured logging with adjustable verbosity.
Media archiving: Optional media capture downloads linked images, GIFs, and videos alongside post metadata.
Media filters: Keep only the assets you need (e.g., videos only or static images only).
Flexible exports: Save outputs as JSON and optionally flatten posts/comments into CSV for downstream analysis.
Scriptable tooling: Configurable CLI plus Python API for scripting and integration.
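
To make the "automatic pagination" feature concrete, here is a minimal sketch of cursor-based paging over a Reddit listing. It is not the library's implementation: iter_listing and fake_fetch are hypothetical names, and the only thing taken from Reddit is the listing shape, where each page carries its items under data.children and the next-page cursor under data.after.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_listing(fetch_page: Callable[[Optional[str]], Dict], limit: int) -> Iterator[Dict]:
    """Yield up to `limit` post dicts, following the listing's `after` cursor."""
    after: Optional[str] = None
    yielded = 0
    while yielded < limit:
        page = fetch_page(after)              # one listing page as parsed JSON
        data = page.get("data", {})
        children = data.get("children", [])
        for child in children:
            if yielded >= limit:
                return
            yield child["data"]
            yielded += 1
        after = data.get("after")
        if not after or not children:         # no cursor: the listing is exhausted
            return


# Hypothetical two-page listing standing in for real HTTP responses.
_PAGES = {
    None: {"data": {"children": [{"data": {"id": "a"}}, {"data": {"id": "b"}}], "after": "t3_b"}},
    "t3_b": {"data": {"children": [{"data": {"id": "c"}}], "after": None}},
}


def fake_fetch(after: Optional[str]) -> Dict:
    return _PAGES[after]


ids = [post["id"] for post in iter_listing(fake_fetch, limit=10)]
print(ids)
```

Passing the fetcher in as a callable keeps the paging logic testable without network access; a real fetcher would issue the HTTP request and pass the cursor as the after query parameter.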

Important Notes

Respect Reddit's User Agreement and local laws. Scraped data may have legal or ethical constraints.
Heavy scraping can trigger rate limits or temporary IP bans. Keep delays reasonable (I recommend a delay of 3 to 4 seconds between requests).
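
The following sketch illustrates how a delay of a few seconds could be combined with exponential backoff when a rate limit is hit. It makes several assumptions: backoff_delays and fetch_with_backoff are hypothetical helpers, HTTP 429 is treated as the rate-limit signal, and the doubling factor, cap, and jitter are illustrative values rather than the package's actual settings.

```python
import random
import time
from typing import Callable, List, Tuple


def backoff_delays(base: float = 3.0, factor: float = 2.0,
                   cap: float = 60.0, retries: int = 5) -> List[float]:
    """Return the wait in seconds before each retry: base, base*factor, ..., capped."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def fetch_with_backoff(fetch: Callable[[], Tuple[int, str]],
                       sleep: Callable[[float], None] = time.sleep,
                       **backoff_kwargs) -> Tuple[int, str]:
    """Call `fetch` until it stops returning HTTP 429 or the retries run out."""
    status, body = fetch()
    for delay in backoff_delays(**backoff_kwargs):
        if status != 429:
            break
        sleep(delay + random.uniform(0, 0.5))   # small jitter spreads out retries
        status, body = fetch()
    return status, body


# Demo with canned responses and a recording "sleep" so nothing actually waits.
responses = iter([(429, ""), (429, ""), (200, "ok")])
waits: List[float] = []
result = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, len(waits))
```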

Dependencies

Python 3.9+
requests (runtime)
pytest (tests, optional)

Installation

pip install scrapi-reddit

After installation, the scrapi-reddit console entry point is available on your PATH.

Quick Start (CLI)

scrapi-reddit python --limit 200 --fetch-comments --output-format both

This command downloads up to 200 posts from r/python, fetches comments (up to 500 per post), and writes JSON + CSV outputs under ./scrapi_reddit_data.

Common CLI Options

--fetch-comments: Enable post-level comment requests (off by default).
--comment-limit 0: Request the maximum of 500 comments per post.
--continue: Resume a previous run by reusing cached post JSON files and skipping previously downloaded media.
--media-filter video,gif: Restrict downloads to specific categories or extensions (video, image, animated, or extensions such as mp4, jpg, gif).
--search "python asyncio" --search-types post,comment --search-sort top --search-time week: Query Reddit's search.json with flexible filters (types: post/link, comment, sr, user, media).
--download-media: Save linked images, GIFs, and videos under each target's media directory.
--popular --popular-geo <region-code>: Pull popular listings with geo filters.
--user <name>: Scrape a user's overview, submitted, and comments sections.
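
As a rough illustration of what the search options translate to, the sketch below assembles a search.json URL with the standard library. build_search_url is a hypothetical helper, not part of scrapi-reddit. The query parameters (q, sort, t, type, limit, restrict_sr) follow Reddit's public search endpoint, and mapping "post" to "link" is an assumption based on the "post/link" wording above.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_search_url(query: str,
                     types: Sequence[str] = ("post",),
                     sort: str = "relevance",
                     time_window: str = "all",
                     subreddit: Optional[str] = None,
                     limit: int = 100) -> str:
    """Build a Reddit search.json URL, site-wide or scoped to one subreddit."""
    path = f"/r/{subreddit}/search.json" if subreddit else "/search.json"
    params = {"q": query, "sort": sort, "t": time_window, "limit": limit}
    # Assumption: "post" is treated as an alias for Reddit's "link" type.
    params["type"] = ",".join("link" if t == "post" else t for t in types)
    if subreddit:
        params["restrict_sr"] = 1             # keep results inside the subreddit
    return f"https://www.reddit.com{path}?{urlencode(params)}"


url = build_search_url("python asyncio", types=("post", "comment"),
                       sort="top", time_window="week")
print(url)
```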

Advanced CLI Examples

Fetch multiple subreddits with varied sorts and time windows, downloading all fetched media:

scrapi-reddit python typescript --subreddit-sorts top,hot --subreddit-top-times day,all --limit 500 --output-format both --download-media

Resume a long run after interruption:

scrapi-reddit python --fetch-comments --continue --limit 1000 --log-level INFO

Download a single post (JSON + CSV):

scrapi-reddit --post-url https://www.reddit.com/r/python/comments/xyz789/example_post/

Fetch the top search results for "python asyncio", including the comments for each fetched post, and download all media:

scrapi-reddit --search "python asyncio" --search-types post,comment --search-sort top --search-time week --limit 200 --output-format both --fetch-comments --download-media
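
When these commands need to run from a scheduler or a larger script, one option is to assemble the argument list in Python and hand it to subprocess. The sketch below uses only flags shown above; build_command and run are hypothetical helper names, and run assumes scrapi-reddit is installed and on your PATH.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_command(subreddits: Sequence[str], limit: int,
                  fetch_comments: bool = False, resume: bool = False) -> List[str]:
    """Assemble a scrapi-reddit argument list from a few common options."""
    cmd = ["scrapi-reddit", *subreddits, "--limit", str(limit)]
    if fetch_comments:
        cmd.append("--fetch-comments")
    if resume:
        cmd.append("--continue")
    return cmd


def run(cmd: Sequence[str]) -> int:
    """Run the command and raise CalledProcessError on a non-zero exit."""
    return subprocess.run(list(cmd), check=True).returncode


cmd = build_command(["python"], limit=1000, fetch_comments=True, resume=True)
print(shlex.join(cmd))   # inspect the command before calling run(cmd)
```

Passing a list instead of a shell string avoids quoting problems with search phrases that contain spaces.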

Python API

Import the library when you need finer control inside Python scripts.

Step 1 – Configure a session

from scrapi_reddit import build_session

session = build_session("your-app-name/0.1", verify=True)

Step 2 – Define scrape options

from pathlib import Path
from scrapi_reddit import ScrapeOptions

options = ScrapeOptions(
    output_root=Path("./scrapes"),
    listing_limit=250,
    comment_limit=0,      # auto-expand to 500
    delay=3.0,
    time_filter="day",
    output_formats={"json", "csv"},
    fetch_comments=True,
    resume=True,          # reuse cached JSON/media on reruns
    download_media=True,
    media_filters={"video", ".mp4"},
)

Step 3 – Scrape a listing or search

from scrapi_reddit import ListingTarget, build_search_target, process_listing

target = ListingTarget(
    label="r/python top (day)",
    output_segments=("subreddits", "python", "top_day"),
    url="https://www.reddit.com/r/python/top/.json",
    params={"t": "day"},
    context="python",
)

process_listing(target, session=session, options=options)

search_target = build_search_target(
    "python asyncio",
    search_types=["comment"],
    sort="new",
    time_filter="day",
)

process_listing(search_target, session=session, options=options)

Step 4 – Scrape a single post

from scrapi_reddit import PostTarget, process_post

post_target = PostTarget(
    label="Example post",
    output_segments=("posts", "python", "xyz789"),
    url="https://www.reddit.com/r/python/comments/xyz789/example_post/.json",
)

process_post(post_target, session=session, options=options)
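
For readers curious about what flattening a comment tree into CSV rows can involve, here is a minimal sketch. It is not the library's exporter: iter_comments and comments_to_csv are hypothetical helpers, and the column names are illustrative. It relies on the shape of a Reddit post .json response, which is a two-element list whose second element holds the comments, with nested replies stored as a listing or as an empty string.

```python
import csv
import io
from typing import Dict, Iterator, List

FIELDS = ["id", "author", "depth", "body"]


def iter_comments(children: List[Dict], depth: int = 0) -> Iterator[Dict]:
    """Walk a comment tree depth-first, yielding one flat row per comment."""
    for child in children:
        if child.get("kind") != "t1":         # skip "more" placeholders
            continue
        data = child["data"]
        yield {"id": data["id"], "author": data.get("author"),
               "depth": depth, "body": data.get("body", "")}
        replies = data.get("replies")
        if isinstance(replies, dict):         # empty replies arrive as ""
            yield from iter_comments(replies["data"]["children"], depth + 1)


def comments_to_csv(post_json: List[Dict]) -> str:
    """Flatten the comment listing of a post response into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(iter_comments(post_json[1]["data"]["children"]))
    return buffer.getvalue()


# Tiny hand-made response: one top-level comment with one reply.
sample = [
    {"kind": "Listing", "data": {"children": []}},
    {"kind": "Listing", "data": {"children": [
        {"kind": "t1", "data": {"id": "c1", "author": "alice", "body": "hi", "replies": {
            "kind": "Listing", "data": {"children": [
                {"kind": "t1", "data": {"id": "c2", "author": "bob", "body": "hello", "replies": ""}},
                {"kind": "more", "data": {"children": ["c3"]}},
            ]}}}},
    ]}},
]
rows = list(iter_comments(sample[1]["data"]["children"]))
csv_text = comments_to_csv(sample)
print(csv_text)
```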

Both process_listing and process_post write JSON/CSV to the configured output directory and emit progress via logging. When download_media=True (or --download-media on the CLI), any discoverable images, GIFs, and videos are saved under a media/ directory per target.

Media is organized by the item that produced it: media/posts/<format>/ for post attachments and, when comment scraping is enabled, media/comments/<format>/ for comment attachments. Formats include mp4, webm, gif, jpg, and png; other extensions fall back to an other/ directory. Reddit preview URLs occasionally expire, so you may see warning logs for 404 responses when older links have been removed.
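
The directory layout described above can be sketched as a small classification function. This is an illustration only: media_subdir is a hypothetical helper, and details such as lowercasing the extension or sending .jpeg to other/ are assumptions, not confirmed behavior of the package.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

KNOWN_FORMATS = {"mp4", "webm", "gif", "jpg", "png"}


def media_subdir(url: str, source: str = "posts") -> PurePosixPath:
    """Map a media URL to media/<source>/<format>/, using other/ for unknown types."""
    # Use only the URL path so query strings such as ?width=640 are ignored.
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower().lstrip(".")
    fmt = suffix if suffix in KNOWN_FORMATS else "other"
    return PurePosixPath("media") / source / fmt


print(media_subdir("https://i.redd.it/abc.JPG?width=640"))
print(media_subdir("https://v.redd.it/xyz/DASH_720.mp4", source="comments"))
print(media_subdir("https://example.com/archive.zip"))
```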

Contributing

Bug reports and pull requests are welcome. For feature requests or questions, please open an issue. When contributing, add tests that cover new behavior and ensure python -m pytest passes before submitting a PR.

License

Released under the MIT License. You may use, modify, and distribute this project with attribution and a copy of the license. Use at your own risk.

Release history

0.2.2 (Nov 6, 2025)
0.2.1 (Nov 6, 2025)
0.2.0 (Nov 6, 2025)
0.1.0 (Nov 6, 2025, this version)

Download files

Source Distribution

scrapi_reddit-0.1.0.tar.gz (29.0 kB)
Upload date: Nov 6, 2025
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0
SHA256: 37ee07899cbd361b58568fa66c6b0919227b229b3108608d668a65b3bcd6e672
MD5: 890ccf5ca8463acb3f46054692043052
BLAKE2b-256: 91626039ec4bb932c7393eda3d5b6a5af109c25aa34e02158eacf4d9feb379e7

Built Distribution

scrapi_reddit-0.1.0-py3-none-any.whl (24.1 kB)
Upload date: Nov 6, 2025
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0
SHA256: 7b8838c54173af403d88f09d52ecd5b8033dfd2574c04887a960d9280d9fc545
MD5: 9707621c493e949a67251bc3b3b1a690
BLAKE2b-256: 7a555776523d3a0def483d8598a681bcd495b9fff4015cebae9acdd3e69dde95
