
Python library for scraping Reddit data, powered by a .NET 10 backend


RedScrapsLib

A Python library for scraping Reddit data — posts, comments, and user activity — without needing the official API. The scraping logic is written in C# (.NET 10) and exposed to Python via pythonnet, with automatic rate-limit handling built in.



Requirements

Requirement    Details
Python         3.10+
.NET Runtime   .NET 10 (must be installed separately)
Platform       Windows x64, macOS 12+ (Apple Silicon and Intel), Linux x86_64 (including WSL2)

Note: pip installs the Python wrapper and the compiled .NET assembly, but it cannot install the .NET runtime itself. Install the .NET 10 runtime separately (for example from Microsoft's official .NET download page) before using the library.
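To confirm the runtime is present before importing the library, you can ask the `dotnet` host what it has installed. The sketch below is not part of RedScrapsLib: `runtime_majors` and `has_dotnet_runtime` are hypothetical helpers that parse the output of `dotnet --list-runtimes`, and they assume the base `Microsoft.NETCore.App` runtime is the one the library needs.

```python
import re
import shutil
import subprocess

# Matches lines such as:
#   Microsoft.NETCore.App 10.0.0 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
_RUNTIME_LINE = re.compile(r"^(?P<name>\S+)\s+(?P<major>\d+)\.\S*\s+\[")


def runtime_majors(listing: str, name: str = "Microsoft.NETCore.App") -> set[int]:
    """Return the major versions of `name` found in `dotnet --list-runtimes` output."""
    majors: set[int] = set()
    for line in listing.splitlines():
        match = _RUNTIME_LINE.match(line.strip())
        if match and match.group("name") == name:
            majors.add(int(match.group("major")))
    return majors


def has_dotnet_runtime(major: int = 10) -> bool:
    """True if a `dotnet` host is on PATH and lists the requested runtime major version."""
    dotnet = shutil.which("dotnet")
    if dotnet is None:
        return False
    result = subprocess.run(
        [dotnet, "--list-runtimes"], capture_output=True, text=True, check=False
    )
    return major in runtime_majors(result.stdout)
```

For example, call `has_dotnet_runtime()` before `import RedScrapsLib` and print installation guidance if it returns False.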


Installation

pip install redscrapslib

Quick Start

import RedScrapsLib as rs

# Must be called once before anything else
rs.init(user_agent="MyBot/1.0")

# Fetch posts from a subreddit
posts = rs.get_home("python", limit=10)
for post in posts.Posts:
    print(post.Title, post.Author)

# Fetch comments on a specific post
comments = rs.get_comments("python", post_id="abc123", limit=50)
for comment in comments.Comments:
    print(comment.Author, comment.Body)

# Fetch a user's post submissions
submissions = rs.get_user_posts("spez", limit=25)
for post in submissions.Posts:
    print(post.Title, post.Subreddit)

# Fetch a user's comments
user_comments = rs.get_user_comments("spez", limit=25)
for comment in user_comments.Comments:
    print(comment.Body, comment.Subreddit)

# Check session statistics
print(rs.get_stats())
# {'calls': 4, 'rate_limit_hits': 0, 'total_wait_seconds': 0.0}

Rate Limiting

Reddit's unofficial API allows roughly 100 requests per rate-limit window and answers further requests with HTTP 429 (Too Many Requests). RedScrapsLib handles this automatically, so no extra code is needed.

When a 429 response is received, the library:

  1. Reads the Retry-After header (defaults to 60s if absent)
  2. Prints a message so you know it's waiting
  3. Sleeps for the required time
  4. Retries the request transparently

The printed message looks like this:

[RedScrapsLib] Rate limited on get_home. Waiting 60s... (hit #1, 60s waited total)
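RedScrapsLib implements this loop in its C# backend. The following Python sketch mirrors the four steps above for illustration only: `Response`, `Stats`, `retry_after_seconds` and `fetch_with_retry` are hypothetical names, not part of the library, and a `Retry-After` value given as an HTTP date is treated like a missing header (a simplification).

```python
import time
from dataclasses import dataclass, field
from typing import Callable

DEFAULT_RETRY_AFTER = 60.0  # seconds, matching the documented fallback


@dataclass
class Response:
    """Hypothetical minimal response type used only in this sketch."""
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


@dataclass
class Stats:
    calls: int = 0
    rate_limit_hits: int = 0
    total_wait_seconds: float = 0.0


def retry_after_seconds(headers: dict) -> float:
    """Read Retry-After as seconds; fall back to 60s if absent or not numeric."""
    # Plain dict lookup, so the header name is matched case-sensitively here
    value = headers.get("Retry-After")
    try:
        return float(value)
    except (TypeError, ValueError):
        return DEFAULT_RETRY_AFTER


def fetch_with_retry(
    send: Callable[[], Response],
    stats: Stats,
    label: str,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns something other than HTTP 429."""
    while True:
        response = send()
        if response.status != 429:
            stats.calls += 1  # count every response that was not rate limited
            return response
        wait = retry_after_seconds(response.headers)
        stats.rate_limit_hits += 1
        stats.total_wait_seconds += wait
        print(
            f"[RedScrapsLib] Rate limited on {label}. Waiting {wait:g}s... "
            f"(hit #{stats.rate_limit_hits}, {stats.total_wait_seconds:g}s waited total)"
        )
        sleep(wait)
```

Passing `send` and `sleep` in as parameters keeps the loop testable without real network calls or real waiting.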

As a result, long-running loops pause on rate limits instead of failing:

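import RedScrapsLib as rs

rs.init(user_agent="MyBot/1.0")

my_list = ["python", "learnpython", "dotnet"]  # subreddits to scrape

def process(data):
    print(data.Subreddit, data.TotalPosts)

for subreddit in my_list:
    data = rs.get_home(subreddit)  # sleeps and retries automatically if rate limited
    process(data)

print(rs.get_stats())
# Example output from a longer run: {'calls': 250, 'rate_limit_hits': 3, 'total_wait_seconds': 780.0}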

In the author's testing, Reddit allowed roughly 100 requests before rate limiting and then imposed penalties of about 480 seconds when requests continued at full speed. For bulk scraping, adding a small delay between calls avoided the heavy penalty.
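One way to add that delay is a small wrapper around any fetch function. This is a sketch: `fetch_throttled` is a hypothetical helper, and its 1-second default is an arbitrary starting point to tune, not a value measured against Reddit's limits.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def fetch_throttled(
    subreddits: Iterable[str],
    fetch: Callable[[str], T],
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, T]:
    """Fetch each subreddit in turn, pausing `delay` seconds between calls."""
    results: dict[str, T] = {}
    for index, name in enumerate(subreddits):
        if index > 0:
            sleep(delay)  # pause only between calls, not before the first one
        results[name] = fetch(name)
    return results
```

With the library initialised, `fetch_throttled(["python", "dotnet"], rs.get_home, delay=1.0)` returns a dict mapping each subreddit name to its `HomeSent` result.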


API Reference

init(user_agent=None, debug=False)

Initialises the scraper. Must be called once before any other function.

Parameter    Type         Default   Description
user_agent   str | None   None      Custom User-Agent string sent with every request; "RedScrapsBot" is used when None
debug        bool         False     Prints step-by-step logs for each request when True

get_home(subreddit, sort="hot", limit=100, time=None, after=None) → HomeSent

Fetches posts from a subreddit.

Parameter   Type         Default      Description
subreddit   str          (required)   Subreddit name, without the r/ prefix
sort        str          "hot"        One of "hot", "new", "top", "rising"
limit       int          100          Number of posts to fetch (max 100 per request)
time        str | None   None         Time filter for sort="top": "hour", "day", "week", "month", "year", "all"
after       str | None   None         Post ID to paginate from

Returns: HomeSent

HomeSent
├── Subreddit     str
├── FirstID       str
├── LastID        str          ← use as `after` to paginate
├── TotalPosts    int
└── Posts         List[Post]
    ├── PostID    str | None
    ├── Title     str | None
    ├── Author    str | None
    ├── SelfText  str | None
    └── Link      str | None
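The returned objects are .NET instances exposed through pythonnet, and the standard `json` module cannot serialise arbitrary objects. A sketch of copying the documented `Post` fields into plain dictionaries first; `posts_to_records` and `posts_to_json` are hypothetical helpers, the field list comes from the `HomeSent` tree above, and it assumes the properties come back as Python `str`/`int` values.

```python
import json

# Field names taken from the documented Post model in HomeSent
POST_FIELDS = ("PostID", "Title", "Author", "SelfText", "Link")


def posts_to_records(posts, fields=POST_FIELDS):
    """Copy the listed attributes of each post into a plain dict (missing ones become None)."""
    return [{name: getattr(post, name, None) for name in fields} for post in posts]


def posts_to_json(posts, fields=POST_FIELDS, indent=2):
    """Serialise posts to a JSON string by way of plain dicts."""
    return json.dumps(posts_to_records(posts, fields), indent=indent)
```

For example, `print(posts_to_json(rs.get_home("python", limit=25).Posts))`. For the richer `Post` model returned by `get_user_posts`, pass a longer `fields` tuple.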

get_comments(subreddit, post_id, sort="confidence", limit=100) → CommentSent

Fetches comments for a specific post.

Parameter   Type   Default        Description
subreddit   str    (required)     Subreddit the post belongs to
post_id     str    (required)     Post ID (e.g. "abc123")
sort        str    "confidence"   One of "confidence", "top", "new", "controversial", "old"
limit       int    100            Maximum number of comments to fetch

Returns: CommentSent

CommentSent
├── PostID        str | None
├── Title         str | None
├── Author        str | None
├── Selftext      str | None
├── Subreddit     str | None
├── Num_comments  int | None
├── Permalink     str | None
└── Comments      List[Comment]
    ├── CommentID str | None
    ├── Author    str | None
    ├── ParentID  str | None
    └── Body      str | None

get_user_posts(user, sort=None, limit=None, time=None, after=None) → UserSubmittedSent

Fetches a user's post submissions.

Parameter   Type         Default      Description
user        str          (required)   Reddit username, without the u/ prefix
sort        str | None   None         One of "hot", "new", "top", "controversial"
limit       int | None   None         Number of posts to fetch
time        str | None   None         Time filter when sort="top"
after       str | None   None         Post ID to paginate from

Returns: UserSubmittedSent

UserSubmittedSent
├── Username      str
├── FirstID       str
├── LastID        str          ← use as `after` to paginate
├── TotalCount    int
└── Posts         List[Post]
    ├── PostID       str | None
    ├── Title        str | None
    ├── Author       str | None
    ├── Subreddit    str | None
    ├── SelfText     str | None
    ├── Link         str | None
    ├── Upvotes      int | None
    ├── CommentCount int | None
    └── CreatedUtc   float
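`CreatedUtc` is a plain number. Assuming it is a Unix timestamp in seconds, as Reddit's own `created_utc` field is, it converts to a timezone-aware `datetime` as below. `created_at` and `top_by_upvotes` are hypothetical helpers, not library functions.

```python
from datetime import datetime, timezone


def created_at(item) -> datetime:
    """Convert CreatedUtc (assumed: Unix seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(item.CreatedUtc, tz=timezone.utc)


def top_by_upvotes(posts, n=5):
    """Return the n highest-scoring posts; posts whose Upvotes is None sort last."""
    def score(post):
        return post.Upvotes if post.Upvotes is not None else float("-inf")

    return sorted(posts, key=score, reverse=True)[:n]
```

For example, `for p in top_by_upvotes(rs.get_user_posts("spez", limit=25).Posts): print(created_at(p).date(), p.Upvotes, p.Title)`.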

get_user_comments(user, sort=None, limit=None, time=None, after=None) → UserCommentsSent

Fetches a user's comment history.

Parameter   Type         Default      Description
user        str          (required)   Reddit username, without the u/ prefix
sort        str | None   None         One of "hot", "new", "top", "controversial"
limit       int | None   None         Number of comments to fetch
time        str | None   None         Time filter when sort="top"
after       str | None   None         Comment ID to paginate from

Returns: UserCommentsSent

UserCommentsSent
├── Username      str
├── FirstID       str
├── LastID        str          ← use as `after` to paginate
├── TotalCount    int
└── Comments      List[Comment]
    ├── CommentID  str | None
    ├── Author     str | None
    ├── Subreddit  str | None
    ├── Body       str | None
    ├── ParentID   str | None
    ├── PostID     str | None
    ├── PostTitle  str | None
    ├── Link       str | None
    ├── Upvotes    int | None
    └── CreatedUtc float
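Per-subreddit activity can be tallied with `collections.Counter`. A sketch; `comments_per_subreddit` is a hypothetical helper that works on the `Comments` list returned by `get_user_comments`.

```python
from collections import Counter


def comments_per_subreddit(comments) -> Counter:
    """Count a user's comments by subreddit, skipping entries with no Subreddit."""
    return Counter(c.Subreddit for c in comments if c.Subreddit)
```

For example, `comments_per_subreddit(rs.get_user_comments("spez", limit=100).Comments).most_common(5)` lists the five subreddits the user commented in most within that page.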

get_stats() → dict

Returns session statistics since init() was called.

{
    'calls': int,               # total successful API calls
    'rate_limit_hits': int,     # number of 429 responses received
    'total_wait_seconds': float # total time spent waiting on rate limits
}
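A sketch of turning this dictionary into a one-line log message that includes the average wait per rate-limit hit; `summarize_stats` is a hypothetical helper that relies only on the three documented keys.

```python
def summarize_stats(stats: dict) -> str:
    """Format the get_stats() dictionary as a one-line summary."""
    hits = stats["rate_limit_hits"]
    waited = stats["total_wait_seconds"]
    average = waited / hits if hits else 0.0  # avoid dividing by zero when never limited
    return (
        f"{stats['calls']} calls, {hits} rate-limit hits, "
        f"{waited:.0f}s waited ({average:.0f}s per hit)"
    )
```

For example, `print(summarize_stats(rs.get_stats()))` at the end of a scraping run.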

Pagination

Listing responses (HomeSent, UserSubmittedSent and UserCommentsSent) include FirstID and LastID. Pass LastID as the after parameter to fetch the next page:

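import RedScrapsLib as rs

rs.init(user_agent="MyBot/1.0")

after = None
all_posts = []

while True:
    page = rs.get_home("python", limit=100, after=after)
    all_posts.extend(page.Posts)

    # A short page, or no LastID to continue from, means this was the last page
    if page.TotalPosts < 100 or not page.LastID:
        break

    after = page.LastID

print(f"Fetched {len(all_posts)} posts")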
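The same cursor logic can be wrapped in a generator that works for any listing endpoint. This is a sketch: `paginate` is a hypothetical helper, it counts the items on each page instead of reading `TotalPosts`/`TotalCount` (whose names differ between responses), and `max_pages` is an added safety cap.

```python
from typing import Callable, Iterator, Optional


def paginate(
    fetch_page: Callable[[Optional[str]], object],
    items_attr: str,
    page_size: int = 100,
    max_pages: Optional[int] = None,
) -> Iterator:
    """Yield items page by page, following LastID until a short page or no new cursor."""
    after = None
    pages = 0
    while max_pages is None or pages < max_pages:
        page = fetch_page(after)
        pages += 1
        items = list(getattr(page, items_attr))
        yield from items
        # Stop on a short page, a missing LastID, or a cursor that did not advance
        if len(items) < page_size or not page.LastID or page.LastID == after:
            return
        after = page.LastID
```

For example, `paginate(lambda after: rs.get_home("python", limit=100, after=after), "Posts", max_pages=5)` yields up to 500 posts; use `"Comments"` as `items_attr` with `get_user_comments`.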

Architecture

Python (RedScrapsLib)
    │
    │  pythonnet
    ▼
C# .NET 10 Assembly (RedScrap.dll)
    ├── Scraper          — HttpClient, request logic
    ├── URLs             — URL builders for each endpoint
    ├── Receive (JSON)   — deserialisation models
    ├── Map              — raw → clean data mapping
    └── Sent             — clean data models returned to Python



Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions


redscrapslib-0.1.5-py3-none-manylinux2014_x86_64.whl (25.0 kB)

Uploaded Python 3

redscrapslib-0.1.5-py3-none-macosx_12_0_universal2.whl (25.0 kB)

Uploaded Python 3, macOS 12.0+ universal2 (ARM64, x86-64)

redscrapslib-0.1.5-cp313-cp313-win_amd64.whl (25.0 kB)

Uploaded CPython 3.13, Windows x86-64

File details

Details for the file redscrapslib-0.1.5-py3-none-manylinux2014_x86_64.whl.


File hashes

Hashes for redscrapslib-0.1.5-py3-none-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 62b312a675000e161b165d2270598f2043323c938a177deb7ada1981033597d5
MD5 85987257b3ae296a349cfff712a5a6c5
BLAKE2b-256 ec7fcb23bb0ff78d2afb884b154d05e5c438435884175e24814c99a45f1ffb90


File details

Details for the file redscrapslib-0.1.5-py3-none-macosx_12_0_universal2.whl.


File hashes

Hashes for redscrapslib-0.1.5-py3-none-macosx_12_0_universal2.whl
Algorithm Hash digest
SHA256 8248166a298ba9ae3d0907d0d6fcda9eaa4e8f692e51767c8a1aea9859f2537b
MD5 3886cd6405bfc0bc7688ae53d3f47414
BLAKE2b-256 52966600e7b0ac5804ee8318a0a483125d7c8d7c0e4d6f720efdd94743c2173e


File details

Details for the file redscrapslib-0.1.5-cp313-cp313-win_amd64.whl.


File hashes

Hashes for redscrapslib-0.1.5-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 ecd7d1d64455da0d0ee5f38cea3fe1e06c08a778d0962f5e90fadaf8d3264c55
MD5 afab70789f4166faccf30953802844e4
BLAKE2b-256 50ba0ab124bc0ae83e7cceb6ff286d1e5cc0cc79864529d86355b946c45aa27c

