Reddit Gateway API Library

GateRed

Reddit Gateway API Library, w/ pushshift history support.

Introduction

The idea is to access Reddit data through its web (gateway) API, without logging in and without the usual rate limits, with optional proxy support.

Installing

You can install this library from PyPI:

# with pip
pip install gatered

# with poetry
poetry add gatered

Using

The library provides helper functions to get started quickly:

  • get_post_comments
  • get_posts
  • get_pushshift_posts

Alternatively, you can use the Client and PushShiftAPI classes directly to implement your own logic.

Errors can be handled with either RequestError or HTTPStatusError, see httpx exceptions to learn more.

See the examples folder to learn more.

Documentation

function get_post_comments

get_post_comments(
    submission_id: str,
    all_comments: bool = False,
    httpx_options: Dict[str, Any] = {}
)

Helper function to get a submission and its comments. If all_comments is True, it also fetches all the comments that Reddit nests.

Returns post (submission) and its comments as list.

Parameters

submission_id: :class:str
The Submission id (starts with t3_).

all_comments: Optional[:class:bool]
Set this to True to also get all nested comments. Default to False.
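A minimal usage sketch for the helper above. It assumes gatered is installed and that get_post_comments is a coroutine (the library builds on httpx's async client); the result is returned unchanged because its exact shape is not spelled out here. ensure_t3_prefix and fetch_submission are hypothetical names introduced for illustration, not part of the library.

```python
def ensure_t3_prefix(submission_id: str) -> str:
    """Hypothetical helper: add the t3_ prefix if it is missing."""
    if submission_id.startswith("t3_"):
        return submission_id
    return f"t3_{submission_id}"


async def fetch_submission(submission_id: str, all_comments: bool = False):
    """Fetch a submission and its comments through the documented helper."""
    # Imported lazily so ensure_t3_prefix works without gatered installed.
    import gatered

    # Assumption: the helper is awaitable; its result is passed through as-is.
    return await gatered.get_post_comments(
        ensure_t3_prefix(submission_id),
        all_comments=all_comments,
    )
```

To try it, call asyncio.run(fetch_submission("abc123", all_comments=True)), where abc123 is a placeholder id rather than a real submission.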

function get_posts

get_posts(
    subreddit_name: str,
    sort: Optional[str] = 'hot',
    t: Optional[str] = 'day',
    page_limit: Optional[int] = 4,
    req_delay: int = 0.5,
    httpx_options: Dict[str, Any] = {}
)

Async Generator to get submissions page by page.

Returns an async generator. Use async for loop to handle page results.

Parameters

subreddit_name: :class:str
Name of the subreddit.

sort: Optional[:class:str]
Option to sort the submissions, default to hot
Available options: hot, new, top, rising

t: Optional[:class:str]
Type for sorting submissions by top, default to day
Available options: hour, day, week, month, year, all

page_limit: Optional[:class:int]
Set a request limit for pages to fetch. Disable this limit by passing None. Default to 4 (which will fetch 100 posts)

req_delay: Optional[:class:int]
Set delay between each page request. Set 0 to disable it. Default to 0.5.
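The async-for pattern described above can be sketched as follows. To keep the example runnable offline, fake_get_posts is a stand-in generator, not the library function; it assumes each yielded page is a list of posts and that a page holds 25 posts (inferred from "4 pages fetch 100 posts"). collect_pages is a hypothetical helper.

```python
import asyncio
from typing import Any, AsyncIterator, List


async def collect_pages(pages: AsyncIterator[List[Any]]) -> List[Any]:
    """Flatten an async generator of pages into a single list."""
    collected: List[Any] = []
    async for page in pages:
        collected.extend(page)
    return collected


async def fake_get_posts(page_limit: int = 4, per_page: int = 25):
    """Stand-in for gatered.get_posts: yields page_limit pages of fake posts."""
    for page_no in range(page_limit):
        await asyncio.sleep(0)  # where the real request and req_delay would go
        yield [f"post-{page_no}-{i}" for i in range(per_page)]


posts = asyncio.run(collect_pages(fake_get_posts()))
print(len(posts))
```

With the library installed, the same collect_pages helper should accept the real generator, for example collect_pages(gatered.get_posts("python", sort="new", page_limit=2)), assuming each page is a list.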

function get_pushshift_posts

get_pushshift_posts(
    subreddit_name: str,
    start_desc: datetime = None,
    end_till: datetime = None,
    req_delay: int = 0.5,
    httpx_options: Dict[str, Any] = {}
)

Async Generator to get submissions by time range.

Returns an async generator. Use async for loop to handle page results.

Parameters

subreddit_name: :class:str
Name of the subreddit.

start_desc: Optional[:class:datetime]
Provide datetime to get posts of a time range. Default to None to get from latest posts.

end_till: Optional[:class:datetime]
Provide datetime to get posts of a time range. Default to None to get all existing posts.

req_delay: Optional[:class:int]
Set delay between each page request. Set 0 to disable it. Default to 0.5.
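A sketch of building a time range for this generator. It reads the parameter descriptions as follows: start_desc is the newer bound (where the descending walk starts) and end_till is the older bound (where it stops). That reading, the use of timezone-aware datetimes, and the assumption that each page supports len() are not confirmed by the text above. last_days_window and count_recent_posts are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def last_days_window(days: int, now: datetime) -> Tuple[datetime, datetime]:
    """Hypothetical helper: return (start_desc, end_till) for the last `days` days."""
    # start_desc is the newer bound, end_till the older one.
    return now, now - timedelta(days=days)


async def count_recent_posts(subreddit_name: str, days: int = 7) -> int:
    """Count the submissions returned for the last `days` days."""
    # Imported lazily so last_days_window works without gatered installed.
    import gatered

    start_desc, end_till = last_days_window(days, datetime.now(timezone.utc))
    total = 0
    async for page in gatered.get_pushshift_posts(
        subreddit_name, start_desc=start_desc, end_till=end_till
    ):
        total += len(page)  # assumption: a page is a sized collection
    return total
```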


class Client

The Client that interacts with the Reddit gateway API and returns raw JSON.
Httpx options, such as proxies, can be passed in when creating the client: https://www.python-httpx.org/api/#asyncclient

method __init__

__init__(**options: Any)

method get_post_comments

get_post_comments(
    submission_id: str,
    sort: Optional[str] = None,
    all_comments: bool = False,
    max_at_once: int = 8,
    max_per_second: int = 4,
    **kwargs: Any
)

Get a submission and its comments. If all_comments is True, it also fetches all the comments that Reddit nests.

Returns post (submission) and its comments as list.

Parameters

submission_id: :class:str
The Submission id (starts with t3_).

sort: Optional[:class:str]
Option to sort the comments of the submission, default to None (best)
Available options: top, new, controversial, old, qa

all_comments: Optional[:class:bool]
Set this to True to also get all nested comments. Default to False.

max_at_once: Optional[:class:int]
Limits the maximum number of concurrent requests when fetching all comments. Default to 8.

max_per_second: Optional[:class:int]
Limits the number of requests spawned per second. Default to 4.
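To illustrate what a max_at_once cap means, here is a sketch using asyncio.Semaphore from the standard library. This is a stand-in technique, not the library's own mechanism (which is not described here), and it does not model max_per_second. run_limited and fake_request are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_limited(
    jobs: List[Callable[[], Awaitable[int]]], max_at_once: int
) -> List[int]:
    """Run every job, with at most max_at_once of them in flight at a time."""
    semaphore = asyncio.Semaphore(max_at_once)

    async def guarded(job: Callable[[], Awaitable[int]]) -> int:
        async with semaphore:  # waits here while max_at_once jobs are running
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


running = 0  # jobs currently in flight
peak = 0  # highest value `running` ever reached


async def fake_request() -> int:
    """Stand-in for one follow-up comment request."""
    global running, peak
    running += 1
    peak = max(peak, running)
    await asyncio.sleep(0.01)
    running -= 1
    return 1


results = asyncio.run(run_limited([fake_request] * 20, max_at_once=8))
print(len(results), peak)
```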

method get_posts

get_posts(
    subreddit_name: str,
    sort: Optional[str] = 'hot',
    t: Optional[str] = 'day',
    after: Optional[str] = None,
    dist: Optional[int] = None,
    **kwargs: Any
)

Get the submissions list from a subreddit, with ads filtered out. This gives you the flexibility to handle pagination yourself.

Returns the subreddit and its posts (submissions) as a list, as well as the token and dist needed for pagination.

Parameters

subreddit_name: :class:str
The Subreddit name.

sort: Optional[:class:str]
Option to sort the submissions, default to hot
Available options: hot, new, top, rising

t: Optional[:class:str]
Type for sorting submissions by top, default to day
Available options: hour, day, week, month, year, all

after: Optional[:class:str], dist: Optional[:class:int]
Needed for pagination.
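Handling pagination yourself typically means feeding the token and dist from one response into the next request. The loop below sketches that idea with an injected fetch_page callable; fake_fetch_page is a stand-in, and the (posts, token, dist) tuple it returns is an assumed shape, since the exact return structure of Client.get_posts is not spelled out here. paginate is a hypothetical helper.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional, Tuple

Page = Tuple[List[Any], Optional[str], Optional[int]]
FetchPage = Callable[[Optional[str], Optional[int]], Awaitable[Page]]


async def paginate(fetch_page: FetchPage, max_pages: int) -> List[Any]:
    """Follow pagination tokens until they run out or max_pages is reached."""
    posts: List[Any] = []
    after: Optional[str] = None
    dist: Optional[int] = None
    for _ in range(max_pages):
        page_posts, after, dist = await fetch_page(after, dist)
        posts.extend(page_posts)
        if not after:  # no token means there is no next page
            break
    return posts


async def fake_fetch_page(after: Optional[str], dist: Optional[int]) -> Page:
    """Stand-in for a wrapper around Client.get_posts: 3 pages of 2 posts."""
    start = int(after) if after else 0
    page = [f"post-{start + i}" for i in range(2)]
    next_start = start + 2
    token = str(next_start) if next_start < 6 else None
    return page, token, (dist or 0) + len(page)


all_posts = asyncio.run(paginate(fake_fetch_page, max_pages=10))
print(all_posts)
```

With the real client, fetch_page would wrap a call such as client.get_posts(subreddit_name, after=after, dist=dist) and unpack whatever structure it actually returns.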


class PushShiftAPI

The Client that interacts with the PushShift API and returns raw JSON. Httpx options can be passed in when creating the client.

This acts as a helper to fetch past submissions based on a time range (which Reddit does not provide). To get the comments, it's recommended to use the official Gateway API as the source.

method __init__

__init__(**options: Any)

method get_posts

get_posts(
    subreddit_name: str,
    before: int = None,
    after: int = None,
    sort: str = 'desc',
    size: int = 100,
    **kwargs: Any
)

Get submissions list from a subreddit.

Returns a list of submissions.

Parameters

subreddit_name: :class:str
The Subreddit name.

before: Optional[:class:int], after: Optional[:class:int]
Provide epoch time (without ms) to get posts from a time range. Default to None to get latest posts.

sort: Optional[:class:str]
Option to sort the submissions, default to desc
Available options: asc, desc

size: Optional[:class:int]
Size of list to fetch. Default to maximum of 100.
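Since before and after expect epoch time without milliseconds, a small conversion helper may be useful. The sketch below uses only the standard library; to_epoch_seconds is a hypothetical name, and treating naive datetimes as UTC is a choice made for this example.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment: datetime) -> int:
    """Convert a datetime to whole epoch seconds (no milliseconds)."""
    if moment.tzinfo is None:
        # Treat naive datetimes as UTC so the result does not depend on
        # the machine's local timezone.
        moment = moment.replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


after = to_epoch_seconds(datetime(2022, 1, 1))
before = to_epoch_seconds(datetime(2022, 2, 1))
print(after, before)
```

The resulting integers can then be passed as the after and before arguments of PushShiftAPI.get_posts.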


Plan

  • Reddit Gateway API (fetch posts and comments)
  • Add support to fetch past submissions using pushshift
  • Add GitHub Action CI check and publish flow
  • Publish on PyPI w/ poetry
  • Handle pagination through async generators
  • Refine documentation in README and add examples
  • Make an example sandbox in replit
  • Prepare test cases
