GateRed

Reddit Gateway API Library, w/ pushshift history support.

Introduction

The goal is to access Reddit data without logins or rate limits, using Reddit's web API together with proxy support.

Installing

Install the library from PyPI:

# with pip
pip install gatered

# with poetry
poetry add gatered

Using

The library provides helper functions to get started quickly:

  • get_post_comments
  • get_posts
  • get_pushshift_posts

Alternatively, use the Client and PushShiftAPI classes directly to implement your own logic.

Handle errors by catching either RequestError or HTTPStatusError; see the httpx exceptions documentation to learn more.

See the examples folder to learn more.

Documentation

function get_post_comments

get_post_comments(
    submission_id: str,
    all_comments: bool = False,
    httpx_options: Dict[str, Any] = {}
)

Helper function to get a submission and its comments. If all_comments is True, it also fetches all the comments that Reddit nests.

Returns the post (submission) and its comments as a list.

Parameters

submission_id: :class:str
The Submission id (starts with t3_).

all_comments: Optional[:class:bool]
Set this to True to also get all nested comments. Defaults to False.
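Since submission_id is documented as starting with t3_, a small normalizing helper can be handy when ids come from URLs or other sources. The sketch below is hypothetical and not part of gatered; whether the library also accepts bare ids is not stated here.

```python
def ensure_submission_fullname(submission_id: str) -> str:
    """Return the id with the t3_ prefix, adding it when missing.

    Hypothetical helper, not part of gatered.
    """
    submission_id = submission_id.strip()
    if submission_id.startswith("t3_"):
        return submission_id
    return f"t3_{submission_id}"
```

The result can then be passed as the submission_id argument.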

function get_posts

get_posts(
    subreddit_name: str,
    sort: Optional[str] = 'hot',
    t: Optional[str] = 'day',
    page_limit: Optional[int] = 4,
    req_delay: int = 0.5,
    httpx_options: Dict[str, Any] = {}
)

Async Generator to get submissions page by page.

Returns an async generator. Use an async for loop to handle each page of results.

Parameters

subreddit_name: :class:str
Name of the subreddit.

sort: Optional[:class:str]
Sort order for the submissions. Defaults to hot.
Available options: hot, new, top, rising

t: Optional[:class:str]
Time range used when sorting submissions by top. Defaults to day.
Available options: hour, day, week, month, year, all

page_limit: Optional[:class:int]
Maximum number of pages to fetch. Pass None to disable the limit. Defaults to 4 (which fetches 100 posts).

req_delay: Optional[:class:int]
Delay between page requests. Set to 0 to disable it. Defaults to 0.5.
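The page-by-page pattern can be tried without network access. The sketch below uses a stand-in generator, fake_get_posts, which is hypothetical and only mimics the page_limit and req_delay behaviour described above. The page size of 25 is an assumption inferred from "4 pages fetch 100 posts".

```python
import asyncio
from typing import AsyncIterator, Dict, List, Optional

PAGE_SIZE = 25  # assumption: 4 pages -> 100 posts implies 25 posts per page


async def fake_get_posts(
    total: int, page_limit: Optional[int] = 4, req_delay: float = 0.0
) -> AsyncIterator[List[Dict[str, int]]]:
    """Stand-in for a paged async generator; yields lists of fake posts."""
    fetched = 0
    page = 0
    while fetched < total and (page_limit is None or page < page_limit):
        if page and req_delay:
            await asyncio.sleep(req_delay)  # pause between page requests
        size = min(PAGE_SIZE, total - fetched)
        yield [{"id": fetched + i} for i in range(size)]
        fetched += size
        page += 1


async def collect(total: int, page_limit: Optional[int]) -> List[Dict[str, int]]:
    """Consume the generator with async for and flatten the pages."""
    posts: List[Dict[str, int]] = []
    async for page in fake_get_posts(total, page_limit=page_limit):
        posts.extend(page)
    return posts
```

With the real helper, the async for loop would have the same shape; only the generator being iterated changes.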

function get_pushshift_posts

get_pushshift_posts(
    subreddit_name: str,
    start_desc: datetime = None,
    end_till: datetime = None,
    req_delay: int = 0.5,
    httpx_options: Dict[str, Any] = {}
)

Async Generator to get submissions by time range.

Returns an async generator. Use an async for loop to handle each page of results.

Parameters

subreddit_name: :class:str
Name of the subreddit.

start_desc: Optional[:class:datetime]
Datetime at which the time range starts. Defaults to None, which starts from the latest posts.

end_till: Optional[:class:datetime]
Datetime at which the time range ends. Defaults to None, which fetches all existing posts.

req_delay: Optional[:class:int]
Delay between page requests. Set to 0 to disable it. Defaults to 0.5.
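One way to picture a time-range walk is to start at the newer bound and keep requesting older pages until the older bound is reached. The sketch below runs against an in-memory fake archive. fake_fetch and walk_back are hypothetical names, and the boundary handling (start exclusive, end inclusive) is an assumption, not a description of how gatered behaves.

```python
import asyncio
from datetime import datetime, timezone
from typing import AsyncIterator, Dict, List, Optional

# Fake archive: one post per hour on 2022-01-01 (UTC).
ARCHIVE = [
    {"id": h, "created_utc": int(datetime(2022, 1, 1, h, tzinfo=timezone.utc).timestamp())}
    for h in range(24)
]


async def fake_fetch(before: Optional[int], size: int) -> List[Dict[str, int]]:
    """Stand-in for one request: newest posts strictly older than `before`."""
    older = [p for p in ARCHIVE if before is None or p["created_utc"] < before]
    return sorted(older, key=lambda p: p["created_utc"], reverse=True)[:size]


async def walk_back(
    start_desc: Optional[datetime] = None,
    end_till: Optional[datetime] = None,
    size: int = 5,
) -> AsyncIterator[List[Dict[str, int]]]:
    """Yield pages from newest to oldest until end_till is passed."""
    before = int(start_desc.timestamp()) if start_desc else None
    stop = int(end_till.timestamp()) if end_till else None
    while True:
        page = await fake_fetch(before, size)
        if stop is not None:
            page = [p for p in page if p["created_utc"] >= stop]
        if not page:
            return
        yield page
        before = page[-1]["created_utc"]  # continue from the oldest post seen


async def collect_ids(start: datetime, end: datetime) -> List[int]:
    ids: List[int] = []
    async for page in walk_back(start, end):
        ids.extend(p["id"] for p in page)
    return ids
```

Using timezone-aware datetimes, as in the fake archive, avoids surprises from the local timezone when the values are converted to epoch seconds.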


class Client

The client that interacts with the Reddit gateway API and returns raw JSON.
httpx options, such as proxies, can be passed in when creating the client: https://www.python-httpx.org/api/#asyncclient

method __init__

__init__(**options: Any)

method get_post_comments

get_post_comments(
    submission_id: str,
    sort: Optional[str] = None,
    all_comments: bool = False,
    max_at_once: int = 8,
    max_per_second: int = 4,
    **kwargs: Any
)

Get a submission and its comments. If all_comments is True, it also fetches all the comments that Reddit nests.

Returns the post (submission) and its comments as a list.

Parameters

submission_id: :class:str
The Submission id (starts with t3_).

sort: Optional[:class:str]
Sort order for the comments of the submission. Defaults to None (best).
Available options: top, new, controversial, old, qa

all_comments: Optional[:class:bool]
Set this to True to also get all nested comments. Defaults to False.

max_at_once: Optional[:class:int]
Limits the maximum number of concurrent requests when fetching all comments. Defaults to 8.

max_per_second: Optional[:class:int]
Limits the number of requests spawned per second. Defaults to 4.
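To illustrate what the two limits mean, here is a minimal sketch using only asyncio: a semaphore caps how many jobs run at once, and a short sleep spaces out task starts. This is a swapped-in technique for illustration, and the library's own mechanism may differ. run_limited, demo and fake_request are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def run_limited(
    jobs: List[Callable[[], Awaitable[int]]],
    max_at_once: int = 8,
    max_per_second: float = 4,
) -> List[int]:
    """Run jobs with a cap on concurrency and on the spawn rate."""
    gate = asyncio.Semaphore(max_at_once)

    async def guarded(job: Callable[[], Awaitable[int]]) -> int:
        try:
            return await job()
        finally:
            gate.release()  # free the slot even if the job fails

    tasks = []
    for job in jobs:
        await gate.acquire()  # wait for a free slot before spawning
        tasks.append(asyncio.create_task(guarded(job)))
        await asyncio.sleep(1 / max_per_second)  # space out task starts
    return await asyncio.gather(*tasks)


async def demo() -> Tuple[List[int], int]:
    """Run six fake requests and record the peak concurrency."""
    running = 0
    peak = 0

    async def fake_request(n: int) -> int:
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.05)  # pretend to wait on the network
        running -= 1
        return n

    jobs = [lambda n=n: fake_request(n) for n in range(6)]
    results = await run_limited(jobs, max_at_once=2, max_per_second=200)
    return results, peak
```

Lower values are gentler on the remote server; higher values finish sooner but make throttling more likely.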

method get_posts

get_posts(
    subreddit_name: str,
    sort: Optional[str] = 'hot',
    t: Optional[str] = 'day',
    after: Optional[str] = None,
    dist: Optional[int] = None,
    **kwargs: Any
)

Get the list of submissions from a subreddit, with ads filtered out. This gives you the flexibility to handle pagination yourself.

Returns the subreddit and its posts (submissions) as a list, as well as the token and dist for pagination.

Parameters

subreddit_name: :class:str
The Subreddit name.

sort: Optional[:class:str]
Sort order for the submissions. Defaults to hot.
Available options: hot, new, top, rising

t: Optional[:class:str]
Time range used when sorting submissions by top. Defaults to day.
Available options: hour, day, week, month, year, all

after: Optional[:class:str], dist: Optional[:class:int]
Needed for pagination.
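Handling pagination yourself usually means feeding the token and dist from one response into the next request until no token comes back. The sketch below shows that loop against a stand-in, fake_get_posts. The stand-in's return shape (posts, token, dist) and the use of None to mark the last page are assumptions for illustration, not the documented shape of Client.get_posts.

```python
import asyncio
from typing import Dict, List, Optional, Tuple

POSTS = [{"id": f"t3_{n}"} for n in range(60)]  # fake subreddit listing
PAGE_SIZE = 25


async def fake_get_posts(
    after: Optional[str] = None, dist: Optional[int] = None
) -> Tuple[List[Dict[str, str]], Optional[str], int]:
    """Stand-in for one page request; returns (posts, token, dist)."""
    start = int(after) if after else 0
    page = POSTS[start:start + PAGE_SIZE]
    end = start + len(page)
    token = str(end) if end < len(POSTS) else None  # None marks the last page
    return page, token, len(page)


async def fetch_all() -> List[Dict[str, str]]:
    """Follow the pagination token until the listing is exhausted."""
    collected: List[Dict[str, str]] = []
    after: Optional[str] = None
    dist: Optional[int] = None
    while True:
        page, after, dist = await fake_get_posts(after=after, dist=dist)
        collected.extend(page)
        if after is None:
            return collected
```

A real loop would typically also add a delay between requests and a cap on the number of pages.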


class PushShiftAPI

The client that interacts with the PushShift API and returns raw JSON. httpx options can be passed in when creating the client.

This acts as a helper to fetch past submissions based on a time range (which Reddit does not provide). To get the comments, it is recommended to use the official Gateway API as the source.

method __init__

__init__(**options: Any)

method get_posts

get_posts(
    subreddit_name: str,
    before: int = None,
    after: int = None,
    sort: str = 'desc',
    size: int = 100,
    **kwargs: Any
)

Get the list of submissions from a subreddit.

Returns a list of submissions.

Parameters

subreddit_name: :class:str
The Subreddit name.

before: Optional[:class:int], after: Optional[:class:int]
Epoch time in seconds (no milliseconds) bounding the time range. Defaults to None, which gets the latest posts.

sort: Optional[:class:str]
Sort order for the submissions. Defaults to desc.
Available options: asc, desc

size: Optional[:class:int]
Number of submissions to fetch. Defaults to the maximum of 100.
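Because before and after expect epoch time without milliseconds, datetimes have to be converted first. A minimal sketch follows. to_epoch_seconds is a hypothetical helper, and treating naive datetimes as UTC is an assumption made here to keep the result independent of the local timezone.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment: datetime) -> int:
    """Convert a datetime to whole epoch seconds, treating naive values as UTC."""
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)
    return int(moment.timestamp())  # int() drops the fractional part


# One full day of posts: 2022-01-01 00:00 UTC up to 2022-01-02 00:00 UTC.
after = to_epoch_seconds(datetime(2022, 1, 1))
before = to_epoch_seconds(datetime(2022, 1, 2, tzinfo=timezone.utc))
```

The resulting integers can be passed as the after and before arguments.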


Plan

  • Reddit Gateway API (fetch posts and comments)
  • Add support to fetch past submissions using pushshift
  • Add GitHub Action CI check and publish flow
  • Publish on PyPI w/ Poetry
  • Handle pagination through async generators
  • Refine documentation in README and add examples
  • Make an example sandbox in replit
  • Prepare test cases
