GateRed

Reddit Gateway API library with Pushshift history support.

Introduction

The idea is to access Reddit data without logins or limits, using Reddit's web (gateway) API, with proxy support.

Installing

Install the library from PyPI:

# with pip
pip install gatered

# with poetry
poetry add gatered

Using

The library provides helper functions to get started quickly:

  • get_post_comments
  • get_posts
  • get_pushshift_posts

Alternatively, you can use the Client and PushShiftAPI classes directly to implement your own logic.

Errors can be handled by catching either RequestError or HTTPStatusError; see the httpx exceptions documentation to learn more.
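
As one way to act on those exceptions, here is a minimal retry sketch. It is not part of GateRed: call_with_retries is a hypothetical helper, and the exception types are passed in as a parameter so the sketch runs without httpx installed. In real use you would pass (httpx.RequestError, httpx.HTTPStatusError), keeping in mind that retrying only helps with transient failures.

```python
import asyncio


async def call_with_retries(make_call, retry_on, attempts=3, delay=1.0):
    """Await make_call() and retry when it raises one of the retry_on exceptions.

    make_call: zero-argument function returning a fresh awaitable on each call.
    retry_on:  exception class or tuple of classes, for example
               (httpx.RequestError, httpx.HTTPStatusError) in real use.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            await asyncio.sleep(delay)
```

A call could then look like call_with_retries(lambda: some_coroutine_function(args), retry_on=(httpx.RequestError,)), where some_coroutine_function stands for whichever GateRed call you are wrapping.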

See the examples folder to learn more.

Documentation

function get_post_comments

get_post_comments(
    submission_id: str,
    all_comments: bool = False,
    httpx_options: Dict[str, Any] = {}
)

Helper function to get a submission and its comments. If all_comments is True, it fetches all comments, including those nested by Reddit.

Returns the post (submission) and its comments as a list.

Parameters

submission_id: :class:str
The Submission id (starts with t3_).

all_comments: Optional[:class:bool]
Set this to True to also get all nested comments. Defaults to False.

function get_posts

get_posts(
    subreddit_name: str,
    sort: Optional[str] = 'hot',
    t: Optional[str] = 'day',
    page_limit: Optional[int] = 4,
    req_delay: int = 0.5,
    httpx_options: Dict[str, Any] = {}
)

Async Generator to get submissions page by page.

Returns an async generator. Use an async for loop to handle the page results.

Parameters

subreddit_name: :class:str
Name of the subreddit.

sort: Optional[:class:str]
Option to sort the submissions. Defaults to hot.
Available options: hot, new, top, rising

t: Optional[:class:str]
Time range used when sorting submissions by top. Defaults to day.
Available options: hour, day, week, month, year, all

page_limit: Optional[:class:int]
Maximum number of pages to request. Pass None to disable the limit. Defaults to 4 (which fetches 100 posts).

req_delay: Optional[:class:int]
Delay between page requests. Set to 0 to disable it. Defaults to 0.5.
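
Consuming the generator could look like the sketch below. collect_pages and demo_pages are hypothetical names used only for illustration, and the sketch makes no assumption about what a page contains. demo_pages stands in for the real generator so the snippet runs offline.

```python
import asyncio
from typing import Any, AsyncIterable, List


async def collect_pages(pages: AsyncIterable[Any]) -> List[Any]:
    """Drain an async generator of pages into a list, one item per page."""
    collected = []
    async for page in pages:
        collected.append(page)
    return collected


async def demo_pages():
    """Stand-in for the real generator: yields two made-up pages."""
    yield ["post-1", "post-2"]
    yield ["post-3"]


# With the library installed, the stand-in would be replaced by something like
# (assumption: get_posts is importable from the top-level gatered package):
#     pages = asyncio.run(collect_pages(gatered.get_posts("python", sort="top", t="week")))
pages = asyncio.run(collect_pages(demo_pages()))
```

Collecting everything into a list is only sensible with a page_limit set; for an unbounded run, process each page inside the async for loop instead.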

function get_pushshift_posts

get_pushshift_posts(
    subreddit_name: str,
    start_desc: datetime = None,
    end_till: datetime = None,
    req_delay: int = 0.5,
    httpx_options: Dict[str, Any] = {}
)

Async Generator to get submissions by time range.

Returns an async generator. Use an async for loop to handle the page results.

Parameters

subreddit_name: :class:str
Name of the subreddit.

start_desc: Optional[:class:datetime]
Datetime to start from when fetching posts in a time range. Defaults to None, which starts from the latest posts.

end_till: Optional[:class:datetime]
Datetime at which to stop when fetching posts in a time range. Defaults to None, which fetches all existing posts.

req_delay: Optional[:class:int]
Delay between page requests. Set to 0 to disable it. Defaults to 0.5.


class Client

The client that interacts with the Reddit gateway API and returns raw JSON.
httpx options, such as proxies, can be passed in when creating the client: https://www.python-httpx.org/api/#asyncclient

method __init__

__init__(**options: Any)

method get_post_comments

get_post_comments(
    submission_id: str,
    sort: Optional[str] = None,
    all_comments: bool = False,
    max_at_once: int = 8,
    max_per_second: int = 4,
    **kwargs: Any
)

Get a submission and its comments. If all_comments is True, it fetches all comments, including those nested by Reddit.

Returns the post (submission) and its comments as a list.

Parameters

submission_id: :class:str
The Submission id (starts with t3_).

sort: Optional[:class:str]
Option to sort the comments of the submission. Defaults to None (best).
Available options: top, new, controversial, old, qa

all_comments: Optional[:class:bool]
Set this to True to also get all nested comments. Defaults to False.

max_at_once: Optional[:class:int]
Limits the maximum number of concurrent requests when fetching all comments. Defaults to 8.

max_per_second: Optional[:class:int]
Limits the number of requests spawned per second. Defaults to 4.
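
To picture how the two limits interact, here is a plain asyncio sketch. It is not GateRed's implementation: run_limited is a hypothetical helper that caps how many jobs run at the same time with a semaphore and spaces out how often new jobs are spawned.

```python
import asyncio


async def run_limited(jobs, max_at_once=8, max_per_second=4):
    """Run zero-argument coroutine functions under two limits.

    max_at_once:    at most this many jobs execute concurrently.
    max_per_second: at most this many jobs are spawned per second.
    Results are returned in the same order as jobs.
    """
    semaphore = asyncio.Semaphore(max_at_once)
    interval = 1 / max_per_second

    async def worker(job):
        async with semaphore:  # wait here while max_at_once jobs are running
            return await job()

    tasks = []
    for index, job in enumerate(jobs):
        if index:
            await asyncio.sleep(interval)  # space out task creation
        tasks.append(asyncio.create_task(worker(job)))
    return await asyncio.gather(*tasks)
```

With the defaults, a burst of 20 jobs would take about 5 seconds to spawn, and never more than 8 of them would be in flight at once.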

method get_posts

get_posts(
    subreddit_name: str,
    sort: Optional[str] = 'hot',
    t: Optional[str] = 'day',
    after: Optional[str] = None,
    dist: Optional[int] = None,
    **kwargs: Any
)

Get the list of submissions from a subreddit, with ads filtered out. This gives you the flexibility to handle pagination yourself.

Returns the subreddit and its posts (submissions) as a list, as well as the token and dist needed for pagination.

Parameters

subreddit_name: :class:str
The Subreddit name.

sort: Optional[:class:str]
Option to sort the submissions. Defaults to hot.
Available options: hot, new, top, rising

t: Optional[:class:str]
Time range used when sorting submissions by top. Defaults to day.
Available options: hour, day, week, month, year, all

after: Optional[:class:str], dist: Optional[:class:int]
Needed for pagination.
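
A manual pagination loop built on after and dist could be sketched as follows. The exact shape of the value returned by Client.get_posts is not spelled out here, so the sketch goes through fetch_page, a hypothetical adapter you would write around the client call. It is assumed to return (posts, after_token, dist), with after_token set to None on the last page.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, List, Optional, Tuple

# Assumed adapter result: (posts, token for the next page, dist).
Page = Tuple[List[Any], Optional[str], Optional[int]]


async def paginate(
    fetch_page: Callable[[Optional[str], Optional[int]], Awaitable[Page]],
    page_limit: Optional[int] = 4,
    req_delay: float = 0.5,
) -> AsyncIterator[List[Any]]:
    """Yield one list of posts per page, feeding after/dist back into fetch_page."""
    if page_limit is not None and page_limit <= 0:
        return
    after: Optional[str] = None
    dist: Optional[int] = None
    pages = 0
    while True:
        posts, after, dist = await fetch_page(after, dist)
        yield posts
        pages += 1
        # Stop when there is no next-page token or the page limit is reached.
        if after is None or (page_limit is not None and pages >= page_limit):
            break
        if req_delay:
            await asyncio.sleep(req_delay)
```

The page_limit and req_delay parameters mirror the ones on the get_posts helper function, so the loop behaves like a hand-rolled version of that generator.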


class PushShiftAPI

The Client that interacts with the PushShift API and returns raw JSON. Httpx options can be passed in when creating the client.

This acts as a helper to fetch past submissions based on a time range (which Reddit does not provide). To get the comments, it's recommended to use the official Gateway API as the source.

method __init__

__init__(**options: Any)

method get_posts

get_posts(
    subreddit_name: str,
    before: int = None,
    after: int = None,
    sort: str = 'desc',
    size: int = 100,
    **kwargs: Any
)

Get submissions list from a subreddit.

Returns a list of submissions.

Parameters

subreddit_name: :class:str
The Subreddit name.

before: Optional[:class:int], after: Optional[:class:int]
Epoch time in seconds (not milliseconds) bounding the time range to fetch. Defaults to None, which fetches the latest posts.

sort: Optional[:class:str]
Option to sort the submissions. Defaults to desc.
Available options: asc, desc

size: Optional[:class:int]
Size of the list to fetch. Defaults to the maximum of 100.
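
Since before and after expect whole epoch seconds, a datetime has to be converted first. A minimal sketch, where to_epoch_seconds is a hypothetical helper and naive datetimes are assumed to be in UTC:

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment: datetime) -> int:
    """Convert a datetime to whole epoch seconds, dropping any sub-second part."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are treated as UTC, not local time.
        moment = moment.replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


before = to_epoch_seconds(datetime(2022, 1, 1, tzinfo=timezone.utc))
```

The resulting integer can then be passed as before or after.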


Plan

  • Reddit Gateway API (fetch posts and comments)
  • Add support to fetch past submissions using pushshift
  • Add GitHub Action CI check and publish flow
  • Publish on PyPI with poetry
  • Handle pagination through async generators
  • Refine documentation in README and add examples
  • Make an example sandbox in replit
  • Prepare test cases
