
Reddit Gateway API Library

Project description

GateRed

Reddit Gateway API Library, w/ pushshift history support.


Introduction

The idea is to access Reddit data through its web (gateway) API, without logins or rate limits, and with proxy support.

Installing

You can install this library from PyPI:

# with pip
pip install gatered

# with poetry
poetry add gatered

Using

The library provides helper functions to get started quickly:

  • get_post_comments
  • get_posts
  • get_pushshift_posts

Alternatively you can directly use the Client and PushShiftAPI classes to implement your own logic.

Check the examples folder to learn more.
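
For example, here is a minimal sketch of the helper workflow (assuming the helpers are importable from the top-level gatered package and that get_post_comments returns a (post, comments) pair, as documented below):

import asyncio

from gatered import get_post_comments  # import path assumed


async def main():
    # "t3_xxxxxx" is a placeholder submission id.
    post, comments = await get_post_comments("t3_xxxxxx", all_comments=True)
    print(len(comments))


asyncio.run(main())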

Documentation

function get_post_comments

get_post_comments(
    submission_id: str,
    all_comments: bool = False,
    httpx_options: Dict[str, Any] = {}
)

Helper function to get a submission and its comments. If all_comments is True, it will also fetch all the nested comments.

Returns the post (submission) and its comments as a list.

Parameters

submission_id: str
The submission id (starts with t3_).

all_comments: Optional[bool]
Set this to True to also fetch all nested comments. Defaults to False.

function get_posts

get_posts(
    subreddit_name: str,
    sort: Optional[str] = 'hot',
    t: Optional[str] = 'day',
    page_limit: Optional[int] = 4,
    req_delay: int = 0.5,
    httpx_options: Dict[str, Any] = {}
)

Async Generator to get submissions page by page.

Returns an async generator. Use an async for loop to handle page results.

Parameters

subreddit_name: str
Name of the subreddit.

sort: Optional[str]
Option to sort the submissions. Defaults to hot.
Available options: hot, new, top, rising.

t: Optional[str]
Time range used when sorting submissions by top. Defaults to day.
Available options: hour, day, week, month, year, all.

page_limit: Optional[int]
Limit on the number of pages to fetch. Pass None to disable the limit. Defaults to 4 (which will fetch 100 posts).

req_delay: Optional[float]
Delay between each page request. Set to 0 to disable it. Defaults to 0.5.
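
For illustration, a minimal sketch of paging with this async generator (the subreddit name is a placeholder, and the shape of each yielded page is an assumption):

import asyncio

from gatered import get_posts  # import path assumed


async def main():
    # Each iteration yields one page of submissions, up to page_limit pages.
    async for page in get_posts("python", sort="top", t="week", page_limit=2):
        print(len(page))  # assumes a page is a list of post dicts


asyncio.run(main())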

function get_pushshift_posts

get_pushshift_posts(
    subreddit_name: str,
    start_desc: datetime = None,
    end_till: datetime = None,
    req_delay: int = 0.5,
    httpx_options: Dict[str, Any] = {}
)

Async Generator to get submissions by time range.

Returns an async generator. Use an async for loop to handle page results.

Parameters

subreddit_name: str
Name of the subreddit.

start_desc: Optional[datetime]
Provide a datetime to start fetching from (the newer bound of the time range). Defaults to None to start from the latest posts.

end_till: Optional[datetime]
Provide a datetime to fetch back until (the older bound of the time range). Defaults to None to get all existing posts.

req_delay: Optional[float]
Delay between each page request. Set to 0 to disable it. Defaults to 0.5.
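
A minimal sketch of a time-ranged pushshift fetch (interpreting start_desc as the newer bound and end_till as the older bound, as described above; the import path is an assumption):

import asyncio
from datetime import datetime, timedelta

from gatered import get_pushshift_posts  # import path assumed


async def main():
    now = datetime.utcnow()
    week_ago = now - timedelta(days=7)
    # Walk back through the last week of submissions, page by page.
    async for page in get_pushshift_posts("python", start_desc=now, end_till=week_ago):
        print(len(page))  # assumes a page is a list of post dicts


asyncio.run(main())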


class Client

The Client that interacts with the Reddit gateway API and returns raw JSON.
httpx options, such as proxies, can be passed in when creating the client: https://www.python-httpx.org/api/#asyncclient

method __init__

__init__(**options: Any)
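
For example, a sketch of passing httpx options through the constructor (the proxy URL is a placeholder, and forwarding of keyword arguments to httpx.AsyncClient is assumed from the description above):

from gatered import Client  # import path assumed

# Any httpx AsyncClient option can be forwarded, e.g. a proxy:
client = Client(proxies="http://localhost:8030")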

method get_post_comments

get_post_comments(
    submission_id: str,
    sort: Optional[str] = None,
    all_comments: bool = False,
    max_at_once: int = 8,
    max_per_second: int = 4,
    **kwargs: Any
)

Get a submission and its comments. If all_comments is True, it will also fetch all the nested comments.

Returns the post (submission) and its comments as a list.

Parameters

submission_id: str
The submission id (starts with t3_).

sort: Optional[str]
Option to sort the comments of the submission. Defaults to None (best).
Available options: top, new, controversial, old, qa.

all_comments: Optional[bool]
Set this to True to also fetch all nested comments. Defaults to False.

max_at_once: Optional[int]
Limits the maximum number of concurrent requests when fetching all comments. Defaults to 8.

max_per_second: Optional[int]
Limits the number of requests spawned per second. Defaults to 4.

method get_posts

get_posts(
    subreddit_name: str,
    sort: Optional[str] = 'hot',
    t: Optional[str] = 'day',
    after: Optional[str] = None,
    dist: Optional[int] = None,
    **kwargs: Any
)

Get a list of submissions from a subreddit, with ads filtered out. This gives you the flexibility to handle pagination yourself.

Returns the subreddit and its posts (submissions) as a list, as well as the token and dist needed for pagination.

Parameters

subreddit_name: str
The subreddit name.

sort: Optional[str]
Option to sort the submissions. Defaults to hot.
Available options: hot, new, top, rising.

t: Optional[str]
Time range used when sorting submissions by top. Defaults to day.
Available options: hour, day, week, month, year, all.

after: Optional[str], dist: Optional[int]
Needed for pagination.
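
As an illustration, a hedged manual-pagination sketch (the dict keys "posts", "token" and "dist", and using the Client without a context manager, are assumptions based on the description above):

import asyncio

from gatered import Client  # import path assumed


async def main():
    client = Client()
    after, dist = None, None
    for _ in range(3):  # fetch up to three pages
        res = await client.get_posts("python", sort="new", after=after, dist=dist)
        print(len(res.get("posts", [])))                  # key name assumed
        after, dist = res.get("token"), res.get("dist")   # key names assumed
        if after is None:
            break


asyncio.run(main())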


class PushShiftAPI

The Client that interacts with the PushShift API and returns raw JSON. Httpx options can be passed in when creating the client.

This acts as a helper to fetch past submissions based on a time range (which is not provided by Reddit). To get the comments, it's recommended to use the official gateway API as the source.

method __init__

__init__(**options: Any)

method get_posts

get_posts(
    subreddit_name: str,
    before: int = None,
    after: int = None,
    sort: str = 'desc',
    size: int = 100,
    **kwargs: Any
)

Get a list of submissions from a subreddit.

Returns a list of submissions.

Parameters

subreddit_name: str
The subreddit name.

before: Optional[int], after: Optional[int]
Provide epoch times (in seconds, without milliseconds) to get posts from a time range. Defaults to None to get the latest posts.

sort: Optional[str]
Option to sort the submissions. Defaults to desc.
Available options: asc, desc.

size: Optional[int]
Number of submissions to fetch. Defaults to the maximum of 100.
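
For example, a minimal sketch of a ranged pushshift query (epoch seconds, no milliseconds; the import path and awaiting the method directly are assumptions):

import asyncio
from datetime import datetime, timedelta

from gatered import PushShiftAPI  # import path assumed


async def main():
    api = PushShiftAPI()
    now = int(datetime.utcnow().timestamp())
    week_ago = int((datetime.utcnow() - timedelta(days=7)).timestamp())
    posts = await api.get_posts("python", before=now, after=week_ago, size=100)
    print(len(posts))


asyncio.run(main())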


Plan

  • Reddit Gateway API (fetch posts and comments)
  • Add support to fetch past submissions using pushshift
  • Add GitHub Action CI check and publish flow
  • Publish on PyPI w/ poetry
  • Handle pagination through async generators
  • Refine documentation in README and add examples
  • Make an example sandbox in replit
  • Prepare test cases

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gatered-0.2.0.tar.gz (13.1 kB view details)

Uploaded Source

Built Distribution

gatered-0.2.0-py3-none-any.whl (14.4 kB view details)

Uploaded Python 3

File details

Details for the file gatered-0.2.0.tar.gz.

File metadata

  • Download URL: gatered-0.2.0.tar.gz
  • Upload date:
  • Size: 13.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.2 Linux/5.11.0-1028-azure

File hashes

Hashes for gatered-0.2.0.tar.gz
Algorithm Hash digest
SHA256 c1ac9e36943852420fb3ea780ba57cb9e7123f37d5855e1447490a395d5710ea
MD5 6a17766da8c290023d6bd201f1cda338
BLAKE2b-256 9046ba84d573707f5dfa2494d021b34ee0b42bd8d54b169052788c0ea3386584

See more details on using hashes here.

File details

Details for the file gatered-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: gatered-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 14.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.2 Linux/5.11.0-1028-azure

File hashes

Hashes for gatered-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c091f288b95b7c7f988a07f69cceb3167c0df311274c69dd1e018d637411a254
MD5 7a8e6189735d03d44e0323fd939ac7a6
BLAKE2b-256 0850b09f4f8a3be035f003163f29037bc9098fa7cd42761470616ef3c904de63

See more details on using hashes here.
