Reddit Gateway API Library
GateRed
Reddit Gateway API Library, w/ pushshift history support.
Introduction
The idea is to access Reddit data without logins or rate limits, through Reddit's web API and with proxy support.
Installing
You can install this library from PyPI:
# with pip
pip install gatered
# with poetry
poetry add gatered
Using
The library provides helper functions to get started quickly:
get_post_comments
get_posts
get_pushshift_posts
Alternatively, you can use the Client and PushShiftAPI classes directly to implement your own logic.
Errors can be handled with either RequestError or HTTPStatusError; see the httpx exceptions documentation to learn more.
See the examples folder to learn more.
Documentation
function get_post_comments
get_post_comments(
submission_id: str,
all_comments: bool = False,
httpx_options: Dict[str, Any] = {}
)
Helper function to get a submission and its comments. If all_comments is True, it will fetch all the comments that are nested by Reddit.
Returns the post (submission) and its comments as a list.
Parameters
submission_id: str
The submission ID (starts with t3_).
all_comments: Optional[bool]
Set this to True to also get all nested comments. Defaults to False.
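IDs copied from a post URL do not carry the t3_ prefix, so it can help to normalize them before calling the helper. The following is a minimal sketch; ensure_t3_prefix is a hypothetical name used for illustration and is not part of gatered.

```python
def ensure_t3_prefix(submission_id: str) -> str:
    """Return the ID with a leading 't3_', adding it only when missing.

    Hypothetical helper for illustration; not part of gatered.
    """
    submission_id = submission_id.strip()
    if not submission_id:
        raise ValueError("submission_id must not be empty")
    if submission_id.startswith("t3_"):
        return submission_id
    return f"t3_{submission_id}"
```

The result can then be passed as the submission_id argument of get_post_comments.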
function get_posts
get_posts(
subreddit_name: str,
sort: Optional[str] = 'hot',
t: Optional[str] = 'day',
page_limit: Optional[int] = 4,
req_delay: int = 0.5,
httpx_options: Dict[str, Any] = {}
)
Async generator to get submissions page by page.
Returns an async generator; use an async for loop to handle each page of results.
Parameters
subreddit_name: str
Name of the subreddit.
sort: Optional[str]
Option to sort the submissions. Defaults to hot. Available options: hot, new, top, rising.
t: Optional[str]
Time range used when sorting submissions by top. Defaults to day. Available options: hour, day, week, month, year, all.
page_limit: Optional[int]
Limit on the number of pages to fetch. Pass None to disable the limit. Defaults to 4 (which will fetch 100 posts).
req_delay: Optional[int]
Delay between page requests. Set to 0 to disable it. Defaults to 0.5.
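To illustrate the async for pattern without touching the network, here is a sketch that uses a stand-in generator. fake_get_posts, collect_posts, PAGE_SIZE and total_pages are hypothetical names; the assumption that each page is a list of 25 posts is inferred from "4 pages fetch 100 posts" and may not match what gatered actually yields.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Optional

PAGE_SIZE = 25  # assumption: 4 pages -> 100 posts implies 25 posts per page


async def fake_get_posts(
    subreddit_name: str,
    page_limit: Optional[int] = 4,
    req_delay: float = 0.0,
    total_pages: int = 6,
) -> AsyncIterator[List[Dict[str, Any]]]:
    """Stand-in for get_posts: yields one list of fake posts per page."""
    page = 0
    while page < total_pages and (page_limit is None or page < page_limit):
        if page and req_delay:
            await asyncio.sleep(req_delay)  # pause between page requests
        start = page * PAGE_SIZE
        yield [
            {"id": f"t3_{start + i}", "subreddit": subreddit_name}
            for i in range(PAGE_SIZE)
        ]
        page += 1


async def collect_posts(
    pages: AsyncIterator[List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Flatten an async generator of pages into a single list."""
    posts: List[Dict[str, Any]] = []
    async for page in pages:
        posts.extend(page)
    return posts
```

With the real library, the same async for loop would presumably consume get_posts(...) in place of the stand-in.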
function get_pushshift_posts
get_pushshift_posts(
subreddit_name: str,
start_desc: datetime = None,
end_till: datetime = None,
req_delay: int = 0.5,
httpx_options: Dict[str, Any] = {}
)
Async generator to get submissions by time range.
Returns an async generator; use an async for loop to handle each page of results.
Parameters
subreddit_name: str
Name of the subreddit.
start_desc: Optional[datetime]
Provide a datetime to bound the time range. Defaults to None to start from the latest posts.
end_till: Optional[datetime]
Provide a datetime to bound the time range. Defaults to None to get all existing posts.
req_delay: Optional[int]
Delay between page requests. Set to 0 to disable it. Defaults to 0.5.
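As a sketch of how the two bounds might be built, the helper below produces keyword arguments covering the last N days. It assumes, based on the defaults described above, that start_desc is the newer bound and end_till the older one. last_days_range is a hypothetical name, and whether the library expects timezone-aware or naive datetimes is not stated, so the use of UTC here is also an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def last_days_range(days: int, now: Optional[datetime] = None) -> Dict[str, datetime]:
    """Build start_desc/end_till keyword arguments covering the last `days` days.

    Hypothetical helper; assumes start_desc is the newer bound.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    newest = now or datetime.now(timezone.utc)
    return {"start_desc": newest, "end_till": newest - timedelta(days=days)}
```

The resulting dictionary could be unpacked into the call, for example get_pushshift_posts("python", **last_days_range(7)).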
class Client
The client that interacts with the Reddit gateway API and returns raw JSON.
httpx options, such as proxies, can be passed in when creating the client: https://www.python-httpx.org/api/#asyncclient
method __init__
__init__(**options: Any)
method get_post_comments
get_post_comments(
submission_id: str,
sort: Optional[str] = None,
all_comments: bool = False,
max_at_once: int = 8,
max_per_second: int = 4,
**kwargs: Any
)
Get a submission and its comments. If all_comments is True, it will fetch all the comments that are nested by Reddit.
Returns the post (submission) and its comments as a list.
Parameters
submission_id: str
The submission ID (starts with t3_).
sort: Optional[str]
Option to sort the comments of the submission. Defaults to None (best). Available options: top, new, controversial, old, qa.
all_comments: Optional[bool]
Set this to True to also get all nested comments. Defaults to False.
max_at_once: Optional[int]
Limits the maximum number of concurrent requests when fetching all comments. Defaults to 8.
max_per_second: Optional[int]
Limits the number of requests spawned per second. Defaults to 4.
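To make the meaning of max_at_once concrete, the sketch below caps the number of in-flight coroutines with asyncio.Semaphore. This is not the library's implementation, only an illustration of the idea; run_limited and demo are hypothetical names, and the per-second rate limit (max_per_second) is not modelled here.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple, TypeVar

T = TypeVar("T")


async def run_limited(
    jobs: List[Callable[[], Awaitable[T]]], max_at_once: int = 8
) -> List[T]:
    """Run jobs concurrently with at most max_at_once in flight; keeps input order."""
    semaphore = asyncio.Semaphore(max_at_once)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # blocks while max_at_once jobs are already running
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo(job_count: int = 10, max_at_once: int = 3) -> Tuple[List[int], int]:
    """Run fake requests; return their results and the peak concurrency observed."""
    active = 0
    peak = 0

    def make_job(index: int) -> Callable[[], Awaitable[int]]:
        async def job() -> int:
            nonlocal active, peak
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stands in for one HTTP request
            active -= 1
            return index

        return job

    results = await run_limited([make_job(i) for i in range(job_count)], max_at_once)
    return results, peak
```

Running asyncio.run(demo()) returns the results in input order together with a peak concurrency that never exceeds the limit.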
method get_posts
get_posts(
subreddit_name: str,
sort: Optional[str] = 'hot',
t: Optional[str] = 'day',
after: Optional[str] = None,
dist: Optional[int] = None,
**kwargs: Any
)
Get the list of submissions from a subreddit, with ads filtered out. This gives you the flexibility to handle pagination yourself.
Returns the subreddit and its posts (submissions) as a list, as well as the token and dist needed for pagination.
Parameters
subreddit_name: str
The subreddit name.
sort: Optional[str]
Option to sort the submissions. Defaults to hot. Available options: hot, new, top, rising.
t: Optional[str]
Time range used when sorting submissions by top. Defaults to day. Available options: hour, day, week, month, year, all.
after: Optional[str], dist: Optional[int]
Needed for pagination.
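A manual pagination loop might look like the sketch below. The exact structure of the value returned by Client.get_posts is not specified here, so the loop works against a hypothetical fetch_page callable that returns (posts, token, dist); in real use this would be a thin wrapper that calls Client.get_posts with after and dist and unpacks its result. fetch_all and fake_fetch_page are illustrative names, and treating dist as a running count is an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple

# Hypothetical page shape: (posts, next_token, dist). The real return value of
# Client.get_posts may be structured differently.
Page = Tuple[List[Dict[str, Any]], Optional[str], Optional[int]]
FetchPage = Callable[[Optional[str], Optional[int]], Awaitable[Page]]


async def fetch_all(fetch_page: FetchPage, max_pages: int = 10) -> List[Dict[str, Any]]:
    """Follow pagination tokens until they run out or max_pages is reached."""
    posts: List[Dict[str, Any]] = []
    after: Optional[str] = None
    dist: Optional[int] = None
    for _ in range(max_pages):
        page_posts, after, dist = await fetch_page(after, dist)
        posts.extend(page_posts)
        if after is None:  # no token means there is no further page
            break
    return posts


async def fake_fetch_page(after: Optional[str], dist: Optional[int]) -> Page:
    """Stand-in for a wrapper around Client.get_posts; serves three fixed pages."""
    pages = {
        None: (["a", "b"], "tok1"),
        "tok1": (["c", "d"], "tok2"),
        "tok2": (["e"], None),
    }
    ids, next_token = pages[after]
    seen = (dist or 0) + len(ids)  # assumption: dist is a running post count
    return [{"id": post_id} for post_id in ids], next_token, seen
```

The max_pages cap plays the same role as page_limit in the get_posts helper function.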
class PushShiftAPI
The client that interacts with the PushShift API and returns raw JSON. httpx options can be passed in when creating the client.
This acts as a helper to fetch past submissions by time range (which Reddit does not provide). To get the comments, it's recommended to use the official Gateway API as the source.
method __init__
__init__(**options: Any)
method get_posts
get_posts(
subreddit_name: str,
before: int = None,
after: int = None,
sort: str = 'desc',
size: int = 100,
**kwargs: Any
)
Get submissions list from a subreddit.
Returns a list of submissions.
Parameters
subreddit_name: str
The subreddit name.
before: Optional[int], after: Optional[int]
Provide epoch times (in seconds, without ms) to get posts from a time range. Defaults to None to get the latest posts.
sort: Optional[str]
Option to sort the submissions. Defaults to desc. Available options: asc, desc.
size: Optional[int]
Size of the list to fetch. Defaults to the maximum of 100.
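Since before and after take epoch seconds rather than datetime objects, a small conversion step is usually needed. The sketch below uses only the standard library; to_epoch_seconds is a hypothetical name, and treating naive datetimes as UTC is an assumption made for this example.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment: datetime) -> int:
    """Convert a datetime to whole epoch seconds (no milliseconds)."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are meant as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return int(moment.timestamp())
```

The returned integers can be passed as the before and after arguments of PushShiftAPI.get_posts.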
Plan
- Reddit Gateway API (fetch posts and comments)
- Add support to fetch past submissions using pushshift
- Add GitHub Action CI check and publish flow
- Publish on PyPI w/ Poetry
- Handle pagination through async generators
- Refine documentation in README and add examples
- Make an example sandbox in replit
- Prepare test cases