Convert Reddit posts to text
Reddit2Text
reddit2text is a Python library that transforms any Reddit thread into clean, readable text.
Perfect for feeding to an LLM, performing textual/data analysis, or simply archiving for offline use, reddit2text offers a straightforward interface to access and convert content from Reddit.
Features
- Convert any Reddit thread (the post + all its comments) into structured text.
- Include all comments, with the ability to specify the maximum comment depth.
- Configure a custom comment delimiter, for visual separation of nested comments.
Have a Feature Idea?
Open an issue on GitHub and tell us what should be added to the next release!
Installation
Install with pip:
pip3 install reddit2text
Quickstart
First, you need to create a Reddit app to get your client_id and client_secret. Follow the instructions on Reddit's API documentation to set up your application.
Then, replace the client_id, client_secret, and user_agent values in the example with your own.
The user agent can be anything you like, but we recommend following this convention according to Reddit's guidelines: '<app type>:<app name>:<version> (by <your username>)'
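The convention above can be captured in a small helper. This is only a sketch: `build_user_agent` is a hypothetical function written for illustration and is not part of reddit2text.

```python
def build_user_agent(app_type: str, app_name: str, version: str, username: str) -> str:
    """Format '<app type>:<app name>:<version> (by <your username>)'.

    Hypothetical helper for illustration; not provided by reddit2text.
    """
    parts = (app_type, app_name, version, username)
    # Reject empty or whitespace-only fields so the string stays well-formed
    if any(not part.strip() for part in parts):
        raise ValueError("all four user agent fields must be non-empty")
    return f"{app_type}:{app_name}:{version} (by {username})"


user_agent = build_user_agent("script", "my_app", "v1.0", "u/reddit2text")
print(user_agent)  # script:my_app:v1.0 (by u/reddit2text)
```

The resulting string can be passed as the user_agent argument.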
Here's an example:
from reddit2text import Reddit2Text

r2t = Reddit2Text(
    # example credentials
    client_id='123abc',
    client_secret='123abc',
    user_agent='script:my_app:v1.0 (by u/reddit2text)'
)

# The URL must have the post ID after the /comments/ to work, e.g. `1buyr0g`
URL = 'https://www.reddit.com/r/MadeMeSmile/comments/1buyr0g/ryan_reynolds_being_wholesome/'

output = r2t.textualize_post(URL)
print(output)
Here is an example (truncated) output from the above code! https://pastebin.com/mmHFJtcc
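Because the URL needs a post ID after /comments/, it can help to check a URL before passing it in. The sketch below is one way to do that with the standard library. `extract_post_id` and the regular expression are assumptions made for illustration; they are not how reddit2text itself parses URLs.

```python
import re
from typing import Optional

# Assumed pattern: the post ID is the alphanumeric path segment right after /comments/
_POST_ID_PATTERN = re.compile(r"/comments/([A-Za-z0-9]+)")


def extract_post_id(url: str) -> Optional[str]:
    """Return the post ID found after /comments/, or None if there is none."""
    match = _POST_ID_PATTERN.search(url)
    return match.group(1) if match else None


url = "https://www.reddit.com/r/MadeMeSmile/comments/1buyr0g/ryan_reynolds_being_wholesome/"
print(extract_post_id(url))  # 1buyr0g
print(extract_post_id("https://www.reddit.com/r/MadeMeSmile/"))  # None
```

A `None` result indicates that the URL points at something other than a single thread, such as a subreddit front page.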
Extra Configuration
- max_comment_depth: Maximum depth of comments to output. Includes the top-most comment. Defaults to `None` or `-1` to include all.
- comment_delim: String/character used to indent comments according to their nesting level. Defaults to `|` to mimic reddit.
r2t = Reddit2Text(
    # credentials ...
    max_comment_depth=3,  # limit every comment chain to 3 levels, counting the top-most comment
    comment_delim='#'  # each comment level will be preceded by multiples of this string
)
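To make the interaction between the two options concrete, here is a minimal, self-contained sketch of depth-limited rendering with a repeated delimiter. It is not reddit2text's implementation: `render_comments`, the dict-based comment structure, and the exact line format are all assumptions chosen for illustration.

```python
from typing import List, Optional


def render_comments(comments: list, comment_delim: str = "|",
                    max_comment_depth: Optional[int] = None,
                    depth: int = 1) -> List[str]:
    """Render nested comments as lines prefixed by `depth` copies of the delimiter.

    Hypothetical sketch: each comment is a dict with 'author', 'body', and an
    optional 'replies' list. Depth counts the top-most comment as level 1.
    """
    # None or -1 means no depth limit, mirroring the documented defaults
    unlimited = max_comment_depth is None or max_comment_depth == -1
    if not unlimited and depth > max_comment_depth:
        return []
    lines = []
    for comment in comments:
        prefix = comment_delim * depth
        lines.append(f"{prefix} {comment['author']}: {comment['body']}")
        # Recurse into replies one level deeper
        lines.extend(render_comments(comment.get("replies", []), comment_delim,
                                     max_comment_depth, depth + 1))
    return lines


thread = [
    {"author": "alice", "body": "Top-level comment", "replies": [
        {"author": "bob", "body": "First reply", "replies": [
            {"author": "carol", "body": "Nested reply", "replies": []},
        ]},
    ]},
    {"author": "dave", "body": "Another top-level comment"},
]

for line in render_comments(thread, comment_delim="#", max_comment_depth=2):
    print(line)
# # alice: Top-level comment
# ## bob: First reply
# # dave: Another top-level comment
```

With a depth limit of 2, the third-level reply from carol is dropped while both top-level comments are kept.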
Contributions
Contributions to reddit2text are welcome. Please submit pull requests or issues to our GitHub repository.
License
reddit2text is released under the MIT License. See the LICENSE file for more details.
File details
Details for the file reddit2text-0.0.9.tar.gz.
File metadata
- Download URL: reddit2text-0.0.9.tar.gz
- Upload date:
- Size: 9.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b2defa149e841a9a5142bc82b121d276b056eaa3f87e1e528a3337c9ded6b349 |
| MD5 | ebfb342f5963fd1a2c14d2c80713e19a |
| BLAKE2b-256 | 35a3fa7ade4567b39b6f507f9575e5acef6bde9a91852b3452717aeaeade2640 |
File details
Details for the file reddit2text-0.0.9-py3-none-any.whl.
File metadata
- Download URL: reddit2text-0.0.9-py3-none-any.whl
- Upload date:
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c950f7872a589f5382223b81813f361e7945a54d6c7d1d9cbf95491f67f4cb4a |
| MD5 | 90bf93cee60fda12a0939fab3e94631d |
| BLAKE2b-256 | e4f5e9eb5d5b1f851aba31d91e0d07dd7e1e44001db69733d82fe419f2398d48 |
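The published digests can be used to check a downloaded file. The following is a minimal sketch using only the standard library; the helper names and the local file path are assumptions, and the expected value is the SHA256 digest listed above for the wheel.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until an empty bytes object signals end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for reddit2text-0.0.9-py3-none-any.whl
EXPECTED = "c950f7872a589f5382223b81813f361e7945a54d6c7d1d9cbf95491f67f4cb4a"
wheel = Path("reddit2text-0.0.9-py3-none-any.whl")  # assumed local download location

if wheel.exists():
    print("digest matches:", matches_published_digest(wheel, EXPECTED))
else:
    print(f"{wheel} not found; download it first")
```

pip can also enforce hashes at install time through its hash-checking mode, which avoids a manual comparison.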