Create a SQLite database containing data pulled from Reddit about a single user.
reddit-user-to-sqlite
Stores all of a specific user's content (their comments and their posts) in a SQLite database.
Install
The PyPI package is reddit-user-to-sqlite
(PyPI Link). Install it globally using pipx:
pipx install reddit-user-to-sqlite
Usage
The CLI currently exposes a single command: user. An archive command is planned.
user
Fetches all comments and posts for a specific user.
reddit-user-to-sqlite user your_username
reddit-user-to-sqlite user your_username --db my-reddit-data.db
Params
Note: the argument order is reversed from most Dogsheep packages (which take db_path first). Putting the username first allows the database path to fall back to a default name, which I prefer.
username: a case-insensitive string. A leading /u/ is optional (and ignored if supplied).
--db (optional): the path to a SQLite file, which will be created or updated as needed. Defaults to reddit.db.
A Note on Stored Data
While most Dogsheep projects store the raw JSON output of their source APIs, Reddit's API returns a lot of junk alongside the useful fields, so this tool stores a slimmed-down subset of each record instead.
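To get a feel for what ends up in the file, you can list the tables in a generated database. The table names (comments, posts, subreddits) match the metadata.json example below; the column definitions here are a simplified stand-in so the snippet is self-contained, not the package's exact schema.

```python
import sqlite3

# Stand-in for a database produced by `reddit-user-to-sqlite user ...`.
# The CREATE TABLE statements are illustrative, not the real schema;
# only the table names are taken from this README's metadata.json example.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE subreddits (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE comments (id TEXT PRIMARY KEY, timestamp INTEGER, text TEXT);
    CREATE TABLE posts (id TEXT PRIMARY KEY, timestamp INTEGER, text TEXT);
    """
)

# List every user table, the same query you could run against a real reddit.db.
tables = [
    row[0]
    for row in db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)  # ['comments', 'posts', 'subreddits']
```

Against a real reddit.db, point `sqlite3.connect` at the file instead of `:memory:` and skip the CREATE statements.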
Viewing Data
The resulting SQLite database pairs well with Datasette, a tool for exploring SQLite databases in the browser. Below is my recommended configuration.
First, install datasette
:
pipx install datasette
Then, add the recommended plugins (for rendering timestamps and markdown):
pipx inject datasette datasette-render-markdown datasette-render-timestamps
Finally, create a metadata.json
file with the following:
{
"databases": {
"reddit": {
"tables": {
"comments": {
"sort_desc": "timestamp",
"plugins": {
"datasette-render-markdown": {
"columns": ["text"]
},
"datasette-render-timestamps": {
"columns": ["timestamp"]
}
}
},
"posts": {
"sort_desc": "timestamp",
"plugins": {
"datasette-render-markdown": {
"columns": ["text"]
},
"datasette-render-timestamps": {
"columns": ["timestamp"]
}
}
},
"subreddits": {
"sort": "name"
}
}
}
}
}
Now, when you run:
datasette reddit.db --metadata metadata.json
you'll get nicely formatted output.
Development
This section is for people making changes to this package.
When in a virtual environment, run the following:
pip install -e '.[test]'
This installs the package in editable (-e) mode and makes its test dependencies available.
Running Tests
In your virtual environment, a simple pytest
should run the unit test suite.
Motivation
I got nervous when I saw Reddit's notification of upcoming API changes, so I wanted a backup of the data I'd created in place before anything changed in a big way.
FAQs
Why do some of my posts say [removed] even though I can see them?
If a post is removed, only the moderators and the original poster can see its text. Since this tool currently runs without any authentication, removed posts can't be fetched via the API.
This will be fixed in a future release, either by:
- (planned) being able to pull data from a GDPR archive
- (maybe) adding support for authentication, so you can see your own posts
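You can check how many of your archived posts were affected with a simple query. This sketch assumes a posts table with a text column (the same column metadata.json renders as markdown); a small in-memory stand-in keeps it runnable.

```python
import sqlite3

# In-memory stand-in for reddit.db; the schema is illustrative only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id TEXT PRIMARY KEY, text TEXT)")
db.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [("a1", "hello world"), ("b2", "[removed]"), ("c3", "[removed]")],
)

# Count posts whose body came back as "[removed]" from the unauthenticated API.
(removed,) = db.execute(
    "SELECT count(*) FROM posts WHERE text = '[removed]'"
).fetchone()
print(removed)  # 2
```

Run the same SELECT against your real reddit.db (e.g. in Datasette's SQL view) to see how many posts you'd recover once authenticated fetching or GDPR-archive import lands.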
Hashes for reddit-user-to-sqlite-0.2.0.tar.gz (source distribution)

Algorithm | Hash digest
---|---
SHA256 | 6db188f62ca60c0a4ecd9ff35e1dce00d2a0ccff15713dad7ff5fe50dce1d786
MD5 | 55aec865e9b77301a45a4995d1af4836
BLAKE2b-256 | 124f05ee9ec14ee939273371ae080822841295b4d91ad31b0bc087af2c5d2244

Hashes for reddit_user_to_sqlite-0.2.0-py3-none-any.whl (built distribution)

Algorithm | Hash digest
---|---
SHA256 | 27ac3dc2ba81c2bf5ca9d02420e54a96e8c67974ffcfbb4d20d4dfd4c5a26247
MD5 | e44d4655e75f2e4787da18ce6a59f93d
BLAKE2b-256 | 8bd54d5ae77d62472d2b4b5d4867b1b77b0274af600866770ad458049eae3df5