A client used for interacting with the Sunbelt API.

Sunbelt

SAWP: Sunbelt API Wrapper for Python

Author: Jacob Bayer

Introduction

Sunbelt is a database that stores information mined from Reddit. Other services take a different approach: Pushshift stores data on posts and comments immediately after they are posted, and Reveddit gives users a new way to see live data on Reddit. Sunbelt instead stores information about how posts, comments, redditors, and subreddits have changed over time.

As far as I know, Sunbelt is the only service that does this. It is still at a very early stage of development, however, and does not have data at the same scale as Pushshift, nor will it any time soon. If you're interested in using Sunbelt for your project and you'd like to have data from a specific subreddit (or subreddits) loaded into the Sunbelt database, please contact me at jacobbenjaminbayer@gmail.com.

To start using Sunbelt, install the Sunbelt API Wrapper for Python (SAWP) by running

pip install sawp

Then import and instantiate the SunbeltClient from SAWP as follows.

from sawp import SunbeltClient
sunbelt = SunbeltClient()

SAWP lets you query the Sunbelt database through a GraphQL API. In this example, I select the first post in the Sunbelt database.

Posts stored in the Sunbelt database are called "SunPosts" to differentiate them from other Reddit objects you may be analyzing (for example, PRAW Submissions).

post = sunbelt.posts.first()
post
SunPost(1)

The SunPost object can be used to access attributes of the post.

post.permalink
'/r/AskReddit/comments/10kzboh/happy_birthday_askreddit/'
post.title
'Happy Birthday AskReddit!'

We can list the comments for this post using the post.comments attribute.

post.comments
[SunComment(1),
 SunComment(2),
 SunComment(3),
 SunComment(4),
 SunComment(5),
 ...
 SunComment(48),
 SunComment(49),
 SunComment(50)]

Sunbelt stores multiple versions of the data for any given object, each representing a time at which the SunCrawler saw the entity on Reddit. These versions record the non-permanent attributes of an object, such as upvotes, karma, or subreddit subscribers.

Let's take a look at how many versions we have for a comment on SunPost(2).

post = sunbelt.posts.get(2)
comment = post.comments[3]
comment
SunComment(59)
comment.versions
[CommentVersion(SunComment = 59 , SunVersion = 1),
 CommentVersion(SunComment = 59 , SunVersion = 2)]

Let's look at some of the version data.

print('\n Upvotes over time for Comment:', comment.reddit_comment_id, '\n')
for v in comment.versions:
    print(v.ups, 'upvotes at', v.sun_created_at)
 Upvotes over time for Comment: t1_j5t0ysc 

22389 upvotes at 25-01-2023 18:06:12
24792 upvotes at 26-01-2023 14:45:14
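
With two versions in hand, we can estimate how quickly the comment was gaining upvotes between crawls. The following is a minimal sketch that works on the values printed above, copied into plain tuples rather than read from live SAWP objects. The helper name upvote_change is hypothetical, and the timestamp layout is assumed to be day-first (DD-MM-YYYY), which the value 25-01-2023 implies.

```python
from datetime import datetime

# (ups, sun_created_at) pairs copied from the output above.
versions = [
    (22389, "25-01-2023 18:06:12"),
    (24792, "26-01-2023 14:45:14"),
]

# Assumed timestamp layout: day-month-year, 24-hour clock.
TIMESTAMP_FORMAT = "%d-%m-%Y %H:%M:%S"


def upvote_change(older, newer):
    """Return (upvote delta, elapsed seconds) between two versions."""
    older_ups, older_seen = older
    newer_ups, newer_seen = newer
    elapsed = (
        datetime.strptime(newer_seen, TIMESTAMP_FORMAT)
        - datetime.strptime(older_seen, TIMESTAMP_FORMAT)
    )
    return newer_ups - older_ups, elapsed.total_seconds()


delta, seconds = upvote_change(versions[0], versions[1])
print(delta, "upvotes gained in", round(seconds / 3600, 1), "hours")
```

With real SAWP objects, the same pairs could presumably be built from v.ups and v.sun_created_at for each v in comment.versions, provided sun_created_at is returned as a string in this layout.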

By looking at the comment body text of each version, we can see that this comment has been deleted by the author.

[x.body for x in comment.versions]
['Being a YouTube "prankster"', '[deleted]']
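
The same check can be automated. Here is a small sketch that scans a list of version bodies for the '[deleted]' placeholder seen above and recovers the last text captured before the deletion. It operates on a plain list of strings; the function name first_deleted_version is hypothetical and is not part of SAWP.

```python
# Placeholder body seen in the output above for a deleted comment.
DELETED_MARKER = "[deleted]"


def first_deleted_version(bodies):
    """Return the index of the first body equal to the marker, or None."""
    for index, body in enumerate(bodies):
        if body == DELETED_MARKER:
            return index
    return None


# Bodies copied from the output above, oldest version first.
bodies = ['Being a YouTube "prankster"', "[deleted]"]

index = first_deleted_version(bodies)
if index is not None and index > 0:
    # The version just before the deletion still holds the original text.
    print("Last text seen before deletion:", bodies[index - 1])
```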

The details from the most recent version of any object are also stored as attributes with the "most_recent_" prefix.

print(comment.most_recent_ups)
print(comment.most_recent_body)
24792
[deleted]
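
To make the naming convention concrete, here is a toy model of how a "most_recent_" attribute relates to the list of versions. This is only an illustration: ToyVersion and ToyComment are hypothetical classes, and SAWP itself obtains these values from the Sunbelt API rather than computing them locally like this.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersion:
    """One snapshot of a comment's changeable attributes."""
    ups: int
    body: str


@dataclass
class ToyComment:
    """Holds versions ordered from oldest to newest."""
    versions: list = field(default_factory=list)

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, so
        # regular fields such as `versions` never reach this method.
        prefix = "most_recent_"
        if name.startswith(prefix) and self.versions:
            # Strip the prefix and read the field from the newest version.
            return getattr(self.versions[-1], name[len(prefix):])
        raise AttributeError(name)


toy = ToyComment([
    ToyVersion(22389, 'Being a YouTube "prankster"'),
    ToyVersion(24792, "[deleted]"),
])
print(toy.most_recent_ups)
print(toy.most_recent_body)
```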

Sunbelt to Pandas

Sunbelt uses a GraphQL API to query only the data specifically requested by the user. When a Sun object is first initialized by SAWP, it contains only the bare minimum of information needed to initialize the object, unless the user specifically requests more. When an attribute is requested, a new API call is made to obtain that attribute from the database. A batch request for many attributes can be made by passing the requested attributes as arguments.

# Requested fields can be passed as args
all_comments = sunbelt.comments.all(
    'sun_post_id',
    'sun_comment_id',
    'reddit_post_id',
    'reddit_comment_id',
    'reddit_parent_id',
    'most_recent_body',
    'most_recent_ups',
    'most_recent_downs',
    'created_utc',
    'most_recent_edited',
    'most_recent_gilded',
    'depth',
)
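
The trade-off between per-attribute calls and a batch request can be illustrated with a toy class. This is a sketch under stated assumptions, not SAWP's implementation: ToyLazyObject is hypothetical, a plain dict stands in for the remote database, and caching fetched values is my own assumption, since the text above does not say whether SAWP caches.

```python
class ToyLazyObject:
    """Illustrative stand-in for a lazily loaded object; not SAWP's real class."""

    def __init__(self, uid, backend, *fields):
        self.uid = uid
        self._backend = backend  # dict standing in for the remote database
        self._cache = {}         # assumed: values are kept once fetched
        self.api_calls = 0       # counts simulated round trips
        if fields:
            self._fetch(fields)  # batch request: one call for all fields

    def _fetch(self, fields):
        # One simulated round trip, however many fields are requested.
        self.api_calls += 1
        row = self._backend[self.uid]
        for name in fields:
            self._cache[name] = row[name]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_") or name not in self._backend[self.uid]:
            raise AttributeError(name)
        if name not in self._cache:
            self._fetch((name,))  # lazy: fetch a single missing field
        return self._cache[name]


backend = {1: {"most_recent_ups": 28, "depth": "0"}}

lazy = ToyLazyObject(1, backend)
lazy.most_recent_ups
lazy.depth
print("lazy access:", lazy.api_calls, "calls")

batched = ToyLazyObject(1, backend, "most_recent_ups", "depth")
batched.most_recent_ups
batched.depth
print("batched access:", batched.api_calls, "call")
```

In this toy, reading two attributes one at a time costs two round trips, while naming them up front costs one, which is the reason to pass field names as arguments when loading many objects.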

Sunbelt objects have a useful to_dict method, which can be used to create a pandas DataFrame.

comment = all_comments[0]
comment.to_dict()
{'kind': 'comment',
 'uid': 1,
 'created_utc': 1674655786.0,
 'depth': '0',
 'most_recent_body': "Happy birthday to the world's internet town square.",
 'most_recent_downs': 0,
 'most_recent_edited': 0,
 'most_recent_gilded': '0',
 'most_recent_ups': 28,
 'reddit_comment_id': 't1_j5tm5b1',
 'reddit_parent_id': None,
 'reddit_post_id': 't3_10kzboh',
 'sun_comment_id': '1',
 'sun_post_id': 1,
 'sun_unique_id': 1}
import pandas as pd
comments_df = pd.DataFrame(x.to_dict() for x in all_comments)
comments_df
     kind     uid  created_utc   depth  most_recent_body                                   most_recent_downs  most_recent_edited  most_recent_gilded  most_recent_ups  reddit_comment_id  reddit_parent_id  reddit_post_id  sun_comment_id  sun_post_id  sun_unique_id
0    comment  1    1.674656e+09  0      Happy birthday to the world's internet town sq...  0                  0                   0                   28               t1_j5tm5b1         None              t3_10kzboh      1               1            1
1    comment  2    1.674656e+09  0      ask reddit is aquarius                             0                  0                   0                   8                t1_j5tlz13         None              t3_10kzboh      2               1            2
2    comment  3    1.674655e+09  0      Cool                                               0                  0                   0                   7                t1_j5tlfri         None              t3_10kzboh      3               1            3
3    comment  4    1.674658e+09  0      Thanks for being there for 15 years so we coul...  0                  0                   0                   8                t1_j5tq7nj         None              t3_10kzboh      4               1            4
4    comment  5    1.674656e+09  0      happy birthday reddits most disturbing comment...  0                  0                   0                   8                t1_j5tmm97         None              t3_10kzboh      5               1            5
...  ...      ...  ...           ...    ...                                                ...                ...                 ...                 ...              ...                ...               ...             ...             ...          ...
655  comment  648  1.674667e+09  0      What a fucking asshole.                            0                  0                   0                   1                t1_j5udej5         None              t3_10kzjx3      648             15           648
656  comment  649  1.674667e+09  0      Yep, like a cancer.                                0                  0                   0                   1                t1_j5uf640         None              t3_10kzjx3      649             15           649
657  comment  650  1.674658e+09  1      I mean if you're a leading religious figure in...  0                  0                   0                   39               t1_j5tselp         None              t3_10kzjx3      650             15           650
658  comment  655  1.674662e+09  2      Except the ones involving invading your country    0                  0                   0                   11               t1_j5u073l         None              t3_10kzjx3      655             15           655
659  comment  657  1.674670e+09  2      There are many Islamic movements that aren't c...  0                  0                   0                   1                t1_j5unxzh         None              t3_10kzjx3      657             15           657

660 rows × 15 columns
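
The created_utc column is hard to read as a raw number. Here is a small sketch, using only the standard library, that adds a readable timestamp to a to_dict()-style record before it goes into a DataFrame. The helper add_created_at and the created_at key are hypothetical, and the sketch assumes created_utc is a Unix timestamp in seconds, as it is on Reddit.

```python
from datetime import datetime, timezone


def add_created_at(record):
    """Return a copy of the record with an ISO 8601 UTC timestamp added."""
    enriched = dict(record)  # copy so the original record is left untouched
    created = datetime.fromtimestamp(record["created_utc"], tz=timezone.utc)
    enriched["created_at"] = created.isoformat()
    return enriched


# A few fields copied from the to_dict() output shown earlier.
record = {"uid": 1, "created_utc": 1674655786.0, "most_recent_ups": 28}

print(add_created_at(record)["created_at"])
```

The enriched dictionaries can be passed to pd.DataFrame in the same way as the to_dict() output, for example pd.DataFrame(add_created_at(x.to_dict()) for x in all_comments).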
