
a library to download reply trees in forums and social media

Project description

delab-socialmedia

A Python library for downloading conversation trees from social media platforms such as Twitter, Reddit, and Mastodon.

Overview

This library provides a unified interface for downloading conversations from Twitter, Reddit, and Mastodon. It simplifies querying and retrieving conversations by criteria such as language, query string, and recency, using platform-specific connectors: twarc for Twitter, praw for Reddit, and the Mastodon API for Mastodon.

Features

  • Download conversations from Twitter, Reddit, and Mastodon using a unified interface.
  • Filter conversations by language, query string, and recency.
  • Download conversations into a unified DelabTree format that supports computational social science (CSS) analysis of the conversation trees and can be exported as a pandas DataFrame or a networkx graph.
  • Validate and filter the downloaded conversations based on custom criteria.
  • Download a daily sample of political discussions for CSS research.
  • NOTE: The Twitter functions have not been tested since the platform's demise.
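
The unified DelabTree format mentioned above essentially models a conversation as a reply tree. As a stdlib-only sketch (not the library's actual API; Post and to_rows are hypothetical names), this is roughly what flattening such a tree into the tabular rows of a DataFrame export looks like:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Post:
    post_id: str
    parent_id: Optional[str]  # None marks the conversation root
    text: str

def to_rows(posts: List[Post]):
    """Flatten a reply tree into (post_id, parent_id, depth) rows,
    the kind of tabular view a DataFrame export would give."""
    children = defaultdict(list)
    for p in posts:
        children[p.parent_id].append(p)
    rows = []

    def walk(node, depth):
        rows.append((node.post_id, node.parent_id, depth))
        for child in children[node.post_id]:
            walk(child, depth + 1)

    for root in children[None]:
        walk(root, 0)
    return rows

posts = [
    Post("a", None, "original post"),
    Post("b", "a", "first reply"),
    Post("c", "b", "reply to the reply"),
    Post("d", "a", "second reply"),
]
print(to_rows(posts))
# → [('a', None, 0), ('b', 'a', 1), ('c', 'b', 2), ('d', 'a', 1)]
```

The same parent/child mapping can be fed directly into a networkx DiGraph by adding one edge per (parent_id, post_id) pair.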

Installation

pip install delab-socialmedia

Getting Started

Before you begin, make sure you have the necessary credentials:

  • For Reddit: reddit_secret, reddit_script_id, reddit_user, reddit_password, user_agent
  • For Mastodon: client_id, client_secret, access_token

Reddit Connector (get_praw function)

This function creates a connector for Reddit using the praw library.

Direct Variable Setting

from connection_util import get_praw

reddit = get_praw(
    reddit_secret='YOUR_REDDIT_SECRET',
    reddit_script_id='YOUR_REDDIT_SCRIPT_ID',
    reddit_user='YOUR_REDDIT_USERNAME',
    reddit_password='YOUR_REDDIT_PASSWORD',
    user_agent='YOUR_USER_AGENT'
)

Use YAML

reddit_secret: YOUR_REDDIT_SECRET
reddit_script_id: YOUR_REDDIT_SCRIPT_ID
reddit_user: YOUR_REDDIT_USERNAME
reddit_password: YOUR_REDDIT_PASSWORD
user_agent: YOUR_USER_AGENT

and then load it with:

from connection_util import get_praw
reddit = get_praw(
    use_yaml=True,
    yaml_path='path/to/your/social_media_credentials.yml'
)
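
Since the credentials file is a flat list of key: value pairs, it is easy to see what get_praw reads from it. The following stdlib-only parser is just an illustration of the expected file layout; the actual implementation presumably uses a YAML library such as PyYAML:

```python
import os
import tempfile

def load_flat_yaml(path):
    """Parse a flat 'key: value' file (the subset of YAML shown above).

    Illustrative only; a real YAML parser also handles nesting,
    quoting, and other YAML features this sketch ignores.
    """
    creds = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            key, _, value = line.partition(":")
            creds[key.strip()] = value.strip()
    return creds

# Round-trip a sample credentials file through a temporary path.
sample = (
    "reddit_secret: SECRET\n"
    "reddit_script_id: SCRIPT_ID\n"
    "reddit_user: alice\n"
    "reddit_password: hunter2\n"
    "user_agent: my-agent/0.1\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".yml", delete=False) as fh:
    fh.write(sample)
    path = fh.name
creds = load_flat_yaml(path)
os.unlink(path)
print(creds["reddit_user"])  # → alice
```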

Mastodon Connector (create_mastodon function)

Direct Variable Setting

from connection_util import create_mastodon

mastodon = create_mastodon(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    access_token='YOUR_ACCESS_TOKEN',
    api_base_url='https://mastodon.social/'
)

Use YAML

client_id: YOUR_CLIENT_ID
client_secret: YOUR_CLIENT_SECRET
access_token: YOUR_ACCESS_TOKEN
api_base_url: https://mastodon.social/

and then load it with:

from connection_util import create_mastodon

mastodon = create_mastodon(
    use_yaml=True,
    yaml_path='path/to/your/social_media_credentials.yml'
)

Twitter Connector

The Twitter connector works analogously, although Twitter access has not been tested.

from twarc import Twarc2

class DelabTwarc(Twarc2):
    def __init__(self, access_token=None, access_token_secret=None, bearer_token=None, consumer_key=None,
                 consumer_secret=None, use_yaml=False, yaml_path=None):
        """
        create the Twitter connector
        :param access_token:
        :param access_token_secret:
        :param bearer_token:
        :param consumer_key:
        :param consumer_secret:
        :param use_yaml:
        :param yaml_path:
        """
        ...

Download daily sample

from models.language import LANGUAGE
from models.platform import PLATFORM
from connection_util import create_mastodon
from socialmedia import download_daily_sample_conversations

connector = create_mastodon()
conversations = download_daily_sample_conversations(platform=PLATFORM.MASTODON,
                                                    language=LANGUAGE.ENGLISH,
                                                    min_results=5, 
                                                    connector=connector)

Download Conversations

The library provides functions to download conversations based on various parameters. Here's an example of downloading conversations from Reddit:

from socialmedia import download_conversations, PLATFORM, LANGUAGE
from connection_util import get_praw

connector = get_praw()
conversations = download_conversations(query_string="Politics",
                                       platform=PLATFORM.REDDIT,
                                       language=LANGUAGE.ENGLISH,
                                       recent=True,
                                       max_conversations=30,
                                       connector=connector)
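
The feature list also mentions validating and filtering downloaded conversations by custom criteria. A minimal sketch of that step, using plain dicts in place of the library's DelabTree objects (filter_conversations and the dict keys are hypothetical, not the library's API):

```python
def filter_conversations(conversations, min_posts=3, require_language=None):
    """Keep only conversations that pass custom validity criteria.

    Conversations are plain dicts here for illustration; in the library
    they would be DelabTree objects with their own accessors.
    """
    kept = []
    for conv in conversations:
        if len(conv["posts"]) < min_posts:
            continue  # too shallow to form an interesting reply tree
        if require_language and conv.get("language") != require_language:
            continue  # wrong language
        kept.append(conv)
    return kept

convs = [
    {"posts": ["a", "b", "c", "d"], "language": "en"},
    {"posts": ["a"], "language": "en"},
    {"posts": ["a", "b", "c"], "language": "de"},
]
print(len(filter_conversations(convs, require_language="en")))  # → 1
```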

Get Conversations by User

To get all conversations a given user has participated in on Reddit or Mastodon:

from socialmedia import get_conversations_by_user, PLATFORM
from connection_util import get_praw

connector = get_praw()
user_conversations = get_conversations_by_user(username="u/example_user",
                                                platform=PLATFORM.REDDIT,
                                                connector=connector)
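
The example above writes the username with its u/ prefix; whether the library also accepts the bare name is not documented. A small hypothetical helper (not part of the library) can normalize user input beforehand:

```python
def normalize_reddit_username(name: str) -> str:
    """Strip an optional '/u/' or 'u/' prefix from a Reddit username."""
    return name.removeprefix("/").removeprefix("u/")

print(normalize_reddit_username("u/example_user"))  # → example_user
```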

Contribution

Contributions to improve the library are welcome. Please submit pull requests or open issues to suggest changes or report bugs.

License

MIT



Download files

Download the file for your platform.

Source Distribution

delab-socialmedia-0.3.6.tar.gz (18.2 kB)

Uploaded Source

Built Distribution


delab_socialmedia-0.3.6-py3-none-any.whl (22.2 kB)

Uploaded Python 3

File details

Details for the file delab-socialmedia-0.3.6.tar.gz.

File metadata

  • Download URL: delab-socialmedia-0.3.6.tar.gz
  • Upload date:
  • Size: 18.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes

Hashes for delab-socialmedia-0.3.6.tar.gz
  • SHA256: 5db8c102c2f41b0b2816728895fd1f007d4c2da84b2bc9261a0bfc611eee5f22
  • MD5: 5698387eddacec420242309a157b240f
  • BLAKE2b-256: 9409f1ba488c01b4789ed4c403dbe5777646504fe8c8384cf388d33f796139ac


File details

Details for the file delab_socialmedia-0.3.6-py3-none-any.whl.

File metadata

File hashes

Hashes for delab_socialmedia-0.3.6-py3-none-any.whl
  • SHA256: a4b502e64e19dfb603c0eb07a30c5c7f2e407a0b6479df8b0e1f3290efbbeca7
  • MD5: 94d6003737fe98e771a71c163889ff08
  • BLAKE2b-256: 6d2519d3c533c715d89c0f951f6ebfbe5b4e4a7d657a0ce901f3c4690c7d2e5f

