Tools for content-based filtering of tweets and Twitter accounts

Twittersphere

Twittersphere is a tool for ingesting tweets and applying content-based filtering rules to tweets and user profiles. It uses simple inclusion/exclusion rules based on words and phrases to decide whether a user profile or tweet belongs to a group.
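
To make the idea of inclusion/exclusion rules concrete, here is a minimal sketch in Python. It is not Twittersphere's implementation: the RuleSet class, the matches function and the case-insensitive substring matching are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSet:
    """Hypothetical container for inclusion and exclusion phrases."""
    include: list[str] = field(default_factory=list)
    exclude: list[str] = field(default_factory=list)


def matches(text: str, rules: RuleSet) -> bool:
    """Return True if text contains any include phrase and no exclude phrase."""
    lowered = text.lower()
    # Assumption: an exclusion match always overrides an inclusion match.
    if any(phrase.lower() in lowered for phrase in rules.exclude):
        return False
    return any(phrase.lower() in lowered for phrase in rules.include)


rules = RuleSet(include=["melbourne", "sydney"], exclude=["sydney, nova scotia"])
profiles = {
    "a": "Coffee drinker in Melbourne",
    "b": "Living in Sydney, Nova Scotia",
    "c": "Just here for the memes",
}
matched = sorted(uid for uid, bio in profiles.items() if matches(bio, rules))
print(matched)
```

Profile "b" contains an inclusion phrase but is dropped by the exclusion phrase, which is the kind of refinement exclusion rules are meant to allow.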

Functionality

Twittersphere exposes a command line interface and a Python library for:

  • Extracting relevant tweet entities from Twitter V2 API JSON
  • Creating a structured relational database in SQLite for further analytics
  • Applying rule-based filters to select user profiles based on their content
  • Iteratively creating or updating these rule-based filters

Command Line Usage

Creating a database

Twittersphere can first be used to create a local relational database from files containing Twitter V2 API JSON data collected via twarc. Any tweet or user JSON data collected via the Twitter API, including the search and streaming endpoints, should work. This process safely deduplicates items: you can insert the same file more than once without seeing the same tweet twice. The database can be queried directly from most programming languages or, after installing an ODBC connector, connected to tools like Excel or Tableau.

twittersphere prepare FILE1.json FILE2.json ... FILEN.json processed.db
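
The deduplication behaviour described above can be illustrated with a small SQLite sketch. This is one common way to get idempotent inserts, not necessarily how Twittersphere does it: the tweet table, its columns and the insert_tweets helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table schema, for illustration only.
conn.execute("CREATE TABLE tweet (tweet_id TEXT PRIMARY KEY, text TEXT)")


def insert_tweets(conn, tweets):
    """Insert tweets, silently skipping ids that are already stored."""
    conn.executemany(
        "INSERT OR IGNORE INTO tweet (tweet_id, text) VALUES (?, ?)",
        [(t["id"], t["text"]) for t in tweets],
    )
    conn.commit()


batch = [{"id": "1", "text": "first"}, {"id": "2", "text": "second"}]
insert_tweets(conn, batch)
insert_tweets(conn, batch)  # loading the same data again adds nothing

tweet_count = conn.execute("SELECT COUNT(*) FROM tweet").fetchone()[0]
print(tweet_count)
```

Because the primary key rejects repeated ids, loading the same batch twice leaves the row count unchanged.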

Rule Based User Filtering

An existing ruleset (such as ... the not-yet-public Australian Twittersphere rules ...) can be applied as follows:

twittersphere filter-users rules.csv processed.db

This will populate the user_matching_ruleset table with the user_ids of profiles that matched the ruleset, along with the filename of the rules for later reference.
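
One way to inspect the result from Python is with the standard sqlite3 module. The sketch below reads column names from the query itself rather than assuming them. The preview_table helper is hypothetical, and the stand-in table uses made-up column names (user_id, ruleset_name) purely so that the example runs on its own; the real table's columns may differ.

```python
import sqlite3


def preview_table(conn, table, limit=5):
    """Return (column_names, rows) for the first `limit` rows of a table."""
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,))
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


# Stand-in database so the sketch is self-contained; with real data, use
# sqlite3.connect("processed.db") instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_matching_ruleset (user_id TEXT, ruleset_name TEXT)")
conn.execute("INSERT INTO user_matching_ruleset VALUES ('42', 'rules.csv')")

columns, rows = preview_table(conn, "user_matching_ruleset")
print(columns)
print(rows)
```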

Updating Rules

After applying a ruleset, you can generate an updated list of rules that includes new candidate rules to expand the existing matching population. Note that the first run of this command takes longer, as this is when the initial n-gram statistics are created.

The following command generates the updated rules:

twittersphere refine-user-rules processed.db RULESET_NAME candidate_rules.csv
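
As a rough illustration of what n-gram statistics over profile text can look like, the sketch below counts word bigrams across a handful of profile descriptions and keeps the ones that recur. How Twittersphere actually scores candidate rules is not described here, so the ngrams and top_ngrams helpers, the whitespace tokenisation and the min_count threshold are all assumptions.

```python
from collections import Counter


def ngrams(text, n):
    """Yield word n-grams from lowercased, whitespace-split text."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def top_ngrams(bios, n=2, min_count=2):
    """Return (ngram, count) pairs seen at least min_count times, most common first."""
    counts = Counter(gram for bio in bios for gram in ngrams(bio, n))
    return [(gram, count) for gram, count in counts.most_common() if count >= min_count]


bios = [
    "Proud Western Sydney local",
    "Western Sydney Wanderers fan",
    "Coffee and books",
]
candidates = top_ngrams(bios)
print(candidates)
```

A phrase that recurs among already matched profiles is a plausible candidate for a new inclusion rule, which is the intuition behind reviewing candidate rules iteratively.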

To see which rules have already been applied, and are therefore valid values for RULESET_NAME, run:

twittersphere list-user-rules processed.db
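
If you want to script these steps, one option is to build the command lines shown above in Python and hand them to subprocess. The helper functions below are hypothetical wrappers around the documented commands, not part of the Twittersphere library, and RULESET_NAME is left as a placeholder.

```python
import shlex


def prepare_command(json_files, db_path):
    """Argument list for: twittersphere prepare FILE... processed.db"""
    return ["twittersphere", "prepare", *json_files, db_path]


def filter_users_command(rules_csv, db_path):
    """Argument list for: twittersphere filter-users rules.csv processed.db"""
    return ["twittersphere", "filter-users", rules_csv, db_path]


def refine_user_rules_command(db_path, ruleset_name, output_csv):
    """Argument list for: twittersphere refine-user-rules DB RULESET_NAME OUT.csv"""
    return ["twittersphere", "refine-user-rules", db_path, ruleset_name, output_csv]


pipeline = [
    prepare_command(["FILE1.json", "FILE2.json"], "processed.db"),
    filter_users_command("rules.csv", "processed.db"),
    refine_user_rules_command("processed.db", "RULESET_NAME", "candidate_rules.csv"),
]

for argv in pipeline:
    # Print each command; to execute it, call subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Passing an argument list rather than a shell string avoids quoting problems with file names that contain spaces.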

Creating Rules

To create new rules, you will need to generate an initial seed rule set or, alternatively, an initial seed population set.

Limitations

Note that Twittersphere does not support Twitter V1.1 data at all.

Data collected with tools other than twarc, collected with twarc metadata turned off, or collected with limited fields included in the output will not be well supported.
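
Before loading a file, it can help to check whether its lines look like twarc-collected V2 responses. The sketch below is a heuristic only and is not part of Twittersphere: it assumes each line is one JSON response, that V2 responses carry a top-level "data" key, and that twarc records its metadata under a "__twarc" key. The looks_like_twarc_v2 helper is hypothetical.

```python
import json


def looks_like_twarc_v2(line):
    """Heuristically check one JSON line for V2 response shape plus twarc metadata."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    # Assumption: V2 responses have "data", and twarc adds "__twarc" metadata.
    return "data" in obj and "__twarc" in obj


v2_line = '{"data": [{"id": "1", "text": "hi"}], "__twarc": {"version": "2.0.0"}}'
v1_line = '{"id_str": "1", "full_text": "hi", "user": {"id_str": "9"}}'

print(looks_like_twarc_v2(v2_line))
print(looks_like_twarc_v2(v1_line))
```

A file whose lines fail this check may be V1.1 data or may have been collected without twarc metadata, both of which fall under the limitations above.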

Download files

Source Distribution

twittersphere-0.4.0.tar.gz (22.6 kB)

Built Distribution

twittersphere-0.4.0-py3-none-any.whl (20.5 kB, Python 3)
