
Tools for content-based filtering of tweets and Twitter accounts

Project description

Twittersphere

Twittersphere is a tool for ingesting tweets and applying content-based filtering rules to tweets and user profiles. It uses simple inclusion/exclusion rules based on words and phrases to decide whether a user profile or tweet belongs to a particular group.
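To make the inclusion/exclusion idea concrete, here is a minimal sketch in plain Python. It assumes whole-word, case-insensitive phrase matching and that any exclusion match overrides every inclusion match. Both of these are assumptions. The function names are hypothetical and are not part of Twittersphere's library API.

```python
import re

def _contains(text, phrase):
    """True if `phrase` occurs in `text` as whole words (case-insensitive)."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None

def matches_rules(text, include, exclude):
    """Hypothetical rule check: any inclusion phrase matches, no exclusion phrase does."""
    if any(_contains(text, p) for p in exclude):
        return False  # assumption: exclusions always win
    return any(_contains(text, p) for p in include)

include = ["auspol", "australian politics"]
exclude = ["parody account"]

profiles = {
    "1": "Writing about Australian politics and #auspol",
    "2": "Parody account. Australian politics jokes",
    "3": "Cat photos only",
}
matched = [uid for uid, bio in profiles.items() if matches_rules(bio, include, exclude)]
print(matched)  # ['1']
```

Whole-word matching keeps a rule like "auspol" from firing on "auspolitics". Whether Twittersphere tokenises text this way is not stated here.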

Functionality

Twittersphere exposes a command line interface and a Python library for:

  • Extracting relevant tweet entities from Twitter V2 API JSON
  • Creating a structured relational database in SQLite for further analytics
  • Applying rule-based filters to select user profiles based on their content
  • Iteratively creating or updating these rule-based filters
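As an illustration of the first item, the sketch below pulls hashtags, mentions and URLs out of one Twitter V2 API response page. It uses only the standard library. The V2 layout it relies on is the standard one: tweets under `data`, expansions such as users under `includes`, and entities under each tweet's `entities` key. The choice of fields and the `extract_entities` name are assumptions. They do not describe Twittersphere's actual extraction or schema.

```python
import json

def extract_entities(page):
    """Flatten one V2 response page into per-tweet entity rows plus a user lookup."""
    rows = []
    for tweet in page.get("data", []):
        entities = tweet.get("entities", {})
        rows.append({
            "tweet_id": tweet["id"],
            "author_id": tweet.get("author_id"),
            "hashtags": [h["tag"] for h in entities.get("hashtags", [])],
            "mentions": [m["username"] for m in entities.get("mentions", [])],
            # prefer the expanded URL when the API provides it
            "urls": [u.get("expanded_url", u.get("url")) for u in entities.get("urls", [])],
        })
    # expansions such as author profiles live under "includes"
    users = {u["id"]: u for u in page.get("includes", {}).get("users", [])}
    return rows, users

page = json.loads("""
{"data": [{"id": "100", "author_id": "1", "text": "Hello #auspol @abc",
           "entities": {
             "hashtags": [{"start": 6, "end": 13, "tag": "auspol"}],
             "mentions": [{"start": 14, "end": 18, "username": "abc", "id": "2"}]}}],
 "includes": {"users": [{"id": "1", "username": "example",
                         "description": "Politics nerd"}]}}
""")

rows, users = extract_entities(page)
print(rows[0]["hashtags"], rows[0]["mentions"])  # ['auspol'] ['abc']
```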

Command Line Usage

In the first instance, Twittersphere can be used to create a local relational database from files containing Twitter V2 API JSON data collected via twarc. Any tweet or user JSON data collected via the Twitter API, including from the search and streaming endpoints, should work. This process safely deduplicates items: you can insert the same file more than once without the same tweet appearing twice.

twittersphere prepare processed.db FILE1.json FILE2.json ... FILEN.json
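One common way to get deduplication like this in SQLite is to key each row on the tweet id and insert with `INSERT OR IGNORE`. A minimal sketch of that pattern follows. The `tweet` table, its columns and `insert_tweets` are hypothetical. The README does not say which mechanism Twittersphere itself uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# hypothetical table: the tweet id is the primary key, so duplicates collide
conn.execute("CREATE TABLE tweet (id TEXT PRIMARY KEY, author_id TEXT, text TEXT)")

def insert_tweets(conn, tweets):
    """Insert tweets, silently skipping any whose id is already stored."""
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT OR IGNORE INTO tweet (id, author_id, text) VALUES (?, ?, ?)",
            [(t["id"], t.get("author_id"), t.get("text")) for t in tweets],
        )

batch = [
    {"id": "100", "author_id": "1", "text": "Hello"},
    {"id": "101", "author_id": "1", "text": "Again"},
]
insert_tweets(conn, batch)
insert_tweets(conn, batch)  # the same "file" inserted a second time
count = conn.execute("SELECT COUNT(*) FROM tweet").fetchone()[0]
print(count)  # 2
```

`INSERT OR IGNORE` keeps the first copy it sees. If later copies should win instead, for example because they carry fresher engagement counts, an upsert (`INSERT ... ON CONFLICT(id) DO UPDATE`) would be the alternative.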

An existing ruleset (such as ... the not-yet-public Australian Twittersphere rules ...) can be applied as follows:

twittersphere filter-users oz_twittersphere_rules.csv processed.db RULESET_NAME

This populates the user_matching_ruleset table with the user_ids of profiles that matched that ruleset.
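The sketch below shows what "populating" a matches table can look like. It applies phrase rules to stored profiles and records the matching ids. The table name user_matching_ruleset comes from the text above. Its columns (`ruleset_name`, `user_id`), the `user_profile` table and `apply_ruleset` are hypothetical stand-ins, not Twittersphere's real schema.

```python
import re
import sqlite3

def phrase_in(text, phrase):
    """Whole-word, case-insensitive phrase test."""
    return re.search(r"\b" + re.escape(phrase.lower()) + r"\b", text.lower()) is not None

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_profile (user_id TEXT PRIMARY KEY, description TEXT);
CREATE TABLE user_matching_ruleset (  -- hypothetical columns
    ruleset_name TEXT, user_id TEXT, PRIMARY KEY (ruleset_name, user_id)
);
""")
conn.executemany("INSERT INTO user_profile VALUES (?, ?)", [
    ("1", "Australian politics watcher"),
    ("2", "Gardening tips"),
])

def apply_ruleset(conn, name, include, exclude):
    """Record every profile that matches an inclusion rule and no exclusion rule."""
    matched = [
        (name, uid)
        for uid, desc in conn.execute("SELECT user_id, description FROM user_profile")
        if any(phrase_in(desc or "", p) for p in include)
        and not any(phrase_in(desc or "", p) for p in exclude)
    ]
    with conn:  # re-running is harmless thanks to the composite primary key
        conn.executemany("INSERT OR IGNORE INTO user_matching_ruleset VALUES (?, ?)", matched)
    return len(matched)

n = apply_ruleset(conn, "oz", ["australian politics"], [])
apply_ruleset(conn, "oz", ["australian politics"], [])  # second run adds no rows
stored = conn.execute(
    "SELECT user_id FROM user_matching_ruleset WHERE ruleset_name = 'oz'"
).fetchall()
print(stored)  # [('1',)]
```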

Updating Rules

After applying a ruleset, you can generate an updated list of rules containing new candidate rules that expand the existing matching population. The first run of this command takes longer, because that is when the initial n-gram statistics are computed.

The following command generates these candidate rules for RULESET_NAME, writing the output to candidate_rules.csv:

twittersphere refine-user-rules processed.db RULESET_NAME candidate_rules.csv
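The README does not describe how candidate rules are scored. The following is purely an illustrative sketch of one plausible n-gram approach, not Twittersphere's algorithm. It counts unigrams and bigrams in matched and unmatched profiles. It then keeps n-grams that are common among matched profiles, are not already rules, and would pull in at least one unmatched profile. These are ranked by the share of profiles containing the n-gram that are already matched. All names, thresholds and the scoring choice are assumptions.

```python
import re
from collections import Counter

def ngrams(text, n_max=2):
    """Unique unigrams and bigrams in a profile description."""
    tokens = re.findall(r"[#@]?\w+", text.lower())
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams  # a set, so each profile counts at most once per n-gram

def candidate_rules(matched, unmatched, existing, min_count=2, top_k=10):
    """Rank n-grams common in matched profiles that would also add new ones."""
    in_group = Counter(g for text in matched for g in ngrams(text))
    out_group = Counter(g for text in unmatched for g in ngrams(text))
    scored = []
    for gram, count in in_group.items():
        new = out_group[gram]  # unmatched profiles this n-gram would pull in
        if count < min_count or new == 0 or gram in existing:
            continue
        share_matched = count / (count + new)
        scored.append((share_matched, new, gram))
    scored.sort(reverse=True)
    return [(gram, round(s, 2), new) for s, new, gram in scored[:top_k]]

matched = ["auspol tragic, canberra", "canberra press gallery #auspol", "policy nerd canberra"]
unmatched = ["canberra local, coffee", "coffee lover", "press photographer"]
print(candidate_rules(matched, unmatched, existing={"auspol", "#auspol"}))
# [('canberra', 0.75, 1)]
```

A high `share_matched` suggests that a term is specific to the group. The `new` count shows how far adopting the term would expand the population, so both are worth reviewing before accepting a rule.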

To see which rules have already been applied and are valid rules for RULESET_NAME, run:

twittersphere list-user-rules processed.db

Creating Rules

To create a new ruleset, you will need to generate either an initial seed ruleset or, alternatively, an initial seed population.
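If you start from a seed population instead of seed rules, one simple way to bootstrap rules is to take the terms that appear in a large share of the seed profiles. The sketch below is a hypothetical illustration of that idea and is not a Twittersphere command or function. `min_share` and the stopword handling are assumptions.

```python
import re
from collections import Counter

def seed_rules_from_population(descriptions, min_share=0.5, stopwords=frozenset()):
    """Terms used by at least `min_share` of the seed population's profiles."""
    if not descriptions:
        return []
    doc_freq = Counter(
        term
        for text in descriptions
        for term in set(re.findall(r"[#@]?\w+", text.lower()))  # once per profile
        if term not in stopwords
    )
    n = len(descriptions)
    return sorted(t for t, c in doc_freq.items() if c / n >= min_share)

seed = ["Canberra journalist #auspol", "#auspol junkie", "Press gallery, Canberra"]
print(seed_rules_from_population(seed))  # ['#auspol', 'canberra']
```

On real profiles a stopword list matters, because otherwise function words such as "the" and "and" would dominate. The resulting terms are best treated as a first draft to review by hand.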

Limitations

Note that Twittersphere does not support Twitter V1.1 data at all.

Data collected with tools other than twarc, with twarc's metadata turned off, or with only a limited set of fields in the output is not well supported.
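A quick heuristic check of an input file against these limitations is sketched below. It assumes one JSON object per line, as twarc2 writes. It also assumes that twarc2 attaches its collection metadata under a `__twarc` key, which is its usual behaviour. V2 records are recognised by a top-level `data` key (a response page) or `author_id` (a single flattened tweet). V1.1 records are recognised by `id_str`. The function and its labels are hypothetical, not part of Twittersphere.

```python
import json

def classify_record(line):
    """Heuristically label one JSONL line by API version and twarc metadata."""
    obj = json.loads(line)
    if "data" in obj or "author_id" in obj:
        # V2: either a full response page ("data") or a single flattened tweet
        return "v2 + twarc metadata" if "__twarc" in obj else "v2, no twarc metadata"
    if "id_str" in obj:
        # V1.1 tweet and user objects carry a string copy of the numeric id
        return "v1.1 (unsupported)"
    return "unknown"

print(classify_record('{"data": [{"id": "1", "text": "hi"}], "__twarc": {"version": "2"}}'))
# v2 + twarc metadata
```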



Download files


Source Distribution

twittersphere-0.1.3.tar.gz (21.3 kB)

Uploaded Source

Built Distribution

twittersphere-0.1.3-py3-none-any.whl (19.5 kB)

Uploaded Python 3

File details

Details for the file twittersphere-0.1.3.tar.gz.

File metadata

  • Download URL: twittersphere-0.1.3.tar.gz
  • Size: 21.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.7

File hashes

Hashes for twittersphere-0.1.3.tar.gz
Algorithm Hash digest
SHA256 df27e157fef6c06edf76b059ea96f4504efd101ae3d88bda5fd2c90c6566e6e2
MD5 c157af54ef0897c9d5588fa21f30d12b
BLAKE2b-256 eabf4fd77e29226422ae68b974e536b84fdba309d5a5b2178484d3e6c3521aaa


File details

Details for the file twittersphere-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: twittersphere-0.1.3-py3-none-any.whl
  • Size: 19.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.7

File hashes

Hashes for twittersphere-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 fe31cba4c496fa732ba6a25f149eda4a2b57018780ea448e51e159efb37e22bb
MD5 72473ceeaa9f27167803f112cac8a6cc
BLAKE2b-256 7774991bfa15ea9cb5122f53d282b0843fa234d81e0d6697e56b3071a869c50d
