CLI to manage rules and start tweets collection from the Twitter Stream API

TwCompose

With TwCompose, you can:

  • Add, modify and delete Twitter stream rules in a simple configuration file
  • Validate that your rules are properly formatted before applying your changes
  • Get a volume estimate for your rules to stay within the rate limits
  • Start collecting tweets in the background (Docker) with error handling and a restart mechanism

Installation

Installing TwCompose requires Python 3.8 or later.

pip install twcompose

Usage

Create a credentials file

First, we need to specify the Twitter authentication token to connect to the Twitter Stream API. This needs to be specified in a YAML file (called credentials.yml by default) with the following format:

twitter_token: "<TWITTER_BEARER_TOKEN>"

Create a Twitter-Compose file

The following is an example of a twitter-compose.yml file. It defines the stream parameters and rules, as well as the output driver used to save the collected tweets.

# twitter-compose.yml
image_tag: "0.1.0"

output:
  driver: local
  path: ./data/
  options:
    max_file_size: 1048576

parameters:
  tweet_fields:
    - text

streams:
  cop26:
    - tag: COP26GDA
      value: "#COP26GDA"
    - tag: bare cop26
      value: cop26 OR COP26 OR Cop26

Collection image reference

Controls the name and version of the Docker image used for the collector container.

# twitter-compose.yml
image_tag: "0.1.0"
image_name: "ghcr.io/smassonnet/twcollect"

Output driver reference

Controls how the collected tweets are saved. Only saving to a local folder, as gzip-compressed JSONLines files, is supported. Files are split according to the max_file_size option.

# twitter-compose.yml
output:
  driver: local
  path: ./data/
  options:
    max_file_size: 1048576

driver

Only supports collection to a local folder.

path

Path to the local folder to save into.
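As an illustration of how the saved data could be consumed, here is a minimal sketch that reads gzip-compressed JSONLines files back from the output folder. The function name iter_tweets and the "*.gz" glob pattern are assumptions made for this example; the file naming used by the collector is not documented here, so adjust the pattern to what you find in your folder.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def iter_tweets(folder: str, pattern: str = "*.gz") -> Iterator[dict]:
    """Yield one parsed JSON object per line from every gzip file in `folder`.

    `pattern` is an assumption: adapt it to the file names that the
    collector actually writes under the configured output path.
    """
    for path in sorted(Path(folder).glob(pattern)):
        # "rt" opens the gzip stream in text mode so it can be read line by line.
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)
```

For example, `for tweet in iter_tweets("./data/"): print(tweet)` would print every collected record, one at a time, without loading whole files into memory.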

options
  • max_file_size (number of bytes): Tweets are written to a new file when the current file reaches that size. Defaults to 1 GB.
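To make the size-based splitting concrete, the following is a rough sketch of one way such rotation can work. It is not TwCompose's implementation: the class name RotatingJsonlWriter, the part-00000.jsonl.gz naming scheme, and the choice to compare the limit against the compressed size on disk are all assumptions made for illustration.

```python
import gzip
import json
from pathlib import Path


class RotatingJsonlWriter:
    """Append JSON lines to gzip files, starting a new file at a size limit."""

    def __init__(self, folder: str, max_file_size: int) -> None:
        self.folder = Path(folder)
        self.folder.mkdir(parents=True, exist_ok=True)
        self.max_file_size = max_file_size
        self.index = 0  # hypothetical naming scheme: part-00000.jsonl.gz, ...

    def _current_path(self) -> Path:
        return self.folder / f"part-{self.index:05d}.jsonl.gz"

    def write(self, tweet: dict) -> Path:
        """Write one record and return the path of the file it went to."""
        path = self._current_path()
        # Assumption: the limit is checked against the compressed size on disk.
        if path.exists() and path.stat().st_size >= self.max_file_size:
            self.index += 1
            path = self._current_path()
        # Appending adds a new gzip member; gzip readers decode all members.
        with gzip.open(path, "at", encoding="utf-8") as handle:
            handle.write(json.dumps(tweet) + "\n")
        return path
```

With max_file_size set to 1048576 as in the example configuration, a writer like this would start a new file roughly every mebibyte of compressed output.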

Stream parameters reference

Controls the fields collected from the tweets.

# twitter-compose.yml
parameters:
  tweet_fields:
    - text

See the Twitter stream API reference for documentation.

Note that the following fields correspond to the Twitter fields ending with .fields instead of _fields:

  • media_fields: media.fields
  • place_fields: place.fields
  • poll_fields: poll.fields
  • tweet_fields: tweet.fields
  • user_fields: user.fields
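The name mapping above can be sketched as a small translation step from the configuration keys to Twitter query parameters. This is only an illustration under stated assumptions: the function name to_query_params is hypothetical, and the sketch relies on the Twitter API accepting list-valued field parameters as comma-separated strings.

```python
# Mapping from twitter-compose.yml keys to Twitter API parameter names,
# taken from the list above.
FIELD_NAMES = {
    "media_fields": "media.fields",
    "place_fields": "place.fields",
    "poll_fields": "poll.fields",
    "tweet_fields": "tweet.fields",
    "user_fields": "user.fields",
}


def to_query_params(parameters: dict) -> dict:
    """Translate the `parameters` section into Twitter query parameters."""
    params = {}
    for key, value in parameters.items():
        # Keys that are not in the mapping are passed through unchanged.
        name = FIELD_NAMES.get(key, key)
        # List values are joined into one comma-separated string.
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        params[name] = value
    return params
```

For instance, `to_query_params({"tweet_fields": ["text", "created_at"]})` gives `{"tweet.fields": "text,created_at"}`.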

Stream rules reference

Defines the scope of the tweets to collect. See Twitter stream rules for reference.

It is organised as a mapping between a stream group name (cop26 in the example below) and a list of Twitter stream rules. Naming the stream rules with unique, descriptive tags is highly recommended.

# twitter-compose.yml
streams:
  cop26:
    - tag: COP26GDA
      value: "#COP26GDA"
    - tag: bare cop26
      value: cop26 OR COP26 OR Cop26
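As a rough sketch of what this mapping amounts to, the code below flattens a streams section into the "add" payload shape used by Twitter's filtered stream rules endpoint, and rejects duplicate tags. The function name build_add_payload is hypothetical, and how TwCompose itself uses the group name is not documented here, so this sketch simply ignores it.

```python
from collections import Counter


def build_add_payload(streams: dict) -> dict:
    """Flatten {group: [rule, ...]} into a {"add": [...]} rules payload."""
    rules = [
        {"value": rule["value"], "tag": rule["tag"]}
        for group in streams.values()
        for rule in group
    ]
    # Tags should be unique so that matched tweets can be traced to one rule.
    counts = Counter(rule["tag"] for rule in rules)
    duplicates = sorted(tag for tag, count in counts.items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate rule tags: {duplicates}")
    return {"add": rules}
```

Applied to the example above, the payload would hold two rules, tagged COP26GDA and bare cop26.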

Command-line interface

Run twitter-compose --help from the command-line:

usage: twitter-compose [-h] [-f TC_FILE] [-p PROJECT_NAME]
                       [--credentials-file CREDENTIALS]
                       [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                       {config,up,status,stop,volume} ...

Manage Twitter streams

positional arguments:
  {config,up,status,stop,volume}
    config              Show parsed configuration
    up                  Update Twitter streams
    status              Status of defined streams
    stop                Stop Twitter streams
    volume              Estimation of the monthly volume of streams

optional arguments:
  -h, --help            show this help message and exit
  -f TC_FILE, --file TC_FILE
                        The file name of the twitter-compose configuration
  -p PROJECT_NAME, --project-name PROJECT_NAME
                        Name of the current project
  --credentials-file CREDENTIALS, -c CREDENTIALS
                        A yaml file with mapping between credential name and
                        value
  --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Logging level
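For scripting, the usage line above can be turned into a small helper that assembles the argument list, with global options placed before the subcommand. This is a sketch, not part of TwCompose: the names build_command and run are hypothetical, and only the flags shown in the help output are used.

```python
import subprocess
from typing import List, Optional

SUBCOMMANDS = {"config", "up", "status", "stop", "volume"}


def build_command(
    subcommand: str,
    file: Optional[str] = None,
    project_name: Optional[str] = None,
    credentials_file: Optional[str] = None,
    log_level: Optional[str] = None,
) -> List[str]:
    """Build the twitter-compose argument list for one subcommand."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    command = ["twitter-compose"]
    # Global options come before the subcommand, as in the usage line.
    if file:
        command += ["-f", file]
    if project_name:
        command += ["-p", project_name]
    if credentials_file:
        command += ["--credentials-file", credentials_file]
    if log_level:
        command += ["--log-level", log_level]
    command.append(subcommand)
    return command


def run(subcommand: str, **options: Optional[str]) -> int:
    """Run twitter-compose and return its exit code."""
    return subprocess.run(build_command(subcommand, **options), check=False).returncode
```

A call such as `run("status", file="twitter-compose.yml")` would then execute `twitter-compose -f twitter-compose.yml status`, assuming the package is installed and on the PATH.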

config

Validates and prints the parsed twitter-compose.yml configuration.

up

Updates the Twitter stream rules and starts or updates the locally running stream collector Docker container. It takes an optional --check argument to display the changes without running the update.
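The kind of comparison a dry run like --check implies can be sketched as a set difference between the rules already installed and the rules in the configuration. This is an illustration only, not how TwCompose computes or prints its changes; the function name diff_rules and the choice to identify a rule by its (tag, value) pair are assumptions.

```python
from typing import Dict, List, Tuple


def diff_rules(installed: List[dict], desired: List[dict]) -> Dict[str, List[Tuple[str, str]]]:
    """Report which rules an update would add, delete, or keep.

    Rules are compared as (tag, value) pairs; other keys, such as the
    rule id returned by Twitter, are ignored.
    """
    current = {(rule["tag"], rule["value"]) for rule in installed}
    wanted = {(rule["tag"], rule["value"]) for rule in desired}
    return {
        "add": sorted(wanted - current),
        "delete": sorted(current - wanted),
        "keep": sorted(current & wanted),
    }
```

Under this scheme, changing the value of an existing rule shows up as one deletion plus one addition.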

status

Show the installed Twitter stream rules and the status of the stream collector.

stop

Stop the Docker container running the collection.

Note

This project has been set up using PyScaffold 4.3.1. For details and usage information on PyScaffold see https://pyscaffold.org/.

