Automate Twitter Stream data collection

Twistream: Twitter Stream API data collection

Twistream helps you automatically collect and store data from the Twitter Stream API.

Installation

Latest stable release:

pip install twistream

From source:

git clone https://github.com/guillermo-carrasco/twistream.git
cd twistream
pip install .

Setting up

Twitter credentials

You need Twitter credentials to use the Twitter API. To get them, create an application on the Twitter developer site. Once it is created, save its credentials so you can configure twistream.

Create a configuration file

You can use the command twistream init to help you create a correctly formatted configuration file for your collections.

Once created, you will have a file that looks like this:

~> cat ~/.twistream/twistream.yml

twitter:
  consumer_key: your_consumer_key
  consumer_secret: your_consumer_secret
  access_token_key: your_access_token_key
  access_token_secret: your_access_token_secret

backend: backend_name

backend_params:
    username: db_username
    password: db_password
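
Before starting a long collection, it can help to confirm that the configuration has every field shown above. The following is a minimal sketch, not part of twistream: it assumes the YAML file has already been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`), and the helper name `missing_config_keys` is hypothetical.

```python
# Key names are copied from the example configuration file above.
REQUIRED_TWITTER_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token_key",
    "access_token_secret",
)


def missing_config_keys(config: dict) -> list:
    """Return dotted names of expected keys that are absent or empty.

    Hypothetical helper for illustration; twistream itself may validate
    its configuration differently.
    """
    missing = []
    twitter = config.get("twitter") or {}
    for key in REQUIRED_TWITTER_KEYS:
        if not twitter.get(key):
            missing.append(f"twitter.{key}")
    for key in ("backend", "backend_params"):
        if not config.get(key):
            missing.append(key)
    return missing


# A dict shaped like the file above, with one credential left out.
example = {
    "twitter": {
        "consumer_key": "your_consumer_key",
        "consumer_secret": "your_consumer_secret",
        "access_token_key": "your_access_token_key",
    },
    "backend": "backend_name",
    "backend_params": {"username": "db_username", "password": "db_password"},
}
print(missing_config_keys(example))  # ['twitter.access_token_secret']
```

An empty list means all the expected sections and credentials are present.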

Usage

Remember that --help is always an available option.

Once you have created a configuration file, start collecting tweets:

twistream collect --tracks tracks,to,follow config.yaml

Refer to the Twitter documentation for a full explanation of tracks. In short:

A comma-separated list of phrases which will be used to determine what Tweets will be delivered on the stream. A phrase may be one or more terms separated by spaces, and a phrase will match if all of the terms in the phrase are present in the Tweet, regardless of order and ignoring case. By this model, you can think of commas as logical ORs, while spaces are equivalent to logical ANDs (e.g. ‘the twitter’ is the AND twitter, and ‘the,twitter’ is the OR twitter).

If what you want is to follow hashtags, don't forget to include the # character.
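
The AND/OR rule quoted above can be sketched in a few lines of Python. This is only an approximation for reasoning about which tweets a `--tracks` value would select: the tokenization is a simplifying assumption, the function names are hypothetical, and the real matching is done by Twitter's servers, not by this code.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split it into word-like tokens."""
    tokens = set()
    for token in re.findall(r"[#@]?\w+", text.lower()):
        tokens.add(token)
        # Assumption: a hashtag or mention also counts as its bare word,
        # so the term "python" matches "#python" but not the reverse.
        tokens.add(token.lstrip("#@"))
    return tokens


def matches_tracks(tracks: str, tweet_text: str) -> bool:
    """Return True if any comma-separated phrase has all its terms in the tweet."""
    tweet_tokens = tokenize(tweet_text)
    for phrase in tracks.split(","):      # commas act as logical OR
        terms = phrase.lower().split()    # spaces act as logical AND
        if terms and all(term in tweet_tokens for term in terms):
            return True
    return False


print(matches_tracks("the twitter", "I like Twitter"))   # False: "the" is missing
print(matches_tracks("the,twitter", "I like Twitter"))   # True: "twitter" alone is enough
print(matches_tracks("#python", "I love python"))        # False: no hashtag in the tweet
```

The last call shows why the # character matters when you want hashtags only: in this sketch the term `#python` matches the hashtag but not the plain word.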

Supported backends

From version 0.1.3, twistream supports two backends: a relational database (SQLite) and a NoSQL database (MongoDB).

NOTE: the SQLite backend saves only a few tweet fields, whilst the MongoDB backend saves the whole blob. It is a trade-off between information and storage space.
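
To make the trade-off concrete, here is a small sketch that stores the same tweet both ways and compares sizes. Everything in it is an assumption for illustration: the tweet payload is made up, and the table schema is hypothetical and not necessarily the one twistream's SQLite backend uses.

```python
import json
import sqlite3

# Hypothetical tweet payload; real payloads carry many more fields.
tweet = {
    "id": 1,
    "text": "Collecting tweets with #twistream",
    "created_at": "Mon Jan 01 00:00:00 +0000 2020",
    "user": {"id": 42, "screen_name": "example", "followers_count": 10},
    "entities": {"hashtags": [{"text": "twistream"}], "urls": []},
}

# Relational style: keep only a few columns (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id INTEGER PRIMARY KEY, user_id INTEGER, text TEXT)"
)
conn.execute(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    (tweet["id"], tweet["user"]["id"], tweet["text"]),
)
row = conn.execute("SELECT id, user_id, text FROM tweets").fetchone()

# Document style: keep the whole blob, as a document store would.
blob = json.dumps(tweet)

row_size = len(json.dumps(row))
blob_size = len(blob)
print(f"selected fields: {row_size} characters, full blob: {blob_size} characters")
```

The selected-fields row is smaller, but fields such as the hashtags or the creation date can no longer be recovered from it.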

Backend params format

SQLite

backend: sqlite

backend_params:
    db_path: /path/to/your/db

MongoDB

backend: mongodb

backend_params:
    db_string: database_connection_string

(See the MongoDB documentation on connection strings for the db_string format.)
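
The two formats above differ only in which parameter each backend expects. As a rough sketch, a check like the following could catch a mismatched section before a collection starts. The helper `check_backend` is hypothetical and not part of twistream; only the backend names and parameter names come from the snippets above.

```python
# Backend and parameter names taken from the snippets above.
EXPECTED_PARAMS = {
    "sqlite": ["db_path"],
    "mongodb": ["db_string"],
}


def check_backend(backend: str, backend_params: dict) -> list:
    """Return a list of problems; an empty list means the section looks complete."""
    if backend not in EXPECTED_PARAMS:
        return [f"unknown backend: {backend!r}"]
    return [
        f"missing backend_params.{name}"
        for name in EXPECTED_PARAMS[backend]
        if not backend_params.get(name)
    ]


print(check_backend("sqlite", {"db_path": "/path/to/your/db"}))  # []
print(check_backend("mongodb", {}))  # ['missing backend_params.db_string']
```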
