
tweetvac


Python package for sucking down tweets from Twitter. It implements Twitter’s guidelines for working with timelines so that you don’t have to.

tweetvac supports retrospective pulling of tweets from Twitter. For example, it can pull down a large number of tweets by a specific user or all the tweets from a geographic area that mention a search term. It automatically generates the requests needed to work backward along the timeline.

Installation

Install tweetvac using pip:

$ pip install tweetvac

If you clone this repository instead, you must install twython and its dependencies yourself.

Authentication

Twitter requires OAuth. tweetvac can store a user’s authentication information in a configuration file for reuse.

  1. Log into Twitter and open https://dev.twitter.com/apps.

  2. Create a new application. The name needs to be unique across all Twitter apps. A callback is not needed.

  3. Create an OAuth access token on your application web page.

  4. Create a file called tweetvac.cfg and format it as follows:

[Auth]
consumer_key = Gx33LSA3IICoqqPoJOp9Q
consumer_secret = 1qkKAljfpQMH9EqDZ8t50hK1HbahYXAUEi2p505umY0
oauth_token = 14574199-4iHhtyGRAeCvVzGpPNz0GLwfYC54ba3sK5uBl4hPe
oauth_token_secret = K80YytdT9FRXEoADlVzJ64HDQEaUMwb37N9NBykCNw5gw

Alternatively, you can pass those four values as a tuple, in the order above, to the TweetVac constructor rather than storing them in a configuration file.

The Basics

Import tweetvac

import tweetvac

Create a TweetVac instance

You can pass the OAuth parameters as a tuple:

vac = tweetvac.TweetVac((consumer_key, consumer_secret, oauth_token, oauth_token_secret))

or use the configuration object:

config = tweetvac.AuthConfig()
vac = tweetvac.TweetVac(config)

Suck down tweets

tweetvac expects a Twitter endpoint and a dictionary of parameters for that endpoint. Read the Twitter documentation for the list of endpoints and their parameters. Set the count option in the params dict to the largest value the endpoint supports, which minimizes the number of requests needed.

params = {'screen_name': 'struckDC', 'count': 200}
data = vac.suck('statuses/user_timeline', params)

Work with the data

The returned data is a list of dicts. The fields of each dict are described in the Twitter API documentation for the Tweet object.
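As a quick illustration, since each tweet is a plain dict, ordinary Python works on the results. The two tweets below are made-up stand-ins for real results, using the text, created_at, and retweet_count fields of the Tweet object:

```python
# Stand-in for the list of tweet dicts returned by vac.suck(...).
data = [
    {'created_at': 'Wed Jan 01 12:00:00 +0000 2014',
     'text': 'hello world', 'retweet_count': 3},
    {'created_at': 'Thu Jan 02 08:30:00 +0000 2014',
     'text': 'another tweet', 'retweet_count': 10},
]

# Find the most-retweeted tweet with an ordinary key function.
most_retweeted = max(data, key=lambda t: t['retweet_count'])
print(most_retweeted['text'])
```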

The data can be converted back to JSON and saved to a file like this:

import json

with open('data.json', 'w') as outfile:
    json.dump(data, outfile)
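Loading the saved tweets back later is the standard json round-trip, nothing tweetvac-specific. The tweets list below is a minimal stand-in for real results:

```python
import json

# Stand-in for tweets sucked down earlier.
tweets = [{'text': 'hello world', 'retweet_count': 3}]

# Save the list, then load it back.
with open('data.json', 'w') as outfile:
    json.dump(tweets, outfile)

with open('data.json') as infile:
    loaded = json.load(infile)
```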

Advanced

Filtering the tweets

Twitter provides several parameters on each endpoint for selecting which tweets to retrieve. Additional culling is available by passing a list of filter functions.

def remove_mention_tweets(tweet):
    return '@' not in tweet['text']

data = vac.suck('statuses/user_timeline', params, filters=[remove_mention_tweets])

Return False from your function to remove the tweet from the list.

Turning off the vacuum

tweetvac will suck down tweets until you hit your rate limit or exhaust the available tweets. To stop sooner, pass a cutoff function that returns True when tweetvac should stop.

import time

def stop(tweet):
    cutoff_date = time.strptime("Wed Jan 01 00:00:00 +0000 2014", '%a %b %d %H:%M:%S +0000 %Y')
    tweet_date = time.strptime(tweet['created_at'], '%a %b %d %H:%M:%S +0000 %Y')
    return tweet_date < cutoff_date

data = vac.suck('statuses/user_timeline', params, cutoff=stop)

You can also pass a hard limit on the number of requests to stop tweetvac early:

data = vac.suck('statuses/user_timeline', params, max_requests=10)

Twitter API

Supported Endpoints

Each endpoint has its own request rate limit, maximum count per request, and cap on the total number of retrievable tweets.
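One way to handle this is to record the per-request count limit for the endpoints you use and fill in params from it. The values below come from Twitter's API v1.1 documentation at the time of writing and may change:

```python
# Maximum value of the count parameter per request, by endpoint
# (per Twitter's API v1.1 docs at the time of writing; subject to change).
MAX_COUNT = {
    'statuses/user_timeline': 200,
    'statuses/home_timeline': 200,
    'search/tweets': 100,
}

endpoint = 'statuses/user_timeline'
params = {'screen_name': 'struckDC',
          'count': MAX_COUNT.get(endpoint, 100)}  # safe default when unknown
```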
