Package that makes sucking down tweets from Twitter easy.
tweetvac
Python package for sucking down tweets from Twitter. It implements Twitter’s guidelines for working with timelines so that you don’t have to.
tweetvac supports retrospective pulling of tweets from Twitter. For example, it can pull down a large number of tweets by a specific user, or all the tweets from a geographic area that mention a search term. It automatically generates the requests needed to work backward along the timeline.
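Working backward along a timeline follows Twitter's documented max_id pattern. You request a page, note the lowest tweet ID in it, and then ask for tweets with IDs at or below that value minus one. The sketch below illustrates the idea only and is not tweetvac's source. `pull_timeline` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real API call by serving an in-memory list of IDs.

```python
from typing import Any, Callable, Dict, List

Tweet = Dict[str, Any]
Fetch = Callable[[Dict[str, int]], List[Tweet]]


def pull_timeline(fetch_page: Fetch, count: int) -> List[Tweet]:
    """Collect tweets newest-first by stepping max_id backward."""
    tweets: List[Tweet] = []
    params: Dict[str, int] = {"count": count}
    while True:
        page = fetch_page(params)
        if not page:
            # An empty page means no older tweets remain.
            break
        tweets.extend(page)
        # max_id is inclusive, so request IDs strictly below the oldest one seen.
        oldest_id = min(tweet["id"] for tweet in page)
        params = {"count": count, "max_id": oldest_id - 1}
    return tweets


# Hypothetical in-memory stand-in for the API: tweet IDs 10 down to 1.
FAKE_IDS = list(range(10, 0, -1))


def fake_fetch(params: Dict[str, int]) -> List[Tweet]:
    upper = params.get("max_id", FAKE_IDS[0])
    older = [i for i in FAKE_IDS if i <= upper]
    return [{"id": i} for i in older[: params["count"]]]


result = pull_timeline(fake_fetch, count=3)
print([tweet["id"] for tweet in result])  # prints [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```

Subtracting one from the oldest ID avoids fetching the same tweet twice, because max_id includes the ID it names.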
Installation
Install tweetvac using pip:
$ pip install tweetvac
If you clone this repository instead, install twython and its dependencies yourself.
Authentication
Twitter requires OAuth. tweetvac can store a user’s authentication information in a configuration file for reuse.
1. Log in to Twitter and open https://dev.twitter.com/apps.
2. Create a new application. Its name must be unique across all Twitter apps. A callback URL is not needed.
3. Create an OAuth access token on your application's web page.
4. Create a file called tweetvac.cfg and format it as follows:
[Auth]
consumer_key = Gx33LSA3IICoqqPoJOp9Q
consumer_secret = 1qkKAljfpQMH9EqDZ8t50hK1HbahYXAUEi2p505umY0
oauth_token = 14574199-4iHhtyGRAeCvVzGpPNz0GLwfYC54ba3sK5uBl4hPe
oauth_token_secret = K80YytdT9FRXEoADlVzJ64HDQEaUMwb37N9NBykCNw5gw
Alternatively, rather than storing them in a configuration file, you can pass those four parameters as a tuple, in the order above, to the TweetVac constructor.
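The following is a minimal sketch of turning tweetvac.cfg into that tuple with the standard-library configparser module. It assumes the file contains the [Auth] section shown above. `parse_auth` and `load_auth` are hypothetical helpers that are not part of tweetvac, and this is not necessarily how `AuthConfig` reads the file.

```python
import configparser
from typing import Tuple

# Order expected by the TweetVac constructor, as described above.
AUTH_KEYS = ("consumer_key", "consumer_secret", "oauth_token", "oauth_token_secret")


def parse_auth(text: str) -> Tuple[str, ...]:
    """Parse the [Auth] section of a tweetvac.cfg-style INI string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["Auth"]
    # Raises KeyError if any of the four settings is missing.
    return tuple(section[key] for key in AUTH_KEYS)


def load_auth(path: str = "tweetvac.cfg") -> Tuple[str, ...]:
    """Read the file at path and return the four credentials in order."""
    with open(path, encoding="utf-8") as cfg_file:
        return parse_auth(cfg_file.read())
```

You could then pass the result straight to the constructor, for example `tweetvac.TweetVac(load_auth())`.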
The Basics
Import tweetvac
import tweetvac
Create a TweetVac instance
You can pass the OAuth parameters as a tuple:
vac = tweetvac.TweetVac((consumer_key, consumer_secret, oauth_token, oauth_token_secret))
or use the configuration object:
config = tweetvac.AuthConfig()
vac = tweetvac.TweetVac(config)
Suck down tweets
tweetvac expects a Twitter endpoint and a dictionary of parameters for that endpoint. See the Twitter API documentation for the list of endpoints and their parameters. Set the count option in the params dict to the largest value the endpoint supports, so that each request returns as many tweets as possible and fewer requests are needed.
params = {'screen_name': 'struckDC', 'count': 200}
data = vac.suck('statuses/user_timeline', params)
Work with the data
The data returned is a list of dicts. The fields in the dict are listed in the Twitter API documentation on the Tweet object.
The data can be converted back to JSON and saved to a file like this:
import json
with open('data.json', 'w') as outfile:
    json.dump(data, outfile)
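Once saved, the file can be loaded back with json.load and analyzed. The sketch below counts hashtags using the `entities.hashtags[].text` field of the Tweet object. `load_tweets` and `hashtag_counts` are hypothetical helpers, and tweets without an `entities` field are skipped.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def load_tweets(path: str) -> List[Dict[str, Any]]:
    """Load the list of tweet dicts written by json.dump."""
    with open(path, encoding="utf-8") as infile:
        return json.load(infile)


def hashtag_counts(tweets: List[Dict[str, Any]]) -> Counter:
    """Count hashtags case-insensitively across all tweets."""
    counts: Counter = Counter()
    for tweet in tweets:
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts
```

For example, `hashtag_counts(load_tweets('data.json')).most_common(10)` lists the ten most frequent hashtags.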
Advanced
Filtering the tweets
Twitter provides several parameters on each endpoint for selecting what tweets you want to retrieve. Additional culling is available by passing a list of filter functions.
def remove_mention_tweets(tweet):
    # Keep only tweets whose text contains no @mention.
    return '@' not in tweet['text']
data = vac.suck('statuses/user_timeline', params, filters=[remove_mention_tweets])
A filter function returns False to remove a tweet from the results and True to keep it.
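The rule above implies that a tweet survives only if every filter in the list returns True. The sketch below models that assumption with a hypothetical `apply_filters` helper. It is not tweetvac's implementation. The second filter, `remove_retweets`, is also illustrative and relies on native retweets carrying a `retweeted_status` field.

```python
from typing import Any, Callable, Dict, Iterable, List

Tweet = Dict[str, Any]
Filter = Callable[[Tweet], bool]


def apply_filters(tweets: Iterable[Tweet], filters: List[Filter]) -> List[Tweet]:
    # Keep a tweet only if every filter returns True; any False drops it.
    return [tweet for tweet in tweets if all(f(tweet) for f in filters)]


def remove_mention_tweets(tweet: Tweet) -> bool:
    return "@" not in tweet["text"]


def remove_retweets(tweet: Tweet) -> bool:
    # Hypothetical second filter: native retweets include "retweeted_status".
    return "retweeted_status" not in tweet
```

Because `all` stops at the first False, filters later in the list are not called for tweets that an earlier filter already rejected.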
Turning off the vacuum
tweetvac sucks down tweets until you hit your rate limit or exhaust the available tweets. To stop sooner, pass a cutoff function that returns True when tweetvac should stop.
import time
TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S +0000 %Y'
CUTOFF_DATE = time.strptime("Wed Jan 01 00:00:00 +0000 2014", TWITTER_TIME_FORMAT)
def stop(tweet):
    # Stop once tweets are older than the cutoff date.
    tweet_date = time.strptime(tweet['created_at'], TWITTER_TIME_FORMAT)
    return tweet_date < CUTOFF_DATE
data = vac.suck('statuses/user_timeline', params, cutoff=stop)
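A variant of the cutoff function builds the cutoff as a timezone-aware `datetime` once and returns a closure. `make_date_cutoff` is a hypothetical helper. The `%z` directive parses the `+0000` offset that Twitter includes in `created_at`. Because `%a` and `%b` are locale-dependent, this sketch assumes an English (C) locale.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict

TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def make_date_cutoff(cutoff: datetime) -> Callable[[Dict[str, Any]], bool]:
    """Return a cutoff function that is True once a tweet predates cutoff."""
    def stop(tweet: Dict[str, Any]) -> bool:
        created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
        return created < cutoff
    return stop


stop = make_date_cutoff(datetime(2014, 1, 1, tzinfo=timezone.utc))
```

You pass it the same way: `vac.suck('statuses/user_timeline', params, cutoff=stop)`.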
You can also pass a hard limit on the number of requests to stop tweetvac early:
data = vac.suck('statuses/user_timeline', params, max_requests=10)
Twitter API
Supported Endpoints
statuses/user_timeline - tweets by the specified user.
statuses/home_timeline - tweets by those followed by the authenticating user.
statuses/mentions_timeline - tweets mentioning the authenticating user.
statuses/retweets_of_me - tweets by the authenticating user that others have retweeted.
search/tweets - tweets matching a search query.
The endpoints have different request rate limits, count limits per request, and total tweet count limits.
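Because the count ceiling differs by endpoint, a small lookup table can fill in the largest count automatically. The values below are the per-request maximums from Twitter's v1.1 documentation, recalled from memory. Treat them as assumptions and check them against the current docs. `MAX_COUNT` and `build_params` are hypothetical names. If the caller passes an explicit `count`, it overrides the table value.

```python
from typing import Any, Dict

# Assumed per-request "count" ceilings for Twitter API v1.1; verify before use.
MAX_COUNT: Dict[str, int] = {
    "statuses/user_timeline": 200,
    "statuses/home_timeline": 200,
    "statuses/mentions_timeline": 200,
    "statuses/retweets_of_me": 100,
    "search/tweets": 100,
}


def build_params(endpoint: str, **params: Any) -> Dict[str, Any]:
    """Return a params dict defaulting count to the endpoint's maximum."""
    if endpoint not in MAX_COUNT:
        raise ValueError(f"unsupported endpoint: {endpoint}")
    # Caller-supplied keys, including count, take precedence over the default.
    return {"count": MAX_COUNT[endpoint], **params}
```

For example, `params = build_params('statuses/user_timeline', screen_name='struckDC')` produces the same dict as the earlier example.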
File details
Details for the file tweetvac-1.0.1.tar.gz
File metadata
- Download URL: tweetvac-1.0.1.tar.gz
- Size: 6.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0267ff1c8229c9cbf596cb64befb888b8a014e068dfe14fa243f6d8fc55cd71e
MD5 | fc99e4720daaec7bfc4fe4a4fa2e69fe
BLAKE2b-256 | 64d68a980b6daf7f08262ca8855bc8a8fb7a5cad8d3d866ae98ef6f6f68accdc
Details for the file tweetvac-1.0.1-py3-none-any.whl
File metadata
- Download URL: tweetvac-1.0.1-py3-none-any.whl
- Size: 6.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | f0792e58286786b79e23aed0143287b0e54fb465095cb7aad7f5032286ef46e6
MD5 | 4ca4cfaa9a8d12f499d01520f04fd455
BLAKE2b-256 | 7be5a9fa984a6a82179e158a8e82ec5f5270eb5edbaebaac5e7e6e887e75242c