This is a pre-production deployment of Warehouse, however changes made here WILL affect the production instance of PyPI.
Latest Version Dependencies status unknown Test status unknown Test coverage unknown
Project Description

Synopsis

A simple script to scrape for Tweets using the Python package requests to retrieve the content and Beautifullsoup4 to parse the retrieved content.

Motivation

Twitter has provided REST API’s which can be used by developers to access and read Twitter data. They have also provided a Streaming API which can be used to access Twitter Data in real-time.

Most of the software written to access Twitter data provide a library which functions as a wrapper around Twitters Search and Streaming API’s and therefore are limited by the limitations of the API’s.

With Twitter’s Search API you can only sent 180 Requests every 15 minutes. With a maximum number of 100 tweets per Request this means you can mine for 4 x 180 x 100 = 72.000 tweets per hour. By using TwitterScraper you are not limited by this number but by your internet speed/bandwith and the number of instances of TwitterScraper you are willing to start.

One of the bigger disadvantages of the Search API is that you can only access Tweets written in the past 7 days. This is a major bottleneck for anyone looking for older past data to make a model from. With TwitterScraper there is no such limitation.

Installation

To install twitterscraper:

(sudo) pip install twitterscraper

or you can clone the repository and in the folder containing setup.py

python setup.py install

The CLI

You can use the command line application to get your tweets stored to JSON right away:

twitterscraper "Trump OR Clinton" --limit 100 --output=tweets.json

Omit the limit to retrieve all tweets. You can at any time abort the scraping by pressing Ctrl+C, the scraped tweets will be stored safely in your JSON file.

Code Example

You can easily use TwitterScraper from within python:

from twitterscraper import query_tweets


# All tweets matching either Trump or Clinton will be returned. You will get at
# least 10 results within the minimal possible time/number of requests
for tweet in query_tweets("Trump OR Clinton", 10)[:10]:
    print(tweet.user.encode('utf-8'))

You can use any advanced query twitter supports. Simply compile your query at https://twitter.com/search-advanced.

You should get an output like this:

Tweet(user='@WiseFreeman', id='797020313569689601', timestamp='02:17 - 11. Nov. 2016', fullname='Solomon Freeman', text="Chinese Internet Companies Are In Danger After Trump's Victory http://fb.me/5NfxcdTn9\xa0")
Tweet(user='@TheRoseBushes', id='797020313099927552', timestamp='02:17 - 11. Nov. 2016', fullname='The Rose Bushes', text='Democrats Wonder If Bernie Sanders Could Have Beaten Trump http://ift.tt/2fHyWDL\xa0')
Tweet(user='@TvPrefeito', id='797020312869240833', timestamp='02:17 - 11. Nov. 2016', fullname='TvPrefeito', text='Na Casa Branca, Trump diz que vai pedir conselhos a Obama http://tinyurl.com/z97ll3c\xa0pic.twitter.com/aJeAjrldnM')
Tweet(user='@portlandor_agen', id='797020312655241217', timestamp='02:17 - 11. Nov. 2016', fullname='Portland Agent', text='Portland Police Say Anti-Trump Protest Is 'Riot' #donald https://dragplus.com/post/id/38524480\xa0…')
Tweet(user='@rpsmybb', id='797020312084905984', timestamp='02:17 - 11. Nov. 2016', fullname='Павел Афонин', text='Electoral College Electors: Electoral College Make Hillary Clinton President on December 19 - Подпишите петицию! http://fb.me/Rlv2qfWH\xa0')
Tweet(user='@Shane_Conneely', id='797020312038703104', timestamp='02:17 - 11. Nov. 2016', fullname='Shane Conneely', text="#Trump's public works scheme has odd historical symmetries, ironically tea-party fear of big gov spending may be the only restraint on him")
Tweet(user='@LavakyMohammed', id='797020311220785152', timestamp='02:17 - 11. Nov. 2016', fullname='Lavaky Mohammed', text='Usa-Un extrémiste Sioniste , milicien de Soros et anti Trump ! pic.twitter.com/PTDag2eQbO')
Tweet(user='@Isethoriginal', id='797020310641983489', timestamp='02:17 - 11. Nov. 2016', fullname='Iseth Goldstein', text='Just curious: If SJW are convinced Trump is a dictator, will they still be eager to repeal the 2nd amendment? Or will they buy mass ammo?')
Tweet(user='@italianfood_age', id='797020310134525952', timestamp='02:17 - 11. Nov. 2016', fullname='Italian Food agent', text='Portland Police Say Anti-Trump Protest Is 'Riot' #donald https://dragplus.com/post/id/38524480\xa0…')
Tweet(user='@laurac2605', id='797020310113570817', timestamp='02:17 - 11. Nov. 2016', fullname='Laura Ceccato', text='Why Clinton’s identity politics backfired http://www.spiked-online.com/newsite/article/why-clintons-identity-politics-backfired-trump-election/18960#.WCWah68m09A.twitter\xa0…  very interesting article')

TO DO

  • Add caching potentially? Would be nice to be able to resume scraping if something goes wrong and have half of the data of a request cached or so.
  • Add an example of using a thread pool/asynchio for gathering more tweets in parallel.
Release History

Release History

0.2.7

This version

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.6

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.5

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.4

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.3

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.2

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.1

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.4

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.3.2

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.3.1

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.3

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.2

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.2b0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.1

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

Download Files

Download Files

TODO: Brief introduction on what you do with files - including link to relevant help section.

File Name & Checksum SHA256 Checksum Help Version File Type Upload Date
twitterscraper-0.2.7.zip (11.7 kB) Copy SHA256 Checksum SHA256 Source Jan 10, 2017

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting