A Python package to get tweets by giving only a single keyword

Project description

Tweet Scraping

Prerequisites

  1. Internet Connection
  2. Python 3.6+
  3. Twitter API credentials (i.e., consumer key, consumer secret, access token, access token secret), obtained by creating an account on Twitter Dev
  4. The code writes its output as a CSV file in the same location as the code
  5. The dataset created is unique at the tweet_id level
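
One way to keep the four credentials out of source code is to read them from environment variables. The sketch below is only an illustration: the variable names and the load_credentials helper are hypothetical and are not part of tweetScraping.

```python
import os

# Hypothetical environment variable names; tweetScraping does not define these.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a tuple, in the order listed above."""
    env = os.environ if env is None else env
    # Collect every missing name so the error reports all of them at once.
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)
```

The returned tuple follows the same order as the credentials listed in item 3, so it can be unpacked into the first four arguments of the constructor.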

Installing Tweet Scraping

pip3 install tweetScraping

Using tweetScraping

Just import tweetScraping and call functions!

Code Usage:

import tweetScraping
a = tweetScraping.tweetScraping(consumer_key: str, consumer_secret: str, access_token: str, access_token_secret: str, query: str, [file_name: str], [no_of_tweets: int])
a.start()

Code Example:

NOTE: These are dummy keys and tokens, shown only for illustration. Replace them with your own credentials.

import tweetScraping
a = tweetScraping.tweetScraping('ghF98tufKbgWpGxHVbBTkx9L5' ,
                                'EiyUJ9aEdwTEKEe2HLuo8ZhBTJscztgaEpSBY38YZhSUkq1Az4',
                                '1099325182525661186-9dn78kOA4Z09plZWPHrn9nmgdukg6j',
                                'dZMfqR9O4eCQLvS0bnWNYr9eivjS4wtwsPY8WnBugR5xJ',
                                'GOT',
                                1000)
a.start()

This writes a CSV file named GOT.csv containing 1000 tweets. The number of tweets can be increased.
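
The column descriptions on this page note that tweet_id and user_id are wrapped in '~' so that no digits are lost. A minimal sketch of reading the output back and removing those wrappers follows. It assumes the CSV has a header row whose names match the documented column names; the read_tweets helper and the sample values are made up for illustration.

```python
import csv
import io


def read_tweets(csv_file):
    """Yield rows as dicts, with the '~' wrappers removed from the ID columns."""
    for row in csv.DictReader(csv_file):
        for column in ("tweet_id", "user_id"):
            if row.get(column):
                # IDs stay strings, so long values keep every digit.
                row[column] = row[column].strip("~")
        yield row


# Tiny in-memory stand-in for GOT.csv; the values are invented.
sample = io.StringIO("tweet_id,user_id,retweets\n~123456789012345678~,~42~,3\n")
rows = list(read_tweets(sample))
print(rows[0]["tweet_id"])
```

For a real run, the same function can be given a file opened with open("GOT.csv", newline=""), adding an encoding argument if the file needs one.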

Description of the 34 columns created as structured data from Twitter's unstructured data

1) tweet_id: The tweet ID, prefixed and suffixed by '~' so that no digits are lost
2) tweet_created_at: When the tweet was posted on Twitter
3) tweet_created_on_holiday_bool: A boolean indicating whether the tweet was posted on a national holiday (True: Yes, False: No)
4) tweet_created_on_weekend_bool: A boolean indicating whether the tweet was posted on a weekend (True: Yes, False: No)
5) tweet_created_at_noon_bool: A boolean indicating whether the tweet was posted during noon hours (True: Yes, False: No)
6) tweet_created_at_eve_bool: A boolean indicating whether the tweet was posted during evening hours (True: Yes, False: No)
7) user_id: The ID of the Twitter user account from which the tweet was posted, prefixed and suffixed by '~' so that no digits are lost
8) user_screen_name: The screen name of the user who posted the tweet, in its original case (camel case remains as is)
9) user_screen_name_length: Length of the screen name of the user who posted the tweet
10) user_no_of_tweets: Number of tweets posted by the user from the account creation date until the date this code is executed
11) user_no_of_followers: Number of followers of the user who posted the tweet
12) user_no_of_followings: Number of accounts followed by the user who posted the tweet
13) user_account_age: Age of the Twitter account of the user who posted the tweet (current date - account creation date)
14) user_no_of_favourites: Number of tweets liked by the user who posted the tweet
15) user_average_tweets: Average number of tweets per day posted by the user who posted the tweet
16) user_average_favourites: Average number of tweets per day liked by the user who posted the tweet
17) user_account_location: The geographical location (if shared) of the user who posted the tweet
18) tweet_text: Tweet text after cleaning (cleaning done for (@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+), and case standardization)
19) tweet_text_length: Length of the tweet text after cleaning
20) tweet_text_optimal_length: A boolean indicating whether the tweet was of optimal length or less before cleaning (True: Yes, False: No)
21) tweet_text_no_of_hashtags: Number of hashtags in the original tweet text before cleaning
22) tweet_text_contains_hashtags: A boolean indicating whether the tweet had a hashtag before cleaning (True: Yes, False: No)
23) tweet_text_contains_url: A boolean indicating whether the tweet had any URL embedded before cleaning (True: Yes, False: No)
24) tweet_text_no_of_user_mentions: Number of other screen names tagged in the tweet text using '@'
25) tweet_text_contains_user_mentions: A boolean indicating whether the tweet had any user mentions before cleaning (True: Yes, False: No)
26) tweet_text_sentiment: The sentiment of the tweet
27) tweet_text_contains_media: A boolean indicating whether the tweet had any multimedia before cleaning (True: Yes, False: No)
28) tweet_text_contains_number: A boolean indicating whether the tweet had any numbers before cleaning (True: Yes, False: No)
29) tweet_text_contains_upper_words: A boolean indicating whether the tweet had upper case words for emphasis before cleaning (True: Yes, False: No)
30) tweet_text_contains_lower_words: A boolean indicating whether the tweet had lower case words before cleaning (True: Yes, False: No)
31) tweet_text_contains_excl: A boolean indicating whether the tweet had exclamation marks before cleaning (True: Yes, False: No)
32) tweet_text_contains_retweet_suggestion: A boolean indicating whether the tweet had 'RT' asking for a retweet before cleaning (True: Yes, False: No)
33) retweeted: A boolean indicating whether the tweet received any retweets (True: Yes, False: No)
34) retweets: Number of retweets the tweet had received at the time this code is run
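
To make the tweet_text cleaning step more concrete, here is a minimal sketch built around the pattern quoted in the tweet_text description. It is not the package's actual code. Replacing matches with a space, collapsing whitespace, and lower-casing as the "case standardization" are all assumptions, and the text_flags helper only approximates a few of the boolean columns using simple regular expressions.

```python
import re

# Pattern copied from the tweet_text description above.
CLEAN_PATTERN = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")


def clean_tweet(text):
    """Drop mentions, URLs and non-alphanumeric characters, then lower-case."""
    stripped = CLEAN_PATTERN.sub(" ", text)
    # Collapse the runs of spaces left behind by the substitution.
    return " ".join(stripped.split()).lower()


def text_flags(text):
    """Approximate a few documented columns from the raw (uncleaned) text."""
    hashtags = re.findall(r"#\w+", text)
    return {
        "tweet_text_no_of_hashtags": len(hashtags),
        "tweet_text_contains_hashtags": bool(hashtags),
        "tweet_text_contains_url": bool(re.search(r"\w+://\S+", text)),
        "tweet_text_contains_excl": "!" in text,
    }


raw = "Loving #GOT tonight! @HBO https://t.co/abc123"
print(clean_tweet(raw))
print(text_flags(raw))
```

The flags are computed on the raw text because the descriptions say they reflect the tweet before cleaning; once the pattern has been applied, hashtags, URLs and exclamation marks are no longer recoverable.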

Download files

Source Distribution

tweetScraping-1.0.2.tar.gz (9.0 kB)

Uploaded Source

Built Distribution

tweetScraping-1.0.2-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file tweetScraping-1.0.2.tar.gz.

File metadata

  • Download URL: tweetScraping-1.0.2.tar.gz
  • Upload date:
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.8

File hashes

Hashes for tweetScraping-1.0.2.tar.gz
Algorithm Hash digest
SHA256 fa8d87902b3046eeb7708ee85f863755dbd61121b2dd8c0b756b0ae3a11d18ea
MD5 098588c4c70be3b182ce51230feec71f
BLAKE2b-256 8bd29a4aed72734844788314c3f25ffdf798e8428cfe80e1fb83c6be4c7ac26e
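
A downloaded source distribution can be checked against the SHA256 digest listed above. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes the file has already been downloaded to a local path.

```python
import hashlib

# SHA256 digest listed above for tweetScraping-1.0.2.tar.gz.
EXPECTED_SHA256 = "fa8d87902b3046eeb7708ee85f863755dbd61121b2dd8c0b756b0ae3a11d18ea"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True when the file's digest equals the expected hex digest."""
    return sha256_of(path) == expected
```

For example, matches_published_hash("tweetScraping-1.0.2.tar.gz") would be expected to return True for an unmodified copy of the file in the current directory.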

File details

Details for the file tweetScraping-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: tweetScraping-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 7.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.8.8

File hashes

Hashes for tweetScraping-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e55a216371ce19d7c97ee96c7f4d5a14970f65f36d0df655c85adafa270cc3ef
MD5 ae40c6542fb159d65c74e4089e3572c7
BLAKE2b-256 8099debef9cf9ca2b1064abc21db35d8b36165b50c61938e1bbb40c396a9e763
