A Python package to fetch tweets given only a single keyword
Tweet Scraping
Prerequisites
- Internet Connection
- Python 3.6+
- Twitter API credentials (i.e., consumer key, consumer secret, access token, access token secret), obtained by creating an account on Twitter Dev
- The code will write its output as a CSV file in the same location as the code
- The dataset created will be unique at the tweet_id level (one row per tweet)
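Rather than hard-coding the four credentials in a script, one option is to read them from environment variables. The sketch below is only an illustration: the environment variable names and the `load_credentials` helper are hypothetical and are not part of tweetScraping.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a tuple, in the order listed above."""
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error names them all.
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)
```

The returned tuple follows the same order as the first four arguments shown under Code Usage, so it can be unpacked directly into the call.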
Installing Tweet Scraping
pip3 install tweetScraping
Using tweetScraping
Just import tweetScraping and call functions!
Code Usage:
import tweetScraping
a = tweetScraping.tweetScraping(consumer_key: str, consumer_secret: str, access_token: str, access_token_secret: str, query: str, [file_name: str], [no_of_tweets: int])
a.start()
Code Example:
NOTE: These are dummy keys and tokens, shown for illustration only. Please replace them with your own credentials.
import tweetScraping
a = tweetScraping.tweetScraping(
    'ghF98tufKbgWpGxHVbBTkx9L5',                           # consumer key
    'EiyUJ9aEdwTEKEe2HLuo8ZhBTJscztgaEpSBY38YZhSUkq1Az4',  # consumer secret
    '1099325182525661186-9dn78kOA4Z09plZWPHrn9nmgdukg6j',  # access token
    'dZMfqR9O4eCQLvS0bnWNYr9eivjS4wtwsPY8WnBugR5xJ',       # access token secret
    'GOT',                                                 # query keyword
    1000)                                                  # number of tweets
a.start()
This will output a CSV file named GOT.csv containing 1000 tweets. The number of tweets (1000 here) can be increased further.
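Once the CSV file exists, it can be read back with the standard library. The sketch below assumes the file has a header row and is UTF-8 encoded (neither is confirmed by this page), and it strips the '~' markers that wrap the tweet_id and user_id columns described in the column list. The helper names `strip_id_markers` and `load_tweets` are hypothetical, not part of tweetScraping.

```python
import csv


def strip_id_markers(value):
    """Remove the '~' prefix and suffix that wrap tweet_id and user_id."""
    return value.strip("~")


def load_tweets(path):
    """Read the output CSV into a list of dicts with the ID markers removed."""
    # Assumption: the file is UTF-8 encoded and starts with a header row.
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    for row in rows:
        for column in ("tweet_id", "user_id"):
            if row.get(column) is not None:
                row[column] = strip_id_markers(row[column])
    return rows
```

The IDs are kept as strings after stripping, which avoids the digit loss that the '~' wrapping is meant to prevent when the file is opened in a spreadsheet.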
Description of the 34 columns of structured data created from unstructured Twitter data
1) tweet_id: The tweet ID, prefixed and suffixed with '~' so that no digits are lost
2) tweet_created_at: When the tweet was posted on Twitter
3) tweet_created_on_holiday_bool: Whether the tweet was posted on a national holiday (True: Yes, False: No)
4) tweet_created_on_weekend_bool: Whether the tweet was posted on a weekend (True: Yes, False: No)
5) tweet_created_at_noon_bool: Whether the tweet was posted during noon hours (True: Yes, False: No)
6) tweet_created_at_eve_bool: Whether the tweet was posted during evening hours (True: Yes, False: No)
7) user_id: ID of the Twitter user account that posted the tweet, prefixed and suffixed with '~' so that no digits are lost
8) user_screen_name: Screen name of the account that posted the tweet, with its original casing preserved (a camel-case name stays as is)
9) user_screen_name_length: Length of the screen name of the account that posted the tweet
10) user_no_of_tweets: Number of tweets posted by the account from its creation date until the date this code is executed
11) user_no_of_followers: Number of followers of the account that posted the tweet
12) user_no_of_followings: Number of accounts followed by the account that posted the tweet
13) user_account_age: Age of the account that posted the tweet (current date - account creation date)
14) user_no_of_favourites: Number of tweets liked by the account that posted the tweet
15) user_average_tweets: Average number of tweets per day posted by the account that posted the tweet
16) user_average_favourites: Average number of tweets per day liked by the account that posted the tweet
17) user_account_location: Geographical location (if shared) of the account that posted the tweet
18) tweet_text: Tweet text after cleaning (cleaning done for (@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+), and case standardization)
19) tweet_text_length: Length of the tweet text after cleaning
20) tweet_text_optimal_length: Whether the tweet was of optimal length or less before cleaning (True: Yes, False: No)
21) tweet_text_no_of_hashtags: Number of hashtags present in the original tweet text before cleaning
22) tweet_text_contains_hashtags: Whether the tweet had a hashtag before cleaning (True: Yes, False: No)
23) tweet_text_contains_url: Whether the tweet had any URL embedded before cleaning (True: Yes, False: No)
24) tweet_text_no_of_user_mentions: Number of other screen names tagged in the tweet text using '@'
25) tweet_text_contains_user_mentions: Whether the tweet had any user mentions before cleaning (True: Yes, False: No)
26) tweet_text_sentiment: The sentiment of the tweet
27) tweet_text_contains_media: Whether the tweet had any multimedia before cleaning (True: Yes, False: No)
28) tweet_text_contains_number: Whether the tweet had any numbers before cleaning (True: Yes, False: No)
29) tweet_text_contains_upper_words: Whether the tweet had upper-case words, used for emphasis, before cleaning (True: Yes, False: No)
30) tweet_text_contains_lower_words: Whether the tweet had lower-case words before cleaning (True: Yes, False: No)
31) tweet_text_contains_excl: Whether the tweet had exclamation marks before cleaning (True: Yes, False: No)
32) tweet_text_contains_retweet_suggestion: Whether the tweet had 'RT', asking for a retweet, before cleaning (True: Yes, False: No)
33) retweeted: Whether the tweet received any retweets (True: Yes, False: No)
34) retweets: Number of retweets the tweet had received at the time this code was run
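To make the tweet_text cleaning step and a few of the "before cleaning" flags concrete, here is a minimal sketch built around the pattern quoted in the tweet_text description. It is not the package's actual implementation: replacing matches with a space, lowercasing as the "case standardization" step, the hashtag pattern, and the function names `clean_tweet` and `raw_text_flags` are all assumptions made for illustration.

```python
import re

# Pattern quoted in the tweet_text description: user mentions,
# non-alphanumeric characters, and URLs.
CLEAN_PATTERN = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")
# Assumed helper patterns for the flags computed on the raw text.
URL_PATTERN = re.compile(r"\w+:\/\/\S+")
HASHTAG_PATTERN = re.compile(r"#\w+")


def clean_tweet(text):
    """Replace each match with a space, collapse whitespace, then lowercase."""
    # Lowercasing is an assumption about what "case standardization" means.
    return " ".join(CLEAN_PATTERN.sub(" ", text).split()).lower()


def raw_text_flags(text):
    """Compute a few of the flags on the original text, before cleaning."""
    hashtags = HASHTAG_PATTERN.findall(text)
    return {
        "tweet_text_no_of_hashtags": len(hashtags),
        "tweet_text_contains_hashtags": bool(hashtags),
        "tweet_text_contains_url": URL_PATTERN.search(text) is not None,
        "tweet_text_contains_excl": "!" in text,
    }
```

For example, `clean_tweet("RT @user: Check https://t.co/abc NOW!!! #GOT")` drops the mention, the URL, and the punctuation, leaving only the plain words. The flags have to be computed on the raw text because cleaning removes the very characters they look for.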