A twarc plugin to output Twitter data as CSV
twarc-csv
This module adds CSV export for tweets to twarc.
Make sure twarc is installed and configured:
pip3 install --upgrade twarc
twarc2 configure
Install this plugin:
pip3 install --upgrade twarc-csv
A new csv command will be available in twarc. If you have collected some tweets in a file tweets.jsonl, you can now convert them to CSV:
twarc2 search --limit 500 "blacklivesmatter" tweets.jsonl # collect some tweets
twarc2 csv tweets.jsonl tweets.csv # convert to CSV
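Once the CSV exists, it can be read with any CSV-aware tool. The following is a minimal sketch using Python's standard `csv` module. The sample data and the column names `id` and `text` are illustrative assumptions rather than actual plugin output. It shows why a real CSV parser is preferable to splitting the file on line breaks: tweet text can contain newlines and commas inside a quoted field.

```python
import csv
import io

# Illustrative stand-in for a tweets.csv file. The columns "id" and "text"
# are assumed here; a real export contains many more columns.
SAMPLE_CSV = 'id,text\n1,"first line\nsecond line"\n2,"plain tweet, with a comma"\n'


def read_rows(handle):
    """Parse CSV rows into dictionaries keyed by the header row."""
    return list(csv.DictReader(handle))


rows = read_rows(io.StringIO(SAMPLE_CSV, newline=""))

# Two tweets are recovered, even though the raw text spans three data lines.
print(len(rows), "rows parsed")
print(len(SAMPLE_CSV.splitlines()) - 1, "physical lines after the header")
```

To read a real file instead of the in-memory sample, pass `open("tweets.csv", newline="", encoding="utf-8")` to `read_rows`.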
Extra Command Line Options
Run the following command for a list of options:
twarc2 csv --help
Usage: twarc2 csv [OPTIONS] [INFILE] [OUTFILE]
Convert tweets to CSV.
Options:
--input-data-type [tweets|users|counts|compliance|lists]
Input data type - you can turn "tweets",
"users", "counts" or "compliance" or "lists"
data into CSV.
--inline-referenced-tweets / --no-inline-referenced-tweets
Output referenced tweets inline as separate
rows. Default: no.
--merge-retweets / --no-merge-retweets
Merge original tweet metadata into retweets.
The Retweet Text, metrics and entities are
merged from the original tweet. Default:
Yes.
--process-entities / --no-process-entities
Preprocess entities like URLs, mentions and
hashtags, providing expanded urls and lists
only instead of full json objects. Default:
Yes.
--json-encode-all / --no-json-encode-all
JSON encode / escape all fields. Default: no
--json-encode-text / --no-json-encode-text
Apply JSON encode / escape to text fields.
Default: no
--json-encode-lists / --no-json-encode-lists
JSON encode / escape lists. Default: yes
--allow-duplicates List every tweets as is, including
duplicates. Default: No, only unique tweets
per row. Retweets are not duplicates.
--extra-input-columns TEXT Manually specify extra input columns. Comma
separated string. Only modify this if you
have processed the json yourself. Default
output is all available object columns, no
extra input columns.
--output-columns TEXT Specify what columns to output in the CSV.
Default is all input columns.
--batch-size INTEGER How many lines to process per chunk. Default
is 100. Reduce this if output is slow.
--hide-stats Hide stats about the dataset on completion.
Always hidden if you're using stdin / stdout
pipes.
--hide-progress Hide the Progress bar. Always hidden if
you're using stdin / stdout pipes.
--help Show this message and exit.
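Because lists are JSON encoded by default (`--json-encode-lists`), a cell that holds a list needs to be decoded again before use. The sketch below assumes a hypothetical column named `hashtags` and an illustrative sample; the exact column names and cell formatting in a real export may differ.

```python
import csv
import io
import json

# Hypothetical excerpt of an export. "hashtags" is an assumed column name,
# and the cell content is an assumed example of a JSON-encoded list.
SAMPLE_CSV = 'id,hashtags\n1,"[""blacklivesmatter"", ""blm""]"\n2,\n'


def decode_list(cell):
    """Return the list stored in a JSON-encoded cell, or [] for an empty cell."""
    return json.loads(cell) if cell else []


decoded = {}
for row in csv.DictReader(io.StringIO(SAMPLE_CSV, newline="")):
    decoded[row["id"]] = decode_list(row["hashtags"])
    print(row["id"], decoded[row["id"]])
```

Treating empty cells as empty lists keeps downstream code from having to special-case tweets without the field.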
Issues with Twitter Data in CSV
CSV isn't the best choice for storing Twitter data. Always keep the original API responses, and perform feature extraction on the JSON objects.
This export script is intended as a convenience for importing samples of data into other tools. There are many ways to format a CSV of tweets, and this is just one of them.
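As an illustration of feature extraction on the JSON objects directly, here is a minimal sketch that counts hashtags in a JSONL file. It assumes each line is one API response with a `data` list of tweets, and that hashtags appear under `entities.hashtags` with a `tag` key; the sample data is invented for the example, and some responses may be shaped differently.

```python
import json
from collections import Counter

# Two invented JSONL lines standing in for tweets.jsonl.
SAMPLE_JSONL = "\n".join([
    json.dumps({"data": [
        {"id": "1", "text": "a", "entities": {"hashtags": [{"tag": "BLM"}]}},
        {"id": "2", "text": "b"},  # a tweet without entities
    ]}),
    json.dumps({"data": [
        {"id": "3", "text": "c",
         "entities": {"hashtags": [{"tag": "blm"}, {"tag": "news"}]}},
    ]}),
])


def count_hashtags(lines):
    """Count lowercased hashtags across all tweets in the given JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        response = json.loads(line)
        for tweet in response.get("data", []):
            for hashtag in tweet.get("entities", {}).get("hashtags", []):
                counts[hashtag["tag"].lower()] += 1
    return counts


hashtag_counts = count_hashtags(SAMPLE_JSONL.splitlines())
print(hashtag_counts.most_common())
```

For a real file, iterate over the open file handle instead of the sample, for example `count_hashtags(open("tweets.jsonl", encoding="utf-8"))`.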
Contributing
Suggestions, opinions, and pull requests are welcome and encouraged. Even if you are only interested in using this plugin, post your use case in the Issues.
Source Distribution
File details
Details for the file twarc-csv-0.7.2.tar.gz.
File metadata
- Download URL: twarc-csv-0.7.2.tar.gz
- Upload date:
- Size: 9.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/6.1.0 pkginfo/1.7.0 requests/2.28.2 requests-toolbelt/0.9.1 tqdm/4.65.0 CPython/3.7.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8d62f426bd6c7dd0b7848078382ace2e847843e2598fc91b0e88ae42888ec9f4
MD5 | d48776a67cb475ff7ee0604ceffe05c4
BLAKE2b-256 | 33c5cabde70e45eeec51b550a2f581d812b3bb7b3f3d01381d31acda1a7963f4