
A twarc plugin to output Twitter data as CSV

twarc-csv

This module adds CSV Export for Tweets to twarc.

Make sure twarc is installed and configured:

pip3 install --upgrade twarc
twarc2 configure

Install this plugin:

pip3 install --upgrade twarc-csv

A new csv command will then be available in twarc2. If you have collected some tweets in a file, tweets.jsonl, you can convert them to CSV:

twarc2 search --limit 500 "blacklivesmatter" tweets.jsonl # collect some tweets
twarc2 csv tweets.jsonl tweets.csv # convert to CSV
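
Once the conversion has run, the CSV can be read with any standard CSV reader. A minimal Python sketch follows; it uses an in-memory sample rather than a real export, and the column names `id`, `author.username` and `text` are assumptions for illustration, so check the header row of your own file.

```python
import csv
import io

# Hypothetical two-row sample standing in for tweets.csv; real exports
# contain many more columns, and these column names are assumptions.
SAMPLE_CSV = (
    "id,author.username,text\n"
    '1,alice,"Hello, world"\n'
    '2,bob,"Line one\nline two"\n'
)


def read_rows(handle):
    """Return every CSV row as a dict keyed by the header row."""
    return list(csv.DictReader(handle))


# newline="" leaves line breaks untouched so the csv module can handle
# the ones that occur inside quoted fields (tweet text often has them).
rows = read_rows(io.StringIO(SAMPLE_CSV, newline=""))
for row in rows:
    print(row["id"], row["author.username"], repr(row["text"]))
```

For a real file, the same function should work with `open("tweets.csv", newline="", encoding="utf-8")` in place of the `io.StringIO` object.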

Extra Command Line Options

For a list of options, run:

twarc2 csv --help

Usage: twarc2 csv [OPTIONS] [INFILE] [OUTFILE]

  Convert tweets to CSV.

Options:
  --input-data-type [tweets|users|counts|compliance|lists]
                                  Input data type - you can turn "tweets",
                                  "users", "counts" or "compliance" or "lists"
                                  data into CSV.
  --inline-referenced-tweets / --no-inline-referenced-tweets
                                  Output referenced tweets inline as separate
                                  rows. Default: no.
  --merge-retweets / --no-merge-retweets
                                  Merge original tweet metadata into retweets.
                                  The Retweet Text, metrics and entities are
                                  merged from the original tweet. Default:
                                  Yes.
  --process-entities / --no-process-entities
                                  Preprocess entities like URLs, mentions and
                                  hashtags, providing expanded urls and lists
                                  only instead of full json objects. Default:
                                  Yes.
  --json-encode-all / --no-json-encode-all
                                  JSON encode / escape all fields. Default: no
  --json-encode-text / --no-json-encode-text
                                  Apply JSON encode / escape to text fields.
                                  Default: no
  --json-encode-lists / --no-json-encode-lists
                                  JSON encode / escape lists. Default: yes
  --allow-duplicates              List every tweets as is, including
                                  duplicates. Default: No, only unique tweets
                                  per row. Retweets are not duplicates.
  --extra-input-columns TEXT      Manually specify extra input columns. Comma
                                  separated string. Only modify this if you
                                  have processed the json yourself. Default
                                  output is all available object columns, no
                                  extra input columns.
  --output-columns TEXT           Specify what columns to output in the CSV.
                                  Default is all input columns.
  --batch-size INTEGER            How many lines to process per chunk. Default
                                  is 100. Reduce this if output is slow.
  --hide-stats                    Hide stats about the dataset on completion.
                                  Always hidden if you're using stdin / stdout
                                  pipes.
  --hide-progress                 Hide the Progress bar. Always hidden if
                                  you're using stdin / stdout pipes.
  --help                          Show this message and exit.
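
Because `--json-encode-lists` defaults to yes, a list-valued field arrives in the CSV as a single JSON-encoded cell. The sketch below decodes such a cell back into a Python list. The column name `entities.hashtags` and the sample values are assumptions for illustration, not confirmed output of the plugin.

```python
import csv
import io
import json

# Hypothetical sample: a list-valued column stored as a JSON string.
# Inside a quoted CSV field, a literal double quote is written as "".
SAMPLE_CSV = (
    "id,entities.hashtags\n"
    '1,"[""blacklivesmatter"", ""news""]"\n'
    "2,\n"
)


def decode_list_cell(cell):
    """Decode a JSON-encoded list cell; an empty cell becomes an empty list."""
    return json.loads(cell) if cell else []


hashtags_by_id = {
    row["id"]: decode_list_cell(row["entities.hashtags"])
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV, newline=""))
}
print(hashtags_by_id)
```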

Issues with Twitter Data in CSV

CSV isn't the best choice for storing Twitter data. Always keep the original API responses, and perform feature extraction on the JSON objects.
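
As a rough illustration of extracting features from the JSON rather than from the CSV, the sketch below counts hashtags across line-delimited JSON. It assumes each line is one API response with the tweets in a `data` list and hashtags under `entities.hashtags[].tag`; the sample data is made up, so adjust the keys to match your own file.

```python
import io
import json
from collections import Counter

# Made-up sample standing in for tweets.jsonl: one API response per line.
SAMPLE_JSONL = "\n".join([
    json.dumps({"data": [
        {"id": "1", "text": "a", "entities": {"hashtags": [{"tag": "BLM"}]}},
        {"id": "2", "text": "b"},  # no entities at all
    ]}),
    json.dumps({"data": [
        {"id": "3", "text": "c",
         "entities": {"hashtags": [{"tag": "blm"}, {"tag": "news"}]}},
    ]}),
])


def count_hashtags(handle):
    """Count lowercased hashtags over every tweet in every response line."""
    counts = Counter()
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        response = json.loads(line)
        for tweet in response.get("data", []):
            for hashtag in tweet.get("entities", {}).get("hashtags", []):
                counts[hashtag["tag"].lower()] += 1
    return counts


hashtag_counts = count_hashtags(io.StringIO(SAMPLE_JSONL))
print(hashtag_counts.most_common())
```

Working from the JSON keeps nested fields intact, so a different feature can be extracted later without re-collecting the data.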

This export script is intended as a convenience for importing samples of data into other tools. There are many ways to format a CSV of tweets, and this is just one of them.

Contributing

Suggestions, opinions, and pull requests are welcome and encouraged. Even if you are just interested in using this plugin, post your use case in the Issues.


Source Distribution

twarc-csv-0.7.2.tar.gz (9.7 kB)

File hashes

Hashes for twarc-csv-0.7.2.tar.gz:

Algorithm    Hash digest
SHA256       8d62f426bd6c7dd0b7848078382ace2e847843e2598fc91b0e88ae42888ec9f4
MD5          d48776a67cb475ff7ee0604ceffe05c4
BLAKE2b-256  33c5cabde70e45eeec51b550a2f581d812b3bb7b3f3d01381d31acda1a7963f4
