Tidies Twitter JSON collected with Twarc into relational tables
tidy_tweet
tidy_tweet produces a SQLite database that is ideal for importing into analytical tools, or for use as a data source in a programmatic analytical workflow, which is more efficient than working directly from the raw JSON. However, we always recommend retaining the raw JSON data: think of tidy_tweet and its resulting databases as the first step of data pre-processing, rather than as the original/raw data for your project.
WARNING - tidy_tweet is still a preliminary release. Not all data fields are loaded into the database, and we can't guarantee that there will be no breaking changes to either the library interface or the database schema before the 1.0 release. Most notably, the database schema will change significantly to allow multiple JSON files to be loaded into the same database file.
Installation
tidy_tweet is a Python package and can be installed with pip.
Short version of installation instructions:
pip install tidy-tweet
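Because the library interface and database schema may change before 1.0, it can be useful to record which version is installed alongside any database you create. The following is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical, and the distribution name tidy-tweet is assumed from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="tidy-tweet"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not present in this environment
        return None


# Prints the version string, or "None" if the package is not installed
print(f"tidy-tweet version: {installed_version()}")
```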
Usage
A command-line interface (CLI) is planned for the future, but is not yet implemented.
Using tidy_tweet as a Python library
Here is an example using the test data file included with tidy_tweet:
import sqlite3

from tidy_tweet import initialise_sqlite, load_twarc_json_to_sqlite

# Initialise the database file, then load the Twarc JSON into it
initialise_sqlite('ObservatoryTeam.db')
load_twarc_json_to_sqlite('tests/data/ObservatoryTeam.jsonl', 'ObservatoryTeam.db')

# Query the database with the standard sqlite3 module
with sqlite3.connect('ObservatoryTeam.db') as connection:
    db = connection.cursor()
    db.execute("select count(*) from tweet")
    print(f"There are {db.fetchone()[0]} tweets in the database!")
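Since not all data fields are loaded yet and the schema may change, it can help to inspect what a given database file actually contains rather than rely on fixed table names. The sketch below uses only the standard sqlite3 module and SQLite's built-in sqlite_master catalogue. The helper name table_row_counts is hypothetical and is not part of tidy_tweet.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    connection = sqlite3.connect(db_path)
    try:
        cursor = connection.cursor()
        # sqlite_master is SQLite's built-in catalogue of schema objects
        cursor.execute(
            "select name from sqlite_master where type = 'table' order by name"
        )
        # Skip SQLite's own internal tables (their names start with sqlite_)
        names = [
            row[0] for row in cursor.fetchall()
            if not row[0].startswith("sqlite_")
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names are handled safely
            quoted = '"' + name.replace('"', '""') + '"'
            cursor.execute(f"select count(*) from {quoted}")
            counts[name] = cursor.fetchone()[0]
        return counts
    finally:
        connection.close()
```

Calling table_row_counts('ObservatoryTeam.db') on the database from the example above should return a dictionary that includes the tweet table. The other table names depend on the tidy_tweet version that created the file.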
About tidy_tweet
tidy_tweet is created and maintained by the QUT Digital Observatory and is open-sourced under an MIT license. We welcome contributions and feedback!
A DOI and citation information will be added in future.
Hashes for tidy_tweet-0.2.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6f98cfa2e9a4c6fd2af7c49efda7ef307c7fddb2d1b6efc21da1c8c6a6a93e63
MD5 | bec27198999a2fc98ad7dc3d0640115e
BLAKE2b-256 | 9b8bf51a28034d087160c951db6c8b7fb1a21c0014de02207d69d8e8ca2ded7d