Data Fluent for PostgreSQL
Build a better understanding of your data in PostgreSQL.
Data Fluent generates a report that gives the number of rows, columns and bytes, as well as a human-friendly size, for each table within a given PostgreSQL database.
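As a rough illustration of the human-friendly size column, the sketch below converts a raw byte count into a readable string using 1024-byte units, similar in spirit to PostgreSQL's own pg_size_pretty() function. The helper name human_size and its exact output format are assumptions made for illustration; Data Fluent's own formatting may differ.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with binary (1024-based) units.

    Hypothetical helper for illustration; not part of Data Fluent.
    """
    units = ["bytes", "kB", "MB", "GB", "TB"]
    size = float(num_bytes)
    # Step up one unit at a time until the value drops below 1024.
    for unit in units[:-1]:
        if size < 1024:
            # Whole bytes need no decimal place; larger units get one.
            return f"{int(size)} bytes" if unit == "bytes" else f"{size:.1f} {unit}"
        size /= 1024
    # Anything that reaches the last unit is reported in TB.
    return f"{size:.1f} {units[-1]}"


print(human_size(512))            # 512 bytes
print(human_size(5 * 1024 ** 3))  # 5.0 GB
```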
The report also shows, for every column that holds a date, the row count grouped by year and month.
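The year-and-month grouping can be sketched in plain Python as follows. This is an assumed illustration of the idea rather than Data Fluent's implementation; in SQL the same result is typically obtained by grouping on date_trunc('month', some_date_column), where some_date_column is a placeholder. The helper name monthly_counts is hypothetical, and NULL values (None) are skipped.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Optional, Tuple


def monthly_counts(values: Iterable[Optional[date]]) -> Dict[Tuple[int, int], int]:
    """Count non-null dates per (year, month), ordered chronologically.

    Hypothetical helper for illustration; not part of Data Fluent.
    """
    counts = Counter((d.year, d.month) for d in values if d is not None)
    # Sorting the (year, month) keys gives a chronological distribution.
    return dict(sorted(counts.items()))


sample = [date(2018, 1, 5), date(2018, 1, 20), date(2018, 2, 1), None]
print(monthly_counts(sample))  # {(2018, 1): 2, (2018, 2): 1}
```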
Installation
On Ubuntu 20:
$ wget -qO- \
https://www.postgresql.org/media/keys/ACCC4CF8.asc \
| sudo apt-key add -
$ echo "deb http://apt.postgresql.org/pub/repos/apt/ focal-pgdg main" \
| sudo tee /etc/apt/sources.list.d/pgdg.list
$ sudo apt update
$ sudo apt install \
git \
python3-pip \
python3-virtualenv \
postgresql-13 \
postgresql-client-13
On macOS:
$ brew install \
git \
postgresql \
virtualenv
Then, regardless of platform, set up a virtual environment and install this package.
$ virtualenv ~/.fluency
$ source ~/.fluency/bin/activate
$ python3 -m pip install datafluent
Example Analysis
Clone FiveThirtyEight's data repository. It has a large number of CSV-formatted datasets they've used for some of their articles and graphics.
$ git clone https://github.com/fivethirtyeight/data.git ~/538data
Make sure you can access a PostgreSQL database on your machine. Below I'll grant access to my account on my Ubuntu 20 machine. Please adjust the username and password for your system.
$ sudo -u postgres \
bash -c "psql -c \"CREATE USER mark
WITH PASSWORD 'test'
SUPERUSER;\""
With access set up, I'll create a PostgreSQL database called intel.
$ createdb intel
csvkit, which is used below to import a CSV file into PostgreSQL, has a few dependencies that need to be installed ahead of time.
To install Ubuntu 20's dependencies for csvkit run:
$ sudo apt-get install \
libicu-dev \
pkg-config
To install macOS's dependencies for csvkit run:
$ brew install icu4c
Then, regardless of platform, run:
$ python3 -m pip install csvkit
I'll import one of the datasets within FiveThirtyEight's repository. Because the dates in this dataset are formatted as MM/DD/YYYY rather than YYYY-MM-DD, I override the date parsing so that csvsql interprets them correctly.
$ csvsql --db postgresql:///intel \
--insert ~/538data/congress-generic-ballot/generic_topline_historical.csv \
--datetime-format="%m/%d/%Y"
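The --datetime-format value is a strptime-style format string (%m is the month, %d the day and %Y the four-digit year). The short check below, using only Python's standard library, shows how a value from this dataset is read under that format and why an ISO 8601 (YYYY-MM-DD) parser rejects it. The helper name parse_us_date is an assumption for illustration.

```python
from datetime import date, datetime


def parse_us_date(text: str) -> date:
    """Parse an MM/DD/YYYY string using the same "%m/%d/%Y" format given to csvsql.

    Hypothetical helper for illustration.
    """
    return datetime.strptime(text, "%m/%d/%Y").date()


print(parse_us_date("11/06/2018"))  # 2018-11-06

try:
    # An ISO 8601 parser cannot read the same string.
    date.fromisoformat("11/06/2018")
except ValueError as exc:
    print("ISO parse failed:", exc)
```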
I'll then run Data Fluent, which will generate a report in Excel format.
$ datafluent --url postgresql:///intel
The above will produce a fluency.xlsx file with two worksheets: Metrics and Time Distributions.
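To check the generated workbook from a script without opening Excel, a sketch like the following lists its worksheet names. It relies only on the fact that an .xlsx file is a ZIP archive whose xl/workbook.xml part names each sheet, and it uses just the standard library; the helper name sheet_names is hypothetical. Libraries such as openpyxl expose the same information through load_workbook(...).sheetnames.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import BinaryIO, List, Union

# SpreadsheetML main namespace used by xl/workbook.xml inside .xlsx files.
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names(workbook: Union[str, BinaryIO]) -> List[str]:
    """Return worksheet names, in workbook order, from an .xlsx path or file object.

    Hypothetical helper for illustration; not part of Data Fluent.
    """
    with zipfile.ZipFile(workbook) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    # Each <sheet> element under <sheets> carries the visible tab name.
    return [sheet.get("name") for sheet in root.findall("main:sheets/main:sheet", NS)]
```

Running print(sheet_names("fluency.xlsx")) on the report generated above should list the Metrics and Time Distributions worksheets.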
Further Help
The database URL is passed through to SQLAlchemy. Please see its documentation on Database URLs for more information on the URL syntax and the supported drivers.
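For a database that needs credentials, the URL takes SQLAlchemy's dialect://username:password@host:port/database form. SQLAlchemy's documentation recommends URL-encoding special characters in the password, for example with urllib.parse.quote_plus. The sketch below assembles such a URL; the function name postgres_url is hypothetical, and its defaults simply mirror the --url default shown in the help output.

```python
from urllib.parse import quote_plus


def postgres_url(user: str, password: str, host: str = "localhost",
                 port: int = 5432, database: str = "intel") -> str:
    """Build a SQLAlchemy-style PostgreSQL URL with an escaped username and password.

    Hypothetical helper for illustration.
    """
    # quote_plus escapes characters such as "@" and "/" that would otherwise
    # be read as URL delimiters.
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


print(postgres_url("mark", "test"))  # postgresql://mark:test@localhost:5432/intel
```

The resulting string can then be passed to datafluent with --url.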
If you need to override any of this application's parameters, please refer to the help text generated by Typer:
$ datafluent --help
Usage: datafluent [OPTIONS]
Options:
--url TEXT [default: postgresql://localhost:5432/intel]
--output TEXT [default: fluency.xlsx]
--install-completion [bash|zsh|fish|powershell|pwsh]
Install completion for the specified shell.
--show-completion [bash|zsh|fish|powershell|pwsh]
Show completion for the specified shell, to
copy it or customize the installation.
--help Show this message and exit.
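When generating reports for several databases from a script, the options listed above can be passed through Python's subprocess module. The helper names below are hypothetical; the sketch uses only the --url and --output options shown in the help output and assumes the datafluent executable is on PATH (for example, with the virtual environment activated).

```python
import subprocess
from typing import List


def datafluent_command(url: str, output: str = "fluency.xlsx") -> List[str]:
    """Assemble a datafluent invocation from the documented --url and --output options."""
    return ["datafluent", "--url", url, "--output", output]


def run_report(url: str, output: str = "fluency.xlsx") -> None:
    """Run datafluent, raising CalledProcessError if it exits with a non-zero status."""
    subprocess.run(datafluent_command(url, output), check=True)
```

For example, run_report("postgresql:///intel", output="intel.xlsx") would write the intel database's report to intel.xlsx instead of the default fluency.xlsx.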
Hashes for datafluent-0.0.35-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 55b31ba7b5e3fbb14c7b99ccc7b7cfddaa75d151131653997a4e6133f7257c86
MD5 | 30a8f2347b65570dfeba2b905960b98b
BLAKE2b-256 | 321c3a77a72396763908f27dd03c603fef4dcaff5a643b96a63f78b1cd925e50