Build a better understanding of your data in PostgreSQL.
Data Fluent for PostgreSQL
The report generated by this tool gives the number of rows, columns and bytes, as well as a human-friendly size, for each table within a given PostgreSQL database.
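To make the "human-friendly size" column concrete, here is a minimal sketch of how a raw byte count might be rendered in readable units. The function name `human_size` and the 1024-based unit steps are assumptions for illustration only; the tool itself may format sizes differently (for example, by relying on PostgreSQL's `pg_size_pretty`).

```python
def human_size(num_bytes: int) -> str:
    """Render a byte count using 1024-based units (hypothetical helper)."""
    units = ["bytes", "kB", "MB", "GB", "TB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            if unit == "bytes":
                return f"{int(size)} bytes"
            return f"{size:.1f} {unit}"
        size /= 1024


print(human_size(512))          # 512 bytes
print(human_size(8192))         # 8.0 kB
print(human_size(5 * 1024**2))  # 5.0 MB
```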
It also shows row counts, grouped by year and month, for every column that represents a date.
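The year-and-month grouping can be pictured with a small in-memory sketch. This is not the tool's implementation (which works against PostgreSQL); `rows_per_month` and the sample dates are hypothetical and only illustrate the shape of the result.

```python
from collections import Counter
from datetime import date


def rows_per_month(dates):
    """Count rows per (year, month), skipping NULL-like values."""
    counts = Counter((d.year, d.month) for d in dates if d is not None)
    return dict(sorted(counts.items()))


# Hypothetical values standing in for one date column.
sample = [date(2018, 1, 3), date(2018, 1, 17), date(2018, 2, 1), None]
print(rows_per_month(sample))  # {(2018, 1): 2, (2018, 2): 1}
```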
Installation
On Ubuntu 20:
$ wget -qO- \
    https://www.postgresql.org/media/keys/ACCC4CF8.asc \
    | sudo apt-key add -
$ echo "deb http://apt.postgresql.org/pub/repos/apt/ focal-pgdg main" \
    | sudo tee /etc/apt/sources.list.d/pgdg.list
$ sudo apt update
$ sudo apt install \
    git \
    python3-pip \
    python3-virtualenv \
    postgresql-13 \
    postgresql-client-13 \
    postgresql-contrib
On macOS:
$ brew install \
    git \
    postgresql \
    virtualenv
Then, regardless of platform, set up a virtual environment and install this software package.
$ virtualenv ~/.fluency
$ source ~/.fluency/bin/activate
$ python3 -m pip install datafluent
Example Analysis
Clone fivethirtyeight's data repo. It has a large number of CSV-formatted datasets.
$ git clone https://github.com/fivethirtyeight/data.git ~/538data
Make sure you can access a PostgreSQL database on your machine. Here I'm creating a superuser named mark on my Ubuntu 20 machine; this user will own the intel database.
$ sudo -u postgres \
bash -c "psql -c \"CREATE USER mark
WITH PASSWORD 'test'
SUPERUSER;\""
With PostgreSQL access set up, create a database called intel.
$ createdb intel
I'll import one of the datasets within fivethirtyeight's repo using csvsql (part of csvkit). Note that the dates within this dataset are in MM/DD/YYYY format rather than YYYY-MM-DD, so I needed to override the format for them to be read properly.
$ csvsql --db postgresql:///intel \
    --insert ~/538data/congress-generic-ballot/generic_topline_historical.csv \
    --datetime-format="%m/%d/%Y"
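The `%m/%d/%Y` pattern passed above uses the same directives as Python's `strptime`. As a quick illustration of what the override does, the sketch below parses an MM/DD/YYYY string into a date and prints it in YYYY-MM-DD form. The helper name `parse_us_date` and the sample value are made up for this example.

```python
from datetime import datetime


def parse_us_date(text: str):
    """Parse an MM/DD/YYYY string into a date object."""
    return datetime.strptime(text, "%m/%d/%Y").date()


print(parse_us_date("11/06/2018").isoformat())  # 2018-11-06

# A YYYY-MM-DD string does not match this pattern and raises ValueError.
try:
    parse_us_date("2018-11-06")
except ValueError as exc:
    print("rejected:", exc)
```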
I'll run the Excel Report Generator:
$ datafluent_pg
This will result in a fluency.xlsx file being produced with two worksheets: Metrics and Time Distributions.
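To check which worksheets a workbook contains without opening a spreadsheet application, the sheet names can be read straight from the .xlsx file, which is a ZIP archive holding an `xl/workbook.xml` entry. The following is a standard-library sketch; the function name `sheet_names` is an assumption for illustration, and a dedicated library such as openpyxl would be the more usual choice.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by SpreadsheetML workbook files.
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names(path):
    """Return the worksheet names listed in an .xlsx workbook."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [s.get("name") for s in root.findall("main:sheets/main:sheet", NS)]
```

Calling `sheet_names("fluency.xlsx")` on the generated report should list the two worksheets named above.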
If you need to override any parameters, please refer to the documentation:
$ datafluent_pg --help
Usage: datafluent [OPTIONS]

Options:
  --dns TEXT                      [default: postgresql://localhost:5432/intel]
  --output TEXT                   [default: fluency.xlsx]
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.
  --help                          Show this message and exit.
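When the report is produced from a script rather than typed by hand, the two options listed above can be assembled into an argument list. This is a sketch only: `build_command` and the `intel_report.xlsx` file name are hypothetical, while `--dns` and `--output` are the flags shown in the help text.

```python
import shlex


def build_command(dsn, output="fluency.xlsx"):
    """Assemble a datafluent_pg invocation with explicit options."""
    return ["datafluent_pg", "--dns", dsn, "--output", output]


cmd = build_command("postgresql://localhost:5432/intel", "intel_report.xlsx")
print(shlex.join(cmd))
# datafluent_pg --dns postgresql://localhost:5432/intel --output intel_report.xlsx
```

Passing the list to `subprocess.run(cmd, check=True)` would run the tool, assuming `datafluent_pg` is on the `PATH` of the active virtual environment.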