Skip to main content

Build a better understanding of your data in PostgreSQL.

Project description

Data Fluent for PostgreSQL

license

Build a better understanding of your data in PostgreSQL.

The following shows an example report generated by this tool. It gives the numbers of rows, columns, bytes as well as human-friendly size counts for each table within a given PostgreSQL database.

The Metrics Report

The following shows the row count for every column that represents a date grouped by year and month.

The Time Distribution Report

Installation

On Ubuntu 20:

$ wget -qO- \
    https://www.postgresql.org/media/keys/ACCC4CF8.asc \
        | sudo apt-key add -
$ echo "deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main" \
    | sudo tee /etc/apt/sources.list.d/pgdg.list

$ sudo apt update
$ sudo apt install \
    git \
    python3-pip \
    python3-virtualenv \
    postgresql-13 \
    postgresql-client-13 \
    postgresql-contrib

On macOS:

$ brew install \
    git \
    postgresql \
    virtualenv

Then, regardless of platform, setup a virtual environment and install the following Python-based dependencies.

$ virtualenv ~/.fluency
$ source ~/.fluency/bin/activate
$ python3 -m pip install \
    csvkit \
    humanfriendly \
    ipython \
    openpyxl \
    Pandas \
    psycopg2-binary \
    typer \
    xlsxwriter

Then clone this repo as it contains the datafluent_pg.py script:

$ git clone https://github.com/marklit/datafluent_pg.git ~/datafluent_pg
$ cd ~/datafluent_pg

Example Analysis

Clone fivethirtyeight's data repo. It has a large number of CSV-formatted datasets.

$ git clone https://github.com/fivethirtyeight/data.git ~/538data

Make sure you can access a PostgreSQL database on your machine. Here I'm creating an intel database for the mark user on my Ubuntu 20 machine.

$ sudo -u postgres \
    bash -c "psql -c \"CREATE USER mark
                       WITH PASSWORD 'test'
                       SUPERUSER;\""

With PostgreSQL access setup, create a database called intel.

$ createdb intel

I'll import one of the datasets within fivethirtyeight's repo. Note, because the dates within this dataset are not formatted in YYYY-MM-DD format, I needed to override the format so that the MM/DD/YYYY format would be read properly.

$ csvsql --db postgresql:///intel \
         --insert ~/538data/congress-generic-ballot/generic_topline_historical.csv \
         --datetime-format="%m/%d/%Y"

I'll run the Excel Report Generator:

$ python datafluent_pg.py

This will result in a fluency.xlsx file being produced with two worksheets: Metrics and Time Distributions.

If you need to override any parameters, please refer to the documentation:

$ python datafluent_pg.py --help
Usage: datafluent_pg.py [OPTIONS]

Options:
  --pg-dns TEXT                   [default: postgresql://localhost:5432/intel]
  --output TEXT                   [default: fluency.xlsx]
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.

  --help                          Show this message and exit.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datafluent-0.0.1.tar.gz (5.7 kB view details)

Uploaded Source

Built Distribution

datafluent-0.0.1-py3-none-any.whl (5.7 kB view details)

Uploaded Python 3

File details

Details for the file datafluent-0.0.1.tar.gz.

File metadata

  • Download URL: datafluent-0.0.1.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5

File hashes

Hashes for datafluent-0.0.1.tar.gz
Algorithm Hash digest
SHA256 41709fc342e68fdb2c4dbcdd30950040cc9cf5460c6f3834f9e3e4b0b3ada755
MD5 c3a57cc2f1993ad36730583a171fe063
BLAKE2b-256 68f517f8de7a385055c246de3f2b8cd463512d81fe0470cf6805bc3e18b1d23a

See more details on using hashes here.

File details

Details for the file datafluent-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: datafluent-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5

File hashes

Hashes for datafluent-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 171fc85f2c179baa3e97fbac56125bc3eb2d361d019f1237397ecdb67c874673
MD5 04628fcc1db91efaa0a6a69905cdfbf9
BLAKE2b-256 06ad5fd0e512cdee5cb95059fb5d985ab62a4135e1ff945e7b2c0ce9a8f565a1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page