Skip to main content

Build a better understanding of your data in PostgreSQL.

Project description

Data Fluent for PostgreSQL

Documentation Status license PyPI - Python Version

Build a better understanding of your data in PostgreSQL.

The following shows an example report generated by this tool. It gives the numbers of rows, columns, bytes as well as human-friendly size counts for each table within a given PostgreSQL database.

The Metrics Report

The following shows the row count for every column that represents a date grouped by year and month.

The Time Distribution Report

Installation

On Ubuntu 20:

$ wget -qO- \
    https://www.postgresql.org/media/keys/ACCC4CF8.asc \
        | sudo apt-key add -
$ echo "deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main" \
    | sudo tee /etc/apt/sources.list.d/pgdg.list

$ sudo apt update
$ sudo apt install \
    git \
    python3-pip \
    python3-virtualenv \
    postgresql-13 \
    postgresql-client-13 \
    postgresql-contrib

On macOS:

$ brew install \
    git \
    postgresql \
    virtualenv

Then, regardless of platform, setup a virtual environment and install this software package.

$ virtualenv ~/.fluency
$ source ~/.fluency/bin/activate
$ python3 -m pip install datafluent

Example Analysis

Clone fivethirtyeight's data repo. It has a large number of CSV-formatted datasets.

$ git clone https://github.com/fivethirtyeight/data.git ~/538data

Make sure you can access a PostgreSQL database on your machine. Here I'm creating an intel database for the mark user on my Ubuntu 20 machine.

$ sudo -u postgres \
    bash -c "psql -c \"CREATE USER mark
                       WITH PASSWORD 'test'
                       SUPERUSER;\""

With PostgreSQL access setup, create a database called intel.

$ createdb intel

I'll import one of the datasets within fivethirtyeight's repo. Note, because the dates within this dataset are not formatted in YYYY-MM-DD format, I needed to override the format so that the MM/DD/YYYY format would be read properly.

$ csvsql --db postgresql:///intel \
         --insert ~/538data/congress-generic-ballot/generic_topline_historical.csv \
         --datetime-format="%m/%d/%Y"

I'll run the Excel Report Generator:

$ datafluent_pg

This will result in a fluency.xlsx file being produced with two worksheets: Metrics and Time Distributions.

If you need to override any parameters, please refer to the documentation:

$ datafluent_pg --help
Usage: datafluent [OPTIONS]

Options:
  --dns TEXT                      [default: postgresql://localhost:5432/intel]
  --output TEXT                   [default: fluency.xlsx]
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.

  --help                          Show this message and exit.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datafluent-0.0.12.tar.gz (5.7 kB view details)

Uploaded Source

Built Distribution

datafluent-0.0.12-py3-none-any.whl (5.7 kB view details)

Uploaded Python 3

File details

Details for the file datafluent-0.0.12.tar.gz.

File metadata

  • Download URL: datafluent-0.0.12.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5

File hashes

Hashes for datafluent-0.0.12.tar.gz
Algorithm Hash digest
SHA256 f9bbd385bc1f3eb813a36acc84ba536920d35ba5b6a31a4bb808613acd8525fe
MD5 eb5456eb3eb5b2d83bb46e4fd6ead93f
BLAKE2b-256 2f12f9d0284ec90dd555f498eda444b226b5c49009dca7c2fff747e5ade8556c

See more details on using hashes here.

File details

Details for the file datafluent-0.0.12-py3-none-any.whl.

File metadata

  • Download URL: datafluent-0.0.12-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.5

File hashes

Hashes for datafluent-0.0.12-py3-none-any.whl
Algorithm Hash digest
SHA256 239b19cb0a158444999b255c3907e5c1147556d2d42badd89f96ece3cedcd16d
MD5 17096dca50897db0a579725a52325925
BLAKE2b-256 489952bc66cc7bbc3efec8a4b891b2385bf3ceee93e87fd4f7fefa0e0c4f685d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page