Skip to main content

A simple python wrapper script based on pgfutter to load multiple dumped csv files into a postgres database.

Project description

Build Status PyPI License PyPI Version PyPI Python versions

This repository provides a python wrapper script based on pgfutter to load dumped csv data into a postgres database. It exposes customization hooks and comes as a container or standalone script.

Installation

Note: If you want to use docker, skip installation and see below.

pip install postgresimporter # using pip
pipx install postgresimporter # using pipx

Usage

PIP

If you installed the python executable and already have a local postgres database running, run

postgresimporter \
    path/to/my/csv/files \
    --db-host=localhost \
    --db-port=5432 \
    --db-user=postgres \
    --db-password=example \
    --combine-tables \
    --exclude-regex="^.*sample.*$" \
    --post-load path/to/my/hooks/post-load.sql

Docker

The same command when using the docker container looks like this:

docker run \
    --network host \
    -v path/to/my/csv/files:/import \
    -v path/to/my/hooks/post-load.sql:/post-load.sql \
    -e DB_HOST=localhost \
    -e DB_PORT=5432 \
    -e DB_USER=postgres \
    -e DB_PASSWORD=example \
    romnn/postgresimporter \
    --post-load=/post-load.sql --combine-tables --exclude-regex="^.*sample.*$" /import

Note: When using docker, environment variables (-e) must be used in favor of command line arguments for specifying database connection parameters.

The tools will scan the sources directory you specify for any .zip files and unzip them. Afterwards, it will scan for any .csv files and load them into a table named just like the file. Afterwards, it will try to combine any tables with the same prefix.

Usage

See --help for Configuration options.

If you want to spawn a complete setup including the loader, a postgres database and pgadmin as a postgres admin UI, you can use the provided docker-compose config:

docker-compose -f deployment/postgresimporter.compose.yml -f deployment/postgres.compose.yml up
docker-compose -f deployment/postgresimporter.compose.yml -f deployment/postgres.compose.yml down

To specify arguments for the postgresimporter, modify deployment/postgresimporter.compose.yml.

Notice: Before using the provided database container, make sure to stop any already running instances of postgres. When using linux, do:

sudo /etc/init.d/postgresql stop

Hooks

The tool comes with some example hooks and the ability to add your own hooks scripts. You might have a file importdir/animals_1.csv and importdir/animals_2.csv that looks like this:

name,origin,height
Grizzly,"North America",220
Giraffe,"Africa",600
Wallabie,"Australia",180

After importing importdir/, you will have three tables:

Table

Content

import.animals

importdir/animals_1 and importdir/animals_2 combined

import.animals_1

All from importdir/animals_1.csv

import.animals_2

All from importdir/animals_2.csv

All of these tables will have the schema defined by the csv file. However, all values will naturally be of type text. With the --post-load you might want to execute a post load sql script that defines a typed table and inserts the data like so:

CREATE TABLE public.animals (
    name VARCHAR(200) PRIMARY KEY,
    origin VARCHAR(200),
    height INTEGER
);

INSERT INTO public.animals
SELECT name, origin, height::int
FROM import.animals

Configuration options

Option

Description

Default

Required

sources

List of csv files to load. Entries can either be directories or files.

None

yes

--disable-unzip

Disables unzipping of any *.zip archives in the source directory

False

no

--disable-import

Disables import of any *.csv files into the database

False

no

--disable-check

Disables checking csv row count and database row count after import

False

no

--combine-tables

Enabled combining of imported csv file tables into one table named by prefix (e.g. weather_1 & weather_2 -> weather)

False

no

--exclude-regex

Files matching this regex will not be processed

None

no

--pre-load

List of *.sql scripts to be executed before importing into the database (e.g. to clean the database). Entries can either be directories or files.

None

no

--post-load

List of *.sql scripts to be executed after import (e.g. normalization). . Entries can either be directories or files.

None

no

--all

Unzip and import all archives and zip files again

False

no

--db-name

PostgreSQL database name

postgres

no

--db-host

PostgreSQL database host

localhost

no

--db-port

PostgreSQL database port

5432

no

--db-user

PostgreSQL database user

postgres

no

--db-password

PostgreSQL database password

None

no

--log-level

Log level (DEBUG, INFO, WARNING, ERROR or FATAL)

INFO

no

Note: You can also specify database connection settings via DB_NAME, DB_HOST, DB_PORT, DB_USER and DB_PASSWORD environment variables.

Local installation

Clone this repository and run (assuming you have python 3.5+ and pgfutter installed):

pip install -r requirements.txt  # using pip
pipenv install --dev  # or using pipenv

Development

If you do not have pipx and pipenv, install with

python3 -m pip install --user pipx
python3 -m pipx ensurepath
pipx install pipenv

Install all dependencies with

pipenv install --dev

To format, sort imports and check PEP8 conformity, run

pipenv run black .
pipenv run isort
pipenv run flake8

These above checks are also configured as a git pre commit hook together with the TestSuite. Before you commit, make sure to run ./pre-commit.sh to resolve any errors in advance.

After merging new changes, a new version is deployed to pypi.org when the version is tagged with bump2version (subminor|minor|major).

Testing

This project is not under active maintenance and not tested for production use. However, a small test suite is provided and can be run with:

python -m postgresimporter.tests.run_tests

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

postgresimporter-0.2.0.tar.gz (12.3 kB view hashes)

Uploaded source

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page