ETL pipeline for investigations with Follow the Money data

Project description

License: MIT

investigraph

Research and implementation of an ETL process for a curated and up-to-date public and open-source data catalog of frequently used datasets in investigative journalism.

investigraph uses prefect.io for Follow the Money (FtM) pipeline processing.

Documentation

Tutorial

installation

investigraph requires Python 3.11 or later.

pip install investigraph
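
Because investigraph requires at least Python 3.11, it can help to check the interpreter before installing. The following is a minimal sketch; the helper name `meets_minimum` is hypothetical and not part of investigraph.

```python
import sys

# Minimum version taken from the requirement stated above.
MINIMUM = (3, 11)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report whether the running interpreter is new enough.
print("Python version sufficient:", meets_minimum())
```

Comparing only the major and minor components keeps the check independent of patch releases.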

example datasets

There is a dedicated repo for example datasets that can be used as a Block within the prefect.io deployment.

deployment

docker

Use docker-compose.yml for local development and testing, and docker-compose.prod.yml as a starting point for a production setup. More instructions here

run locally

Install app and dependencies (use a virtualenv):

pip install investigraph

Or, e.g. when using poetry:

poetry add investigraph

After installation, the investigraph command should be available:

investigraph --help
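
If the command is not found, the install location is probably not on the PATH of the active environment. As a rough sketch, a script could test for the command before calling it; the function name `command_available` is an illustrative assumption, not an investigraph API.

```python
import shutil


def command_available(name: str = "investigraph") -> bool:
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


# Prints True when the investigraph command can be resolved.
print(command_available())
```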

Quickly run a local dataset definition:

investigraph run -c ./path/to/config.yml

Register a local datasets block:

investigraph add-block -b local-file-system/investigraph-local -u ./datasets

Register a GitHub datasets block:

investigraph add-block -b github/investigraph-datasets -u https://github.com/investigativedata/investigraph-datasets.git

Run a dataset pipeline from a dataset defined in a registered block:

investigraph run -d ec_meetings -b github/investigraph-datasets
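
The two ways of invoking `investigraph run` shown above (a local config via `-c`, or a dataset from a registered block via `-d` and `-b`) can be wrapped for use from a script. This is only a sketch that shells out to the command line; the helpers `build_run_command` and `run_pipeline` are hypothetical, and treating the two modes as mutually exclusive is an assumption based on the examples, not documented behaviour.

```python
import shlex
import subprocess


def build_run_command(config=None, dataset=None, block=None) -> list[str]:
    """Build the argument list for `investigraph run`.

    Either `config` (local dataset definition) or both `dataset` and
    `block` (dataset from a registered block) must be given.
    """
    if config is not None:
        if dataset is not None or block is not None:
            # Assumption: the two modes are not combined.
            raise ValueError("pass either config, or dataset and block")
        return ["investigraph", "run", "-c", config]
    if dataset is None or block is None:
        raise ValueError("dataset and block must be given together")
    return ["investigraph", "run", "-d", dataset, "-b", block]


def run_pipeline(**kwargs) -> subprocess.CompletedProcess:
    """Run the pipeline; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_run_command(**kwargs), check=True)


# Show the commands without executing them.
print(shlex.join(build_run_command(config="./path/to/config.yml")))
print(shlex.join(build_run_command(dataset="ec_meetings",
                                   block="github/investigraph-datasets")))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.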

View the prefect dashboard:

make server

development

This package uses poetry for packaging and dependency management, so install poetry first.

Clone the investigraph repository to a local destination.

Within the root directory, run:

poetry install --with dev

This installs a few development dependencies, including pre-commit which needs to be registered:

poetry run pre-commit install

Before each commit, pre-commit checks the code formatting (isort, black) and runs the other hooks configured in .pre-commit-config.yaml.

test

make test

supported by

Media Tech Lab Bayern batch #3

Download files

Source Distribution

investigraph-0.5.1.tar.gz (19.9 kB)

Uploaded Source

Built Distribution

investigraph-0.5.1-py3-none-any.whl (26.6 kB)

Uploaded Python 3

File details

Details for the file investigraph-0.5.1.tar.gz.

File metadata

  • Download URL: investigraph-0.5.1.tar.gz
  • Size: 19.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.7 Linux/6.5.0-5-amd64

File hashes

Hashes for investigraph-0.5.1.tar.gz

Algorithm    Hash digest
SHA256       42f602f5a1796b346ccf07071351e436b925335db0d9c882879c1da5c61bdfe5
MD5          7483f4fb0c50caaf1a066b0aece15c8e
BLAKE2b-256  2f948f6b8ea6fb696b9b7465a325a8639116ac1a5232aeb5bb270ff1f6752126
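
A downloaded file can be compared against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the function names and the local file path in the usage comment are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for investigraph-0.5.1.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "42f602f5a1796b346ccf07071351e436b925335db0d9c882879c1da5c61bdfe5"
)


def file_sha256(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected: str) -> bool:
    """Return True if the file's SHA256 digest equals `expected`."""
    return file_sha256(path) == expected.strip().lower()


# Usage (assumes the archive was downloaded to the current directory):
# matches_digest("investigraph-0.5.1.tar.gz", EXPECTED_SDIST_SHA256)
```

pip can perform the same comparison during installation when hashes are pinned in a requirements file and `--require-hashes` is used.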

File details

Details for the file investigraph-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: investigraph-0.5.1-py3-none-any.whl
  • Size: 26.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.7 Linux/6.5.0-5-amd64

File hashes

Hashes for investigraph-0.5.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       c48fc587bfcb4c2ddcc6efe4d8cb9fd94ce96aac2a449610fb1f200bcdcb5502
MD5          f6a0af36b7517ce8a136a909bf30f0e1
BLAKE2b-256  2d70627d20a323c5b7aaf8319214eee1aaea74baf35bcdaf0b2c7726ad575406
