ETL pipeline for investigations with Follow the Money data
investigraph
Research and implementation of an ETL process for a curated and up-to-date public and open-source data catalog of frequently used datasets in investigative journalism.
It uses prefect.io for FtM (Follow the Money) pipeline processing.
installation
investigraph requires at least Python 3.11.
pip install investigraph
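Because the package requires Python 3.11 or newer, it can help to check the interpreter before installing. The following is a minimal sketch using only the standard library; the helper name `meets_minimum` is hypothetical and not part of investigraph.

```python
import sys

# Minimum version stated in this README.
MINIMUM_PYTHON = (3, 11)


def meets_minimum(version=None, minimum=MINIMUM_PYTHON):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version is sufficient for investigraph")
    else:
        print("investigraph requires at least Python 3.11")
```

Running the script inside the target virtualenv confirms that `pip install investigraph` will be attempted with a suitable interpreter.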
example datasets
There is a dedicated repo for example datasets built with investigraph.
deployment
docker
Use docker-compose.yml for local development and testing, and docker-compose.prod.yml as a starting point for a production setup. More instructions here
run locally
Install app and dependencies (use a virtualenv):
pip install investigraph
Or, e.g. when using poetry:
poetry add investigraph
After installation, investigraph should be available as a command:
investigraph --help
Quickly run a local dataset definition:
investigraph run -c ./path/to/config.yml
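To trigger the same run from a Python script, the command above can be wrapped with the standard library. This is a sketch that relies only on the `investigraph run -c` invocation shown above; the helper names `build_run_command` and `run_dataset` are hypothetical, and investigraph's own Python API is deliberately not used here.

```python
import os
import shutil
import subprocess


def build_run_command(config_path):
    """Build the argument list for `investigraph run -c <config>`."""
    # os.fspath accepts both str and pathlib.Path and keeps the path as given.
    return ["investigraph", "run", "-c", os.fspath(config_path)]


def run_dataset(config_path):
    """Run a dataset definition through the CLI and return its exit code."""
    if not os.path.isfile(config_path):
        raise FileNotFoundError(os.fspath(config_path))
    if shutil.which("investigraph") is None:
        raise RuntimeError("investigraph not found on PATH; is it installed?")
    # check=False: the caller decides what to do with a non-zero exit code.
    completed = subprocess.run(build_run_command(config_path), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting problems when the config path contains spaces.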
View prefect dashboard:
make server
development
This package uses poetry for packaging and dependency management, so install poetry first.
Clone the investigraph repository to a local destination.
Within the root directory, run:
poetry install --with dev
This installs a few development dependencies, including pre-commit which needs to be registered:
poetry run pre-commit install
Before creating a commit, this checks for correct code formatting (isort, black) and some other useful stuff (see: .pre-commit-config.yaml).
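The development setup above boils down to two commands run in order from the repository root. As an illustration, a small script could run them and stop at the first failure; the names `SETUP_STEPS` and `run_steps` are hypothetical, and the `runner` parameter exists only so the sequence can be exercised without invoking poetry.

```python
import subprocess

# The two setup commands from this README, in the order they must run.
SETUP_STEPS = [
    ["poetry", "install", "--with", "dev"],
    ["poetry", "run", "pre-commit", "install"],
]


def run_steps(steps=SETUP_STEPS, runner=None):
    """Run each command in order and return the commands that completed.

    Stops at the first command that exits with a non-zero code.
    """
    if runner is None:
        # Default runner: execute the command and report its exit code.
        def runner(cmd):
            return subprocess.run(cmd, check=False).returncode

    completed = []
    for cmd in steps:
        if runner(cmd) != 0:
            break
        completed.append(cmd)
    return completed
```

If the returned list is shorter than `SETUP_STEPS`, the first missing command is the one that failed.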
test
make test
supported by
Media Tech Lab Bayern batch #3