etl pipeline for investigations with follow the money data
Project description
investigraph
Research and implementation of an ETL process for a curated and up-to-date public and open-source data catalog of frequently used datasets in investigative journalism.
Using prefect.io for ftm pipeline processing
installation
investigraph
requires at least Python 3.11
pip install investigraph
example datasets
There is a dedicated repo for example datasets that can be used as a Block within the prefect.io deployment.
deployment
docker
docker-compose.yml
for local development / testing, use docker-compose.prod.yml
as a starting point for a production setup. More instructions here
run locally
Install app and dependencies (use a virtualenv):
pip install investigraph
Or, e.g. when using poetry:
poetry add investigraph
After installation, investigraph
as a command should be available:
investigraph --help
Quick run a local dataset definition:
investigraph run -c ./path/to/config.yml
Register a local datasets block:
investigraph add-block -b local-file-system/investigraph-local -u ./datasets
Register github datasets block:
investigraph add-block -b github/investigraph-datasets -u https://github.com/investigativedata/investigraph-datasets.git
Run a dataset pipeline from a dataset defined in a registered block:
investigraph run -d ec_meetings -b github/investigraph-datasets
View prefect dashboard:
make server
development
This package is using poetry for packaging and dependencies management, so first install it.
Clone investigraph repository to a local destination.
Within the root directory, run
poetry install --with dev
This installs a few development dependencies, including pre-commit which needs to be registered:
poetry run pre-commit install
Before creating a commit, this checks for correct code formatting (isort, black) and some other useful stuff (see: .pre-commit-config.yaml
)
test
make test
supported by
Media Tech Lab Bayern batch #3
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for investigraph-0.5.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | c48fc587bfcb4c2ddcc6efe4d8cb9fd94ce96aac2a449610fb1f200bcdcb5502 |
|
MD5 | f6a0af36b7517ce8a136a909bf30f0e1 |
|
BLAKE2b-256 | 2d70627d20a323c5b7aaf8319214eee1aaea74baf35bcdaf0b2c7726ad575406 |