ETL pipeline for investigations with FollowTheMoney (FtM) data
investigraph
Research and implementation of an ETL process for a curated and up-to-date public and open-source data catalog of frequently used datasets in investigative journalism.
It uses prefect.io to process FtM pipelines.
installation
pip install investigraph
example datasets
There is a dedicated repo for example datasets that can be used as a Block within the prefect.io deployment.
deployment
docker
Use docker-compose.yml for local development / testing, and docker-compose.prod.yml as a starting point for a production setup. More instructions here
run locally
Install the app and its dependencies (preferably inside a virtualenv):
pip install investigraph
Or, e.g. when using poetry:
poetry add investigraph
After installation, the investigraph command should be available:
investigraph --help
Quickly run a local dataset definition:
investigraph run -c ./path/to/config.yml
Register a local datasets block:
investigraph add-block -b local-file-system/investigraph-local -u ./datasets
Register the GitHub datasets block:
investigraph add-block -b github/investigraph-datasets -u https://github.com/investigativedata/investigraph-datasets.git
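The `-b` values above have the form `<block-type>/<block-name>`, which matches how Prefect addresses saved blocks (for example `local-file-system` and `github` are Prefect block types). The sketch below only illustrates that naming scheme. It assumes the part after the slash is the name the block is saved under and that `-u` supplies its location (a local base path or a repository URL). `parse_block_slug` and `BlockRef` are hypothetical names and not part of investigraph.

```python
from typing import NamedTuple


class BlockRef(NamedTuple):
    block_type: str  # Prefect block type slug, e.g. "local-file-system" or "github"
    name: str        # name the block was registered under (hypothetical field)


def parse_block_slug(slug: str) -> BlockRef:
    """Split a '<block-type>/<block-name>' string (hypothetical helper)."""
    block_type, sep, name = slug.partition("/")
    # Reject missing separators, empty halves, or extra path segments.
    if not sep or not block_type or not name or "/" in name:
        raise ValueError(f"expected '<block-type>/<block-name>', got {slug!r}")
    return BlockRef(block_type, name)


if __name__ == "__main__":
    for slug in ("local-file-system/investigraph-local", "github/investigraph-datasets"):
        print(parse_block_slug(slug))
```

The same string is then passed to `investigraph run -b ...`, so the block has to be registered before a dataset can be run from it.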
Run a dataset pipeline from a dataset defined in a registered block:
investigraph run -d ec_meetings -b github/investigraph-datasets
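The two `investigraph run` styles shown above (a local config file via `-c`, or a dataset name via `-d` looked up in a registered block via `-b`) can be assembled from Python, e.g. for a cron job or a test harness. This is a minimal sketch: `build_run_command` is a hypothetical helper, and the rule that the two styles are not mixed is this helper's own assumption, not documented investigraph behaviour. It only uses the flags that appear in this README.

```python
import shlex
from typing import Optional


def build_run_command(
    config: Optional[str] = None,
    dataset: Optional[str] = None,
    block: Optional[str] = None,
) -> list[str]:
    """Assemble an argv list for `investigraph run` (hypothetical helper)."""
    # Assumption: use either a local config (-c) or dataset + block (-d, -b).
    if config and (dataset or block):
        raise ValueError("pass either config, or dataset and block, not both")
    if config:
        return ["investigraph", "run", "-c", config]
    if dataset and block:
        return ["investigraph", "run", "-d", dataset, "-b", block]
    raise ValueError("need a config path, or both a dataset name and a block")


if __name__ == "__main__":
    cmd = build_run_command(dataset="ec_meetings", block="github/investigraph-datasets")
    print(shlex.join(cmd))  # investigraph run -d ec_meetings -b github/investigraph-datasets
    # To execute it: subprocess.run(cmd, check=True)
```

Returning an argv list instead of a shell string avoids quoting problems when passing it to `subprocess.run`.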
View the Prefect dashboard:
make server
development
This package uses poetry for packaging and dependency management, so install it first.
Clone the investigraph repository to a local destination.
Within the root directory, run
poetry install --with dev
This installs a few development dependencies, including pre-commit which needs to be registered:
poetry run pre-commit install
Before each commit, this checks for correct code formatting (isort, black) and runs a few other useful checks (see .pre-commit-config.yaml).
test
make test
supported by
Media Tech Lab Bayern batch #3
Hashes for investigraph-0.3.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8cdb9e34c3c1b394f3244a3117a7f1b77744ae1edee435ef07c6219aef39dbe0
MD5 | 8e13681c296ca707f1a73e5462d8453d
BLAKE2b-256 | 81883f4028bae5e254ff90431409ced6526b8f191e3943c4219bc01ca444f3b2
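A downloaded wheel can be checked against the published SHA256 digest before installing it. The sketch below uses only the standard library. `sha256_of` and `verify_wheel` are illustrative names, and the wheel is assumed to sit in the current directory (for example after `pip download investigraph==0.3.1 --no-deps`).

```python
import hashlib
from pathlib import Path

# SHA256 digest published for investigraph-0.3.1-py3-none-any.whl
EXPECTED_SHA256 = "8cdb9e34c3c1b394f3244a3117a7f1b77744ae1edee435ef07c6219aef39dbe0"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest matches the expected hex digest."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    wheel = Path("investigraph-0.3.1-py3-none-any.whl")
    if wheel.exists():
        print("hash ok" if verify_wheel(wheel) else "HASH MISMATCH")
```

pip can enforce the same check itself: a requirements line such as `investigraph==0.3.1 --hash=sha256:<digest>` combined with `pip install --require-hashes -r requirements.txt` refuses to install on a mismatch. In that mode every dependency needs a pinned hash as well.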