ETL pipeline for investigations with Follow the Money data
investigraph
Research and implementation of an ETL process for a curated and up-to-date public and open-source data catalog of frequently used datasets in investigative journalism.
Uses prefect.io for Follow the Money (ftm) pipeline processing.
installation
pip install investigraph
example datasets
There is a dedicated repo for example datasets that can be used as a Block within the prefect.io deployment.
deployment
docker
Use docker-compose.yml for local development and testing, and docker-compose.prod.yml as a starting point for a production setup.
run locally
Install app and dependencies (use a virtualenv):
pip install investigraph
Or, e.g. when using poetry:
poetry add investigraph
After installation, the investigraph command should be available:
investigraph --help
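If the command is not found, a quick check from Python can tell whether the package is installed in the active environment and whether its command is on the PATH. This is only a minimal sketch using the standard library; the helper name investigraph_status is hypothetical and not part of investigraph.

```python
import shutil
from importlib import metadata


def investigraph_status() -> dict:
    """Report where the CLI lives on PATH and which package version is installed.

    Hypothetical helper for illustration; both values are None when
    investigraph is not installed in the active environment.
    """
    try:
        version = metadata.version("investigraph")
    except metadata.PackageNotFoundError:
        version = None
    return {"cli_path": shutil.which("investigraph"), "version": version}


print(investigraph_status())
```

A version without a cli_path usually means the virtualenv that holds the package is not activated in the current shell.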
Quickly run a local dataset definition:
investigraph run -c ./path/to/config.yml
Register a local datasets block:
investigraph add-block -b local-file-system/investigraph-local -u ./datasets
Register a GitHub datasets block:
investigraph add-block -b github/investigraph-datasets -u https://github.com/investigativedata/investigraph-datasets.git
Run a dataset pipeline from a dataset defined in a registered block:
investigraph run -d ec_meetings -b github/investigraph-datasets
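The register-then-run steps above could also be scripted, for example from a scheduler or a test harness. The following is a sketch only: it builds exactly the command lines shown above (add-block with -b and -u, run with -d and -b) and prints them in dry-run mode. The helper names (add_block_cmd, run_dataset_cmd, execute) are hypothetical and not part of investigraph.

```python
import shlex
import subprocess
from typing import List

# Values taken from the commands shown above.
BLOCK = "github/investigraph-datasets"
REPO = "https://github.com/investigativedata/investigraph-datasets.git"


def add_block_cmd(block: str, uri: str) -> List[str]:
    """Build the command that registers a datasets block (hypothetical helper)."""
    return ["investigraph", "add-block", "-b", block, "-u", uri]


def run_dataset_cmd(dataset: str, block: str) -> List[str]:
    """Build the command that runs a dataset from a registered block."""
    return ["investigraph", "run", "-d", dataset, "-b", block]


def execute(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; only invoke the CLI when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)


# Dry run: prints the two commands without calling investigraph.
for cmd in (add_block_cmd(BLOCK, REPO), run_dataset_cmd("ec_meetings", BLOCK)):
    execute(cmd)
```

Calling execute(cmd, dry_run=False) would run the commands for real, which assumes investigraph is installed and a prefect.io environment is reachable.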
View the prefect dashboard:
make server
development
This package uses poetry for packaging and dependency management, so install poetry first.
Clone the investigraph repository to a local destination.
Within the root directory, run:
poetry install --with dev
This installs a few development dependencies, including pre-commit, which needs to be registered:
poetry run pre-commit install
Before creating a commit, this checks for correct code formatting (isort, black) and some other useful stuff (see: .pre-commit-config.yaml).
test
make test
supported by
Media Tech Lab Bayern batch #3