ETL pipeline for investigations with Follow the Money data
investigraph
Research and implementation of an ETL process for a curated and up-to-date public and open-source data catalog of frequently used datasets in investigative journalism.
Uses prefect.io for Follow the Money (ftm) pipeline processing.
installation
investigraph requires Python 3.11 or later.
pip install investigraph
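Since the package needs Python 3.11 or newer, it can help to check the interpreter before installing. The following is a minimal sketch of such a check; `python_is_supported` and `MINIMUM_PYTHON` are hypothetical names chosen for illustration and are not part of investigraph.

```python
import sys

# Minimum interpreter version stated in the installation notes above.
MINIMUM_PYTHON = (3, 11)


def python_is_supported(version=None, minimum=MINIMUM_PYTHON):
    """Return True if `version` (default: the running interpreter) meets `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); tuple comparison is lexicographic.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if python_is_supported():
        print("Python version is new enough for investigraph")
    else:
        print("investigraph needs Python 3.11 or newer")
```

Run it with the same interpreter that backs the virtualenv you plan to install into.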
example datasets
There is a dedicated repo for example datasets built with investigraph.
deployment
docker
Use docker-compose.yml for local development / testing, and docker-compose.prod.yml as a starting point for a production setup. More instructions here
run locally
Install app and dependencies (use a virtualenv):
pip install investigraph
Or, e.g. when using poetry:
poetry add investigraph
After installation, the investigraph command should be available:
investigraph --help
Quickly run a local dataset definition:
investigraph run -c ./path/to/config.yml
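If the run needs to be triggered from a script rather than a shell, the same command can be wrapped with the standard library. This is only a sketch: `build_run_command` and `run_dataset` are hypothetical helper names, and the only investigraph detail assumed is the `investigraph run -c` invocation shown above.

```python
import shutil
import subprocess


def build_run_command(config_path):
    """Build the argument list for `investigraph run -c <config_path>`."""
    return ["investigraph", "run", "-c", str(config_path)]


def run_dataset(config_path):
    """Run the dataset definition; return the exit code, or None if the CLI is missing."""
    # shutil.which looks the executable up on PATH, like the shell would.
    if shutil.which("investigraph") is None:
        return None
    completed = subprocess.run(build_run_command(config_path), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting problems when the config path contains spaces.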
View prefect dashboard:
make server
development
This package uses poetry for packaging and dependency management, so install poetry first.
Clone the investigraph repository to a local destination.
Within the root directory, run:
poetry install --with dev
This installs a few development dependencies, including pre-commit, which needs to be registered:
poetry run pre-commit install
Before each commit, pre-commit checks for correct code formatting (isort, black) and runs a few other checks (see: .pre-commit-config.yaml)
test
make test
supported by
Media Tech Lab Bayern batch #3