ETL pipeline for investigations with Follow the Money data
investigraph
Research and implementation of an ETL process for a curated and up-to-date public and open-source data catalog of frequently used datasets in investigative journalism.
Uses prefect.io for Follow the Money (ftm) pipeline processing.
installation
pip install investigraph
example datasets
There is a dedicated repo for example datasets that can be used as a Block within the prefect.io deployment.
deployment
docker
Use docker-compose.yml for local development / testing, and docker-compose.prod.yml as a starting point for a production setup. More instructions here
run locally
Install app and dependencies (use a virtualenv):
pip install investigraph
Or, e.g. when using poetry:
poetry add investigraph
After installation, the investigraph command should be available:
investigraph --help
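The install-and-check steps above can be sketched as one small script. This is only an illustration under stated assumptions: the .venv directory name and the helper function names are hypothetical and not part of investigraph, and the install step needs network access.

```shell
#!/bin/sh
# Sketch: install investigraph into a fresh virtualenv and confirm the CLI is on PATH.
# The .venv default and all function names are assumptions made for this example.

VENV_DIR="${VENV_DIR:-.venv}"

# Return 0 if the named command can be found on PATH.
has_command() {
    command -v "$1" >/dev/null 2>&1
}

# Create the virtualenv, activate it, and install the package (needs network access).
install_investigraph() {
    python3 -m venv "$VENV_DIR" &&
    . "$VENV_DIR/bin/activate" &&
    pip install investigraph
}

# Install only if the command is missing, then show the help text.
ensure_investigraph() {
    has_command investigraph || install_investigraph || return 1
    investigraph --help
}
```

Sourcing the script and calling ensure_investigraph performs the installation; nothing runs until a function is called.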
Quickly run a local dataset definition:
investigraph run -c ./path/to/config.yml
Register a local datasets block:
investigraph add-block -b local-file-system/investigraph-local -u ./datasets
Register a GitHub datasets block:
investigraph add-block -b github/investigraph-datasets -u https://github.com/investigativedata/investigraph-datasets.git
Run the pipeline for a dataset defined in a registered block:
investigraph run -d ec_meetings -b github/investigraph-datasets
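The register and run commands above can be combined into a small helper script. The following is a minimal sketch, assuming the commands behave as shown in this README; the DRY_RUN variable and the function names are hypothetical conveniences and not investigraph features.

```shell
#!/bin/sh
# Sketch: register the GitHub datasets block, then run one dataset from it.
# DRY_RUN and the helper names are assumptions for illustration only.

BLOCK="github/investigraph-datasets"
REPO_URL="https://github.com/investigativedata/investigraph-datasets.git"

# Print the command when DRY_RUN=1, otherwise execute it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Register the datasets repository as a block.
register_block() {
    run_cmd investigraph add-block -b "$BLOCK" -u "$REPO_URL"
}

# Run the pipeline for one dataset from the registered block.
# $1: dataset name as defined in the block, e.g. ec_meetings
run_dataset() {
    run_cmd investigraph run -d "$1" -b "$BLOCK"
}
```

After sourcing the script, `DRY_RUN=1 run_dataset ec_meetings` only prints the command it would run, which is a cheap way to check the arguments before starting a pipeline.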
View the prefect dashboard:
make server
development
This package uses poetry for packaging and dependency management, so install poetry first.
Clone the investigraph repository to a local destination.
Within the root directory, run:
poetry install --with dev
This installs a few development dependencies, including pre-commit which needs to be registered:
poetry run pre-commit install
Before each commit, this checks for correct code formatting (isort, black) and runs some other useful checks (see: .pre-commit-config.yaml).
test
make test
supported by
Media Tech Lab Bayern batch #3