ETL pipeline for investigations with Follow the Money data
investigraph
Research and implementation of an ETL process for a curated and up-to-date public and open-source data catalog of frequently used datasets in investigative journalism.
Uses prefect.io for Follow the Money (ftm) pipeline processing.
installation
pip install investigraph
example datasets
There is a dedicated repo for example datasets that can be used as a Block within the prefect.io deployment.
deployment
docker
Use docker-compose.yml for local development and testing, and docker-compose.prod.yml as a starting point for a production setup. More instructions here
run locally
Clone repo first.
Install app and dependencies (use a virtualenv):
pip install -e .
After installation, the investigraph command should be available:
investigraph --help
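To check the installation from a script rather than by hand, a minimal sketch using only the Python standard library might look like the following. The helper name `cli_status` is hypothetical and not part of investigraph; it only looks up the executable on the PATH and reads the installed package version.

```python
import shutil
from importlib import metadata


def cli_status(command: str = "investigraph", package: str = "investigraph") -> dict:
    """Report where the CLI executable lives and which package version is installed.

    Hypothetical helper for illustration; not part of investigraph itself.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # the package is not installed in this environment
    return {
        "executable": shutil.which(command),  # None if the command is not on PATH
        "version": version,
    }


if __name__ == "__main__":
    status = cli_status()
    if status["executable"] is None:
        print("investigraph not found on PATH; is the virtualenv activated?")
    else:
        print(f"investigraph {status['version']} at {status['executable']}")
```

If the executable is reported as missing even though the package version is found, the virtualenv's bin directory is probably not on the PATH of the current shell.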
Quickly run a local dataset definition:
investigraph run <dataset_name> -c ./path/to/config.yml
Register a local datasets block:
investigraph add-block -b local-file-system/investigraph-local -u ./datasets
Register a GitHub datasets block:
investigraph add-block -b github/investigraph-datasets -u https://github.com/investigativedata/investigraph-datasets.git
Run a dataset pipeline from a dataset defined in a registered block:
investigraph run ec_meetings
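The register-then-run steps above can also be scripted. The following is a minimal sketch, assuming the investigraph command is on the PATH; the helper names (`add_block_command`, `run_command`, `execute`) are hypothetical and only assemble the command lines shown above, nothing more. By default it prints the commands without running them.

```python
import subprocess
from typing import List, Optional


def add_block_command(block: str, uri: str) -> List[str]:
    """Build the documented `investigraph add-block -b <block> -u <uri>` call."""
    return ["investigraph", "add-block", "-b", block, "-u", uri]


def run_command(dataset: str, config: Optional[str] = None) -> List[str]:
    """Build `investigraph run <dataset>`, adding `-c <config>` for a local definition."""
    cmd = ["investigraph", "run", dataset]
    if config is not None:
        cmd += ["-c", config]
    return cmd


def execute(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    execute([
        add_block_command(
            "github/investigraph-datasets",
            "https://github.com/investigativedata/investigraph-datasets.git",
        ),
        run_command("ec_meetings"),
    ])
```

Passing `dry_run=False` to `execute` runs the commands for real; keeping the dry run as the default makes it easy to review what would be executed first.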
View prefect dashboard:
make server
test
make install
make test
supported by
Media Tech Lab Bayern batch #3