Helper package for the irv-datapkg workflow
Infrastructure Resilience Assessment Data Packages
Standalone workflow to create national scale open-data packages from global open datasets.
Setup
Get the latest code by cloning this repository:
git clone git@github.com:nismod/irv-datapkg.git
or
git clone https://github.com/nismod/irv-datapkg.git
Install Python and the required packages; we suggest using micromamba:
micromamba create -f environment.yml
Activate the environment:
micromamba activate datapkg
Run
The data packages are produced using a snakemake workflow.
The workflow expects ZENODO_TOKEN to be set as an environment variable, and it must be set before running any workflow steps.
If you are not interacting with Zenodo, this can be a dummy string:
echo "placeholder" > ZENODO_TOKEN
Export from the file to the environment:
export ZENODO_TOKEN=$(cat ZENODO_TOKEN)
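If you write your own Python scripts around the workflow, the same requirement can be checked up front. This is a minimal sketch, not part of irv-datapkg: `read_zenodo_token` is a hypothetical helper that prefers the ZENODO_TOKEN environment variable and falls back to the ZENODO_TOKEN file created above.

```python
import os
from pathlib import Path


def read_zenodo_token(token_file: str = "ZENODO_TOKEN") -> str:
    """Return the Zenodo token (hypothetical helper, not part of irv-datapkg).

    Prefers the ZENODO_TOKEN environment variable, then falls back to the
    file written with `echo "placeholder" > ZENODO_TOKEN`.
    """
    token = os.environ.get("ZENODO_TOKEN", "").strip()
    if token:
        return token
    path = Path(token_file)
    if path.is_file():
        token = path.read_text().strip()
        if token:
            return token
    # Fail early with a clear message instead of partway through the workflow
    raise RuntimeError(
        "ZENODO_TOKEN is not set; export it (a dummy string is fine "
        "if you are not interacting with Zenodo)"
    )
```

Calling it at the start of a script makes a missing token fail immediately rather than partway through a run.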
Check what will be run if we ask for everything produced by the rule all, before running the workflow for real:
snakemake --dry-run all
Run the workflow, asking for all, using 8 cores, with verbose log messages:
snakemake --cores 8 --verbose all
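The two commands above differ only in their flags. As a sketch of how they combine, the hypothetical helper `snakemake_command` below builds the same argument lists using only the flags shown in this README (`--dry-run`, `--cores`, `--verbose`) plus a list of targets; it is an illustration, not part of irv-datapkg.

```python
import shlex
from typing import Sequence


def snakemake_command(
    targets: Sequence[str] = ("all",),
    cores: int = 1,
    dry_run: bool = False,
    verbose: bool = False,
) -> list[str]:
    """Build a snakemake argument list (hypothetical helper)."""
    cmd = ["snakemake"]
    if dry_run:
        # A dry run only reports the jobs that would be executed
        cmd.append("--dry-run")
    else:
        # Real runs in this README pass an explicit core count
        cmd += ["--cores", str(cores)]
    if verbose:
        cmd.append("--verbose")
    cmd += list(targets)
    return cmd


print(shlex.join(snakemake_command(dry_run=True)))  # snakemake --dry-run all
print(shlex.join(snakemake_command(cores=8, verbose=True)))  # snakemake --cores 8 --verbose all
```

To actually launch a run from Python, the list can be passed to `subprocess.run(cmd, check=True)`.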
Upload and publish
To publish, first create a Zenodo token, save it and export it as the ZENODO_TOKEN environment variable.
Upload a single data package:
snakemake --cores 1 zenodo/GBR.deposited
Publish (cannot be undone) either programmatically:
snakemake --cores 1 zenodo/GBR.published
Or, after reviewing it online, through the Zenodo website (sandbox, live).
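Snakemake accepts several targets in one call, so more than one package can be deposited or published at once. The sketch below builds such target lists; `zenodo_targets` is a hypothetical helper, and it assumes that targets follow the `zenodo/{ISO3}.deposited` / `zenodo/{ISO3}.published` pattern shown above and that the workflow defines packages for the countries you pass (KEN is only an example).

```python
import re
from typing import Iterable


def zenodo_targets(iso_codes: Iterable[str], stage: str = "deposited") -> list[str]:
    """Return target paths such as zenodo/GBR.deposited (hypothetical helper).

    Assumes the zenodo/{ISO3}.{stage} naming pattern from this README.
    """
    if stage not in ("deposited", "published"):
        raise ValueError(f"unknown stage: {stage!r}")
    targets = []
    for code in iso_codes:
        code = code.strip().upper()
        # ISO 3166-1 alpha-3 country codes are exactly three letters, e.g. GBR
        if not re.fullmatch(r"[A-Z]{3}", code):
            raise ValueError(f"not an ISO3 country code: {code!r}")
        targets.append(f"zenodo/{code}.{stage}")
    return targets


# KEN is an illustrative assumption; use codes the workflow actually supports
print(" ".join(["snakemake", "--cores", "1", *zenodo_targets(["GBR", "ken"])]))
# snakemake --cores 1 zenodo/GBR.deposited zenodo/KEN.deposited
```

Because publishing cannot be undone, it is worth generating `.deposited` targets first and only switching the stage to `"published"` after review.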
Post-publication
To get a quick list of DOIs from the Zenodo deposition JSON files:
cat zenodo/*.deposition.json | jq '.metadata.prereserve_doi.doi'
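Where jq is not available, a rough Python equivalent of the command above is sketched here. `list_prereserved_dois` is a hypothetical helper; it reads the same `metadata.prereserve_doi.doi` field from each `zenodo/*.deposition.json` file and also records which file each DOI came from.

```python
import json
from pathlib import Path


def list_prereserved_dois(folder: str = "zenodo") -> dict[str, str]:
    """Map each *.deposition.json file name to its pre-reserved DOI.

    Hypothetical helper mirroring:
        cat zenodo/*.deposition.json | jq '.metadata.prereserve_doi.doi'
    Unlike jq, which prints null, files without that field are skipped.
    """
    dois = {}
    for path in sorted(Path(folder).glob("*.deposition.json")):
        record = json.loads(path.read_text())
        doi = record.get("metadata", {}).get("prereserve_doi", {}).get("doi")
        if doi:
            dois[path.name] = doi
    return dois


if __name__ == "__main__":
    for name, doi in list_prereserved_dois().items():
        print(name, doi)
```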
To generate records.csv with details of published packages:
python scripts/published_metadata.py
Development Notes
In case of warnings about GDAL_DATA not being set, try running:
export GDAL_DATA=$(gdal-config --datadir)
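The same fix can be applied from inside a Python script. This is a minimal sketch with a hypothetical helper, `ensure_gdal_data`, which runs `gdal-config --datadir` only when GDAL_DATA is unset; as an assumption on the safe side, call it before importing GDAL-based libraries so they see the variable.

```python
import os
import shutil
import subprocess
from typing import Optional


def ensure_gdal_data() -> Optional[str]:
    """Set GDAL_DATA for this process if unset (hypothetical helper).

    Mirrors `export GDAL_DATA=$(gdal-config --datadir)`. Returns the value
    in use, or None if it could not be determined.
    """
    current = os.environ.get("GDAL_DATA")
    if current:
        return current
    gdal_config = shutil.which("gdal-config")
    if gdal_config is None:
        # gdal-config is not on PATH, e.g. the environment is not activated
        return None
    try:
        datadir = subprocess.run(
            [gdal_config, "--datadir"], capture_output=True, text=True, check=True
        ).stdout.strip()
    except subprocess.CalledProcessError:
        return None
    if datadir:
        # Only affects this process and any child processes it starts
        os.environ["GDAL_DATA"] = datadir
        return datadir
    return None
```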
To format the workflow definition Snakefile:
snakefmt Snakefile
To format the Python helper scripts:
black scripts
Related work
These Python libraries may be a useful place to start analysis of the data in the packages produced by this workflow:
- snkit helps clean network data
- nismod-snail is designed to help implement infrastructure exposure, damage and risk calculations
The open-gira repository contains a larger workflow for global-scale open-data infrastructure risk and resilience analysis.
Acknowledgments
MIT License, Copyright (c) 2023 Tom Russell and irv-datapkg contributors
This research received funding from the FCDO Climate Compatible Growth Programme. The views expressed here do not necessarily reflect the UK government's official policies.
Download files
Source Distribution
Built Distribution
File details
Details for the file irv-datapkg-0.1.2.tar.gz.
File metadata
- Download URL: irv-datapkg-0.1.2.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | c029ee8b008ba479402664b6fd5d4ed9e270330cae5702b1f3de2d9ef58e46e4
MD5 | a3e649db82df2eac802ddccfec97f38d
BLAKE2b-256 | b3698b23170de4d6f6692d92e59698133901ccee1f83bb16b1bcc22764dd8efc
File details
Details for the file irv_datapkg-0.1.2-py3-none-any.whl.
File metadata
- Download URL: irv_datapkg-0.1.2-py3-none-any.whl
- Upload date:
- Size: 4.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0295dc3eb7e8a317a1c8b20a708dca56ba2b1397d99dc39b297c322ee94b9c47
MD5 | 3aafafa0e118c6a379247da7883292fc
BLAKE2b-256 | 5649482f34419fe6fed6aa64c45429cf9d0766ee228a10f88cd01b76c158bfce