A tool to rescue (download) data from CKAN portals.
CKAN Rescue
A Python CLI tool to rescue (download) data from CKAN portals that implement the ckanext-datajson extension. This tool downloads all datasets and their distributions from a CKAN portal's data.json endpoint, organizing them in a structured directory format.
Description
CKAN Rescue bulk-downloads datasets from CKAN data portals: it fetches the portal's data.json file and then downloads every data file that file references. The tool creates an organized directory structure based on the portal's homepage and dataset identifiers, making it easy to archive or back up an entire data portal.
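As a rough illustration of the first step, the sketch below walks a data.json catalog and lists the files it points to. It assumes the DCAT-US (Project Open Data) layout, in which datasets sit under a top-level `dataset` key and each distribution may carry a `downloadURL`. The sample catalog and the `iter_download_urls` helper are hypothetical and are not taken from the tool's source.

```python
import json

# Hypothetical sample following the DCAT-US (Project Open Data) data.json layout.
SAMPLE_CATALOG = json.loads("""
{
  "dataset": [
    {
      "identifier": "population-data-2023",
      "title": "Population Data 2023",
      "distribution": [
        {"downloadURL": "https://data.example.gov/files/population.csv"},
        {"accessURL": "https://data.example.gov/api/population"}
      ]
    },
    {"identifier": "no-files", "title": "Dataset without distributions"}
  ]
}
""")


def iter_download_urls(catalog):
    """Yield (dataset_identifier, download_url) for each downloadable distribution."""
    for dataset in catalog.get("dataset", []):
        # Datasets without a "distribution" key simply contribute nothing.
        for distribution in dataset.get("distribution", []):
            url = distribution.get("downloadURL")
            if url:  # skip distributions that only expose an accessURL
                yield dataset.get("identifier"), url


if __name__ == "__main__":
    for identifier, url in iter_download_urls(SAMPLE_CATALOG):
        print(identifier, url)
```

Whether the real tool also follows `accessURL` entries is not stated here; this sketch deliberately skips them.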
Key features:
- Parallel downloads with configurable thread count
- Organized directory structure by portal and dataset
- Comprehensive logging of successful and failed downloads
- Preserves original filenames when available
- Handles large data portals efficiently
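The parallel-download and logging features can be pictured with a thread pool that records successes and failures separately. This is a minimal sketch, not the tool's actual implementation: `download_all` and `fake_download` are hypothetical names, and `fake_download` stands in for a real HTTP request so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(url):
    """Stand-in for a real HTTP download; fails for URLs containing 'broken'."""
    if "broken" in url:
        raise IOError(f"could not fetch {url}")
    return "ok"


def download_all(urls, fetch=fake_download, threads=5):
    """Run fetch over urls in a thread pool; return (succeeded, failed) dicts."""
    succeeded, failed = {}, {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Map each future back to its URL so results can be attributed.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                succeeded[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                failed[url] = str(exc)
    return succeeded, failed


if __name__ == "__main__":
    ok, bad = download_all(
        ["https://example.com/a.csv", "https://example.com/broken.csv"],
        threads=2,
    )
    print("succeeded:", sorted(ok))
    print("failed:", bad)
```

Collecting failures instead of raising lets one unreachable file be reported without aborting the rest of the portal.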
Installation from PyPI
Install the latest version using pip:
pip install ckan-rescue
Or install using uv:
uv add ckan-rescue
How to Use
Basic Usage
ckan-dcat-download <data.json_url>
Advanced Usage
# Specify output directory
ckan-dcat-download https://example.com/data.json -o /path/to/output
# Use more threads for faster downloads
ckan-dcat-download https://example.com/data.json -t 10
# Combine options
ckan-dcat-download https://example.com/data.json -o downloads -t 8
Command Line Options
- url (required): URL of the data.json file from the CKAN portal
- -o, --output: Output directory (default: output)
- -t, --threads: Number of threads for parallel downloads (default: 5)
- -v, --version: Show version information
- -h, --help: Show help message
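For readers who want to see how such an interface fits together, the options listed above could be declared with `argparse` roughly as follows. This is a sketch under assumptions: `build_parser` is a hypothetical name, the version string is a placeholder, and the tool's real parser may differ.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="ckan-dcat-download",
        description="Download all datasets listed in a CKAN portal's data.json.",
    )
    parser.add_argument("url", help="URL of the data.json file from the CKAN portal")
    parser.add_argument("-o", "--output", default="output", help="Output directory")
    parser.add_argument(
        "-t", "--threads", type=int, default=5,
        help="Number of threads for parallel downloads",
    )
    # Placeholder version string; the real value comes from the package metadata.
    parser.add_argument(
        "-v", "--version", action="version", version="%(prog)s (sketch)",
    )
    # argparse adds -h/--help automatically.
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["https://example.com/data.json", "-o", "downloads", "-t", "8"]
    )
    print(args.url, args.output, args.threads)
```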
Examples
Download from a government data portal:
ckan-dcat-download https://data.gov/data.json
Download to a specific directory with 10 parallel threads:
ckan-dcat-download https://opendata.city.gov/data.json -o city_data -t 10
Output Structure
The tool creates the following directory structure:
output/
└── <portal_homepage>/
    ├── data.json                   # Original data.json file
    ├── logs.txt                    # Download logs
    └── data/
        └── <dataset_id>/
            └── <distribution_id>/
                └── <filename>      # Downloaded data file
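The layout above can be expressed as a small path-building function. The sketch below assumes the portal directory is named after the homepage's host name and that the filename is the last segment of the download URL. Both are guesses consistent with the documented structure, and `portal_dir_name`, `filename_from_url`, and `target_path` are hypothetical helpers rather than the tool's own functions.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def portal_dir_name(homepage):
    """Derive a directory name from the portal homepage (assumed: its host name)."""
    parsed = urlparse(homepage)
    # A bare "data.example.gov" has no scheme, so it ends up in .path instead.
    return parsed.netloc or parsed.path.strip("/")


def filename_from_url(url, fallback="download"):
    """Use the last URL path segment as the filename, or a fallback if empty."""
    name = PurePosixPath(unquote(urlparse(url).path)).name
    return name or fallback


def target_path(output, homepage, dataset_id, distribution_id, url):
    """Compose output/<portal>/data/<dataset_id>/<distribution_id>/<filename>."""
    return PurePosixPath(
        output,
        portal_dir_name(homepage),
        "data",
        dataset_id,
        distribution_id,
        filename_from_url(url),
    )


if __name__ == "__main__":
    print(target_path(
        "output",
        "https://data.example.gov",
        "population-data-2023",
        "csv-distribution",
        "https://data.example.gov/files/population.csv",
    ))
```

`PurePosixPath` keeps the example platform-independent; code that actually writes to disk would use `pathlib.Path` instead.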
Example Output Structure
output/
└── data.example.gov/
    ├── data.json
    ├── logs.txt
    └── data/
        ├── population-data-2023/
        │   ├── csv-distribution/
        │   │   └── population.csv
        │   └── json-distribution/
        │       └── population.json
        └── budget-dataset/
            └── excel-distribution/
                └── budget_2023.xlsx
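Once a download finishes, a tree like the one above is easy to inspect from Python. The following sketch counts the files saved under each dataset directory; `count_files_per_dataset` is a hypothetical helper, and it relies only on the documented `data/<dataset_id>/...` layout, not on the format of logs.txt.

```python
from pathlib import Path


def count_files_per_dataset(portal_dir):
    """Map each dataset directory name to the number of files found beneath it."""
    data_dir = Path(portal_dir) / "data"
    counts = {}
    for dataset_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        # rglob descends through the distribution directories.
        counts[dataset_dir.name] = sum(
            1 for f in dataset_dir.rglob("*") if f.is_file()
        )
    return counts


if __name__ == "__main__":
    # Assumes a finished download at output/data.example.gov.
    portal = Path("output") / "data.example.gov"
    if (portal / "data").is_dir():
        for name, count in count_files_per_dataset(portal).items():
            print(f"{name}: {count} file(s)")
```

For the example tree shown above, this would report two files for population-data-2023 and one for budget-dataset.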
How to Develop
This project uses uv for dependency management and development.
Prerequisites
Install uv if you haven't already:
curl -LsSf https://astral.sh/uv/install.sh | sh
Development Setup
- Clone the repository:
git clone https://github.com/pdelboca/ckan-rescue.git
cd ckan-rescue
- Create and activate a virtual environment:
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install the project in development mode:
uv pip install -e .
Local Testing
Test your changes locally:
# Install in development mode
uv pip install -e .
# Test the CLI
ckan-dcat-download --help
How to Publish to PyPI
This project uses uv for building and publishing to PyPI.
Publishing Steps
- Update version: Bump the project version:
uv version --bump [patch|minor|major]
- Build the package:
uv build
- Create tag and commit files:
git add pyproject.toml uv.lock # Edited by uv version --bump
git commit -a -m "bump: Release v<NEW_VERSION>"
git tag "v<NEW_VERSION>"
git push --tags
- Publish to PyPI:
# Publish to PyPI
uv publish --token <YOUR_PYPI_TOKEN>
# Or publish to TestPyPI first (recommended)
uv publish --publish-url https://test.pypi.org/legacy/
- Create GitHub Release: Create a GitHub Release to document the new version.
Issues
If you encounter any problems or have feature requests, please file an issue at GitHub Issues.
File details
Details for the file ckan_rescue-0.0.4.tar.gz.
File metadata
- Download URL: ckan_rescue-0.0.4.tar.gz
- Upload date:
- Size: 5.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e9866159d85b8d6bf86cb6d3b0fcc4285b97f42df8d6ae275ce73e09196148fa |
| MD5 | f273a671ae66c51f533feb8cff104953 |
| BLAKE2b-256 | 3a1e291285cd6b86214f0076ca3f26d972ac326302580e1be89a290a09d76b65 |
File details
Details for the file ckan_rescue-0.0.4-py3-none-any.whl.
File metadata
- Download URL: ckan_rescue-0.0.4-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7a9ee01a099b7737ea2f779e331981d25df24fe3d301d708fdaaba9aef434a43 |
| MD5 | 4904aeebbc643ad0627d2fdab48a0000 |
| BLAKE2b-256 | 88ac52a87566ade9dab12e2da93ad1c22a19382dbd52cd9bb23b99310da8388d |