CLI tool to download and sync files from udata platforms (data.public.lu and others)
udata-dl
A CLI tool to download and synchronize files from udata platforms (data.public.lu and others).
Features
- Download all files from any udata platform organization
- Support for multiple udata instances (data.public.lu, or any custom instance)
- Organize files by dataset
- Intelligent synchronization with checksum verification:
  - Uses checksums from the API when available
  - Re-downloads files without checksums to ensure freshness
  - Only skips downloads when checksums match
- Automatic cleanup: Deletes local files that have been removed from the platform
- Force re-download option
- Dry-run mode to preview downloads
- Progress tracking with rich console output
- No authentication required
- API endpoints referenced by some datasets are excluded from download
Installation
Using pipx (Recommended)
pipx install udata-dl
Using pip
pip install udata-dl
From source
git clone <repository-url>
cd udata-dl
pip install .
Usage
Basic Usage
Download all files from an organization:
udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois
This will download all files to ./societe-nationale-des-chemins-de-fer-luxembourgeois/ organized by dataset.
Download a single dataset:
udata-dl --dataset daily-meteorological-parameters-luxembourg-findel-airport-wmo-id-06590
This will download only the specified dataset. The organization is automatically determined from the dataset's metadata, and files are saved to the appropriate directory structure.
Options
udata-dl [OPTIONS] [ORGANIZATION]
Arguments:
ORGANIZATION: The identifier (ID or slug) of the organization (required unless --dataset is used)
Options:
-d, --dataset DATASET: Download only a specific dataset (by ID or slug). Mutually exclusive with ORGANIZATION.
-o, --output PATH: Output directory for downloaded files (default: .)
-u, --api-url URL: Base URL of the udata API (default: https://data.public.lu/api/1)
-f, --force: Force download even if files already exist
-n, --dry-run: Show what would be downloaded without actually downloading
-l, --log-file PATH: Save logs to a file
--version: Show version and exit
--help: Show help message and exit
Note: ORGANIZATION and --dataset are mutually exclusive. Specify exactly one of them.
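The "exactly one of ORGANIZATION or --dataset" rule can be sketched as follows. This is a minimal illustration only: udata-dl itself is built on click, while this sketch uses the standard-library argparse to stay dependency-free, and the function names `build_parser` and `parse_target` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering only ORGANIZATION and --dataset (illustrative subset)."""
    parser = argparse.ArgumentParser(prog="udata-dl")
    # The positional argument is optional so that --dataset can be used alone.
    parser.add_argument("organization", nargs="?", help="Organization ID or slug")
    parser.add_argument("-d", "--dataset", help="Dataset ID or slug")
    return parser


def parse_target(argv):
    """Return ("organization", value) or ("dataset", value); exit with an error otherwise."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject both "neither given" and "both given".
    if bool(args.organization) == bool(args.dataset):
        parser.error("specify either ORGANIZATION or --dataset, but not both")
    if args.dataset:
        return ("dataset", args.dataset)
    return ("organization", args.organization)
```

Calling `parse_target(["my-organization"])` selects organization mode, and `parse_target(["--dataset", "my-dataset-slug"])` selects dataset mode. Passing both or neither ends with a usage error.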
Examples
Download all datasets from an organization:
udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois
Download a single dataset (organization is auto-detected from dataset metadata):
udata-dl --dataset daily-meteorological-parameters-luxembourg-findel-airport-wmo-id-06590
Use a different udata instance:
udata-dl my-organization --api-url https://data.other-instance.org/api/1
Use just the domain (will automatically add https and /api/1):
udata-dl my-organization -u data.other-instance.org
Download to a custom directory:
udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois -o /path/to/data
Force re-download all files:
udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois --force
Preview what would be downloaded (dry run):
udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois --dry-run
Download a single dataset in dry-run mode:
udata-dl --dataset my-dataset-slug --dry-run
Save logs to a file:
udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois --log-file download.log
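The domain-only form of --api-url shown above implies some URL normalization. The sketch below shows one way this could work. It is not the tool's actual code: the function name `normalize_api_url` is hypothetical, and the rule "add /api/1 only when the URL has no path" is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_api_url(value: str) -> str:
    """Turn a bare domain or a full URL into an API base URL (assumed rules)."""
    value = value.strip()
    # Assume HTTPS when no scheme is given, as in "data.other-instance.org".
    if "://" not in value:
        value = "https://" + value
    parts = urlsplit(value)
    # Drop any trailing slash; append the default API path only if none was given.
    path = parts.path.rstrip("/") or "/api/1"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Under these assumptions, `data.other-instance.org` and `https://data.other-instance.org/api/1` resolve to the same base URL.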
Synchronization
The tool supports synchronization for both organizations and individual datasets:
- First run: Downloads all files to your local directory
- Subsequent runs:
  - Skips files that match the checksum from the API
  - Re-downloads files without checksums to ensure they're up to date
  - Deletes local files that no longer exist in the API
- Force mode: Re-downloads everything regardless of checksums
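The skip/re-download rules above can be sketched as a single decision function. This is an illustration under stated assumptions, not the tool's implementation: the names `file_digest` and `needs_download` are hypothetical, and the sketch assumes the checksum type reported by the API is an algorithm name that Python's hashlib understands (anything else is treated like a missing checksum).

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_digest(path: Path, algorithm: str) -> str:
    """Hash a file in chunks so large downloads are not loaded into memory."""
    digest = hashlib.new(algorithm)  # raises ValueError for unknown algorithms
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(
    path: Path,
    checksum_type: Optional[str],
    checksum_value: Optional[str],
    force: bool = False,
) -> bool:
    """Return True unless the local file exists and matches the API checksum."""
    if force or not path.exists():
        return True
    # No checksum from the API: re-download to ensure freshness.
    if not checksum_type or not checksum_value:
        return True
    try:
        local = file_digest(path, checksum_type)
    except ValueError:
        # Assumption: an algorithm hashlib does not know counts as "no checksum".
        return True
    return local.lower() != checksum_value.lower()
```

The only path that returns False is "file exists, checksum available, digests equal", which matches the rule that downloads are skipped only when checksums match.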
Automatic cleanup:
- Files removed from udata are automatically deleted locally
- Empty dataset directories are cleaned up
- This keeps your local mirror consistent with the files currently published on the platform
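A cleanup pass of this kind could look like the sketch below. It assumes the caller already knows the full set of local paths that should exist after a sync; the function name `cleanup_stale` and its parameters are hypothetical and do not come from the udata-dl source.

```python
from pathlib import Path
from typing import Iterable, List


def cleanup_stale(org_dir: Path, expected: Iterable[Path]) -> List[Path]:
    """Delete files under org_dir that are not in `expected`, then prune empty dirs."""
    keep = {path.resolve() for path in expected}
    removed: List[Path] = []
    # sorted() materializes the listing before anything is deleted.
    for path in sorted(org_dir.rglob("*")):
        if path.is_file() and path.resolve() not in keep:
            path.unlink()
            removed.append(path)
    # Reverse order visits children before parents, so nested empty
    # directories are removed bottom-up.
    directories = sorted((p for p in org_dir.rglob("*") if p.is_dir()), reverse=True)
    for directory in directories:
        if not any(directory.iterdir()):
            directory.rmdir()
    return removed
```

The returned list makes it easy to log what was deleted. The organization directory itself is left in place even if it ends up empty.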
To keep your local copy in sync, simply run the same command periodically:
udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois
File Organization
Files are organized in the following structure:
output_directory/
└── organization-slug/
    ├── dataset-slug-1/
    │   ├── file1.csv
    │   ├── file2.pdf
    │   └── ...
    ├── dataset-slug-2/
    │   ├── file1.json
    │   └── ...
    └── ...
The tool automatically fetches the organization's slug from the API and uses it for the folder name, making the structure more readable and URL-friendly.
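As a rough illustration of this layout, the destination of a single file can be computed from the output directory, the two slugs, and the file name. The helper `target_path` is hypothetical, and stripping directory components from the file name is a defensive assumption rather than documented behavior.

```python
from pathlib import Path


def target_path(output_dir: str, organization_slug: str, dataset_slug: str, filename: str) -> Path:
    """Build output_dir/organization-slug/dataset-slug/filename."""
    # Keep only the final component so a name like "../x.csv" cannot escape the dataset folder.
    safe_name = Path(filename).name
    return Path(output_dir) / organization_slug / dataset_slug / safe_name
```

For example, `target_path("/path/to/data", "organization-slug", "dataset-slug-1", "file1.csv")` points at `/path/to/data/organization-slug/dataset-slug-1/file1.csv`.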
Finding Organizations and Datasets
Finding Organizations
To find an organization on data.public.lu:
- Visit data.public.lu
- Navigate to the organization's page
- You can use either:
  - The organization slug from the URL: https://data.public.lu/fr/organizations/{slug}/
  - The organization ID (also works)
The tool accepts both formats and will automatically resolve the slug for folder naming.
Example organization slugs from data.public.lu:
- societe-nationale-des-chemins-de-fer-luxembourgeois - CFL (Luxembourg Railways)
- administration-de-la-gestion-de-leau - Water Management Administration
- statec-institut-national-de-la-statistique-et-des-etudes-economiques-du-grand-duche-de-luxembourg - STATEC
Finding Datasets
To find a specific dataset on data.public.lu:
- Visit data.public.lu
- Navigate to the dataset's page
- Use the dataset slug from the URL: https://data.public.lu/fr/datasets/{slug}/
Example dataset slug:
- daily-meteorological-parameters-luxembourg-findel-airport-wmo-id-06590 - Daily weather data from Luxembourg Findel Airport
You can download a dataset directly without specifying its organization:
udata-dl --dataset daily-meteorological-parameters-luxembourg-findel-airport-wmo-id-06590
The tool will automatically determine the organization from the dataset metadata and use it for folder structure.
Requirements
- Python 3.8 or higher
- Internet connection
Dependencies
- click - Command-line interface framework
- requests - HTTP library for API calls
- rich - Beautiful terminal output
Development
Setup Development Environment
git clone <repository-url>
cd udata-dl
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e .
Project Structure
udata-dl/
├── udata_dl/
│   ├── __init__.py          # Package initialization
│   ├── cli.py               # CLI interface
│   └── downloader.py        # Download and sync logic
├── tests/
│   ├── __init__.py          # Test package
│   ├── conftest.py          # Shared fixtures
│   ├── test_downloader.py   # Downloader tests
│   ├── test_cli.py          # CLI tests
│   └── test_integration.py  # Integration tests
├── pyproject.toml           # Project configuration
├── pytest.ini               # Pytest configuration
├── README.md                # This file
└── LICENSE                  # MIT License
Running Tests
Install development dependencies:
pip install -e ".[dev]"
Run unit tests (fast, no network required):
pytest
Run with coverage report:
pytest --cov=udata_dl --cov-report=html
Run integration tests (requires network, slower):
pytest -m integration
Run all tests including integration:
pytest -m ""
Run specific test file:
pytest tests/test_downloader.py -v
For detailed testing information, see TESTING.md.
API Reference
The tool uses the data.public.lu API v1:
- API Base: https://data.public.lu/api/1
- Endpoint: GET /datasets/
- Documentation: https://data.public.lu/api/1/swagger.json
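For orientation, listing an organization's datasets through GET /datasets/ might look like the sketch below. Several details are assumptions to check against the swagger documentation: the `organization` and `page_size` query parameters, the `data` and `next_page` response fields, and whether the filter accepts a slug or only an ID. The sketch uses the standard-library urllib rather than requests, and the names `fetch_json` and `iter_datasets` are hypothetical.

```python
import json
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url: str) -> Dict:
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_datasets(
    api_url: str,
    organization: str,
    get_json: Callable[[str], Dict] = fetch_json,
    page_size: int = 50,
) -> Iterator[Dict]:
    """Yield every dataset of an organization, following pagination links."""
    # Assumed query parameters; verify them in the API documentation.
    query = urlencode({"organization": organization, "page_size": page_size})
    url: Optional[str] = f"{api_url.rstrip('/')}/datasets/?{query}"
    while url:
        page = get_json(url)
        yield from page.get("data", [])
        # Assumed field: the URL of the next page, or None on the last page.
        url = page.get("next_page")
```

Passing `get_json` as a parameter keeps the pagination logic testable without network access.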
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
For issues and questions:
- Create an issue on the GitHub repository
- Check existing issues for solutions