udata-dl

A CLI tool to download and synchronize files from udata platforms (data.public.lu and others).

Features

  • Download all files from any udata platform organization
  • Support for multiple udata instances (data.public.lu, or any custom instance)
  • Organize files by dataset
  • Intelligent synchronization with checksum verification:
    • Uses checksums from API when available
    • Re-downloads files without checksums to ensure freshness
    • Only skips downloads when checksums match
  • Automatic cleanup: Deletes local files that have been removed from the platform
  • Force re-download option
  • Option to download only the latest file of a dataset
  • Dry-run mode to preview downloads
  • Progress tracking with rich console output
  • No authentication required
  • API endpoints referenced by some datasets are excluded from download

Installation

Using pipx (Recommended)

pipx install udata-dl

Using pip

pip install udata-dl

From source

git clone <repository-url>
cd udata-dl
pip install .

Usage

Basic Usage

Download all files from an organization:

udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois

This will download all files to ./societe-nationale-des-chemins-de-fer-luxembourgeois/ organized by dataset.

Download a single dataset:

udata-dl --dataset daily-meteorological-parameters-luxembourg-findel-airport-wmo-id-06590

This will download only the specified dataset. The organization is automatically determined from the dataset's metadata, and files are saved to the appropriate directory structure.

Download the latest file of a dataset:

udata-dl --dataset operations-delta-des-vehicules-au-luxembourg --latest

This will download only the latest file of the specified dataset. The organization is automatically determined from the dataset's metadata, and files are saved to the appropriate directory structure.
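
The README does not say how the "latest" file is chosen. As a rough illustration only, the sketch below assumes each resource is a dictionary with a `last_modified` ISO 8601 timestamp and picks the most recent one. The function name `pick_latest` and the field names are assumptions for this example, not the tool's actual code.

```python
from datetime import datetime
from typing import Dict, List, Optional


def pick_latest(resources: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the resource with the most recent 'last_modified' value.

    Assumption: timestamps are ISO 8601 strings that
    datetime.fromisoformat() can parse, all with the same
    timezone style (all naive or all offset-aware).
    """
    # Ignore resources that carry no timestamp at all.
    dated = [r for r in resources if r.get("last_modified")]
    if not dated:
        return None
    return max(dated, key=lambda r: datetime.fromisoformat(r["last_modified"]))
```

Resources without a timestamp are ignored here; a real implementation might fall back to another field instead.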

Options

udata-dl [OPTIONS] [ORGANIZATION]

Arguments:

  • ORGANIZATION: The identifier (ID or slug) of the organization (required unless --dataset is used)

Options:

  • -d, --dataset DATASET: Download only a specific dataset (by ID or slug). Mutually exclusive with ORGANIZATION.
  • -o, --output PATH: Output directory for downloaded files (default: .)
  • -u, --api-url URL: Base URL of the udata API (default: https://data.public.lu/api/1)
  • -f, --force: Force download even if files already exist
  • -n, --dry-run: Show what would be downloaded without actually downloading
  • --latest: Download only the latest file of a given dataset
  • -l, --log-file PATH: Save logs to a file
  • --version: Show version and exit
  • --help: Show help message and exit

Note: Specify either ORGANIZATION or --dataset, but not both.
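
The "exactly one of ORGANIZATION or --dataset" rule can be illustrated with a small parser. This is a sketch using the standard library's argparse rather than click (which the tool itself depends on), so it does not reflect the real implementation. The program name `udata-dl-sketch` and the function `parse_args` are hypothetical.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a reduced option set and enforce the either/or rule."""
    parser = argparse.ArgumentParser(prog="udata-dl-sketch")
    parser.add_argument("organization", nargs="?",
                        help="Organization ID or slug")
    parser.add_argument("-d", "--dataset",
                        help="Dataset ID or slug")
    args = parser.parse_args(argv)
    # Reject both "neither given" and "both given".
    if bool(args.organization) == bool(args.dataset):
        parser.error("specify either ORGANIZATION or --dataset, but not both")
    return args
```

`parser.error` prints a usage message and exits with status 2, the usual convention for command-line usage errors.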

Examples

Download all datasets from an organization:

udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois

Download a single dataset (organization is auto-detected from dataset metadata):

udata-dl --dataset daily-meteorological-parameters-luxembourg-findel-airport-wmo-id-06590

Use a different udata instance:

udata-dl my-organization --api-url https://data.other-instance.org/api/1

Use just the domain (https:// and /api/1 are added automatically):

udata-dl my-organization -u data.other-instance.org
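
A minimal sketch of what such URL normalization could look like, assuming the only rules are "add https:// if no scheme is given" and "append /api/1 if it is missing". The function name `normalize_api_url` is hypothetical and the tool's real logic may differ.

```python
def normalize_api_url(value: str) -> str:
    """Turn a bare domain or partial URL into a full API base URL."""
    url = value.strip().rstrip("/")
    # Assumption: a missing scheme means HTTPS.
    if "://" not in url:
        url = "https://" + url
    # Assumption: the API lives under /api/1, as in the documented default.
    if not url.endswith("/api/1"):
        url += "/api/1"
    return url
```

With these rules, `data.other-instance.org` and `https://data.other-instance.org/api/1` resolve to the same base URL.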

Download to a custom directory:

udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois -o /path/to/data

Force re-download all files:

udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois --force

Preview what would be downloaded (dry run):

udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois --dry-run

Download a single dataset in dry-run mode:

udata-dl --dataset my-dataset-slug --dry-run

Save logs to a file:

udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois --log-file download.log

Synchronization

The tool supports synchronization for both organizations and individual datasets:

  1. First run: Downloads all files to your local directory
  2. Subsequent runs:
    • Skips files that match the checksum from the API
    • Re-downloads files without checksums to ensure they're up to date
    • Deletes local files that no longer exist in the API
  3. Force mode: Re-downloads everything regardless of checksums
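
The decision rules above can be sketched as a single predicate. This is an illustration under stated assumptions, not the tool's code: it assumes the API supplies a checksum algorithm name that `hashlib` understands (for example `sha1` or `md5`) together with a hex digest. The names `file_digest` and `needs_download` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_digest(path: Path, algorithm: str) -> str:
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path: Path, algorithm: Optional[str],
                   expected: Optional[str], force: bool = False) -> bool:
    """Return True when the file should be (re-)downloaded."""
    if force or not path.exists():
        return True
    # No checksum from the API: re-download to ensure freshness.
    if not algorithm or not expected:
        return True
    # Skip only when the local digest matches the API checksum.
    return file_digest(path, algorithm) != expected.lower()
```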

Automatic cleanup:

  • Files removed from udata are automatically deleted locally
  • Empty dataset directories are cleaned up
  • This keeps your local mirror consistent with the platform
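
As a rough sketch of the cleanup step, assuming the set of file names currently listed by the API for a dataset is already known: delete every local file not in that set, then remove the dataset directory if it ends up empty. The function name `remove_stale_files` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def remove_stale_files(dataset_dir: Path,
                       remote_names: Iterable[str]) -> List[Path]:
    """Delete local files missing from the remote listing.

    Returns the paths that were deleted.
    """
    if not dataset_dir.is_dir():
        return []
    keep = set(remote_names)
    removed = []
    for path in sorted(dataset_dir.iterdir()):
        if path.is_file() and path.name not in keep:
            path.unlink()
            removed.append(path)
    # Clean up the dataset directory once nothing is left in it.
    if not any(dataset_dir.iterdir()):
        dataset_dir.rmdir()
    return removed
```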

To keep your local copy in sync, simply run the same command periodically:

udata-dl societe-nationale-des-chemins-de-fer-luxembourgeois
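
For scheduled runs (for example from cron or a task scheduler), the command can be wrapped in a small Python script. The sketch below only uses options documented above (`--output`, `--log-file`). The helper names `build_command` and `sync` are hypothetical, and the script assumes `udata-dl` is on the PATH.

```python
import subprocess
from typing import List, Optional


def build_command(organization: str,
                  output: Optional[str] = None,
                  log_file: Optional[str] = None) -> List[str]:
    """Assemble the udata-dl command line as an argument list."""
    cmd = ["udata-dl", organization]
    if output:
        cmd += ["--output", output]
    if log_file:
        cmd += ["--log-file", log_file]
    return cmd


def sync(organization: str, **options: Optional[str]) -> int:
    """Run udata-dl once and return its exit status."""
    return subprocess.run(build_command(organization, **options),
                          check=False).returncode
```

Calling `sync("societe-nationale-des-chemins-de-fer-luxembourgeois", output="/path/to/data")` would then perform one synchronization pass.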

File Organization

Files are organized in the following structure:

output_directory/
└── organization-slug/
    ├── dataset-slug-1/
    │   ├── file1.csv
    │   ├── file2.pdf
    │   └── ...
    ├── dataset-slug-2/
    │   ├── file1.json
    │   └── ...
    └── ...

The tool automatically fetches the organization's slug from the API and uses it for the folder name, making the structure more readable and URL-friendly.
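
The layout above maps directly onto a path computation. The following sketch assumes the organization and dataset slugs have already been resolved. `target_path` is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def target_path(output_dir: str, org_slug: str,
                dataset_slug: str, filename: str) -> Path:
    """Build output_dir/organization-slug/dataset-slug/filename."""
    # Keep only the final component so a remote file name cannot
    # place the file outside its dataset directory.
    safe_name = Path(filename).name
    return Path(output_dir) / org_slug / dataset_slug / safe_name
```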

Finding Organizations and Datasets

Finding Organizations

To find an organization on data.public.lu:

  1. Visit data.public.lu
  2. Navigate to the organization's page
  3. You can use either:
    • The organization slug from the URL: https://data.public.lu/fr/organizations/{slug}/
    • The organization ID (also works)

The tool accepts both formats and will automatically resolve the slug for folder naming.

Example organization slugs from data.public.lu:

  • societe-nationale-des-chemins-de-fer-luxembourgeois - CFL (Luxembourg Railways)
  • administration-de-la-gestion-de-leau - Water Management Administration
  • statec-institut-national-de-la-statistique-et-des-etudes-economiques-du-grand-duche-de-luxembourg - STATEC

Finding Datasets

To find a specific dataset on data.public.lu:

  1. Visit data.public.lu
  2. Navigate to the dataset's page
  3. Use the dataset slug from the URL: https://data.public.lu/fr/datasets/{slug}/

Example dataset slug:

  • daily-meteorological-parameters-luxembourg-findel-airport-wmo-id-06590 - Daily weather data from Luxembourg Findel Airport

You can download a dataset directly without specifying its organization:

udata-dl --dataset daily-meteorological-parameters-luxembourg-findel-airport-wmo-id-06590

The tool will automatically determine the organization from the dataset metadata and use it for folder structure.
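
When working from copied page URLs, the slug can be extracted programmatically. This is a minimal sketch that assumes the URL patterns shown in this section (`.../organizations/{slug}/` and `.../datasets/{slug}/`). `slug_from_url` is a hypothetical helper.

```python
from urllib.parse import urlparse


def slug_from_url(url: str, kind: str) -> str:
    """Extract the slug that follows 'organizations' or 'datasets'."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    try:
        return parts[parts.index(kind) + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no {kind} slug found in {url!r}") from None
```

The result can be passed to `udata-dl` as the ORGANIZATION argument or as the value of `--dataset`.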

Requirements

  • Python 3.8 or higher
  • Internet connection

Dependencies

  • click - Command-line interface framework
  • requests - HTTP library for API calls
  • rich - Beautiful terminal output

Development

Setup Development Environment

git clone <repository-url>
cd udata-dl
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

Project Structure

udata-dl/
├── udata_dl/
│   ├── __init__.py      # Package initialization
│   ├── cli.py           # CLI interface
│   └── downloader.py    # Download and sync logic
├── tests/
│   ├── __init__.py      # Test package
│   ├── conftest.py      # Shared fixtures
│   ├── test_downloader.py  # Downloader tests
│   ├── test_cli.py      # CLI tests
│   └── test_integration.py # Integration tests
├── pyproject.toml       # Project configuration
├── pytest.ini           # Pytest configuration
├── README.md            # This file
└── LICENSE              # MIT License

Running Tests

Install development dependencies:

pip install -e ".[dev]"

Run unit tests (fast, no network required):

pytest

Run with coverage report:

pytest --cov=udata_dl --cov-report=html

Run integration tests (requires network, slower):

pytest -m integration

Run all tests including integration:

pytest -m ""

Run specific test file:

pytest tests/test_downloader.py -v

For detailed testing information, see TESTING.md.

API Reference

The tool uses the data.public.lu API v1 (default base URL: https://data.public.lu/api/1).

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions:

  • Create an issue on the GitHub repository
  • Check existing issues for solutions
