Simple Python interface to any SDMX 2.1 REST API (Eurostat, ISTAT, and more)

opensdmx

Simple Python CLI and library for any SDMX 2.1 REST API. Default provider: Eurostat. Built-in support for ISTAT, OECD, ECB, World Bank, and more.

Best used with AI. opensdmx works on its own, but the CLI is designed to be composed, queried, and orchestrated step by step, which makes it well suited to being driven by an AI agent. For a guided, interactive workflow covering dataset discovery, schema exploration, filter selection, and data retrieval, pair it with the sdmx-explorer Agent Skill included in this repo.

Installation

As a CLI tool (recommended — available system-wide):

uv tool install opensdmx

As a library (for use in Python projects):

uv add opensdmx
# or
pip install opensdmx

CLI quick start

opensdmx search "unemployment"
opensdmx info une_rt_m
opensdmx constraints une_rt_m geo
opensdmx get une_rt_m --freq M --geo IT --sex T --out data.csv

Python quick start

import opensdmx

# Default provider: Eurostat
datasets = opensdmx.all_available()
print(datasets.head())

# Search by keyword
results = opensdmx.search_dataset("unemployment")

# One-liner retrieval (Eurostat default)
data = opensdmx.fetch("une_rt_m", freq="M", geo="IT", sex="T", age="TOTAL")

# Switch provider (each call replaces the active provider)
opensdmx.set_provider("istat")
opensdmx.set_provider("oecd")
opensdmx.set_provider("ecb")

Providers

import opensdmx

# Built-in presets
opensdmx.set_provider("eurostat")   # default
opensdmx.set_provider("istat")
opensdmx.set_provider("oecd")
opensdmx.set_provider("ecb")
opensdmx.set_provider("worldbank")

# Custom provider (agency_id optional)
opensdmx.set_provider("https://mysdmx.org/rest")
opensdmx.set_provider("https://mysdmx.org/rest", agency_id="XYZ", rate_limit=1.0)

# Check active provider
opensdmx.get_provider()  # returns dict with base_url, agency_id, rate_limit, language

Note on output columns: Eurostat uses the compact SDMX-CSV format (dimensions + TIME_PERIOD + OBS_VALUE). Other providers (ECB, OECD, etc.) return the generic text/csv format, which includes additional series metadata columns (TITLE, UNIT, DECIMALS, etc.). This is expected behavior — filter columns with standard tools if needed.
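
As a minimal sketch of the column filtering mentioned above, the snippet below drops metadata columns from CSV text using only the standard library. The sample rows are made up for illustration, and the set of columns to drop is an assumption based on the names listed in the note; check what your provider actually returns.

```python
import csv
import io

# Made-up sample in the shape of a generic text/csv response; not real data.
SAMPLE = """FREQ,CURRENCY,TIME_PERIOD,OBS_VALUE,TITLE,UNIT,DECIMALS
D,USD,2024-01-02,1.0,Example title,USD,4
D,USD,2024-01-03,1.1,Example title,USD,4
"""

# Assumed metadata columns to drop; adjust to what the provider returns.
METADATA_COLUMNS = {"TITLE", "UNIT", "DECIMALS"}


def drop_columns(csv_text, unwanted):
    """Return rows as dicts, without the columns named in `unwanted`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: value for name, value in row.items() if name not in unwanted}
        for row in reader
    ]


rows = drop_columns(SAMPLE, METADATA_COLUMNS)
print(list(rows[0]))  # remaining column names, in their original order
```

When the data is already a Polars DataFrame, selecting the wanted columns with the DataFrame's select method achieves the same result.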

Provider via CLI and environment variables

Use --provider (or -p) on any command, or set OPENSDMX_PROVIDER once for the whole session:

# Per-command
opensdmx search "inflation" --provider ecb
opensdmx get EXR --provider https://data-api.ecb.europa.eu/service --FREQ D

# Session-wide via env var
export OPENSDMX_PROVIDER=ecb
opensdmx search "inflation"
opensdmx get EXR --FREQ D --CURRENCY USD

# Custom URL with agency
export OPENSDMX_PROVIDER=https://mysdmx.org/rest
export OPENSDMX_AGENCY=XYZ
opensdmx get MYDATASET
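
When the CLI is launched from a Python script, the same environment variables can be passed to the child process instead of being exported in the shell. The sketch below only builds the environment and the command; the helper name opensdmx_env is hypothetical and not part of opensdmx.

```python
import os


def opensdmx_env(provider, agency=None, base_env=None):
    """Return a copy of the environment with the opensdmx provider variables set.

    Hypothetical helper for illustration; it only uses the variable names
    documented above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OPENSDMX_PROVIDER"] = provider
    if agency is not None:
        env["OPENSDMX_AGENCY"] = agency
    return env


env = opensdmx_env("https://mysdmx.org/rest", agency="XYZ", base_env={})
command = ["opensdmx", "get", "MYDATASET"]

# Where the CLI is installed, run it with:
#   import subprocess
#   subprocess.run(command, env=env, check=True)
print(env)
```

In a real script, leave base_env unset so the child process also inherits PATH and the rest of the current environment.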

Python API

Function                        Description
set_provider(name_or_url, ...)  Set active provider ('eurostat', 'istat', or custom URL)
get_provider()                  Return active provider config dict
all_available()                 List all datasets → Polars DataFrame
search_dataset(keyword)         Search by keyword in description
load_dataset(id)                Create a dataset object (dict)
print_dataset(ds)               Print dataset summary
dimensions_info(ds)             Dimension metadata → Polars DataFrame
get_dimension_values(ds, dim)   Codelist values for a dimension
get_available_values(ds)        Values actually present in the data (via availableconstraint)
set_filters(ds, **kwargs)       Set dimension filters
reset_filters(ds)               Reset all filters to "." (all)
get_data(ds, ...)               Retrieve data → Polars DataFrame
fetch(id, ..., **filters)       One-liner: load dataset + set filters + get data
set_timeout(seconds)            Get/set API timeout (default: 300 s)
parse_time_period(series)       Convert SDMX time strings to dates
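
To make the filter functions less abstract: in an SDMX 2.1 REST data query, dimension filters end up as a dot-separated key with one position per dimension, where several values for one dimension are joined with "+" and an empty position selects all values. The sketch below illustrates that convention only; it is not opensdmx's internal code, and the dimension order shown is hypothetical (the real order comes from the dataset structure).

```python
def build_key(dimension_order, filters):
    """Build an SDMX 2.1 REST series key from a dict of filters.

    Values for one dimension are joined with "+", dimensions are joined
    with ".", and an unfiltered dimension is left empty (wildcard).
    Dimension names are matched case-insensitively.
    """
    lookup = {name.lower(): value for name, value in filters.items()}
    parts = []
    for dim in dimension_order:
        value = lookup.get(dim.lower(), "")
        if isinstance(value, (list, tuple)):
            value = "+".join(value)
        parts.append(value)
    return ".".join(parts)


# Hypothetical dimension order, for illustration only.
ORDER = ["freq", "s_adj", "age", "unit", "sex", "geo"]

key = build_key(ORDER, {"freq": "M", "geo": ["IT", "FR"], "sex": "T"})
print(key)  # M....T.IT+FR
```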

get_data and fetch parameters

Parameter             Type  Description
start_period          str   Start date: "2020", "2020-Q1", "2020-01"
end_period            str   End date (same formats)
last_n_observations   int   Return only last N observations per series
first_n_observations  int   Return only first N observations per series
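
The three period formats listed for start_period and end_period (annual, quarterly, monthly) can be checked and mapped to calendar dates with a few lines of standard-library code. This is a sketch for validating inputs before a request, under the assumption that only these three formats are needed; it is not how opensdmx's own parse_time_period is implemented, and period_start is a hypothetical name.

```python
import re
from datetime import date

# Matches "2020", "2020-Q1" .. "2020-Q4", and "2020-01" .. "2020-12".
_PERIOD = re.compile(r"^(\d{4})(?:-(?:Q([1-4])|(0[1-9]|1[0-2])))?$")


def period_start(period):
    """Return the first day of an annual, quarterly, or monthly period string."""
    match = _PERIOD.match(period)
    if match is None:
        raise ValueError(f"unsupported period format: {period!r}")
    year, quarter, month = match.groups()
    if quarter is not None:
        # Q1 -> January, Q2 -> April, Q3 -> July, Q4 -> October
        return date(int(year), 3 * int(quarter) - 2, 1)
    if month is not None:
        return date(int(year), int(month), 1)
    return date(int(year), 1, 1)


for text in ("2020", "2020-Q3", "2020-11"):
    print(text, "->", period_start(text))
```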

Example: EU Unemployment Rate

import opensdmx
import polars as pl
from plotnine import ggplot, aes, geom_line, geom_point, labs, theme_minimal, scale_x_date

# Eurostat monthly unemployment by sex and age
ds = opensdmx.load_dataset("une_rt_m")
ds = opensdmx.set_filters(ds, freq="M", geo="IT", sex="T", age="TOTAL", s_adj="SA", unit="PC_ACT")
data = opensdmx.get_data(ds, start_period="2015", last_n_observations=60)

# Make sure OBS_VALUE is numeric before plotting
data = data.with_columns(pl.col("OBS_VALUE").cast(pl.Float64))

plot = (
    ggplot(data.to_pandas(), aes(x="TIME_PERIOD", y="OBS_VALUE"))
    + geom_line(color="#1f77b4", size=1)
    + geom_point(color="#1f77b4", size=0.8)
    + labs(title="Italy Unemployment Rate (Monthly)", x="Year", y="Rate (%)")
    + scale_x_date(date_breaks="2 years", date_labels="%Y")
    + theme_minimal()
)
plot.save("unemployment.png", dpi=150, width=10, height=5)

CLI

Commands

All commands accept --provider (-p) to select the provider.

opensdmx search <keyword> [--n N] [-p provider]
    Keyword search in dataset descriptions (default: 20 results)
opensdmx search --semantic <query> [--n N]
    Semantic search (requires opensdmx embed)
opensdmx embed [-p provider]
    Build semantic embeddings cache via Ollama
opensdmx info <id> [-p provider]
    Show dataset metadata and dimensions
opensdmx values <id> <dim> [-p provider]
    Show codelist values for a dimension (case-insensitive)
opensdmx constraints <id> [dim] [-p provider]
    Show values actually present in the dataflow (via availableconstraint)
opensdmx get <id> [--DIM VALUE] [--start-period P] [--end-period P] [--last-n N] [--first-n N] [--out file.csv|.parquet|.json] [-p provider]
    Download data
opensdmx plot <id|file.csv> [--DIM VALUE] [--geom line|bar|barh|point|scatter] [--out file] [-p provider]
    Plot data as chart
opensdmx blacklist [-p provider]
    List and remove datasets from the unavailability blacklist

Examples

# Eurostat (default)
opensdmx search "unemployment"
opensdmx search "unemployment" --n 5
opensdmx info une_rt_m
opensdmx values une_rt_m FREQ          # case-insensitive: freq works too
opensdmx constraints une_rt_m
opensdmx constraints une_rt_m geo
opensdmx get une_rt_m --freq M --geo IT --out data.csv
opensdmx get une_rt_m --freq M --geo IT --out data.parquet
opensdmx plot une_rt_m --freq M --geo IT --geom line
opensdmx plot data.csv --geom scatter --x TIME_PERIOD --y OBS_VALUE

# Other providers
opensdmx search "disoccupazione" --provider istat
opensdmx get 151_929 --provider istat --FREQ A --REF_AREA IT --out data.csv
opensdmx search "GDP" --provider oecd
opensdmx search "inflation" --provider ecb

Semantic search setup

Requires Ollama with the nomic-embed-text-v2-moe model:

ollama pull nomic-embed-text-v2-moe
opensdmx embed              # build embeddings for default provider (eurostat)
opensdmx embed -p istat     # build embeddings for ISTAT
opensdmx search --semantic "unemployment"

Tip: semantic search matches meaning, not exact words. Try synonyms or related terms for better results (e.g. "jobless" instead of "unemployment").
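
To illustrate why synonyms can match, here is a generic sketch of embedding-based ranking using cosine similarity. This is a common technique and not necessarily what opensdmx does internally; the dataset names and the 3-dimensional vectors are invented for illustration, whereas real embeddings come from the model and have many more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, catalog, top_n=3):
    """Return the ids of the `top_n` entries most similar to the query vector."""
    scored = [(cosine(query_vec, vec), dataset_id) for dataset_id, vec in catalog.items()]
    scored.sort(reverse=True)
    return [dataset_id for _, dataset_id in scored[:top_n]]


# Toy vectors, invented for illustration only.
CATALOG = {
    "labour_dataset": [0.9, 0.1, 0.0],
    "prices_dataset": [0.1, 0.9, 0.1],
    "trade_dataset": [0.0, 0.2, 0.9],
}

# A query whose vector lies close to the labour entry ranks it first,
# even if the query text shares no words with the dataset description.
print(rank([0.8, 0.2, 0.1], CATALOG, top_n=2))
```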

Caching

Cache is namespaced per provider under ~/.cache/opensdmx/{AGENCY_ID}/.

File                                Content                                       Default TTL
dataflows.parquet                   Dataset catalog                               7 days
cache.db (structures + codelists)   Dimensions, codelist descriptions and values  30 days
cache.db (constraints)              Available constraint values per dataflow      7 days

Environment variables:

Variable                        Description
OPENSDMX_PROVIDER               Provider name or custom base URL (session-wide default)
OPENSDMX_AGENCY                 Agency ID for custom URL providers
OPENSDMX_DATAFLOWS_CACHE_TTL    Dataset catalog TTL in seconds (default: 604800, i.e. 7 days)
OPENSDMX_METADATA_CACHE_TTL     Structure/codelist TTL in seconds (default: 2592000, i.e. 30 days)
OPENSDMX_CONSTRAINTS_CACHE_TTL  Constraints TTL in seconds (default: 604800, i.e. 7 days)

See .env.example for a ready-to-use template.
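
The sketch below shows how a TTL-based cache check of this kind can work: read the TTL from the environment with the documented default, then compare it with the age of the cached file. It is an illustration under stated assumptions, not opensdmx's actual implementation; the function names are hypothetical, and only the variable names, defaults, and cache path layout come from this section.

```python
import os
import time
from pathlib import Path

# Defaults in seconds, as documented above (7 days, 30 days, 7 days).
DEFAULT_TTLS = {
    "OPENSDMX_DATAFLOWS_CACHE_TTL": 604800,
    "OPENSDMX_METADATA_CACHE_TTL": 2592000,
    "OPENSDMX_CONSTRAINTS_CACHE_TTL": 604800,
}


def ttl_seconds(variable, environ=None):
    """Return the TTL for `variable`, falling back to the documented default."""
    environ = os.environ if environ is None else environ
    return int(environ.get(variable, DEFAULT_TTLS[variable]))


def cache_dir(agency_id, home=None):
    """Return the per-provider cache directory, ~/.cache/opensdmx/{AGENCY_ID}/."""
    home = Path.home() if home is None else Path(home)
    return home / ".cache" / "opensdmx" / agency_id


def is_fresh(path, ttl, now=None):
    """True if `path` exists and was modified less than `ttl` seconds ago."""
    path = Path(path)
    if not path.exists():
        return False
    now = time.time() if now is None else now
    return now - path.stat().st_mtime < ttl


# "MY_AGENCY" is a placeholder agency ID.
catalog = cache_dir("MY_AGENCY") / "dataflows.parquet"
print(catalog, is_fresh(catalog, ttl_seconds("OPENSDMX_DATAFLOWS_CACHE_TTL")))
```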

Timeout

opensdmx.set_timeout()      # get current timeout (default: 300s)
opensdmx.set_timeout(600)   # set to 10 minutes

Acknowledgements

Inspired by istatR by @jfulponi and istatapi by @Attol8.

License

MIT License — Copyright (c) 2026 Andrea Borruso
