World Bank Open Data helpers — Python library + CLI mirroring the Stata wbopendata surface (discovery, data, country-context, multilingual, linewrap).


Maintainers

jpazvd


wb-api-tools

Python library + CLI for the World Bank Open Data API — the Stata wbopendata surface, packaged for modern Python.

pip install wb-api-tools

wb-api-tools wraps the World Bank's WDI / IBRD APIs in two thin, well-tested interfaces: a Python library you import (import wb_api_tools as wb) and a console script (wb-api-tools <subcommand>). It mirrors the surface of the Stata wbopendata package (v18.x lineage) so workflows port cleanly between the two ecosystems.

Quick start

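After pip install wb-api-tools, populate the offline metadata cache once (~30 s). sync pulls indicator, source and topic metadata from the live World Bank API and writes three YAML files (roughly 18 MB, almost all of it the indicator catalogue) to ~/.cache/wbopendata/ on Linux/macOS: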

wb-api-tools sync

Then each of the five examples below runs as written. The full runnable notebook is examples/readme_examples.ipynb: GitHub renders it inline (DataFrame tables + figures), or you can open it in Jupyter / Colab to re-execute.

1. Population time-series (multiple countries)

import wb_api_tools as wb

df = wb.get_data(
    ["SP.POP.TOTL"], "BRA;USA;IND",
    date="2000:2023", long=True, no_basic=True,
)
df["pop_billions"] = df["value"] / 1e9
print(df.head(3)[["country", "date", "pop_billions"]].to_string(index=False))
#  country  date  pop_billions
#   Brazil  2000      0.174018
#   Brazil  2001      0.176301
#   Brazil  2002      0.178503

Population time-series, 2000-2023

2. Cross-country bar chart (G7, 2022)

df = wb.get_data(
    ["NY.GDP.PCAP.PP.KD"],
    "CAN;DEU;FRA;GBR;ITA;JPN;USA",
    date="2022", long=True, no_basic=True,
)
df["gdp_pcap_k"] = df["value"] / 1000
print(df.sort_values("gdp_pcap_k")[["country", "gdp_pcap_k"]].to_string(index=False))
#         country  gdp_pcap_k
#           Japan   44.972344
#           Italy   52.333327
#  United Kingdom   53.139151
#          France   53.673814
#          Canada   58.321061
#         Germany   63.676088
#   United States   72.679258

G7 GDP per capita PPP, 2022

3. Bivariate scatter — poverty vs GDP per capita

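Two indicators, all countries, single year (mirrors Stata wbopendata_examples.ado example 04). We fit three candidate functional forms and overlay the one with the highest R². The code shows the winning logistic fit; the R² values for all three forms are listed in the comments: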

import numpy as np
from scipy.optimize import curve_fit

df = wb.get_data(
    ["SI.POV.DDAY", "NY.GDP.PCAP.PP.KD"], "all",
    date="2019",
)
df = df.dropna(subset=["SI.POV.DDAY", "NY.GDP.PCAP.PP.KD"])  # keep rows with both indicators
df = df[df["region"].notna() & (df["region"] != "NA")]       # drop aggregates (regions, income groups)
print(f"countries with both indicators in 2019: {len(df)}")
# countries with both indicators in 2019: 78

x = df["NY.GDP.PCAP.PP.KD"].to_numpy()
y = df["SI.POV.DDAY"].to_numpy()
# Logistic 4PL is the principled choice — y is bounded in [0, 100%], so a
# sigmoid that respects both asymptotes is the right family.
def logistic_4pl(x, a, b, c, d):
    return d + (a - d) / (1.0 + (x / c) ** b)
popt, _ = curve_fit(logistic_4pl, x, y,
                    p0=[100.0, 1.0, float(np.median(x)), 0.0], maxfev=20000)

# R^2 against linear (log) and quadratic (log) baselines:
#   Linear    (log GDP):    R^2 = 0.503
#   Quadratic (log GDP):    R^2 = 0.775
#   Logistic 4PL:           R^2 = 0.834   <-- best fit, plotted in black

Poverty vs GDP per capita with logistic 4PL fit, 2019
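The R² comparison above needs the two log-GDP baselines and an R² metric. Below is a minimal NumPy sketch of how those could be computed from the same x, y, popt and logistic_4pl as the block above; it is an illustration, not the exact code in examples/readme_examples.py, and the helper names r_squared and fit_log_poly are hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_log_poly(x, y, degree):
    """Least-squares polynomial in log(x); returns the fitted values at x."""
    log_x = np.log(np.asarray(x, dtype=float))
    coefs = np.polyfit(log_x, y, degree)
    return np.polyval(coefs, log_x)

# With x, y, popt and logistic_4pl from the example above:
# r2_linear    = r_squared(y, fit_log_poly(x, y, 1))
# r2_quadratic = r_squared(y, fit_log_poly(x, y, 2))
# r2_logistic  = r_squared(y, logistic_4pl(x, *popt))
```

Because all three forms are scored with the same in-sample R², the comparison is like-for-like. The 4PL has more free parameters than the linear baseline, so an adjusted or out-of-sample metric would be a stricter test.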

4. Discovery workflow: search → info → fetch

res = wb.search("education spending", limit=3)
print(f"matches: {res['total']:,}")
# matches: 19

wb.info("SE.XPD.TOTL.GD.ZS")
# {'code': 'SE.XPD.TOTL.GD.ZS',
#  'name': 'Government expenditure on education, total (% of GDP)',
#  'source_name': 'World Development Indicators',
#  'topic_names': ['Education'],
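#  ...}

# fetch the series itself (same call pattern as example 1)
df = wb.get_data(
    ["SE.XPD.TOTL.GD.ZS"], "BRA;USA;IND",
    date="2010:2020", long=True, no_basic=True,
)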

5. Enrich a user DataFrame with country context

Mirrors Stata wbopendata, match(varname) [basic geo]:

import pandas as pd

user_df = pd.DataFrame({
    "iso3": ["BRA", "USA", "IND", "DEU", "JPN"],
    "my_metric": [1.2, 3.4, 5.6, 7.8, 9.0],
})
wb.enrich_country_context(user_df, iso_col="iso3", geo=True)
# iso3  my_metric region  ...  capital         latitude   longitude
#  BRA        1.2    LCN  ...  Brasilia        -15.7801   -47.9292
#  USA        3.4    NAC  ...  Washington D.C.  38.8895   -77.032
#  ...
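Conceptually, this enrichment is a left join of your rows onto World Bank country metadata keyed by ISO3 code. The stdlib sketch below illustrates the idea. It assumes country records shaped like those returned by the public v2 /country endpoint (id, region.id, capitalCity, latitude, longitude). flatten_country and enrich_rows are hypothetical helpers, not the package's code, and its output columns may be named differently.

```python
def flatten_country(rec, geo=False):
    """Reduce one /v2/country record to the context columns of interest."""
    row = {"region": rec["region"]["id"]}
    if geo:
        # The API returns coordinates as strings; aggregates carry empty strings.
        row.update(
            capital=rec["capitalCity"],
            latitude=float(rec["latitude"]) if rec["latitude"] else None,
            longitude=float(rec["longitude"]) if rec["longitude"] else None,
        )
    return row

def enrich_rows(rows, countries, iso_col="iso3", geo=False):
    """Left join: every user row is kept; unmatched codes get None context."""
    lookup = {c["id"]: flatten_country(c, geo) for c in countries}
    empty = dict.fromkeys(next(iter(lookup.values()), {}), None)
    return [{**row, **lookup.get(row[iso_col], empty)} for row in rows]
```

A left join (rather than an inner join) mirrors the Stata match() behaviour of never dropping the user's own observations.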

What's new in v0.3.0

MINOR release. New CLI capabilities + a README/docs refresh for the PyPI landing page:

CLI: --out - streams the full CSV to stdout (Unix convention; pipeable into jq, csvkit, etc. without a disk round-trip).
CLI: .json / .jsonl / .ndjson output formats via the same --out dispatcher (a pretty-printed array of records for .json; one JSON record per line for .jsonl and .ndjson). Web-friendly + streaming-friendly.
CLI: status lines routed to stderr so --out - produces a clean, parseable CSV stream on stdout.
README restructured for PyPI-first audience: 5 worked examples with figures, Common Indicators starter table, Troubleshooting, Citation.
examples/readme_examples.{py,ipynb} — runnable script + paired Jupyter notebook (GitHub renders inline, no clone required).
Example 3 demonstrates a 3-way functional-form comparison (linear-log / quadratic-log / logistic 4PL); logistic wins at R² = 0.834.

See CHANGELOG.md for the full per-release log.

Common indicators

A starter set of high-traffic World Bank indicator codes. The full catalogue holds 29,511 indicators; use wb.search(...) or the World Bank Data Catalog to discover more.

Category	Code	Indicator
Population	`SP.POP.TOTL`	Population, total
Population	`SP.URB.TOTL.IN.ZS`	Urban population (% of total)
Economy	`NY.GDP.MKTP.CD`	GDP (current US$)
Economy	`NY.GDP.PCAP.PP.KD`	GDP per capita, PPP (constant 2017 international $)
Economy	`NE.TRD.GNFS.ZS`	Trade (% of GDP)
Poverty	`SI.POV.DDAY`	Poverty headcount at $3.00/day (2021 PPP)
Poverty	`SI.POV.GINI`	Gini index
Education	`SE.XPD.TOTL.GD.ZS`	Government expenditure on education (% of GDP)
Education	`SE.PRM.ENRR`	Gross primary enrollment ratio
Education	`SE.SEC.CMPT.LO.ZS`	Lower secondary completion rate
Health	`SP.DYN.LE00.IN`	Life expectancy at birth
Health	`SH.DYN.MORT`	Under-5 mortality rate
Health	`SH.STA.MMRT`	Maternal mortality ratio
Environment	`EN.ATM.CO2E.PC`	CO2 emissions per capita (metric tons)
Environment	`AG.LND.FRST.ZS`	Forest area (% of land area)

Project surfaces

wb-api-tools is the Python distribution of a dual Stata + Python repo (jpazvd/wb-api-repo) on a parallel v0.x track to the upstream Stata wbopendata (Stata Journal v18.x).

Surface	Entry point	Reference
Python library	`wb_api_tools.{discovery,data,text}` (re-exported at the package root)	docs/PYTHON_USER_GUIDE.md
Python CLI	`wb-api-tools <subcmd>` (after install) or `python -m wb_api_tools <subcmd>`	`--help` on every subcommand
Stata package	`src/w/wbopendata.ado` in the GitHub repo (v17.4.0)	`help wbopendata` in Stata, or `src/w/wbopendata.sthlp`
YAML metadata cache	`~/.cache/wbopendata/_wbopendata_{indicators,sources,topics}.yaml` (XDG-aware)	populated by `wb-api-tools sync`

Python CLI

After pip install, use the wb-api-tools console script (or python -m wb_api_tools if your Python scripts directory is not on PATH). Every subcommand accepts --help for full flag descriptions.

Subcommand	Purpose
`countries`	Fetch country metadata
`indicators`	Fetch indicator metadata (legacy CSV/parquet/yaml dump)
`data`	Fetch indicator data; `--no-basic` skips country-context auto-merge, `--geo` adds capital/lat/lon, `--language es` switches the API path
`sources`	List WB data sources (`--all` for the full set)
`alltopics`	List all WB topic categories
`info <id>`	Show full metadata for one indicator (from YAML cache)
`describe <id>`	Fetch fresh metadata for one indicator (live API; `--language` supported)
`search [term]`	Paginated indicator search; `--source`, `--topic`, `--field`, `--exact`
`sync`	Populate / refresh the YAML metadata cache from the live WB API
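For orientation, the data subcommand's arguments map naturally onto the public World Bank v2 REST API. In that API, countries are ;-separated, dates use start:end, and a language code such as es can be added as a path prefix. The sketch below builds such a URL. build_data_url is a hypothetical helper for illustration, not wb-api-tools' internal code, and the per_page default is an assumption.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.worldbank.org/v2"

def build_data_url(indicator, countries, date=None, language=None, per_page=1000):
    """Build a World Bank v2 data URL (hypothetical helper, for illustration)."""
    # Optional language segment, e.g. .../v2/es/country/...
    root = f"{API_ROOT}/{language}" if language else API_ROOT
    query = {"format": "json", "per_page": per_page}
    if date:
        query["date"] = date  # single year "2022" or range "2000:2023"
    # safe=":" keeps the date range readable instead of percent-encoding the colon
    return f"{root}/country/{countries}/indicator/{indicator}?{urlencode(query, safe=':')}"

print(build_data_url("SP.POP.TOTL", "BRA;USA;IND", date="2010:2020", language="es"))
# https://api.worldbank.org/v2/es/country/BRA;USA;IND/indicator/SP.POP.TOTL?format=json&per_page=1000&date=2010:2020
```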

Example:

wb-api-tools data \
    --indicators SP.POP.TOTL,NY.GDP.MKTP.CD \
    --countries "BRA;USA;IND" \
    --date 2010:2020 \
    --geo --long --out _data/wb/pop_gdp_long.csv

Output is written to the path given by --out, with the format chosen from the file extension (five formats, seven recognized extensions):

Extension	Format	Notes
`.csv`	Comma-separated	Default fallback for unknown extensions too
`.parquet`	Apache Parquet	Columnar; small + fast for analytics
`.json`	JSON records, pretty-printed	`[{...}, {...}]` indent=2
`.jsonl` / `.ndjson`	Line-delimited JSON	Streaming-friendly for `jq`, Spark, BigQuery
`.yaml` / `.yml`	YAML records	Stata-friendly

Plus two stdout modes:

--out - → full CSV streamed to stdout (pipeable into other tools)
--out omitted → 20-row preview to stdout (head only, not parseable)
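Because status lines go to stderr, --out - lets other programs drive the CLI. A minimal sketch, assuming the wb-api-tools console script is on PATH: run the data subcommand with --out - through subprocess and parse stdout with the csv module. The flags are the ones from the CLI example above; parse_csv_stream and fetch_via_cli are hypothetical helper names.

```python
import csv
import io
import subprocess

def parse_csv_stream(text):
    """Parse CSV text (as produced by `--out -`) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def fetch_via_cli(indicators, countries, date):
    """Run the CLI with `--out -` and return parsed rows (needs wb-api-tools installed)."""
    cmd = [
        "wb-api-tools", "data",
        "--indicators", ",".join(indicators),
        "--countries", countries,
        "--date", date,
        "--long",
        "--out", "-",  # full CSV on stdout; status lines stay on stderr
    ]
    result = subprocess.run(cmd, capture_output=True, encoding="utf-8", check=True)
    return parse_csv_stream(result.stdout)

# rows = fetch_via_cli(["SP.POP.TOTL"], "BRA;USA;IND", "2010:2020")
```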

Stata package

src/w/wbopendata.ado (in the GitHub repo, not on PyPI) is the v17.4.0 dispatcher; current surface mirrors the Python library:

wbopendata, sources / allsources / alltopics / info / search / describe discovery commands
wbopendata, indicator(X) clear data fetch with noBASIC, geo, language(es), cache(days), sync
linewrap(W) maxlength(N) linewrapformat(stack|newline|lines|smcl) for graph-title and SMCL formatting

Open src/w/wbopendata.sthlp in Stata's viewer or run help wbopendata once the package is on the adopath. The Python-side docs/PYTHON_USER_GUIDE.md §5 has a row-by-row Stata ↔ Python parity table.
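For readers coming from the Python side, the idea behind linewrap(W) is to break a long indicator name into lines of at most W characters. The lines are then emitted in a form a Stata graph title or SMCL help file can consume. The sketch below illustrates that idea with textwrap. It is not the wb_api_tools.text implementation. Only three of the four formats are mimicked, and their exact output shapes here are assumptions.

```python
import textwrap

def linewrap(text, width, fmt="stack"):
    """Illustrative Stata-style line wrapping (hypothetical helper)."""
    lines = textwrap.wrap(text, width=width)
    if fmt == "stack":
        # Stata graph titles accept several quoted strings, one per line.
        return " ".join(f'"{line}"' for line in lines)
    if fmt == "newline":
        return "\n".join(lines)
    if fmt == "smcl":
        # SMCL's {break} directive forces a line break in the Viewer.
        return "{break}".join(lines)
    raise ValueError(f"unsupported format: {fmt}")

print(linewrap("Government expenditure on education, total (% of GDP)", 30))
# "Government expenditure on" "education, total (% of GDP)"
```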

YAML metadata cache

The offline metadata cache lives in a per-user XDG-aware directory (typically ~/.cache/wbopendata/ on POSIX or ~/AppData/Local/wbopendata/ on Windows; override with $WBOPENDATA_YAML_DIR):

_wbopendata_indicators.yaml — 29,511 indicators (~18 MB)
_wbopendata_sources.yaml — 71 sources
_wbopendata_topics.yaml — 21 topics

Discovery commands (info, search, sources, alltopics) read from this local cache, so lookups are fast and need no network round-trip. After pip install, populate it once:

wb-api-tools sync                # download + write all three YAMLs (~30 s first time)
wb-api-tools sync --commit --tag # git-commit + tag (dev mode only)
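Discovery is fast because the cache is read locally rather than queried over the network. The sketch below shows a rough version of that pattern. It assumes indicator records shaped like the wb.info() output in example 4 (code, name, ...). The real YAML layout and the package's matching rules (which fields are searched, how terms combine) are not documented here and likely differ. build_index and search_index are hypothetical.

```python
def build_index(records):
    """Map indicator code -> record for constant-time `info`-style lookups."""
    return {rec["code"]: rec for rec in records}

def search_index(index, term, limit=10):
    """Case-insensitive match of every word in `term` against indicator names."""
    words = term.lower().split()
    hits = [rec for rec in index.values()
            if all(w in rec["name"].lower() for w in words)]
    return {"total": len(hits), "results": hits[:limit]}
```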

A semi-monthly GitHub Action (.github/workflows/wb_metadata_nightly.yml; the file name is historical, and the cron actually runs on the 1st and 15th of every month at 02:17 UTC) keeps the repo-committed cache fresh. It can also be triggered manually via workflow_dispatch.

Documentation

docs/PYTHON_USER_GUIDE.md — Python library + CLI reference (Stata .sthlp equivalent)
docs/PYTHON_DEMO.md — captured live-API transcript from the 7-section walkthrough
docs/EXAMPLES.md — end-to-end workflows (API, Stata, Python)
docs/AGE_BANDS.md — standard 5-year age band codes for population indicators
examples/readme_examples.ipynb — runnable Jupyter notebook for the Quick-start examples above
examples/readme_examples.py — paired Python script (regenerates the figures in docs/figures/)
CHANGELOG.md — per-release change log
doc/VERSIONING_POLICY.md — semver policy + component-level .ado version headers

Troubleshooting

YAML metadata not found in cache — run wb-api-tools sync once. The package ships without a YAML cache (bundling the ~18 MB indicator file would bloat the ~49 kB wheel). sync fetches the metadata and writes the three files to your cache directory (~/.cache/wbopendata/ by default on Linux/macOS) in about 30 s.

Cache lives somewhere unexpected — the resolution order is OS-specific (see src/wb_api_tools/cache.py):

$WBOPENDATA_YAML_DIR wins on every platform when set.
POSIX (Linux / macOS): otherwise $XDG_CACHE_HOME/wbopendata/ if set, else ~/.cache/wbopendata/.
Windows: otherwise $LOCALAPPDATA/wbopendata/ if set, else ~/AppData/Local/wbopendata/.

Set $WBOPENDATA_YAML_DIR to a shared directory if you work across several machines.
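The resolution order above can be restated in a few lines of stdlib Python. This is a restatement for illustration, not a copy of src/wb_api_tools/cache.py. resolve_cache_dir is a hypothetical name. The environment mapping, platform and home directory are passed in explicitly so the logic can be checked on any OS. Treating an empty variable as unset is an assumption.

```python
import os
import sys
from pathlib import Path

def resolve_cache_dir(env=None, platform=None, home=None):
    """Return the wbopendata cache directory following the documented order."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else Path(home)

    # 1. An explicit (non-empty) override wins on every platform.
    if env.get("WBOPENDATA_YAML_DIR"):
        return Path(env["WBOPENDATA_YAML_DIR"])
    if platform.startswith("win"):
        # 2a. Windows: %LOCALAPPDATA%, else ~/AppData/Local.
        base = env.get("LOCALAPPDATA") or home / "AppData" / "Local"
    else:
        # 2b. POSIX (Linux, macOS): $XDG_CACHE_HOME, else ~/.cache.
        base = env.get("XDG_CACHE_HOME") or home / ".cache"
    return Path(base) / "wbopendata"
```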

Corporate proxy blocks api.worldbank.org — the WB API responds to plain HTTPS over port 443 with no auth. If wb-api-tools sync hangs, check your proxy whitelist or set HTTPS_PROXY in your environment.

UnicodeEncodeError on Windows — country names contain accented characters that Windows' default cp1252 can't represent. Set PYTHONIOENCODING=utf-8 in your environment before running, or use a Unicode-aware terminal (Windows Terminal, modern PowerShell).

wb-api-tools sync takes ~30 s — is it stuck? No, that's normal first-run behaviour: it fetches 29,511 indicators in batches of 10,000 from the /v2/indicator endpoint. Later discovery commands read from the local YAML cache and return almost instantly.
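The batching arithmetic is simple: 29,511 indicators at 10,000 per page means ceil(29,511 / 10,000) = 3 requests. Below is a sketch of such a paging loop against the v2 indicator endpoint. Its JSON responses are a two-element array: paging metadata first, then the records. The fetch_json callable is injected so the loop can be exercised offline. This is an illustration, not the package's sync code, and fetch_all_indicators / http_get_json are hypothetical names.

```python
import json
import math
from urllib.request import urlopen

INDICATOR_URL = "https://api.worldbank.org/v2/indicator?format=json&per_page={per_page}&page={page}"

def http_get_json(url):
    """Fetch and decode one JSON page from the live API."""
    with urlopen(url, timeout=60) as resp:
        return json.load(resp)

def fetch_all_indicators(fetch_json=http_get_json, per_page=10_000):
    """Walk every page of /v2/indicator and return the concatenated records."""
    records, page, pages = [], 1, 1
    while page <= pages:
        meta, rows = fetch_json(INDICATOR_URL.format(per_page=per_page, page=page))
        pages = int(meta["pages"])  # total page count reported by the API
        records.extend(rows or [])
        page += 1
    return records

print(math.ceil(29_511 / 10_000))  # 3 pages for the current catalogue
```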

Citation

If you use wb-api-tools in a published paper or working paper, please cite both the package and the underlying Stata implementation:

@misc{azevedo_wbapitools_2026,
  author       = {Azevedo, Jo{\~a}o Pedro},
  title        = {{wb-api-tools}: World Bank Open Data helpers for Python},
  year         = {2026},
  publisher    = {PyPI},
  url          = {https://pypi.org/project/wb-api-tools/}
}

@misc{azevedo_wbopendata_2011,
  author       = {Azevedo, Jo{\~a}o Pedro},
  title        = {{wbopendata}: Stata module to access World Bank databases},
  year         = {2011},
  publisher    = {Statistical Software Components, Boston College},
  number       = {S457234},
  url          = {https://ideas.repec.org/c/boc/bocode/s457234.html}
}

Source data: World Bank Open Data — https://data.worldbank.org/.

Development

git clone https://github.com/jpazvd/wb-api-repo.git
cd wb-api-repo
pip install -e ".[test]"
PYTHONIOENCODING=utf-8 python -m pytest tests/   # 71 cases across discovery, wb_text, wb_api_tools, cli

Useful Makefile targets:

make wb-update-metadata   # refresh YAML cache (v0.1.0 pipeline)
make wb-metadata          # legacy YAML builder (pre-Phase-0)
make wb-metadata-csv      # legacy CSV builder
make wb-config            # batch data pulls from config.yaml

To regenerate the Quick-start figures from live API data, install the [examples] extras group first (pulls in matplotlib + scipy + nbformat + jupyter + nbconvert — none of these are runtime deps of wb-api-tools):

pip install -e ".[examples]"
WBOPENDATA_YAML_DIR=src/_ python examples/readme_examples.py        # PNG + SVG to docs/figures/
WBOPENDATA_YAML_DIR=src/_ python examples/_build_readme_notebook.py # rebuild + execute the .ipynb

Branch model: feature work happens on develop; releases are tagged from main.

Integration

The Python CLI and library plug into:

Makefiles / pipelines (make wb-update-metadata, cron, GitHub Actions)
Stata workflows (export CSV → import delimited, or use the Stata package directly)
R workflows (readr::read_csv or arrow::read_parquet)
Jupyter notebooks for ad-hoc analysis

License

See LICENSE.md. Developed to bridge Stata wbopendata workflows with modern Python pipelines for reproducible UNICEF / World Bank style analytics.


Release history

0.3.0 (this version): May 24, 2026
0.2.1: May 24, 2026
0.2.0: May 24, 2026
0.2.0rc1 (pre-release): May 24, 2026

Download files


Source Distribution

wb_api_tools-0.3.0.tar.gz (61.1 kB)

Uploaded May 24, 2026 Source

Built Distribution


wb_api_tools-0.3.0-py3-none-any.whl (48.8 kB)

Uploaded May 24, 2026 Python 3

File details

Details for the file wb_api_tools-0.3.0.tar.gz.

File metadata

Download URL: wb_api_tools-0.3.0.tar.gz
Upload date: May 24, 2026
Size: 61.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for wb_api_tools-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`08039a0df8f8a4d8efce9d0d544ffac6ee01cb1a62a323cefd886a83feaac170`
MD5	`6987594b84c3f817afef2a364f521260`
BLAKE2b-256	`f1bb74385d9fd0300027bff8a8621c13eb8f98e3e4e6858af784eb64610ec205`


Provenance

The following attestation bundles were made for wb_api_tools-0.3.0.tar.gz:

Publisher: publish.yml on jpazvd/wb-api-repo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: wb_api_tools-0.3.0.tar.gz
- Subject digest: 08039a0df8f8a4d8efce9d0d544ffac6ee01cb1a62a323cefd886a83feaac170
- Sigstore transparency entry: 1623452076
- Sigstore integration time: May 24, 2026
Source repository:
- Permalink: jpazvd/wb-api-repo@ca790d3d80d53cc33c6cb70d351863c7c476ac7b
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/jpazvd
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ca790d3d80d53cc33c6cb70d351863c7c476ac7b
- Trigger Event: push

File details

Details for the file wb_api_tools-0.3.0-py3-none-any.whl.

File metadata

Download URL: wb_api_tools-0.3.0-py3-none-any.whl
Upload date: May 24, 2026
Size: 48.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for wb_api_tools-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7044c23a6a05c267625350d61f03d70b2d65214e3f2d1a8f31a7345f98208fc6`
MD5	`4f671c79d09ace9400797cdc157597ba`
BLAKE2b-256	`7d1b636fbe47ff820752af7081f7359445a9cad49dcf1bcd3458a31f619bc57a`


Provenance

The following attestation bundles were made for wb_api_tools-0.3.0-py3-none-any.whl:

Publisher: publish.yml on jpazvd/wb-api-repo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: wb_api_tools-0.3.0-py3-none-any.whl
- Subject digest: 7044c23a6a05c267625350d61f03d70b2d65214e3f2d1a8f31a7345f98208fc6
- Sigstore transparency entry: 1623452205
- Sigstore integration time: May 24, 2026
Source repository:
- Permalink: jpazvd/wb-api-repo@ca790d3d80d53cc33c6cb70d351863c7c476ac7b
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/jpazvd
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ca790d3d80d53cc33c6cb70d351863c7c476ac7b
- Trigger Event: push
