Core data types used by OWID for managing data.
owid-catalog
A Pythonic library for working with OWID data.
The owid-catalog library is the foundation of Our World in Data's data management system. It provides:
- Data APIs: Access OWID's published data through unified client interfaces
- Data Structures: Enhanced pandas DataFrames with rich metadata support
Installation
pip install owid-catalog
Quick Examples
Accessing OWID Data
from owid.catalog import fetch, search
# Search for charts (default)
charts = search("population")
tb = charts[0].fetch()
# Fetch data from OWID Chart at ourworldindata.org/grapher/life-expectancy
tb = fetch("life-expectancy")
# Search for tables
tables = search("population", kind="table", namespace="un")
tb = tables[0].fetch()
# Search indicators (using semantic search)
search("renewable energy", kind="indicator")
Working with Data Structures
from owid.catalog import Table
from owid.catalog import processing as pr
# Tables are pandas DataFrames with metadata (df is any existing pandas DataFrame)
tb = Table(df, metadata={"short_name": "population"})
# Metadata propagates through operations
tb_filtered = tb[tb["year"] > 2000]  # Keeps metadata
# tb1 and tb2 are any two Tables that share a "country" column
tb_merged = pr.merge(tb1, tb2, on="country")  # Merges metadata
Documentation
For detailed documentation, see:
- API Reference: ChartsAPI, IndicatorsAPI, TablesAPI
- Data Structures: Dataset, Table, Variable, metadata handling
- Full Documentation: Complete library documentation
Architecture
graph TB
etl -->|reads| snapshot[upstream datasets]
etl -->|generates| s3[data catalog]
catalog[owid-catalog] -->|queries| s3
This library is part of OWID's ETL project, which contains recipes for all datasets we publish.
Development
You need Python 3.10+, uv and make installed. Clone the repo, then you can simply run:
# run all unit tests and CI checks
make test
# watch for changes, then run all checks
make watch
Changelog
v1.0.1
- ResponseSet ergonomics
- Remove deprecated ResponseSet.results property (use .items instead)
- Add .to_dict() method for serializing results to plain dicts (useful for AI/LLM context windows)
- Add all_fields parameter to .to_frame() to temporarily override display mode without mutating instance state
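The all_fields entry above describes a per-call override that leaves the instance's display mode untouched. The pattern can be sketched as follows. ResultList, to_rows(), the advanced flag, and the field names are hypothetical stand-ins chosen for illustration, not the library's ResponseSet, and the sketch returns plain rows rather than a DataFrame to stay dependency-free.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResultList:
    """Hypothetical result container; not the owid-catalog ResponseSet."""

    items: list[dict] = field(default_factory=list)
    advanced: bool = False  # instance-level display mode (assumed name)

    # Fields shown in basic mode (assumed for this sketch).
    BASIC_FIELDS = ("slug", "title")

    def to_rows(self, all_fields: Optional[bool] = None) -> list[dict]:
        # A per-call value wins; None falls back to the instance mode.
        show_all = self.advanced if all_fields is None else all_fields
        if show_all:
            return [dict(item) for item in self.items]
        return [
            {key: item[key] for key in self.BASIC_FIELDS if key in item}
            for item in self.items
        ]

    def to_dict(self) -> list[dict]:
        # Plain copies of each item, safe to serialize or mutate.
        return [dict(item) for item in self.items]


results = ResultList(
    items=[{"slug": "life-expectancy", "title": "Life expectancy", "extra": "hidden in basic mode"}]
)
full = results.to_rows(all_fields=True)  # override for this call only
basic = results.to_rows()                # instance mode is still basic
```

Because the override is an argument rather than a setter, two callers sharing the same container cannot change each other's display mode.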
v1.0.0
- New unified Client API
- owid.catalog.Client as single entry point with ChartsAPI, IndicatorsAPI, TablesAPI
- Quick access via search() and fetch() convenience functions
- Rich result types: ChartResult, IndicatorResult, TableResult with ResponseSet container
- Charts API
- Fetch chart data by slug, URL, or slug with query params
- Parse chart slugs from grapher/explorer URLs via parse_chart_slug()
- Explorer best-effort fetching with graceful error handling
- set_ui_advanced() / set_ui_basic() for display configuration
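To illustrate what parsing a chart slug from a URL involves, here is a minimal sketch using only the standard library. The function name sketch_parse_chart_slug and the assumed URL layouts (/grapher/<slug> and /explorers/<slug>) are illustrative assumptions; the library's parse_chart_slug() may accept other inputs or behave differently.

```python
from urllib.parse import urlparse


def sketch_parse_chart_slug(url_or_slug: str) -> str:
    """Return the slug from a grapher/explorer URL, or the input if it is already a slug.

    Hypothetical helper for illustration only.
    """
    # Drop query parameters such as ?tab=chart before looking at the path.
    base = url_or_slug.strip().split("?", 1)[0]
    if "/" not in base:
        return base  # already a bare slug
    # Add a scheme when missing so urlparse separates host and path.
    path = urlparse(base if "://" in base else "https://" + base).path
    parts = [part for part in path.split("/") if part]
    # Assumed layout: /grapher/<slug> or /explorers/<slug>
    if len(parts) >= 2 and parts[0] in ("grapher", "explorers"):
        return parts[1]
    raise ValueError(f"Not a grapher or explorer URL: {url_or_slug}")


slug = sketch_parse_chart_slug("https://ourworldindata.org/grapher/life-expectancy?tab=chart")
```

Raising ValueError on unrecognized paths is a design choice of this sketch; it keeps a mistyped URL from silently becoming a slug.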
- Tables API
- Search catalog by table, namespace, version, dataset, and channel
- Fetch tables directly by catalog path
- Embedded catalog index with local caching
- Indicators API
- Semantic search via search.owid.io vector embeddings
- Sort by relevance (similarity + popularity blend) or similarity only
- fetch() for single-column indicator or fetch_table() for the full table
- Search & discovery
- Fuzzy, exact, contains, and regex matching modes
- .latest() filtering to keep only newest versions
- Popularity scores (0.0-1.0) from analytics views, results sorted by popularity
- refresh_index parameter to force catalog index reload
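The idea behind keeping only the newest versions can be sketched without the library. The Entry type, the grouping key, and the sample rows below are hypothetical; the sketch also assumes versions are ISO date strings, so that string comparison matches chronological order.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical search result row; not a type from owid-catalog."""

    namespace: str
    dataset: str
    table: str
    version: str  # assumed ISO date string, e.g. "2024-01-31"


def keep_latest(entries: list[Entry]) -> list[Entry]:
    """Keep the newest version of each (namespace, dataset, table) group."""
    newest: dict[tuple[str, str, str], Entry] = {}
    for entry in entries:
        key = (entry.namespace, entry.dataset, entry.table)
        # ISO dates sort lexicographically, so plain string comparison works.
        if key not in newest or entry.version > newest[key].version:
            newest[key] = entry
    return list(newest.values())


# Made-up rows for illustration only.
entries = [
    Entry("ns_a", "dataset_a", "population", "2022-01-01"),
    Entry("ns_a", "dataset_a", "population", "2024-01-01"),
    Entry("ns_b", "dataset_b", "population", "2023-01-01"),
]
latest = keep_latest(entries)
```

If versions were not date-like strings, the comparison would need a proper version parser instead of the string comparison used here.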
- Data structures integration
- All fetch() methods return owid.catalog.Table with full metadata
- CatalogPath helper for parsing catalog paths
- Lazy loading with load_data=False for deferred data access
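As an illustration of what a catalog path helper does, the sketch below splits a path into named parts. The five-part layout channel/namespace/version/dataset/table is an assumption drawn from the search fields listed under Tables API, and PathParts and split_catalog_path are hypothetical names, not the library's CatalogPath.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathParts:
    """Hypothetical parsed catalog path (assumed five-part layout)."""

    channel: str
    namespace: str
    version: str
    dataset: str
    table: str


def split_catalog_path(path: str) -> PathParts:
    """Split 'channel/namespace/version/dataset/table' into named parts."""
    parts = path.strip("/").split("/")
    if len(parts) != 5 or not all(parts):
        raise ValueError(f"Expected 5 non-empty segments, got: {path!r}")
    return PathParts(*parts)


# Made-up path for illustration only.
parts = split_catalog_path("garden/ns_a/2024-01-01/dataset_a/population")
```

A frozen dataclass keeps the parsed parts immutable, which suits values that identify a specific table version.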
- Library reorganization
- Restructured into owid.catalog.core (data structures) and owid.catalog.api (remote access)
- catalog.find() deprecated in favor of Client().tables.search() (backwards compat maintained)
- Legacy code moved to owid.catalog.api.legacy
- New dependencies: pydantic v2.0+
- Private data support
- Private datasets served from separate R2 bucket
- API can fetch private data from private bucket
- Performance
- Vectorized operations replacing iterrows() in TablesAPI
- Embedded catalog index loading (removed ETLCatalog dependency)
- Modularized search into helper methods
- Other
- Thumbnail display in ResponseSet for chart results
- JSON output format support
- Comprehensive exception handling: ChartNotFoundError, LicenseError
- API URLs immutable with Pydantic Field(frozen=True)
See previous versions
v0.4.5
- Allow both table and dataset parameters in find() (they can now be used together)
- Migrate from pyright to ty type checker for improved type checking
v0.4.4
- Enhanced find() with better search capabilities:
- Case-insensitive search by default (use case=True for case-sensitive)
- Regex support enabled by default for table and dataset parameters
- New fuzzy search with fuzzy=True: typo-tolerant matching sorted by relevance
- Configurable fuzzy threshold (0-100) to control match strictness
- New dependency: rapidfuzz for fuzzy string matching
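The effect of a 0-100 fuzzy threshold can be shown with a small sketch. It uses difflib from the standard library as a stand-in for rapidfuzz, so the scores will differ from what the library computes. The fuzzy_find function and the sample names are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_find(query: str, names: list[str], threshold: float = 70.0) -> list[tuple[str, float]]:
    """Return (name, score) pairs scoring at least `threshold` (0-100), best first.

    Matching is case-insensitive. difflib stands in for rapidfuzz here.
    """
    lowered = query.lower()
    scored = [
        (name, 100.0 * SequenceMatcher(None, lowered, name.lower()).ratio())
        for name in names
    ]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


names = ["population", "population_density", "life_expectancy"]
strict = fuzzy_find("Populaton", names, threshold=70)  # typo still matches "population"
loose = fuzzy_find("Populaton", names, threshold=60)   # lower threshold admits more names
```

Raising the threshold trades recall for precision: fewer near-misses are returned, but a heavily misspelled query may find nothing.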
v0.4.3
- Fixed minor bugs
v0.4.0
- Highlights
- Support for Python 3.10-3.13 (was 3.11-3.13)
- Drop support for Python 3.9 (breaking change)
- Others
- Deprecate Walden.
- Dependencies: Change rdata for pyreadr.
- Support: indicator dimensions.
- Support: MDIMs.
- Switched from Poetry to UV package manager.
- New decorator @keep_metadata to propagate metadata in pandas functions.
- Fixes: Table.apply, groupby.apply, metadata propagation, type hinting, etc.
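The decorator entry above describes wrapping a function that knows nothing about metadata so that its result still carries it. A minimal sketch of that idea follows, with a hypothetical list-based Series class standing in for a Table and a hypothetical keep_metadata_sketch decorator. The library's @keep_metadata works on pandas functions and may differ in detail.

```python
import functools
from typing import Any, Callable


class Series(list):
    """Hypothetical list-like container with metadata; stands in for a Table."""

    def __init__(self, values=(), metadata=None):
        super().__init__(values)
        self.metadata = dict(metadata or {})  # copy so containers do not share state


def keep_metadata_sketch(func: Callable[..., Any]) -> Callable[..., Series]:
    """Wrap `func` so its result carries the metadata of its first argument."""

    @functools.wraps(func)
    def wrapper(data: Series, *args: Any, **kwargs: Any) -> Series:
        result = func(data, *args, **kwargs)
        return Series(result, metadata=data.metadata)

    return wrapper


@keep_metadata_sketch
def scale(values, factor):
    # A plain function that returns a bare list and ignores metadata.
    return [value * factor for value in values]


series = Series([1, 2, 3], metadata={"short_name": "population"})
scaled = scale(series, 10)
```

The wrapped function keeps its original name and docstring through functools.wraps, so the decorator stays transparent to callers.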
v0.3.11
- Add support for Python 3.12 in pyproject.toml
v0.3.10
- Add experimental chart data API in owid.catalog.charts
v0.3.9
- Switch from isort, black and flake8 to ruff
v0.3.8
- Pin dataclasses-json==0.5.8 to fix an error with Python 3.9
v0.3.7
- Fix bugs.
- Improve metadata propagation.
- Improve metadata YAML file handling, to have common definitions.
- Remove DatasetMeta.origins.
v0.3.6
- Fixed tons of bugs
- processing.py module with pandas-like functions that propagate metadata
- Support for Dynamic YAML files
- Support for R2 alongside S3
v0.3.5
- Remove catalog.frames; use owid-repack package instead
- Relax dependency constraints
- Add optional channel argument to DatasetMeta
- Stop supporting metadata in Parquet format, load JSON sidecar instead
- Fix errors when creating new Table columns
v0.3.4
- Bump pyarrow dependency to enable Python 3.11 support
v0.3.3
- Add more arguments to Table.__init__ that are often used in ETL
- Add Dataset.update_metadata function for updating metadata from YAML file
- Python 3.11 support via update of pyarrow dependency
v0.3.2
- Fix a bug in Catalog.__getitem__()
- Replace mypy type checker with pyright
v0.3.1
- Sort imports with isort
- Change black line length to 120
- Add grapher channel
- Support path-based indexing into catalogs
v0.3.0
- Update OWID_CATALOG_VERSION to 3
- Support multiple formats per table
- Support reading and writing parquet files with embedded metadata
- Optional repack argument when adding tables to dataset
- Underscore |
- Get version field from DatasetMeta init
- Resolve collisions of underscore_table function
- Convert version to str and load json dimensions
v0.2.9
- Allow multiple channels in catalog.find function
v0.2.8
- Update OWID_CATALOG_VERSION to 2
v0.2.7
- Split datasets into channels (garden, meadow, open_numbers, ...) and make garden the default
- Add .find_latest method to Catalog
v0.2.6
- Add flag is_public for public/private datasets
- Enforce snake_case for table, dataset and variable short names
- Add fields published_by and published_at to Source
- Added a list of supported and unsupported operations on columns
- Updated pyarrow
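Enforcing snake_case on short names amounts to a validation step like the one sketched below. The exact rule (lowercase letters and digits, single underscores between segments, starting with a letter) and the function names are assumptions for illustration. The library's own check may allow other forms.

```python
import re

# Assumed rule: segments of lowercase letters/digits joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def is_snake_case(name: str) -> bool:
    """Return True if `name` follows the assumed snake_case rule."""
    return bool(SNAKE_CASE.match(name))


def require_snake_case(name: str) -> str:
    """Return `name` unchanged, or raise ValueError if it is not snake_case."""
    if not is_snake_case(name):
        raise ValueError(f"Short name must be snake_case: {name!r}")
    return name


checked = require_snake_case("life_expectancy")
```

Validating at creation time means a bad name fails early, where it is introduced, instead of surfacing later as a broken path or lookup.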
v0.2.5
- Fix ability to load remote CSV tables
v0.2.4
- Update the default catalog URL to use a CDN
v0.2.3
- Fix methods for finding and loading data from a LocalCatalog
v0.2.2
- Repack frames to compact dtypes on Table.to_feather()
v0.2.1
- Fix key typo used in version check
v0.2.0
- Copy dataset metadata into tables, to make tables more traceable
- Add API versioning, and a requirement to update if your version of this library is too old
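The versioning requirement above can be pictured as a guard that runs before any data is read. The sketch below assumes the catalog advertises an integer format version and that the client knows the highest version it supports. The constant, exception, and function names are hypothetical, not the library's.

```python
# Hypothetical: highest catalog format version this client understands.
SUPPORTED_FORMAT_VERSION = 3


class OutdatedClientError(RuntimeError):
    """Raised when the catalog is newer than this client can read."""


def check_format_version(catalog_version: int, supported: int = SUPPORTED_FORMAT_VERSION) -> None:
    """Raise if the catalog's format version exceeds what the client supports."""
    if catalog_version > supported:
        raise OutdatedClientError(
            f"Catalog format v{catalog_version} is newer than supported v{supported}; "
            "please upgrade the library."
        )


check_format_version(3)  # same version: passes silently
```

Failing loudly with an upgrade hint is preferable to partially parsing a format the client does not understand.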
v0.1.1
- Add support for Python 3.8
v0.1.0
- Initial release, including searching and fetching data from a remote catalog
File details
Details for the file owid_catalog-1.0.1.tar.gz.
File metadata
- Download URL: owid_catalog-1.0.1.tar.gz
- Upload date:
- Size: 342.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f81b8384c6159b5cba10340da6e557b9964d1af4985bc81be78a767137618bb4 |
| MD5 | 677841420913bae3e128e6ecaec60e85 |
| BLAKE2b-256 | fd869b8f06ae9a89ad908a8ae0283c03e8ed544a32cc99894eb8255f310fae86 |
File details
Details for the file owid_catalog-1.0.1-py3-none-any.whl.
File metadata
- Download URL: owid_catalog-1.0.1-py3-none-any.whl
- Upload date:
- Size: 126.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3de9b1de3f21a98cbb833edd1cdf5e9c298c02bcf2ecdf59e0a3052ed60678dd |
| MD5 | be4916cc2fb2703fe5f8c46692f95690 |
| BLAKE2b-256 | e1eb985ee6e88306b63d1a1a6b5c9ada842a7e81f92d99f159b2dbeeee3bf31a |