
Abstraction layer for Living Standards Measurement Survey data

#+TITLE: LSMS_Library
#+AUTHOR: Ethan Ligon
#+OPTIONS: toc:nil

[[https://doi.org/10.5281/zenodo.17258079][https://zenodo.org/badge/796958546.svg]]

A Python library providing a uniform interface to Living Standards Measurement Study (LSMS) household surveys from multiple countries and years, without the data loss typical of traditional harmonization approaches.

* The Problem

LSMS datasets are invaluable for studying poverty, consumption, and household welfare across developing countries. However, each country's survey uses different:
- Variable names and encodings
- Food classification systems
- Questionnaire structures
- File formats and organization

Researchers typically either spend weeks learning each new dataset's idiosyncrasies or fall back on pre-harmonized datasets that sacrifice detail and comparability. Either way, cross-country and longitudinal analyses become prohibitively time-consuming.

* The Solution

LSMS_Library provides an *abstraction layer* that gives you a consistent interface to work with any supported LSMS dataset. Instead of harmonizing the data itself (which loses information), we harmonize the /way you access/ the data.

* Installation

From PyPI:

#+begin_src bash
pip install LSMS_Library
#+end_src

From GitHub (current release):

#+begin_src bash
pip install git+https://github.com/ligon/LSMS_Library.git@v0.7.0
#+end_src

From a source checkout (for contributors):

#+begin_src bash
git clone https://github.com/ligon/LSMS_Library.git
cd LSMS_Library
poetry install
#+end_src
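
After any of the three install routes, it can be useful to confirm which version actually landed in the active environment. The following is a minimal sketch using only the standard library; the helper name =installed_version= is hypothetical, and the distribution name =lsms_library= is assumed from the install commands above.

#+begin_src python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="lsms_library"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "lsms_library is not installed")
#+end_src

Because this reads package metadata rather than importing the package, it does not trigger the import-time authentication flow.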

* Quick Start

#+begin_src python
import lsms_library as ll

# Single-country access
uga = ll.Country('Uganda')
uga.waves # ['2005-06', '2009-10', ..., '2019-20']
uga.data_scheme # ['food_acquired', 'household_roster', ...]
food = uga.food_expenditures() # Standardized DataFrame, all waves

# Cross-country analysis
roster = ll.Feature('household_roster')
roster.countries # ['Burkina_Faso', 'Ethiopia', 'Mali', 'Uganda', ...]
df = roster() # Harmonized DataFrame across all countries
#+end_src

* Data Access

This library abstracts over Living Standards Measurement Study (LSMS) survey data. The underlying microdata belongs to the respective national statistics offices and the World Bank; users must accept the [[https://microdata.worldbank.org/][World Bank Microdata Library]]'s terms of use before accessing it.

** Authentication: the WB Microdata API key

1. Register at https://microdata.worldbank.org/ (free).
2. Accept the terms of use for the LSMS collections you want to access.
3. Get your API key from your account dashboard.
4. Create =~/.config/lsms_library/config.yml= with:

   #+begin_src yaml
   microdata_api_key: your_key_here
   # data_dir: /path/to/override # same as LSMS_DATA_DIR env var
   #+end_src

   or set =MICRODATA_API_KEY= as an environment variable.
5. On =import lsms_library=, the library validates your key against the WB catalog and automatically unlocks access to the S3 read cache. No further setup is required.
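
Before the first import, it can help to check which of the two sources described in step 4 would supply the key. The sketch below is not the library's own lookup code: the function names are hypothetical, the tiny parser only handles flat =key: value= lines like the example config, and checking the environment variable before the file is an assumption about precedence.

#+begin_src python
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "lsms_library" / "config.yml"


def read_flat_yaml(path):
    """Parse top-level `key: value` lines; enough for the small config shown above.

    Anything after a '#' is treated as a comment, so values containing '#'
    are not supported by this sketch.
    """
    settings = {}
    if not path.is_file():
        return settings
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" in line:
            key, value = line.split(":", 1)
            settings[key.strip()] = value.strip().strip("'\"")
    return settings


def find_api_key(env=None, config_path=CONFIG_PATH):
    """Return (key, source), or (None, None) if no key is configured.

    ASSUMPTION: the environment variable is consulted before the config file.
    """
    env = os.environ if env is None else env
    if env.get("MICRODATA_API_KEY"):
        return env["MICRODATA_API_KEY"], "environment"
    key = read_flat_yaml(Path(config_path)).get("microdata_api_key")
    if key:
        return key, str(config_path)
    return None, None
#+end_src

Calling =find_api_key()= with no arguments inspects the real environment and the default config path; passing =env= and =config_path= explicitly makes the check easy to exercise in tests.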

** The S3 cache

Once your WB API key is validated, the library unlocks a read-only S3 cache that mirrors the WB Microdata downloads. This is a convenience -- the S3 cache is faster than the WB NADA API and reduces load on the WB service -- but it is not a separate access layer. The WB terms of use are the authoritative gate; the S3 cache just provides the same data faster.

Decrypted plaintext credentials are written to =~/.config/lsms_library/s3_creds= (or the path in the =LSMS_S3_CREDS= environment variable), not into the package tree -- so the library is safe to install from a wheel into a read-only site-packages directory.

** Non-interactive environments

In CI, Docker builds, or other non-interactive contexts, set =LSMS_SKIP_AUTH=1= to suppress the import-time authentication flow. In that mode you are responsible for ensuring =~/.config/lsms_library/s3_creds= exists (e.g. via a CI secret mount) before the first data access.
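
A CI job might run a small preflight check before the first data access. The following is a sketch under stated assumptions rather than part of the library: =s3_creds_path= and =preflight= are hypothetical helpers, and it assumes that the =LSMS_S3_CREDS= override described in the S3 cache section also applies when =LSMS_SKIP_AUTH=1= is set.

#+begin_src python
import os
from pathlib import Path


def s3_creds_path(env=None, home=None):
    """Location of the credentials file: LSMS_S3_CREDS if set, else the default."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    override = env.get("LSMS_S3_CREDS")
    if override:
        return Path(override).expanduser()
    return home / ".config" / "lsms_library" / "s3_creds"


def preflight(env=None, home=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if env.get("LSMS_SKIP_AUTH") != "1":
        problems.append("LSMS_SKIP_AUTH is not set to 1")
    creds = s3_creds_path(env, home)
    if not creds.is_file():
        problems.append(f"credentials file not found: {creds}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
#+end_src

Run as a script, this prints one line per problem and nothing when the environment looks ready. It only inspects the environment and the filesystem; it never imports =lsms_library=, so it cannot trigger the authentication flow itself.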

** Data cache location

Parquet caches materialize under the platform-appropriate user data directory:

- Linux: =~/.local/share/lsms_library/= by default
- Override with =LSMS_DATA_DIR= env var or =data_dir= in =config.yml=

See the [[https://ligon.github.io/LSMS_Library/guide/caching/][caching guide]] for details on =assume_cache_fresh=, the =Country= and =Feature= classes, and per-country build methods.
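
To see where caches would land on a given machine, the resolution rules above can be sketched as follows. The helper names are hypothetical, the default shown is the Linux one only, and letting =LSMS_DATA_DIR= take priority over =data_dir= in =config.yml= is an assumption, not documented behavior.

#+begin_src python
import os
from pathlib import Path


def resolve_data_dir(env=None, config=None, home=None):
    """Pick the cache directory: env var, then config value, then Linux default.

    ASSUMPTION: LSMS_DATA_DIR wins over data_dir when both are set.
    """
    env = os.environ if env is None else env
    config = {} if config is None else config
    home = Path.home() if home is None else Path(home)
    override = env.get("LSMS_DATA_DIR") or config.get("data_dir")
    if override:
        return Path(override).expanduser()
    return home / ".local" / "share" / "lsms_library"


def parquet_files(data_dir):
    """List cached Parquet files under data_dir, sorted; empty if none exist."""
    root = Path(data_dir)
    return sorted(root.rglob("*.parquet")) if root.is_dir() else []
#+end_src

For example, =parquet_files(resolve_data_dir())= lists whatever Parquet files have already been materialized for the current user, which is a quick way to check whether a first build has run.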

* Documentation

Full documentation is available at [[https://ligon.github.io/LSMS_Library][ligon.github.io/LSMS_Library]], including:

- *[[https://ligon.github.io/LSMS_Library/getting-started/][Getting Started]]* -- installation and first steps
- *[[https://ligon.github.io/LSMS_Library/guide/country/][Country Guide]]* -- single-country workflows, harmonization pipeline, derived tables
- *[[https://ligon.github.io/LSMS_Library/guide/feature/][Feature Guide]]* -- cross-country analysis with =ll.Feature=
- *[[https://ligon.github.io/LSMS_Library/guide/caching/][Caching]]* -- performance tuning, build backends, cache management
- *[[https://ligon.github.io/LSMS_Library/guide/panel-data/][Panel Data]]* -- longitudinal analysis and ID harmonization
- *[[https://ligon.github.io/LSMS_Library/api/country/][API Reference]]* -- complete class documentation (auto-generated from source)

* Contributing

See [[file:CONTRIBUTING.org][CONTRIBUTING.org]] for detailed guidelines on adding new datasets using DVC.

* Citation

If you use LSMS_Library in your research, please cite:

#+begin_src bibtex
@software{ligon25:lsms_library,
author = {Ethan Ligon},
title = {{\tt LSMS_Library}: Abstraction layer for working with Living Standards Measurement Surveys},
year = 2025,
doi = {10.5281/zenodo.17258079},
url = {https://pypi.org/project/lsms_library/}
}
#+end_src

* License

See the [[file:LICENSE][LICENSE]] file in the repository for details.

* Acknowledgments

This project builds on data collection efforts by:
- The World Bank's Living Standards Measurement Study (LSMS) team
- National statistical offices in participating countries
- The LSMS-ISA initiative
