Abstraction layer for Living Standards Measurement Survey data

#+TITLE: LSMS_Library
#+AUTHOR: Ethan Ligon
#+OPTIONS: toc:nil

[[https://doi.org/10.5281/zenodo.17258079][https://zenodo.org/badge/796958546.svg]]

A Python library providing a uniform interface to Living Standards Measurement Study (LSMS) household surveys from multiple countries and years, without the data loss typical of traditional harmonization approaches.

* The Problem

LSMS datasets are invaluable for studying poverty, consumption, and household welfare across developing countries. However, each country's survey uses different:
- Variable names and encodings
- Food classification systems
- Questionnaire structures
- File formats and organization

Researchers typically either spend weeks learning each new dataset's idiosyncrasies or fall back on pre-harmonized datasets that sacrifice detail and comparability. Either way, cross-country and longitudinal analyses become prohibitively time-consuming.

* The Solution

LSMS_Library provides an *abstraction layer* that gives you a consistent interface to work with any supported LSMS dataset. Instead of harmonizing the data itself (which loses information), we harmonize the /way you access/ the data.
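
The idea can be illustrated with a toy sketch. This is not the library's implementation: the survey names, the field maps, and the =SurveyView= class are all hypothetical, invented only to show how a shared vocabulary can sit on top of raw records that are left untouched.

#+begin_src python
# Toy illustration only: every name below is hypothetical and unrelated to
# the real lsms_library internals.

# Two "surveys" that record the same facts under different variable names.
RAW_SURVEYS = {
    "CountryA": [{"hhid": "A1", "hsize": 4}, {"hhid": "A2", "hsize": 6}],
    "CountryB": [{"household": "B7", "members": 3}],
}

# Per-survey map from a shared vocabulary to the local variable names.
FIELD_MAPS = {
    "CountryA": {"household_id": "hhid", "household_size": "hsize"},
    "CountryB": {"household_id": "household", "household_size": "members"},
}


class SurveyView:
    """Read-only view exposing one survey through the shared vocabulary."""

    def __init__(self, name):
        self.name = name
        self._rows = RAW_SURVEYS[name]  # raw records are never modified
        self._fields = FIELD_MAPS[name]

    def column(self, shared_name):
        """Return one column, looked up by its shared name."""
        local_name = self._fields[shared_name]
        return [row[local_name] for row in self._rows]


def household_sizes():
    """Same call for every survey, whatever the local names are."""
    return {name: SurveyView(name).column("household_size") for name in RAW_SURVEYS}
#+end_src

Because only the lookup is standardized, any variable that exists in one survey but not another remains available in the raw records instead of being dropped.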

* Installation

From PyPI:

#+begin_src bash
pip install LSMS_Library
#+end_src

From GitHub (current release):

#+begin_src bash
pip install git+https://github.com/ligon/LSMS_Library.git@v0.7.0
#+end_src

From a source checkout (for contributors):

#+begin_src bash
git clone https://github.com/ligon/LSMS_Library.git
cd LSMS_Library
poetry install
#+end_src

* Quick Start

#+begin_src python
import lsms_library as ll

# Single-country access
uga = ll.Country('Uganda')
uga.waves # ['2005-06', '2009-10', ..., '2019-20']
uga.data_scheme # ['food_acquired', 'household_roster', ...]
food = uga.food_expenditures() # Standardized DataFrame, all waves

# Cross-country analysis
roster = ll.Feature('household_roster')
roster.countries # ['Burkina_Faso', 'Ethiopia', 'Mali', 'Uganda', ...]
df = roster() # Harmonized DataFrame across all countries
#+end_src

* Data Access

This library abstracts over Living Standards Measurement Study (LSMS) survey data. The underlying microdata belongs to the respective national statistics offices and the World Bank; users must accept the [[https://microdata.worldbank.org/][World Bank Microdata Library]]'s terms of use before accessing it.

** Authentication: the WB Microdata API key

1. Register at https://microdata.worldbank.org/ (free).
2. Accept the terms of use for the LSMS collections you want to access.
3. Get your API key from your account dashboard.
4. Create =~/.config/lsms_library/config.yml= with:

   #+begin_src yaml
   microdata_api_key: your_key_here
   # data_dir: /path/to/override # same as LSMS_DATA_DIR env var
   #+end_src

   Alternatively, set =MICRODATA_API_KEY= as an environment variable.
5. On =import lsms_library=, the library validates your key against the WB catalog and automatically unlocks access to the S3 read cache. No further setup is required.
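
Step 4 can also be scripted, which may be handy when provisioning several machines. The sketch below writes the config file at the path given above. The =write_config= helper is hypothetical and not part of =lsms_library=; it assumes the key is a plain token that needs no YAML quoting, and the restrictive file mode is a precaution of this sketch, not a requirement stated by the library.

#+begin_src python
import os
from pathlib import Path

# Path and key name are taken from the steps above; the helper itself is
# hypothetical and not part of lsms_library.
DEFAULT_CONFIG_DIR = Path.home() / ".config" / "lsms_library"


def write_config(api_key, config_dir=DEFAULT_CONFIG_DIR):
    """Write config.yml containing the WB Microdata API key; return its path."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.yml"
    config_path.write_text(f"microdata_api_key: {api_key}\n", encoding="utf-8")
    # Assumption: restricting the file to the current user is a sensible
    # precaution for a secret; the library does not require it.
    os.chmod(config_path, 0o600)
    return config_path
#+end_src

Note that this overwrites any existing =config.yml=, including a =data_dir= setting, so it is best suited to a fresh setup.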

** The S3 cache

Once your WB API key is validated, the library unlocks a read-only S3 cache that mirrors the WB Microdata downloads. This is a convenience -- the S3 cache is faster than the WB NADA API and reduces load on the WB service -- but it is not a separate access layer. The WB terms of use are the authoritative gate; the S3 cache just provides the same data faster.

Decrypted plaintext credentials are written to =~/.config/lsms_library/s3_creds= (or the path in the =LSMS_S3_CREDS= environment variable), not into the package tree -- so the library is safe to install from a wheel into a read-only site-packages directory.

** Non-interactive environments

In CI, Docker builds, or other non-interactive contexts, set =LSMS_SKIP_AUTH=1= to suppress the import-time authentication flow. In that mode you are responsible for ensuring =~/.config/lsms_library/s3_creds= exists (e.g. via a CI secret mount) before the first data access.
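
A CI entry point might run a small preflight check before the first import, so that a missing credentials file fails fast with a clear message. The sketch below uses only the environment variables and default path named above; the two helper functions are hypothetical and not provided by =lsms_library=.

#+begin_src python
import os
from pathlib import Path


def s3_creds_path(environ=os.environ):
    """Return where the credentials file is expected, honouring LSMS_S3_CREDS."""
    override = environ.get("LSMS_S3_CREDS")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".config" / "lsms_library" / "s3_creds"


def prepare_noninteractive(environ=os.environ):
    """Set LSMS_SKIP_AUTH=1 and fail early if the credentials file is missing."""
    environ["LSMS_SKIP_AUTH"] = "1"
    creds = s3_creds_path(environ)
    if not creds.is_file():
        raise FileNotFoundError(f"expected S3 credentials at {creds}")
    return creds


# In a CI entry point, call this before the first `import lsms_library`:
#     prepare_noninteractive()
#     import lsms_library as ll
#+end_src

Setting the variable from Python only works if it happens before the import; exporting =LSMS_SKIP_AUTH=1= in the job environment avoids that ordering concern.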

** Data cache location

Parquet caches materialize under the platform-appropriate user data directory:

- Linux: =~/.local/share/lsms_library/= by default
- Override with =LSMS_DATA_DIR= env var or =data_dir= in =config.yml=
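
To see where caches land on a given machine, a short script can mirror the lookup described above. This is a sketch under assumptions: it checks only the =LSMS_DATA_DIR= environment variable and the Linux default (it does not read =data_dir= from =config.yml= or handle other platforms), it assumes cached files carry a =.parquet= extension, and both helper names are hypothetical.

#+begin_src python
import os
from pathlib import Path


def resolve_data_dir(environ=os.environ):
    """Return the cache directory: LSMS_DATA_DIR if set, else the Linux default."""
    override = environ.get("LSMS_DATA_DIR")
    if override:
        return Path(override).expanduser()
    # Linux default quoted above; other platforms differ and are not handled.
    return Path.home() / ".local" / "share" / "lsms_library"


def cached_parquet_files(data_dir):
    """List Parquet files under data_dir, sorted by path (empty if absent)."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return []
    return sorted(data_dir.rglob("*.parquet"))
#+end_src

Calling =cached_parquet_files(resolve_data_dir())= then gives a quick inventory of what has already been materialized.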

See the [[https://ligon.github.io/LSMS_Library/guide/caching/][caching guide]] for details on =assume_cache_fresh=, the =Country= and =Feature= classes, and per-country build methods.

* Documentation

Full documentation is available at [[https://ligon.github.io/LSMS_Library][ligon.github.io/LSMS_Library]], including:

- *[[https://ligon.github.io/LSMS_Library/getting-started/][Getting Started]]* -- installation and first steps
- *[[https://ligon.github.io/LSMS_Library/guide/country/][Country Guide]]* -- single-country workflows, harmonization pipeline, derived tables
- *[[https://ligon.github.io/LSMS_Library/guide/feature/][Feature Guide]]* -- cross-country analysis with =ll.Feature=
- *[[https://ligon.github.io/LSMS_Library/guide/caching/][Caching]]* -- performance tuning, build backends, cache management
- *[[https://ligon.github.io/LSMS_Library/guide/panel-data/][Panel Data]]* -- longitudinal analysis and ID harmonization
- *[[https://ligon.github.io/LSMS_Library/api/country/][API Reference]]* -- complete class documentation (auto-generated from source)

* Contributing

See [[file:CONTRIBUTING.org][CONTRIBUTING.org]] for detailed guidelines on adding new datasets using DVC.

* Citation

If you use LSMS_Library in your research, please cite:

#+begin_src bibtex
@software{ligon25:lsms_library,
  author = {Ethan Ligon},
  title  = {{\tt LSMS\_Library}: Abstraction layer for working with Living Standards Measurement Surveys},
  year   = 2025,
  doi    = {10.5281/zenodo.17258079},
  url    = {https://pypi.org/project/lsms_library/}
}
#+end_src

* License

See the [[file:LICENSE][LICENSE]] file in the repository for details.

* Acknowledgments

This project builds on data collection efforts by:
- The World Bank's Living Standards Measurement Study (LSMS) team
- National statistical offices in participating countries
- The LSMS-ISA initiative
