
#+TITLE: LSMS_Library
#+AUTHOR: Ethan Ligon
#+OPTIONS: toc:nil

[[https://doi.org/10.5281/zenodo.17258079][https://zenodo.org/badge/796958546.svg]]

A Python library providing a uniform interface to Living Standards Measurement Study (LSMS) household surveys from multiple countries and years, without the data loss typical of traditional harmonization approaches.

* The Problem

LSMS datasets are invaluable for studying poverty, consumption, and household welfare across developing countries. However, each country's survey uses different:
- Variable names and encodings
- Food classification systems
- Questionnaire structures
- File formats and organization

Researchers typically spend weeks learning each new dataset's idiosyncrasies, or fall back on pre-harmonized datasets that sacrifice detail for the sake of comparability. Cross-country or longitudinal analyses become prohibitively time-consuming.

* The Solution

LSMS_Library provides an *abstraction layer* that gives you a consistent interface to work with any supported LSMS dataset. Instead of harmonizing the data itself (which loses information), we harmonize the /way you access/ the data.

* Installation

From PyPI:

#+begin_src bash
pip install LSMS_Library
#+end_src

From GitHub (current release):

#+begin_src bash
pip install git+https://github.com/ligon/LSMS_Library.git@v0.7.0
#+end_src

From a source checkout (for contributors):

#+begin_src bash
git clone https://github.com/ligon/LSMS_Library.git
cd LSMS_Library
poetry install
#+end_src

* Quick Start

#+begin_src python
import lsms_library as ll

# Single-country access
uga = ll.Country('Uganda')
uga.waves # ['2005-06', '2009-10', ..., '2019-20']
uga.data_scheme # ['food_acquired', 'household_roster', ...]
food = uga.food_expenditures() # Standardized DataFrame, all waves

# Cross-country analysis
roster = ll.Feature('household_roster')
roster.countries # ['Burkina_Faso', 'Ethiopia', 'Mali', 'Uganda', ...]
df = roster() # Harmonized DataFrame across all countries
#+end_src

* Data Access

This library abstracts over Living Standards Measurement Study (LSMS) survey data. The underlying microdata belongs to the respective national statistics offices and the World Bank; users must accept the [[https://microdata.worldbank.org/][World Bank Microdata Library]]'s terms of use before accessing it.

** Authentication: the WB Microdata API key

1. Register at https://microdata.worldbank.org/ (free).
2. Accept the terms of use for the LSMS collections you want to access.
3. Get your API key from your account dashboard.
4. Create =~/.config/lsms_library/config.yml= with:

#+begin_src yaml
microdata_api_key: your_key_here
# data_dir: /path/to/override # same as LSMS_DATA_DIR env var
#+end_src

or set =MICRODATA_API_KEY= as an environment variable.
5. On =import lsms_library=, the library validates your key against the WB catalog and automatically unlocks access to the S3 read cache. No further setup is required.
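
If you prefer the environment-variable route over the config file, a minimal shell sketch (the key value is a placeholder, substitute the key from your account dashboard):

#+begin_src bash
# Export the key for the current shell session instead of writing config.yml.
# "your_key_here" is a placeholder, not a real key.
export MICRODATA_API_KEY="your_key_here"
#+end_src

Put the =export= in your shell profile if you want the key available in every session.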

** The S3 cache

Once your WB API key is validated, the library unlocks a read-only S3 cache that mirrors the WB Microdata downloads. This is a convenience -- the S3 cache is faster than the WB NADA API and reduces load on the WB service -- but it is not a separate access layer. The WB terms of use are the authoritative gate; the S3 cache just provides the same data faster.

Decrypted plaintext credentials are written to =~/.config/lsms_library/s3_creds= (or the path in the =LSMS_S3_CREDS= environment variable), not into the package tree -- so the library is safe to install from a wheel into a read-only site-packages directory.

** Non-interactive environments

In CI, Docker builds, or other non-interactive contexts, set =LSMS_SKIP_AUTH=1= to suppress the import-time authentication flow. In that mode you are responsible for ensuring =~/.config/lsms_library/s3_creds= exists (e.g. via a CI secret mount) before the first data access.
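
For example, a CI step might look like the following sketch. The secret variable name =CI_LSMS_S3_CREDS= is a placeholder for whatever your CI system actually provides:

#+begin_src bash
# Suppress the interactive import-time authentication flow
export LSMS_SKIP_AUTH=1

# Materialize the S3 credentials from a CI secret before any data access.
# Respects the LSMS_S3_CREDS override described above; falls back to the default path.
CREDS_FILE="${LSMS_S3_CREDS:-$HOME/.config/lsms_library/s3_creds}"
mkdir -p "$(dirname "$CREDS_FILE")"
printf '%s\n' "${CI_LSMS_S3_CREDS:-placeholder}" > "$CREDS_FILE"
chmod 600 "$CREDS_FILE"
#+end_src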

** Data cache location

Parquet caches materialize under the platform-appropriate user data directory:

- Linux: =~/.local/share/lsms_library/= by default
- Override with =LSMS_DATA_DIR= env var or =data_dir= in =config.yml=
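
For instance, to point the parquet cache at scratch storage for a single session (the path below is illustrative):

#+begin_src bash
# Redirect parquet caches to a scratch location for this shell session
export LSMS_DATA_DIR=/tmp/lsms_scratch_cache
mkdir -p "$LSMS_DATA_DIR"
#+end_src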

See the [[https://ligon.github.io/LSMS_Library/guide/caching/][caching guide]] for details on =assume_cache_fresh=, the =Country= and =Feature= classes, and per-country build methods.

* Documentation

Full documentation is available at [[https://ligon.github.io/LSMS_Library][ligon.github.io/LSMS_Library]], including:

- *[[https://ligon.github.io/LSMS_Library/getting-started/][Getting Started]]* -- installation and first steps
- *[[https://ligon.github.io/LSMS_Library/guide/country/][Country Guide]]* -- single-country workflows, harmonization pipeline, derived tables
- *[[https://ligon.github.io/LSMS_Library/guide/feature/][Feature Guide]]* -- cross-country analysis with =ll.Feature=
- *[[https://ligon.github.io/LSMS_Library/guide/caching/][Caching]]* -- performance tuning, build backends, cache management
- *[[https://ligon.github.io/LSMS_Library/guide/panel-data/][Panel Data]]* -- longitudinal analysis and ID harmonization
- *[[https://ligon.github.io/LSMS_Library/api/country/][API Reference]]* -- complete class documentation (auto-generated from source)

* Contributing

See [[file:CONTRIBUTING.org][CONTRIBUTING.org]] for detailed guidelines on adding new datasets using DVC.

* Citation

If you use LSMS_Library in your research, please cite:

#+begin_src bibtex
@software{ligon25:lsms_library,
author = {Ethan Ligon},
title = {{\tt LSMS\_Library}: Abstraction layer for working with Living Standards Measurement Surveys},
year = 2025,
doi = {10.5281/zenodo.17258079},
url = {https://pypi.org/project/lsms_library/}
}
#+end_src

* License

See the [[file:LICENSE][LICENSE]] file in the repository for details.

* Acknowledgments

This project builds on data collection efforts by:
- The World Bank's Living Standards Measurement Study (LSMS) team
- National statistical offices in participating countries
- The LSMS-ISA initiative
