Abstraction layer for Living Standards Measurement Survey data
#+TITLE: LSMS_Library
#+AUTHOR: Ethan Ligon
#+OPTIONS: toc:nil
[[https://doi.org/10.5281/zenodo.17258079][https://zenodo.org/badge/796958546.svg]]
A Python library providing a uniform interface to Living Standards Measurement Study (LSMS) household surveys from multiple countries and years, without the data loss typical of traditional harmonization approaches.
* The Problem
LSMS datasets are invaluable for studying poverty, consumption, and household welfare across developing countries. However, each country's survey uses different:
- Variable names and encodings
- Food classification systems
- Questionnaire structures
- File formats and organization
Researchers typically spend weeks learning each new dataset's idiosyncrasies or use pre-harmonized datasets that sacrifice detail and comparability. Cross-country or longitudinal analyses become prohibitively time-consuming.
* The Solution
LSMS_Library provides an *abstraction layer* that gives you a consistent interface to work with any supported LSMS dataset. Instead of harmonizing the data itself (which loses information), we harmonize the /way you access/ the data.
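The idea can be illustrated with a toy sketch. Nothing in it is LSMS_Library code: =RAW=, =MAPPINGS=, =ToyCountry=, and the variable names are all hypothetical, chosen only to show how a per-country mapping can give a shared access vocabulary while the raw records stay untouched.

#+begin_src python
# Toy illustration only: none of these names come from LSMS_Library.
from dataclasses import dataclass

# Two "surveys" that record the same facts under different variable names.
RAW = {
    "CountryA": [{"hhid": "A1", "hh_size": 4}, {"hhid": "A2", "hh_size": 6}],
    "CountryB": [{"household": "B1", "members": 3}],
}

# Per-country mapping from a shared vocabulary to the native variable names.
MAPPINGS = {
    "CountryA": {"household_id": "hhid", "household_size": "hh_size"},
    "CountryB": {"household_id": "household", "household_size": "members"},
}


@dataclass
class ToyCountry:
    name: str

    def household_size(self):
        """Return rows keyed by the shared vocabulary; RAW is not modified."""
        mapping = MAPPINGS[self.name]
        return [
            {common: row[native] for common, native in mapping.items()}
            for row in RAW[self.name]
        ]


# The same call works for every country, whatever its native naming.
sizes = {name: ToyCountry(name).household_size() for name in RAW}
print(sizes["CountryB"])  # [{'household_id': 'B1', 'household_size': 3}]
#+end_src

Because the mapping is applied at access time, the original records (and any detail they carry beyond the shared vocabulary) remain available.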
* Installation
From PyPI:
#+begin_src bash
pip install LSMS_Library
#+end_src
From GitHub (current release):
#+begin_src bash
pip install git+https://github.com/ligon/LSMS_Library.git@v0.7.0
#+end_src
From a source checkout (for contributors):
#+begin_src bash
git clone https://github.com/ligon/LSMS_Library.git
cd LSMS_Library
poetry install
#+end_src
* Quick Start
#+begin_src python
import lsms_library as ll
# Single-country access
uga = ll.Country('Uganda')
uga.waves # ['2005-06', '2009-10', ..., '2019-20']
uga.data_scheme # ['food_acquired', 'household_roster', ...]
food = uga.food_expenditures() # Standardized DataFrame, all waves
# Cross-country analysis
roster = ll.Feature('household_roster')
roster.countries # ['Burkina_Faso', 'Ethiopia', 'Mali', 'Uganda', ...]
df = roster() # Harmonized DataFrame across all countries
#+end_src
* Data Access
This library abstracts over Living Standards Measurement Study (LSMS) survey data. The underlying microdata belongs to the respective national statistics offices and the World Bank; users must accept the [[https://microdata.worldbank.org/][World Bank Microdata Library]]'s terms of use before accessing it.
** Authentication: the WB Microdata API key
1. Register at https://microdata.worldbank.org/ (free).
2. Accept the terms of use for the LSMS collections you want to access.
3. Get your API key from your account dashboard.
4. Create =~/.config/lsms_library/config.yml= with:
#+begin_src yaml
microdata_api_key: your_key_here
# data_dir: /path/to/override # same as LSMS_DATA_DIR env var
#+end_src
or set =MICRODATA_API_KEY= as an environment variable.
5. On =import lsms_library=, the library validates your key against the WB catalog and automatically unlocks access to the S3 read cache. No further setup is required.
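Before importing the library, it can be useful to check that a key is actually configured. The following is a minimal sketch, not part of LSMS_Library: =find_api_key= is a hypothetical helper, the order in which the environment variable and the config file are consulted is an assumption, and the file is read with a naive line scan rather than a real YAML parser.

#+begin_src python
import os
from pathlib import Path

# Config location named in the steps above.
CONFIG_PATH = Path.home() / ".config" / "lsms_library" / "config.yml"


def find_api_key(config_path=CONFIG_PATH, environ=None):
    """Return (key, source), or (None, None) if no key is found.

    Hypothetical helper. Checking the environment first is an assumption;
    the library's own precedence may differ.
    """
    environ = os.environ if environ is None else environ
    key = environ.get("MICRODATA_API_KEY", "").strip()
    if key:
        return key, "environment"
    path = Path(config_path)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            # Naive scan for a flat "name: value" line; not a YAML parser.
            name, sep, value = line.partition(":")
            if sep and name.strip() == "microdata_api_key":
                value = value.split("#", 1)[0].strip().strip("'\"")
                if value:
                    return value, str(path)
    return None, None


key, source = find_api_key()
if key:
    print("API key found via", source)  # the key itself is never printed
else:
    print("No API key configured")
#+end_src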
** The S3 cache
Once your WB API key is validated, the library unlocks a read-only S3 cache that mirrors the WB Microdata downloads. This is a convenience -- the S3 cache is faster than the WB NADA API and reduces load on the WB service -- but it is not a separate access layer. The WB terms of use are the authoritative gate; the S3 cache just provides the same data faster.
Decrypted plaintext credentials are written to =~/.config/lsms_library/s3_creds= (or the path in the =LSMS_S3_CREDS= environment variable), not into the package tree -- so the library is safe to install from a wheel into a read-only site-packages directory.
** Non-interactive environments
In CI, Docker builds, or other non-interactive contexts, set =LSMS_SKIP_AUTH=1= to suppress the import-time authentication flow. In that mode you are responsible for ensuring =~/.config/lsms_library/s3_creds= exists (e.g. via a CI secret mount) before the first data access.
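A CI entry point might perform that check explicitly. This is a sketch under stated assumptions: =s3_creds_path= and =prepare_noninteractive= are hypothetical helpers, and it assumes that setting =LSMS_SKIP_AUTH= in =os.environ= before the import has the same effect as exporting it in the shell.

#+begin_src python
import os
from pathlib import Path


def s3_creds_path(environ=None):
    """Credentials path: LSMS_S3_CREDS if set, else the documented default."""
    environ = os.environ if environ is None else environ
    override = environ.get("LSMS_S3_CREDS")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".config" / "lsms_library" / "s3_creds"


def prepare_noninteractive(environ=None):
    """Set LSMS_SKIP_AUTH=1 and report whether the credentials file exists."""
    environ = os.environ if environ is None else environ
    environ["LSMS_SKIP_AUTH"] = "1"
    return s3_creds_path(environ).is_file()


# In a CI script this would run before `import lsms_library`:
# if not prepare_noninteractive():
#     raise SystemExit("s3_creds missing; mount the CI secret first")
#+end_src

Failing early with a clear message is usually easier to debug than an error at the first data access.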
** Data cache location
Parquet caches materialize under the platform-appropriate user data directory:
- Linux: =~/.local/share/lsms_library/= by default
- Override with =LSMS_DATA_DIR= env var or =data_dir= in =config.yml=
See the [[https://ligon.github.io/LSMS_Library/guide/caching/][caching guide]] for details on =assume_cache_fresh=, the =Country= and =Feature= classes, and per-country build methods.
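To see where caches would land on a given machine, the override rules above can be sketched as follows. =resolve_data_dir= and =cached_parquet_files= are hypothetical helpers, not library functions; the precedence (environment variable, then =config.yml=, then the Linux default) is an assumption, and no particular layout inside the directory is implied.

#+begin_src python
import os
from pathlib import Path


def resolve_data_dir(config_data_dir=None, environ=None):
    """Guess the cache directory from the documented overrides.

    Assumed precedence: LSMS_DATA_DIR, then data_dir from config.yml,
    then the Linux default. Other platforms have different defaults.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("LSMS_DATA_DIR") or config_data_dir
    if override:
        return Path(override).expanduser()
    return Path.home() / ".local" / "share" / "lsms_library"


def cached_parquet_files(data_dir):
    """List any Parquet files found under data_dir (empty if it is absent)."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return []
    return sorted(data_dir.rglob("*.parquet"))


data_dir = resolve_data_dir()
print(data_dir, len(cached_parquet_files(data_dir)), "parquet file(s)")
#+end_src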
* Documentation
Full documentation is available at [[https://ligon.github.io/LSMS_Library][ligon.github.io/LSMS_Library]], including:
- *[[https://ligon.github.io/LSMS_Library/getting-started/][Getting Started]]* -- installation and first steps
- *[[https://ligon.github.io/LSMS_Library/guide/country/][Country Guide]]* -- single-country workflows, harmonization pipeline, derived tables
- *[[https://ligon.github.io/LSMS_Library/guide/feature/][Feature Guide]]* -- cross-country analysis with =ll.Feature=
- *[[https://ligon.github.io/LSMS_Library/guide/caching/][Caching]]* -- performance tuning, build backends, cache management
- *[[https://ligon.github.io/LSMS_Library/guide/panel-data/][Panel Data]]* -- longitudinal analysis and ID harmonization
- *[[https://ligon.github.io/LSMS_Library/api/country/][API Reference]]* -- complete class documentation (auto-generated from source)
* Contributing
See [[file:CONTRIBUTING.org][CONTRIBUTING.org]] for detailed guidelines on adding new datasets using DVC.
* Citation
If you use LSMS_Library in your research, please cite:
#+begin_src bibtex
@software{ligon25:lsms_library,
author = {Ethan Ligon},
title = {{\texttt{LSMS\_Library}}: Abstraction layer for working with Living Standards Measurement Surveys},
year = 2025,
doi = {10.5281/zenodo.17258079},
url = {https://pypi.org/project/lsms_library/}
}
#+end_src
* License
See the [[file:LICENSE][LICENSE]] file in the repository for details.
* Acknowledgments
This project builds on data collection efforts by:
- The World Bank's Living Standards Measurement Study (LSMS) team
- National statistical offices in participating countries
- The LSMS-ISA initiative