
#+TITLE: LSMS_Library
#+AUTHOR: Ethan Ligon
#+OPTIONS: toc:nil

[[https://doi.org/10.5281/zenodo.17258079][https://zenodo.org/badge/796958546.svg]]

A Python library providing a uniform interface to Living Standards Measurement Study (LSMS) household surveys from multiple countries and years, without the data loss typical of traditional harmonization approaches.

* The Problem

LSMS datasets are invaluable for studying poverty, consumption, and household welfare across developing countries. However, each country's survey uses different:
- Variable names and encodings
- Food classification systems
- Questionnaire structures
- File formats and organization

Researchers typically spend weeks learning each new dataset's idiosyncrasies, or fall back on pre-harmonized datasets that sacrifice detail for the sake of comparability. Cross-country or longitudinal analyses become prohibitively time-consuming.

* The Solution

LSMS_Library provides an *abstraction layer* that gives you a consistent interface to work with any supported LSMS dataset. Instead of harmonizing the data itself (which loses information), we harmonize the /way you access/ the data.

* Installation

From PyPI:

#+begin_src bash
pip install LSMS_Library
#+end_src

From GitHub (current release):

#+begin_src bash
pip install git+https://github.com/ligon/LSMS_Library.git@v0.7.0
#+end_src

From a source checkout (for contributors):

#+begin_src bash
git clone https://github.com/ligon/LSMS_Library.git
cd LSMS_Library
poetry install
#+end_src

* Quick Start

#+begin_src python
import lsms_library as ll

# Single-country access
uga = ll.Country('Uganda')
uga.waves # ['2005-06', '2009-10', ..., '2019-20']
uga.data_scheme # ['food_acquired', 'household_roster', ...]
food = uga.food_expenditures() # Standardized DataFrame, all waves

# Cross-country analysis
roster = ll.Feature('household_roster')
roster.countries # ['Burkina_Faso', 'Ethiopia', 'Mali', 'Uganda', ...]
df = roster() # Harmonized DataFrame across all countries
#+end_src

* Data Access

This library abstracts over Living Standards Measurement Study (LSMS) survey data. The underlying microdata belongs to the respective national statistics offices and the World Bank; users must accept the [[https://microdata.worldbank.org/][World Bank Microdata Library]]'s terms of use before accessing it.

** Authentication: the WB Microdata API key

1. Register at https://microdata.worldbank.org/ (free).
2. Accept the terms of use for the LSMS collections you want to access.
3. Get your API key from your account dashboard.
4. Create =~/.config/lsms_library/config.yml= with:

#+begin_src yaml
microdata_api_key: your_key_here
# data_dir: /path/to/override # same as LSMS_DATA_DIR env var
#+end_src

or set =MICRODATA_API_KEY= as an environment variable.
5. On =import lsms_library=, the library validates your key against the WB catalog and automatically unlocks access to the S3 read cache. No further setup is required.
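
If you prefer the environment-variable route over the config file, a minimal shell sketch (the key value is a placeholder, substitute the key from your account dashboard):

#+begin_src bash
# Export the key for the current shell session instead of writing config.yml.
# "your_key_here" is a placeholder, not a real key.
export MICRODATA_API_KEY="your_key_here"
#+end_src

Put the =export= in your shell profile if you want the key available in every session.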

** The S3 cache

Once your WB API key is validated, the library unlocks a read-only S3 cache that mirrors the WB Microdata downloads. This is a convenience -- the S3 cache is faster than the WB NADA API and reduces load on the WB service -- but it is not a separate access layer. The WB terms of use are the authoritative gate; the S3 cache just provides the same data faster.

Decrypted plaintext credentials are written to =~/.config/lsms_library/s3_creds= (or the path in the =LSMS_S3_CREDS= environment variable), not into the package tree -- so the library is safe to install from a wheel into a read-only site-packages directory.

** Non-interactive environments

In CI, Docker builds, or other non-interactive contexts, set =LSMS_SKIP_AUTH=1= to suppress the import-time authentication flow. In that mode you are responsible for ensuring =~/.config/lsms_library/s3_creds= exists (e.g. via a CI secret mount) before the first data access.
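
For example, a CI step might look like the following sketch. The secret variable name =CI_LSMS_S3_CREDS= is a placeholder for whatever your CI system actually provides:

#+begin_src bash
# Suppress the interactive import-time authentication flow
export LSMS_SKIP_AUTH=1

# Materialize the S3 credentials from a CI secret before any data access.
# Respects the LSMS_S3_CREDS override described above; falls back to the default path.
CREDS_FILE="${LSMS_S3_CREDS:-$HOME/.config/lsms_library/s3_creds}"
mkdir -p "$(dirname "$CREDS_FILE")"
printf '%s\n' "${CI_LSMS_S3_CREDS:-placeholder}" > "$CREDS_FILE"
chmod 600 "$CREDS_FILE"
#+end_src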

** Data cache location

Parquet caches materialize under the platform-appropriate user data directory:

- Linux: =~/.local/share/lsms_library/= by default
- Override with =LSMS_DATA_DIR= env var or =data_dir= in =config.yml=
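
For instance, to point the parquet cache at scratch storage for a single session (the path below is illustrative):

#+begin_src bash
# Redirect parquet caches to a scratch location for this shell session
export LSMS_DATA_DIR=/tmp/lsms_scratch_cache
mkdir -p "$LSMS_DATA_DIR"
#+end_src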

See the [[https://ligon.github.io/LSMS_Library/guide/caching/][caching guide]] for details on =assume_cache_fresh=, the =Country= and =Feature= classes, and per-country build methods.

* Documentation

Full documentation is available at [[https://ligon.github.io/LSMS_Library][ligon.github.io/LSMS_Library]], including:

- *[[https://ligon.github.io/LSMS_Library/getting-started/][Getting Started]]* -- installation and first steps
- *[[https://ligon.github.io/LSMS_Library/guide/country/][Country Guide]]* -- single-country workflows, harmonization pipeline, derived tables
- *[[https://ligon.github.io/LSMS_Library/guide/feature/][Feature Guide]]* -- cross-country analysis with =ll.Feature=
- *[[https://ligon.github.io/LSMS_Library/guide/caching/][Caching]]* -- performance tuning, build backends, cache management
- *[[https://ligon.github.io/LSMS_Library/guide/panel-data/][Panel Data]]* -- longitudinal analysis and ID harmonization
- *[[https://ligon.github.io/LSMS_Library/api/country/][API Reference]]* -- complete class documentation (auto-generated from source)

* Contributing

See [[file:CONTRIBUTING.org][CONTRIBUTING.org]] for detailed guidelines on adding new datasets using DVC.

* Citation

If you use LSMS_Library in your research, please cite:

#+begin_src bibtex
@software{ligon25:lsms_library,
author = {Ethan Ligon},
title = {{\tt LSMS\_Library}: Abstraction layer for working with Living Standards Measurement Surveys},
year = 2025,
doi = {10.5281/zenodo.17258079},
url = {https://pypi.org/project/lsms_library/}
}
#+end_src

* License

See the [[file:LICENSE][LICENSE]] file in the repository for details.

* Acknowledgments

This project builds on data collection efforts by:
- The World Bank's Living Standards Measurement Study (LSMS) team
- National statistical offices in participating countries
- The LSMS-ISA initiative
