Abstraction layer for Living Standards Measurement Survey data


* Streaming dvc files
A =dvc pull= will download dvc files into your local repository.
But this may not be the best way to proceed! In particular, =dvc=
offers an API which permits one to "stream" or cache files, keeping
the local storage of your working repository free of big data
files.

To illustrate,
#+begin_src python
import dvc.api
import pandas as pd

with dvc.api.open('BigRemoteFile.dta', mode='rb') as dta:
    df = pd.read_stata(dta)
#+end_src
This will result in a =pandas.DataFrame= in RAM, using no
additional disk space (except that, depending on what's being used as
the dvc store, the file may be cached in =.dvc/cache=; that cache can
be cleared with =dvc gc=).
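
Streaming also composes with =pandas='s own chunked reading, so even a
very large =.dta= file need not fit in RAM all at once. A minimal
sketch, independent of =dvc= (the =stata_row_count= helper is our own
invention for illustration; any binary file object, such as the handle
returned by =dvc.api.open= above, should work the same way provided it
is seekable):
#+begin_src python
import pandas as pd

def stata_row_count(fileobj, chunksize=10_000):
    """Count the rows of a Stata file without loading it all into RAM."""
    total = 0
    # With chunksize set, read_stata returns an iterator of DataFrames.
    reader = pd.read_stata(fileobj, chunksize=chunksize)
    for chunk in reader:
        total += len(chunk)
    return total
#+end_src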

* Pulling dvc files
If you need the actual file rather than a "stream", you can
"pull" the dvc files, using
#+begin_src sh
dvc pull
#+end_src
which copies files from the remote dvc data store into your
working repository.

* Adding New Data
** Additional S3 Credentials
Write access to the remote s3 repository requires additional credentials; contact =ligon@berkeley.edu= to obtain these.

** Procedure to Add Data
To add a new LSMS-style survey to the repo, follow the steps
below. Here we give the example of adding a 2015--16
survey from Uganda, obtained from
https://microdata.worldbank.org/index.php/catalog/3460. The same
steps should work for you /mutatis mutandis/:

1. Create a directory corresponding to the country or area; e.g.,
#+begin_src sh
mkdir Uganda
#+end_src
2. Create a /sub/-directory indicating the time period for the
survey; e.g.,
#+begin_src sh
mkdir Uganda/2015-16
#+end_src
3. Create a =Documentation= sub-directory for each survey; e.g.,
#+begin_src sh
mkdir Uganda/2015-16/Documentation
#+end_src
In this directory include the following files:
- SOURCE :: A text file giving both a URL (if available) and
citation information for the dataset.
- LICENSE :: A text file containing a description of the license
or other terms under which you've obtained the data.
4. Add other documentation useful for understanding the data to the
=Documentation= sub-directory.

5. Add all the contents of the =Documentation= folder to the =git= repo;
e.g.,
#+begin_src sh
cd ./Uganda/2015-16/Documentation
git add .
git commit -m "Add Uganda 2015-16 documentation to repo."
git push
#+end_src

6. Create a =Data= sub-directory for each survey; e.g.,
#+begin_src sh
mkdir Uganda/2015-16/Data
#+end_src

7. Obtain a copy of the data you're interested in, perhaps as a zip
file or other archive. Store this in some temporary place, and
unzip (or whatever) the files into the relevant Country/Year/Data
directory, taking care to preserve any useful directory structure
in the archive. E.g.,
#+begin_src sh
cd Uganda/2015-16/Data && unzip -j /tmp/UGA_2015_UNPS_v01_M_STATA8.zip
#+end_src
(Here the =-j= flag discards the directory structure inside the
archive; omit it if that structure is worth preserving.)
8. Add the data you've unarchived to =dvc=, then add the /pointers/
(i.e., files with a =.dvc= extension) to =git=. For the Uganda case we
assume that all the relevant data come in the form of =stata= *.dta
files, since this is what we downloaded from the World Bank. For example,
#+begin_src sh
dvc add *.dta
git add *.dta.dvc .gitignore
git commit -m "Add Uganda/2015-16/Data/*.dta files to dvc store."
git pull && git push
#+end_src
9. Push the data files to the dvc store. Make sure you have a good
internet connection! Then a simple
#+begin_src sh
dvc push
#+end_src
will copy the data to the remote data store. NB: If this is the
first time you've done this for this repository, then you'll
first need to jump through some simple hoops to authenticate with
gdrive.
10. With the files pushed to the dvc store, you won't need local
copies anymore, so (from the survey's =Data= directory) you can do
something like
#+begin_src sh
rm *.dta
#+end_src
or (if you have a more complex directory structure) perhaps
#+begin_src sh
find . -name '*.dta' -exec rm {} \;
#+end_src
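
The directory scaffolding in steps 1--6 is mechanical enough to
script. A minimal sketch using Python's =pathlib= (the =scaffold=
helper and its arguments are our own invention for illustration, not
part of this library):
#+begin_src python
from pathlib import Path

def scaffold(root, country, wave):
    """Create Country/Wave/{Documentation,Data} directories for a new
    survey, with empty SOURCE and LICENSE stubs to fill in by hand."""
    base = Path(root) / country / wave
    for sub in ("Documentation", "Data"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    for name in ("SOURCE", "LICENSE"):
        (base / "Documentation" / name).touch(exist_ok=True)
    return base

# e.g., scaffold('.', 'Uganda', '2015-16')
#+end_src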
