Modern Python library for downloading and analyzing Chilean CASEN survey data
usecasen
Python Library for CASEN Survey Analysis (Chile)
Python library for downloading and analyzing data from Chile's CASEN Survey (Encuesta de Caracterización Socioeconómica Nacional). It downloads and decompresses files entirely in memory, selects the correct data file with a scoring heuristic, and can load the data directly into Stata 17+.
Author: Maykol Medrano | Email: mmedrano2@uc.cl | GitHub: @MaykolMedrano
Features
- 100% In-Memory: Download and decompress without disk I/O
- Intelligent Scoring: File-detection heuristic ported from Mata (Stata's matrix language)
- Stata Integration: Vectorized injection (100x faster than loops)
- Variable Search: Efficient search without loading full datasets
- Codebook Access: Automatic value label extraction
- Cache System: Instant searches after first download
- Cross-Platform: Windows, macOS, Linux compatible
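The in-memory feature can be illustrated with the standard library alone. The sketch below assumes the survey archive has already been fetched as ZIP bytes (for example, `requests.get(url).content`) and picks the first `.dta` member without writing anything to disk. The function name `extract_dta_bytes` is hypothetical and not part of the `casen` API; RAR archives used for some older years would need a different extractor.

```python
import io
import zipfile
from typing import Optional


def extract_dta_bytes(payload: bytes, member: Optional[str] = None) -> bytes:
    """Return the raw bytes of a .dta file stored in an in-memory ZIP archive.

    Hypothetical helper: `payload` is the downloaded archive as bytes. If
    `member` is None, the first entry whose name ends in ".dta" is used.
    """
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        if member is None:
            candidates = [n for n in zf.namelist() if n.lower().endswith(".dta")]
            if not candidates:
                raise ValueError("archive contains no .dta file")
            member = candidates[0]
        # ZipFile.read decompresses the whole member into memory.
        return zf.read(member)
```

To obtain a DataFrame, wrap the result in a buffer again: `pandas.read_stata(io.BytesIO(extract_dta_bytes(payload)))`.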
Installation
PyPI (Recommended)
```bash
pip install usecasen
```
From Source
```bash
git clone https://github.com/MaykolMedrano/usecasen.git
cd usecasen/python
pip install -e .
```
Quick Start
```python
import casen

# Download a single year
df = casen.download(2022)
# Returns: DataFrame with 202,231 rows × 918 columns

# Download multiple years
results = casen.download_batch([2017, 2022])
# Returns: {2017: DataFrame, 2022: DataFrame}

# Search variables by name or label
results = casen.search("educacion")

# Get value labels
labels = casen.get_labels("region", year=2022)
# Returns: {1: 'Tarapacá', 2: 'Antofagasta', ...}
```
API Reference
Functions
| Function | Description |
|---|---|
| `download(year)` | Download CASEN data for a single year |
| `download_batch(years)` | Download multiple years at once |
| `search(pattern)` | Search variables by name or label |
| `get_labels(variable, year)` | Get value labels (codebook) |
Options
| Parameter | Values | Default | Description |
|---|---|---|---|
| `year` | 1990–2024 | required | Survey year |
| `to_stata` | `True` / `False` | `False` | Inject into Stata memory |
| `verbose` | `True` / `False` | `True` | Display progress |
| `regex` | `True` / `False` | `False` | Use regex in search |
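A name-or-label search with a `regex` switch can be sketched over a plain mapping of variable names to labels. This is a hypothetical illustration, not the library's implementation: it assumes the metadata has already been reduced to a `{name: label}` dict (for instance, from a cached codebook), which is what allows a search without loading the full dataset. Plain patterns are escaped and matched case-insensitively as substrings; with `regex=True` the pattern is used as a regular expression, also case-insensitive.

```python
import re
from typing import Dict, List


def search_variables(variables: Dict[str, str], pattern: str, regex: bool = False) -> List[str]:
    """Return variable names whose name or label matches `pattern` (hypothetical helper)."""
    if regex:
        compiled = re.compile(pattern, re.IGNORECASE)
    else:
        # Escape the literal so characters like "." or "(" are not treated as regex syntax.
        compiled = re.compile(re.escape(pattern), re.IGNORECASE)
    return [
        name
        for name, label in variables.items()
        if compiled.search(name) or compiled.search(label)
    ]
```

Results keep the order of the input mapping, so a codebook stored in survey order returns matches in that order.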
Stata 17+ Integration
```stata
* Load CASEN 2022 directly into Stata memory
python:
import casen
df = casen.download(2022, to_stata=True)
end
describe
summarize

* Search variables from Stata
python:
import casen
results = casen.search("ingreso", verbose=True)
end
```
Package Structure
```
python/
├── casen/
│   ├── __init__.py     # Public API
│   ├── downloader.py   # Download and scraping logic
│   ├── metadata.py     # Search and labels
│   ├── stata_io.py     # Stata integration
│   └── utils.py        # Shared utilities
├── setup.py
├── requirements.txt
├── LICENSE
└── README.md
```
Scoring System (Mata Port)
To select the data file among the files published for each year, the library scores candidate file names using these weights:
- Rewards: "casen" (+30), "stata" (+100), ".dta" (+80), year (+50)
- Penalties: "spss" (-100), "sas" (-80), "csv" (-50), "manual" (-60)
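The weights above can be read as an additive score over case-insensitive substring matches. The sketch below is a reconstruction under that assumption: how the original Mata code handles ties, partial year matches, or other tokens is not documented here, and `score_file` and `best_file` are hypothetical names.

```python
from typing import Dict, Iterable, Optional

# Weights as listed in the README; matching is an assumed case-insensitive substring test.
REWARDS: Dict[str, int] = {"casen": 30, "stata": 100, ".dta": 80}
PENALTIES: Dict[str, int] = {"spss": -100, "sas": -80, "csv": -50, "manual": -60}
YEAR_BONUS = 50


def score_file(name: str, year: int) -> int:
    """Additive score for a candidate file name and survey year."""
    lowered = name.lower()
    score = sum(w for token, w in {**REWARDS, **PENALTIES}.items() if token in lowered)
    if str(year) in lowered:
        score += YEAR_BONUS
    return score


def best_file(names: Iterable[str], year: int) -> Optional[str]:
    """Highest-scoring name (first one wins ties), or None if `names` is empty."""
    return max(names, key=lambda n: score_file(n, year), default=None)
```

For example, `CASEN_2022_Stata.zip` scores 30 + 100 + 50 = 180 for 2022, while `casen_2022_spss.zip` scores 30 - 100 + 50 = -20.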
Compatibility
| Requirement | Version |
|---|---|
| Python | 3.8+ |
| Stata | 17+ (optional) |
| Pandas | 1.3+ |
| Requests | 2.25+ |
| Pyreadstat | 1.2.7+ (fallback for legacy .dta versions) |
| RAR extractor | WinRAR / 7-Zip / unrar / unar / bsdtar (for older CASEN years) |
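Because older years are distributed as RAR archives that need one of the external tools above, a loader first has to find an available extractor. A minimal sketch using `shutil.which`; the executable names below are assumptions, not taken from the library, and differ between installations (the 7-Zip command line, for instance, may be installed as `7z` or `7zz`).

```python
import shutil
from typing import Iterable, Optional

# Hypothetical candidate executables, tried in order; adjust to the local installation.
DEFAULT_EXTRACTORS = ("7z", "7zz", "unrar", "unar", "bsdtar", "UnRAR", "WinRAR")


def find_rar_extractor(candidates: Iterable[str] = DEFAULT_EXTRACTORS) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

Returning `None` instead of raising lets the caller report which tools to install, which matters only for the older years that ship as RAR.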
Citation
```bibtex
@software{usecasen2025,
  author  = {Maykol Medrano},
  title   = {usecasen: Python Library for CASEN Survey Analysis},
  version = {1.0.0},
  year    = {2025},
  url     = {https://github.com/MaykolMedrano/usecasen}
}
```
Contributing
- Fork the repository
- Create a feature branch: `git checkout -b feature/NewFeature`
- Commit your changes: `git commit -m 'Add NewFeature'`
- Push the branch: `git push origin feature/NewFeature`
- Open a pull request
Data Source
Data provided by Chile's Ministry of Social Development: https://observatorio.ministeriodesarrollosocial.gob.cl/
License
MIT License — See LICENSE for details.
Version: 1.0.0 | Python: 3.8+ | Stata: 17+ (optional)
Download files
- Source Distribution: usecasen-1.0.0.tar.gz
- Built Distribution: usecasen-1.0.0-py3-none-any.whl
File details
Details for the file usecasen-1.0.0.tar.gz.
File metadata
- Download URL: usecasen-1.0.0.tar.gz
- Upload date:
- Size: 20.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `9fc727a841a9de8299c4a2437c1260d574ae578451c37fac8618f6dd51bf2303` |
| MD5 | `6300e27f637032e214ff26334f5a8ad7` |
| BLAKE2b-256 | `97350efa8db117e11813e51d4cfba94e3b2c9f2d139f1561b8deee20001a3b12` |
File details
Details for the file usecasen-1.0.0-py3-none-any.whl.
File metadata
- Download URL: usecasen-1.0.0-py3-none-any.whl
- Upload date:
- Size: 20.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `3f23f0a41bddf04d85fcf578b13ec959b1a017a9fc239aca87db3e77095e3821` |
| MD5 | `864c48c2df4dfd9fae890db6c9dca679` |
| BLAKE2b-256 | `808ffa502b080b6faef0e97b1c453ccc77d03b4ea83ae38a5dfd70ae746135a8` |
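The SHA256 digests listed for each file can be used to check a manually downloaded archive. A minimal sketch with `hashlib` that reads the file in chunks; the helper names `sha256_of` and `verify` are illustrative, not part of the package. pip can perform the same check itself in `--require-hashes` mode, using a requirements line such as `usecasen==1.0.0 --hash=sha256:<digest>`.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so memory use stays flat."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected.strip().lower()
```

Usage: `verify(Path("usecasen-1.0.0.tar.gz"), "9fc727a841a9de8299c4a2437c1260d574ae578451c37fac8618f6dd51bf2303")`.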