Modern Python library for downloading and analyzing Chilean CASEN survey data

usecasen

Python Library for CASEN Survey Analysis (Chile)

License: MIT | Python: 3.8+

Professional Python library for downloading and analyzing data from Chile's CASEN Survey (Encuesta de Caracterización Socioeconómica Nacional). Features 100% in-memory operations, intelligent file scoring, and Stata 17+ integration.

Author: Maykol Medrano | Email: mmedrano2@uc.cl | GitHub: @MaykolMedrano


Features

  • 100% In-Memory: Download and decompress without disk I/O
  • Intelligent Scoring: Smart file detection ported from MATA
  • Stata Integration: Vectorized injection (100x faster than loops)
  • Variable Search: Efficient search without loading full datasets
  • Codebook Access: Automatic value label extraction
  • Cache System: Instant searches after first download
  • Cross-Platform: Windows, macOS, Linux compatible
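As an illustration of the in-memory approach, the following minimal sketch unpacks a ZIP archive held entirely in a bytes buffer, using only the standard library. It is not usecasen's actual implementation: the helper name extract_first_dta and the file names are hypothetical, and the archive is built locally to stand in for a downloaded one.

```python
import io
import zipfile


def extract_first_dta(archive_bytes: bytes) -> bytes:
    """Return the bytes of the first .dta member of a ZIP archive held in memory.

    Hypothetical helper for illustration; not part of the usecasen API.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".dta"):
                return archive.read(name)
    raise FileNotFoundError("no .dta member found in archive")


# Build a small ZIP in memory to stand in for a downloaded archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("manual.pdf", b"not data")
    archive.writestr("casen_2022.dta", b"fake stata bytes")

# Nothing is written to disk at any point.
payload = extract_first_dta(buffer.getvalue())
```

In a real pipeline the resulting bytes would typically be wrapped in io.BytesIO again and handed to a reader that accepts file-like objects.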

Installation

PyPI (Recommended)

pip install usecasen

From Source

git clone https://github.com/MaykolMedrano/usecasen.git
cd usecasen/python
pip install -e .

Quick Start

import casen

# Download single year
df = casen.download(2022)
# Returns: DataFrame with 202,231 rows × 918 columns

# Download multiple years
results = casen.download_batch([2017, 2022])
# Returns: {2017: DataFrame, 2022: DataFrame}

# Search variables
results = casen.search("educacion")

# Get value labels
labels = casen.get_labels("region", year=2022)
# Returns: {1: 'Tarapacá', 2: 'Antofagasta', ...}

API Reference

Functions

Function                     Description
download(year)               Download CASEN data for a single year
download_batch(years)        Download multiple years at once
search(pattern)              Search variables by name or label
get_labels(variable, year)   Get value labels (codebook)

Options

Parameter   Values         Default    Description
year        1990-2024      required   Survey year
to_stata    True | False   False      Inject into Stata memory
verbose     True | False   True       Display progress
regex       True | False   False      Use regex in search
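
To make the regex option concrete, here is a minimal sketch of how a literal versus regex search over variable names and labels could behave. It assumes a case-insensitive match against both fields; the function search_variables and the small catalog are hypothetical stand-ins and do not reproduce usecasen's internals.

```python
import re
from typing import Dict


def search_variables(catalog: Dict[str, str], pattern: str, regex: bool = False) -> Dict[str, str]:
    """Return catalog entries whose name or label matches the pattern.

    Hypothetical helper: with regex=False the pattern is escaped so it is
    matched literally; with regex=True it is compiled as given.
    """
    expression = pattern if regex else re.escape(pattern)
    matcher = re.compile(expression, re.IGNORECASE)
    return {
        name: label
        for name, label in catalog.items()
        if matcher.search(name) or matcher.search(label)
    }


# Illustrative variable catalog (names and labels are placeholders).
catalog = {
    "educ": "Nivel de educacion",
    "esc": "Anos de escolaridad",
    "ytotcor": "Ingreso total corregido",
}

literal_hits = search_variables(catalog, "educ")
regex_hits = search_variables(catalog, r"^e(duc|sc)$", regex=True)
```

Escaping the pattern in literal mode means characters such as "." or "(" in a search term are treated as plain text rather than regex syntax.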

Stata 17+ Integration

python:
import casen
df = casen.download(2022, to_stata=True)
end

describe
summarize

* Search variables from Stata
python:
import casen
results = casen.search("ingreso", verbose=True)
end

Package Structure

python/
├── casen/
│   ├── __init__.py      # Public API
│   ├── downloader.py    # Download and scraping logic
│   ├── metadata.py      # Search and labels
│   ├── stata_io.py      # Stata integration
│   └── utils.py         # Shared utilities
├── setup.py
├── requirements.txt
├── LICENSE
└── README.md

Scoring System (MATA Port)

The library scores candidate files by keyword to detect the correct one:

Rewards: "casen" (+30), "stata" (+100), ".dta" (+80), year (+50)

Penalties: "spss" (-100), "sas" (-80), "csv" (-50), "manual" (-60)
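
The weights above can be sketched as a small scoring function. This is an illustration only: it assumes the keywords are matched as case-insensitive substrings of the file name and that the weights are simply summed, neither of which is confirmed by the list above. The names score_candidate and pick_best and the sample file names are hypothetical.

```python
from typing import Iterable, Optional

# Weights taken from the rewards and penalties listed above.
REWARDS = {"casen": 30, "stata": 100, ".dta": 80}
PENALTIES = {"spss": -100, "sas": -80, "csv": -50, "manual": -60}
YEAR_REWARD = 50


def score_candidate(name: str, year: int) -> int:
    """Sum the weight of every keyword found in the file name (assumed rule)."""
    lowered = name.lower()
    score = 0
    for keyword, weight in {**REWARDS, **PENALTIES}.items():
        if keyword in lowered:
            score += weight
    if str(year) in lowered:
        score += YEAR_REWARD
    return score


def pick_best(names: Iterable[str], year: int) -> Optional[str]:
    """Return the highest-scoring name, or None if there are no candidates."""
    return max(names, key=lambda n: score_candidate(n, year), default=None)


candidates = [
    "Casen2022_SPSS.zip",       # 30 + 50 - 100 = -20
    "Casen2022_STATA.dta.zip",  # 30 + 100 + 80 + 50 = 260
    "Manual_Casen_2022.csv",    # 30 + 50 - 50 - 60 = -30
]
best = pick_best(candidates, 2022)
```

Under these assumptions the Stata archive wins by a wide margin, which matches the intent of rewarding .dta files and penalizing other formats and documentation.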


Compatibility

Requirement     Version
Python          3.8+
Stata           17+ (optional)
Pandas          1.3+
Requests        2.25+
Pyreadstat      1.2.7+ (fallback for legacy .dta versions)
RAR extractor   WinRAR / 7-Zip / unrar / unar / bsdtar (for older CASEN years)
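
Before requesting older CASEN years, it can help to check whether a RAR extractor is available. The sketch below looks for a few command-line tools on PATH with shutil.which. The executable names are assumptions (for example, 7-Zip is commonly installed as 7z, and WinRAR may not be on PATH at all), and find_rar_extractor is a hypothetical helper, not part of usecasen.

```python
import shutil
from typing import Optional, Sequence

# Assumed executable names for the extractors listed above.
CANDIDATE_TOOLS = ("unrar", "unar", "bsdtar", "7z")


def find_rar_extractor(candidates: Sequence[str] = CANDIDATE_TOOLS) -> Optional[str]:
    """Return the full path of the first candidate tool found on PATH, else None."""
    for tool in candidates:
        path = shutil.which(tool)
        if path is not None:
            return path
    return None


extractor = find_rar_extractor()
if extractor is None:
    print("No RAR extractor found on PATH; older CASEN years may fail to unpack.")
else:
    print(f"Using RAR extractor: {extractor}")
```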

Citation

@software{usecasen2025,
  author = {Maykol Medrano},
  title = {usecasen: Python Library for CASEN Survey Analysis},
  version = {1.0.0},
  year = {2025},
  url = {https://github.com/MaykolMedrano/usecasen}
}

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/NewFeature
  3. Commit your changes: git commit -m 'Add NewFeature'
  4. Push the branch: git push origin feature/NewFeature
  5. Open a pull request

Data Source

Data provided by Chile's Ministry of Social Development: https://observatorio.ministeriodesarrollosocial.gob.cl/


License

MIT License — See LICENSE for details.


Version: 1.0.0 | Python: 3.8+ | Stata: 17+ (optional)
