Research-grade CLI for Brazilian public microdata, dashboards, and LLM-safe analytics
Brasil CLI
A research-grade CLI for Brazilian public microdata.
📊 Interactive data essay → · 🇧🇷 Artigo PT · 🇬🇧 Article EN
Why this exists
Brazil takes excellent photographs of itself. The IBGE's PNADC, a continuous household survey that reaches roughly a quarter-million Brazilians each year, is as meticulous as any survey a nation conducts. And yet, between the raw fixed-width files the government publishes and anything a citizen, journalist, or policy analyst could read with their own eyes, there is a vast field of friction: SAS layouts, archaic encodings, inflation deflators, nominal minimum-wage series, replicate weights nobody teaches. In that friction, the country hides from itself.
This repository is an attempt to close that gap. It compresses the painful
path from official microdata into a single, auditable command-line tool
(brasil) whose output — CSVs, SQLite tables, JSON payloads, a rich terminal
dashboard, and the interactive data essay in this folder —
any Brazilian (or anyone interested in the country) can read, reproduce, and
challenge. Numbers are not neutral, but auditability is. If a claim about
Brazilian inequality cannot be traced back to a bootstrap weight, a deflator,
and a specific UF row in the PNADC, it does not belong in public debate.
Canonical executable: brasil · Compatibility alias: pnad
What the project covers
Official data sources
- PNADC trimestral microdata
- PNADC anual visita 5 microdata (work + benefits + pensions + capital decomposition)
- Censo 2022 aggregated income files
- TSE eleitorado open-data resources
- BCB / IPCA inflation series
- BCB / minimum wage nominal monthly series (BCB 1619)
Core outputs
- extracted CSVs · labeled CSVs · IPCA-deflated CSVs
- SQLite databases
- terminal dashboards (pretty + JSON)
- interactive HTML essay (docs/index.html)
Core interfaces
brasil ibge-sync · brasil pipeline-run · brasil pipeline-run-anual · brasil query · brasil renda-por-faixa-sm · brasil dashboard
Highlights
- End-to-end pipeline from official raw files to analytic outputs.
- Both trimestral labor-income and anual full household-income composition views.
- Auto-refreshes IPCA and minimum wage references.
- Builds SQLite outputs for low-friction analytics and LLM-driven workflows.
- brasil query defaults to read-only SQL, safe for agentic use.
- brasil dashboard produces weighted estimates, 95% confidence intervals (bootstrap over 200 IBGE replicate weights), and a statistical audit seal on every render.
- Annual dashboard includes explicit income lenses:
  - total household income
  - income excluding social benefits
  - income excluding public transfers
  - work-only income
- Visual layer (docs/index.html) renders the same data as 15 interactive Plotly charts with a PT/EN toggle, suitable for GitHub Pages.
Install
git clone https://github.com/ArvorCo/PNAD
cd PNAD
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .
You get both executables:
brasil --help
pnad --help
60-second quickstart
# 1) sync official docs + latest quarterly PNADC
brasil ibge-sync
# 2) build trimestral analytic outputs
brasil pipeline-run --raw latest
# 3) sync full scope (annual + census + TSE)
brasil ibge-sync --full
# 4) build annual visita 5 outputs
brasil pipeline-run-anual --raw latest
# 5) inspect with the terminal dashboard
brasil dashboard
# 6) render the interactive HTML essay
python docs/build_index.py
open docs/index.html
# 7) query the SQLite database (read-only by default)
brasil query \
--db data/outputs/brasil.sqlite \
--sql "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
Main generated outputs:
- data/outputs/base_labeled_npv.csv: trimestral, labeled, IPCA-adjusted
- data/outputs/base_anual_labeled_npv.csv: annual visita 5, labeled, IPCA-adjusted
- data/outputs/brasil.sqlite: SQLite with base_labeled_npv and base_anual_labeled_npv tables
- docs/index.html: static interactive essay (bilingual)
- data/outputs/ipca.csv: IPCA series
Typical workflows
1. Sync official data
brasil ibge-sync # latest quarterly scope
brasil ibge-sync --year 2025 --quarter 3 # a specific quarter
brasil ibge-sync --year 2025 --all-in-year
brasil ibge-sync --full # trimestral + annual + census + TSE
2. Build trimestral PNADC outputs
brasil pipeline-run \
--raw latest \
--layout data/originals/input_PNADC_trimestral.sas \
--sqlite data/outputs/brasil.sqlite
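For readers curious about what the --layout file contributes, here is a minimal Python sketch of fixed-width parsing driven by a SAS-style input layout. It assumes layout lines shaped like `@0001 Ano $4.` (start column, variable name, width). The regex, the function names, and the sample record in the usage note are illustrative assumptions, not code taken from scripts/layout_sas.py.

```python
import re
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    start: int  # 0-based offset into the record
    width: int


# Assumed layout line shape: "@0001 Ano $4." or "@0050 V1028 15.8"
_LAYOUT_RE = re.compile(r"@(\d+)\s+(\w+)\s+\$?(\d+)\.")


def parse_layout(text: str) -> list[Field]:
    """Collect (name, start, width) for every line that looks like a field."""
    fields = []
    for line in text.splitlines():
        match = _LAYOUT_RE.search(line)
        if match:
            start, name, width = match.groups()
            # SAS column positions are 1-based; Python slices are 0-based.
            fields.append(Field(name, int(start) - 1, int(width)))
    return fields


def parse_record(line: str, fields: list[Field]) -> dict[str, str]:
    """Slice one fixed-width record into a dict of raw string values."""
    return {f.name: line[f.start:f.start + f.width].strip() for f in fields}
```

With a three-field layout (`Ano`, `Trimestre`, `UF`), `parse_record("2025335", fields)` would yield `Ano="2025"`, `Trimestre="3"`, `UF="35"`. Values stay as strings here; type conversion and labeling are left out of the sketch.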
3. Build annual visita 5 outputs
brasil pipeline-run-anual \
--raw data/raw/pnadc_anual_visita5/PNADC_2024_visita5.txt \
--layout data/originals/pnadc_anual_visita5/input_PNADC_2024_visita5.txt \
--sqlite data/outputs/brasil.sqlite
4. Compute income bands
# Brazil-level distribution (with bootstrap CI)
brasil renda-por-faixa-sm \
--input data/outputs/base_labeled_npv.csv \
--group-by pais \
--format json
# UF ranking
brasil renda-por-faixa-sm \
--input data/outputs/base_labeled_npv.csv \
--group-by uf \
--uf-order renda_desc
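As a rough illustration of what an income-band distribution involves, the Python sketch below assigns each income to a band expressed in multiples of the minimum wage and returns the weighted share of each band. The band edges and labels are hypothetical, chosen only for the example; the bands that renda-por-faixa-sm actually uses may differ.

```python
from bisect import bisect_right

# Hypothetical band edges, in multiples of the minimum wage (SM).
EDGES = [1.0, 2.0, 5.0]
LABELS = ["0-1 SM", "1-2 SM", "2-5 SM", "5+ SM"]


def band_shares(incomes, weights, minimum_wage):
    """Return the weighted share of observations in each minimum-wage band."""
    totals = [0.0] * len(LABELS)
    for income, weight in zip(incomes, weights):
        # An income exactly on an edge falls into the upper band.
        band = bisect_right(EDGES, income / minimum_wage)
        totals[band] += weight
    grand_total = sum(totals)
    if grand_total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {label: total / grand_total for label, total in zip(LABELS, totals)}
```

The shares are point estimates only. Confidence intervals would require repeating the same computation with each replicate weight column.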
5. Run the dashboard
# auto-discover and combine quarterly + annual when both exist
brasil dashboard
# explicit annual view (the one that separates work from benefits)
brasil dashboard \
--input data/outputs/base_anual_labeled_npv.csv \
--mode anual \
--composition-by-band \
--dependency-ranking
# export structured JSON for downstream tools or LLMs
brasil dashboard --format json > data/outputs/dashboard.json
6. Query with SQLite
# list tables
brasil query \
--db data/outputs/brasil.sqlite \
--sql "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
# top UFs by household income
brasil query \
--db data/outputs/brasil.sqlite \
--sql "SELECT UF_label AS uf, AVG(VD5001__rendim_domiciliar) AS renda FROM base_anual_labeled_npv GROUP BY 1 ORDER BY 2 DESC LIMIT 10"
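The same query can also be run from Python without going through the CLI. The sketch below is one possible way to do it, assuming the table and column names shown in the query above; the function name `top_ufs` is made up for the example. It opens the database with SQLite's `mode=ro` URI flag so the connection itself rejects writes.

```python
import sqlite3
from pathlib import Path


def top_ufs(db_path, limit=10):
    """Return (uf, mean household income) rows, highest first."""
    # mode=ro asks SQLite to open the file read-only.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(
            "SELECT UF_label AS uf, AVG(VD5001__rendim_domiciliar) AS renda "
            "FROM base_anual_labeled_npv GROUP BY 1 ORDER BY 2 DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
```

Like the SQL above, this is an unweighted average over rows. For weighted, uncertainty-aware figures, the dashboard and renda-por-faixa-sm outputs are the better source.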
7. Build the interactive HTML essay
python docs/build_index.py # rebuilds docs/index.html
python docs/build_hero.py # regenerates docs/assets/hero.png
python -m http.server 8000 -d docs # preview locally
The generated docs/index.html is a self-contained bilingual essay (PT/EN
toggle) with 15 interactive Plotly charts reading the same PNADC data as the
terminal dashboard. It is suitable for GitHub Pages (main:/docs).
Command map
| Command | What it does | Best for |
|---|---|---|
| ibge-sync | Sync official files and docs | keeping local raw data fresh |
| pipeline-run | Build trimestral outputs | labor-income workflows |
| pipeline-run-anual | Build annual visita 5 outputs | full household-income composition |
| query | Run read-only SQL on SQLite | LLMs, analysts, automation |
| renda-por-faixa-sm | Compute income-band distributions with CI | reporting by Brazil / UF |
| dashboard | Rich terminal + JSON dashboard | exploratory analysis, briefing, storytelling |
| sqlite-build | Rebuild a table from CSV | custom pipelines and refreshes |
| help-legacy | Show legacy parser help | low-level extraction tools |
LLM / agent-friendly by design
This project is intentionally useful as an LLM-side tool.
- brasil query and brasil dashboard default to JSON output.
- SQL is read-only by default; writes require an explicit --allow-write.
- Query payloads include sampling metadata (CI level, replicate-weight base, method).
- The CLI hides most fragile survey mechanics (fixed-width parsing, replicate weighting, IPCA deflation) from the model.
The repository ships a project-local LLM skill (see skills/).
That skill teaches an agent when to use each subcommand without falling into the dumbest interface for the question.
Methodology notes
Income definitions
- Quarterly PNADC defaults to work income (VD4020, fallback VD4019).
- Annual visita 5 uses household total income (VD5001) plus source decomposition (V5001A2..V5008A2), enabling the labor-vs-benefits split that the quarterly survey cannot support.
- Household income distributions are aggregated through dom_id.
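The two mechanics in the list above (a preferred column with a fallback, and one income value per household) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: the helper names are hypothetical, rows are assumed to be dicts such as those produced by `csv.DictReader`, and household income is assumed to repeat on every person row of the same dom_id.

```python
def pick_column(available, preferred, fallback):
    """Return the preferred column if present, otherwise the fallback."""
    if preferred in available:
        return preferred
    if fallback in available:
        return fallback
    raise KeyError(f"neither {preferred} nor {fallback} is available")


def household_incomes(rows, income_col):
    """Keep one income value per dom_id (the first row seen wins)."""
    by_household = {}
    for row in rows:
        by_household.setdefault(row["dom_id"], float(row[income_col]))
    return by_household
```

For example, `pick_column(header, "VD4020", "VD4019")` selects the quarterly work-income column, and `household_incomes(rows, income_col)` collapses person-level rows to household level before any distribution is computed.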
Inflation and minimum wage
- Income is deflated with IPCA to a target month.
- Minimum-wage references come from BCB series 1619.
- If --target is omitted, the latest month in the IPCA series is used.
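The deflation step amounts to rescaling a nominal value by the ratio of two price-index levels. The Python sketch below shows the arithmetic under simple assumptions: months are "YYYY-MM" strings, the input is a series of monthly percentage changes, and the function names are invented for the example. The actual logic lives in scripts/npv_deflators.py and may differ in its conventions.

```python
def build_index(monthly_pct):
    """Chain monthly percentage changes into a cumulative index level."""
    index = {}
    level = 1.0
    for month in sorted(monthly_pct):  # "YYYY-MM" sorts chronologically
        level *= 1.0 + monthly_pct[month] / 100.0
        index[month] = level
    return index


def latest_month(index):
    """Default target: the most recent month present in the series."""
    return max(index)


def deflate(value, ref_month, index, target_month=None):
    """Express a value from ref_month in the prices of target_month."""
    target = target_month or latest_month(index)
    return value * index[target] / index[ref_month]
```

Dividing a deflated income by the minimum wage of the same target month then gives income in minimum-wage multiples on a consistent price basis.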
Weights and uncertainty
- Quarterly estimates prefer V1028 (fallback V1027); annual estimates prefer V1032 (fallback V1031).
- 95% confidence intervals use bootstrap over 200 replicate weights (V1028001..V1028200 quarterly; V1032001..V1032200 annual).
- brasil query does not infer CI for arbitrary SQL. For uncertainty-aware outputs, prefer renda-por-faixa-sm --format json or dashboard --format json.
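To make the replicate-weight idea concrete, here is a small Python sketch of one common way to turn replicate weights into a confidence interval: recompute the estimate once per replicate column, take the mean squared deviation from the main estimate as the variance, and apply a normal approximation. Whether the tool uses this variance formula or a percentile interval is not stated here, so treat the method and the function names as assumptions.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values; weights must sum to a positive number."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def replicate_ci(values, main_weights, replicate_weights, z=1.96):
    """Return (estimate, lower, upper) from a set of replicate weights.

    replicate_weights is a list of weight columns, one per replicate
    (for example 200 columns such as V1028001..V1028200).
    """
    estimate = weighted_mean(values, main_weights)
    replicas = [weighted_mean(values, w) for w in replicate_weights]
    variance = sum((r - estimate) ** 2 for r in replicas) / len(replicas)
    half_width = z * math.sqrt(variance)
    return estimate, estimate - half_width, estimate + half_width
```

The same pattern applies to any weighted statistic (shares, medians, band distributions): only `weighted_mean` needs to be swapped for the statistic of interest.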
Read-only safety
- brasil query allows SELECT, WITH, PRAGMA, and EXPLAIN by default.
- Mutating SQL requires explicit --allow-write.
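A first-keyword allowlist like the one described above can be sketched as follows. This is a hypothetical illustration, not the check that brasil query implements; the names `first_keyword` and `is_allowed` are invented for the example.

```python
import re

READ_ONLY_KEYWORDS = {"SELECT", "WITH", "PRAGMA", "EXPLAIN"}

# Leading whitespace, "-- line" comments, and "/* block */" comments.
_LEADING_NOISE = re.compile(r"^\s*(--[^\n]*\n\s*|/\*.*?\*/\s*)*", re.S)


def first_keyword(sql):
    """Return the first SQL keyword, upper-cased, or '' if none is found."""
    stripped = _LEADING_NOISE.sub("", sql, count=1)
    match = re.match(r"[A-Za-z]+", stripped)
    return match.group(0).upper() if match else ""


def is_allowed(sql, allow_write=False):
    """Accept a statement if writes are enabled or it starts read-only."""
    return allow_write or first_keyword(sql) in READ_ONLY_KEYWORDS
```

A keyword check alone is a coarse filter: in SQLite a WITH clause can precede a DELETE, and some PRAGMA statements change state. Pairing it with a connection opened in read-only mode (`mode=ro` in a SQLite URI) gives a stronger guarantee.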
Statistical audit seal
Every dashboard render prints an audit seal — a compact checklist that
confirms which weight column was selected, how many replicate columns were
found, whether the bootstrap CI was effective, which IPCA target month was
used, and which minimum-wage reference was applied. The seal includes a short
hash of input + target + rows + households so two observers can verify they
are looking at the same estimate.
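The short hash in the seal can be pictured with the Python sketch below. The field order, the separator, the SHA-256 algorithm, and the 12-character length are all assumptions made for illustration; the seal's real encoding is defined in the CLI code.

```python
import hashlib


def audit_hash(input_name, target_month, rows, households, length=12):
    """Short, deterministic fingerprint of the inputs behind an estimate."""
    payload = f"{input_name}|{target_month}|{rows}|{households}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]
```

Two observers who run the same input, target month, row count, and household count get the same string; any difference in those four fields changes it.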
Repository layout
scripts/ main CLI and data-processing logic
tests/ pytest suite (50+ tests)
skills/ project-local skills for LLM agents
docs/ technical specs, bilingual essay, HTML builder
analysis/ exploratory analysis artifacts
notebooks/ research notebooks
samples/ tiny fixtures / examples
data/ local scaffold, outputs, raw files, docs
Main code modules:
- scripts/pnad.py — top-level CLI, dashboards, query, sync, pipelines
- scripts/pnadc_cli.py — lower-level extraction and legacy tooling
- scripts/npv_deflators.py — IPCA / deflator logic
- scripts/layout_sas.py — SAS layout parsing
- docs/build_index.py — HTML essay generator
- docs/build_hero.py — static hero PNG generator
Development
python -m pytest -q # run the full suite
ruff check scripts/ docs/ # lint
black --check scripts/ docs/ # formatting
python scripts/pnad.py --help
python -m pytest -q tests/test_dashboard.py # dashboard tests only
Zero-lint policy
This project keeps ruff check scripts/ docs/ and black --check scripts/ docs/ green at
all times. Info-level warnings count. No suppressions. When adding code,
first ensure ruff check --fix yields zero issues and black reformats nothing.
Contributing
Good contributions include:
- new survey integrations (PNADS, Censo Demográfico microdata, POF)
- more robust statistical validation
- better annual-income decomposition workflows
- dashboard refinements and new visualizations in docs/index.html
- documentation and examples
- performance improvements for large raw files
- decomposition of the single-file CLI into cleaner modules
Before opening a change:
- run the relevant pytest subset
- keep outputs reproducible (brasil pipeline-run --raw latest should produce the same files on two machines given the same raw input)
- avoid unsafe SQL defaults
- preserve weighted and uncertainty-aware paths
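One simple way to check the reproducibility point above is to compare file digests from two runs. The helper below is a generic Python sketch, not part of the project; the function name is made up for the example.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large outputs fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running it on data/outputs/base_labeled_npv.csv on both machines and comparing the two strings is enough to confirm byte-identical output.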
Project status
Production-useful for:
- exploratory socioeconomic analysis
- journalism and data-essay workflows
- public-policy research
- state-by-state income comparisons
- LLM-assisted analysis of Brazilian official data
It is not an official IBGE or TSE tool. Users should still understand the underlying survey design before publishing strong claims. Start with the bundled interactive essay and the full article (PT) / (EN) for a guided, auditable reading of what the data says.
Data © IBGE / PNADC · Code © MIT · Prose © CC-BY-4.0