Research-grade CLI for Brazilian public microdata, dashboards, and LLM-safe analytics

Maintainers

ArvorCo

Project description

Brasil CLI

A research-grade CLI for Brazilian public microdata.

Brazil: household income by UF (state)

📊 Interactive data essay → · 🇧🇷 Artigo PT · 🇬🇧 Article EN

Why this exists

Brazil takes excellent photographs of itself. The IBGE's PNADC, a continuous household survey that reaches roughly a quarter-million Brazilians each year, is as meticulous as any survey a nation conducts. And yet, between the raw fixed-width files the government publishes and anything a citizen, journalist, or policy analyst could read with their own eyes, there is a vast field of friction: SAS layouts, archaic encodings, inflation deflators, nominal minimum-wage splines, replicate weights nobody teaches. In that friction, the country hides from itself.

This repository is an attempt to close that gap. It compresses the painful path from official microdata into a single, auditable command-line tool (brasil) whose output — CSVs, SQLite tables, JSON payloads, a rich terminal dashboard, and the interactive data essay in this folder — any Brazilian (or anyone interested in the country) can read, reproduce, and challenge. Numbers are not neutral, but auditability is. If a claim about Brazilian inequality cannot be traced back to a bootstrap weight, a deflator, and a specific UF row in the PNADC, it does not belong in public debate.

Canonical executable: brasil · Compatibility alias: pnad

What the project covers

Official data sources

PNADC trimestral microdata
PNADC anual visita 5 microdata (work + benefits + pensions + capital decomposition)
Censo 2022 aggregated income files
TSE eleitorado open-data resources
BCB / IPCA inflation series
BCB / minimum wage nominal monthly series (BCB 1619)

Core outputs

extracted CSVs · labeled CSVs · IPCA-deflated CSVs
SQLite databases
terminal dashboards (pretty + JSON)
interactive HTML essay (docs/index.html)

Core interfaces

brasil ibge-sync · brasil pipeline-run · brasil pipeline-run-anual · brasil query · brasil renda-por-faixa-sm · brasil dashboard

Highlights

End-to-end pipeline from official raw files to analytic outputs.
Both quarterly (trimestral) labor-income and annual (anual) full household-income composition views.
Auto-refreshes IPCA and minimum wage references.
Builds SQLite outputs for low-friction analytics and LLM-driven workflows.
brasil query defaults to read-only SQL, safe for agentic use.
brasil dashboard produces weighted estimates, 95% confidence intervals (bootstrap over 200 IBGE replicate weights), and a statistical audit seal on every render.
Annual dashboard includes explicit income lenses:
- total household income
- income excluding social benefits
- income excluding public transfers
- work-only income
Visual layer (docs/index.html) renders the same data as 15 interactive Plotly charts with a PT/EN toggle, suitable for GitHub Pages.

Install

git clone https://github.com/ArvorCo/PNAD
cd PNAD

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
pip install -e .

You get both executables:

brasil --help
pnad --help

60-second quickstart

# 1) sync official docs + latest quarterly PNADC
brasil ibge-sync

# 2) build trimestral analytic outputs
brasil pipeline-run --raw latest

# 3) sync full scope (annual + census + TSE)
brasil ibge-sync --full

# 4) build annual visita 5 outputs
brasil pipeline-run-anual --raw latest

# 5) inspect with the terminal dashboard
brasil dashboard

# 6) render the interactive HTML essay
python docs/build_index.py
open docs/index.html

# 7) query the SQLite database (read-only by default)
brasil query \
  --db data/outputs/brasil.sqlite \
  --sql "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"

Main generated outputs:

data/outputs/base_labeled_npv.csv — trimestral, labeled, IPCA-adjusted
data/outputs/base_anual_labeled_npv.csv — annual visita 5, labeled, IPCA-adjusted
data/outputs/brasil.sqlite — SQLite with base_labeled_npv and base_anual_labeled_npv tables
docs/index.html — static interactive essay (bilingual)
data/outputs/ipca.csv — IPCA series

Typical workflows

1. Sync official data

brasil ibge-sync                          # latest quarterly scope
brasil ibge-sync --year 2025 --quarter 3  # a specific quarter
brasil ibge-sync --year 2025 --all-in-year
brasil ibge-sync --full                   # trimestral + annual + census + TSE

2. Build trimestral PNADC outputs

brasil pipeline-run \
  --raw latest \
  --layout data/originals/input_PNADC_trimestral.sas \
  --sqlite data/outputs/brasil.sqlite

3. Build annual visita 5 outputs

brasil pipeline-run-anual \
  --raw data/raw/pnadc_anual_visita5/PNADC_2024_visita5.txt \
  --layout data/originals/pnadc_anual_visita5/input_PNADC_2024_visita5.txt \
  --sqlite data/outputs/brasil.sqlite

4. Compute income bands

# Brazil-level distribution (with bootstrap CI)
brasil renda-por-faixa-sm \
  --input data/outputs/base_labeled_npv.csv \
  --group-by pais \
  --format json

# UF ranking
brasil renda-por-faixa-sm \
  --input data/outputs/base_labeled_npv.csv \
  --group-by uf \
  --uf-order renda_desc
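
A minimal sketch of the idea behind an income-band distribution, assuming household income and a survey weight are already available per record. The band edges, the labels, and the function name `band_shares` are hypothetical illustrations, not the CLI's actual implementation or its real band definitions.

```python
from bisect import bisect_right

# Hypothetical band edges, in multiples of the minimum wage (SM).
BAND_EDGES = [1.0, 2.0, 5.0, 10.0]
BAND_LABELS = ["0-1 SM", "1-2 SM", "2-5 SM", "5-10 SM", "10+ SM"]


def band_shares(incomes, weights, minimum_wage):
    """Return the weighted share of records falling in each minimum-wage band."""
    totals = [0.0] * len(BAND_LABELS)
    for income, weight in zip(incomes, weights):
        # bisect_right places an income exactly on an edge in the upper band.
        idx = bisect_right(BAND_EDGES, income / minimum_wage)
        totals[idx] += weight
    grand_total = sum(totals)
    return {label: t / grand_total for label, t in zip(BAND_LABELS, totals)}
```

The shares sum to 1 because every record lands in exactly one band; the weights decide how much each record counts.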

5. Run the dashboard

# auto-discover and combine quarterly + annual when both exist
brasil dashboard

# explicit annual view (the one that separates work from benefits)
brasil dashboard \
  --input data/outputs/base_anual_labeled_npv.csv \
  --mode anual \
  --composition-by-band \
  --dependency-ranking

# export structured JSON for downstream tools or LLMs
brasil dashboard --format json > data/outputs/dashboard.json

6. Query with SQLite

# list tables
brasil query \
  --db data/outputs/brasil.sqlite \
  --sql "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"

# top UFs by household income
brasil query \
  --db data/outputs/brasil.sqlite \
  --sql "SELECT UF_label AS uf, AVG(VD5001__rendim_domiciliar) AS renda FROM base_anual_labeled_npv GROUP BY 1 ORDER BY 2 DESC LIMIT 10"

7. Build the interactive HTML essay

python docs/build_index.py                  # rebuilds docs/index.html
python docs/build_hero.py                   # regenerates docs/assets/hero.png
python -m http.server 8000 -d docs          # preview locally

The generated docs/index.html is a self-contained bilingual essay (PT/EN toggle) with 15 interactive Plotly charts reading the same PNADC data as the terminal dashboard. It is suitable for GitHub Pages (main:/docs).

Command map

Command	What it does	Best for
`ibge-sync`	Sync official files and docs	keeping local raw data fresh
`pipeline-run`	Build trimestral outputs	labor-income workflows
`pipeline-run-anual`	Build annual visita 5 outputs	full household-income composition
`query`	Run read-only SQL on SQLite	LLMs, analysts, automation
`renda-por-faixa-sm`	Compute income-band distributions with CI	reporting by Brazil / UF
`dashboard`	Rich terminal + JSON dashboard	exploratory analysis, briefing, storytelling
`sqlite-build`	Rebuild a table from CSV	custom pipelines and refreshes
`help-legacy`	Show legacy parser help	low-level extraction tools

LLM / agent-friendly by design

This project is intentionally useful as an LLM-side tool.

brasil query and brasil dashboard both produce JSON output (the dashboard via --format json, as in the dashboard workflow).
SQL is read-only by default; writes require an explicit --allow-write.
Query payloads include sampling metadata (CI level, replicate-weight base, method).
The CLI hides most fragile survey mechanics (fixed-width parsing, replicate weighting, IPCA deflation) from the model.
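
As a rough illustration of how an agent-side tool might wrap the CLI, the sketch below builds a `brasil query` call from the two flags shown in this README (`--db`, `--sql`) and parses stdout as JSON. The function names are hypothetical, and the assumption that stdout is a single JSON document follows from the description above rather than from a documented schema.

```python
import json
import subprocess


def build_query_argv(db_path, sql):
    """Build the argument list for `brasil query` using flags shown in this README."""
    return ["brasil", "query", "--db", db_path, "--sql", sql]


def run_brasil_query(db_path, sql):
    """Run the query and parse stdout, assuming the payload is JSON."""
    result = subprocess.run(
        build_query_argv(db_path, sql),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout)
```

Passing the SQL as a single list element, rather than through a shell string, avoids quoting problems when a model generates the statement.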

The repository ships a project-local LLM skill:

skills/brasil-cli-analyst/SKILL.md

That skill teaches an agent when to use each subcommand, so it picks the interface that fits the question instead of defaulting to the crudest one.

Methodology notes

Income definitions

Quarterly PNADC defaults to work income (VD4020, fallback VD4019).
Annual visita 5 uses household total income (VD5001) plus source decomposition (V5001A2..V5008A2), enabling the labor-vs-benefits split that the quarterly survey cannot support.
Household income distributions are aggregated through dom_id.
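
The household aggregation step can be pictured with a short sketch. It assumes person-level rows in which household income repeats on every member's row, so each household should be counted once; only `dom_id` comes from this README, while the row layout and the function name are hypothetical.

```python
def households_from_persons(rows):
    """Collapse person-level rows to one record per household (dom_id)."""
    households = {}
    for row in rows:
        # Household income is assumed to repeat on every member's row,
        # so the first row seen for each dom_id is kept.
        households.setdefault(row["dom_id"], row)
    return list(households.values())
```

Without this step, a household of four would contribute its income four times to any distribution.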

Inflation and minimum wage

Income is deflated with IPCA to a target month.
Minimum-wage references come from BCB series 1619.
If --target is omitted, the latest month in the IPCA series is used.
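
The deflation rule can be sketched as a ratio of index values. This is an illustration under stated assumptions, not the code in scripts/npv_deflators.py: the function name, the "YYYY-MM" keys, and the index numbers in the usage note are made up for the example.

```python
def deflate(nominal, month, ipca_index, target=None):
    """Express a nominal value observed in `month` in prices of `target`."""
    if target is None:
        # Mirror the documented default: use the latest month in the series.
        # Zero-padded "YYYY-MM" keys sort chronologically as strings.
        target = max(ipca_index)
    return nominal * ipca_index[target] / ipca_index[month]
```

With a hypothetical index of 100.0 in "2024-01" and 105.0 in "2024-12", a nominal 1000.0 from January is scaled by 105.0 / 100.0 when expressed in December prices.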

Weights and uncertainty

Quarterly estimates prefer V1028 (fallback V1027); annual estimates prefer V1032 (fallback V1031).
95% confidence intervals use bootstrap over 200 replicate weights (V1028001..V1028200 quarterly; V1032001..V1032200 annual).
brasil query does not infer CI for arbitrary SQL. For uncertainty-aware outputs, prefer renda-por-faixa-sm --format json or dashboard --format json.
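
One common way to turn replicate weights into a confidence interval is sketched below: compute the estimate once with the main weight, once per replicate weight, and use the spread of the replicate estimates as the variance. This is an assumption about the general technique, not a description of the CLI's exact estimator; the function names are hypothetical and the normal approximation (z = 1.96) is a choice made for the example.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of `values` under `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def replicate_ci(values, main_weights, replicate_weights, z=1.96):
    """Point estimate and normal-approximation CI from replicate weights.

    `replicate_weights` is a list of weight vectors, one per replicate
    (for example, 200 vectors for columns such as V1028001..V1028200).
    """
    theta = weighted_mean(values, main_weights)
    reps = [weighted_mean(values, w) for w in replicate_weights]
    # Bootstrap variance: mean squared deviation of the replicate
    # estimates around the full-sample estimate.
    variance = sum((r - theta) ** 2 for r in reps) / len(reps)
    se = math.sqrt(variance)
    return theta, theta - z * se, theta + z * se
```

The same pattern applies to any weighted statistic: swap `weighted_mean` for a share or a quantile and the interval logic stays unchanged.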

Read-only safety

brasil query allows SELECT, WITH, PRAGMA, and EXPLAIN by default.
Mutating SQL requires explicit --allow-write.
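
A guard of this kind might look like the sketch below, which pairs a keyword check with SQLite's own read-only mode. It is a minimal illustration, not the CLI's implementation: `is_read_only` and `guarded_query` are hypothetical names, and the `allow_write` parameter only mirrors the `--allow-write` flag conceptually.

```python
import re
import sqlite3

READ_ONLY_KEYWORDS = ("SELECT", "WITH", "PRAGMA", "EXPLAIN")


def is_read_only(sql):
    """Return True if the statement starts with an allowed read-only keyword."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    return bool(match) and match.group(1).upper() in READ_ONLY_KEYWORDS


def guarded_query(db_path, sql, allow_write=False):
    """Execute SQL, refusing mutating statements unless allow_write is set."""
    if not allow_write and not is_read_only(sql):
        raise PermissionError("mutating SQL requires allow_write=True")
    # mode=ro makes SQLite itself reject writes, as a second line of defence.
    mode = "rw" if allow_write else "ro"
    conn = sqlite3.connect(f"file:{db_path}?mode={mode}", uri=True)
    try:
        rows = conn.execute(sql).fetchall()
        if allow_write:
            conn.commit()
        return rows
    finally:
        conn.close()
```

The keyword check alone is not sufficient, since a statement can begin with WITH and still write; opening the database with `mode=ro` covers that case at the engine level.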

Statistical audit seal

Every dashboard render prints an audit seal — a compact checklist that confirms which weight column was selected, how many replicate columns were found, whether the bootstrap CI was effective, which IPCA target month was used, and which minimum-wage reference was applied. The seal includes a short hash of input + target + rows + households so two observers can verify they are looking at the same estimate.
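
The fingerprint idea can be sketched with a standard hash. Everything here is an assumption made for illustration: the README does not say which hash function, separator, or length the seal uses, nor whether "input" means the file path or its contents. The function name `audit_hash` is hypothetical.

```python
import hashlib


def audit_hash(input_name, target_month, rows, households, length=8):
    """Short fingerprint of the inputs that define an estimate."""
    payload = f"{input_name}|{target_month}|{rows}|{households}"
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return digest[:length]
```

Two observers who run the dashboard on the same input, target month, and row and household counts get the same short string; any difference in those four values changes it.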

Repository layout

scripts/      main CLI and data-processing logic
tests/        pytest suite (50+ tests)
skills/       project-local skills for LLM agents
docs/         technical specs, bilingual essay, HTML builder
analysis/     exploratory analysis artifacts
notebooks/    research notebooks
samples/      tiny fixtures / examples
data/         local scaffold, outputs, raw files, docs

Main code modules:

scripts/pnad.py — top-level CLI, dashboards, query, sync, pipelines
scripts/pnadc_cli.py — lower-level extraction and legacy tooling
scripts/npv_deflators.py — IPCA / deflator logic
scripts/layout_sas.py — SAS layout parsing
docs/build_index.py — HTML essay generator
docs/build_hero.py — static hero PNG generator

Development

python -m pytest -q                          # run the full suite
ruff check scripts/ docs/                    # lint
black --check scripts/ docs/                 # formatting
python scripts/pnad.py --help
python -m pytest -q tests/test_dashboard.py  # dashboard tests only

Zero-lint policy

This project keeps ruff check scripts/ docs/ and black --check scripts/ docs/ green at all times. Info-level warnings count. No suppressions. When adding code, first make sure ruff check --fix leaves zero issues and black reformats nothing.

Contributing

Good contributions include:

new survey integrations (PNADS, Censo Demográfico microdata, POF)
more robust statistical validation
better annual-income decomposition workflows
dashboard refinements and new visualizations in docs/index.html
documentation and examples
performance improvements for large raw files
decomposition of the single-file CLI into cleaner modules

Before opening a change:

run the relevant pytest subset
keep outputs reproducible (brasil pipeline-run --raw latest should produce the same files on two machines given the same raw input)
avoid unsafe SQL defaults
preserve weighted and uncertainty-aware paths

Project status

Production-useful for:

exploratory socioeconomic analysis
journalism and data-essay workflows
public-policy research
state-by-state income comparisons
LLM-assisted analysis of Brazilian official data

It is not an official IBGE or TSE tool. Users should still understand the underlying survey design before publishing strong claims. Start with the bundled interactive essay and the full article (PT) / (EN) for a guided, auditable reading of what the data says.

Community health

Written, compiled, and maintained by Leonardo Dias with support from Arvor.

Data © IBGE / PNADC · Code © MIT · Prose © CC-BY-4.0

Project details

Release history

This version

0.4.0

Apr 19, 2026

Download files

Source Distribution

brasil_cli-0.4.0.tar.gz (106.4 kB)

Uploaded Apr 19, 2026 Source

Built Distribution

brasil_cli-0.4.0-py3-none-any.whl (100.7 kB)

Uploaded Apr 19, 2026 Python 3

File details

Details for the file brasil_cli-0.4.0.tar.gz.

File metadata

Download URL: brasil_cli-0.4.0.tar.gz
Upload date: Apr 19, 2026
Size: 106.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for brasil_cli-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`c7ebab46fc7930fecfae22bbdcaaccd997c5fcb50e1e1abc326873b6014f78fc`
MD5	`6e2af2f6bbba2031f6b56ab73dd8cc73`
BLAKE2b-256	`46b5c46da10ae5f7d8d6a92436ed2d2485701e7129643be9c67566334bcb6854`

Provenance

The following attestation bundles were made for brasil_cli-0.4.0.tar.gz:

Publisher: release.yml on ArvorCo/PNAD

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: brasil_cli-0.4.0.tar.gz
- Subject digest: c7ebab46fc7930fecfae22bbdcaaccd997c5fcb50e1e1abc326873b6014f78fc
- Sigstore transparency entry: 1340713672
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: ArvorCo/PNAD@32e50185a2f8aea3bbf7b480c5987d6d0039a18d
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/ArvorCo
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@32e50185a2f8aea3bbf7b480c5987d6d0039a18d
- Trigger Event: push

File details

Details for the file brasil_cli-0.4.0-py3-none-any.whl.

File metadata

Download URL: brasil_cli-0.4.0-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 100.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for brasil_cli-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ae20880810af66fd74b3997a002502282675ebf89a0fc07d6790dfb0c14b861d`
MD5	`454ac116ccbd4bbd59b63a887913b583`
BLAKE2b-256	`b65eb0ea4cc2bf4ad5d932de7d73cb527e44cb0d72f15c89fd672e029c8a6898`

Provenance

The following attestation bundles were made for brasil_cli-0.4.0-py3-none-any.whl:

Publisher: release.yml on ArvorCo/PNAD

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: brasil_cli-0.4.0-py3-none-any.whl
- Subject digest: ae20880810af66fd74b3997a002502282675ebf89a0fc07d6790dfb0c14b861d
- Sigstore transparency entry: 1340713673
- Sigstore integration time: Apr 19, 2026
Source repository:
- Permalink: ArvorCo/PNAD@32e50185a2f8aea3bbf7b480c5987d6d0039a18d
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/ArvorCo
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@32e50185a2f8aea3bbf7b480c5987d6d0039a18d
- Trigger Event: push

brasil-cli 0.4.0
