Brazilian public data integration platform for scientific research
🇧🇷 Guaraci: Brazilian Public Data Integration Platform
A comprehensive toolkit for accessing, integrating, and analyzing Brazilian public data, with initial focus on public health and Neglected Tropical Diseases (NTDs).
🎯 Overview
Guaraci addresses a critical gap in Brazilian public health data accessibility. While databases exist for high-visibility diseases like COVID-19 and tuberculosis, Neglected Tropical Diseases (NTDs) remain underrepresented in computational epidemiology. Guaraci provides:
- Unified Access: Single interface to multiple Brazilian health databases (DATASUS, SINAN, SIH, SIM, SIA)
- Scientific Reproducibility: Standardized, versioned datasets with complete metadata
- Performance Optimized: Concurrent downloads and memory-efficient processing
- Multiple Interfaces: Both Python API and CLI for different use cases
🚀 Quick Start
Installation via pip
Choose the install that matches your needs:
- Core (no DATASUS or API support):
pip install guaraci
- DATASUS (PySUS: SINAN/SIM/SIH):
pip install "guaraci[datasus]"
- API (FastAPI/uvicorn/httpx):
pip install "guaraci[api]"
- Full (all extras):
pip install "guaraci[full]"
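To check which optional extras are usable in the current environment, a minimal sketch like the one below can probe for the packages named in the install options. The mapping from extras to importable module names (`pysus`, `fastapi`, `uvicorn`, `httpx`) is an assumption drawn from the extras descriptions; `available_extras` is a hypothetical helper, not part of Guaraci.

```python
from importlib.util import find_spec

# Assumed mapping of extras to the top-level modules they are expected to install.
EXTRA_MODULES = {
    "datasus": ["pysus"],
    "api": ["fastapi", "uvicorn", "httpx"],
}


def available_extras(extra_modules=EXTRA_MODULES):
    """Return {extra_name: bool}, True when every module of that extra is importable."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra}: {'installed' if ok else 'missing'}")
```

Running the script prints one line per extra, which is a quick way to see whether `pip install "guaraci[datasus]"` or `pip install "guaraci[api]"` is still needed.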
Docker Setup (Recommended)
# Clone the repository
git clone https://github.com/autoaihub/guaraci.git
cd guaraci
# Build the Docker image
docker build -t guaraci .
# Run Guaraci commands
docker run --rm -it -v "$(pwd):/app" guaraci python -m guaraci.cli.main --help
Download SINAN Data (Docker)
# Download data for specific diseases and years
docker run --rm -it -v "$(pwd):/app" guaraci \
python -m guaraci.cli.sinan_cli download 2020 2022 \
--diseases DENG ZIKA --format csv
# Download single disease for one year
docker run --rm -it -v "$(pwd):/app" guaraci \
python -m guaraci.cli.sinan_cli download 2020 2020 \
--diseases RAIV --format csv
Python API (Inside Docker)
# Interactive Python session
docker run --rm -it -v "$(pwd):/app" guaraci python
# Then in Python:
from guaraci.datasus import SinanDataSource
# Initialize SINAN data source
sinan = SinanDataSource()
# Download data
sinan.download(start_year=2020, end_year=2020, diseases=['RAIV'])
# Load as DataFrame
df = sinan.load_dataframe('RAIV')
# Apply filters
filtered = sinan.filter(df, uf='SP')
# Export results
sinan.export(filtered, format='csv', name='raiva_sp')
Available CLI Commands
# Show platform information
docker run --rm guaraci python -m guaraci.cli.main info
# Download SINAN data
docker run --rm -it -v "$(pwd):/app" guaraci \
python -m guaraci.cli.sinan_cli download 2020 2020 --diseases DENG --format csv
# Filter existing data (after download)
docker run --rm -it -v "$(pwd):/app" guaraci \
python -m guaraci.cli.sinan_cli filter DENG --uf SP --output filtered_dengue
# Generate summary statistics
docker run --rm -it -v "$(pwd):/app" guaraci \
python -m guaraci.cli.sinan_cli summary DENG --by UF --metric count
# Get information about available fields
docker run --rm -it -v "$(pwd):/app" guaraci \
python -m guaraci.cli.sinan_cli info DENG
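When these commands are scripted (for example, to fetch several year ranges in a loop), it can help to build the argument list in Python and pass it to `subprocess.run`. The sketch below only assembles the `download` invocation shown above; `sinan_download_command` is a hypothetical helper, and it omits `-it` on the assumption that the command runs non-interactively.

```python
import shlex
from pathlib import Path


def sinan_download_command(start_year, end_year, diseases, fmt="csv", workdir=None):
    """Build the argv list for the SINAN `download` command run through Docker."""
    # Mount the given directory (default: current directory) at /app, as in the examples.
    workdir = Path(workdir) if workdir is not None else Path.cwd()
    return [
        "docker", "run", "--rm", "-v", f"{workdir}:/app", "guaraci",
        "python", "-m", "guaraci.cli.sinan_cli", "download",
        str(start_year), str(end_year),
        "--diseases", *diseases,
        "--format", fmt,
    ]


if __name__ == "__main__":
    cmd = sinan_download_command(2020, 2022, ["DENG", "ZIKA"])
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Passing a list rather than a shell string avoids quoting problems when the working directory contains spaces.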
📊 Supported Data Sources
SINAN (Sistema de Informação de Agravos de Notificação)
- Focus: Notifiable diseases surveillance
- Coverage: 2007-present
- Diseases: All SINAN diseases with emphasis on NTDs
- Format: Parquet, CSV, SQLite
Supported Neglected Tropical Diseases
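- ANIM: Accidents involving venomous animals (Acidentes por Animais Peçonhentos)
- CHAG: Chagas disease (Doença de Chagas)
- CHIK: Chikungunya
- DENG: Dengue
- ESQU: Schistosomiasis (Esquistossomose)
- HANS: Leprosy (Hanseníase)
- LEIV: Visceral leishmaniasis (Leishmaniose Visceral)
- LTAN: Tegumentary leishmaniasis (Leishmaniose Tegumentar)
- RAIV: Human rabies (Raiva Humana)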
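For scripts that accept disease codes from users, it can be handy to separate the NTD codes listed above from other SINAN codes before calling the downloader. The following is a sketch only: `NTD_CODES` copies the list above, and `partition_codes` is a hypothetical helper. Codes outside this list (such as `ZIKA`, used in the Quick Start) are not rejected, since SINAN covers more diseases than the NTD subset.

```python
# NTD codes and names as listed in this README.
NTD_CODES = {
    "ANIM": "Acidentes por Animais Peçonhentos",
    "CHAG": "Doença de Chagas",
    "CHIK": "Chikungunya",
    "DENG": "Dengue",
    "ESQU": "Esquistossomose",
    "HANS": "Hanseníase",
    "LEIV": "Leishmaniose Visceral",
    "LTAN": "Leishmaniose Tegumentar",
    "RAIV": "Raiva Humana",
}


def partition_codes(codes):
    """Split codes into (NTD codes, other codes), normalizing to upper case."""
    normalized = [code.strip().upper() for code in codes]
    ntd = [code for code in normalized if code in NTD_CODES]
    other = [code for code in normalized if code not in NTD_CODES]
    return ntd, other


if __name__ == "__main__":
    ntd, other = partition_codes(["deng", "ZIKA", "raiv"])
    print("NTD:", ntd)
    print("Other:", other)
```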
🛠 Development Setup
Docker-Based Development (Recommended)
# Clone repository
git clone https://github.com/autoaihub/guaraci.git
cd guaraci
# Build the Docker image
docker build -t guaraci .
# Run tests
docker run --rm guaraci python -m pytest tests/ -v
# Interactive development shell
docker run --rm -it -v "$(pwd):/app" guaraci bash
# Run specific commands
docker run --rm -it -v "$(pwd):/app" guaraci python -c "import guaraci; print(guaraci.__version__)"
Windows Users
# Use full paths for volume mounting
docker run --rm -it -v "C:\path\to\guaraci:/app" guaraci python -m guaraci.cli.main info
# Example with actual path (single line)
docker run --rm -it -v "C:\Users\username\Documents\guaraci:/app" guaraci python -m guaraci.cli.sinan_cli download 2020 2020 --diseases RAIV --format csv
# Multi-line with PowerShell backtick continuation
docker run --rm -it -v "C:\Users\username\Documents\guaraci:/app" guaraci `
python -m guaraci.cli.sinan_cli download 2020 2020 --diseases RAIV --format csv
📖 Documentation
Configuration
Guaraci can be configured using environment variables in Docker:
# Run with custom configuration
docker run --rm -it -v "$(pwd):/app" \
-e GUARACI_DATA_ROOT=/app/data \
-e GUARACI_LOG_LEVEL=DEBUG \
-e GUARACI_MAX_CONCURRENT_DOWNLOADS=10 \
guaraci python -m guaraci.cli.sinan_cli download 2020 2020 --diseases DENG
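To illustrate how such variables are typically consumed, here is a minimal standard-library sketch that reads the three `GUARACI_*` variables shown above. It is not Guaraci's actual configuration code: the `Settings` class, the `load_settings` function, and the default values (`data`, `INFO`, `4`) are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    data_root: Path
    log_level: str
    max_concurrent_downloads: int


def load_settings(env=None):
    """Read the GUARACI_* variables, falling back to placeholder defaults."""
    env = os.environ if env is None else env
    # The default of 4 is an assumption for this sketch, not a documented value.
    max_downloads = int(env.get("GUARACI_MAX_CONCURRENT_DOWNLOADS", "4"))
    if max_downloads < 1:
        raise ValueError("GUARACI_MAX_CONCURRENT_DOWNLOADS must be >= 1")
    return Settings(
        data_root=Path(env.get("GUARACI_DATA_ROOT", "data")),
        log_level=env.get("GUARACI_LOG_LEVEL", "INFO").upper(),
        max_concurrent_downloads=max_downloads,
    )


if __name__ == "__main__":
    print(load_settings())
```

Accepting an explicit mapping in `load_settings` makes the parsing easy to test without touching the real process environment.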
Advanced Usage (Python API in Docker)
# Start interactive Python session
docker run --rm -it -v "$(pwd):/app" guaraci python
# Then in Python:
from guaraci.datasus import SinanDataSource
from guaraci.core.config import config
# View current configuration
print(f"Data root: {config.data_root}")
print(f"Max downloads: {config.max_concurrent_downloads}")
# Initialize the SINAN data source
sinan = SinanDataSource()
# Download with specific parameters
sinan.download(2020, 2021, diseases=['DENG'])
# Load and process data
df = sinan.load_dataframe('DENG')
# Advanced filtering
filtered = sinan.filter(
df,
uf='SP',
municipio='São Paulo',
ano=2021
)
# Generate summary statistics
summary = sinan.summary(filtered, by='CS_SEXO', metric='count')
print(summary)
# Export results
sinan.export(filtered, format='csv', name='dengue_sp_2021')
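As a rough mental model of the filter and summary steps above, the sketch below reproduces them in plain Python on a handful of made-up rows. It assumes that `filter` keeps rows matching every keyword criterion and that `summary(..., metric='count')` counts rows per group; the real implementation may behave differently, and `filter_rows` and `count_by` are hypothetical names.

```python
from collections import Counter

# Made-up rows for illustration only; real SINAN records have many more columns.
rows = [
    {"uf": "SP", "ano": 2021, "CS_SEXO": "F"},
    {"uf": "SP", "ano": 2021, "CS_SEXO": "M"},
    {"uf": "SP", "ano": 2020, "CS_SEXO": "F"},
    {"uf": "RJ", "ano": 2021, "CS_SEXO": "F"},
    {"uf": "SP", "ano": 2021, "CS_SEXO": "F"},
]


def filter_rows(rows, **criteria):
    """Keep rows whose columns equal every keyword criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def count_by(rows, column):
    """Count rows per distinct value of `column`."""
    return dict(Counter(r[column] for r in rows))


filtered = filter_rows(rows, uf="SP", ano=2021)
summary = count_by(filtered, "CS_SEXO")
print(summary)  # {'F': 2, 'M': 1}
```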
🧪 Testing
All testing is done within Docker containers:
# Run all tests
docker run --rm guaraci python -m pytest tests/ -v
# Run with coverage
docker run --rm guaraci python -m pytest tests/ --cov=guaraci --cov-report=term-missing
# Run specific test file
docker run --rm guaraci python -m pytest tests/test_utils.py -v
# Test installation
docker run --rm guaraci python test_install.py
🤝 Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
Development Workflow
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for new functionality
- Run the test suite (pytest)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
👥 Authors & Contributors
- Luis Felipe Vogel Lopes – Lead Developer (v0.2 and ongoing) – vogel@usp.br
  Responsible for the full modernization of Guaraci, including modular architecture, Docker-first workflow, Pydantic configuration system, enhanced CLI, and full testing suite.
- Pedro Guilherme dos Reis Teixeira – Original Author (v0.1) – pedro.guilherme2305@usp.br
  Created the initial Guaraci prototype and early SINAN integration.
- Prof. Robson Parmezan Bonidia – Scientific Advisor – ICMC/USP
- Prof. André Carlos Ponce de Leon Ferreira de Carvalho – Scientific Advisor – ICMC/USP
🙏 Acknowledgments
- PySUS - Foundation for DATASUS integration
- Polars - High-performance DataFrame library
- ICMC/USP - Institutional support
- Brazilian Ministry of Health - Data provision through DATASUS
📚 Citation
If you use Guaraci in your research, please cite:
@software{guaraci2025,
title = {Guaraci: Brazilian Public Data Integration Platform},
author = {Lopes, Luis Felipe Vogel and Teixeira, Pedro Guilherme dos Reis and Bonidia, Robson Parmezan and Carvalho, André Carlos Ponce de Leon Ferreira de},
year = {2025},
version = {0.2},
url = {https://github.com/autoaihub/guaraci}
}
🔗 Links
- Documentation (Coming Soon)
- PyPI Package (Coming Soon)
- Issue Tracker
- DATASUS
- PySUS Documentation
📝 Changelog
See CHANGELOG.md for version history and release notes.