Medallion ETL

A modular library for building data pipelines with the medallion architecture (Bronze-Silver-Gold).

Features

  • Medallion architecture (Bronze-Silver-Gold) for data processing
  • Simple interface for defining new pipelines
  • Reusable functions for each layer of the process
  • Clear modular separation between extraction, validation, and loading
  • SQLAlchemy compatibility for database persistence
  • Prefect integration for flow orchestration
  • Data validation with Pydantic
  • Efficient processing with Polars
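
To make the three layers concrete, here is a minimal sketch of the Bronze-Silver-Gold idea using only the Python standard library. It is not the library's implementation: the function names (bronze_extract, silver_validate, gold_aggregate) and the inline sample data are hypothetical, chosen only to show how raw ingestion, validation, and aggregation stay separate.

```python
import csv
import io
from collections import Counter


def bronze_extract(raw_csv: str) -> list[dict[str, str]]:
    """Bronze: ingest raw rows as-is; every value is still a string."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def silver_validate(rows: list[dict[str, str]]) -> list[dict]:
    """Silver: coerce types and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            clean.append(
                {"id": int(row["id"]), "name": row["name"], "age": int(row["age"])}
            )
        except (KeyError, ValueError, TypeError):
            # Hypothetical policy: silently skip invalid rows.
            continue
    return clean


def gold_aggregate(rows: list[dict]) -> dict[int, int]:
    """Gold: count how many records share each age."""
    return dict(Counter(row["age"] for row in rows))


# Made-up sample data; the third row has an invalid age.
RAW = "id,name,age\n1,Ana,30\n2,Luis,30\n3,Eva,not-a-number\n4,Raul,41\n"

summary = gold_aggregate(silver_validate(bronze_extract(RAW)))
print(summary)  # {30: 2, 41: 1}
```

Each stage only depends on the output of the previous one, which is the property that lets a layer be swapped (for example, a different extractor) without touching the others.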

Requirements

  • Python 3.11+
  • polars>=1.30
  • pydantic>=2.7
  • sqlalchemy>=2.0
  • prefect>=2.0
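
The minimums above can be checked before installing. The following is a rough sketch that uses only the standard library; the helper names (as_tuple, meets_minimum, check_environment) are hypothetical, and the version comparison is deliberately simplified (it ignores pre-release suffixes), so treat it as a quick sanity check rather than a full version resolver.

```python
import sys
from importlib import metadata

# Minimum versions copied from the requirements list above.
MINIMUMS = {"polars": "1.30", "pydantic": "2.7", "sqlalchemy": "2.0", "prefect": "2.0"}


def as_tuple(version: str) -> tuple[int, ...]:
    """Turn '2.7.1' into (2, 7, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric version prefixes, padding the shorter one with zeros."""
    a, b = as_tuple(installed), as_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> dict[str, str]:
    """Report 'ok', 'missing', or 'too old' for Python and each dependency."""
    report = {"python": "ok" if sys.version_info >= (3, 11) else "too old"}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```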

Installation

pip install medallion-etl

Or from source:

git clone https://github.com/usuario/medallion-etl.git
cd medallion-etl
pip install -e .
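
After either install method, a quick way to confirm that Python can find the package is to look up its import name, medallion_etl (with an underscore, as in the usage examples). This is a small sketch using the standard library; is_importable is a hypothetical helper, not part of the library.

```python
from importlib import util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the top-level module (without importing it)."""
    return util.find_spec(module_name) is not None


print("medallion_etl importable:", is_importable("medallion_etl"))
```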

Library structure

medallion_etl/
├── bronze/            # Raw data ingestion layer
├── silver/            # Validation and cleaning layer
├── gold/              # Transformation and aggregation layer
├── core/              # Core library components
├── pipelines/         # Complete flow definitions
├── schemas/           # Pydantic models for validation
├── connectors/        # Connectors for different sources/destinations
├── utils/             # General utilities
├── config/            # Configuration
└── templates/         # Templates for new pipelines

Basic usage

Create a simple pipeline

from medallion_etl.core import MedallionPipeline
from medallion_etl.bronze import CSVExtractor
from medallion_etl.silver import SchemaValidator
from medallion_etl.gold import Aggregator
from medallion_etl.schemas import BaseSchema

# Define the data schema
class UserSchema(BaseSchema):
    id: int
    name: str
    age: int
    email: str

# Create the pipeline
pipeline = MedallionPipeline(name="UserPipeline")

# Add one task per layer
pipeline.add_bronze_task(CSVExtractor(name="UserExtractor"))
pipeline.add_silver_task(SchemaValidator(schema_model=UserSchema))
pipeline.add_gold_task(Aggregator(group_by=["age"], aggregations={"id": "count"}))

# Run the pipeline
result = pipeline.run("data/users.csv")
print(result.metadata)
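
The example reads data/users.csv, which is not shipped with the package. A minimal sketch for generating a matching input file follows. It assumes the extractor expects a header row whose column names match the UserSchema fields (id, name, age, email); the rows themselves are made up, and write_sample_csv is a hypothetical helper.

```python
import csv
from pathlib import Path

# Column names mirror the UserSchema fields in the example above.
FIELDS = ["id", "name", "age", "email"]

# Made-up sample rows, for illustration only.
SAMPLE_ROWS = [
    {"id": 1, "name": "Ana", "age": 30, "email": "ana@example.com"},
    {"id": 2, "name": "Luis", "age": 30, "email": "luis@example.com"},
    {"id": 3, "name": "Eva", "age": 41, "email": "eva@example.com"},
]


def write_sample_csv(path: str) -> Path:
    """Write SAMPLE_ROWS to a CSV file, creating parent folders if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return target


if __name__ == "__main__":
    print(write_sample_csv("data/users.csv"))
```

Two of the sample rows share the same age so that grouping by age and counting id produces more than one record in at least one group.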

Use with Prefect

from medallion_etl.core import MedallionPipeline
from medallion_etl.bronze import CSVExtractor

# Create the pipeline
pipeline = MedallionPipeline(name="SimplePipeline")
pipeline.add_bronze_task(CSVExtractor())

# Convert it to a Prefect flow
flow = pipeline.as_prefect_flow()

# Run the flow
flow("data/sample.csv")

Examples

See the examples/ folder for complete pipeline examples:

  • weather_pipeline.py: Pipeline for processing weather data
  • sales_etl_pipeline.py: ETL pipeline for sales data

Contributing

Contributions are welcome! Feel free to submit a pull request.

License

MIT
