
A minimal, functional Python ETL library for reading, validating, and transforming data using YAML schemas


Aptoro


Aptoro is a Xavante word for "preparing the arrows for hunting".

It is a minimal, functional Python ETL library for reading, validating, and transforming data using YAML schemas. Designed for simplicity and correctness, it bridges the gap between raw data files (CSV, JSON) and typed, validated Python objects.

Features

  • Schema-First: Define your data model in simple, readable YAML.
  • Strict Validation: Ensures data quality with type checks, constraints, and range validation.
  • Rich Types: Built-in support for datetime (ISO 8601), url, file, dict, nested objects, and standard primitives.
  • Multi-Format: CSV, JSON, YAML, TOML, and Markdown front-matter (Jekyll/Hugo/Obsidian style).
  • Glob Patterns: Read multiple files at once with read("data/*.md").
  • Functional API: Pure functions and immutable dataclasses make pipelines predictable.
  • Zero Boilerplate: No complex class definitions—just load your schema and go.

Installation

pip install aptoro

CLI Usage

Aptoro provides a command-line interface for validating data files directly.

# Validate a CSV file against a schema
aptoro validate data.csv --schema schema.yaml

# Explicitly specify format
aptoro validate data.txt --schema schema.yaml --format json

Quick Start

from aptoro import load, load_schema, read, validate, to_json

# All-in-one: read + validate
entries = load(source="data.csv", schema="schema.yaml")

# Or as a step-by-step pipeline:
schema = load_schema("schema.yaml")
data = read("data.csv")
entries = validate(data, schema)

# Export to JSON
json_str = to_json(entries)

# Export with embedded metadata (self-describing files)
json_meta = to_json(entries, schema=schema, include_meta=True)

Documentation

For full details on the schema language, advanced validation, and API reference, see the Documentation.

Schema Language

Define your data schema in YAML:

name: lexicon_entry
description: Dictionary entries

fields:
  id: str
  lemma: str
  pos: str[noun|verb|adj|adv]     # Constrained values (Enum)
  definition: str
  translation: str?               # Optional field
  examples: list[str]?            # Optional list
  frequency: int = 0              # Default value
  created_at: datetime?           # Optional ISO 8601 datetime
  source_url: url?                # Optional URL
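
To make the schema above concrete, here is a rough hand-written Python equivalent of what it describes. This is only an illustrative sketch: the class name `LexiconEntry`, the use of `Literal` for the enum, and the tuple used for `examples` are assumptions made for this example, not the objects Aptoro actually generates.

```python
from __future__ import annotations

from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime
from typing import Literal, Optional


# Hypothetical hand-written mirror of the lexicon_entry schema.
# A plain dataclass does NOT enforce the Literal or URL constraints at
# runtime; that checking is what a validation step is for.
@dataclass(frozen=True)
class LexiconEntry:
    id: str
    lemma: str
    pos: Literal["noun", "verb", "adj", "adv"]   # str[noun|verb|adj|adv]
    definition: str
    translation: Optional[str] = None            # str?
    examples: Optional[tuple[str, ...]] = None   # list[str]? (tuple keeps it immutable)
    frequency: int = 0                           # int = 0
    created_at: Optional[datetime] = None        # datetime?
    source_url: Optional[str] = None             # url?


# Made-up sample record: only the four required fields are supplied.
entry = LexiconEntry(
    id="e1",
    lemma="arrow",
    pos="noun",
    definition="A projectile shot from a bow.",
)
print(entry.frequency, entry.translation)
```

Optional fields fall back to `None` and `frequency` falls back to its default, which is the behavior the `?` and `= 0` markers in the schema express. Assigning to a field of a frozen dataclass raises `FrozenInstanceError`.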

Type Syntax

  • Basic types: str, int, float, bool
  • Specialized types: url, file, datetime
  • Optional: str?, int?, url?, datetime?
  • Default value: str = "default", int = 0, list[str] = [], dict[str, int] = {}
  • Constrained: str[a|b|c]
  • Ranges: int[0..120], float[0.0..1.0]
  • Lists: list[str], list[int]
  • Dicts: dict, dict[str, int], dict[str]
  • Nested objects: type: object with a fields block
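
As a way to read the notation above, the following sketch parses a few of these type strings into a small structure and checks values against it. It is a standalone illustration, not Aptoro's parser: `FieldSpec`, `parse_field`, and `accepts` are hypothetical names, range bounds are assumed to be inclusive, and default values are kept as raw text.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    base: str                        # e.g. "str", "int", "list[str]"
    optional: bool = False           # trailing "?"
    choices: Optional[tuple] = None  # from str[a|b|c]
    bounds: Optional[tuple] = None   # from int[0..120]
    default: Optional[str] = None    # raw text after "=", left unparsed


def parse_field(spec: str) -> FieldSpec:
    # Split off a default value: everything after the first "=".
    type_part, sep, default = spec.partition("=")
    type_part = type_part.strip()
    default = default.strip() if sep else None

    # A trailing "?" marks the field as optional.
    optional = type_part.endswith("?")
    if optional:
        type_part = type_part[:-1]

    # Brackets on a primitive hold either a range (a..b) or choices (a|b|c).
    choices = bounds = None
    match = re.fullmatch(r"(str|int|float)\[(.+)\]", type_part)
    if match:
        base, inner = match.group(1), match.group(2)
        if ".." in inner:
            low, high = inner.split("..", 1)
            cast = float if base == "float" else int
            bounds = (cast(low), cast(high))
        else:
            choices = tuple(inner.split("|"))
        type_part = base
    return FieldSpec(type_part, optional, choices, bounds, default)


def accepts(spec: FieldSpec, value) -> bool:
    # Assumption: None is allowed only for optional fields; bounds are inclusive.
    if value is None:
        return spec.optional
    if spec.choices is not None:
        return value in spec.choices
    if spec.bounds is not None:
        low, high = spec.bounds
        return low <= value <= high
    return True


print(parse_field("str[noun|verb|adj|adv]"))
print(parse_field("int[0..120]"))
print(parse_field("list[str]?"))
```

Container types such as `list[str]` and `dict[str, int]` are left as opaque base names here; a fuller parser would recurse into their element types.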

See DOCS.md for full syntax, including inheritance, nested structures, and front-matter reading.

Supported Formats

  • CSV (auto-detects types)
  • JSON
  • YAML
  • TOML
  • Markdown front-matter (.md files with YAML front-matter)
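
CSV is the only format in this list whose cells arrive as plain strings, which is why type auto-detection matters there. The sketch below shows one simple way such detection could work using only the standard library. It is an assumption-laden illustration, not Aptoro's actual inference rules: `infer` and `read_csv_typed` are hypothetical helpers, and treating empty cells as `None` is a choice made for this example.

```python
import csv
import io


def infer(cell):
    """Convert a CSV cell to bool, int, or float when it parses cleanly; else keep str."""
    text = (cell or "").strip()
    if text == "":
        return None  # assumption: empty cells become None
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    # Try the narrower numeric type first so "12" stays an int.
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def read_csv_typed(text: str) -> list[dict]:
    # DictReader uses the first row as field names.
    reader = csv.DictReader(io.StringIO(text))
    return [{key: infer(value) for key, value in row.items()} for row in reader]


# Made-up sample data for illustration.
sample = "id,lemma,frequency,score,active\ne1,arrow,12,0.5,true\n"
rows = read_csv_typed(sample)
print(rows)
```

Inference of this kind is only a first guess (for instance, Python's `float` also accepts strings such as "nan" and "1e5"), so validating the result against a schema remains the authoritative check.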

License

GNU General Public License v3 (GPLv3)

