Upgrade your Pandas ETL process.

Abraxos

Abraxos is a lightweight Python toolkit for robust, row-aware data processing using Pandas and Pydantic. It helps you build resilient ETL pipelines that gracefully handle errors at the row level.

✨ Why Abraxos?

Many data pipelines abort an entire batch when a single row is bad. Abraxos changes that:

🛡️ Fault-tolerant by design - isolate and recover from row-level errors
🔍 Full error visibility - see exactly which rows failed and why
🔄 Automatic retry logic - recursive splitting to isolate problem rows
📊 Production-ready - 118 tests, 92% coverage, type-safe

🚀 Features

📄 CSV Ingestion with Bad Line Recovery
Read CSVs in full or in chunks, automatically capturing malformed lines separately.
🔁 Transform DataFrames Resiliently
Apply transformation functions and automatically isolate rows that fail.
🧪 Pydantic-Based Row Validation
Validate each row using Pydantic models, separating valid and invalid records.
🛢️ SQL Insertion with Error Splitting
Insert DataFrames into SQL databases with automatic retry and chunking for failed rows.

📦 Installation

pip install abraxos

With optional dependencies:

# For SQL support
pip install abraxos[sql]

# For Pydantic validation
pip install abraxos[validate]

# For development
pip install abraxos[dev]

# Everything
pip install abraxos[all]

Requirements:

Python 3.10+
pandas >= 1.5.0
numpy >= 1.23.0
Optional: sqlalchemy >= 2.0.0
Optional: pydantic >= 2.0.0
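
Before running a pipeline, it can help to confirm that the installed packages meet these minimums. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of Abraxos, and the minimum versions are copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list above.
MINIMUMS = {
    "pandas": "1.5.0",
    "numpy": "1.23.0",
    "sqlalchemy": "2.0.0",  # optional
    "pydantic": "2.0.0",    # optional
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent.

    Hypothetical helper for illustration; not part of Abraxos.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(MINIMUMS).items():
    print(f"{name}: installed={installed}, required>={MINIMUMS[name]}")
```

The sketch only reports versions; comparing them against the minimums is left to the reader.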

📖 Documentation

Full documentation is available at: https://abraxos.readthedocs.io

🎯 Quick Start

Here are real, tested examples showing Abraxos in action:

🔍 Example 1: Read CSVs with Error Recovery

Abraxos captures malformed lines instead of crashing your pipeline:

from abraxos import read_csv

# Read a CSV that has some malformed lines
result = read_csv("data.csv")

print("Bad lines:", result.bad_lines)
print("\nClean data:")
print(result.dataframe)

Output:

Bad lines: [['TOO', 'MANY', 'COLUMNS', 'HERE']]

Clean data:
   id    name  age
0   1     Joe   28
1   2   Alice   35
2   3  Marcus   40
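
For comparison, a similar effect can be approximated with plain pandas. This is a sketch under assumptions, not Abraxos's implementation: it relies on the `on_bad_lines` callback that pandas accepts with the Python parsing engine, and the helper name `read_csv_capturing_bad_lines` and the inline sample data are made up for illustration.

```python
import io

import pandas as pd


def read_csv_capturing_bad_lines(source, **kwargs):
    """Return (bad_lines, dataframe) using pandas' on_bad_lines callback.

    Hypothetical helper for illustration; not part of Abraxos.
    """
    bad_lines = []

    def remember(fields):
        # pandas passes the offending line as a list of strings.
        bad_lines.append(fields)
        return None  # returning None tells pandas to skip the line

    # A callable on_bad_lines requires the python engine (pandas >= 1.4).
    df = pd.read_csv(source, engine="python", on_bad_lines=remember, **kwargs)
    return bad_lines, df


# Sample data invented for this sketch: one line has too many fields.
raw = "id,name,age\n1,Joe,28\n2,Alice,35\nTOO,MANY,COLUMNS,HERE\n3,Marcus,40\n"
bad, clean = read_csv_capturing_bad_lines(io.StringIO(raw))
print("Bad lines:", bad)
print(clean)
```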

🧼 Example 2: Transform with Fault Isolation

Apply transformations that automatically isolate problematic rows:

import pandas as pd
from abraxos import transform

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['  Joe  ', '  Alice  ', '  Marcus  '],
    'age': [28, 35, 40]
})

def clean_data(df):
    df = df.copy()
    df["name"] = df["name"].str.strip().str.lower()
    return df

result = transform(df, clean_data)
print("Errors:", result.errors)
print("\nSuccess DataFrame:")
print(result.success_df)

Output:

Errors: []

Success DataFrame:
   id    name  age
0   1     joe   28
1   2   alice   35
2   3  marcus   40

⚡ Example 3: Automatic Error Isolation

When transformation fails on some rows, Abraxos automatically isolates them:

import pandas as pd
from abraxos import transform

df = pd.DataFrame({'value': [1, 2, 0, 3, 4]})

def divide_by_value(df):
    df = df.copy()
    if (df['value'] == 0).any():
        raise ValueError('Cannot divide by zero')
    df['result'] = 100 / df['value']
    return df

result = transform(df, divide_by_value)

print(f"Errors encountered: {len(result.errors)}")
print(f"\nSuccessful rows ({len(result.success_df)}):")
print(result.success_df)
print(f"\nFailed rows ({len(result.errored_df)}):")
print(result.errored_df)

Output:

Errors encountered: 1

Successful rows (4):
   value      result
0      1  100.000000
1      2   50.000000
3      3   33.333333
4      4   25.000000

Failed rows (1):
   value
2      0

Notice how Abraxos automatically isolated the problematic row (value=0) and processed the rest!
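
The recursive splitting idea behind this behaviour can be sketched in plain pandas. This is an illustration under assumptions, not the code Abraxos runs: the helper names `isolate_failures` and `_concat` are hypothetical, and `parts` must be at least 2 for the recursion to shrink the input.

```python
import pandas as pd


def _concat(frames, template):
    """Concatenate non-empty frames, or return an empty frame shaped like template."""
    frames = [frame for frame in frames if not frame.empty]
    return pd.concat(frames) if frames else template.iloc[0:0]


def isolate_failures(df, func, parts=2):
    """Apply func to df; on failure, split and retry until bad rows are isolated.

    Returns (errors, errored_df, success_df). Hypothetical helper, not Abraxos code.
    """
    try:
        return [], df.iloc[0:0], func(df)
    except Exception as exc:
        if len(df) <= 1:
            # A single failing row cannot be split further.
            return [exc], df, df.iloc[0:0]

    errors, errored, succeeded = [], [], []
    step = -(-len(df) // parts)  # ceiling division
    for start in range(0, len(df), step):
        errs, bad, good = isolate_failures(df.iloc[start:start + step], func, parts)
        errors.extend(errs)
        errored.append(bad)
        succeeded.append(good)
    return errors, _concat(errored, df), _concat(succeeded, df)


def divide_by_value(frame):
    frame = frame.copy()
    if (frame["value"] == 0).any():
        raise ValueError("Cannot divide by zero")
    frame["result"] = 100 / frame["value"]
    return frame


df = pd.DataFrame({"value": [1, 2, 0, 3, 4]})
errors, errored_df, success_df = isolate_failures(df, divide_by_value)
print(len(errors), errored_df.index.tolist(), success_df.index.tolist())
```

Each failure halves the search space, so one bad row in a large frame costs roughly a logarithmic number of retries rather than one call per row.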

✅ Example 4: Validate with Pydantic

Validate each row and separate valid from invalid data:

import pandas as pd
from abraxos import validate
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

df = pd.DataFrame({
    'name': ['Joe', 'Alice', 'Marcus'],
    'age': [28, 'invalid', 40]
})

result = validate(df, Person)

print("Valid rows:")
print(result.success_df)
print(f"\nNumber of validation errors: {len(result.errors)}")
print("\nInvalid rows:")
print(result.errored_df)

Output:

Valid rows:
     name  age
0     Joe   28
2  Marcus   40

Number of validation errors: 1

Invalid rows:
    name      age
1  Alice  invalid
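
Row-level validation of this kind can be sketched by hand with Pydantic v2. The helper `validate_rows` below is hypothetical and only approximates what a row validator might do; it assumes the DataFrame has a unique index and uses `model_validate`, which is part of the Pydantic v2 API.

```python
import pandas as pd
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int


def validate_rows(df, model):
    """Validate each row against model; return (errors, errored_df, success_df).

    Hypothetical helper for illustration; assumes df has a unique index.
    """
    errors, bad_index, good_index = [], [], []
    for index, row in zip(df.index, df.to_dict("records")):
        try:
            model.model_validate(row)
            good_index.append(index)
        except ValidationError as exc:
            errors.append(exc)
            bad_index.append(index)
    return errors, df.loc[bad_index], df.loc[good_index]


df = pd.DataFrame({
    "name": ["Joe", "Alice", "Marcus"],
    "age": [28, "invalid", 40],
})
errors, errored_df, success_df = validate_rows(df, Person)
print(success_df)
print(errored_df)
```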

🗃️ Example 5: SQL Insertion with Retry Logic

Insert data into SQL with automatic error handling:

import pandas as pd
from abraxos import to_sql
from sqlalchemy import create_engine

engine = create_engine("sqlite:///example.db")

df = pd.DataFrame({
    'name': ['Joe', 'Alice', 'Marcus'],
    'age': [28, 35, 40]
})

result = to_sql(df, "people", engine)

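print(f"Successful inserts: {result.success_df.shape[0]}")
print(f"Failed rows: {result.errored_df.shape[0]}")

# Read the table back to confirm what was stored
print("\nData in database:")
print(pd.read_sql("SELECT * FROM people", engine))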

Output:

Successful inserts: 3
Failed rows: 0

Data in database:
     name  age
0     Joe   28
1   Alice   35
2  Marcus   40
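
To show what retry-by-splitting looks like when an insert does fail, here is a sketch using only pandas and the standard-library `sqlite3` module. It is not Abraxos's implementation: the helper `insert_with_split`, the in-memory table, and its `CHECK` constraint are assumptions made for the example.

```python
import sqlite3

import pandas as pd


def insert_with_split(df, table, con):
    """Append df to table; on failure, halve the frame and retry each half.

    Returns (errors, errored_df, success_df). Hypothetical helper, not Abraxos code.
    """
    if df.empty:
        return [], df, df
    try:
        df.to_sql(table, con, if_exists="append", index=False)
        return [], df.iloc[0:0], df
    except Exception as exc:
        con.rollback()  # undo any partially written batch before retrying
        if len(df) == 1:
            return [exc], df, df.iloc[0:0]
    middle = len(df) // 2
    left = insert_with_split(df.iloc[:middle], table, con)
    right = insert_with_split(df.iloc[middle:], table, con)
    return (
        left[0] + right[0],
        pd.concat([left[1], right[1]]),
        pd.concat([left[2], right[2]]),
    )


con = sqlite3.connect(":memory:")
# The constraint is invented here so that one row is rejected.
con.execute("CREATE TABLE people (name TEXT NOT NULL, age INTEGER CHECK (age >= 0))")
con.commit()

df = pd.DataFrame({"name": ["Joe", "Alice", "Marcus"], "age": [28, -1, 40]})
errors, errored_df, success_df = insert_with_split(df, "people", con)
stored = con.execute("SELECT name, age FROM people ORDER BY name").fetchall()
print(stored)
```

Only the row that violates the constraint ends up in `errored_df`; the other rows are committed in smaller batches.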

📚 Example 6: Process Large Files in Chunks

Read and process large CSV files efficiently:

from abraxos import read_csv

# Read in chunks of 1000 rows
for chunk_result in read_csv("large_file.csv", chunksize=1000):
    print(f"Processing chunk with {len(chunk_result.dataframe)} rows")
    print(f"Bad lines in this chunk: {len(chunk_result.bad_lines)}")
    
    # Process the chunk
    # ... your processing logic here

Output from a sample run (a five-row file read with chunksize=2, printing each chunk's dataframe):

Reading in chunks of 2 rows:

Chunk 1:
   id  value
0   1     10
1   2     20

Chunk 2:
   id  value
2   3     30
3   4     40

Chunk 3:
   id  value
4   5     50
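
The chunking pattern itself can be tried with plain pandas. The sketch below is not Abraxos code; it uses inline sample data invented for the example and shows that only one chunk is held at a time while the row index keeps counting across chunks.

```python
import io

import pandas as pd

# Inline sample data invented for this sketch (five rows).
raw = "id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n"

chunk_sizes = []
running_total = 0
last_index = []

# With chunksize set, pandas returns an iterator of DataFrames.
for chunk in pd.read_csv(io.StringIO(raw), chunksize=2):
    chunk_sizes.append(len(chunk))
    running_total += int(chunk["value"].sum())  # aggregate without keeping chunks
    last_index = chunk.index.tolist()

print(chunk_sizes, running_total, last_index)
```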

🔄 Complete ETL Pipeline Example

Here's a complete example combining multiple features:

import pandas as pd
from abraxos import read_csv, transform, validate, to_sql
from pydantic import BaseModel
from sqlalchemy import create_engine

# 1. Extract: Read CSV with error recovery
csv_result = read_csv("messy_data.csv")
print(f"Captured {len(csv_result.bad_lines)} bad lines")

# 2. Transform: Clean the data
def clean_data(df):
    df = df.copy()
    df['name'] = df['name'].str.strip().str.title()
    df['age'] = pd.to_numeric(df['age'], errors='coerce')
    return df.dropna()

transform_result = transform(csv_result.dataframe, clean_data)
print(f"Transformed {len(transform_result.success_df)} rows successfully")

# 3. Validate: Ensure data quality
class Person(BaseModel):
    name: str
    age: int

validate_result = validate(transform_result.success_df, Person)
print(f"Validated {len(validate_result.success_df)} rows")
print(f"Validation failed for {len(validate_result.errored_df)} rows")

# 4. Load: Insert into database
engine = create_engine("sqlite:///clean_data.db")
load_result = to_sql(validate_result.success_df, "people", engine)
print(f"Loaded {len(load_result.success_df)} rows to database")

# Error reports: each attribute below holds what one stage rejected,
# ready to be inspected or written to disk
csv_result.bad_lines  # Malformed CSV lines
transform_result.errored_df  # Rows that failed transformation
validate_result.errored_df  # Rows that failed validation
load_result.errored_df  # Rows that failed to insert
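
One way to persist the rejected rows from each stage is to write every non-empty error frame to its own CSV file. This is a sketch only: `save_error_reports` is a hypothetical helper, the file naming is an assumption, and the frames used in the demo are stand-ins rather than real pipeline results.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_error_reports(frames, directory):
    """Write each non-empty error frame to <directory>/<stage>_errors.csv.

    Hypothetical helper for illustration; not part of Abraxos.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for stage, frame in frames.items():
        if frame.empty:
            continue  # nothing to report for this stage
        path = directory / f"{stage}_errors.csv"
        frame.to_csv(path, index=True)  # keep the index so rows can be traced back
        written.append(path)
    return written


# Stand-in frames; in a pipeline these would be the errored_df attributes.
frames = {
    "transform": pd.DataFrame({"value": [0]}, index=[2]),
    "validate": pd.DataFrame({"name": [], "age": []}),
}
with tempfile.TemporaryDirectory() as tmp:
    written = [path.name for path in save_error_reports(frames, tmp)]
print(written)
```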

🏗️ API Reference

Core Functions

`read_csv(path, *, chunksize=None, **kwargs) -> ReadCsvResult | Generator`

Read CSV files with automatic bad line recovery.

Returns: ReadCsvResult(bad_lines, dataframe) or generator of results if chunked.

`transform(df, transformer, chunks=2) -> TransformResult`

Apply a transformation function with automatic error isolation.

Returns: TransformResult(errors, errored_df, success_df)

`validate(df, model) -> ValidateResult`

Validate DataFrame rows using a Pydantic model.

Returns: ValidateResult(errors, errored_df, success_df)

`to_sql(df, name, con, *, if_exists='append', chunks=2, **kwargs) -> ToSqlResult`

Insert DataFrame into SQL database with retry logic.

Returns: ToSqlResult(errors, errored_df, success_df)
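
Because `transform`, `validate`, and `to_sql` return results with the same three fields, a single reporting helper can cover all of them. The sketch below relies only on the attribute names shown in this README (`errors`, `errored_df`, `success_df`); the function `summarize` is hypothetical, and the demo uses a `SimpleNamespace` with plain lists as a stand-in for a real result.

```python
from types import SimpleNamespace


def summarize(stage, result):
    """Build a one-line summary from a result with errors/errored_df/success_df.

    Hypothetical helper for illustration; not part of Abraxos.
    """
    ok = len(result.success_df)
    failed = len(result.errored_df)
    total = ok + failed
    rate = ok / total if total else 1.0
    return f"{stage}: {ok}/{total} rows ok ({rate:.0%}), {len(result.errors)} error(s)"


# Stand-in result: lists are used in place of DataFrames, since only len() is needed.
fake = SimpleNamespace(errors=[ValueError("bad")], errored_df=[0], success_df=[1, 2, 3, 4])
line = summarize("transform", fake)
print(line)
```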

Utility Functions

split(df, n=2) - Split DataFrame into n parts
clear(df) - Create empty DataFrame with same schema
to_records(df) - Convert DataFrame to list of dicts with None for NaN
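
To make the one-line descriptions above concrete, here are plain pandas and NumPy equivalents. These are guesses at the described behaviour, not the Abraxos source: the names `split_frame`, `empty_like`, and `records_with_none` are hypothetical, and the NaN handling assumes scalar cell values.

```python
import numpy as np
import pandas as pd


def split_frame(df, n=2):
    """Split df into n consecutive parts of near-equal size."""
    positions = np.array_split(np.arange(len(df)), n)
    return [df.iloc[part] for part in positions]


def empty_like(df):
    """Return an empty DataFrame with the same columns and dtypes as df."""
    return df.iloc[0:0]


def records_with_none(df):
    """Convert df to a list of dicts, replacing missing values with None."""
    return [
        {key: (None if pd.isna(value) else value) for key, value in row.items()}
        for row in df.to_dict("records")
    ]


df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "score": [1.5, None, 3.0, 4.0, 5.0]})
parts = split_frame(df)
records = records_with_none(df)
print([len(part) for part in parts], list(empty_like(df).columns), records[1])
```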

🧪 Testing & Development

Abraxos is thoroughly tested and type-safe:

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests with coverage (118 tests, 92% coverage)
pytest

# Run type checking
mypy abraxos  # Success: no issues found

# Run linting and formatting
ruff check .  # All checks passed
ruff format .

Test Coverage:

118 tests passing
92% code coverage
All major code paths tested
Type-safe with mypy

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Quick checklist:

✅ Add tests for new features
✅ Maintain 90%+ coverage
✅ Pass all type checks (mypy abraxos)
✅ Pass all lints (ruff check .)
✅ Update documentation

📝 Changelog

See CHANGELOG.md for version history and migration guides.

📄 License

🧙‍♂️ Author

Crafted by Odos Matthews to bring resilience and magic to data workflows.

⭐ Support

If Abraxos helps your project, consider:

⭐ Starring the repo
🐛 Reporting issues
🤝 Contributing improvements
📢 Sharing with others

Happy data processing! 🚀

Release history

0.1.0 (Oct 20, 2025), this version
0.0.7 (Jun 30, 2025)
0.0.6 (Jun 30, 2025)
0.0.5 (Feb 18, 2025)
0.0.4 (Feb 9, 2025)
0.0.3 (Feb 9, 2025)
0.0.2 (Jan 19, 2025)
0.0.1 (Jan 14, 2025)
