Upgrade your Pandas ETL process.
Abraxos
Abraxos is a lightweight Python toolkit for robust, row-aware data processing using Pandas and Pydantic. It helps you:
- Read and clean messy CSVs
- Transform data with fault-tolerant functions
- Validate rows using Pydantic models
- Load data into SQL databases with graceful error recovery
🚀 Features
- 📄 CSV Ingestion with Bad Line Recovery: read CSVs in full or in chunks, and recover malformed lines separately.
- 🔁 Transform DataFrames Resiliently: apply transformation functions and isolate rows that fail.
- 🧪 Pydantic-Based Row Validation: validate each row using a Pydantic model, separating valid and invalid records.
- 🛢️ SQL Insertion with Error Splitting: insert DataFrames into SQL databases with automatic retry and chunking logic.
📦 Installation
pip install abraxos
Abraxos requires Python 3.8+ and depends on:
- pandas
- numpy
- optionally sqlalchemy for SQL I/O
- your own pydantic models for validation
🧭 Usage Examples
🔍 Read CSVs with Error Recovery
from abraxos import read_csv
bad_lines, df = read_csv("data.csv")
print("Bad lines:", bad_lines)
print("Clean data:", df.head())
Example Output
Bad lines: [['', 'oops', 'bad', 'row']]
Clean data:
id name age
0 1 Joe 28
1 2 Alice 35
2 3 Marcus 40
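To give a feel for what bad-line recovery involves, here is a minimal sketch in plain pandas. It is not Abraxos's implementation: the helper name `read_csv_collecting_bad_lines` is hypothetical, and the sketch assumes pandas 1.4 or newer, where `on_bad_lines` accepts a callable when `engine="python"` is used.

```python
import io

import pandas as pd


def read_csv_collecting_bad_lines(source, **kwargs):
    """Hypothetical helper: return (bad_lines, df) using plain pandas."""
    bad_lines = []

    def _collect(fields):
        # pandas passes the malformed line as a list of strings.
        bad_lines.append(fields)
        return None  # returning None tells pandas to skip the line

    df = pd.read_csv(source, engine="python", on_bad_lines=_collect, **kwargs)
    return bad_lines, df


# The third data line has four fields instead of three.
raw = "id,name,age\n1,Joe,28\n2,Alice,35\n,oops,bad,row\n3,Marcus,40\n"
bad_lines, df = read_csv_collecting_bad_lines(io.StringIO(raw))

print("Bad lines:", bad_lines)
print("Clean data:", df, sep="\n")
```

Only lines with too many fields reach the callback; pandas pads short lines with missing values instead of treating them as bad.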
🧼 Transform DataFrames with Fault Isolation
from abraxos import transform
def clean_data(df):
    df["name"] = df["name"].str.strip().str.lower()
    return df
result = transform(df, clean_data)
print("Errors:", result.errors)
print("Success:", result.success_df)
Example Output
Errors: []
Success:
id name age
0 1 joe 28
1 2 alice 35
2 3 marcus 40
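One way to isolate failing rows is to retry a failed transformation on progressively smaller halves of the DataFrame. The sketch below illustrates that idea in plain pandas; the function name `transform_isolating_failures` and its return shape are assumptions made for illustration, and Abraxos may take a different approach internally.

```python
import pandas as pd


def transform_isolating_failures(df, func):
    """Hypothetical sketch: return (errors, errored_df, success_df)."""
    errors, bad_parts, ok_parts = [], [], []

    def _run(part):
        if part.empty:
            return
        try:
            # Work on a copy so a failed attempt cannot leave partial edits.
            ok_parts.append(func(part.copy()))
        except Exception as exc:
            if len(part) == 1:
                # A single row that still fails is recorded as an error.
                errors.append(exc)
                bad_parts.append(part)
            else:
                # Otherwise split in half and retry each half separately.
                mid = len(part) // 2
                _run(part.iloc[:mid])
                _run(part.iloc[mid:])

    _run(df)
    errored_df = pd.concat(bad_parts) if bad_parts else df.iloc[0:0]
    success_df = pd.concat(ok_parts) if ok_parts else df.iloc[0:0]
    return errors, errored_df, success_df


def parse_age(frame):
    frame["age"] = frame["age"].astype(int)  # raises ValueError on "forty"
    return frame


people = pd.DataFrame({"name": ["Joe", "Alice", "Marcus"],
                       "age": ["28", "35", "forty"]})
errors, errored_df, success_df = transform_isolating_failures(people, parse_age)
print("Errors:", errors)
print("Success:", success_df, sep="\n")
```

Bisection keeps the common case fast (one call when nothing fails) while still narrowing a failure down to the individual rows that cause it.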
✅ Validate Rows Using Pydantic
from abraxos import validate
from pydantic import BaseModel
class Person(BaseModel):
    name: str
    age: int
# Pass the model class: Person() with no arguments would raise, since name and age are required.
result = validate(df, Person)
print("Valid rows:", result.success_df)
print("Validation errors:", result.errors)
Example Output
Valid rows:
name age
0 Joe 28
Validation errors:
[
ValidationError: 1 validation error for Person
age
value is not a valid integer (type=type_error.integer),
ValidationError: 1 validation error for Person
name
none is not an allowed value (type=type_error.none.not_allowed)
]
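Row-level validation can be pictured as building one model instance per record and collecting the failures. The sketch below shows this with pandas and pydantic only; `validate_rows` is a hypothetical name, not part of Abraxos, and the sample data is made up for illustration.

```python
import pandas as pd
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int


def validate_rows(df, model):
    """Hypothetical sketch: return (errors, errored_df, success_df)."""
    errors, bad_idx, good_idx = [], [], []
    for idx, record in zip(df.index, df.to_dict(orient="records")):
        try:
            model(**record)  # constructing the model runs validation
        except ValidationError as exc:
            errors.append(exc)
            bad_idx.append(idx)
        else:
            good_idx.append(idx)
    return errors, df.loc[bad_idx], df.loc[good_idx]


# dtype=object keeps None as None rather than converting it to NaN.
people = pd.DataFrame(
    {"name": ["Joe", "Alice", None], "age": [28, "thirty-five", 40]},
    dtype=object,
)
errors, errored_df, success_df = validate_rows(people, Person)
print("Valid rows:", success_df, sep="\n")
print("Validation errors:", len(errors))
```

This sketch returns the original rows unchanged; it does not write pydantic's coerced values back into the DataFrame.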
🗃️ Insert Into SQL With Retry Logic
from abraxos import to_sql
from sqlalchemy import create_engine
engine = create_engine("sqlite:///example.db")
result = to_sql(df, "people", engine)
print("Successful inserts:", result.success_df.shape[0])
print("Failed rows:", result.errored_df)
Example Output
Successful inserts: 2
Failed rows:
name age
2 None 40
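Error splitting on insert can be sketched the same way: try the whole batch, and if the database rejects it, retry each half until the offending rows are isolated. The example below is an illustration under stated assumptions, not Abraxos's code. It uses a standard-library `sqlite3` connection instead of a SQLAlchemy engine, `insert_splitting_errors` is a hypothetical name, and the `NOT NULL` constraint exists only to provoke a failure.

```python
import sqlite3

import pandas as pd


def insert_splitting_errors(df, table, con):
    """Hypothetical sketch: return (errors, errored_df, success_df)."""
    errors, bad_parts, ok_parts = [], [], []

    def _insert(part):
        if part.empty:
            return
        try:
            # pandas rolls back the sqlite3 transaction if the insert fails.
            part.to_sql(table, con, if_exists="append", index=False)
            ok_parts.append(part)
        except Exception as exc:
            if len(part) == 1:
                errors.append(exc)
                bad_parts.append(part)
            else:
                mid = len(part) // 2
                _insert(part.iloc[:mid])
                _insert(part.iloc[mid:])

    _insert(df)
    errored_df = pd.concat(bad_parts) if bad_parts else df.iloc[0:0]
    success_df = pd.concat(ok_parts) if ok_parts else df.iloc[0:0]
    return errors, errored_df, success_df


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT NOT NULL, age INTEGER)")

people = pd.DataFrame({
    "name": pd.Series(["Joe", "Alice", None], dtype=object),
    "age": [28, 35, 40],
})
errors, errored_df, success_df = insert_splitting_errors(people, "people", con)
print("Successful inserts:", success_df.shape[0])
print("Failed rows:", errored_df, sep="\n")
```

Because a failed batch is rolled back before its halves are retried, rows are never inserted twice.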
🧪 Test Coverage
Abraxos's internal structure is modular and testable. You can run the tests with:
pytest tests/
📄 License
MIT License © 2024 Odos Matthews
🧙‍♂️ Author
Crafted by Odos Matthews to bring some magic to data workflows.
File details
Details for the file abraxos-0.0.6-py3-none-any.whl.
File metadata
- Download URL: abraxos-0.0.6-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d35b332846000896eaf91498b099697a2233d41656c4211a70fbe8953a5ded45 |
| MD5 | e12696f5df45f65ef70b90b8097e5914 |
| BLAKE2b-256 | 91ff452aa15d28bfb81b9f2449a4b25c42a0fc2fb05f25feea7ab26d007c1573 |