AI Agents for Satif
SATIF AI
AI toolkit for transforming any input files into any output files.
⚠️ Disclaimer
EXPERIMENTAL STATUS: This package is in early development and not production-ready. The API may change significantly between versions.
BLOCKING I/O: Despite the async API, some operations may contain blocking I/O. This package should be used for testing and experimental purposes only.
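One possible way to keep an application's event loop responsive when an awaited call blocks internally is to run that call on a separate event loop in a worker thread. The sketch below illustrates the idea with standard-library calls only. `job_with_blocking_io` and `run_on_private_loop` are hypothetical names, and the job is a stand-in rather than a SATIF AI function.

```python
import asyncio
import time


async def job_with_blocking_io():
    """Stand-in for an async call that performs blocking I/O internally."""
    time.sleep(0.2)  # blocks whichever event loop is running this coroutine
    return "done"


def run_on_private_loop(coro_fn, *args, **kwargs):
    """Run a coroutine function to completion on a fresh event loop.

    Intended to be called from a worker thread, never from a running loop.
    """
    return asyncio.run(coro_fn(*args, **kwargs))


async def main():
    ticks = 0

    async def heartbeat():
        # Counts how often the main loop gets a chance to run.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.02)

    beat = asyncio.create_task(heartbeat())
    # The blocking job runs in a worker thread with its own loop,
    # so the heartbeat keeps ticking in the meantime.
    result = await asyncio.to_thread(run_on_private_loop, job_with_blocking_io)
    beat.cancel()
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This only works when the offloaded call does not share loop-bound objects (tasks, locks, open connections) with the main loop. `asyncio.to_thread` requires Python 3.9 or later.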
Installation
pip install satif-ai
Overview
SATIF AI enables automated transformation of heterogeneous data sources (CSV, Excel, PDF, XML, etc.) into any desired output format in two steps:
- Standardization: Ingests the source files and converts them into SDIF, a structured intermediate format.
- Transformation: Applies business logic to the standardized data to generate the target output files, with transformation code generated by AI.
Key Features
- Any Format Support: Process virtually any input, even challenging unstructured content (PDFs, complex Excel sheets)
- AI-Powered Code Generation: Automatically generate transformation code from examples and natural language instructions
- Robust Schema Enforcement: Handle input data drift and schema inconsistencies through configurable validation
- SQL-Based Data Processing: Query and manipulate all data using SQL
- Decoupled Processing Stages: Standardize once, transform many times with different logic
Usage
Basic Workflow
import asyncio
from satif_ai import astandardize, atransform

async def main():
    # Step 1: Standardize input files into SDIF
    sdif_path = await astandardize(
        datasource=["data.csv", "reference.xlsx"],
        output_path="standardized.sdif",
        overwrite=True
    )
    # Step 2: Transform SDIF into desired output using AI
    await atransform(
        sdif=sdif_path,
        output_target_files="output.json",
        instructions="Extract customer IDs and purchase totals, calculate the average purchase value per customer, and output as JSON with customer_id and avg_purchase_value fields.",
        llm_model="o4-mini"  # Choose AI model based on needs
    )

if __name__ == "__main__":
    asyncio.run(main())
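The "standardize once, transform many times" idea from Key Features could be sketched as follows. `transform_many` and `fake_transform` are hypothetical helpers, not part of satif-ai. The sketch assumes only the keyword arguments shown in the Basic Workflow above and takes the transform function as a parameter, so it runs without the package or an LLM key.

```python
import asyncio


async def transform_many(transform, sdif_path, jobs, llm_model="o4-mini"):
    """Run one transformation per job against the same SDIF file.

    `transform` is any coroutine function accepting the keyword arguments
    used in the Basic Workflow (in real use, pass `satif_ai.atransform`).
    `jobs` maps each output file name to its natural language instructions.
    """
    outputs = []
    for output_file, instructions in jobs.items():
        # Jobs run one after another rather than concurrently.
        await transform(
            sdif=sdif_path,
            output_target_files=output_file,
            instructions=instructions,
            llm_model=llm_model,
        )
        outputs.append(output_file)
    return outputs


async def fake_transform(**kwargs):
    """Stand-in transform that only reports what it was asked to do."""
    print(f"would write {kwargs['output_target_files']} from {kwargs['sdif']}")


if __name__ == "__main__":
    jobs = {
        "averages.json": "Average purchase value per customer as JSON.",
        "totals.csv": "Total purchases per customer as CSV.",
    }
    print(asyncio.run(transform_many(fake_transform, "standardized.sdif", jobs)))
```

Running the jobs sequentially is a deliberate choice here: given the blocking I/O caveat in the disclaimer, concurrent execution with `asyncio.gather` may not yield a speed-up.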
Architecture
┌─────────────────┐     ┌───────────────────────┐     ┌─────────────────┐
│  Source Files   │────▶│ Standardization Layer │────▶│    SDIF File    │
│ CSV/Excel/PDF/  │     │                       │     │ (SQLite-based)  │
│ XML/JSON/etc.   │     └───────────────────────┘     └────────┬────────┘
└─────────────────┘                                            │
                                                               │
┌─────────────────┐     ┌───────────────────────┐              │
│  Output Files   │◀────│ Transformation Layer  │◀─────────────┘
│   Any format    │     │  (AI-generated code)  │
└─────────────────┘     └───────────────────────┘
SDIF (Standardized Data Interoperable Format) is the intermediate SQLite-based format that:
- Stores structured tables alongside JSON objects and binary media
- Maintains rich metadata about data origins and relationships
- Provides direct SQL queryability for complex transformations
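Since SDIF is described as SQLite-based, a file can presumably be opened with any SQLite client. The sketch below uses Python's standard `sqlite3` module against a stand-in database that it creates itself. The `purchases` table and its columns are hypothetical, and the actual SDIF schema (table names, metadata tables) may differ.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in an SQLite file, sorted by name."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def average_purchase_per_customer(db_path):
    """Aggregate the hypothetical `purchases` table with plain SQL."""
    query = """
        SELECT customer_id, AVG(total) AS avg_purchase_value
        FROM purchases
        GROUP BY customer_id
        ORDER BY customer_id
    """
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(query).fetchall()


def build_demo_file(db_path):
    """Create a small stand-in database (not a real SDIF file)."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE purchases (customer_id TEXT, total REAL)")
        conn.executemany(
            "INSERT INTO purchases VALUES (?, ?)",
            [("c1", 10.0), ("c1", 30.0), ("c2", 5.0)],
        )
        conn.commit()


if __name__ == "__main__":
    demo_path = str(Path(tempfile.mkdtemp()) / "demo.sdif")
    build_demo_file(demo_path)
    print(list_tables(demo_path))
    print(average_purchase_per_customer(demo_path))
```

Listing the tables first is a cheap way to discover what a standardized file contains before writing any transformation logic against it.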
Documentation
For detailed documentation, examples, and advanced features, visit SATIF Documentation.
Contributing
Contributions are welcome! Whether it's bug reports, feature requests, or code contributions, please feel free to get involved.
Contribution Workflow
- Fork the repository on GitHub.
- Clone your fork locally:
  git clone https://github.com/syncpulse-solutions/satif.git
  cd satif/libs/ai
- Create a new branch for your feature or bug fix:
  git checkout -b feature/your-feature-name
  or
  git checkout -b fix/your-bug-fix-name
- Set up the development environment as described in the From Source (for Development) section:
  make install # or poetry install
- Make your changes. Ensure your code follows the project's style guidelines.
- Format and lint your code:
  make format
  make lint
- Run type checks:
  make typecheck
- Run tests to ensure your changes don't break existing functionality:
  make test
  To also generate a coverage report:
  make coverage
- Commit your changes with a clear and descriptive commit message.
- Push your changes to your fork on GitHub:
  git push origin feature/your-feature-name
- Submit a Pull Request (PR) to the main branch of the original syncpulse-solutions/satif repository.
License
This project is licensed under the MIT License.
Maintainer: Bryan Djafer (bryan.djafer@syncpulse.fr)
File details
Details for the file satif_ai-0.2.12.tar.gz.
File metadata
- Download URL: satif_ai-0.2.12.tar.gz
- Upload date:
- Size: 40.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f0cf100e150f55a9bc357dbefd424f6e84cfe8e4b19216558e013ed5f5c6ad08 |
| MD5 | 537ce465e7eb41019cc44c7327e01d3a |
| BLAKE2b-256 | 43002dd0950574208ce00551224fb6b53095bcac6937ee2095efa9aeae6cf771 |
Provenance
The following attestation bundles were made for satif_ai-0.2.12.tar.gz:
Publisher: publish_satif_ai.yml on syncpulse-solutions/satif
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: satif_ai-0.2.12.tar.gz
- Subject digest: f0cf100e150f55a9bc357dbefd424f6e84cfe8e4b19216558e013ed5f5c6ad08
- Sigstore transparency entry: 218267195
- Sigstore integration time:
- Permalink: syncpulse-solutions/satif@faa6491813af9f830416b1710636463a9835bb6d
- Branch / Tag: refs/tags/satif-ai/v0.2.12
- Owner: https://github.com/syncpulse-solutions
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_satif_ai.yml@faa6491813af9f830416b1710636463a9835bb6d
- Trigger Event: push
File details
Details for the file satif_ai-0.2.12-py3-none-any.whl.
File metadata
- Download URL: satif_ai-0.2.12-py3-none-any.whl
- Upload date:
- Size: 45.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a97c97e9709acb0a31f5ec9f2f30b533f438b6fb904b9bb1f1bc8f0f41a8d788 |
| MD5 | dfa9239e6303d398d500e17c63575b08 |
| BLAKE2b-256 | ad07121193158fe9e2fe98dfa69e143bd67dfbab326e0bf86e0f193c797793a2 |
Provenance
The following attestation bundles were made for satif_ai-0.2.12-py3-none-any.whl:
Publisher: publish_satif_ai.yml on syncpulse-solutions/satif
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: satif_ai-0.2.12-py3-none-any.whl
- Subject digest: a97c97e9709acb0a31f5ec9f2f30b533f438b6fb904b9bb1f1bc8f0f41a8d788
- Sigstore transparency entry: 218267208
- Sigstore integration time:
- Permalink: syncpulse-solutions/satif@faa6491813af9f830416b1710636463a9835bb6d
- Branch / Tag: refs/tags/satif-ai/v0.2.12
- Owner: https://github.com/syncpulse-solutions
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_satif_ai.yml@faa6491813af9f830416b1710636463a9835bb6d
- Trigger Event: push