AI Agents for Satif

SATIF AI

License: MIT | Status: Experimental

AI toolkit for transforming any input files into any output files.

⚠️ Disclaimer

EXPERIMENTAL STATUS: This package is in early development and not production-ready. The API may change significantly between versions.

BLOCKING I/O: Despite the async API, some operations may perform blocking I/O and can stall the event loop while they run. Use this package for testing and experimentation only.
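
To illustrate what the blocking I/O caveat means in practice, here is a minimal sketch of one common mitigation: running a coroutine that blocks internally in its own event loop on a worker thread, so the caller's loop stays responsive. `fake_blocking_step`, `run_isolated`, and `heartbeat` are hypothetical names used for illustration only; they are not part of satif_ai, and nothing here is confirmed to be the recommended approach for this package.

```python
import asyncio
import time


async def fake_blocking_step() -> str:
    # Hypothetical stand-in for a coroutine that blocks internally.
    time.sleep(0.2)  # blocks whichever event loop runs this coroutine
    return "done"


async def run_isolated(coro_factory):
    """Run a coroutine in a fresh event loop on a worker thread."""
    return await asyncio.to_thread(lambda: asyncio.run(coro_factory()))


async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    # Records a timestamp every 20 ms for as long as the main loop is free.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def demo() -> tuple:
    ticks: list = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    result = await run_isolated(fake_blocking_step)
    stop.set()
    await beat
    return result, len(ticks)
```

Calling `asyncio.run(demo())` returns the result together with the number of heartbeat ticks recorded while the blocking step ran. One limitation of this pattern: objects bound to the outer event loop (locks, queues, open clients) cannot be shared with the coroutine running in the worker thread.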

Installation

pip install satif-ai

Overview

SATIF AI automates the transformation of heterogeneous data sources (CSV, Excel, PDF, XML, etc.) into any desired output format in two steps:

  1. Standardization: Ingests the source files and converts them into SDIF, a structured intermediate format.
  2. Transformation: Applies business logic to the standardized data to generate the target output files, using transformation code generated by AI.

Key Features

  • Any Format Support: Process virtually any input, even challenging unstructured content (PDFs, complex Excel sheets)
  • AI-Powered Code Generation: Automatically generate transformation code from examples and natural language instructions
  • Robust Schema Enforcement: Handle input data drift and schema inconsistencies through configurable validation
  • SQL-Based Data Processing: Query and manipulate all data using SQL
  • Decoupled Processing Stages: Standardize once, transform many times with different logic
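
As a rough sketch of the "standardize once, transform many times" idea, the helper below applies several instruction sets to one SDIF file that has already been produced. `transform_many` and its `jobs` mapping are hypothetical names, not part of satif_ai. The keyword names passed to `atransform` mirror those in the Basic Workflow example further down; whether other parameters are required is an assumption left untested here. Jobs run sequentially rather than concurrently, which seems the safer choice given the blocking I/O caveat.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional


async def transform_many(
    sdif_path: str,
    jobs: Dict[str, str],
    transform: Optional[Callable[..., Awaitable[object]]] = None,
) -> list:
    """Apply several instruction sets to one already-standardized SDIF file.

    `jobs` maps an output file name to the natural-language instructions
    used to produce it. `transform` is injectable for testing.
    """
    if transform is None:
        # Imported lazily so the helper can be exercised without the package.
        from satif_ai import atransform as transform

    written = []
    for output_file, instructions in jobs.items():
        # Keyword names follow the Basic Workflow example (assumed sufficient).
        await transform(
            sdif=sdif_path,
            output_target_files=output_file,
            instructions=instructions,
        )
        written.append(output_file)
    return written
```

For example, `asyncio.run(transform_many("standardized.sdif", {"summary.json": "...", "totals.csv": "..."}))` would reuse the same standardized file for both outputs without running the standardization step again.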

Usage

Basic Workflow

import asyncio
from satif_ai import astandardize, atransform

async def main():
    # Step 1: Standardize input files into SDIF
    sdif_path = await astandardize(
        datasource=["data.csv", "reference.xlsx"],
        output_path="standardized.sdif",
        overwrite=True
    )

    # Step 2: Transform SDIF into desired output using AI
    await atransform(
        sdif=sdif_path,
        output_target_files="output.json",
        instructions="Extract customer IDs and purchase totals, calculate the average purchase value per customer, and output as JSON with customer_id and avg_purchase_value fields.",
        llm_model="o4-mini"  # Choose AI model based on needs
    )

if __name__ == "__main__":
    asyncio.run(main())

Architecture

┌─────────────────┐     ┌───────────────────────┐     ┌─────────────────┐
│  Source Files   │────▶│ Standardization Layer │────▶│   SDIF File     │
│ CSV/Excel/PDF/  │     │                       │     │ (SQLite-based)  │
│ XML/JSON/etc.   │     └───────────────────────┘     └────────┬────────┘
└─────────────────┘                                            │
                                                               │
┌─────────────────┐     ┌───────────────────────┐              │
│  Output Files   │◀────│  Transformation Layer │◀─────────────┘
│ Any format      │     │  (AI-generated code)  │
└─────────────────┘     └───────────────────────┘

SDIF (Standardized Data Interoperable Format) is the intermediate SQLite-based format that:

  • Stores structured tables alongside JSON objects and binary media
  • Maintains rich metadata about data origins and relationships
  • Provides direct SQL queryability for complex transformations
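
Because SDIF is described as SQLite-based and directly queryable with SQL, it should be possible to inspect a file with Python's standard `sqlite3` module. The sketch below assumes an SDIF file opens as a plain SQLite database. The internal SDIF schema is not described here, so the `customers` and `purchases` tables are hypothetical and are created in a throwaway file that stands in for a real SDIF file.

```python
import os
import sqlite3
import tempfile


def list_tables(path: str) -> list:
    """Return the names of all tables in a SQLite-based file."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def make_demo_file() -> str:
    """Create a temporary SQLite file with hypothetical tables (not real SDIF)."""
    fd, path = tempfile.mkstemp(suffix=".sdif")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE purchases (customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?)",
        [(1, 10.0), (1, 30.0), (2, 5.0)],
    )
    conn.commit()
    conn.close()
    return path


def average_purchase(path: str) -> dict:
    """Average purchase total per customer, computed in SQL."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT customer_id, AVG(total) FROM purchases "
            "GROUP BY customer_id ORDER BY customer_id"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

With a real SDIF file, `list_tables("standardized.sdif")` would be a reasonable first step to discover which tables the standardization step produced before writing any queries against them.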

Documentation

For detailed documentation, examples, and advanced features, visit SATIF Documentation.

Contributing

Contributions are welcome! Whether it's bug reports, feature requests, or code contributions, please feel free to get involved.

Contribution Workflow

  1. Fork the repository on GitHub.

  2. Clone your fork locally:

    git clone https://github.com/<your-username>/satif.git
    cd satif/libs/ai
    
  3. Create a new branch for your feature or bug fix:

    git checkout -b feature/your-feature-name
    

    or

    git checkout -b fix/your-bug-fix-name
    
  4. Set up the development environment:

    make install  # or poetry install
    
  5. Make your changes. Ensure your code follows the project's style guidelines.

  6. Format and lint your code:

    make format
    make lint
    
  7. Run type checks:

    make typecheck
    
  8. Run tests to ensure your changes don't break existing functionality:

    make test
    

    To also generate a coverage report:

    make coverage
    
  9. Commit your changes with a clear and descriptive commit message.

  10. Push your changes to your fork on GitHub:

    git push origin feature/your-feature-name
    
  11. Submit a Pull Request (PR) to the main branch of the original syncpulse-solutions/satif repository.

License

This project is licensed under the MIT License.

Maintainer: Bryan Djafer (bryan.djafer@syncpulse.fr)
