An agentic framework for building data transformations from natural language
Aiden
Installation • Quick Start • Documentation • Examples • Contributing
🔍 Overview
Aiden is a Python framework for building data transformations from natural language. You describe the transformation you need in plain text, and Aiden's multi-agent AI architecture generates the code that implements it, which makes data engineering tasks more accessible and faster to deliver.
💻 Installation
Using pip or poetry
pip install aiden-ai
# or with poetry
poetry add aiden-ai
Set environment variables
Aiden uses LiteLLM to manage AI providers, and provider credentials are configured through environment variables. Set the API key for each provider you plan to use; the LiteLLM documentation lists all supported providers.
export OPENAI_API_KEY="your-openai-api-key"
# or
export ANTHROPIC_API_KEY="your-anthropic-api-key"
# or
export GEMINI_API_KEY="your-google-api-key"
# or ...
Optional Dependencies
For Dagster integration:
pip install 'aiden-ai[dagster]'
# or with poetry
poetry add 'aiden-ai[dagster]'
Development Installation
# Clone the repository
git clone https://github.com/getaiden/aiden-ai.git
cd aiden-ai
# Install dependencies with Poetry
poetry install
# Activate the virtual environment
source .venv/bin/activate
🚀 Quick Start
Here's a simple example to get you started with Aiden:
from aiden import Transformation
from aiden.common.dataset import Dataset
# Define input and output datasets with schemas
input_data = Dataset(
    path="./data.csv",
    format="csv",
    schema={"email": str, "name": str, "signup_date": str},
)
output_data = Dataset(
    path="./transformed_data.csv",
    format="csv",
    schema={"email": str, "name": str, "signup_date": str},
)
# Create a transformation with natural language intent
transformation = Transformation(
    intent="Clean the 'email' column and remove invalid entries"
)
# Build and save the transformation
transformation.build(
    input_datasets=[input_data],
    output_dataset=output_data,
)
transformation.save("./email_cleaner.py")
✨ Features
Environment Types
Aiden supports the following execution environments. Each one takes a workdir, the directory where Aiden stores temporary files.
- Local Environment: generates a Python artifact that can be executed locally.
from aiden import Transformation
from aiden.common.environment import Environment

local_env = Environment(type="local", workdir="./local_workdir/")
transformation = Transformation(
    intent="Clean the 'email' column and remove invalid entries",
    environment=local_env,
)
- Dagster Environment: generates a Python Dagster artifact that can be executed in a Dagster environment.
dagster_env = Environment(
    type="dagster",
    workdir="./dagster_workdir/",
)
transformation = Transformation(
    intent="Clean the 'email' column and remove invalid entries",
    environment=dagster_env,
)
Provider Configuration
Customize which AI models power each agent in the multi-agent system:
from aiden.common.provider import ProviderConfig
provider_config = ProviderConfig(
    manager_provider="openai/gpt-4o",
    data_expert_provider="openai/gpt-4o",
    data_engineer_provider="openai/gpt-4o",
    tool_provider="anthropic/claude-3-7-sonnet-latest",
)
transformation = Transformation(
    intent="Clean the 'email' column and remove invalid entries",
)
transformation.build(
    input_datasets=[input_data],
    output_dataset=output_data,
    provider=provider_config,
    verbose=True,
)
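Each model string above uses the provider/model form that LiteLLM relies on for routing (for example "openai/gpt-4o"). Before calling build, it can help to confirm that every provider referenced in the configuration has its API key set. The sketch below is a hypothetical helper, not part of Aiden; its provider-to-variable mapping covers only the three variables listed under Installation and is an assumption for any other provider.

```python
import os
from typing import Iterable, Mapping

# Hypothetical mapping from LiteLLM provider prefix to the API-key variable
# listed in the Installation section. Extend it for other providers.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def missing_api_keys(
    models: Iterable[str], env: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the key variables needed by `models` that are unset or empty.

    Each model string is expected in "provider/model" form. Providers that
    are not in PROVIDER_KEY_VARS are ignored rather than reported.
    """
    missing: list[str] = []
    for model in models:
        provider = model.split("/", 1)[0]
        var = PROVIDER_KEY_VARS.get(provider)
        if var and not env.get(var) and var not in missing:
            missing.append(var)
    return missing


if __name__ == "__main__":
    # The same four model strings as the ProviderConfig example.
    models = [
        "openai/gpt-4o",  # manager_provider
        "openai/gpt-4o",  # data_expert_provider
        "openai/gpt-4o",  # data_engineer_provider
        "anthropic/claude-3-7-sonnet-latest",  # tool_provider
    ]
    missing = missing_api_keys(models)
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Running this check before a long multi-agent build turns a missing credential into an immediate, readable error instead of a failure partway through.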
Dataset Definitions
Define the input and output datasets explicitly, each with its own schema:
from aiden.common.dataset import Dataset
dataset = Dataset(
    path="./data.csv",
    format="csv",
    schema={"column1": str, "column2": int},
)
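The schema maps each column name to a Python type. To illustrate what such a schema constrains, the hypothetical helper below (not part of Aiden) checks a CSV header against the schema and tries to convert each value with its declared type. How Aiden itself uses the schema is not described here, so treat this only as an optional local pre-flight check.

```python
import csv
from typing import Iterable, Mapping


def check_csv_schema(lines: Iterable[str], schema: Mapping[str, type]) -> list[str]:
    """Return problems found when checking CSV lines against `schema`.

    Assumes a header row. A value fits a type if calling the type on the raw
    string succeeds, e.g. int("42"). An empty list means no problems found.
    """
    problems: list[str] = []
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    for column in schema:
        if column not in header:
            problems.append(f"missing column: {column}")
    # Data rows are numbered from 1; the header row is not counted.
    for row_no, row in enumerate(reader, start=1):
        for column, expected in schema.items():
            if column not in header:
                continue
            value = row[column]  # None if the row is shorter than the header
            try:
                expected(value)
            except (TypeError, ValueError):
                problems.append(
                    f"row {row_no}: {column}={value!r} is not {expected.__name__}"
                )
    return problems


def check_csv_file(path: str, schema: Mapping[str, type]) -> list[str]:
    """Convenience wrapper that opens a CSV file and checks it."""
    with open(path, newline="", encoding="utf-8") as f:
        return check_csv_schema(f, schema)
```

For the dataset above, check_csv_file("./data.csv", {"column1": str, "column2": int}) would report, for example, a row whose column2 value cannot be parsed as an integer. Note that str accepts any value, so string columns are only checked for presence.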
Saving Artifacts
Save transformations as standalone Python files that can be executed in the environment they were built for:
transformation.save("./artifact.py")
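Because the artifact is an ordinary Python file, a quick syntax check before shipping it can catch a truncated or corrupted file early. The hypothetical helper below uses the standard-library ast module to parse the source and list its top-level functions; it makes no assumption about what Aiden actually puts in the file.

```python
import ast


def summarize_artifact(source: str) -> list[str]:
    """Parse artifact source and return the names of its top-level functions.

    Raises SyntaxError if the source is not valid Python.
    """
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def summarize_artifact_file(path: str) -> list[str]:
    """Read a saved artifact from disk and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_artifact(f.read())
```

For example, summarize_artifact_file("./artifact.py") either returns the function names defined in the saved file or raises SyntaxError pointing at the offending line. Parsing with ast does not execute the file, so this check is safe to run anywhere.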
Testing Artifacts
Once you've saved your transformation, you can test it in the environment it was built for:
- Local Environment:
# Run the artifact directly with Python
python artifact.py
- Dagster Environment:
# Start the Dagster development server
dagster dev -f artifact.py
# Then execute the artifact from the Dagster UI
📊 Examples
Here's a complete example that cleans email addresses using a custom provider configuration and a local environment:
from aiden import Transformation
from aiden.common.dataset import Dataset
from aiden.common.environment import Environment
from aiden.common.provider import ProviderConfig
# Configure AI providers for each agent
provider_config = ProviderConfig(
    manager_provider="openai/gpt-4o",
    data_expert_provider="openai/gpt-4o",
    data_engineer_provider="openai/gpt-4o",
    tool_provider="anthropic/claude-3-7-sonnet-latest",
)
# Define input and output datasets
in_dev_dataset = Dataset(
    path="./emails.csv",
    format="csv",
    schema={"email": str},
)
out_dev_dataset = Dataset(
    path="./clean_emails.csv",
    format="csv",
    schema={"email": str},
)
# Create local environment with custom workdir
local_env = Environment(
    type="local",
    workdir="./local_workdir/",
)
# Define transformation with natural language intent using local environment
tr = Transformation(
    intent="clean emails column and keep only valid ones.",
    environment=local_env,
)
# Build the transformation with specified datasets and providers
tr.build(
    input_datasets=[in_dev_dataset],
    output_dataset=out_dev_dataset,
    provider=provider_config,
    verbose=True,
)
# Save the transformation as a standalone artifact
tr.save("./artifact.py")
Check out the examples directory for more use cases.
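To make the intent "clean emails column and keep only valid ones" concrete, here is a hand-written sketch of the kind of logic a generated artifact might contain. It is not Aiden's output: the normalization steps (trimming and lowercasing) and the deliberately simple regular expression are assumptions, and it will reject some addresses that are valid under RFC 5322.

```python
import csv
import re
from typing import Iterable, Iterator

# Deliberately simple pattern (an assumption): exactly one "@", no whitespace,
# and at least one dot in the domain part.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_emails(emails: Iterable[str]) -> Iterator[str]:
    """Yield trimmed, lowercased emails that match EMAIL_RE."""
    for raw in emails:
        email = raw.strip().lower()
        if EMAIL_RE.match(email):
            yield email


def clean_email_csv(in_path: str, out_path: str) -> int:
    """Read the 'email' column from in_path and write valid ones to out_path.

    Returns the number of rows written, excluding the header.
    """
    with open(in_path, newline="", encoding="utf-8") as src:
        # A short row yields None for 'email'; treat it as an empty string.
        emails = [row["email"] or "" for row in csv.DictReader(src)]
    kept = list(clean_emails(emails))
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["email"])  # same single-column schema as the output dataset
        writer.writerows([email] for email in kept)
    return len(kept)
```

With the example's paths, clean_email_csv("./emails.csv", "./clean_emails.csv") produces a file matching the {"email": str} output schema. Comparing a hand-written baseline like this with the generated artifact can be a useful way to review what the agents produced.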
🤝 Contributing
We welcome contributions to Aiden! Here's how you can help:
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature
- Make your changes
- Run tests:
poetry run pytest tests/unit
- Commit your changes:
git commit -m 'Add amazing feature'
- Push to the branch:
git push origin feature/amazing-feature
- Open a Pull Request
📄 License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Download files
Source Distribution: aiden_ai-0.2.0.tar.gz
Built Distribution: aiden_ai-0.2.0-py3-none-any.whl
File details
Details for the file aiden_ai-0.2.0.tar.gz.
File metadata
- Download URL: aiden_ai-0.2.0.tar.gz
- Upload date:
- Size: 46.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.10 Linux/6.11.0-1014-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4237a31b64f8d09d41fa6116d0f142381b2e53d7c161c0a7bec7dca3dce56e9b |
| MD5 | 6d44d40a31ec6cecb5c4ba5545a18749 |
| BLAKE2b-256 | 7a7adddee09422da6e9f54a526a2d6a18b901911c294a9055f39420ae9570aba |
File details
Details for the file aiden_ai-0.2.0-py3-none-any.whl.
File metadata
- Download URL: aiden_ai-0.2.0-py3-none-any.whl
- Upload date:
- Size: 62.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.10 Linux/6.11.0-1014-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0202d892e4c5da652bf48c9474f5083d14ef7167352fa88e3f60fb855339cdcc |
| MD5 | 75516632bd41d1b3fb8419eb552fbf9a |
| BLAKE2b-256 | 9cc6e54f9b898f19a51cf73b2f989f10a25b9919dbd92a26838313b93ef4360a |
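The published digests can be used to verify a downloaded file before installing it. A minimal sketch with the standard-library hashlib module follows; the expected values are the SHA256 digests listed above for aiden-ai 0.2.0, and the local path is wherever you saved the download. pip can also enforce hashes itself in hash-checking mode (for example with --require-hashes and --hash=sha256:... entries in a requirements file).

```python
import hashlib
from typing import Iterable

# SHA256 digests published above for aiden-ai 0.2.0.
EXPECTED_SHA256 = {
    "aiden_ai-0.2.0.tar.gz": "4237a31b64f8d09d41fa6116d0f142381b2e53d7c161c0a7bec7dca3dce56e9b",
    "aiden_ai-0.2.0-py3-none-any.whl": "0202d892e4c5da652bf48c9474f5083d14ef7167352fa88e3f60fb855339cdcc",
}


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Return the hex SHA256 digest of a sequence of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, filename: str, chunk_size: int = 65536) -> bool:
    """Check the file at `path` against the published digest for `filename`."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps reading until read() returns b"".
        actual = sha256_hex(iter(lambda: f.read(chunk_size), b""))
    return actual == EXPECTED_SHA256[filename]
```

For example, verify_download("./aiden_ai-0.2.0-py3-none-any.whl", "aiden_ai-0.2.0-py3-none-any.whl") returns True only if the local wheel matches the published SHA256 digest. Reading in chunks keeps memory use constant regardless of file size.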