Your backend for LLM-powered Big Data Apps
🎵 Datatune
Perform transformations on your data with natural language using LLMs
Installation
pip install datatune
From source (run from the repository root):
pip install -e .
Quick Start
import os
import dask.dataframe as dd
from datatune.core.map import Map
from datatune.core.filter import Filter
from datatune.llm.llm import LLM
from datatune.core.op import finalize
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
llm = LLM(model_name="gpt-3.5-turbo")
# Load data from your source with Dask
df = dd.read_csv("tests/test_data/products.csv")
print(df.head())
# Transform data with Map
mapped = Map(
prompt="Extract categories from the description.",
output_fields=["Category", "Subcategory"]
)(llm, df)
# Filter data based on criteria
filtered = Filter(
prompt="Keep only electronics products"
)(llm, mapped)
# Use `finalize` to get the final dataframe, with operation metadata and deleted rows cleaned up.
result = finalize(filtered)
result.compute().to_csv("electronics_products.csv")
new_df = dd.read_csv("electronics_products.csv")
print(new_df.head())
products.csv
   ProductID             Name   Price  Quantity                                         Description      SKU
0       1001   Wireless Mouse   25.99       150  Ergonomic wireless mouse with 2.4GHz connectivity  WM-1001
1       1002     Office Chair   89.99        75  Comfortable swivel office chair with lumbar su...  OC-2002
2       1003       Coffee Mug    9.49       300                   Ceramic mug, 12oz, microwave safe  CM-3003
3       1004  LED Monitor 24"  149.99        60   24-inch Full HD LED monitor with HDMI and VGA ...  LM-2404
4       1005    Notebook Pack    6.99       500           Pack of 3 ruled notebooks, 100 pages each  NP-5005
electronics_products.csv
   Unnamed: 0  ProductID               Name  ...      SKU     Category           Subcategory
0           0       1001     Wireless Mouse  ...  WM-1001  Electronics  Computer Accessories
1           3       1004    LED Monitor 24"  ...  LM-2404  Electronics              Monitors
2           6       1007     USB-C Cable 1m  ...  UC-7007  Electronics                Cables
3           8       1009  Bluetooth Speaker  ...  BS-9009  Electronics                 Audio
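The `Unnamed: 0` column in electronics_products.csv is not produced by Datatune. It is the original row index: `result.compute()` returns a pandas DataFrame, `to_csv` writes the index by default, and `read_csv` reads it back as an unnamed column. The pandas-only sketch below uses made-up rows to show the difference that `index=False` makes; whether you want to keep the index is a judgment call for your pipeline.

```python
import io

import pandas as pd

# Made-up rows standing in for a filtered result; the index labels 0 and 3
# mimic rows that survived a filter.
result = pd.DataFrame(
    {"ProductID": [1001, 1004], "Category": ["Electronics", "Electronics"]},
    index=[0, 3],
)

# Default behaviour: the index is written as an extra, unnamed first column.
with_index = io.StringIO()
result.to_csv(with_index)
with_index.seek(0)
columns_with_index = list(pd.read_csv(with_index).columns)

# index=False skips the index, so only the real columns are written.
without_index = io.StringIO()
result.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = list(pd.read_csv(without_index).columns)

print(columns_with_index)
print(columns_without_index)
```

Keeping the index can be useful when you want to trace surviving rows back to the source file, as the values 0, 3, 6, 8 do in the output above.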
Features
Map Operation
Transform data with natural language:
customers = dd.read_csv("customers.csv")
mapped = Map(
prompt="Extract country and city from the address field",
output_fields=["country", "city"]
)(llm, customers)
Filter Operation
# Keep only customers located in Asia
marketable = Filter(
prompt="Keep only customers who are from Asia"
)(llm, mapped)
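To make the data flow concrete, here is a toy pandas sketch of what a map, a filter, and a final cleanup do to a table. It is not Datatune's implementation: `fake_map`, `fake_filter`, the `ASIA` lookup, the `__keep__` flag column, and the sample rows are all hypothetical stand-ins, and plain Python rules take the place of the LLM calls. The mark-then-clean-up step is only a guess based on the quick start comment about `finalize` removing metadata and deleted rows.

```python
import pandas as pd


def fake_map(row):
    """Stand-in for the LLM answering 'Extract country and city from the address field'."""
    city, country = [part.strip() for part in row["address"].split(",")]
    return pd.Series({"country": country, "city": city})


ASIA = {"Japan", "India"}  # toy lookup replacing an LLM judgment


def fake_filter(row):
    """Stand-in for the LLM answering 'Keep only customers who are from Asia'."""
    return row["country"] in ASIA


# Made-up sample data.
customers = pd.DataFrame(
    {
        "name": ["Aiko", "Bruno", "Chitra"],
        "address": ["Osaka, Japan", "Lisbon, Portugal", "Pune, India"],
    }
)

# Map: one answer per row, joined on as the new output fields.
mapped = customers.join(customers.apply(fake_map, axis=1))

# Filter: mark rows rather than dropping them immediately (assumed behaviour).
mapped["__keep__"] = mapped.apply(fake_filter, axis=1)

# Cleanup: drop unmarked rows and the bookkeeping column.
result = mapped[mapped["__keep__"]].drop(columns="__keep__")
print(result)
```

The point of the sketch is the shape of the result: a map adds the columns named in `output_fields`, and a filter followed by cleanup leaves fewer rows with the original index labels intact.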
Multiple LLM Support
Datatune works with various LLM providers:
# Using Ollama
from datatune.llm.llm import Ollama
llm = Ollama()
# Using Azure OpenAI
# api_key, api_base, and api_version are the values for your Azure OpenAI resource
from datatune.llm.llm import Azure
llm = Azure(
model_name="gpt-35-turbo",
api_key=api_key,
api_base=api_base,
api_version=api_version
)
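The Azure example assumes `api_key`, `api_base`, and `api_version` are already defined. One way to supply them without hard-coding secrets is to read them from environment variables. The sketch below is only an illustration: the helper `azure_settings_from_env` and the variable names `AZURE_API_KEY`, `AZURE_API_BASE`, and `AZURE_API_VERSION` are assumptions, not part of Datatune.

```python
import os

# Hypothetical environment variable names; use whatever your deployment defines.
ENV_NAMES = {
    "api_key": "AZURE_API_KEY",
    "api_base": "AZURE_API_BASE",
    "api_version": "AZURE_API_VERSION",
}


def azure_settings_from_env(environ=os.environ):
    """Collect the three Azure settings, failing early if any is missing or empty."""
    missing = [name for name in ENV_NAMES.values() if not environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {kwarg: environ[name] for kwarg, name in ENV_NAMES.items()}


# An explicit mapping with placeholder values lets the sketch run without real credentials.
settings = azure_settings_from_env(
    {
        "AZURE_API_KEY": "placeholder-key",
        "AZURE_API_BASE": "https://example.invalid",
        "AZURE_API_VERSION": "placeholder-version",
    }
)
print(sorted(settings))

# With Datatune installed, the settings could then be passed on as keyword arguments:
# llm = Azure(model_name="gpt-35-turbo", **settings)
```

In real use, call `azure_settings_from_env()` with no argument so the values come from the process environment.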
More examples are available in the examples/ folder.
License
MIT License