Your backend for LLM-powered Big Data Apps

🎵 Datatune

Perform transformations on your data with natural language using LLMs

Installation

pip install datatune

From source (run from the root of a local checkout of the repository):

pip install -e .

Quick Start

import os
import dask.dataframe as dd

from datatune.core.map import Map
from datatune.core.filter import Filter
from datatune.llm.llm import LLM
from datatune.core.op import finalize

# Replace the placeholder with your own key (or export OPENAI_API_KEY in your shell)
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
llm = LLM(model_name="gpt-3.5-turbo")

# Load data from your source with Dask
df = dd.read_csv("tests/test_data/products.csv")
print(df.head())

# Transform data with Map
mapped = Map(
    prompt="Extract categories from the description.",
    output_fields=["Category", "Subcategory"]
)(llm, df)

# Filter data based on criteria
filtered = Filter(
    prompt="Keep only electronics products"
)(llm, mapped)

# Use `finalize` to strip internal metadata and rows deleted by earlier operations
result = finalize(filtered)
result.compute().to_csv("electronics_products.csv")

new_df = dd.read_csv("electronics_products.csv")
print(new_df.head())

products.csv

   ProductID             Name   Price  Quantity                                        Description      SKU
0       1001   Wireless Mouse   25.99       150  Ergonomic wireless mouse with 2.4GHz connectivity  WM-1001
1       1002     Office Chair   89.99        75  Comfortable swivel office chair with lumbar su...  OC-2002
2       1003       Coffee Mug    9.49       300                  Ceramic mug, 12oz, microwave safe  CM-3003
3       1004  LED Monitor 24"  149.99        60  24-inch Full HD LED monitor with HDMI and VGA ...  LM-2404
4       1005    Notebook Pack    6.99       500          Pack of 3 ruled notebooks, 100 pages each  NP-5005

electronics_products.csv

   Unnamed: 0  ProductID               Name  ...      SKU     Category           Subcategory
0           0       1001     Wireless Mouse  ...  WM-1001  Electronics  Computer Accessories
1           3       1004    LED Monitor 24"  ...  LM-2404  Electronics              Monitors
2           6       1007     USB-C Cable 1m  ...  UC-7007  Electronics                Cables
3           8       1009  Bluetooth Speaker  ...  BS-9009  Electronics                 Audio
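
The `Unnamed: 0` column in electronics_products.csv is the pandas index, which `to_csv` writes by default and `read_csv` then reads back as an unnamed column. The sketch below reproduces this with plain pandas and an in-memory buffer; the two-row frame is made-up stand-in data, not the real products.csv.

```python
import io

import pandas as pd

# Made-up stand-in for a filtered result. The index is non-contiguous,
# as it would be after some rows were dropped.
result = pd.DataFrame(
    {"ProductID": [1001, 1004], "Category": ["Electronics", "Electronics"]},
    index=[0, 3],
)

# Default behaviour: the index is written as an extra, unnamed first column.
with_index = io.StringIO()
result.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# With index=False only the data columns are written.
without_index = io.StringIO()
result.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # ['Unnamed: 0', 'ProductID', 'Category']
print(cols_without_index)  # ['ProductID', 'Category']
```

If the original row positions are not needed downstream, passing `index=False` in the Quick Start's `to_csv` call would keep that column out of the file.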

Features

Map Operation

Transform data with natural language:

customers = dd.read_csv("customers.csv")
mapped = Map(
    prompt="Extract country and city from the address field",
    output_fields=["country", "city"]
)(llm, customers)
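
To make the shape of a map step concrete, here is a minimal sketch in plain pandas. It is not Datatune's implementation: `fake_llm_extract` is a hypothetical, deterministic stand-in for the LLM call, and the sample addresses are invented. The point is only that each input row yields one value per output field, and those values become new columns.

```python
import pandas as pd


def fake_llm_extract(address: str) -> dict:
    """Hypothetical stand-in for an LLM call.

    Assumes addresses look like 'street, city, country'.
    """
    parts = [p.strip() for p in address.split(",")]
    return {"country": parts[-1], "city": parts[-2]}


# Invented sample data for illustration only.
customers = pd.DataFrame(
    {
        "name": ["Aiko", "Liam"],
        "address": ["1-2 Shibuya, Tokyo, Japan", "5 King St, Sydney, Australia"],
    }
)

# One dict per row, expanded into one new column per output field.
extracted = pd.DataFrame(
    [fake_llm_extract(a) for a in customers["address"]],
    index=customers.index,
)
mapped = pd.concat([customers, extracted], axis=1)

print(mapped[["name", "country", "city"]])
```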

Filter Operation

Filter rows with natural language:

# Keep only customers located in Asia
asian_customers = Filter(
    prompt="Keep only customers who are from Asia"
)(llm, mapped)
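
The Quick Start notes that `finalize` cleans up metadata and deleted rows, which suggests that filtering and cleanup are separate steps. One way such a design could work is sketched below in plain pandas. This is an assumption for illustration, not Datatune's actual internals: the `__keep__` marker column, the helper names, and `fake_llm_keep` are all hypothetical.

```python
import pandas as pd

ASIAN_COUNTRIES = {"Japan", "India", "Vietnam"}  # tiny illustrative set


def fake_llm_keep(row: pd.Series) -> bool:
    """Hypothetical stand-in for an LLM yes/no judgement on one row."""
    return row["country"] in ASIAN_COUNTRIES


def mark_rows(df: pd.DataFrame, keep_fn, marker: str = "__keep__") -> pd.DataFrame:
    """Filter step: record the decision in a marker column, drop nothing yet."""
    out = df.copy()
    out[marker] = out.apply(keep_fn, axis=1)
    return out


def drop_marked(df: pd.DataFrame, marker: str = "__keep__") -> pd.DataFrame:
    """Cleanup step: remove rejected rows, then remove the marker column."""
    return df[df[marker]].drop(columns=[marker])


# Invented sample data for illustration only.
mapped = pd.DataFrame(
    {"name": ["Aiko", "Liam", "Ravi"], "country": ["Japan", "Australia", "India"]}
)

marked = mark_rows(mapped, fake_llm_keep)
kept = drop_marked(marked)

print(kept)
```

The surviving rows keep their original index labels (0 and 2 here), which is consistent with the non-contiguous values seen in the `Unnamed: 0` column of the Quick Start output.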

Multiple LLM Support

Datatune works with various LLM providers:

# Using Ollama
from datatune.llm.llm import Ollama
llm = Ollama()

# Using Azure OpenAI
# (api_key, api_base, and api_version are placeholders for your own Azure credentials)
from datatune.llm.llm import Azure
llm = Azure(
    model_name="gpt-35-turbo",
    api_key=api_key,
    api_base=api_base,
    api_version=api_version)
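
The Azure example leaves `api_key`, `api_base`, and `api_version` undefined. A minimal sketch of one way to supply them without hardcoding secrets is to read them from environment variables. The helper `load_azure_settings` and the variable names `AZURE_API_KEY`, `AZURE_API_BASE`, and `AZURE_API_VERSION` are assumptions chosen for this example, not names defined by Datatune.

```python
import os


def load_azure_settings(env=os.environ) -> dict:
    """Collect Azure settings from environment variables.

    The variable names below are assumptions for this sketch.
    """
    names = {
        "api_key": "AZURE_API_KEY",
        "api_base": "AZURE_API_BASE",
        "api_version": "AZURE_API_VERSION",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: env[var] for arg, var in names.items()}


# Demo with an explicit dummy mapping so the sketch runs anywhere.
settings = load_azure_settings(
    {
        "AZURE_API_KEY": "dummy-key",
        "AZURE_API_BASE": "https://example.invalid",
        "AZURE_API_VERSION": "dummy-version",
    }
)
print(sorted(settings))  # ['api_base', 'api_key', 'api_version']
```

With real variables set in the shell, the resulting dict has the same keyword names as the call above, so it could be passed as `Azure(model_name="gpt-35-turbo", **load_azure_settings())`.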

More examples are available in the examples/ folder.

License

MIT License
