LLM-Driven Data Transformations

Datu Core

Datu is an AI-powered analyst agent that lets you model, visualize, analyze, and act on your data in minutes, all in plain English and with no technical expertise required. You can connect the Datu Analyst to a variety of tools or MCP servers to perform tasks typically done by data analysts or data scientists. The AI Analyst can:

  • Connect to your data platform

  • Identify data quality issues

  • Identify and model data based on user requests

  • Visualize and analyze data to understand the "why" behind KPIs

Installation

Ensure that Python 3.11 or later is installed.

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate

# Install datu-core with all optional extras
pip install 'datu-core[all]'

Running the application

# Start the application
datu

Connect to datasource

By design, the application fetches the schema of every data source listed in profiles.yml up front, which avoids fetching the schema again on every request. It operates, however, only on the data source currently selected as the target.
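The fetch-once behavior described above can be pictured with a small in-memory cache. This is only a sketch under stated assumptions, not Datu's implementation: `SchemaCache`, `fake_fetch`, the schema shape, and the second data source name `dev-other` are all hypothetical and chosen for illustration.

```python
from typing import Callable

# Hypothetical schema shape: table name -> list of column names.
Schema = dict[str, list[str]]


class SchemaCache:
    """Fetch each data source's schema once and serve later lookups from memory."""

    def __init__(self, fetch: Callable[[str], Schema]) -> None:
        self._fetch = fetch
        self._schemas: dict[str, Schema] = {}

    def warm(self, datasources: list[str]) -> None:
        """Fetch the schema of every listed data source up front."""
        for name in datasources:
            self.get(name)

    def get(self, datasource: str) -> Schema:
        """Return the cached schema, fetching it only on the first request."""
        if datasource not in self._schemas:
            self._schemas[datasource] = self._fetch(datasource)
        return self._schemas[datasource]


calls: list[str] = []


def fake_fetch(datasource: str) -> Schema:
    # Stand-in for real database introspection; records each call.
    calls.append(datasource)
    return {"orders": ["id", "amount"]}


cache = SchemaCache(fake_fetch)
cache.warm(["dev-postgres", "dev-other"])  # one fetch per listed data source
cache.get("dev-postgres")                  # served from the cache
cache.get("dev-postgres")                  # served from the cache
print(len(calls))
```

The fetch function runs once per data source no matter how often the schema is requested afterwards, which is the trade-off the paragraph above describes.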

Structure of profiles.yml

datu_demo:
  target: dev-postgres # Target is used to select the datasource that is currently active. Change this if you would like to use a different datasource.
  outputs:
    dev-postgres:
      type: postgres
      host: "{{ env_var('DB_HOST', 'localhost') }}"  # If the DB_HOST environment variable is set, it takes priority over the default. This avoids hardcoding the host.
      port: 5432
      user: postgres
      password: postgres
      dbname: my_sap_bronz
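To illustrate how `target` and the `env_var` template interact, here is a minimal sketch that resolves the active output of the profile above. It assumes the template takes the form `env_var('NAME', 'default')` as in the example. The helper names `render_env_vars` and `active_output` are hypothetical, and the profile is written as a plain dict so the example needs no YAML parser. How Datu itself renders the template is not shown in this document.

```python
import os
import re

# Matches {{ env_var('NAME') }} or {{ env_var('NAME', 'default') }}.
_ENV_VAR = re.compile(
    r"\{\{\s*env_var\(\s*'([^']+)'\s*(?:,\s*'([^']*)'\s*)?\)\s*\}\}"
)


def render_env_vars(value, environ=None):
    """Replace env_var(...) templates in a string; other values pass through."""
    environ = os.environ if environ is None else environ
    if not isinstance(value, str):
        return value

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in environ:
            return environ[name]  # the environment variable takes priority
        if default is None:
            raise KeyError(f"environment variable {name!r} is not set")
        return default

    return _ENV_VAR.sub(substitute, value)


def active_output(profile, environ=None):
    """Return the connection settings of the output named by `target`."""
    output = profile["outputs"][profile["target"]]
    return {key: render_env_vars(val, environ) for key, val in output.items()}


# The profile from above as a plain dict (hypothetical in-memory form).
datu_demo = {
    "target": "dev-postgres",
    "outputs": {
        "dev-postgres": {
            "type": "postgres",
            "host": "{{ env_var('DB_HOST', 'localhost') }}",
            "port": 5432,
            "user": "postgres",
            "password": "postgres",
            "dbname": "my_sap_bronz",
        }
    },
}

print(active_output(datu_demo, environ={})["host"])
print(active_output(datu_demo, environ={"DB_HOST": "db.internal"})["host"])
```

With an empty environment the host falls back to `localhost`. With `DB_HOST` set, the environment value is used instead.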

Configurable parameters

Please check out the Datu documentation.

Features

  • Dynamic Schema Discovery & Caching:
    Automatically introspects the target database schema and caches the discovered metadata.

  • LLM Integration for SQL Generation:
    Uses OpenAI's API (e.g., GPT-4o-mini) to generate SQL queries that transform raw (Bronze) data into a Gold layer format. The system prompt includes a concise summary of the schema to help the LLM generate valid queries.

  • Transformation Preview:
    The generated SQL is previewed by executing a sample query (with a LIMIT) and displaying the result in a formatted HTML table.

  • Persistent View Creation:
    Users can review the transformation preview and then create a view in the Gold layer. This view automatically reflects updates from the underlying Bronze data.

  • CSV Download:
    Users can download the full result of the transformation as a CSV file.

  • User-Friendly Chat Interface:
    The frontend features a ChatGPT-like interface with persistent conversation state, syntax highlighting for code blocks, and copy-to-clipboard functionality.

  • CSV Upload:
    Upload data as CSV files, in addition to or instead of connecting to a database.

  • Visualizations:
    Create bar, line, area, scatter, pie, or KPI visualizations to explore your data.

  • Data Catalog:
    View automatically generated business definitions for your fields.

  • Dashboards:
    Build dashboards with multiple KPIs to share insights with stakeholders.
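The preview and view-creation features listed above can be sketched as follows. This is an illustration under stated assumptions, not Datu's code: it uses an in-memory SQLite database instead of Postgres so that it runs anywhere, and the helpers `preview` and `create_view`, the table `bronze_orders`, and the view `gold_customer_totals` are hypothetical names. The SQL string stands in for LLM output and is interpolated as trusted text. A real system would need to validate it first.

```python
import sqlite3


def _strip(sql):
    """Drop surrounding whitespace and a trailing semicolon."""
    return sql.strip().rstrip(";")


def preview(conn, sql, limit=5):
    """Run the query wrapped in a LIMIT and return (column names, rows)."""
    cursor = conn.execute(
        f"SELECT * FROM ({_strip(sql)}) AS preview LIMIT ?", (limit,)
    )
    columns = [col[0] for col in cursor.description]
    return columns, cursor.fetchall()


def create_view(conn, name, sql):
    """Persist the reviewed query as a view."""
    conn.execute(f'CREATE VIEW "{name}" AS {_strip(sql)}')


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bronze_orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO bronze_orders VALUES (?, ?)",
    [("acme", 10.0), ("acme", 5.0), ("globex", 7.5)],
)

# Stand-in for a query produced by the LLM.
generated_sql = (
    "SELECT customer, SUM(amount) AS total "
    "FROM bronze_orders GROUP BY customer;"
)

columns, rows = preview(conn, generated_sql, limit=1)
print(columns, len(rows))

create_view(conn, "gold_customer_totals", generated_sql)

# A row added to the underlying table later shows up in the view.
conn.execute("INSERT INTO bronze_orders VALUES ('initech', 3.0)")
totals = conn.execute(
    'SELECT * FROM "gold_customer_totals" ORDER BY customer'
).fetchall()
print(totals)
```

Because a view stores the query rather than its result, the row inserted after `create_view` appears in `totals` without any refresh step.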

Documentation

For detailed guidance and examples, explore our documentation.

Contributing ❤️

We welcome contributions! See our Contributing Guide for details on:

  • Reporting bugs & features
  • Development setup
  • Contributing via Pull Requests
  • Code of Conduct
  • Reporting of security issues

Ready to scale?

If you are looking for Datu SaaS, talk to us.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
