Generate SQL queries from natural language

Project description

GitHub PyPI Documentation

Vanna.AI - Personalized AI SQL Agent

https://github.com/vanna-ai/vanna/assets/7146154/1901f47a-515d-4982-af50-f12761a3b2ce

How Vanna works

Vanna works in two easy steps:

  1. Train a model on your data.
  2. Ask questions.

When you ask a question, we use a custom model for your dataset to generate SQL. Your model's performance and accuracy depend on the quality and quantity of the data you use to train it. (Diagram: how Vanna works.)

Getting started

You can start by training Vanna automatically (currently supported for Snowflake) or by adding training data manually.

Install Vanna

pip install vanna

Depending on the database you're using, you can also install the associated database drivers:

pip install 'vanna[snowflake]'

Import Vanna

import vanna as vn

Train with DDL Statements

If you prefer to train manually, you do not need to connect to a database. Call the train function with a parameter such as ddl:

vn.train(ddl="""
    CREATE TABLE IF NOT EXISTS my_table (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        age INT
    )
""")

Train with Documentation

Sometimes you may want to add documentation about your business terminology or definitions.

vn.train(documentation="Our business defines OTIF score as the percentage of orders that are delivered on time and in full")

Train with SQL

You can also add SQL queries to your training data. This is useful if you already have some queries lying around: copy and paste them from your editor to begin generating new SQL.

vn.train(sql="SELECT * FROM my_table WHERE name = 'John Doe'")
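
If your existing queries live in a single script, one way to feed them in is to split the script and call the training function once per statement. This is a rough sketch only: `split_sql_script` and `train_on_script` are hypothetical helpers, the splitting is naive (it does not handle semicolons inside string literals or comments), and `fake_train` stands in for `vn.train` so the example runs without a Vanna account.

```python
from typing import Callable, List


def split_sql_script(script: str) -> List[str]:
    """Split a script on semicolons and drop empty fragments (naive splitter)."""
    return [part.strip() for part in script.split(";") if part.strip()]


def train_on_script(script: str, train: Callable[..., object]) -> int:
    """Call train(sql=...) once per statement and return how many were sent."""
    statements = split_sql_script(script)
    for statement in statements:
        train(sql=statement)
    return len(statements)


# Stand-in for vn.train: records what it receives instead of contacting a service.
recorded: List[str] = []


def fake_train(sql: str) -> None:
    recorded.append(sql)


script = """
SELECT * FROM my_table WHERE name = 'John Doe';
SELECT COUNT(*) FROM my_table WHERE age > 30;
"""

count = train_on_script(script, fake_train)
print(count)
print(recorded)
```

In real use you would pass `vn.train` as the `train` argument instead of `fake_train`.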

Asking questions

vn.ask("What are the top 10 customers by sales?")
SELECT c.c_name as customer_name,
       sum(l.l_extendedprice * (1 - l.l_discount)) as total_sales
FROM   snowflake_sample_data.tpch_sf1.lineitem l join snowflake_sample_data.tpch_sf1.orders o
        ON l.l_orderkey = o.o_orderkey join snowflake_sample_data.tpch_sf1.customer c
        ON o.o_custkey = c.c_custkey
GROUP BY customer_name
ORDER BY total_sales desc limit 10;
        CUSTOMER_NAME   TOTAL_SALES
0  Customer#000143500  6757566.0218
1  Customer#000095257  6294115.3340
2  Customer#000087115  6184649.5176
3  Customer#000131113  6080943.8305
4  Customer#000134380  6075141.9635
5  Customer#000103834  6059770.3232
6  Customer#000069682  6057779.0348
7  Customer#000102022  6039653.6335
8  Customer#000098587  6027021.5855
9  Customer#000064660  5905659.6159
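
To see what the generated query computes, here is a small worked sketch that runs the same join and aggregation against an in-memory SQLite database. The table contents are made-up illustration data, not rows from the Snowflake sample dataset, and the schema is trimmed to only the columns the query touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_custkey INT, c_name TEXT);
CREATE TABLE orders (o_orderkey INT, o_custkey INT);
CREATE TABLE lineitem (l_orderkey INT, l_extendedprice REAL, l_discount REAL);
INSERT INTO customer VALUES (1, 'Customer#1'), (2, 'Customer#2');
INSERT INTO orders VALUES (10, 1), (11, 2), (12, 2);
INSERT INTO lineitem VALUES (10, 100.0, 0.5), (11, 200.0, 0.0), (12, 50.0, 0.5);
""")

# Same shape as the generated SQL: discounted price summed per customer.
rows = conn.execute("""
SELECT c.c_name AS customer_name,
       SUM(l.l_extendedprice * (1 - l.l_discount)) AS total_sales
FROM lineitem l
JOIN orders o ON l.l_orderkey = o.o_orderkey
JOIN customer c ON o.o_custkey = c.c_custkey
GROUP BY customer_name
ORDER BY total_sales DESC
LIMIT 10
""").fetchall()

# Customer#1: 100.0 * (1 - 0.5) = 50.0
# Customer#2: 200.0 * (1 - 0.0) + 50.0 * (1 - 0.5) = 225.0
for name, total in rows:
    print(name, total)
```

Each line item contributes its extended price reduced by its discount, and the totals are ranked in descending order, which is the "top customers by sales" reading of the question.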

Why Vanna?

  1. High accuracy on complex datasets.
    • Vanna’s capabilities are tied to the training data you give it
    • More training data means better accuracy for large and complex datasets
  2. Secure and private.
    • Your database contents are never sent to Vanna’s servers
    • We only see the bare minimum - schemas & queries.
  3. Isolated, custom model.
    • You train a custom model specific to your database and your schema.
    • Nobody else can use your model or view your model’s training data unless you choose to add members to your model or make it public
    • We use a combination of third-party foundational models (OpenAI, Google) and our own LLM.
  4. Self learning.
    • As you use Vanna more, your model continuously improves as we augment your training data
  5. Supports many databases.
    • We have out-of-the-box support for Snowflake, BigQuery, and Postgres
    • You can easily make a connector for any database
  6. Pretrained models.
    • If you’re a data provider you can publish your models for anyone to use
    • As part of our roadmap, we are in the process of pre-training models for common datasets (Google Ads, Facebook Ads, etc.)
  7. Choose your front end.
    • Start in a Jupyter Notebook.
    • Expose to business users via Slackbot, web app, Streamlit app, or Excel plugin.
    • Even integrate into your web app for customers.
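
Point 2 in the list above distinguishes schemas from database contents. As an illustration of that distinction only (it says nothing about how Vanna itself transmits data), the sketch below collects table definitions from a SQLite database without reading any rows. `extract_ddl` is a hypothetical helper, and SQLite is used here purely because it ships with Python.

```python
import sqlite3
from typing import List


def extract_ddl(conn: sqlite3.Connection) -> List[str]:
    """Return the CREATE TABLE text for every table, without touching row data."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INT PRIMARY KEY, name VARCHAR(100), age INT)")
conn.execute("INSERT INTO my_table VALUES (1, 'John Doe', 42)")

ddl_statements = extract_ddl(conn)
for ddl in ddl_statements:
    print(ddl)  # schema text only; the inserted row never appears here
```

Each returned string could be passed to `vn.train(ddl=...)` in the same way as the hand-written DDL earlier in this document.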

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vanna-0.0.25.tar.gz (30.4 kB view details)

Uploaded Source

Built Distribution

vanna-0.0.25-py3-none-any.whl (33.3 kB view details)

Uploaded Python 3

File details

Details for the file vanna-0.0.25.tar.gz.

File metadata

  • Download URL: vanna-0.0.25.tar.gz
  • Upload date:
  • Size: 30.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for vanna-0.0.25.tar.gz
Algorithm Hash digest
SHA256 58d9ba70426b312e1ece5b6f88d2943a088fdd5e28e14d07d3b8ec2bc97071a9
MD5 973886e93567a72e71ff199a0458c04c
BLAKE2b-256 31acdfb0015287864f3d2cf349e5f9d116745d5abccbb7deabfcf8aede99179f

File details

Details for the file vanna-0.0.25-py3-none-any.whl.

File metadata

  • Download URL: vanna-0.0.25-py3-none-any.whl
  • Upload date:
  • Size: 33.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for vanna-0.0.25-py3-none-any.whl
Algorithm Hash digest
SHA256 2ab715638719aa0fc83878d6d8f6e0395cf2ae0b2c5ac88dd6123fc68d8cb170
MD5 b3567eb25db44c533523822abb112b03
BLAKE2b-256 ac64e7b1f5e709b5b1653b2821df08d86ecc9fe1d0de2ff63754056071d274b9
