Skip to main content

Generate SQL queries from natural language

Project description

GitHub PyPI Colab Documentation
GitHub PyPI Colab Documentation

Vanna.AI

Vanna is a Python-based AI SQL co-pilot. Our initial users are data-savvy data analysts, data scientists, engineers, and similar people that use Vanna to automate writing complex SQL.

Vanna can:

Natural Language to SQL

sql = vn.generate_sql(question='Who are the top 10 customers?')

Output:

SELECT customer_name,
       total_sales
FROM   (SELECT c.c_name as customer_name,
               sum(l.l_extendedprice * (1 - l.l_discount)) as total_sales,
               row_number() OVER (ORDER BY sum(l.l_extendedprice * (1 - l.l_discount)) desc) as rank
        FROM   snowflake_sample_data.tpch_sf1.lineitem l join snowflake_sample_data.tpch_sf1.orders o
                ON l.l_orderkey = o.o_orderkey join snowflake_sample_data.tpch_sf1.customer c
                ON o.o_custkey = c.c_custkey
        GROUP BY customer_name)
WHERE  rank <= 10;

Run SQL

This function is provided as a convenience. You can choose to run your SQL however you normally do and use the rest of the downstream functions.

df = vn.get_results(cs, database, sql)

Output:

customer_name total_sales
Customer#000000001 68127.72
Customer#000000002 65898.69
...

Generate Plotly Code

plotly_code = vn.generate_plotly_code(question=my_question, sql=sql, df=df)

Output:

fig = go.Figure(go.Bar(
    x=df['CUSTOMER_NAME'],
    y=df['TOTAL_SALES'],
    marker={'color': df['TOTAL_SALES'], 'colorscale': 'Viridis'},
    text=df['TOTAL_SALES'],
    textposition='auto',
))

fig.update_layout(
    title="Top 10 Customers by Sales",
    xaxis_title="Customer",
    yaxis_title="Total Sales",
    xaxis_tickangle=-45,
    yaxis_tickprefix="$",
)

Run Plotly Code

fig = vn.get_plotly_figure(plotly_code=plotly_code, df=df)
fig.show()

Output:

Top 10 Customers by Sales

Improve Your Training Data

vn.store_sql(
    question=my_question,
    sql=sql,
)

How Vanna Works

flowchart LR
    DB[(Known Correct Question-SQL)]
    Try[Try to Use DDL/Documentation]
    SQL(SQL)
    Check{Is the SQL correct?}
    Generate[fa:fa-circle-question Use Examples to Generate]
    DB --> Find
    Question[fa:fa-circle-question Question] --> Find{fa:fa-magnifying-glass Do we have similar questions?}
    Find -- Yes --> Generate
    Find -- No --> Try
    Generate --> SQL
    Try --> SQL
    SQL --> Check
    Check -- Yes --> DB
    Check -- No --> Analyst[fa:fa-glasses Analyst Writes the SQL]
    Analyst -- Adds --> DB

Getting Started

Install Vanna from PyPI and import it:

%pip install vanna
import vanna as vn

Enter your email to set an API Key

This will send a one-time code to your email address. Copy and paste the code into the prompt.

my_email = '' # Enter your email here
vn.login(email=my_email)

Add Training Data

vn.train(
    question="Which products have the highest sales?",
    sql="...",
)

Generate SQL

sql = vn.generate_sql(question="Who are the top 10 customers?")

Documentation

Full Documentation

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vanna-0.0.8.tar.gz (9.9 kB view hashes)

Uploaded Source

Built Distribution

vanna-0.0.8-py3-none-any.whl (8.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page