Skip to main content

AI Assistant for Python Data Analytics

Project description

DataWise

AI Assistant for Python Data Analytics

Capabilities Limitations
Use SQL to transform Pandas dataframes May occasionally generate incorrect results
Use English to visualize Pandas dataframes (beta)

Get your API Key

🔍 Demo

Try out DataWise in your browser:

Open in Colab

🔧 Quick install

Install DataWise client first:

pip install datawise

Configure with your account's API key. Either set it as DATAWISE_API_KEY environment variable before using the library:

export DATAWISE_API_KEY=sk-...

Or set api_key to its value:

from datawise import DataWise

dw = DataWise(api_key="you_api_key_here")

Use SQL to transform Pandas dataframes

You need to install pandas and numpy packages as pre-requisites for SQL query.

pip install pandas numpy

To transform, simply call sql function. You can use SQLite style SQL query to transform Pandas dataframes.

from datawise import DataWise
import pandas as pd

countries = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

dw = DataWise(api_key="you_api_key_here")
df = dw.sql("SELECT COUNT(country) FROM countries", {
  "countries": countries
})
print(df)

The above code will return the following dataframe:

        count
0          10

You can also do joins of multiple dataframes:

from datawise import DataWise
import pandas as pd

countries = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

country_populations = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "population": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

dw = DataWise(api_key="you_api_key_here")
df = dw.sql("SELECT * FROM countries LEFT JOIN country_populations ON countries.country = country_populations.country", {
  "countries": countries,
  "country_populations": country_populations
})
print(df)

The above code will return the following dataframe:

          country             gdp  happiness_index  population
0   United States  19294482071552             6.94           1
1  United Kingdom   2891615567872             7.16           2
2          France   2411255037952             6.66           3
3         Germany   3435817336832             7.07           4
4           Italy   1745433788416             6.38           5
5           Spain   1181205135360             6.40           6
6          Canada   1607402389504             7.23           7
7       Australia   1490967855104             7.22           8
8           Japan   4380756541440             5.87           9
9           China  14631844184064             5.12          10

Limitations of using SQL to transform Pandas dataframes

  • May occasionally generate incorrect results
  • Ordering of rows is not strict unless ORDER BY clause is specified
  • No support for Window functions: https://www.sqlite.org/windowfunctions.html
  • If SQL query contains WHERE clause with LIKE operator, incorrect result might be generated

Use English to visualize Pandas dataframes (beta)

You can write English to describe how you want to visualize your dataframe. This feature is available as beta feature so accurate result is not guaranteed.

You need to install matplotlib and seaborn packages as pre-requisites for SQL query.

pip install matplotlib seaborn

To visualize, simply call viz function.

from datawise import DataWise

dw = DataWise(api_key="you_api_key_here")
tips = sns.load_dataset("tips")
dw.viz("Show me relationship between total bill and tip. Each day should have different colour. Title is: Total Bill vs Tip", { "tips": tips })

Printing out translated code

You can ask DataWise to print translated code to console using code=True flag.

import logging
import sys

root = logging.getLogger()
root.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)
root.addHandler(handler)

...

df = dw.sql("SELECT COUNT(country) FROM countries", {
  "countries": countries
}, code=True)

Error Handling

Errors could happen if we cannot translate the SQL query. Consider the following example:

from datawise import DataWise
import pandas as pd

countries = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

dw = DataWise(api_key="you_api_key_here")
dw.sql("SELECT bad_column FROM bad_table", {
  "countries": countries
})

The above code will give following error message:

ERROR    root:__init__.py:47 We couldn't translate your query. Here is python code we attempted to generate: 
return_df = bad_table['bad_column']

You can modify the SQL query so that it works based on the code we attempted to generate. You can also take the translated code and use it after modifying it to work.

📜 License

DataWise is licensed under the Apache 2.0 License. See the LICENSE file for more details.

🤝 Acknowledgements

  • This project is leverages pandas library by independent contributors, but it's in no way affiliated with the pandas project.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datawise-0.0.23.tar.gz (8.8 kB view details)

Uploaded Source

Built Distribution

datawise-0.0.23-py3-none-any.whl (9.6 kB view details)

Uploaded Python 3

File details

Details for the file datawise-0.0.23.tar.gz.

File metadata

  • Download URL: datawise-0.0.23.tar.gz
  • Upload date:
  • Size: 8.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for datawise-0.0.23.tar.gz
Algorithm Hash digest
SHA256 73bcc5d8f46836c2968479c1efa5c257cccce51b9142253a0f6e01a911224879
MD5 87e699cec514164504839270c94b2776
BLAKE2b-256 c198e72a628211420478718d62df6006ba69f05994e7b0b6518c46cf7d942220

See more details on using hashes here.

File details

Details for the file datawise-0.0.23-py3-none-any.whl.

File metadata

  • Download URL: datawise-0.0.23-py3-none-any.whl
  • Upload date:
  • Size: 9.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for datawise-0.0.23-py3-none-any.whl
Algorithm Hash digest
SHA256 af289f088da28ef2a57b400e37a033d2c279b0156683faf6321bb926de8c5b04
MD5 305f7f87fe80d861860ec84639bfd74c
BLAKE2b-256 cede2f3dda95e687cbdb3c9d95632cbf273089c7d78d5b7a9457449bf9f9259a

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page