AI Assistant for Python Data Analytics
Project description
DataWise
AI Assistant for Python Data Analytics
Capabilities | Limitations |
---|---|
Use SQL to transform Pandas dataframes | May occasionally generate incorrect results |
Use English to visualize Pandas dataframes (beta) |
🔍 Demo
Try out DataWise in your browser:
🔧 Quick install
Install DataWise client first:
pip install datawise
Configure with your account's API key.
Either set it as DATAWISE_API_KEY
environment variable before using the library:
export DATAWISE_API_KEY=sk-...
Or set api_key
to its value:
from datawise import DataWise
dw = DataWise(api_key="you_api_key_here")
Use SQL to transform Pandas dataframes
You need to install pandas
and numpy
packages as pre-requisites for SQL query.
pip install pandas numpy
To transform, simply call sql
function. You can use SQLite style SQL query to transform Pandas dataframes.
from datawise import DataWise
import pandas as pd
countries = pd.DataFrame({
"country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
"gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
"happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})
dw = DataWise(api_key="you_api_key_here")
df = dw.sql("SELECT COUNT(country) FROM countries", {
"countries": countries
})
print(df)
The above code will return the following dataframe:
count
0 10
You can also do joins of multiple dataframes:
from datawise import DataWise
import pandas as pd
countries = pd.DataFrame({
"country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
"gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
"happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})
country_populations = pd.DataFrame({
"country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
"population": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})
dw = DataWise(api_key="you_api_key_here")
df = dw.sql("SELECT * FROM countries LEFT JOIN country_populations ON countries.country = country_populations.country", {
"countries": countries,
"country_populations": country_populations
})
print(df)
The above code will return the following dataframe:
country gdp happiness_index population
0 United States 19294482071552 6.94 1
1 United Kingdom 2891615567872 7.16 2
2 France 2411255037952 6.66 3
3 Germany 3435817336832 7.07 4
4 Italy 1745433788416 6.38 5
5 Spain 1181205135360 6.40 6
6 Canada 1607402389504 7.23 7
7 Australia 1490967855104 7.22 8
8 Japan 4380756541440 5.87 9
9 China 14631844184064 5.12 10
Limitations of using SQL to transform Pandas dataframes
- May occasionally generate incorrect results
- Ordering of rows is not strict unless ORDER BY clause is specified
- No support for Window functions: https://www.sqlite.org/windowfunctions.html
- If SQL query contains WHERE clause with
LIKE
operator, incorrect result might be generated
Use English to visualize Pandas dataframes (beta)
You can write English to describe how you want to visualize your dataframe. This feature is available as beta feature so accurate result is not guaranteed.
You need to install matplotlib
and seaborn
packages as pre-requisites for SQL query.
pip install matplotlib seaborn
To visualize, simply call viz
function.
from datawise import DataWise
dw = DataWise(api_key="you_api_key_here")
tips = sns.load_dataset("tips")
dw.viz("Show me relationship between total bill and tip. Each day should have different colour. Title is: Total Bill vs Tip", { "tips": tips })
Printing out translated code
You can ask DataWise to print translated code to console using code=True
flag.
import logging
import sys
root = logging.getLogger()
root.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)
root.addHandler(handler)
...
df = dw.sql("SELECT COUNT(country) FROM countries", {
"countries": countries
}, code=True)
Error Handling
Errors could happen if we cannot translate the SQL query. Consider the following example:
from datawise import DataWise
import pandas as pd
countries = pd.DataFrame({
"country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
"gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
"happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})
dw = DataWise(api_key="you_api_key_here")
dw.sql("SELECT bad_column FROM bad_table", {
"countries": countries
})
The above code will give following error message:
ERROR root:__init__.py:47 We couldn't translate your query. Here is python code we attempted to generate:
return_df = bad_table['bad_column']
You can modify the SQL query so that it works based on the code we attempted to generate. You can also take the translated code and use it after modifying it to work.
📜 License
DataWise is licensed under the Apache 2.0 License. See the LICENSE file for more details.
🤝 Acknowledgements
- This project is leverages pandas library by independent contributors, but it's in no way affiliated with the pandas project.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file datawise-0.0.23.tar.gz
.
File metadata
- Download URL: datawise-0.0.23.tar.gz
- Upload date:
- Size: 8.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 73bcc5d8f46836c2968479c1efa5c257cccce51b9142253a0f6e01a911224879 |
|
MD5 | 87e699cec514164504839270c94b2776 |
|
BLAKE2b-256 | c198e72a628211420478718d62df6006ba69f05994e7b0b6518c46cf7d942220 |
File details
Details for the file datawise-0.0.23-py3-none-any.whl
.
File metadata
- Download URL: datawise-0.0.23-py3-none-any.whl
- Upload date:
- Size: 9.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | af289f088da28ef2a57b400e37a033d2c279b0156683faf6321bb926de8c5b04 |
|
MD5 | 305f7f87fe80d861860ec84639bfd74c |
|
BLAKE2b-256 | cede2f3dda95e687cbdb3c9d95632cbf273089c7d78d5b7a9457449bf9f9259a |