PandasAI 🐼

PandasAI is a Python library that adds Generative AI capabilities to pandas, the popular data analysis and manipulation tool. It is designed to be used in conjunction with pandas, and is not a replacement for it.

🔧 Quick install

pip install pandasai

🔍 Demo

Try out PandasAI in your browser:

Open in Colab

📖 Documentation

The documentation for PandasAI can be found here.

💻 Usage

Disclaimer: The GDP data comes from this source, published by World Development Indicators - World Bank (2022.05.26) and compiled from National accounts data - World Bank / OECD. The figures are for 2020. The happiness indexes come from the World Happiness Report. Another useful link.

PandasAI is designed to be used in conjunction with pandas. It makes pandas conversational, allowing you to ask questions of your data in natural language.

Queries

For example, you can ask PandasAI to rank the rows of a DataFrame by a column and return only the top results. Here it is asked for the five countries with the highest happiness index:

import pandas as pd
from pandasai import SmartDataframe

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})

# Instantiate an LLM
from pandasai.llm import OpenAI
llm = OpenAI(api_token="YOUR_API_TOKEN")

df = SmartDataframe(df, config={"llm": llm})
df.chat('Which are the 5 happiest countries?')

The above code will return the following:

6            Canada
7         Australia
1    United Kingdom
3           Germany
0     United States
Name: country, dtype: object

Of course, you can also ask PandasAI to perform more complex queries. For example, you can ask PandasAI to find the sum of the GDPs of the 2 unhappiest countries:

df.chat('What is the sum of the GDPs of the 2 unhappiest countries?')

The above code will return the following:

19012600725504
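Because the answers come from LLM-generated code, it can help to cross-check them against plain pandas. The following is a minimal sketch of that check, not part of PandasAI itself; the variable names (countries, happiest, gdp_sum) are chosen here for illustration, and the data is the same sample used above.

```python
import pandas as pd

# Same sample data as in the example above.
countries = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832,
            1745433788416, 1181205135360, 1607402389504, 1490967855104,
            4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# Five happiest countries: rows with the largest happiness_index, highest first.
happiest = countries.nlargest(5, "happiness_index")["country"]

# Combined GDP of the two countries with the smallest happiness_index.
gdp_sum = int(countries.nsmallest(2, "happiness_index")["gdp"].sum())

print(happiest.tolist())
print(gdp_sum)
```

Both results agree with the outputs shown above: the same five countries in the same order, and a GDP sum of 19012600725504 (Japan plus China).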

Charts

You can also ask PandasAI to draw a graph:

df.chat(
    "Plot the histogram of countries showing for each the gdp, using different colors for each bar",
)

You can save the charts PandasAI generates by setting the save_charts option to True. With the legacy PandasAI class, pass it to the constructor: PandasAI(llm, save_charts=True). With SmartDataframe, add "save_charts": True to the config dict. Charts are saved in ./pandasai/exports/charts.

Multiple DataFrames

You can also pass multiple dataframes to PandasAI and ask questions that relate them to each other.

import pandas as pd
from pandasai import SmartDatalake
from pandasai.llm import OpenAI

employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}

salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)


# With no api_token argument, the key is read from the OPENAI_API_KEY environment variable
llm = OpenAI()
dl = SmartDatalake([employees_df, salaries_df], config={"llm": llm})
dl.chat("Who gets paid the most?")

The above code will return the following:

Oh, Olivia gets paid the most.
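To answer a question like this, the two tables have to be related through their shared EmployeeID column. As a rough illustration of the pandas operations involved (an assumption about the kind of code that gets generated, not PandasAI's actual output), the same answer can be computed by hand:

```python
import pandas as pd

employees_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
})
salaries_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Join the two tables on their shared key.
merged = employees_df.merge(salaries_df, on="EmployeeID")

# Look up the name on the row with the highest salary.
top_earner = merged.loc[merged["Salary"].idxmax(), "Name"]
print(top_earner)
```

This prints Olivia, matching the conversational answer above.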

You can find more examples in the examples directory.

⚡️ Shortcuts

PandasAI also provides a number of shortcuts (beta) that make common requests easier. For example, you can ask PandasAI to clean_data, impute_missing_values, generate_features, plot_histogram, and more.

# Clean data
df.clean_data()

# Impute missing values
df.impute_missing_values()

# Generate features
df.generate_features()

# Plot histogram
df.plot_histogram(column="gdp")

Learn more about the shortcuts here.

🔒 Privacy & Security

To generate the Python code it runs, PandasAI takes the head of the dataframe, randomizes it (random generation for sensitive data, shuffling for non-sensitive data), and sends only that randomized head to the LLM.

To tighten privacy further, you can instantiate PandasAI with enforce_privacy = True. In that case the head is not sent to the LLM, only the column names.
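The idea can be sketched in a few lines of pandas. This is an illustrative approximation only and not PandasAI's implementation: the helper names anonymized_head and prompt_payload, the sensitive_columns argument, and the random-string replacement are all assumptions made for this example.

```python
import random
import string

import pandas as pd


def anonymized_head(df, sensitive_columns, n=5, seed=None):
    """Hypothetical helper: return a scrambled copy of df.head(n)."""
    rng = random.Random(seed)
    head = df.head(n).copy()
    for column in head.columns:
        values = head[column].tolist()
        if column in sensitive_columns:
            # Sensitive values are replaced with random strings of the same length.
            values = [
                "".join(rng.choices(string.ascii_lowercase, k=len(str(v))))
                for v in values
            ]
        else:
            # Non-sensitive values are kept but shuffled within their column.
            rng.shuffle(values)
        head[column] = values
    return head


def prompt_payload(df, sensitive_columns=(), enforce_privacy=False):
    """Hypothetical helper: the data that would be shared with the LLM."""
    if enforce_privacy:
        # Only the column names are shared.
        return {"columns": list(df.columns)}
    sample = anonymized_head(df, sensitive_columns, seed=0)
    return {"columns": list(df.columns), "sample": sample.to_dict("records")}


people = pd.DataFrame({
    "email": ["a@x.io", "b@y.io", "c@z.io"],
    "age": [31, 45, 27],
})

private = prompt_payload(people, sensitive_columns={"email"}, enforce_privacy=True)
shared = prompt_payload(people, sensitive_columns={"email"})
print(private)
print(shared["sample"])
```

With enforce_privacy=True the payload holds only the column names. Otherwise it holds a sample in which the emails are replaced by random strings and the ages are the original values in shuffled order, so no original row is sent intact.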

🤝 Contributing

Contributions are welcome! Please check out the todos below, and feel free to open a pull request. For more information, please see the contributing guidelines.

After setting up the virtual environment, remember to install the pre-commit hooks so that your changes comply with our standards:

pre-commit install

Contributors

📜 License

PandasAI is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgements

  • This project is based on the pandas library by independent contributors, but it's in no way affiliated with the pandas project.
  • This project is meant to be used as a tool for data exploration and analysis, and it's not meant to be used for production purposes. Please use it responsibly.
