Chat with your database (SQL, CSV, pandas, Polars, MongoDB, NoSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT-3.5 / GPT-4, Anthropic, VertexAI) and RAG.
Project description
PandasAI is a Python library that makes it easy to ask questions of your data in natural language. It helps you explore, clean, and analyze your data using generative AI.
🔧 Getting started
Documentation for using PandasAI with specific LLMs, vector stores, and connectors can be found here.
📦 Installation
With pip:
pip install pandasai
With poetry:
poetry add pandasai
🔍 Demo
Try out PandasAI yourself in your browser:
🚀 Deploying PandasAI
PandasAI can be deployed in a variety of ways. You can use it directly in Jupyter notebooks or Streamlit apps, or deploy it as a REST API, for example with FastAPI or Flask.
If you are interested in the managed PandasAI Cloud or the self-hosted Enterprise offering, take a look at our website or book a meeting with us.
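The REST API option mentioned above can be sketched without committing to a framework. The following is a minimal illustration, not an official deployment recipe: it uses the standard library's WSGI interface in place of FastAPI or Flask, and `make_app`, `EchoAgent`, and the `/chat` route are hypothetical names introduced here. The only thing it assumes about the agent is that it exposes `chat(question)`, as in the usage examples in this README.

```python
import json


def make_app(agent):
    """Build a WSGI app that forwards POST /chat requests to agent.chat()."""

    def app(environ, start_response):
        is_chat = (
            environ["REQUEST_METHOD"] == "POST"
            and environ.get("PATH_INFO") == "/chat"
        )
        if not is_chat:
            start_response("404 Not Found", [("Content-Type", "application/json")])
            return [b'{"error": "not found"}']

        # Read the JSON request body: {"question": "..."}
        size = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(size) or b"{}")

        # Delegate to the agent and wrap its answer as JSON
        answer = agent.chat(payload.get("question", ""))
        body = json.dumps({"answer": str(answer)}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]

    return app


class EchoAgent:
    """Hypothetical stand-in for a pandasai Agent; needs no API key."""

    def chat(self, question):
        return f"echo: {question}"


# In a real deployment, pass Agent(your_dataframe) instead of EchoAgent().
app = make_app(EchoAgent())
# To serve locally (blocks until interrupted):
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8000, app).serve_forever()
```

Because the app only depends on the `chat` method, the same handler logic carries over to a FastAPI or Flask route with little change.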
💻 Usage
Ask questions
import os
import pandas as pd
from pandasai import Agent
# Sample DataFrame
sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]
})
# Unless you choose a different LLM, PandasAI uses BambooLLM by default.
# You can get a free API key by signing up at https://pandabi.ai (you can also set it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"
agent = Agent(sales_by_country)
agent.chat('Which are the top 5 countries by sales?')
China, United States, Japan, Germany, United Kingdom
Or you can ask more complex questions:
agent.chat(
    "What is the total sales for the top 3 countries by sales?"
)
The total sales for the top 3 countries by sales is 16500.
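LLM-generated answers are worth cross-checking on small data. The following is a minimal sketch of the same two computations in plain pandas, assuming the sample DataFrame from this section. It is the manual equivalent, not the code PandasAI generates.

```python
import pandas as pd

sales_by_country = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
})

# Top 5 countries by sales, largest first
top5 = sales_by_country.nlargest(5, "sales")["country"].tolist()

# Total sales across the top 3 countries
top3_total = int(sales_by_country.nlargest(3, "sales")["sales"].sum())

print(top5)
print(top3_total)
```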
Visualize charts
You can also ask PandasAI to generate charts for you:
agent.chat(
    "Plot the histogram of countries showing for each the sales, using different colors for each bar",
)
Multiple DataFrames
You can also pass multiple dataframes to PandasAI and ask questions that relate them.
import os
import pandas as pd
from pandasai import Agent
employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}
salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}
employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)
# Unless you choose a different LLM, PandasAI uses BambooLLM by default.
# You can get a free API key by signing up at https://pandabi.ai (you can also set it in your .env file)
os.environ["PANDASAI_API_KEY"] = "YOUR_API_KEY"
agent = Agent([employees_df, salaries_df])
agent.chat("Who gets paid the most?")
Olivia gets paid the most.
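Answering this question requires relating the two dataframes through their shared `EmployeeID` column. The following is a minimal sketch of the equivalent manual computation in plain pandas, assuming the same sample data. It illustrates the join involved and is not the code PandasAI generates.

```python
import pandas as pd

employees_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
})
salaries_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Join the two tables on their shared key
merged = employees_df.merge(salaries_df, on="EmployeeID")

# Pick the name on the row with the highest salary
top_earner = merged.loc[merged["Salary"].idxmax(), "Name"]
print(top_earner)
```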
You can find more examples in the examples directory.
🔒 Privacy & Security
To generate the Python code to run, PandasAI takes a few random sample rows from the dataframe, randomizes them (random generation for sensitive data, shuffling for non-sensitive data), and sends only this randomized head to the LLM.
To tighten privacy further, you can instantiate PandasAI with enforce_privacy = True, which sends only the column names to the LLM and not the head.
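The idea described above can be illustrated with a small dependency-free sketch. This is not PandasAI's implementation: `anonymized_head` and `column_names_only` are hypothetical helpers, the table is represented as a dict of column lists, and the set of sensitive columns is assumed to be supplied by the caller.

```python
import random
import string


def anonymized_head(columns, sensitive, n=3, seed=None):
    """Sample up to n rows, then replace sensitive columns with random
    values and shuffle the remaining columns independently."""
    rng = random.Random(seed)
    n_rows = len(next(iter(columns.values())))
    picked = rng.sample(range(n_rows), k=min(n, n_rows))

    head = {}
    for name, values in columns.items():
        sampled = [values[i] for i in picked]
        if name in sensitive:
            # Replace each value with a random lowercase string of equal length
            head[name] = [
                "".join(rng.choice(string.ascii_lowercase) for _ in str(v))
                for v in sampled
            ]
        else:
            # Shuffle so values no longer line up with their original rows
            rng.shuffle(sampled)
            head[name] = sampled
    return head


def column_names_only(columns):
    """Stricter mode: expose the column names and no values at all."""
    return list(columns)


employees = {
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}
head = anonymized_head(employees, sensitive={"Name"}, n=3, seed=0)
print(head)
print(column_names_only(employees))
```

The first helper corresponds to the default behaviour described above, and the second to what the stricter setting is described as sending.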
📜 License
PandasAI is available under the MIT Expat license, except for the pandasai/ee directory (which has its own license here, if applicable).
Resources
- Docs for comprehensive documentation
- Examples for example notebooks
- Discord for discussion with the community and PandasAI team
🤝 Contributing
Contributions are welcome! Please check the outstanding issues and feel free to open a pull request. For more information, please check out the contributing guidelines.
Thank you!
Hashes for pandasai-2.0.42-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6d79381bcd1d89fd03e994cc33c4b24600fcfbc90fd842134b70d44277bd5fa9
MD5 | e849139fcacc9ff2817aaa1e0f9ad9d2
BLAKE2b-256 | d75eef4efc43708226306c49e0b21a722bcff2e63ce6c0977b1cc19d5aee99bc