pandas-gpt

Power up your data science workflow with LLMs.


pandas-gpt is a Python library for doing almost anything with a pandas DataFrame using ChatGPT or any other Large Language Model (LLM).

Installation

pip install pandas-gpt[openai]

The [openai] extra pulls in the optional openai dependency. To use LiteLLM instead (or as well), install the litellm extra with pip install pandas-gpt[litellm].

Next, set the OPENAI_API_KEY environment variable to your OpenAI API key, or use the following code snippet:

import openai
openai.api_key = '<API Key>'
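
To keep the key out of notebooks and source files, the environment variable can be checked up front. The following is a minimal sketch using only the standard library; require_api_key is a hypothetical helper and not part of pandas-gpt or openai:

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError with a readable message when the variable is
    missing or blank, so the problem surfaces before any request is made.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before calling df.ask()")
    return key
```

The result can then be assigned in place of the literal above, for example openai.api_key = require_api_key().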

If you're looking for a free alternative to the OpenAI API, we encourage using Google Gemini for code completion:

pip install pandas-gpt[litellm]
import pandas_gpt
pandas_gpt.completer = pandas_gpt.LiteLLM('gemini/gemini-1.5-pro', api_key='...')

Examples

Setup and usage examples are available in this Google Colab notebook.

import pandas as pd
import pandas_gpt

# Load the demo sales dataset from a remote CSV file
df = pd.read_csv('https://gist.githubusercontent.com/bluecoconut/9ce2135aafb5c6ab2dc1d60ac595646e/raw/c93c3500a1f7fae469cba716f09358cfddea6343/sales_demo_with_pii_and_all_states.csv')

# Data transformation
df = df.ask('drop purchases from Laurenchester, NY')
df = df.ask('add a new Category column with values "cheap", "regular", or "expensive"')

# Queries
weekday = df.ask('which day of the week had the largest number of orders?')
top_10 = df.ask('what are the top 10 most popular products, as a table')

# Plotting
df.ask('plot monthly and hourly sales')
top_10.ask('horizontal bar plot with pastel colors')

# Allow changes to original dataset
df.ask('do something interesting', mutable=True)

# Show source code before running
df.ask('convert prices from USD to GBP', verbose=True)
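
For orientation, completers are expected to produce source code for a process() function that takes a DataFrame (see the "Anything" section). A prompt such as 'add a new Category column with values "cheap", "regular", or "expensive"' might resolve to something along these lines. This is a hand-written sketch, not actual model output: the Price column name and the thresholds of 20 and 100 are assumptions chosen for illustration.

```python
import pandas as pd


def process(df: pd.DataFrame) -> pd.DataFrame:
    # Assumed thresholds: up to 20 is cheap, up to 100 is regular,
    # anything above 100 is expensive.
    bins = [float('-inf'), 20, 100, float('inf')]
    labels = ['cheap', 'regular', 'expensive']
    # assign() returns a new DataFrame and leaves the input untouched.
    return df.assign(Category=pd.cut(df['Price'], bins=bins, labels=labels))


# Tiny stand-in for the real dataset ('Price' is an assumed column name).
demo = pd.DataFrame({'Price': [5, 50, 500]})
result = process(demo)
```

Passing verbose=True, as in the last example above, shows the source the model actually generated before it runs.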

Custom Language Models

It's possible to use a different language model with the completer config option:

import pandas_gpt

# Global default
pandas_gpt.completer = pandas_gpt.OpenAI('gpt-3.5-turbo')

# Custom completer for a specific request
df.ask('Do something interesting with the data', completer=pandas_gpt.LiteLLM('gemini/gemini-1.5-pro'))

By default, API keys are picked up from environment variables such as OPENAI_API_KEY. It's also possible to specify an API key for a particular call:

df.ask('Do something important with the data', completer=pandas_gpt.OpenAI('gpt-4o', api_key='...'))

OpenAI

pandas_gpt.completer = pandas_gpt.OpenAI('gpt-4o')

LiteLLM

pandas_gpt.completer = pandas_gpt.LiteLLM('gemini/gemini-1.5-pro')

Local (Huggingface)

pandas_gpt.completer = pandas_gpt.LiteLLM('huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct')

OpenRouter

pandas_gpt.completer = pandas_gpt.OpenRouter('anthropic/claude-3.5-sonnet')

Anything

def my_custom_completer(prompt: str) -> str:
  # Use an LLM or any other method to create a `process()` function that
  # takes a pandas DataFrame as its single argument, performs some operations
  # on it, and returns a DataFrame.
  return 'def process(df): ...'

pandas_gpt.completer = my_custom_completer
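
Because a completer is just a callable from a prompt string to a source string, it can be wrapped like any other function. The following is a minimal sketch of a caching wrapper, assuming only the prompt-in, source-out contract shown above; make_cached_completer and fake_llm are hypothetical names used for illustration:

```python
from functools import lru_cache
from typing import Callable, List

Completer = Callable[[str], str]


def make_cached_completer(inner: Completer, maxsize: int = 128) -> Completer:
    """Wrap a completer so that repeated prompts reuse the earlier result."""

    @lru_cache(maxsize=maxsize)
    def cached(prompt: str) -> str:
        return inner(prompt)

    return cached


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; records each invocation.
    calls.append(prompt)
    return 'def process(df):\n    return df'


completer = make_cached_completer(fake_llm)
completer('return the data unchanged')
completer('return the data unchanged')  # second call is served from the cache
```

A wrapper like this could be installed with pandas_gpt.completer = make_cached_completer(my_custom_completer). How often it hits the cache depends on what the library puts into the prompt, which is not specified here.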

If you want to use a fully customized API host such as Azure OpenAI Service, you can globally configure the openai and pandas-gpt packages:

import openai
openai.api_type = 'azure'
openai.api_base = '<Endpoint>'
openai.api_version = '<Version>'
openai.api_key = '<API Key>'

import pandas_gpt
pandas_gpt.completer = pandas_gpt.OpenAI(
  model='gpt-3.5-turbo',
  engine='<Engine>',
  deployment_id='<Deployment ID>',
)

Alternatives

  • GitHub Copilot: General-purpose code completion (paid subscription)
  • Sketch: AI-powered data summarization and code suggestions (works without an API key)

Disclaimer

Please note that the limitations of ChatGPT also apply to this library. I would recommend using pandas-gpt in a sandboxed environment such as Google Colab, Kaggle, or GitPod.
