
AI-powered data engineering and analytics assistant for querying CSV data using natural language—locally and intelligently


AskPandas: AI-Powered Data Engineering & Analytics Assistant

Python 3.8+ | License: MIT

AskPandas is an open-source Python library that lets you query and transform CSV data using natural language, powered by free, local open-source LLMs via Ollama. No API keys, no cloud, no cost.

🚀 Quick Start (5 minutes!)

1. Install AskPandas

pip install askpandas

2. Install Ollama (one command)

# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: Download from https://ollama.com/download

3. Pull a lightweight model

ollama pull phi3:mini    # Very small, very fast

4. Start Ollama

ollama serve    # Not needed if the Ollama desktop app or Linux service is already running

5. Start analyzing data!

import askpandas as ap

# Set up AI
llm = ap.OllamaLLM(model_name="phi3:mini")
ap.set_llm(llm)

# Load your data
df = ap.DataFrame("your_data.csv")

# Ask questions in plain English!
result = df.chat("What is the total revenue?")
print(result)

🚀 See It In Action!

import askpandas as ap
import pandas as pd

# Create sample data
data = {
    'product': ['Apple', 'Banana', 'Orange'],
    'price': [2.50, 1.00, 1.50],
    'quantity': [100, 200, 150]
}
df = pd.DataFrame(data)
df['revenue'] = df['price'] * df['quantity']

# Create AskPandas DataFrame
sales_df = ap.DataFrame(df)

# AI-powered analysis
result = sales_df.chat("What is the total revenue?")
# Output: Total Revenue: $675.00

# More complex queries
result = sales_df.chat("Show me the top 3 products by revenue")
# Output: Product analysis with rankings

result = sales_df.chat("What is the average price across all products?")
# Output: Average Price: $1.67
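LLM answers can vary between models and runs, so it is worth cross-checking them against plain pandas. Here is a minimal sketch for the sample data above. It uses only pandas, with no AskPandas calls, and reproduces the three numbers the queries should return:

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    "product": ["Apple", "Banana", "Orange"],
    "price": [2.50, 1.00, 1.50],
    "quantity": [100, 200, 150],
})
df["revenue"] = df["price"] * df["quantity"]

# Ground truth for the three questions asked above
total_revenue = df["revenue"].sum()                          # 250 + 200 + 225
top_products = df.nlargest(3, "revenue")[["product", "revenue"]]
average_price = df["price"].mean()                           # (2.50 + 1.00 + 1.50) / 3

print(f"Total Revenue: ${total_revenue:,.2f}")               # Total Revenue: $675.00
print(top_products.to_string(index=False))                   # Apple, Orange, Banana
print(f"Average Price: ${average_price:.2f}")                # Average Price: $1.67
```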

🎯 What Can You Do? (Everything!)

📊 Data Analysis - Just Ask!

# Basic questions
df.chat("What is the average price?")
# Output: Average Price: $1.67

df.chat("Show me the top 5 customers by revenue")
# Output: Customer rankings with revenue amounts

df.chat("How many sales were made in each region?")
# Output: Regional sales breakdown

# Complex analysis
df.chat("""
    Analyze our sales performance:
    1. Calculate total revenue by month
    2. Show the trend over time
    3. Identify the best performing products
    4. Create a visualization
""")
# Output: Comprehensive analysis with insights
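A multi-step request like the one above corresponds roughly to a handful of pandas operations. The following is a sketch, assuming hypothetical `date`, `product`, and `revenue` columns. AskPandas may generate different code for the same request.

```python
import pandas as pd

# Hypothetical sales data; substitute your own columns
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-18", "2024-03-07"]),
    "product": ["Apple", "Banana", "Apple", "Orange", "Apple"],
    "revenue": [120.0, 80.0, 150.0, 90.0, 200.0],
})

# 1. Total revenue by calendar month
monthly = sales.groupby(sales["date"].dt.to_period("M"))["revenue"].sum()

# 2. Trend over time as month-over-month percentage change (first month is NaN)
trend = monthly.pct_change() * 100

# 3. Best-performing products by total revenue
best_products = sales.groupby("product")["revenue"].sum().sort_values(ascending=False)

# 4. A visualization would be e.g. monthly.plot(kind="line") (requires matplotlib)
print(monthly)
print(trend.round(1))
print(best_products)
```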

🎨 Beautiful Visualizations - Automatically!

# Charts are created automatically
df.chat("Create a bar chart of sales by region")
df.chat("Plot revenue trends over time")
df.chat("Show correlation between price and quantity")
df.chat("Display distribution of customer ages")

🔍 Data Quality & Cleaning

# Automatic data assessment
df.chat("Check for missing values and duplicates")
df.chat("Identify outliers in numeric columns")
df.chat("Clean column names and standardize formats")
df.chat("Validate data types and suggest improvements")
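To see what such an assessment involves, or to verify the assistant's answer, here is a plain-pandas sketch. The 1.5 × IQR outlier rule is one common convention and is assumed here. AskPandas may use a different method, and `quality_report` is a hypothetical helper, not part of the library.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Summarize missing values, duplicate rows, dtypes, and IQR outliers."""
    report = {
        "missing": df.isna().sum().to_dict(),          # NaN count per column
        "duplicate_rows": int(df.duplicated().sum()),  # exact duplicate rows
        "dtypes": df.dtypes.astype(str).to_dict(),
        "outliers": {},
    }
    for col in df.select_dtypes(include="number").columns:
        q1, q3 = df[col].quantile([0.25, 0.75])  # quantile() ignores NaN
        iqr = q3 - q1
        mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
        report["outliers"][col] = int(mask.sum())
    return report


if __name__ == "__main__":
    data = pd.DataFrame({
        "price": [10.0, 11.0, 12.0, 11.5, 250.0, None],
        "region": ["N", "S", "S", "E", "W", "N"],
    })
    print(quality_report(data))
```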

🌐 Multi-Dataset Analysis

# Work with multiple files
customers = ap.DataFrame("customers.csv")
orders = ap.DataFrame("orders.csv")
products = ap.DataFrame("products.csv")

# Cross-dataset insights
ap.chat("""
    Customer analysis:
    1. Join customers with their orders
    2. Calculate lifetime value by segment
    3. Show purchase patterns
    4. Identify high-value customers
""", customers, orders, products)

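The following plain-pandas sketch shows what the join in step 1 and the lifetime-value calculation in step 2 involve. The column names (`customer_id`, `segment`, `product_id`, `quantity`, `unit_price`) are hypothetical, and "lifetime value" is simplified to total historical spend per customer.

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "segment": ["Retail", "Retail", "Wholesale"]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 3],
    "product_id": ["A", "B", "A", "B"],
    "quantity": [2, 1, 5, 20],
})
products = pd.DataFrame({"product_id": ["A", "B"], "unit_price": [10.0, 4.0]})

# 1. Join orders with customers and products (inner joins keep only matched rows)
joined = (
    orders.merge(customers, on="customer_id", how="inner")
          .merge(products, on="product_id", how="inner")
)
joined["spend"] = joined["quantity"] * joined["unit_price"]

# 2. Lifetime value per customer, then averaged within each segment
clv = joined.groupby(["customer_id", "segment"], as_index=False)["spend"].sum()
clv_by_segment = clv.groupby("segment")["spend"].mean()
print(clv_by_segment)
```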
💡 Real-World Examples

📈 Sales Analysis

import askpandas as ap

# Load sales data
sales = ap.DataFrame("sales_data.csv")

# Comprehensive sales report
sales.chat("What is our total revenue?")
# Output: Total Revenue: $78,586.11

sales.chat("Show me the top 3 products by revenue")
# Output: Product rankings with revenue amounts

sales.chat("Calculate average order value by region")
# Output: Regional performance metrics

sales.chat("How many sales were made in each region?")
# Output: Regional sales breakdown
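A single named aggregation in pandas answers both regional questions above. This sketch assumes hypothetical `order_id`, `region`, and `amount` columns, with one row per order:

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "region": ["North", "South", "North", "East", "South"],
    "amount": [100.0, 250.0, 350.0, 80.0, 150.0],
})

# One row per region: number of sales, total revenue, and average order value
by_region = orders.groupby("region").agg(
    sales=("order_id", "count"),
    revenue=("amount", "sum"),
    avg_order_value=("amount", "mean"),
).sort_values("revenue", ascending=False)

print(by_region)
```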

👥 Customer Analytics

# Customer behavior analysis
customers = ap.DataFrame("customers.csv")
transactions = ap.DataFrame("transactions.csv")

ap.chat("""
    Customer behavior insights:
    1. Customer lifetime value analysis
    2. Purchase frequency patterns
    3. Churn prediction factors
    4. Customer satisfaction metrics
    5. Personalized marketing recommendations
""", customers, transactions)

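Step 2, purchase frequency patterns, can be computed directly. This sketch assumes a hypothetical `transactions` table with `customer_id` and `date` columns. It measures the gap in days between each customer's consecutive purchases.

```python
import pandas as pd

transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-11", "2024-01-31",
        "2024-02-01", "2024-03-02",
        "2024-03-15",
    ]),
})

# Sort so diff() compares each purchase with the same customer's previous one
tx = transactions.sort_values(["customer_id", "date"])
tx["days_since_prev"] = tx.groupby("customer_id")["date"].diff().dt.days

frequency = tx.groupby("customer_id").agg(
    purchases=("date", "count"),
    avg_days_between=("days_since_prev", "mean"),  # NaN for one-time buyers
)
print(frequency)
```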
📊 Financial Analysis

# Financial data processing
financial = ap.DataFrame("financial_data.csv")

financial.chat("""
    Financial performance review:
    1. Profit and loss analysis
    2. Cash flow trends
    3. Expense categorization
    4. Budget vs actual comparison
    5. Financial ratios and KPIs
    6. Risk assessment and recommendations
""")

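Item 4, budget vs actual comparison, boils down to a variance calculation. This sketch assumes hypothetical `category`, `budget`, and `actual` columns:

```python
import pandas as pd

financial = pd.DataFrame({
    "category": ["Marketing", "Payroll", "Travel"],
    "budget": [10_000.0, 50_000.0, 5_000.0],
    "actual": [12_500.0, 48_000.0, 5_000.0],
})

# Positive variance means spending exceeded the budget
financial["variance"] = financial["actual"] - financial["budget"]
financial["variance_pct"] = financial["variance"] / financial["budget"] * 100

over_budget = financial.loc[financial["variance"] > 0, "category"].tolist()
print(financial)
print("Over budget:", over_budget)
```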
🔬 Research & Academic

# Research data analysis
research = ap.DataFrame("research_data.csv")

research.chat("""
    Statistical analysis:
    1. Descriptive statistics for all variables
    2. Correlation analysis between key factors
    3. Hypothesis testing results
    4. Outlier detection and treatment
    5. Data distribution visualizations
    6. Statistical significance testing
""")

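Items 1 and 2 have direct pandas equivalents, which are useful for checking the assistant's numbers. Here is a sketch with hypothetical columns. Formal hypothesis tests would typically need a statistics library such as SciPy and are not shown.

```python
import pandas as pd

research = pd.DataFrame({
    "hours_studied": [2, 4, 6, 8, 10],
    "score": [55, 62, 70, 79, 88],
    "sleep_hours": [8, 7, 7, 6, 5],
})

# 1. Descriptive statistics: count, mean, std, min, quartiles, max
print(research.describe())

# 2. Pairwise Pearson correlation matrix
corr = research.corr(method="pearson")
print(corr.round(3))
```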
🛠️ Advanced Features

🔧 Custom Configuration

import askpandas as ap

# Set your preferences
ap.set_config(
    verbose=True,                # See what's happening
    plot_style="seaborn",        # Beautiful charts
    output_dir="my_analysis",    # Save results here
    max_execution_time=120,      # Allow longer analysis
    enable_history=True,         # Track all queries
)

🎨 Custom Visualizations

# Create custom charts
from askpandas.visualization.charts import create_bar_chart, save_plot

# Custom bar chart
fig = create_bar_chart(
    df.df,
    x_col="category",
    y_col="value",
    title="My Custom Chart",
    figsize=(12, 8)
)

# Save with high quality
save_plot(fig, "custom_chart.png", dpi=300)
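If you'd rather not depend on the helpers above, whose full signatures aren't documented here, you can produce a similar chart with matplotlib directly. This sketch assumes hypothetical `category` and `value` columns. It uses the non-interactive Agg backend so it also runs on headless servers.

```python
import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({"category": ["A", "B", "C"], "value": [30, 45, 25]})

fig, ax = plt.subplots(figsize=(12, 8))
ax.bar(data["category"], data["value"])
ax.set_title("My Custom Chart")
ax.set_xlabel("category")
ax.set_ylabel("value")

# dpi=300 gives print-quality output; bbox_inches="tight" trims surrounding whitespace
fig.savefig("custom_chart.png", dpi=300, bbox_inches="tight")
plt.close(fig)
```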

🔍 Query Intelligence

# Get help with your queries
query = "Show me sales trends"
analysis = ap.analyze_query(query)
print(f"Query type: {analysis['primary_category']}")

# Get suggestions
suggestions = ap.get_query_examples('visualization')
print("Try these:", suggestions[:3])

# Validate your query
validation = ap.validate_query(query, df.columns())
if validation['is_valid']:
    print("✅ Query is valid!")

🚀 Performance Tips for Best Results

💪 Optimize Your Queries

# ✅ Good - Specific and clear
df.chat("Calculate total revenue by month for 2024, excluding returns")

# ❌ Avoid - Too vague
df.chat("Analyze this data")

# ✅ Good - Step-by-step analysis
df.chat("""
    1. Filter data for Q4 2024
    2. Group by product category
    3. Calculate sum of revenue
    4. Sort by revenue descending
    5. Show top 10 results
""")

# ✅ Good - Include context
df.chat("Show customer retention rate, considering customers who made purchases in both 2023 and 2024")
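Spelling out the steps helps because each one maps onto a concrete operation. For reference, here is roughly what the step-by-step request and the retention question translate to in pandas. The column names (`date`, `customer_id`, `category`, `revenue`) are hypothetical. Retention is defined as in the query: the share of 2023 customers who also purchased in 2024.

```python
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime(["2023-05-01", "2023-11-02", "2024-10-15",
                            "2024-11-20", "2024-12-05", "2024-03-10"]),
    "customer_id": [1, 2, 1, 3, 3, 4],
    "category": ["Toys", "Books", "Toys", "Books", "Games", "Toys"],
    "revenue": [100.0, 40.0, 250.0, 80.0, 120.0, 60.0],
})

# Steps 1-5: keep Q4 2024, sum revenue per category, sort descending, top 10
q4 = sales[(sales["date"] >= "2024-10-01") & (sales["date"] < "2025-01-01")]
top_categories = (
    q4.groupby("category")["revenue"].sum().sort_values(ascending=False).head(10)
)

# Retention: 2023 customers who bought again in 2024
buyers_2023 = set(sales.loc[sales["date"].dt.year == 2023, "customer_id"])
buyers_2024 = set(sales.loc[sales["date"].dt.year == 2024, "customer_id"])
retention_rate = len(buyers_2023 & buyers_2024) / len(buyers_2023)

print(top_categories)
print(f"Retention: {retention_rate:.0%}")
```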

🎯 Choose the Right Model

# For speed and basic analysis
llm = ap.OllamaLLM(model_name="phi3:mini")      # Fastest

# For better quality and complex queries
llm = ap.OllamaLLM(model_name="mistral:7b")     # Balanced

# For best results (slower)
llm = ap.OllamaLLM(model_name="llama3.1:8b")    # Highest quality of the three

# Pull the chosen model first (ollama pull <name>), then register it
ap.set_llm(llm)

📊 Data Preparation Tips

# Clean your data first
df = ap.DataFrame("messy_data.csv")

# Ask AskPandas to help clean it
df.chat("""
    Help me clean this data:
    1. Identify and handle missing values
    2. Remove duplicates
    3. Fix data type issues
    4. Standardize column names
    5. Show me what was cleaned
""")

# Then analyze the clean data
df.chat("Now analyze the cleaned data for insights")
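You can also perform the cleaning steps deterministically in pandas, which is handy for a repeatable pipeline. The following is a sketch with a hypothetical `clean` helper. The snake_case naming rule and the fill-with-median strategy for missing numbers are assumptions, not AskPandas behavior.

```python
import re
from typing import Dict, Tuple

import pandas as pd


def clean(df: pd.DataFrame) -> Tuple[pd.DataFrame, Dict[str, int]]:
    """Return a cleaned copy of df plus a summary of what changed."""
    out = df.copy()

    # Standardize column names: trim, lowercase, non-alphanumeric runs -> "_"
    out.columns = [re.sub(r"[^0-9a-z]+", "_", str(c).strip().lower()).strip("_")
                   for c in out.columns]

    # Remove exact duplicate rows
    before = len(out)
    out = out.drop_duplicates()
    summary = {"duplicates_removed": before - len(out)}

    # Fix data types: convert non-numeric columns whose values all parse as numbers
    for col in list(out.columns):
        if not pd.api.types.is_numeric_dtype(out[col]):
            converted = pd.to_numeric(out[col], errors="coerce")
            if converted.notna().sum() == out[col].notna().sum():
                out[col] = converted

    # Handle missing values: fill numeric gaps with each column's median
    numeric = out.select_dtypes(include="number").columns
    summary["missing_filled"] = int(out[numeric].isna().sum().sum())
    out[numeric] = out[numeric].fillna(out[numeric].median())
    return out, summary


if __name__ == "__main__":
    messy = pd.DataFrame({" Unit Price ": ["1.5", "2.0", "2.0", None], "Qty": [1, 2, 2, 4]})
    cleaned, summary = clean(messy)
    print(cleaned)
    print(summary)  # shows what was cleaned
```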

🔧 Installation Options

Basic Installation

pip install askpandas

Full Installation (Recommended)

pip install "askpandas[full]"

Development Installation

git clone https://github.com/irfanalidv/AskPandas
cd AskPandas
pip install -e ".[dev]"

📱 Platform Support

  • macOS - Native support with Apple Silicon optimization
  • Linux - Full compatibility with all distributions
  • Windows - Complete support with WSL2 recommended
  • Cloud - Works on Google Colab, AWS, Azure, etc., provided an Ollama server is running and reachable

🆘 Troubleshooting

Common Issues & Solutions

"No LLM configured" error?

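# Make sure Ollama is running
ollama serve

# Check if model is downloaded
ollama list

# In Python, register a model before calling .chat():
#   ap.set_llm(ap.OllamaLLM(model_name="phi3:mini"))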
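You can run the same checks from Python before calling `.chat()`. The following is a minimal sketch using only the standard library and Ollama's REST endpoint `GET /api/tags`, which lists locally installed models. It assumes the default address `http://localhost:11434`; adjust it if you changed it. The helper names are hypothetical, not part of AskPandas.

```python
import json
import urllib.request
from typing import List, Optional

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def parse_model_names(payload: dict) -> List[str]:
    """Extract model names (e.g. "phi3:mini") from an /api/tags response."""
    return [model["name"] for model in payload.get("models", [])]


def ollama_models(host: str = OLLAMA_URL, timeout: float = 2.0) -> Optional[List[str]]:
    """Return installed model names, or None if no Ollama server answers at host."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; malformed JSON raises ValueError
        return None


if __name__ == "__main__":
    models = ollama_models()
    if models is None:
        print("Ollama is not reachable: start it with `ollama serve`")
    elif "phi3:mini" not in models:
        print("Model missing: run `ollama pull phi3:mini`")
    else:
        print("Ollama is running with:", ", ".join(models))
```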

Slow responses?

# Try a smaller model
ollama pull phi3:mini

# Close other applications to free memory

Installation issues?

# Update pip
pip install --upgrade pip

# Install with specific Python version
python3.9 -m pip install askpandas

📚 Learning Resources

Interactive Examples

# Run the interactive demo
python simple_demo.py

# Try the configuration setup
python simple_config.py

Sample Datasets

  • fake_sample.csv - Small sample for testing
  • comprehensive_sample.csv - Larger dataset for practice
  • Create your own CSV files and start analyzing!

🎉 Success Stories

Data Scientists

"AskPandas reduced my data exploration time from hours to minutes. I can now focus on insights instead of coding."

Business Analysts

"I can analyze complex datasets without learning Python syntax. Natural language queries are a game-changer!"

Researchers

"Perfect for exploratory data analysis. I can quickly test hypotheses and generate visualizations for papers."

Students

"Learning data analysis has never been easier. AskPandas makes complex concepts accessible."

🚀 What's Next?

Version 0.2.0 (Coming Soon)

  • Jupyter notebook integration
  • More visualization options (Plotly, Bokeh)
  • SQL query generation
  • Data pipeline automation

Version 1.0.0 (Future)

  • Enterprise features
  • Advanced ML integration
  • Real-time data streaming
  • Community plugins

🤝 Get Help & Contribute

📄 License

MIT License - Use freely for personal and commercial projects!

🙏 Acknowledgments

  • Ollama Team - Making local AI accessible
  • Hugging Face - Open-source AI models
  • Pandas Community - Amazing data tools
  • Open Source Contributors - Building the future together

🎯 Complete Working Demonstration

Want to see everything in action? Run our comprehensive demo:

# Clone the repository
git clone https://github.com/irfanalidv/AskPandas.git
cd AskPandas

# Run the complete demonstration
python final_working_demo.py

This demo showcases:

  • DataFrame Creation & Analysis - Real data processing
  • Data Quality & Cleaning - Automatic column standardization
  • AI-Powered Queries - Natural language analysis
  • Multi-Dataset Analysis - Joining and complex queries
  • Configuration Management - Customizable settings
  • Query Intelligence - Automatic query categorization

🚀 Quick Test

import askpandas as ap
import pandas as pd

# Create test data
data = {'name': ['Alice', 'Bob'], 'age': [25, 30], 'salary': [50000, 60000]}
df = pd.DataFrame(data)
ap_df = ap.DataFrame(df)

# Test basic methods
print(f"Shape: {ap_df.shape()}")  # Output: Shape: (2, 3)
print(f"Columns: {ap_df.columns()}")  # Output: Columns: ['name', 'age', 'salary']

# Get comprehensive info
print(ap_df.info())  # Output: Detailed DataFrame information

# Statistical description
print(ap_df.describe())  # Output: Statistical summary

🚀 Ready to transform your data analysis? Install AskPandas today!

pip install askpandas

Made with ❤️ by Md Irfan Ali

Download files

Source Distribution

askpandas-0.1.0.tar.gz (53.9 kB)

Uploaded Source

Built Distribution


askpandas-0.1.0-py3-none-any.whl (51.0 kB)

Uploaded Python 3

File details

Details for the file askpandas-0.1.0.tar.gz.

File metadata

  • Download URL: askpandas-0.1.0.tar.gz
  • Size: 53.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.12.1.2 readme-renderer/44.0 requests/2.32.4 requests-toolbelt/1.0.0 urllib3/2.5.0 tqdm/4.67.1 importlib-metadata/8.5.0 keyring/25.6.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.12.1

File hashes

Hashes for askpandas-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d0bed7449031ea1138f240005bdb5f28141a22edeb890b5e0abc68bddfc8166c
MD5 cfa97b6b682e5cd23fbdc8e13637ce2e
BLAKE2b-256 a15f1b5f25d6b9896433fab236b4ed6a606064e0e314e09468729dc63ac3518b


File details

Details for the file askpandas-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: askpandas-0.1.0-py3-none-any.whl
  • Size: 51.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.12.1.2 readme-renderer/44.0 requests/2.32.4 requests-toolbelt/1.0.0 urllib3/2.5.0 tqdm/4.67.1 importlib-metadata/8.5.0 keyring/25.6.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.12.1

File hashes

Hashes for askpandas-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 704f7914f924782e9247f924a4279aabe403a522a37d7ec6bbf0be52d697e6ce
MD5 146056b61a0b8667b7e14a1035ddb1bf
BLAKE2b-256 eb38c1a664105e534472449f198cbe603f345d7591f8b2709412fbac0e176e08
