AI-powered data engineering and analytics assistant for querying CSV data using natural language, locally and intelligently
AskPandas: AI-Powered Data Engineering & Analytics Assistant
AskPandas is an open-source Python library that lets you query and transform CSV data using natural language, powered by free, local open-source LLMs via Ollama. No API keys, no cloud, no cost.
🚀 Quick Start (5 minutes!)
1. Install AskPandas
pip install askpandas
2. Install Ollama (one command)
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: Download from https://ollama.com/download
3. Pull a lightweight model
ollama pull phi3:mini # Very small, very fast
4. Start Ollama (skip this step if the Ollama desktop app is already running)
ollama serve
5. Start analyzing data!
import askpandas as ap
# Set up AI
llm = ap.OllamaLLM(model_name="phi3:mini")
ap.set_llm(llm)
# Load your data
df = ap.DataFrame("your_data.csv")
# Ask questions in plain English!
result = df.chat("What is the total revenue?")
print(result)
🚀 See It In Action!
import askpandas as ap
import pandas as pd
# Create sample data
data = {
'product': ['Apple', 'Banana', 'Orange'],
'price': [2.50, 1.00, 1.50],
'quantity': [100, 200, 150]
}
df = pd.DataFrame(data)
df['revenue'] = df['price'] * df['quantity']
# Create AskPandas DataFrame
sales_df = ap.DataFrame(df)
# AI-powered analysis
result = sales_df.chat("What is the total revenue?")
# Output: Total Revenue: $675.00
# More complex queries
result = sales_df.chat("Show me the top 3 products by revenue")
# Output: Product analysis with rankings
result = sales_df.chat("Calculate the average price across all products")
# Output: Average Price: $1.67
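LLM answers can vary between models and runs, so it is worth cross-checking them. Here is a minimal sketch of the plain-pandas computations these three questions map to, using the same sample data. This is ordinary pandas, not how AskPandas works internally:

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    "product": ["Apple", "Banana", "Orange"],
    "price": [2.50, 1.00, 1.50],
    "quantity": [100, 200, 150],
})
df["revenue"] = df["price"] * df["quantity"]

# "What is the total revenue?"
total_revenue = df["revenue"].sum()

# "Show me the top 3 products by revenue"
top_products = df.nlargest(3, "revenue")[["product", "revenue"]]

# Average price across all products (the $1.67 figure)
average_price = df["price"].mean()

print(f"Total Revenue: ${total_revenue:,.2f}")
print(top_products.to_string(index=False))
print(f"Average Price: ${average_price:.2f}")
```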
🎯 What Can You Do? (Everything!)
📊 Data Analysis - Just Ask!
# Basic questions (df is any ap.DataFrame; the outputs shown are illustrative)
df.chat("What is the average price?")
# Output: Average Price: $1.67
df.chat("Show me the top 5 customers by revenue")
# Output: Customer rankings with revenue amounts
df.chat("How many sales were made in each region?")
# Output: Regional sales breakdown
# Complex analysis
df.chat("""
Analyze our sales performance:
1. Calculate total revenue by month
2. Show the trend over time
3. Identify the best performing products
4. Create a visualization
""")
# Output: Comprehensive analysis with insights
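For reference, steps 1 to 3 of the prompt above look roughly like this in plain pandas. This sketch assumes hypothetical `date`, `product` and `revenue` columns, and it uses month-over-month percentage change as a simple stand-in for "trend":

```python
import pandas as pd

# Hypothetical sales data; column names are assumptions
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03",
                            "2024-02-28", "2024-03-10"]),
    "product": ["Apple", "Banana", "Apple", "Orange", "Banana"],
    "revenue": [100.0, 50.0, 120.0, 80.0, 70.0],
})

# 1. Total revenue per calendar month
monthly = sales.groupby(sales["date"].dt.to_period("M"))["revenue"].sum()

# 2. Month-over-month change as a simple trend indicator
trend = monthly.pct_change()

# 3. Best performing products by total revenue
best_products = (sales.groupby("product")["revenue"].sum()
                 .sort_values(ascending=False))

print(monthly)
print(trend)
print(best_products)
```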
🎨 Beautiful Visualizations - Automatically!
# Charts are created automatically
df.chat("Create a bar chart of sales by region")
df.chat("Plot revenue trends over time")
df.chat("Show correlation between price and quantity")
df.chat("Display distribution of customer ages")
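Here is a rough plain-matplotlib equivalent of the first and third prompts. It assumes hypothetical `region`, `sales`, `price` and `quantity` columns and uses Pearson correlation, which is the default for pandas' `Series.corr`. The Agg backend lets it run without a display:

```python
import matplotlib
matplotlib.use("Agg")  # render to files, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; column names are assumptions
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "sales": [120, 80, 60, 150, 90],
    "price": [2.5, 1.0, 1.5, 3.0, 1.2],
    "quantity": [100, 200, 150, 80, 180],
})

# "Create a bar chart of sales by region"
by_region = df.groupby("region")["sales"].sum()
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
by_region.plot.bar(ax=ax1, title="Sales by region")

# "Show correlation between price and quantity"
r = df["price"].corr(df["quantity"])  # Pearson by default
ax2.scatter(df["price"], df["quantity"])
ax2.set(xlabel="price", ylabel="quantity", title=f"Pearson r = {r:.2f}")

fig.tight_layout()
fig.savefig("sales_overview.png")
plt.close(fig)
```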
🔍 Data Quality & Cleaning
# Automatic data assessment
df.chat("Check for missing values and duplicates")
df.chat("Identify outliers in numeric columns")
df.chat("Clean column names and standardize formats")
df.chat("Validate data types and suggest improvements")
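The checks these prompts ask for can be sketched in pandas as follows. The data is made up, and the 1.5 × IQR outlier rule is one common convention, not necessarily the rule AskPandas applies:

```python
import pandas as pd

# Hypothetical messy data
df = pd.DataFrame({
    "Customer ID": [1, 2, 2, 3, 4, 5],
    "Amount": [10.0, 12.0, 12.0, None, 11.0, 500.0],
})

# "Check for missing values and duplicates"
missing_per_column = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())

# "Identify outliers in numeric columns" with the 1.5 * IQR rule
amount = df["Amount"].dropna()
q1, q3 = amount.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = amount[(amount < q1 - 1.5 * iqr) | (amount > q3 + 1.5 * iqr)]

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
print("Outliers:", outliers.tolist())
```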
🌐 Multi-Dataset Analysis
# Work with multiple files
customers = ap.DataFrame("customers.csv")
orders = ap.DataFrame("orders.csv")
products = ap.DataFrame("products.csv")
# Cross-dataset insights
ap.chat("""
Customer analysis:
1. Join customers with their orders
2. Calculate lifetime value by segment
3. Show purchase patterns
4. Identify high-value customers
""", customers, orders, products)
💡 Real-World Examples
📈 Sales Analysis
import askpandas as ap
# Load sales data
sales = ap.DataFrame("sales_data.csv")
# Comprehensive sales report
sales.chat("What is our total revenue?")
# Output: Total Revenue: $78,586.11
sales.chat("Show me the top 3 products by revenue")
# Output: Product rankings with revenue amounts
sales.chat("Calculate average order value by region")
# Output: Regional performance metrics
sales.chat("How many sales were made in each region?")
# Output: Regional sales breakdown
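The regional questions reduce to one grouped aggregation. This sketch assumes one row per order with hypothetical `order_id`, `region` and `order_total` columns:

```python
import pandas as pd

# Hypothetical order-level data: one row per order
sales = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "region": ["East", "West", "East", "West", "West"],
    "order_total": [100.0, 40.0, 60.0, 80.0, 30.0],
})

# Named aggregations: one output column per question
summary = sales.groupby("region").agg(
    orders=("order_id", "nunique"),           # sales made in each region
    avg_order_value=("order_total", "mean"),  # average order value
    revenue=("order_total", "sum"),
)
print(summary)
```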
👥 Customer Analytics
# Customer behavior analysis
customers = ap.DataFrame("customers.csv")
transactions = ap.DataFrame("transactions.csv")
ap.chat("""
Customer behavior insights:
1. Customer lifetime value analysis
2. Purchase frequency patterns
3. Churn prediction factors
4. Customer satisfaction metrics
5. Personalized marketing recommendations
""", customers, transactions)
📊 Financial Analysis
# Financial data processing
financial = ap.DataFrame("financial_data.csv")
financial.chat("""
Financial performance review:
1. Profit and loss analysis
2. Cash flow trends
3. Expense categorization
4. Budget vs actual comparison
5. Financial ratios and KPIs
6. Risk assessment and recommendations
""")
🔬 Research & Academic
# Research data analysis
research = ap.DataFrame("research_data.csv")
research.chat("""
Statistical analysis:
1. Descriptive statistics for all variables
2. Correlation analysis between key factors
3. Hypothesis testing results
4. Outlier detection and treatment
5. Data distribution visualizations
6. Statistical significance testing
""")
🛠️ Advanced Features
🔧 Custom Configuration
import askpandas as ap
# Set your preferences
ap.set_config(
verbose=True, # See what's happening
plot_style="seaborn", # Beautiful charts
output_dir="my_analysis", # Save results here
max_execution_time=120, # Allow longer analysis
enable_history=True # Track all queries
)
🎨 Custom Visualizations
# Create custom charts
from askpandas.visualization.charts import create_bar_chart, save_plot
# Custom bar chart
fig = create_bar_chart(
df.df,
x_col="category",
y_col="value",
title="My Custom Chart",
figsize=(12, 8)
)
# Save with high quality
save_plot(fig, "custom_chart.png", dpi=300)
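If you would rather not depend on the `askpandas.visualization` helpers, roughly the same chart can be made with matplotlib directly. This sketch uses illustrative data; `bbox_inches="tight"` trims surrounding whitespace from the saved image:

```python
import os

import matplotlib
matplotlib.use("Agg")  # render to files, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data with the same column names as the example above
data = pd.DataFrame({"category": ["A", "B", "C"], "value": [3, 7, 5]})

fig, ax = plt.subplots(figsize=(12, 8))
ax.bar(data["category"], data["value"])
ax.set_title("My Custom Chart")
ax.set_xlabel("category")
ax.set_ylabel("value")

# High-resolution export
fig.savefig("custom_chart.png", dpi=300, bbox_inches="tight")
plt.close(fig)
print(f"Saved custom_chart.png ({os.path.getsize('custom_chart.png')} bytes)")
```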
🔍 Query Intelligence
# Get help with your queries
query = "Show me sales trends"
analysis = ap.analyze_query(query)
print(f"Query type: {analysis['primary_category']}")
# Get suggestions
suggestions = ap.get_query_examples('visualization')
print("Try these:", suggestions[:3])
# Validate your query against the DataFrame's columns
validation = ap.validate_query(query, df.columns())
if validation['is_valid']:
    print("✅ Query is valid!")
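AskPandas's own categorization logic is not documented here. Purely to illustrate the idea, here is a toy keyword-based categorizer and column check. The category names, keyword lists and helper functions are hypothetical:

```python
import re

# Hypothetical keyword map; AskPandas's real categories may differ
CATEGORY_KEYWORDS = {
    "visualization": {"plot", "chart", "graph", "visualize", "trend", "trends"},
    "aggregation": {"total", "sum", "average", "mean", "count"},
    "cleaning": {"missing", "duplicate", "duplicates", "clean", "outlier", "outliers"},
}

def categorize_query(query: str) -> str:
    """Return the category whose keywords overlap most with the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"

def referenced_columns(query: str, columns: list[str]) -> list[str]:
    """List the columns whose names appear in the query (naive substring match)."""
    q = query.lower()
    return [c for c in columns if c.lower() in q]

print(categorize_query("Show me sales trends"))
print(referenced_columns("Average price by region", ["price", "region", "date"]))
```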
🚀 Performance Tips for Best Results
💪 Optimize Your Queries
# ✅ Good - Specific and clear
df.chat("Calculate total revenue by month for 2024, excluding returns")
# ❌ Avoid - Too vague
df.chat("Analyze this data")
# ✅ Good - Step-by-step analysis
df.chat("""
1. Filter data for Q4 2024
2. Group by product category
3. Calculate sum of revenue
4. Sort by revenue descending
5. Show top 10 results
""")
# ✅ Good - Include context
df.chat("Show customer retention rate, considering customers who made purchases in both 2023 and 2024")
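To see what the step-by-step prompt and the retention prompt mean concretely, here is a plain-pandas sketch using hypothetical `date`, `customer_id`, `category` and `revenue` columns. Retention is taken as the share of 2023 buyers who also bought in 2024, which is one reasonable reading of the prompt:

```python
import pandas as pd

# Hypothetical transactions; column names are assumptions
tx = pd.DataFrame({
    "date": pd.to_datetime(["2023-05-01", "2024-10-15", "2024-11-20",
                            "2024-12-05", "2024-03-01", "2023-07-01"]),
    "customer_id": [1, 1, 2, 3, 2, 4],
    "category": ["Fruit", "Fruit", "Dairy", "Fruit", "Dairy", "Fruit"],
    "revenue": [10.0, 30.0, 25.0, 15.0, 5.0, 8.0],
})

# Steps 1-5: filter Q4 2024, group by category, sum, sort descending, top 10
q4 = tx[(tx["date"] >= "2024-10-01") & (tx["date"] < "2025-01-01")]
top = (q4.groupby("category")["revenue"].sum()
       .sort_values(ascending=False)
       .head(10))

# Retention: 2023 buyers who also purchased in 2024
years = tx["date"].dt.year
buyers_2023 = set(tx.loc[years == 2023, "customer_id"])
buyers_2024 = set(tx.loc[years == 2024, "customer_id"])
retained = sorted(buyers_2023 & buyers_2024)
retention_rate = len(retained) / len(buyers_2023)

print(top)
print(f"Retention rate: {retention_rate:.0%}")
```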
🎯 Choose the Right Model
# For speed and basic analysis
llm = ap.OllamaLLM(model_name="phi3:mini")  # Fastest
# For better quality and complex queries
llm = ap.OllamaLLM(model_name="mistral:7b")  # Balanced
# For higher-quality results (slower); pull it first: ollama pull qwen2.5:14b
llm = ap.OllamaLLM(model_name="qwen2.5:14b")  # Larger model
# Register whichever model you chose
ap.set_llm(llm)
📊 Data Preparation Tips
# Clean your data first
df = ap.DataFrame("messy_data.csv")
# Ask AskPandas to help clean it
df.chat("""
Help me clean this data:
1. Identify and handle missing values
2. Remove duplicates
3. Fix data type issues
4. Standardize column names
5. Show me what was cleaned
""")
# Then analyze the clean data
df.chat("Now analyze the cleaned data for insights")
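Here are the same cleaning steps in plain pandas, as a sketch on made-up data. Column names are standardized first because the later steps refer to them by their cleaned names. The fill and drop rules are illustrative choices:

```python
import pandas as pd

# Hypothetical messy data
df = pd.DataFrame({
    " Order Date ": ["2024-01-05", "2024-01-05", "2024-02-10", None],
    "Unit Price": ["2.50", "2.50", "1.00", "3.75"],
    "Region": ["North", "North", None, "South"],
})
before = len(df)

# 4. Standardize column names: trim, lowercase, snake_case
df.columns = (df.columns.str.strip().str.lower()
              .str.replace(r"[^0-9a-z]+", "_", regex=True))

df = (
    df.drop_duplicates()  # 2. remove exact duplicate rows
      .assign(
          # 3. fix data types; invalid values become NaT/NaN
          order_date=lambda d: pd.to_datetime(d["order_date"], errors="coerce"),
          unit_price=lambda d: pd.to_numeric(d["unit_price"], errors="coerce"),
          # 1. label missing regions explicitly
          region=lambda d: d["region"].fillna("Unknown"),
      )
      .dropna(subset=["order_date"])  # 1. drop rows with no usable date
)

# 5. Report what was cleaned
print(f"Rows: {before} -> {len(df)}; columns: {list(df.columns)}")
```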
🔧 Installation Options
Basic Installation
pip install askpandas
Full Installation (Recommended)
pip install "askpandas[full]"
Development Installation
git clone https://github.com/irfanalidv/AskPandas
cd AskPandas
pip install -e ".[dev]"
📱 Platform Support
- ✅ macOS - Native support with Apple Silicon optimization
- ✅ Linux - Full compatibility with all distributions
- ✅ Windows - Complete support with WSL2 recommended
- ✅ Cloud - Works on Google Colab, AWS, Azure, etc.
🆘 Troubleshooting
Common Issues & Solutions
"No LLM configured" error?
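# Make sure Ollama is running
ollama serve
# Check that the model is downloaded
ollama list
# In Python, register the model before calling chat():
#   ap.set_llm(ap.OllamaLLM(model_name="phi3:mini"))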
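To check both conditions from Python, you can query Ollama's REST API, which listens on `http://localhost:11434` by default and lists pulled models at `/api/tags`. Here is a standard-library sketch that is independent of AskPandas; the helper names are hypothetical:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address

def installed_models(payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]

def check_ollama(model: str, base_url: str = OLLAMA_URL) -> str:
    """Return a short diagnosis: server unreachable, model missing, or OK."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            payload = json.load(resp)
    except OSError:  # URLError, refused connections and timeouts are all OSError
        return "Ollama is not reachable; run `ollama serve`."
    if model not in installed_models(payload):
        return f"Model {model!r} not found; run `ollama pull {model}`."
    return "OK"

if __name__ == "__main__":
    print(check_ollama("phi3:mini"))
```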
Slow responses?
# Try a smaller model
ollama pull phi3:mini
# Close other applications to free memory
Installation issues?
# Update pip
pip install --upgrade pip
# Install with specific Python version
python3.9 -m pip install askpandas
📚 Learning Resources
Interactive Examples
# Run the interactive demo
python simple_demo.py
# Try the configuration setup
python simple_config.py
Sample Datasets
- fake_sample.csv - Small sample for testing
- comprehensive_sample.csv - Larger dataset for practice
- Create your own CSV files and start analyzing!
🎉 Success Stories
Data Scientists
"AskPandas reduced my data exploration time from hours to minutes. I can now focus on insights instead of coding."
Business Analysts
"I can analyze complex datasets without learning Python syntax. Natural language queries are a game-changer!"
Researchers
"Perfect for exploratory data analysis. I can quickly test hypotheses and generate visualizations for papers."
Students
"Learning data analysis has never been easier. AskPandas makes complex concepts accessible."
🚀 What's Next?
Version 0.2.0 (Coming Soon)
- Jupyter notebook integration
- More visualization options (Plotly, Bokeh)
- SQL query generation
- Data pipeline automation
Version 1.0.0 (Future)
- Enterprise features
- Advanced ML integration
- Real-time data streaming
- Community plugins
🤝 Get Help & Contribute
- 📖 Documentation: GitHub Wiki
- 🐛 Bug Reports: GitHub Issues
- 💬 Discussions: GitHub Discussions
- ⭐ Star: GitHub Repository
📄 License
MIT License - Use freely for personal and commercial projects!
🙏 Acknowledgments
- Ollama Team - Making local AI accessible
- HuggingFace - Open-source AI models
- Pandas Community - Amazing data tools
- Open Source Contributors - Building the future together
🎯 Complete Working Demonstration
Want to see everything in action? Run our comprehensive demo:
# Clone the repository
git clone https://github.com/irfanalidv/AskPandas.git
cd AskPandas
# Run the complete demonstration
python final_working_demo.py
This demo showcases:
- ✅ DataFrame Creation & Analysis - Real data processing
- ✅ Data Quality & Cleaning - Automatic column standardization
- ✅ AI-Powered Queries - Natural language analysis
- ✅ Multi-Dataset Analysis - Joining and complex queries
- ✅ Configuration Management - Customizable settings
- ✅ Query Intelligence - Automatic query categorization
🚀 Quick Test
import askpandas as ap
import pandas as pd
# Create test data
data = {'name': ['Alice', 'Bob'], 'age': [25, 30], 'salary': [50000, 60000]}
df = pd.DataFrame(data)
ap_df = ap.DataFrame(df)
# Test basic methods
print(f"Shape: {ap_df.shape()}") # Output: Shape: (2, 3)
print(f"Columns: {ap_df.columns()}") # Output: Columns: ['name', 'age', 'salary']
# Get comprehensive info
print(ap_df.info()) # Output: Detailed DataFrame information
# Statistical description
print(ap_df.describe()) # Output: Statistical summary
🚀 Ready to transform your data analysis? Install AskPandas today!
pip install askpandas
Made with ❤️ by Md Irfan Ali
File details
Details for the file askpandas-0.1.1.tar.gz.
File metadata
- Download URL: askpandas-0.1.1.tar.gz
- Size: 54.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.12.1.2 readme-renderer/44.0 requests/2.32.4 requests-toolbelt/1.0.0 urllib3/2.5.0 tqdm/4.67.1 importlib-metadata/8.5.0 keyring/25.6.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9703d3fcb852faec2687ab7a1befa34bca2de5fcd507b31f09c755bbda21b961 |
| MD5 | 52e8d8a0653233e36bb3a353ec7ffa14 |
| BLAKE2b-256 | 6743dbe230f54f55fc48f929e96f56df8a28677d0ffd6eb28ec2c5860af86a6e |
File details
Details for the file askpandas-0.1.1-py3-none-any.whl.
File metadata
- Download URL: askpandas-0.1.1-py3-none-any.whl
- Size: 51.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.12.1.2 readme-renderer/44.0 requests/2.32.4 requests-toolbelt/1.0.0 urllib3/2.5.0 tqdm/4.67.1 importlib-metadata/8.5.0 keyring/25.6.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e77f2e7c9bd763bcf8d13e7acc3cda2952550495c4e4f6a3f0538de322dbb1ea |
| MD5 | 753bc7b0627dda6836d84fb994ac0a25 |
| BLAKE2b-256 | f83e572d27fa76cde079fb52d5fb7168967ee0b534cf50ffa478e16e678a98af |
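The SHA256 digests listed for the two 0.1.1 files let you check a download before installing it. Here is a minimal sketch using only Python's standard library (`hashlib`); the helper names are illustrative:

```python
import hashlib
from pathlib import Path

# SHA256 digests published for askpandas 0.1.1
EXPECTED_SHA256 = {
    "askpandas-0.1.1.tar.gz":
        "9703d3fcb852faec2687ab7a1befa34bca2de5fcd507b31f09c755bbda21b961",
    "askpandas-0.1.1-py3-none-any.whl":
        "e77f2e7c9bd763bcf8d13e7acc3cda2952550495c4e4f6a3f0538de322dbb1ea",
}

def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute a file's SHA256 hex digest, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: str) -> bool:
    """True only if the file's name is known and its SHA256 matches."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

pip can also enforce digests itself in hash-checking mode (`pip install --require-hashes -r requirements.txt`, with a `--hash=sha256:...` entry per requirement).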