A lightweight library for dataset insights...
InsightAI 🚀
A powerful open-source library that enables natural language conversations with your data using Large Language Models (LLMs). Transform complex data analysis into simple conversations - no coding required!
✨ Key Features
- 🗣️ Natural Language Interface: Ask questions about your data in plain English
- 🔌 Multiple Model Support: Works with OpenAI GPT models and Groq's high-speed inference
- 🧠 Smart Analysis: Automatic code generation, data cleaning, and ML model suggestions
- 🛠️ Error Recovery: Built-in debugging and error correction mechanisms
- 📊 Auto-Visualization: Generates charts and graphs automatically
- 💾 SQL Support: Native support for SQLite databases
- 📈 Report Generation: Create comprehensive analysis reports automatically
- 🔍 Data Quality Analysis: Identifies and fixes data quality issues
- ⚡ Real-time Processing: Streaming responses for immediate feedback
🚀 Quick Start
Installation
pip install insightai
Environment Setup
Set up your API keys:
# OpenAI API key (needed only if you use OpenAI models)
export OPENAI_API_KEY="your-openai-api-key"
# Groq API key (needed only if you use Groq models, for faster inference)
export GROQ_API_KEY="your-groq-api-key"
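As of version 0.5.0, only the keys for providers you actually use are required, so a quick pre-flight check can catch a missing variable before a session starts. The sketch below uses only the standard library. The helper name `missing_api_keys` and the provider-to-variable mapping are illustrative assumptions based on the two variables shown above, not part of InsightAI's API.

```python
import os

# Assumed mapping from provider name to environment variable, taken from the
# two variables shown above. It is not read from InsightAI itself.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def missing_api_keys(providers, environ=None):
    """Return the env var names that are unset or empty for the given providers."""
    env = os.environ if environ is None else environ
    missing = []
    for provider in providers:
        var = PROVIDER_ENV_VARS[provider.lower()]
        if not env.get(var):
            missing.append(var)
    return missing
```

Calling `missing_api_keys(["groq"])` before creating an `InsightAI` object returns an empty list when the Groq key is set, and `["GROQ_API_KEY"]` otherwise.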
Basic Usage
import pandas as pd
from insightai import InsightAI
# Load your data
df = pd.read_csv('your_data.csv')
# Initialize InsightAI
ai = InsightAI(df)
# Start asking questions!
ai.pd_agent_converse("What are the main trends in this data?")
💡 Usage Examples
1. Interactive Data Analysis
import pandas as pd
from insightai import InsightAI
# Load sales data
df = pd.read_csv('sales_data.csv')
ai = InsightAI(df)
# Interactive mode - ask multiple questions
ai.pd_agent_converse()
# Now you can ask: "Show me monthly revenue trends"
# Or: "Which product category has the highest profit margin?"
2. Single Question Analysis
# Ask a specific question
ai = InsightAI(df)
ai.pd_agent_converse("What is the correlation between price and customer rating?")
3. SQL Database Analysis
# Analyze SQLite database
ai = InsightAI(db_path='customer_database.db')
ai.pd_agent_converse("Find the top 10 customers by total purchase amount")
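To see what a question like the one above boils down to, here is a hand-written equivalent using Python's built-in `sqlite3` module. The `purchases` table and its `customer_id` and `amount` columns are hypothetical, as are both helper functions. InsightAI generates its own query from your actual schema, and that query may look different.

```python
import sqlite3


def top_customers(conn, limit=10):
    """Return (customer_id, total) rows ordered by total purchase amount, descending."""
    query = """
        SELECT customer_id, SUM(amount) AS total
        FROM purchases          -- hypothetical table name
        GROUP BY customer_id
        ORDER BY total DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def build_demo_db():
    """Create an in-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer_id TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?)",
        [("alice", 120.0), ("bob", 75.5), ("alice", 30.0), ("carol", 200.0)],
    )
    return conn
```

Running `top_customers(build_demo_db(), limit=2)` returns the two customers with the largest totals. Comparing a hand-written query like this with the generated one is a simple way to sanity-check results.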
4. Automated Report Generation
# Generate comprehensive analysis report
ai = InsightAI(df, generate_report=True, report_questions=5)
ai.pd_agent_converse() # Generates a full report automatically
5. Data Cleaning and ML Suggestions
# Get data cleaning recommendations and ML model suggestions
ai = InsightAI(df)
ai.pd_agent_converse("Clean this dataset and suggest appropriate machine learning models")
🔧 Advanced Configuration
Constructor Parameters
InsightAI(
    df=None,               # pandas DataFrame
    db_path=None,          # Path to SQLite database
    max_conversations=4,   # Conversation memory length
    debug=False,           # Enable debug mode
    exploratory=True,      # Enable exploratory analysis
    df_ontology=False,     # Enable data ontology support
    generate_report=True,  # Auto-generate reports
    report_questions=5     # Number of questions for reports
)
Custom Model Configuration
Create LLM_CONFIG.json in your working directory:
[
  {
    "agent": "Code Generator",
    "details": {
      "model": "gpt-4o",
      "provider": "openai",
      "max_tokens": 4000,
      "temperature": 0
    }
  },
  {
    "agent": "Planner",
    "details": {
      "model": "llama-3.3-70b-versatile",
      "provider": "groq",
      "max_tokens": 2000,
      "temperature": 0.1
    }
  }
]
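A hand-edited config file is easy to get wrong, so it can help to check it before starting a session. The sketch below validates the structure shown above using only the standard library. The required keys are inferred from the example, not from InsightAI's source, and `validate_llm_config` and `validate_llm_config_file` are hypothetical helpers.

```python
import json

# Keys inferred from the example above; treat them as an assumption.
REQUIRED_DETAIL_KEYS = {"model", "provider"}


def validate_llm_config(entries):
    """Return a list of human-readable problems; an empty list means none were found."""
    if not isinstance(entries, list):
        return ["top level must be a JSON array"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict) or "agent" not in entry:
            problems.append(f"entry {i}: missing 'agent'")
            continue
        details = entry.get("details")
        if not isinstance(details, dict):
            problems.append(f"entry {i} ({entry['agent']}): missing 'details' object")
            continue
        for key in sorted(REQUIRED_DETAIL_KEYS - set(details)):
            problems.append(f"entry {i} ({entry['agent']}): missing '{key}'")
    return problems


def validate_llm_config_file(path="LLM_CONFIG.json"):
    """Load the file at `path` and validate its contents."""
    with open(path, encoding="utf-8") as fh:
        return validate_llm_config(json.load(fh))
```

An empty list from `validate_llm_config_file()` means the file parsed as JSON and every entry has the fields used in the example. It does not confirm that the model names are valid for the chosen provider.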
Custom Prompts
Create PROMPT_TEMPLATES.json to customize agent behavior:
{
  "planner_system": "You are a data analysis expert...",
  "code_generator_system_df": "You are an AI data analyst..."
}
🎯 What You Can Ask
Data Exploration
- "What does this dataset contain?"
- "Show me the distribution of values in each column"
- "Are there any missing values or outliers?"
Statistical Analysis
- "What's the correlation between sales and marketing spend?"
- "Perform a statistical summary of the numerical columns"
- "Which factors most influence customer satisfaction?"
Visualizations
- "Create a bar chart of revenue by product category"
- "Plot the trend of monthly sales over time"
- "Show me a correlation heatmap of all numerical variables"
Data Cleaning
- "Clean this dataset and prepare it for machine learning"
- "Handle missing values and suggest the best approach"
- "Identify and fix data quality issues"
Machine Learning
- "What machine learning models would work best for this data?"
- "Prepare this data for predictive modeling"
- "Suggest features for predicting customer churn"
Business Intelligence
- "Generate a comprehensive analysis report"
- "What are the key business insights from this data?"
- "Create an executive summary of the findings"
📊 Output Examples
Automated Visualizations
InsightAI automatically saves visualizations to the visualization/ folder:
- Bar charts, line plots, scatter plots
- Correlation heatmaps
- Distribution plots
- Custom business charts
Analysis Reports
Generate professional markdown reports including:
- Executive summary
- Dataset overview
- Key findings and insights
- Recommendations
- Supporting visualizations
Code Generation
View the actual Python code generated for your analysis:
# Example generated code
import pandas as pd
import matplotlib.pyplot as plt
# Calculate monthly revenue trends
monthly_revenue = df.groupby('month')['revenue'].sum()
plt.figure(figsize=(10, 6))
plt.plot(monthly_revenue.index, monthly_revenue.values)
plt.title('Monthly Revenue Trends')
plt.savefig('visualization/monthly_revenue_trends.png')
plt.show()
🏗️ Architecture
InsightAI uses a multi-agent architecture with specialized AI agents:
- Expert Selector: Chooses the right agent for your task
- Data Analyst: Performs statistical analysis and visualizations
- SQL Analyst: Handles database queries and operations
- Data Cleaning Expert: Identifies and fixes data quality issues
- Code Generator: Creates Python code for your analysis
- Error Corrector: Debugs and fixes code issues automatically
- Report Generator: Creates comprehensive analysis reports
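The selector-then-specialist flow described above can be illustrated with a toy dispatcher. This is not InsightAI's implementation: keyword matching stands in for the LLM call an Expert Selector would make, and every function name here is hypothetical.

```python
def select_expert(question, has_database=False):
    """Pick an agent name for a question. Keyword rules stand in for an LLM decision."""
    text = question.lower()
    if has_database:
        return "SQL Analyst"
    if any(word in text for word in ("clean", "missing", "duplicate")):
        return "Data Cleaning Expert"
    if "report" in text:
        return "Report Generator"
    return "Data Analyst"


def route(question, handlers, has_database=False):
    """Dispatch the question to the handler registered for the selected expert."""
    expert = select_expert(question, has_database)
    return expert, handlers[expert](question)
```

Here `handlers` is a plain dict mapping agent names to callables. Keeping selection and execution separate means a specialist can be swapped or added without touching the routing logic.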
📈 Supported Models
OpenAI Models
- GPT-4o, GPT-4o-mini
- GPT-4 Turbo
- o1 series models
Groq Models (High-Speed Inference)
- Llama 3.3 70B
- Llama 3.1 8B
- Mixtral 8x7B
- Gemma 2 9B
📝 Logging and Cost Tracking
All interactions are automatically logged with detailed cost tracking:
{
  "chain_id": "1234567890",
  "agent": "Code Generator",
  "model": "gpt-4o-mini",
  "tokens_used": 1500,
  "cost": 0.03,
  "duration": "2.3s"
}
View logs in: insightai_consolidated_log.json
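For a per-agent cost breakdown, the log can be aggregated with a few lines of standard-library Python. This sketch assumes the log file holds a JSON array of entries shaped like the example above. The real file layout may differ, so adjust the loading step if needed. `summarize_costs` and `summarize_log_file` are hypothetical helpers.

```python
import json
from collections import defaultdict


def summarize_costs(entries):
    """Sum tokens and cost per agent from a list of log entries (dicts)."""
    totals = defaultdict(lambda: {"tokens_used": 0, "cost": 0.0})
    for entry in entries:
        agent = entry.get("agent", "unknown")
        totals[agent]["tokens_used"] += entry.get("tokens_used", 0)
        totals[agent]["cost"] += entry.get("cost", 0.0)
    return dict(totals)


def summarize_log_file(path="insightai_consolidated_log.json"):
    """Load the log (assumed to be a JSON array) and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize_costs(json.load(fh))
```

The result maps each agent name to its total tokens and cost, which makes it easy to spot which agent would benefit most from a cheaper model in LLM_CONFIG.json.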
🔒 Security Features
- Input sanitization and validation
- Code execution sandboxing
- Blacklisted dangerous operations
- Rate limiting and error handling
🎓 Examples and Tutorials
E-commerce Analysis
# Analyze online store data
df = pd.read_csv('ecommerce_data.csv')
ai = InsightAI(df)
ai.pd_agent_converse("Which products have the highest return rate and why?")
Financial Data Analysis
# Stock market analysis
ai = InsightAI()
ai.pd_agent_converse("Download Apple stock data for 2024 and analyze the trends")
Healthcare Data
# Patient data analysis (anonymized)
df = pd.read_csv('patient_outcomes.csv')
ai = InsightAI(df)
ai.pd_agent_converse("What factors correlate with better patient outcomes?")
🛠️ Development Setup
git clone https://github.com/LeoRigasaki/InSightAI.git
cd InSightAI
pip install -e ".[dev]"
🆕 Version 0.5.0 Release Notes
✨ New Features
- Dynamic API Key Management: Only requires API keys for providers you actually use
- Flexible Provider Support: Mix and match OpenAI, Groq, and Gemini models freely
- Cost Optimization: Reduced overhead by eliminating unused API dependencies
🔧 Improvements
- Smarter LLM configuration parsing
- Better error messages for missing API keys
- Enhanced provider validation
🐛 Bug Fixes
- Fixed requirement for all API keys even when not needed
- Improved initialization error handling
🤝 Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Commit changes: git commit -am 'Add feature'
- Push to branch: git push origin feature-name
- Submit a Pull Request
⚠️ Known Limitations
- Token limits vary by model (check your plan)
- Large datasets may require chunking
- Rate limiting depends on your API plan
- Complex visualizations may need manual adjustment
📄 License
MIT License - see LICENSE for details.
🙏 Acknowledgments
- Special thanks to pgalko for the original inspiration
- OpenAI for providing powerful language models
- Groq for high-performance inference capabilities
- The open-source community for continuous improvements
💬 Support
- 📧 Email: riorigasaki65@gmail.com
- 🐛 Issues: GitHub Issues
- 💡 Feature Requests: GitHub Discussions
Transform your data analysis workflow today with InsightAI - where natural language meets powerful analytics! 🚀
File details
Details for the file insightai-0.5.1.tar.gz.
File metadata
- Download URL: insightai-0.5.1.tar.gz
- Upload date:
- Size: 46.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b288f1d41d0606be8a989e5111a4c7754d199fc32b7cc34cae038025c81c9b09 |
| MD5 | 8ef3c65fa79d08cb252ac365c6f70d81 |
| BLAKE2b-256 | 28568077f7f42b0777cc0ccff2005602bba082570181ce25d92156ec5a94423b |
File details
Details for the file insightai-0.5.1-py3-none-any.whl.
File metadata
- Download URL: insightai-0.5.1-py3-none-any.whl
- Upload date:
- Size: 46.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb5173fc494a50abb37ec8287cc4b388375a429eb083982b1a1cdca8a379af58 |
| MD5 | 579c2541a0cf0be5fcfdc388dd1e9050 |
| BLAKE2b-256 | d46961974676cd9c984be7a8aedbd4cba1ea60ce35c39593881c51d511127147 |