Skip to main content

A lightweight library for dataset insights...

Project description

InsightAI 🚀

PyPI version Python 3.8+ License: MIT

A powerful open-source library that enables natural language conversations with your data using Large Language Models (LLMs). Transform complex data analysis into simple conversations - no coding required!

✨ Key Features

  • 🗣️ Natural Language Interface: Ask questions about your data in plain English
  • 🔌 Multiple Model Support: Works with OpenAI GPT models and Groq's high-speed inference
  • 🧠 Smart Analysis: Automatic code generation, data cleaning, and ML model suggestions
  • 🛠️ Error Recovery: Built-in debugging and error correction mechanisms
  • 📊 Auto-Visualization: Generates charts and graphs automatically
  • 💾 SQL Support: Native support for SQLite databases
  • 📈 Report Generation: Create comprehensive analysis reports automatically
  • 🔍 Data Quality Analysis: Identifies and fixes data quality issues
  • ⚡ Real-time Processing: Streaming responses for immediate feedback

🚀 Quick Start

Installation

pip install insightai

Environment Setup

Set up your API keys:

# Required: OpenAI API Key
export OPENAI_API_KEY="your-openai-api-key"

# Required: Groq API Key (for faster inference)
export GROQ_API_KEY="your-groq-api-key"

Basic Usage

import pandas as pd
from insightai import InsightAI

# Load your data
df = pd.read_csv('your_data.csv')

# Initialize InsightAI
ai = InsightAI(df)

# Start asking questions!
ai.pd_agent_converse("What are the main trends in this data?")

💡 Usage Examples

1. Interactive Data Analysis

import pandas as pd
from insightai import InsightAI

# Load sales data
df = pd.read_csv('sales_data.csv')
ai = InsightAI(df)

# Interactive mode - ask multiple questions
ai.pd_agent_converse()
# Now you can ask: "Show me monthly revenue trends"
# Or: "Which product category has the highest profit margin?"

2. Single Question Analysis

# Ask a specific question
ai = InsightAI(df)
ai.pd_agent_converse("What is the correlation between price and customer rating?")

3. SQL Database Analysis

# Analyze SQLite database
ai = InsightAI(db_path='customer_database.db')
ai.pd_agent_converse("Find the top 10 customers by total purchase amount")

4. Automated Report Generation

# Generate comprehensive analysis report
ai = InsightAI(df, generate_report=True, report_questions=5)
ai.pd_agent_converse()  # Generates a full report automatically

5. Data Cleaning and ML Suggestions

# Get data cleaning recommendations and ML model suggestions
ai = InsightAI(df)
ai.pd_agent_converse("Clean this dataset and suggest appropriate machine learning models")

🔧 Advanced Configuration

Constructor Parameters

InsightAI(
    df=None,                    # pandas DataFrame
    db_path=None,              # Path to SQLite database
    max_conversations=4,        # Conversation memory length
    debug=False,               # Enable debug mode
    exploratory=True,          # Enable exploratory analysis
    df_ontology=False,         # Enable data ontology support
    generate_report=True,      # Auto-generate reports
    report_questions=5         # Number of questions for reports
)

Custom Model Configuration

Create LLM_CONFIG.json in your working directory:

[
  {
    "agent": "Code Generator",
    "details": {
      "model": "gpt-4o",
      "provider": "openai",
      "max_tokens": 4000,
      "temperature": 0
    }
  },
  {
    "agent": "Planner",
    "details": {
      "model": "llama-3.3-70b-versatile",
      "provider": "groq",
      "max_tokens": 2000,
      "temperature": 0.1
    }
  }
]

Custom Prompts

Create PROMPT_TEMPLATES.json to customize agent behavior:

{
  "planner_system": "You are a data analysis expert...",
  "code_generator_system_df": "You are an AI data analyst..."
}

🎯 What You Can Ask

Data Exploration

  • "What does this dataset contain?"
  • "Show me the distribution of values in each column"
  • "Are there any missing values or outliers?"

Statistical Analysis

  • "What's the correlation between sales and marketing spend?"
  • "Perform a statistical summary of the numerical columns"
  • "Which factors most influence customer satisfaction?"

Visualizations

  • "Create a bar chart of revenue by product category"
  • "Plot the trend of monthly sales over time"
  • "Show me a correlation heatmap of all numerical variables"

Data Cleaning

  • "Clean this dataset and prepare it for machine learning"
  • "Handle missing values and suggest the best approach"
  • "Identify and fix data quality issues"

Machine Learning

  • "What machine learning models would work best for this data?"
  • "Prepare this data for predictive modeling"
  • "Suggest features for predicting customer churn"

Business Intelligence

  • "Generate a comprehensive analysis report"
  • "What are the key business insights from this data?"
  • "Create an executive summary of the findings"

📊 Output Examples

Automated Visualizations

InsightAI automatically saves visualizations to the visualization/ folder:

  • Bar charts, line plots, scatter plots
  • Correlation heatmaps
  • Distribution plots
  • Custom business charts

Analysis Reports

Generate professional markdown reports including:

  • Executive summary
  • Dataset overview
  • Key findings and insights
  • Recommendations
  • Supporting visualizations

Code Generation

View the actual Python code generated for your analysis:

# Example generated code
import pandas as pd
import matplotlib.pyplot as plt

# Calculate monthly revenue trends
monthly_revenue = df.groupby('month')['revenue'].sum()
plt.figure(figsize=(10, 6))
plt.plot(monthly_revenue.index, monthly_revenue.values)
plt.title('Monthly Revenue Trends')
plt.savefig('visualization/monthly_revenue_trends.png')
plt.show()

🏗️ Architecture

InsightAI uses a multi-agent architecture with specialized AI agents:

  • Expert Selector: Chooses the right agent for your task
  • Data Analyst: Performs statistical analysis and visualizations
  • SQL Analyst: Handles database queries and operations
  • Data Cleaning Expert: Identifies and fixes data quality issues
  • Code Generator: Creates Python code for your analysis
  • Error Corrector: Debugs and fixes code issues automatically
  • Report Generator: Creates comprehensive analysis reports

📈 Supported Models

OpenAI Models

  • GPT-4o, GPT-4o-mini
  • GPT-4 Turbo
  • O1 series models

Groq Models (High-Speed Inference)

  • Llama 3.3 70B
  • Llama 3.1 8B
  • Mixtral 8x7B
  • Gemma 2 9B

📝 Logging and Cost Tracking

All interactions are automatically logged with detailed cost tracking:

{
  "chain_id": "1234567890",
  "agent": "Code Generator",
  "model": "gpt-4o-mini",
  "tokens_used": 1500,
  "cost": 0.03,
  "duration": "2.3s"
}

View logs in: insightai_consolidated_log.json

🔒 Security Features

  • Input sanitization and validation
  • Code execution sandboxing
  • Blacklisted dangerous operations
  • Rate limiting and error handling

🎓 Examples and Tutorials

E-commerce Analysis

# Analyze online store data
df = pd.read_csv('ecommerce_data.csv')
ai = InsightAI(df)
ai.pd_agent_converse("Which products have the highest return rate and why?")

Financial Data Analysis

# Stock market analysis
ai = InsightAI()
ai.pd_agent_converse("Download Apple stock data for 2024 and analyze the trends")

Healthcare Data

# Patient data analysis (anonymized)
df = pd.read_csv('patient_outcomes.csv')
ai = InsightAI(df)
ai.pd_agent_converse("What factors correlate with better patient outcomes?")

🛠️ Development Setup

git clone https://github.com/LeoRigasaki/InSightAI.git
cd InsightAI
pip install -e ".[dev]"

🆕 Version 0.5.0 Release Notes

✨ New Features

  • Dynamic API Key Management: Only requires API keys for providers you actually use
  • Flexible Provider Support: Mix and match OpenAI, Groq, and Gemini models freely
  • Cost Optimization: Reduced overhead by eliminating unused API dependencies

🔧 Improvements

  • Smarter LLM configuration parsing
  • Better error messages for missing API keys
  • Enhanced provider validation

🐛 Bug Fixes

  • Fixed requirement for all API keys even when not needed
  • Improved initialization error handling

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Commit changes: git commit -am 'Add feature'
  4. Push to branch: git push origin feature-name
  5. Submit a Pull Request

⚠️ Known Limitations

  • Token limits vary by model (check your plan)
  • Large datasets may require chunking
  • Rate limiting depends on your API plan
  • Complex visualizations may need manual adjustment

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

  • Special thanks to pgalko for the original inspiration
  • OpenAI for providing powerful language models
  • Groq for high-performance inference capabilities
  • The open-source community for continuous improvements

💬 Support


Transform your data analysis workflow today with InsightAI - where natural language meets powerful analytics! 🚀

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

insightai-0.5.1.tar.gz (46.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

insightai-0.5.1-py3-none-any.whl (46.2 kB view details)

Uploaded Python 3

File details

Details for the file insightai-0.5.1.tar.gz.

File metadata

  • Download URL: insightai-0.5.1.tar.gz
  • Upload date:
  • Size: 46.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for insightai-0.5.1.tar.gz
Algorithm Hash digest
SHA256 b288f1d41d0606be8a989e5111a4c7754d199fc32b7cc34cae038025c81c9b09
MD5 8ef3c65fa79d08cb252ac365c6f70d81
BLAKE2b-256 28568077f7f42b0777cc0ccff2005602bba082570181ce25d92156ec5a94423b

See more details on using hashes here.

File details

Details for the file insightai-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: insightai-0.5.1-py3-none-any.whl
  • Upload date:
  • Size: 46.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for insightai-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 bb5173fc494a50abb37ec8287cc4b388375a429eb083982b1a1cdca8a379af58
MD5 579c2541a0cf0be5fcfdc388dd1e9050
BLAKE2b-256 d46961974676cd9c984be7a8aedbd4cba1ea60ce35c39593881c51d511127147

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page