A lightweight library for dataset insights...

InsightAI 🚀

Python 3.8+ | License: MIT

An open-source library that enables natural language conversations with your data using Large Language Models (LLMs). After a few lines of Python setup, you ask questions in plain English instead of writing the analysis code yourself.

✨ Key Features

  • 🗣️ Natural Language Interface: Ask questions about your data in plain English
  • 🔌 Multiple Model Support: Works with OpenAI GPT models and Groq's high-speed inference
  • 🧠 Smart Analysis: Automatic code generation, data cleaning, and ML model suggestions
  • 🛠️ Error Recovery: Built-in debugging and error correction mechanisms
  • 📊 Auto-Visualization: Generates charts and graphs automatically
  • 💾 SQL Support: Native support for SQLite databases
  • 📈 Report Generation: Create comprehensive analysis reports automatically
  • 🔍 Data Quality Analysis: Identifies and fixes data quality issues
  • ⚡ Real-time Processing: Streaming responses for immediate feedback

🚀 Quick Start

Installation

pip install insightai-core

Environment Setup

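Set the API keys for the providers you plan to use (keys for unused providers are not required):

# OpenAI API key (needed only if you use OpenAI models)
export OPENAI_API_KEY="your-openai-api-key"

# Groq API key (needed only if you use Groq models for faster inference)
export GROQ_API_KEY="your-groq-api-key"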
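
Before starting a session, it can help to confirm which keys are actually visible to Python. The sketch below is not part of InsightAI: `missing_keys` and `PROVIDER_ENV_VARS` are hypothetical names, and the mapping only restates the two environment variables shown above.

```python
import os

# Environment variable expected for each provider (restated from the setup above).
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def missing_keys(providers, env=None):
    """Return the env var names that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    return [
        PROVIDER_ENV_VARS[name]
        for name in providers
        if not env.get(PROVIDER_ENV_VARS[name])
    ]


if __name__ == "__main__":
    # Hypothetical helper, for illustration only: list the providers you intend to use.
    missing = missing_keys(["openai", "groq"])
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All requested API keys are set.")
```

Passing only the providers you use keeps the check in line with the idea that unused providers need no key.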

Basic Usage

import pandas as pd
from insightai import InsightAI

# Load your data
df = pd.read_csv('your_data.csv')

# Initialize InsightAI
ai = InsightAI(df)

# Start asking questions!
ai.pd_agent_converse("What are the main trends in this data?")

💡 Usage Examples

1. Interactive Data Analysis

import pandas as pd
from insightai import InsightAI

# Load sales data
df = pd.read_csv('sales_data.csv')
ai = InsightAI(df)

# Interactive mode - ask multiple questions
ai.pd_agent_converse()
# Now you can ask: "Show me monthly revenue trends"
# Or: "Which product category has the highest profit margin?"

2. Single Question Analysis

# Ask a specific question
ai = InsightAI(df)
ai.pd_agent_converse("What is the correlation between price and customer rating?")

3. SQL Database Analysis

# Analyze SQLite database
ai = InsightAI(db_path='customer_database.db')
ai.pd_agent_converse("Find the top 10 customers by total purchase amount")

4. Automated Report Generation

# Generate comprehensive analysis report
ai = InsightAI(df, generate_report=True, report_questions=5)
ai.pd_agent_converse()  # Generates a full report automatically

5. Data Cleaning and ML Suggestions

# Get data cleaning recommendations and ML model suggestions
ai = InsightAI(df)
ai.pd_agent_converse("Clean this dataset and suggest appropriate machine learning models")

🔧 Advanced Configuration

Constructor Parameters

InsightAI(
    df=None,                    # pandas DataFrame
    db_path=None,              # Path to SQLite database
    max_conversations=4,        # Conversation memory length
    debug=False,               # Enable debug mode
    exploratory=True,          # Enable exploratory analysis
    df_ontology=False,         # Enable data ontology support
    generate_report=True,      # Auto-generate reports
    report_questions=5         # Number of questions for reports
)

Custom Model Configuration

Create LLM_CONFIG.json in your working directory:

[
  {
    "agent": "Code Generator",
    "details": {
      "model": "gpt-4o",
      "provider": "openai",
      "max_tokens": 4000,
      "temperature": 0
    }
  },
  {
    "agent": "Planner",
    "details": {
      "model": "llama-3.3-70b-versatile",
      "provider": "groq",
      "max_tokens": 2000,
      "temperature": 0.1
    }
  }
]
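
If you prefer to create the file from Python, the following sketch writes the same structure with the standard `json` module. `check_config` and `write_config` are hypothetical helpers, and the set of required keys is only inferred from the example above; the library itself may accept more or fewer fields.

```python
import json
from pathlib import Path

# Same structure as the JSON above: one entry per agent.
llm_config = [
    {
        "agent": "Code Generator",
        "details": {"model": "gpt-4o", "provider": "openai",
                    "max_tokens": 4000, "temperature": 0},
    },
    {
        "agent": "Planner",
        "details": {"model": "llama-3.3-70b-versatile", "provider": "groq",
                    "max_tokens": 2000, "temperature": 0.1},
    },
]

# Assumption: these are the keys every entry needs (inferred from the example only).
REQUIRED_DETAILS = {"model", "provider", "max_tokens", "temperature"}


def check_config(entries):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for i, entry in enumerate(entries):
        if "agent" not in entry:
            problems.append(f"entry {i}: missing 'agent'")
        missing = REQUIRED_DETAILS.difference(entry.get("details", {}))
        if missing:
            problems.append(f"entry {i}: missing details {sorted(missing)}")
    return problems


def write_config(entries, path="LLM_CONFIG.json"):
    """Validate the entries, then write them as pretty-printed JSON."""
    problems = check_config(entries)
    if problems:
        raise ValueError("; ".join(problems))
    target = Path(path)
    target.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return target
```

Calling `write_config(llm_config)` from your working directory produces the file in the location described above.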

Custom Prompts

Create PROMPT_TEMPLATES.json to customize agent behavior:

{
  "planner_system": "You are a data analysis expert...",
  "code_generator_system_df": "You are an AI data analyst..."
}

🎯 What You Can Ask

Data Exploration

  • "What does this dataset contain?"
  • "Show me the distribution of values in each column"
  • "Are there any missing values or outliers?"

Statistical Analysis

  • "What's the correlation between sales and marketing spend?"
  • "Perform a statistical summary of the numerical columns"
  • "Which factors most influence customer satisfaction?"

Visualizations

  • "Create a bar chart of revenue by product category"
  • "Plot the trend of monthly sales over time"
  • "Show me a correlation heatmap of all numerical variables"

Data Cleaning

  • "Clean this dataset and prepare it for machine learning"
  • "Handle missing values and suggest the best approach"
  • "Identify and fix data quality issues"

Machine Learning

  • "What machine learning models would work best for this data?"
  • "Prepare this data for predictive modeling"
  • "Suggest features for predicting customer churn"

Business Intelligence

  • "Generate a comprehensive analysis report"
  • "What are the key business insights from this data?"
  • "Create an executive summary of the findings"

📊 Output Examples

Automated Visualizations

InsightAI automatically saves visualizations to the visualization/ folder:

  • Bar charts, line plots, scatter plots
  • Correlation heatmaps
  • Distribution plots
  • Custom business charts
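
To see what a session produced, you can list the saved files yourself. This is a small sketch using only the standard library; `list_charts` is a hypothetical helper, and the `*.png` pattern is an assumption based on the generated-code example in this README, so adjust it if your charts use another format.

```python
from pathlib import Path


def list_charts(folder="visualization", pattern="*.png"):
    """Return saved chart files, newest first; an empty list if the folder is absent."""
    root = Path(folder)
    if not root.is_dir():
        return []
    # Sort by modification time so the most recent chart comes first.
    return sorted(root.glob(pattern), key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for chart in list_charts():
        print(chart)
```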

Analysis Reports

Generate professional markdown reports including:

  • Executive summary
  • Dataset overview
  • Key findings and insights
  • Recommendations
  • Supporting visualizations

Code Generation

View the actual Python code generated for your analysis:

# Example generated code
import pandas as pd
import matplotlib.pyplot as plt

# Calculate monthly revenue trends
monthly_revenue = df.groupby('month')['revenue'].sum()
plt.figure(figsize=(10, 6))
plt.plot(monthly_revenue.index, monthly_revenue.values)
plt.title('Monthly Revenue Trends')
plt.savefig('visualization/monthly_revenue_trends.png')
plt.show()

🏗️ Architecture

InsightAI uses a multi-agent architecture with specialized AI agents:

  • Expert Selector: Chooses the right agent for your task
  • Data Analyst: Performs statistical analysis and visualizations
  • SQL Analyst: Handles database queries and operations
  • Data Cleaning Expert: Identifies and fixes data quality issues
  • Code Generator: Creates Python code for your analysis
  • Error Corrector: Debugs and fixes code issues automatically
  • Report Generator: Creates comprehensive analysis reports

📈 Supported Models

OpenAI Models

  • GPT-4o, GPT-4o-mini
  • GPT-4 Turbo
  • O1 series models

Groq Models (High-Speed Inference)

  • Llama 3.3 70B
  • Llama 3.1 8B
  • Mixtral 8x7B
  • Gemma 2 9B

📝 Logging and Cost Tracking

All interactions are automatically logged with detailed cost tracking:

{
  "chain_id": "1234567890",
  "agent": "Code Generator",
  "model": "gpt-4o-mini",
  "tokens_used": 1500,
  "cost": 0.03,
  "duration": "2.3s"
}

View logs in: insightai_consolidated_log.json
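
The log can be summarized with a few lines of Python. The sketch below assumes the file holds a JSON array of entries shaped like the example above (or a single such entry); the real layout may differ, and `load_records` and `summarize_costs` are hypothetical helpers rather than part of the library.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_records(path="insightai_consolidated_log.json"):
    """Load log entries. Assumption: a JSON array of entries, or one entry object."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data if isinstance(data, list) else [data]


def summarize_costs(records):
    """Total tokens and cost per agent from records shaped like the example entry."""
    totals = defaultdict(lambda: {"tokens_used": 0, "cost": 0.0})
    for rec in records:
        agent = rec.get("agent", "unknown")
        totals[agent]["tokens_used"] += rec.get("tokens_used", 0)
        totals[agent]["cost"] += rec.get("cost", 0.0)
    return dict(totals)


if __name__ == "__main__":
    for agent, total in summarize_costs(load_records()).items():
        print(f"{agent}: {total['tokens_used']} tokens, cost {total['cost']:.4f}")
```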

🔒 Security Features

  • Input sanitization and validation
  • Code execution sandboxing
  • Blacklisted dangerous operations
  • Rate limiting and error handling
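
To illustrate what blacklisting dangerous operations can look like in general, here is a minimal static check built on Python's `ast` module. It is not InsightAI's implementation: the blocked names and the `find_violations` helper are assumptions chosen for the example.

```python
import ast

# Hypothetical blacklists for illustration; not InsightAI's actual lists.
BLOCKED_MODULES = {"os", "subprocess", "shutil", "socket"}
BLOCKED_CALLS = {"eval", "exec", "__import__"}


def find_violations(source):
    """Return descriptions of blacklisted imports and calls found in the source."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    violations.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_MODULES:
                violations.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"call to {node.func.id}()")
    return violations


if __name__ == "__main__":
    generated = "import subprocess\nsubprocess.run(['ls'])"
    print(find_violations(generated))
```

A static scan like this is easy to bypass, so it complements sandboxed execution rather than replacing it.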

🎓 Examples and Tutorials

E-commerce Analysis

import pandas as pd
from insightai import InsightAI

# Analyze online store data
df = pd.read_csv('ecommerce_data.csv')
ai = InsightAI(df)
ai.pd_agent_converse("Which products have the highest return rate and why?")

Financial Data Analysis

# Stock market analysis
ai = InsightAI()
ai.pd_agent_converse("Download Apple stock data for 2024 and analyze the trends")

Healthcare Data

# Patient data analysis (anonymized)
df = pd.read_csv('patient_outcomes.csv')
ai = InsightAI(df)
ai.pd_agent_converse("What factors correlate with better patient outcomes?")

🛠️ Development Setup

git clone https://github.com/LeoRigasaki/InSightAI.git
cd InSightAI
pip install -e ".[dev]"

🆕 Version 0.5.0 Release Notes

✨ New Features

  • Dynamic API Key Management: Only requires API keys for providers you actually use
  • Flexible Provider Support: Mix and match OpenAI, Groq, and Gemini models freely
  • Cost Optimization: Reduced overhead by eliminating unused API dependencies

🔧 Improvements

  • Smarter LLM configuration parsing
  • Better error messages for missing API keys
  • Enhanced provider validation

🐛 Bug Fixes

  • Fixed requirement for all API keys even when not needed
  • Improved initialization error handling

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Commit changes: git commit -am 'Add feature'
  4. Push to branch: git push origin feature-name
  5. Submit a Pull Request

⚠️ Known Limitations

  • Token limits vary by model (check your plan)
  • Large datasets may require chunking
  • Rate limiting depends on your API plan
  • Complex visualizations may need manual adjustment
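
For large datasets, one workaround is to down-sample the file before loading it. The sketch below uses reservoir sampling from the standard library to keep a uniform random subset of rows; `sample_csv` is a hypothetical helper, and it assumes the CSV has a header row.

```python
import csv
import random


def sample_csv(src, dst, k, seed=0):
    """Write the header plus a uniform random sample of up to k rows from src to dst."""
    rng = random.Random(seed)
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes the first row is a header
        reservoir = []
        for i, row in enumerate(reader):
            if i < k:
                reservoir.append(row)
            else:
                # Reservoir sampling: row i replaces a kept row with probability k / (i + 1).
                j = rng.randint(0, i)  # inclusive on both ends
                if j < k:
                    reservoir[j] = row
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

The resulting file can then be loaded with `pd.read_csv` and passed to `InsightAI` as in the Quick Start. Keep in mind that answers computed on a sample are estimates of the full dataset.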

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

  • Special thanks to pgalko for the original inspiration
  • OpenAI for providing powerful language models
  • Groq for high-performance inference capabilities
  • The open-source community for continuous improvements

Transform your data analysis workflow today with InsightAI - where natural language meets powerful analytics! 🚀
