If you don't know what to do with your data, I got you, Bro!
Project description
BroInsight
An LLM-agnostic data analysis agent that helps people analyze their own data through natural language conversations.
Mission
Most people have data but lack the analytics or coding skills to extract meaningful insights. BroInsight bridges this gap by providing an intelligent agent that understands business questions and translates them into data analysis - no SQL or programming knowledge required.
How It Works
BroInsight uses a sophisticated flow-based architecture with dynamic routing and intelligent error recovery:
Flow Architecture
Start → UserInput → Organize → [3-way intelligent routing]
├─ "guide" → GuideQuestion → UserInput (loop)
├─ "select_metadata" → SelectMetadata → GenerateSQL → Retrieve → [error handling]
│ ├─ retry → GenerateSQL (up to 3x)
│ └─ fallback → Chat → UserInput
└─ "default" → Chat → UserInput (loop)
Core Actions
- UserInput - Handles input modes (ask/chat) and exit detection
- Organize - Intelligent 3-way routing (query/guide/default) based on user intent
- GuideQuestion - Provides data exploration suggestions and help
- SelectMetadata - Identifies relevant tables from available data
- GenerateSQL - Converts natural language to SQL with error context
- Retrieve - Executes queries with retry mechanism and graceful fallback
- Chat - Provides conversational insights and answers
Key Features
- LLM-Agnostic: Works with any LLM (OpenAI, Anthropic, local models)
- Natural Language: Ask questions in plain English
- Metadata-Driven: Understands your data structure automatically
- Conversational: Maintains context across multiple questions
- Jupyter-Ready: Designed for data science workflows
- Dynamic Routing: Intelligent flow control based on user intent and processing results
- Error Recovery: Automatic SQL retry with up to 3 attempts and graceful fallback
- User Guidance: Built-in help system for data exploration
Development Roadmap
Phase 1: Core Agent ✅ (v0.1.0)
- ✅ Basic BroInsight agent for Jupyter Lab
- ✅ One-shot Q&A with
ask()method - ✅ Interactive chat sessions with
chat()method - ✅ Natural language to SQL conversion
- ✅ Conversational data insights
- ✅ Happy path functionality working
Phase 2: Reliability & User Guidance ✅ (v0.1.1)
- ✅ SQL error retry mechanism with fallback
- ✅ Graceful failure recovery and user-friendly error messages
- ✅ Guided questions and data exploration assistance
- ✅ Enhanced routing with help/suggestion system
- ✅ Dynamic flow control with conditional routing
- ✅ Comprehensive error handling across all actions
- ✅ Empty result detection and user feedback
- ✅ Complete session logging with DuckDB
Phase 2.5: Separated Concerns Architecture ✅ (v0.1.2-0.1.3)
- ✅ DataCatalog: Universal data ingestion (pandas, local CSV, S3)
- ✅ BroInsight: Clean LLM interface with specialized methods
- ✅ Automatic Profiling: Data quality assessment and metadata generation
- ✅ Caching System: Profile once, use multiple times
- ✅ Relationship Management: Multi-table analysis support
- ✅ Multiple Output Formats: SQL metadata, guidance metadata, DQ profiles
Phase 3: Transparency & User Feedback 🔄 (v0.2.0)
- 📋 Session inspection and audit trails
- 📋 Query execution history and performance monitoring
- 📋 User feedback collection system
- 📋 Plugin architecture for extensibility
- 📋 Configuration-driven behavior
Phase 4: Visualization & Advanced Analytics
- 📋 Enhanced chart generation with context-aware suggestions
- 📋 Pattern discovery and anomaly detection
- 📋 Multi-LLM orchestration for specialized tasks
- 📋 Real-time data streaming support
Phase 5: Enterprise Integration
- 📋 BI tool integration (Tableau, PowerBI, Looker)
- 📋 Data platform connectors (Snowflake, Databricks, BigQuery)
- 📋 MLOps pipeline integration
- 📋 Workflow orchestration (Airflow, Prefect)
Phase 6: Collaborative Analytics
- 📋 Multi-user collaboration features
- 📋 Conversational data governance
- 📋 Cross-team integration (Slack, Teams)
- 📋 Automated documentation generation
Quick Start
Simple Analysis with DataCatalog + BroInsight
import pandas as pd
from broinsight.utils.data_catalog import DataCatalog
from broinsight.broinsight import BroInsight
from brollm import OpenAIModel # or your preferred LLM
# Setup
model = OpenAIModel(api_key="your-key")
broinsight = BroInsight(model)
catalog = DataCatalog()
# Load and profile your data
df = pd.read_csv("your_data.csv")
catalog.register("sales", df, "Sales transaction data")
catalog.profile_tables(["sales"])
# Add field descriptions for better analysis
from broinsight.utils.data_spec import FieldDescription, FieldDescriptions
descriptions = FieldDescriptions(descriptions=[
FieldDescription(field_name="amount", description="Transaction amount in USD"),
FieldDescription(field_name="customer_id", description="Unique customer identifier")
])
catalog.add_field_descriptions("sales", descriptions)
# Ask questions about your data
metadata = catalog.to_sql_metadata("sales")
response = broinsight.generate_sql(metadata, "What's the average transaction amount?")
print(response['content'])
# Get data quality insights
profile = catalog.to_dq_profile("sales")
quality_response = broinsight.assess_data_quality(profile, "Is my data ready for analysis?")
print(quality_response['content'])
# Get question suggestions
guide_metadata = catalog.to_guide_metadata("sales")
suggestions = broinsight.suggest_questions(guide_metadata, "I'm a sales manager")
print(suggestions['content'])
Multi-Table Analysis
# Load multiple related tables
catalog.register("customers", customers_df, "Customer information")
catalog.register("orders", orders_df, "Order transactions")
# Define relationships
catalog.add_relationship("orders", "customer_id", "customers", "customer_id")
# Profile all tables
catalog.profile_tables(["customers", "orders"])
# Analyze across tables
metadata = catalog.to_sql_metadata(["customers", "orders"])
response = broinsight.generate_sql(metadata, "Which customers have the highest total order value?")
Requirements
- Python 3.12+
- pandas>=2.3.2
- duckdb>=1.3.2
- pydantic>=2.11.7
- brollm>=0.1.2 (LLM interface)
- broflow>=0.1.5 (workflow engine)
- broprompt>=0.1.5 (prompt management)
Architecture
Separated Concerns Design
DataCatalog - Universal data management
- Ingests pandas DataFrames, local files, S3 data
- Automatic profiling with data quality assessment
- Relationship management for multi-table analysis
- Multiple metadata output formats for different use cases
BroInsight - LLM-powered analysis interface
- Specialized methods for different analysis tasks
- Clean separation between data management and AI logic
- Pluggable architecture for future enhancements
Prompt Hub - Specialized prompts for each capability
- SQL generation with complex query support
- Data quality assessment and recommendations
- Question guidance and exploration assistance
- Chart generation with context-aware suggestions
Current Status
BroInsight v0.1.3 represents a mature, production-ready data analysis agent with separated concerns architecture. The system is designed for maintainability and extensibility, with clear interfaces between data management (DataCatalog) and AI analysis (BroInsight).
Ready for Production Use:
- Comprehensive error handling and retry mechanisms
- Data quality assessment and profiling
- Multi-table relationship support
- LLM-agnostic design for flexibility
User Feedback Phase: We're currently collecting user feedback to inform Phase 3 development decisions. The architecture is designed to evolve based on real-world usage patterns while maintaining backward compatibility.
Contributing
BroInsight is in active development. We welcome contributions that help make data analysis more accessible to everyone.
License
MIT License - see LICENSE file for details.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file broinsight-0.1.8.tar.gz.
File metadata
- Download URL: broinsight-0.1.8.tar.gz
- Upload date:
- Size: 34.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0dd7fe597460eee642f4b51dc2f577bbca42c0d934c697cbaa6e4d52ed953a7b
|
|
| MD5 |
60ed1c2fe04be926a1fb6c582bc01cc5
|
|
| BLAKE2b-256 |
26a25c45bf1a0153fbd35c277ee61e5bccb61816a40cb684ba8e5eafd434bf5d
|
File details
Details for the file broinsight-0.1.8-py3-none-any.whl.
File metadata
- Download URL: broinsight-0.1.8-py3-none-any.whl
- Upload date:
- Size: 43.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6951ae1df805a6a3cb800dad25672f59e1997b40d3ec584f76c993a5e98c5352
|
|
| MD5 |
c806f5a319046c069c1d852768312e22
|
|
| BLAKE2b-256 |
166abd861099a926aae82189e5da729b8d5511eac520e12bca0ea2cea3ef25e1
|