AI-powered analysis framework for structured data files and databases - part of the unified analysis framework suite
Data Analysis Framework
Version 2.0.0 - Part of the unified analysis framework suite
📈 Purpose
Specialized framework for analyzing structured data (spreadsheets, databases, configuration files) with AI-powered pattern detection and safe agent query capabilities.
Important: This framework focuses on structured data access via natural language queries, not document chunking. For document processing, see the complementary frameworks below.
📦 Supported Formats
Spreadsheets & Tables
- Excel: XLSX, XLS with multiple sheets
- CSV/TSV: Delimiter detection and parsing
- Apache Parquet: Columnar data analysis
- JSON: Nested and flat structure analysis
- JSONL: Line-delimited JSON streams
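To illustrate what delimiter detection for CSV/TSV files can look like, here is a minimal sketch built on the standard library's `csv.Sniffer`. It is not the framework's own implementation; the helper names `detect_delimiter` and `read_rows` are hypothetical and exist only for this example.

```python
import csv
import io


def detect_delimiter(sample: str, candidates: str = ",\t;|") -> str:
    """Guess the delimiter of a delimited-text sample using csv.Sniffer."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer could not decide; fall back to a comma (an assumption).
        return ","


def read_rows(text: str) -> list[dict[str, str]]:
    """Parse delimited text into a list of row dictionaries keyed by header."""
    delimiter = detect_delimiter(text[:4096])
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Hypothetical tab-separated sample data.
tsv_text = "region\tunits\tprice\nnorth\t10\t2.50\nsouth\t7\t3.00\n"
rows = read_rows(tsv_text)
print(rows[0]["region"], len(rows))
```

Sniffing only a leading sample keeps detection cheap on large files, at the cost of possibly misreading files whose first lines are unrepresentative.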
Configuration Data
- YAML: Configuration files and data serialization
- TOML: Configuration file analysis
- INI: Legacy configuration parsing
- Environment Files: .env variable analysis
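As a rough picture of what configuration parsing involves, the sketch below reads INI text with the standard library's `configparser` and handles simple `.env` content with a small hand-written parser. Both `parse_ini` and `parse_env` are hypothetical helpers, and the `.env` rules shown (one `KEY=VALUE` per line, `#` comments, optional surrounding quotes) are an assumption rather than the framework's documented behavior.

```python
import configparser


def parse_ini(text: str) -> dict[str, dict[str, str]]:
    """Parse INI text into {section: {key: value}}; all values stay strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Hypothetical sample inputs.
ini_config = parse_ini("[database]\nhost = localhost\nport = 5432\n")
env_config = parse_env('# local settings\nDB_PATH="sales.db"\nDEBUG=true\n')
print(ini_config["database"]["port"], env_config["DEBUG"])
```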
Database Exports
- SQL Dumps: Schema and data analysis
- SQLite: Database file inspection
- Database Connection: Live data analysis
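For SQLite files, inspection typically starts with listing tables and their columns. The following sketch does this with the standard `sqlite3` module; `describe_sqlite` is a hypothetical helper, and the in-memory database stands in for a real file. It is not taken from the framework's source.

```python
import sqlite3


def describe_sqlite(conn: sqlite3.Connection) -> dict[str, list[tuple[str, str]]]:
    """Return {table_name: [(column_name, declared_type), ...]}."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # Quote the identifier so unusual table names are handled safely.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Hypothetical example database held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, revenue REAL)")
schema = describe_sqlite(conn)
print(schema)
```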
🤖 AI Integration Features
- Schema Detection: Automatic column type inference
- Pattern Analysis: Anomaly and trend detection
- Data Quality Assessment: Missing values, duplicates, outliers
- Relationship Discovery: Cross-table dependencies
- Business Logic Extraction: Rules and constraints
- Predictive Insights: Forecasting and recommendations
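To make schema detection and data quality assessment more concrete, here is a small pure-Python sketch that infers a type per column and counts missing cells and duplicate rows. The functions `infer_type` and `quality_report` and the report layout are assumptions made for illustration; the framework's actual heuristics and output fields may differ.

```python
from collections import Counter


def infer_type(values: list[str]) -> str:
    """Infer a column type from string cells, ignoring empty cells."""
    present = [v for v in values if v.strip()]
    if not present:
        return "empty"
    # Try the narrowest type first; fall through to "string" if none fits.
    for type_name, caster in (("integer", int), ("float", float)):
        try:
            for v in present:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "string"


def quality_report(rows: list[dict[str, str]]) -> dict:
    """Summarize inferred types, missing cells per column, and duplicate rows."""
    columns = list(rows[0]) if rows else []
    row_counts = Counter(tuple(row.get(c, "") for c in columns) for row in rows)
    return {
        "types": {c: infer_type([row.get(c, "") for row in rows]) for c in columns},
        "missing": {c: sum(1 for row in rows if not row.get(c, "").strip()) for c in columns},
        # Every repeat of an already-seen row counts as one duplicate.
        "duplicate_rows": sum(count - 1 for count in row_counts.values()),
    }


# Hypothetical sample rows, as they might come from a CSV reader.
sample_rows = [
    {"id": "1", "price": "2.5", "region": "north"},
    {"id": "2", "price": "", "region": "south"},
    {"id": "2", "price": "", "region": "south"},
]
report = quality_report(sample_rows)
print(report)
```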
🚀 Quick Start
```python
from data_analysis_framework import DataAnalyzer

analyzer = DataAnalyzer()
result = analyzer.analyze("sales_data.xlsx")

print(f"Data Type: {result.document_type.type_name}")
print(f"Schema: {result.analysis.schema_info}")
print(f"Quality Score: {result.analysis.quality_metrics['overall_score']}")
print(f"AI Insights: {result.analysis.ai_insights}")
```
🔄 Unified Interface Support
This framework now supports the unified interface standard, providing consistent access patterns across all analysis frameworks:
```python
import data_analysis_framework as daf

# Use the unified interface
result = daf.analyze_unified("sales_data.csv")

# All access patterns work consistently
doc_type = result['document_type']      # Dict access ✓
doc_type = result.document_type         # Attribute access ✓
doc_type = result.get('document_type')  # get() method ✓
as_dict = result.to_dict()              # Full dict conversion ✓

# Works the same across all frameworks
print(f"Framework: {result.framework}")    # 'data-analysis-framework'
print(f"Type: {result.document_type}")     # 'CSV Data'
print(f"Confidence: {result.confidence}")  # Quality-based confidence
print(f"AI opportunities: {result.ai_opportunities}")
```
The unified interface ensures compatibility when switching between frameworks or using multiple frameworks together.
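One way a result object can offer dict access, attribute access, `get()`, and `to_dict()` at the same time is sketched below. `ResultSketch` is a hypothetical stand-in written for this example, not the framework's `UnifiedAnalysisResult`, and the confidence value is invented.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class ResultSketch:
    """Hypothetical stand-in for a unified result; not the framework's class."""

    framework: str
    document_type: str
    confidence: float
    ai_opportunities: list[str] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        """Full conversion to a plain dictionary of the declared fields."""
        return asdict(self)

    def __getitem__(self, key: str) -> Any:
        # Dict-style access is limited to declared fields, so method names
        # such as "get" raise KeyError instead of leaking through.
        if key not in self.to_dict():
            raise KeyError(key)
        return getattr(self, key)

    def get(self, key: str, default: Any = None) -> Any:
        """Dict-style lookup that returns a default for unknown keys."""
        return self.to_dict().get(key, default)


sketch = ResultSketch(
    framework="data-analysis-framework",
    document_type="CSV Data",
    confidence=0.9,  # invented value for illustration
)
print(sketch["document_type"], sketch.document_type, sketch.get("document_type"))
```

Routing every access style through the same field set keeps the three patterns from drifting apart as fields are added.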
🏗️ Status
🚧 Active Development - Core functionality implemented, v2.0.0 adopts unified framework interfaces
🌐 Framework Suite
This framework is part of a unified suite of analysis frameworks, each optimized for different data types:
Document Processing Frameworks (Chunking-Based)
These frameworks chunk documents for RAG/LLM consumption:
- xml-analysis-framework - XML document analysis with 29+ specialized handlers (SCAP, Maven, Spring, etc.)
- docling-analysis-framework - Office documents, PDFs, and images using IBM Docling
- document-analysis-framework - General document processing and analysis
Data Access Framework (Query-Based)
This framework provides safe AI agent access to structured data:
- data-analysis-framework (this framework) - Structured data via natural language queries
Shared Foundation
- analysis-framework-base - Common interfaces and models for all frameworks
Key Differences
| Framework Type | Use Case | AI Integration | Output |
|---|---|---|---|
| Document Frameworks | "Chunk this manual for search" | RAG, semantic search | Text chunks for embeddings |
| Data Framework | "Show customers with revenue > $10M" | Natural language queries | Query results and insights |
When to Use What
- Processing documents? Use xml/docling/document frameworks to chunk content for vector search
- Querying databases/spreadsheets? Use data-analysis-framework for safe AI agent access
- Both? Combine them! Document frameworks for knowledge + data framework for operational queries
See CHUNKING_DECISION.md for detailed explanation of this framework's query-based approach.
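The idea of safe, query-based access can be sketched with the standard `sqlite3` module: accept only a single `SELECT` statement and ask SQLite to reject writes on the connection. This is an assumption about what "safe" could mean in practice, not the framework's actual mechanism; `run_read_only` and the sample table are hypothetical.

```python
import sqlite3


def run_read_only(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[tuple]:
    """Run a single SELECT statement and refuse anything else."""
    statement = sql.strip().rstrip(";").strip()
    # Conservative check: one statement, and it must start with SELECT.
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only a single SELECT statement is allowed")
    # query_only makes SQLite itself reject writes on this connection.
    conn.execute("PRAGMA query_only = ON")
    return conn.execute(statement, params).fetchall()


# Hypothetical sample data for "Show customers with revenue > $10M".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Acme", 12_500_000.0), ("Bolt", 4_000_000.0)],
)
conn.commit()

big_customers = run_read_only(
    conn, "SELECT name FROM customers WHERE revenue > ?", (10_000_000,)
)
print(big_customers)
```

Passing values as parameters instead of formatting them into the SQL string keeps user-supplied or model-generated values from altering the statement.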
📝 What's New in v2.0.0
- ✅ Adopted analysis-framework-base for unified interfaces
- ✅ Inherits from BaseAnalyzer for consistent API across frameworks
- ✅ Implements UnifiedAnalysisResult for standard result format
- ✅ Added get_supported_formats() method for format discovery
- ✅ 100% backward compatible - all existing code works unchanged
- ℹ️ Does not implement BaseChunker - uses query-based paradigm instead (see CHUNKING_DECISION.md)