

Data Analysis Framework

Version 2.0.0 - Part of the unified analysis framework suite

📈 Purpose

Specialized framework for analyzing structured data (spreadsheets, databases, configuration files) with AI-powered pattern detection and safe agent query capabilities.

Important: This framework focuses on structured data access via natural language queries, not document chunking. For document processing, see the complementary frameworks below.

📦 Supported Formats

Spreadsheets & Tables

  • Excel: XLSX, XLS with multiple sheets
  • CSV/TSV: Delimiter detection and parsing
  • Apache Parquet: Columnar data analysis
  • JSON: Nested and flat structure analysis
  • JSONL: Line-delimited JSON streams
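As one illustration of how delimiter detection might work, the standard library's csv.Sniffer can guess the separator before parsing. This is a generic sketch of the technique, not this framework's internals:

```python
import csv
import io

def parse_delimited(text: str):
    """Detect the delimiter of a CSV/TSV snippet, then parse it into rows."""
    dialect = csv.Sniffer().sniff(text, delimiters=",\t;|")
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows

# Tab-separated input is recognized without being told the format.
delim, rows = parse_delimited("name\tregion\nAda\tEU\n")
```

Restricting sniff() to a known set of candidate delimiters makes detection more robust on short samples.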

Configuration Data

  • YAML: Configuration files and data serialization
  • TOML: Configuration file analysis
  • INI: Legacy configuration parsing
  • Environment Files: .env variable analysis

Database Exports

  • SQL Dumps: Schema and data analysis
  • SQLite: Database file inspection
  • Database Connection: Live data analysis
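Database-file inspection of the kind listed above can be sketched with the standard sqlite3 module; inspect_schema here is an illustrative helper, not part of this framework's API:

```python
import sqlite3

def inspect_schema(conn):
    """Map each table in a SQLite database to its column names."""
    schema = {}
    for (table,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"):
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        schema[table] = cols
    return schema

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
schema = inspect_schema(conn)
```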

🤖 AI Integration Features

  • Schema Detection: Automatic column type inference
  • Pattern Analysis: Anomaly and trend detection
  • Data Quality Assessment: Missing values, duplicates, outliers
  • Relationship Discovery: Cross-table dependencies
  • Business Logic Extraction: Rules and constraints
  • Predictive Insights: Forecasting and recommendations
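To make "Data Quality Assessment" concrete, here is a minimal, framework-independent sketch that counts missing values and exact duplicate rows and derives a naive overall score; the framework's actual scoring is presumably more sophisticated:

```python
def quality_metrics(rows):
    """Naive quality assessment over a list of row dicts: count missing
    values and duplicate rows, then derive a simple overall score."""
    total_cells = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row.values()
                  if value in (None, ""))
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))
        duplicates += key in seen
        seen.add(key)
    score = 1.0 - (missing + duplicates) / max(total_cells, 1)
    return {"missing": missing, "duplicates": duplicates,
            "overall_score": round(score, 2)}

metrics = quality_metrics([
    {"customer": "Acme", "region": "EU"},
    {"customer": "Acme", "region": "EU"},   # exact duplicate
    {"customer": "Globex", "region": ""},   # missing value
])
```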

🚀 Quick Start

from data_analysis_framework import DataAnalyzer

analyzer = DataAnalyzer()
result = analyzer.analyze("sales_data.xlsx")

print(f"Data Type: {result.document_type.type_name}")
print(f"Schema: {result.analysis.schema_info}")
print(f"Quality Score: {result.analysis.quality_metrics['overall_score']}")
print(f"AI Insights: {result.analysis.ai_insights}")

🔄 Unified Interface Support

This framework now supports the unified interface standard, providing consistent access patterns across all analysis frameworks:

import data_analysis_framework as daf

# Use the unified interface
result = daf.analyze_unified("sales_data.csv")

# All access patterns work consistently
doc_type = result['document_type']        # Dict access ✓
doc_type = result.document_type           # Attribute access ✓
doc_type = result.get('document_type')    # get() method ✓
as_dict = result.to_dict()                # Full dict conversion ✓

# Works the same across all frameworks
print(f"Framework: {result.framework}")   # 'data-analysis-framework'
print(f"Type: {result.document_type}")    # 'CSV Data'
print(f"Confidence: {result.confidence}")  # Quality-based confidence
print(f"AI opportunities: {result.ai_opportunities}")

The unified interface ensures compatibility when switching between frameworks or using multiple frameworks together.
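The access patterns above can be emulated with a small dict subclass; this is a toy model of the unified result's behaviour, not the actual UnifiedAnalysisResult implementation:

```python
class UnifiedResult(dict):
    """Result object supporting dict access, attribute access, .get(),
    and .to_dict() interchangeably."""

    def __getattr__(self, name):
        # Fall back to dict lookup so result.key mirrors result["key"].
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def to_dict(self):
        return dict(self)

result = UnifiedResult(framework="data-analysis-framework",
                       document_type="CSV Data")
```

Subclassing dict keeps .get() and iteration for free; only attribute fallback and to_dict() need defining.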

🏗️ Status

🚧 Active Development - core functionality is implemented; v2.0.0 adopts the unified framework interfaces

🌐 Framework Suite

This framework is part of a unified suite of analysis frameworks, each optimized for different data types:

Document Processing Frameworks (Chunking-Based)

These frameworks chunk documents for RAG/LLM consumption:

Data Access Framework (Query-Based)

This framework provides safe AI agent access to structured data:

Shared Foundation

Key Differences

Framework Type      | Use Case                             | AI Integration           | Output
--------------------+--------------------------------------+--------------------------+---------------------------
Document Frameworks | "Chunk this manual for search"       | RAG, semantic search     | Text chunks for embeddings
Data Framework      | "Show customers with revenue > $10M" | Natural language queries | Query results and insights
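The query-based paradigm in the table can be sketched as a guard that only lets read-only statements reach the data; safe_query is an illustrative name, not this framework's API:

```python
import sqlite3

def safe_query(conn, sql):
    """Reject anything that is not a single read-only SELECT, then execute."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped or not stripped.lower().startswith("select"):
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(stripped).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, revenue REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("Acme", 12_000_000), ("Globex", 8_500_000)])

# An agent's natural-language request becomes a vetted SQL query.
rows = safe_query(conn, "SELECT name FROM customers WHERE revenue > 10000000")
```

Refusing statement chaining and non-SELECT verbs is a simple first line of defense; a production guard would also parse the SQL properly.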

When to Use What

  • Processing documents? Use xml/docling/document frameworks to chunk content for vector search
  • Querying databases/spreadsheets? Use data-analysis-framework for safe AI agent access
  • Both? Combine them! Document frameworks for knowledge + data framework for operational queries

See CHUNKING_DECISION.md for detailed explanation of this framework's query-based approach.

📝 What's New in v2.0.0

  • ✅ Adopted analysis-framework-base for unified interfaces
  • ✅ Inherits from BaseAnalyzer for consistent API across frameworks
  • ✅ Implements UnifiedAnalysisResult for standard result format
  • ✅ Added get_supported_formats() method for format discovery
  • ✅ 100% backward compatible - all existing code works unchanged
  • ℹ️ Does not implement BaseChunker - uses query-based paradigm instead (see CHUNKING_DECISION.md)
