
Xelytics-Core

Pure analytics engine for statistical analysis and insight generation.

Current Version: 0.2.0-alpha.1 (in development)
Status: Alpha (Phases 1-3 complete)

What's New in v0.2.0

Phase 1 (Foundation): Complete

  • Extended schemas for v0.2.0 features
  • Backward compatibility guaranteed

Phase 2 (Time Series Analysis): Complete

  • Time series detection and validation
  • Trend and seasonality decomposition (STL, classical)
  • ARIMA and Exponential Smoothing forecasting
  • Anomaly detection (Z-score, IQR, MAD, Isolation Forest)
  • Change point detection
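
The Z-score detector listed above flags points that sit far from the mean in standard-deviation units. A minimal sketch of the technique itself (plain Python, not Xelytics' internal implementation):

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=3.0):
    """Return indices of points more than `threshold` standard
    deviations away from the mean of the series."""
    mu = mean(values)
    sigma = stdev(values)
    return [i for i, v in enumerate(values)
            if sigma > 0 and abs(v - mu) / sigma > threshold]

data = [10, 11, 9, 10, 12, 10, 11, 95, 10, 9]
print(zscore_anomalies(data, threshold=2.0))  # → [7]
```

The IQR and MAD variants follow the same shape, swapping in a robust spread estimate for the standard deviation.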

Phase 3 (Clustering): Complete

  • K-Means clustering with optimal K selection
  • DBSCAN density-based clustering
  • Hierarchical/Agglomerative clustering
  • Cluster profiling and characterization
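
K-Means itself is simple enough to sketch. The loop below (plain Python on 1-D data, illustrating the algorithm rather than the library's implementation) alternates the assignment and centroid-update steps of Lloyd's algorithm:

```python
def kmeans_1d(values, centroids, iterations=20):
    """Lloyd's algorithm on 1-D data: assign each point to its
    nearest centroid, then recompute each centroid as the mean
    of its assigned points."""
    for _ in range(iterations):
        clusters = {c: [] for c in centroids}
        for v in values:
            nearest = min(centroids, key=lambda c: abs(v - c))
            clusters[nearest].append(v)
        centroids = [sum(pts) / len(pts) if pts else c
                     for c, pts in clusters.items()]
    return sorted(centroids)

print(kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0.0, 5.0]))  # → [2.0, 11.0]
```

Optimal-K selection typically wraps a loop like this, scoring each candidate K with a criterion such as the silhouette coefficient.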

🚧 Coming Soon (Phases 4-7):

  • Performance optimization (parallel processing, caching)
  • Enhanced LLM providers (Anthropic, Azure, Gemini)
  • Database connectors (PostgreSQL, MySQL, BigQuery)
  • Interactive HTML report generation

Installation

# From PyPI:
pip install xelytics-core

# Or, for development, an editable install from a local checkout:
pip install -e .

Quick Start

from xelytics import analyze, AnalysisConfig
import pandas as pd

# Load your data
df = pd.read_csv("data.csv")

# Run automated analysis
result = analyze(df, mode="automated")

# Access results
print(f"Analyzed {result.metadata.row_count} rows")
print(f"Found {len(result.statistics)} statistical tests")
print(f"Generated {len(result.visualizations)} visualizations")
print(f"Produced {len(result.insights)} insights")

# Export to JSON
json_output = result.to_json()
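
The JSON export can then be consumed by downstream tooling with the standard library. A sketch assuming `to_json()` serializes the documented output schema; the payload below is a hypothetical stand-in with the schema's top-level keys, so adapt the field access to the actual output:

```python
import json

# Hypothetical payload using the top-level keys from the output schema.
payload = json.dumps({
    "summary": {"row_count": 1000},
    "statistics": [{"test": "t-test", "p_value": 0.03}],
    "visualizations": [],
    "insights": [{"text": "Column A trends upward"}],
    "metadata": {"version": "0.2.0"},
})

result = json.loads(payload)
significant = [s for s in result["statistics"] if s["p_value"] < 0.05]
print(len(significant))  # → 1
```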

API Contract

from xelytics import analyze, AnalysisConfig, AnalysisResult

result = analyze(
    data=df,
    mode="automated",  # or "semi-automated"
    config=AnalysisConfig(
        significance_level=0.05,
        enable_llm_insights=True,
        max_visualizations=10,
    )
)

Output Schema

AnalysisResult(
    summary=DatasetSummary(...),
    statistics=[StatisticalTestResult(...), ...],
    visualizations=[VisualizationSpec(...), ...],
    insights=[Insight(...), ...],
    metadata=RunMetadata(...),
)

v0.2.0 Features

Time Series Analysis

from xelytics import analyze, AnalysisConfig

config = AnalysisConfig(
    enable_time_series=True,
    datetime_column='date',
    forecast_periods=30
)

result = analyze(df, config=config)

# Access time series results
for ts in result.time_series_analysis:
    print(f"Trend: {ts.has_trend}")
    print(f"Seasonality: {ts.has_seasonality}")
    print(f"Period: {ts.seasonal_period}")

Clustering

from xelytics import analyze, AnalysisConfig

config = AnalysisConfig(
    enable_clustering=True,
    clustering_algorithm='kmeans',  # or 'dbscan', 'hierarchical'
    max_clusters=5
)

result = analyze(df, config=config)

# Access cluster results
for cluster in result.clusters:
    print(f"Cluster {cluster.cluster_id}: {cluster.size} samples")
    print(f"Silhouette score: {cluster.silhouette_score}")

Standalone Modules

# Time Series
from xelytics.timeseries import decompose_time_series, forecast_time_series, detect_anomalies

decomposition = decompose_time_series(df, 'value', datetime_column='date', period=12)
forecast = forecast_time_series(df, 'value', periods=30, method='arima')
anomalies = detect_anomalies(df, 'value', method='zscore', threshold=3.0)

# Clustering
from xelytics.clustering import cluster_kmeans, cluster_dbscan, profile_clusters

kmeans_result, df_clustered = cluster_kmeans(df, n_clusters=5)
dbscan_result, df_clustered = cluster_dbscan(df, eps=0.5, min_samples=5)
profiles = profile_clusters(df, cluster_labels)  # cluster_labels: per-row labels from a clustering step above
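
Cluster profiling, as in `profile_clusters`, amounts to summarizing each feature within each cluster. A minimal sketch with plain dicts (per-cluster feature means; the real function's output format may differ):

```python
from collections import defaultdict

def profile(rows, labels):
    """Per-cluster mean of each numeric feature."""
    groups = defaultdict(list)
    for row, lab in zip(rows, labels):
        groups[lab].append(row)
    return {lab: {k: sum(r[k] for r in rs) / len(rs) for k in rs[0]}
            for lab, rs in groups.items()}

rows = [{"age": 20, "spend": 5}, {"age": 22, "spend": 7},
        {"age": 60, "spend": 40}, {"age": 64, "spend": 44}]
print(profile(rows, [0, 0, 1, 1]))
# → {0: {'age': 21.0, 'spend': 6.0}, 1: {'age': 62.0, 'spend': 42.0}}
```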

Design Principles

  1. Pure analytics engine - No HTTP, no database, no auth
  2. Deterministic - Same input = same output
  3. LLM is optional - Rule-based insights work without LLM
  4. Type-safe - All inputs/outputs are typed dataclasses
  5. Backward compatible - v0.1.0 code works in v0.2.0

License

MIT

Download files

Source distribution: xelytics_core-0.2.0.tar.gz (146.2 kB)
Built distribution: xelytics_core-0.2.0-py3-none-any.whl (138.9 kB, Python 3, any platform)

File details

Details for the file xelytics_core-0.2.0.tar.gz:

  • Size: 146.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Algorithm     Hash digest
SHA256        15f9d8040fd886d89403c1bf600843c127aa0bacd9ad4c26ff80738474b912d1
MD5           1478f24242ad31b315fc57441c9dba1e
BLAKE2b-256   957521a1823d6f2ddf0690c8e7b8f0e820e9f901fbba4b5b2f30c8ea9df506d5

Details for the file xelytics_core-0.2.0-py3-none-any.whl:

  • Size: 138.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Algorithm     Hash digest
SHA256        eb8267eaf20b65adfa3abb5c2183c702d6265bf7794f713b91f4f7e48dc20887
MD5           20a389b6699be4671db64ef3ca6b524e
BLAKE2b-256   c29972820a12de8d125da9c268ff0e7aa96c54adb815b3e14f1b6bc337964506
