chrono-correlator

Statistical correlation between time-series and discrete events, with optional LLM narration.

A generic statistical engine that correlates time-series data with discrete events using the Mann-Whitney U test, and narrates the results with an LLM only when p < 0.05.

Install

# Core (statistics only — no LLM required)
pip install chrono-correlator

# With specific LLM provider
pip install chrono-correlator[groq]
pip install chrono-correlator[anthropic]
pip install chrono-correlator[ollama]      # local, no API key

# Everything
pip install chrono-correlator[all]

Quick start

from datetime import datetime, timedelta
from chrono_correlator import Event, Metric, evaluate, narrate

base = datetime(2024, 1, 1)

events = [
    Event(timestamp=base + timedelta(days=d), label="migraine")
    for d in [10, 20, 30]
]

timestamps = [base + timedelta(hours=h) for h in range(800)]
values = [55.0] * 800
for day in [10, 20, 30]:
    for h in range(48):
        idx = day * 24 - 48 + h
        if 0 <= idx < 800:
            values[idx] = 28.0

hrv = Metric(name="hrv", timestamps=timestamps, values=values)

report = evaluate(events, [hrv])
print(f"Level: {report.level} | {report.active_signals}/{report.total_signals} signals")

if report.level != "green":
    report = narrate(report, provider="groq")
    print(report.narrative)

From a pandas DataFrame

import pandas as pd
from chrono_correlator import Metric

df = pd.read_csv("hrv_data.csv")   # columns: timestamp, value
hrv = Metric.from_dataframe(df, name="hrv", timestamp_col="timestamp", value_col="value")

Lag sweep — find the best anticipatory window automatically

from chrono_correlator import find_best_lag

results = find_best_lag(events, hrv, lag_range=range(0, 72, 6))

best = max(results, key=lambda k: results[k].association_strength)
print(f"Strongest signal at lag={best}h — association_strength={results[best].association_strength:.2f}")

Bootstrap confidence interval for effect size

report = evaluate(events, [hrv], bootstrap_ci=True)   # ~1s per metric
r = report.results[0]
print(f"Effect: {r.effect_size:.3f}  95% CI: [{r.effect_ci[0]:.3f}, {r.effect_ci[1]:.3f}]")

If the CI excludes 0, the effect is unlikely to be sampling noise.
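Under the hood this is a percentile bootstrap. A minimal pure-Python sketch of the idea (not the library's implementation, and using a simple median difference instead of the rank-biserial effect size to stay self-contained):

```python
import random

def bootstrap_ci(group_a, group_b, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample both groups with replacement,
    recompute the statistic, and take the empirical alpha/2 quantiles."""
    rng = random.Random(seed)
    boots = []
    for _ in range(n_boot):
        a = [rng.choice(group_a) for _ in group_a]
        b = [rng.choice(group_b) for _ in group_b]
        boots.append(stat(a, b))
    boots.sort()
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def median_diff(a, b):
    """Difference of (upper) medians, a minus b."""
    return sorted(a)[len(a) // 2] - sorted(b)[len(b) // 2]

# Pre-event values clearly below baseline: the CI should exclude 0.
pre = [28.0 + i * 0.1 for i in range(40)]
baseline = [55.0 + i * 0.1 for i in range(200)]
lo, hi = bootstrap_ci(pre, baseline, median_diff)
print(lo, hi)  # both negative: the drop is not sampling noise
```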

Seasonal baseline correction

# Compare pre-event window only against same day of the week in the baseline
# Eliminates false positives caused by weekly patterns (e.g. traffic every Friday)
report = evaluate(events, metrics, baseline_strategy="same_weekday")

# Compare against same hour of the day — for circadian metrics (HRV, temperature)
report = evaluate(events, metrics, baseline_strategy="same_hour")
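Conceptually, `same_weekday` restricts the baseline to samples that share the event's day of the week, so a weekly cycle contributes equally to both sides of the comparison. A sketch of the idea (`same_weekday_baseline` is illustrative, not the library's internals):

```python
from datetime import datetime, timedelta

def same_weekday_baseline(timestamps, values, event_ts):
    """Keep only baseline samples that fall on the same day of the week
    as the event, so weekly patterns cancel out of the comparison."""
    wd = event_ts.weekday()
    return [v for t, v in zip(timestamps, values) if t.weekday() == wd]

base = datetime(2024, 1, 1)                              # a Monday
ts = [base + timedelta(hours=h) for h in range(24 * 28)]  # 4-week hourly baseline
vals = [float(h) for h in range(len(ts))]
event = datetime(2024, 1, 29, 12)                        # also a Monday
subset = same_weekday_baseline(ts, vals, event)
print(len(subset))  # 4 Mondays x 24 hourly samples = 96
```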

Directional analysis

# Only flag metrics that DROP before events (e.g. HRV decrease before migraine)
report = evaluate(events, metrics, direction="decrease")

# Only flag metrics that RISE before events (e.g. heart rate spike before incident)
report = evaluate(events, metrics, direction="increase")

Custom significance thresholds

from chrono_correlator import SignificanceConfig

cfg = SignificanceConfig(alpha=0.01, strong_effect=0.35, strong_consistency=0.75)
report = evaluate(events, metrics, config=cfg)

Overlapping event windows

When two events are closer together than lookback_hours, evaluate() emits a UserWarning automatically:

UserWarning: Events 'migraine' (2024-01-10) and 'migraine' (2024-01-11) are 24h apart —
pre-event windows overlap (lookback=48h). Pooled results may be inflated.
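The check itself is just a gap comparison between consecutive events. A sketch in plain Python (`overlapping_pairs` is a hypothetical helper, not part of the package):

```python
from datetime import datetime, timedelta

def overlapping_pairs(event_times, lookback_hours=48):
    """Return consecutive event pairs closer together than the lookback
    window, i.e. pairs whose pre-event windows overlap."""
    times = sorted(event_times)
    window = timedelta(hours=lookback_hours)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a < window]

events = [datetime(2024, 1, 10), datetime(2024, 1, 11), datetime(2024, 1, 20)]
pairs = overlapping_pairs(events)
print(len(pairs))  # 1: Jan 10 and Jan 11 are only 24h apart
```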

Persistence — save and reload reports

from chrono_correlator import save_report, load_reports
from datetime import datetime, timedelta

# Save to SQLite (stdlib, no extra dependencies)
row_id = save_report(report, db_path="chrono.db")

# Load all reports
history = load_reports("chrono.db")

# Filter by level or time window
alerts = load_reports("chrono.db", level="red")
recent  = load_reports("chrono.db", since=datetime.now() - timedelta(days=7))

Export to HTML and Markdown

from chrono_correlator import export_html, export_markdown

export_html(report, "report.html")         # self-contained HTML with table + narratives
export_markdown(report, "report.md")       # GitHub-ready Markdown — paste into issues/PRs

LLM narration with audit trail

# Every LLM call is logged to a JSONL file: stats + prompt + response
# Required for audits in regulated environments (health, industry)
report = narrate(report, provider="groq", audit_log="audit.jsonl")

Each audit entry:

{
  "ts": "2024-06-01T14:23:11",
  "metric": "hrv",
  "stats": {"p_value": 0.003, "effect_size": -0.41, "association_strength": 0.68, ...},
  "prompt": "Datos estadísticos CALCULADOS...",
  "response": "Patrón detectado en HRV antes del evento."
}

Multi-domain narration presets

Four built-in language/domain presets — pass the key to narrate(), BaseNarrator, or the CLI:

Key       Language   Designed for
default   Spanish    Health / wearables
finance   English    Trading signals, financial time-series
it        English    Infrastructure anomalies, incident management
science   English    Research data, academic reporting

# Finance — one English sentence, forbidden: buy / sell / predicts
report = narrate(report, provider="groq", prompt_template="finance")

# IT ops — flags anomaly / pre-incident signal language
report = narrate(report, provider="anthropic", prompt_template="it")

# Science — association observed / temporal correlation language
report = narrate(report, provider="groq", prompt_template="science")

# Custom domain — raw format string with any of the available placeholders:
# {metric_name} {baseline_median} {pre_event_median} {p_value}
# {effect_size} {consistency} {association_strength} {signal_strength}
report = narrate(
    report, provider="groq",
    prompt_template="Metric {metric_name}: p={p_value:.4f}, effect={effect_size:.3f}. Write one factual sentence.",
)

Set a default per narrator instance and override per call:

narrator = GroqNarrator(prompt_template="it")   # instance default
narrator.narrate(report)                         # uses "it"
narrator.narrate(report, prompt_template="science")  # overrides to "science"

CLI:

chrono analyze metrics.csv events.csv --narrate --provider groq --prompt-template finance
chrono analyze metrics.csv events.csv --narrate --provider anthropic --prompt-template it

PROMPT_TEMPLATES is also exported from the package for direct access or extension:

from chrono_correlator import PROMPT_TEMPLATES

# Inspect or extend
print(PROMPT_TEMPLATES["finance"])
PROMPT_TEMPLATES["my_domain"] = "Métrica {metric_name}: p={p_value:.4f}. Una frase."

Continuous monitoring (no events needed)

Statistical note: monitor() uses a rolling self-comparison: the current window is compared against the preceding baseline_days period with no discrete event anchor. Statistical assumptions differ from evaluate() — results reflect distributional drift, not pre-event patterns. Calibrate with real events first using evaluate() before relying on monitor() alerts.

from chrono_correlator import monitor, loop

# Single evaluation at now()
report = monitor(metrics, narrate=False)

# Infinite loop — calls on_alert when level is yellow or red
def alert_handler(report):
    save_report(report)
    export_html(report, f"alert_{datetime.now():%Y%m%d_%H%M}.html")

loop(metrics_fn=lambda: metrics, interval_seconds=3600, on_alert=alert_handler)
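The rolling self-comparison described in the note can be sketched in plain Python (`rolling_drift` is illustrative, not part of the package): split the samples at now minus the window, compare the current window against its own preceding baseline.

```python
from datetime import datetime, timedelta

def rolling_drift(timestamps, values, now, window_hours=48, baseline_days=28):
    """Compare the current window [now - window, now] against the
    preceding baseline period; return the shift between medians."""
    w_start = now - timedelta(hours=window_hours)
    b_start = w_start - timedelta(days=baseline_days)
    current = [v for t, v in zip(timestamps, values) if w_start <= t <= now]
    baseline = [v for t, v in zip(timestamps, values) if b_start <= t < w_start]

    def med(xs):
        return sorted(xs)[len(xs) // 2]

    return med(current) - med(baseline)

base = datetime(2024, 1, 1)
ts = [base + timedelta(hours=h) for h in range(24 * 30)]
vals = [55.0] * len(ts)
for i in range(-48, 0):          # the last 48h drop, as in the quick start
    vals[i] = 28.0
drift = rolling_drift(ts, vals, ts[-1])
print(drift)  # -27.0: the current window sits well below its recent baseline
```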

CLI

chrono analyze metrics.csv events.csv --name hrv --correction fdr
chrono analyze metrics.csv events.csv --json
chrono analyze metrics.csv events.csv --direction decrease --baseline-strategy same_weekday
chrono analyze metrics.csv events.csv --narrate --provider anthropic
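The `--correction fdr` step applies Benjamini-Hochberg. The step-up procedure itself is standard and can be sketched as (not the library's code):

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """BH step-up: sort p-values, find the largest k with
    p_(k) <= (k/m) * alpha, and reject hypotheses with ranks 1..k."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]

pvals = [0.001, 0.008, 0.039, 0.041, 0.30]
print(benjamini_hochberg(pvals))  # [True, True, False, False, False]
```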

Custom LLM provider

from chrono_correlator import BaseNarrator

class MyNarrator(BaseNarrator):
    def generate(self, prompt: str) -> str:
        # call any local or remote model
        ...

report = MyNarrator().narrate(report)

Adapter recipes — connect live sources without built-in connectors

Prometheus

import requests
from datetime import datetime, timedelta
from chrono_correlator import Metric

def prometheus_metric(query: str, url: str = "http://localhost:9090") -> Metric:
    end = datetime.now()
    start = end - timedelta(days=35)
    r = requests.get(f"{url}/api/v1/query_range", params={
        "query": query, "start": start.timestamp(),
        "end": end.timestamp(), "step": "1h",
    })
    data = r.json()["data"]["result"][0]["values"]
    return Metric(
        name=query,
        timestamps=[datetime.fromtimestamp(float(t)) for t, _ in data],
        values=[float(v) for _, v in data],
    )

cpu = prometheus_metric("rate(node_cpu_seconds_total[5m])")
report = evaluate(events, [cpu])

InfluxDB

from influxdb_client import InfluxDBClient
from chrono_correlator import Metric

def influx_metric(bucket: str, measurement: str, field: str, url: str, token: str) -> Metric:
    client = InfluxDBClient(url=url, token=token, org="my-org")
    query = f'from(bucket:"{bucket}") |> range(start:-35d) |> filter(fn:(r) => r._measurement == "{measurement}" and r._field == "{field}")'
    tables = client.query_api().query(query)
    rows = [(r.get_time(), r.get_value()) for table in tables for r in table.records]
    return Metric(name=field, timestamps=[t for t, _ in rows], values=[v for _, v in rows])

Watching a live CSV file

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from chrono_correlator import Metric
import pandas as pd

class CsvWatcher(FileSystemEventHandler):
    def __init__(self, path: str, name: str, on_update):
        self.path, self.name, self.on_update = path, name, on_update

    def on_modified(self, event):
        if event.src_path == self.path:
            df = pd.read_csv(self.path)
            metric = Metric.from_dataframe(df, name=self.name)
            self.on_update(metric)

observer = Observer()
observer.schedule(CsvWatcher("./live.csv", "hrv", on_update=print), path=".")
observer.start()

Generic REST API

import requests
from datetime import datetime
from chrono_correlator import Metric

def api_metric(url: str, name: str, ts_field="timestamp", val_field="value") -> Metric:
    data = requests.get(url).json()
    return Metric(
        name=name,
        timestamps=[datetime.fromisoformat(row[ts_field]) for row in data],
        values=[float(row[val_field]) for row in data],
    )

Interactive notebook

examples/dashboard.ipynb — full pipeline with matplotlib visualizations, lag sweep chart, and bootstrap CI plot. No UI server required.

Key finding: p-value alone is not enough

Statistical significance (p < 0.05) can appear in large samples even with no real pattern. Effect size + consistency is what separates real signals from statistical noise.

Dataset        p-value   Effect    Consistency   Association strength   Signal
Real pattern   < 0.001   0.289     0.86          0.64                   strong
Flat metrics   0.09*     -0.005    ~0.4          ~0.2                   none
Shuffled       0.55      0.000     ~0.5          0.25                   none

* p < 0.05 in some metrics due to large sample size — effect size and consistency correctly identify these as noise.

CorrelationResult includes:

  • consistency — fraction of events individually showing the pattern (0–1)
  • signal_strength — one of "strong" / "moderate" / "weak" / "none"
  • association_strength — composite score: 0.5 × |effect| + 0.5 × consistency (0–1)
  • effect_ci — 95% bootstrap confidence interval (low, high) when bootstrap_ci=True

significant = True only when p < alpha AND signal_strength in ("strong", "moderate").

How it works

  • Statistical core: For each metric, values in the pre-event window (default: 48 h before, configurable lag) are compared against a 28-day baseline using Mann-Whitney U. Effect size is computed as rank-biserial correlation.
  • Multiple comparison correction: When analysing several metrics simultaneously, FDR (Benjamini-Hochberg) correction is applied by default to control false positives. Bonferroni is also available.
  • Alert level: Corrected active signals are counted. 1–2 → green, 3–4 → yellow, 5–7 → red.
  • LLM narration: Only triggered on yellow or red. The model receives pre-calculated statistics and is constrained to one factual sentence per signal — no diagnosis, no causal inference.
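The statistical core named above (Mann-Whitney U with midranks for ties, then rank-biserial effect size) can be sketched in pure Python. This illustrates the method, not the package's actual code; in practice scipy.stats.mannwhitneyu does the same job:

```python
def mann_whitney_u(x, y):
    """U statistic for sample x vs y, assigning midranks to ties."""
    combined = sorted((v, g) for g, vals in ((0, x), (1, y)) for v in vals)
    n = len(combined)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1                           # tie group spans ranks i+1 .. j
        midrank = (i + 1 + j) / 2            # average rank within the group
        rank_sum_x += midrank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    nx = len(x)
    return rank_sum_x - nx * (nx + 1) / 2

def rank_biserial(x, y):
    """Effect size r = 2U/(n1*n2) - 1 in [-1, 1];
    -1 means every x value ranks below every y value."""
    return 2 * mann_whitney_u(x, y) / (len(x) * len(y)) - 1

pre = [28.0, 29.0, 27.5, 28.5]           # pre-event window, clearly lower
baseline = [55.0, 54.0, 56.0, 55.5, 53.0]
print(rank_biserial(pre, baseline))      # -1.0: every pre value below baseline
```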

Use cases

  • Health monitoring — correlate HRV, deep sleep, or skin temperature drops with migraine or crisis events.
  • Infrastructure — detect latency or error-rate anomalies preceding service outages.
  • IPTV / streaming — link buffering load spikes to subscriber disconnection events.
  • Energy consumption — associate power demand patterns with grid stress or equipment failures.
  • Finance — find pre-event signals in volume, volatility, or spread data before earnings or market events.

License

Apache 2.0 — © 2026 Raúl Gallardo (g3v3r)

Free to use in personal and commercial projects. Attribution required: keep the copyright notice. See LICENSE for full terms.
