Learn constrained causal kernels and generate Polars feature tables plus diagnostics.
rtdfeatures
rtdfeatures learns constrained lag kernels from regular-grid process time series and turns them into auditable lag-aware features for downstream modelling.
flowchart LR
A[Regular process time series] --> B[Fit kernel]
B --> C[Validate against baselines]
C --> D[Generate weighted features]
D --> E[Polars feature table]
Why this exists
Process industries have lagged relationships between variables — material transport, mixing, recycle loops, and cascade dynamics mean that a change in one variable affects another only after some time, and the influence is often distributed across multiple past observations. Standard ML pipelines treat each time step independently or use arbitrary lag windows that waste signal or leak information.
rtdfeatures learns this lag structure directly from data using physically plausible constraints: causal (no future leakage), non-negative influence, sum-to-one weighting, and bounded lag windows. The result is a compact, interpretable kernel that can be validated against baselines and then used to generate auditable lag-aware features for any downstream model.
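One way to see how the four constraints can hold by construction is to parameterise the lag weights as a softmax over a fixed number of lags. The sketch below is a standard-library illustration under that assumption, not the package's actual optimiser; `softmax`, `fit_simplex_kernel`, and the synthetic data are hypothetical names introduced here.

```python
import math
import random


def softmax(logits):
    """Map unconstrained logits to non-negative weights that sum to one."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fit_simplex_kernel(x, y, max_lag, steps=300, lr=1.0):
    """Gradient descent on mean squared error over lags 0..max_lag.

    Causal: only x[t - k] with k >= 0 is used.
    Bounded: k never exceeds max_lag.
    Non-negative and sum-to-one: guaranteed by the softmax.
    """
    n_lags = max_lag + 1
    logits = [0.0] * n_lags
    rows = range(max_lag, len(x))  # rows with a full lag history
    for _ in range(steps):
        w = softmax(logits)
        grad_w = [0.0] * n_lags
        for t in rows:
            pred = sum(w[k] * x[t - k] for k in range(n_lags))
            resid = pred - y[t]
            for k in range(n_lags):
                grad_w[k] += 2.0 * resid * x[t - k] / len(rows)
        # Chain rule through the softmax: dL/dz_j = w_j * (g_j - sum_k w_k g_k)
        mean_grad = sum(w[k] * grad_w[k] for k in range(n_lags))
        logits = [
            logits[j] - lr * w[j] * (grad_w[j] - mean_grad)
            for j in range(n_lags)
        ]
    return softmax(logits)


# Hypothetical synthetic data: the target is the input delayed by 3 steps.
rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
true_delay = 3
y = [x[t - true_delay] if t >= true_delay else 0.0 for t in range(len(x))]

weights = fit_simplex_kernel(x, y, max_lag=5)
```

On this noiseless example the learned weights concentrate on lag 3, while staying non-negative and summing to one at every step of the optimisation.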
It does not train final predictive models, perform plant-wide causal discovery, or serve features in real time.
Install
pip install rtdfeatures
Requires Python >=3.10. The package depends on numpy, polars, and torch. CPU operation is the default and expected path — no GPU is required.
Quickstart
Executable example (docs-test:readme-primary):
from rtdfeatures import KernelFeatureBuilder, SimplexKernelLearner
from rtdfeatures.synthetic import make_single_delay_dataset

dataset = make_single_delay_dataset(n_rows=120, dt=60.0, seed=7)
df = dataset.data

learner = SimplexKernelLearner(max_lag="20m")
fit = learner.fit(
    df,
    input_col="input_signal",
    target_col="target_signal",
    time_col="time",
)

builder = KernelFeatureBuilder(
    kernels={"learned": fit.kernel},
    time_col="time",
    numeric_cols=["input_signal"],
)
result = builder.transform_result(df)
features = result.features
report = result.report
registry = result.feature_registry

assert "time" in features.columns
assert "learned_num_input_signal_wmean" in features.columns
assert report.row_count == df.height
transform_result() is the primary API, and the TransformResult it returns is the preferred auditable output. It keeps three things together: .features (the feature table, a Polars DataFrame), .report (a TransformReport with transform diagnostics), and .feature_registry (structured metadata per column). The simpler builder.transform(df) path returns just the feature table.
Shared multi-pair learning
Executable example (docs-test:readme-shared):
from rtdfeatures import KernelFeatureBuilder
from rtdfeatures.learners import SharedSimplexKernelLearner
from rtdfeatures.synthetic import make_multi_pair_dataset

dataset = make_multi_pair_dataset(n_rows=160, dt=60.0, seed=11)
df = dataset.data

shared = SharedSimplexKernelLearner(max_lag="40m", min_lag="10m", loss="huber")
shared_fit = shared.fit(
    df,
    input_cols=["input_signal_a", "input_signal_b"],
    target_cols=["target_signal_a", "target_signal_b"],
    time_col="time",
)
kernels = shared_fit.to_kernels()

builder = KernelFeatureBuilder(
    kernels=kernels,
    time_col="time",
    numeric_cols=["input_signal_a", "input_signal_b"],
)
features = builder.transform(df)

assert features.height == df.height
Core concepts
Kernel — the generic object representing a weighted lag distribution. Kernels are causal, non-negative, sum-to-one, and defined over a bounded lag window. This is always the correct term unless a specific physical interpretation is justified.
RTD-like kernel — a kernel interpreted as a Residence Time Distribution. This physical interpretation requires independent evidence that the relationship is driven by material or tracer propagation (e.g. known vessel geometry, tracer tests, process knowledge). Do not claim RTD without supporting evidence.
Response kernel — a kernel interpreted as a delayed-influence relationship where the physical RTD interpretation is not justified. This is the safe default interpretation for most process-data relationships. The kernel object is identical — only the interpretation label differs.
Feature evidence — structured metadata attached to each generated feature column, recording its source column, kernel, interpretation label, lag window, and completeness status. Feature evidence makes every generated column auditable and independently interpretable.
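To make the idea of feature evidence concrete, the record below sketches what such per-column metadata could look like. It is a hypothetical stand-in, not the package's real class: `FeatureEvidenceSketch`, its field names, and the example values are all assumptions based only on the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureEvidenceSketch:
    """Hypothetical per-column provenance record (not the rtdfeatures class)."""

    feature_column: str   # generated column this record describes
    source_column: str    # raw signal the feature was computed from
    kernel_name: str      # key of the kernel used to weight the lags
    interpretation: str   # e.g. "response" (safe default) or "rtd_like"
    lag_window_minutes: tuple[float, float]  # (min_lag, max_lag), illustrative
    complete: bool        # whether the full lag window was available


evidence = FeatureEvidenceSketch(
    feature_column="learned_num_input_signal_wmean",
    source_column="input_signal",
    kernel_name="learned",
    interpretation="response",
    lag_window_minutes=(0.0, 20.0),
    complete=True,
)
```

Freezing the record means the provenance of a column cannot be changed silently after the feature has been generated, which fits the auditing purpose described above.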
Generated features include weighted lagged aggregations of numeric signals, categorical contribution scores, and age features (time since the kernel-weighted window). See the data model for the full output schema.
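As an illustration of a weighted lagged aggregation, the following sketch computes a kernel-weighted mean of past values in plain Python. It assumes lag weights indexed from lag 0 and uses `None` to stand in for the null warmup rows described under Limitations; `weighted_lag_feature` is a hypothetical helper, not part of the package.

```python
def weighted_lag_feature(values, weights):
    """Kernel-weighted mean of current and past values.

    weights[k] is the weight on the value k steps in the past. Rows without
    a full lag history yield None, so the output has the same length as the
    input.
    """
    if any(w < 0.0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("kernel weights must sum to one")

    max_lag = len(weights) - 1
    feature = []
    for t in range(len(values)):
        if t < max_lag:
            feature.append(None)  # warmup row: not enough history yet
        else:
            feature.append(sum(w * values[t - k] for k, w in enumerate(weights)))
    return feature


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
feature = weighted_lag_feature(signal, weights=[0.5, 0.5])
# feature == [None, 1.5, 2.5, 3.5, 4.5]
```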
What it produces
- Constrained kernels: LearnedKernel, FixedDelayKernel, GammaKernel, and others, each carrying its lag weights, support, and fit metadata.
- Feature tables: Polars DataFrames with deterministic, auditable columns ready for downstream models.
- Diagnostics: fit diagnostics, baseline comparisons (no_lag, best_single_lag), identifiability reports, and transform reports.
- Feature evidence: per-column provenance records linking every feature back to its source, kernel, and interpretation.
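The baseline comparisons can be pictured with a small sketch. It assumes that no_lag scores the unshifted input against the target and that best_single_lag scores the best single shifted copy of the input; the package's exact definitions may differ, and `mse_at_lag` and `baseline_scores` are hypothetical helpers.

```python
def mse_at_lag(x, y, lag, start):
    """Mean squared error between the target and the input shifted by `lag`."""
    rows = range(start, len(x))
    return sum((x[t - lag] - y[t]) ** 2 for t in rows) / len(rows)


def baseline_scores(x, y, max_lag):
    """Score every single-lag predictor on the same rows for a fair comparison."""
    scores = {
        lag: mse_at_lag(x, y, lag, start=max_lag) for lag in range(max_lag + 1)
    }
    best_lag = min(scores, key=scores.get)
    return {
        "no_lag": scores[0],
        "best_single_lag": scores[best_lag],
        "best_lag": best_lag,
    }


# Hypothetical data: the target is the input delayed by exactly two steps.
x = [float((t * 7) % 5) for t in range(30)]
y = [x[t - 2] if t >= 2 else 0.0 for t in range(len(x))]

baselines = baseline_scores(x, y, max_lag=4)
```

A learned kernel is only worth keeping if it beats both scores; when the true relationship is a pure delay, as here, the best single lag already fits perfectly and a distributed kernel adds nothing.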
Examples gallery
- nRTD laminar flow worked example: flagship extracted benchmark learning example.
- Plant-first scenario gallery: scenario-first synthetic gallery with fit evidence and feature previews.
- Parametric vs empirical fit gallery: compares parametric kernels (Gamma, Exponential) against empirical (simplex) fits on synthetic plug-flow and tanks-in-series scenarios. Run with python examples/parametric_empirical_baseline_fits.py.
Limitations
- Input data must have a regular time grid. Irregular or missing timestamps raise by default — there is no imputation.
- The SimplexKernelLearner fits one input signal to one target signal. The SharedSimplexKernelLearner extends this to multiple pairs with a shared lag bound.
- Final predictive modelling is out of scope. This package produces features and diagnostics; it does not train, evaluate, or deploy models.
- Not a causal discovery tool. Learned kernels capture lagged statistical relationships under the constraints; they do not prove physical causation.
- No online or streaming support. All operations assume a complete batch of historical data.
- No plant-topology or genealogy modelling. The package learns pairwise relationships, not full process graphs.
- Warmup rows (before the maximum lag is satisfied) produce null features. The row count is preserved.
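Because irregular or missing timestamps raise by default, it can help to check a time column before fitting. The sketch below is a standard-library pre-check under the assumption that "regular" means a constant positive step between consecutive timestamps; `check_regular_grid` is a hypothetical helper, not a package function.

```python
from datetime import datetime, timedelta


def check_regular_grid(timestamps):
    """Return the constant step of a regular grid, or raise ValueError."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps to infer a step")
    step = timestamps[1] - timestamps[0]
    if step <= timedelta(0):
        raise ValueError("timestamps must be strictly increasing")
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap != step:
            raise ValueError(
                f"irregular step at row {i}: expected {step}, found {gap}"
            )
    return step


start = datetime(2024, 1, 1)
regular = [start + timedelta(seconds=60 * i) for i in range(10)]
step = check_regular_grid(regular)  # one-minute grid

# Dropping a row leaves a two-minute gap, which the check rejects.
gappy = regular[:4] + regular[5:]
```

Since there is no imputation, a failed check means resampling or gap filling has to happen upstream, before the data reaches the learner.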
Citation / scientific context
If you use rtdfeatures in published work, please cite the repository. The package builds on ideas from constrained kernel learning, residence time distribution analysis, and feature engineering for regularly spaced process data. See the cross-field research summary for background.
Development status
This is the stable v1.0.0 release. The package is in production use. Changes are documented in release notes. CI gates: pytest -m "not external_data".
Read the documentation hub for guides, examples, and API reference.