
Cur-E: release-oriented imputation package distilled from the Propose_Alg path.


Cur-E

A Chinese version of this document is available in README.zh-CN.md.

Cur-E imputation package with release-oriented packaging.

cur-estimator is a Python package distilled from the original experimental codebase. The packaged implementation follows the direction of algs04/Propose_Alg.py and keeps the release focused on the Cur-E method itself. It is a simplified release package, not a full reproduction of the original research environment.

The Cur-E pipeline implemented here includes:

  • a bidirectional GRU-based recurrent imputation core (the ITIN core)
  • PCHIP-based interpolation regularization during training, inspired by the Propose_Alg.py direction
  • NumPy-based training and inference API
  • a standalone demo.py entry for direct execution

Installation

pip install cur-estimator

For local development:

pip install .

For local packaging:

python -m build

Quick Start

import numpy as np
from cur_e import CurEImputer, make_holdout_validation

rng = np.random.default_rng(2024)
X = rng.normal(size=(32, 48, 8)).astype(np.float32)

mask = rng.random(X.shape) < 0.1
X_missing = X.copy()
X_missing[mask] = np.nan

val = make_holdout_validation(X_missing, holdout_rate=0.1, random_state=2024)

model = CurEImputer(
    n_steps=48,
    n_features=8,
    rnn_hidden_size=128,
    epochs=5,
    alpha=1.2,
)

model.fit(
    train_X=X_missing,
    # Optional absolute timestamps s, shape (num_samples, seq_len),
    # (num_samples, seq_len, 1), or (num_samples, seq_len, feature_dim).
    train_timestamps=None,
    val_X=val["X"],
    val_X_ori=val["X_ori"],
    val_indicating_mask=val["indicating_mask"],
    verbose=True,
)

imputed = model.predict(X_missing)
print(imputed.shape)
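
Because the Quick Start hides values from a fully known X, the result can be scored on exactly those hidden positions. The helper below is a minimal sketch of such a check. The name masked_mae is hypothetical and not part of the cur_e API, and the zero-filled array only stands in for a real model output so that the snippet runs on its own.

```python
import numpy as np


def masked_mae(imputed, truth, eval_mask):
    """Mean absolute error restricted to positions where eval_mask is True."""
    eval_mask = np.asarray(eval_mask, dtype=bool)
    if not eval_mask.any():
        return float("nan")  # nothing to score
    return float(np.abs(imputed[eval_mask] - truth[eval_mask]).mean())


# Stand-in data with the same layout as the Quick Start arrays.
rng = np.random.default_rng(2024)
truth = rng.normal(size=(32, 48, 8)).astype(np.float32)
hidden = rng.random(truth.shape) < 0.1

# Placeholder "imputation": zeros at hidden positions, truth elsewhere.
zero_filled = np.where(hidden, 0.0, truth).astype(np.float32)

print("MAE of zero filling on hidden positions:", masked_mae(zero_filled, truth, hidden))
```

With the Quick Start variables, the equivalent call would be masked_mae(imputed, X, mask), and a zero-filling baseline like the one above gives a rough reference point for the score.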

CLI / Demo

Run the standalone demo directly:

python demo.py

Run the demo with a CSV input:

python demo.py --csv your_data.csv --n-steps 48

The demo saves outputs into demo_outputs/, including:

  • cur_e_demo_model.pt
  • imputed.npy
  • input_with_nan.npy
  • input_full.npy

Input Data Format

The core API expects NumPy arrays with shape (num_samples, seq_len, feature_dim).

  • train_X, val_X, test_X must be 3D arrays
  • missing values must be represented by np.nan
  • val_X_ori must contain the intact validation target
  • val_indicating_mask must be 1 on artificially hidden validation positions and 0 elsewhere
  • train_timestamps, val_timestamps, and test_timestamps are optional absolute timestamps s
  • the model derives delta internally from adjacent timestamp differences for temporal decay
  • if timestamps are omitted, an equally spaced time axis 0, 1, 2, ... is used
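
The timestamp handling described above can be illustrated with a small NumPy sketch. It assumes that delta is simply the difference between adjacent timestamps, with 0 at the first step. The function name timestamps_to_delta is hypothetical, and the package's internal computation may differ, for example in how it treats gaps across missing observations.

```python
import numpy as np


def timestamps_to_delta(timestamps=None, num_samples=1, seq_len=1):
    """Turn absolute timestamps into per-step time gaps (illustrative only)."""
    if timestamps is None:
        # Fall back to an equally spaced axis 0, 1, 2, ... for every sample.
        timestamps = np.tile(np.arange(seq_len, dtype=np.float32), (num_samples, 1))
    timestamps = np.asarray(timestamps, dtype=np.float32)
    delta = np.zeros_like(timestamps)
    # Gap between each step and the previous one; the first step keeps 0.
    delta[:, 1:] = np.diff(timestamps, axis=1)
    return delta


irregular = np.array([[0.0, 1.0, 4.0, 4.5]])
print(timestamps_to_delta(irregular))
print(timestamps_to_delta(None, num_samples=2, seq_len=3))
```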

Minimal example:

import numpy as np

X = np.array(
    [
        [
            [1.0, 2.0],
            [np.nan, 2.1],
            [1.2, np.nan],
        ],
        [
            [0.8, 1.5],
            [0.9, np.nan],
            [1.0, 1.7],
        ],
    ],
    dtype=np.float32,
)

This example has:

  • num_samples = 2
  • seq_len = 3
  • feature_dim = 2
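
To make the roles of val_X, val_X_ori, and val_indicating_mask concrete, the sketch below builds the three arrays by hand. It is an assumption about what a holdout split could look like, not the implementation of make_holdout_validation. The function name holdout_sketch is hypothetical, and only the dictionary keys are taken from the Quick Start.

```python
import numpy as np


def holdout_sketch(X, holdout_rate=0.1, random_state=0):
    """Hide a random share of the observed entries for validation (illustrative)."""
    rng = np.random.default_rng(random_state)
    observed = ~np.isnan(X)
    # Only entries that are actually observed can serve as validation targets.
    hide = observed & (rng.random(X.shape) < holdout_rate)
    X_val = X.copy()
    X_val[hide] = np.nan
    return {
        "X": X_val,                                   # input with extra NaNs
        "X_ori": X.copy(),                            # values before hiding
        "indicating_mask": hide.astype(np.float32),   # 1 where a value was hidden
    }


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4, 6, 2)).astype(np.float32)
X_demo[0, 0, 0] = np.nan  # an entry that was missing from the start

val_demo = holdout_sketch(X_demo, holdout_rate=0.3, random_state=1)
print(int(val_demo["indicating_mask"].sum()), "entries hidden for validation")
```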

CSV Demo Input Format

When using python demo.py --csv your_data.csv --n-steps 48, the CSV is interpreted as a continuous table:

  • each row is one time step
  • each column is one feature
  • if a column named timestamp exists, it is used as the absolute timestamp axis and is not treated as a feature column
  • the total row count must be at least n_steps
  • rows are reshaped into samples of shape (n_steps, feature_dim)
  • if the total number of rows is not divisible by n_steps, the tail rows are dropped

During training, the PCHIP regularization term is also evaluated along the provided timestamp axis instead of assuming equally spaced steps.

For example, a CSV with 480 rows and 8 columns and --n-steps 48 becomes:

  • num_samples = 10
  • seq_len = 48
  • feature_dim = 8
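
The reshaping rule can be sketched in NumPy as follows. This illustrates the documented behavior and is not the code in demo.py. The function name reshape_table is hypothetical, and the sketch assumes the table holds feature columns only, with any timestamp column already split off.

```python
import numpy as np


def reshape_table(table, n_steps):
    """Cut a (rows, features) table into (num_samples, n_steps, features)."""
    table = np.asarray(table, dtype=np.float32)
    n_rows, feature_dim = table.shape
    if n_rows < n_steps:
        raise ValueError("the table needs at least n_steps rows")
    num_samples = n_rows // n_steps
    usable_rows = num_samples * n_steps  # tail rows beyond this are dropped
    return table[:usable_rows].reshape(num_samples, n_steps, feature_dim)


print(reshape_table(np.zeros((480, 8)), n_steps=48).shape)  # (10, 48, 8)
print(reshape_table(np.zeros((500, 8)), n_steps=48).shape)  # (10, 48, 8), 20 tail rows dropped
```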

Configuration Notes

from cur_e import CurEImputer

model = CurEImputer(
    n_steps=48,
    n_features=8,
    rnn_hidden_size=128,
    batch_size=16,
    epochs=30,
    patience=3,
    learning_rate=1e-3,
    alpha=1.2,
)

  • n_steps is the sequence length per sample
  • n_features is the feature dimension per time step
  • rnn_hidden_size controls the recurrent hidden-state size
  • batch_size controls training and inference batch size
  • epochs and patience control stopping behavior
  • alpha controls the strength of Cur-E interpolation regularization
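
The regularization that alpha weights is PCHIP-based. For intuition, the sketch below fits a PCHIP curve to the observed points of a single series on an irregular time axis, using SciPy's PchipInterpolator. It shows only the interpolation itself. How the package turns such a curve into a loss term and scales it by alpha is not shown here, and the function name pchip_fill is hypothetical.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def pchip_fill(timestamps, values):
    """Fill NaNs in a 1D series with a PCHIP curve through the observed points."""
    timestamps = np.asarray(timestamps, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    observed = ~np.isnan(values)
    if observed.sum() < 2:
        raise ValueError("PCHIP needs at least two observed points")
    # Timestamps must be strictly increasing for PchipInterpolator.
    curve = PchipInterpolator(timestamps[observed], values[observed])
    filled = values.copy()
    filled[~observed] = curve(timestamps[~observed])
    return filled


t_demo = np.array([0.0, 1.0, 4.0, 4.5, 7.0])       # irregular time axis
y_demo = np.array([1.0, 3.0, np.nan, 10.0, 15.0])  # y = 2t + 1 with one gap
print(pchip_fill(t_demo, y_demo))
```

Because the curve is evaluated at the actual timestamps, the filled value reflects the real gap lengths rather than the step index.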

Notes

This repository is the source distribution of the cur-estimator package, intended for research, reproduction, and further development. The implementation is distilled from a larger experimental codebase.

  • It does not claim to reproduce every detail of the full published paper system.
  • It is a release-oriented distillation inspired by the algs04/Propose_Alg.py direction.
  • It is not the complete original research environment or experiment pipeline.

Because the code has been extracted and simplified for packaging, it may contain engineering adaptations relative to the broader experimental system. If you need dataset-specific preprocessing or experiment orchestration, those should be added explicitly on top of this package.

License

This project is released under the Apache License 2.0.
