Cur-E: release-oriented imputation package distilled from the Propose_Alg path.
Cur-E
A Chinese version of this README is available in README.zh-CN.md.
Cur-E imputation package with release-oriented packaging.
cur-estimator is a Python package distilled from the original experimental codebase. The packaged implementation follows the algs04/Propose_Alg.py direction and keeps the release focused on the Cur-E method itself. This repository is still a simplified release package rather than a full reproduction of the original research environment.
The Cur-E pipeline implemented here includes:
- a GRU-based bidirectional recurrent imputation core
- interpolation-regularized training inspired by the Propose_Alg.py direction
- a bidirectional GRU-based ITIN core
- PCHIP-based interpolation regularization during training
- a NumPy-based training and inference API
- a standalone demo.py entry for direct execution
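This README does not show how the PCHIP regularization is computed inside the package. To illustrate what PCHIP interpolation along a possibly non-uniform timestamp axis looks like, here is a NumPy sketch of the Fritsch-Carlson scheme. It is the same scheme that scipy.interpolate.PchipInterpolator implements. The helper names (pchip_slopes, pchip_eval, pchip_fill) are hypothetical and not part of cur_e. How Cur-E turns such a curve into a loss term scaled by alpha is not shown here.

```python
import numpy as np


def _edge_slope(h0, h1, m0, m1):
    # One-sided three-point slope at an end point, clipped to keep the shape.
    d = ((2 * h0 + h1) * m0 - h0 * m1) / (h0 + h1)
    if np.sign(d) != np.sign(m0):
        return 0.0
    if np.sign(m0) != np.sign(m1) and abs(d) > abs(3 * m0):
        return 3 * m0
    return d


def pchip_slopes(x, y):
    """Shape-preserving derivative estimates at each knot (x strictly increasing)."""
    h = np.diff(x)              # interval widths, may be non-uniform
    delta = np.diff(y) / h      # secant slopes between neighbouring knots
    n = len(x)
    d = np.zeros(n)
    if n == 2:
        d[:] = delta[0]         # two knots: straight line
        return d
    for k in range(1, n - 1):
        # Weighted harmonic mean of adjacent secants; zero at local extrema.
        if delta[k - 1] * delta[k] > 0:
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            d[k] = (w1 + w2) / (w1 / delta[k - 1] + w2 / delta[k])
    d[0] = _edge_slope(h[0], h[1], delta[0], delta[1])
    d[-1] = _edge_slope(h[-1], h[-2], delta[-1], delta[-2])
    return d


def pchip_eval(x, y, xq):
    """Evaluate the piecewise cubic Hermite interpolant at query points xq."""
    x, y, xq = (np.asarray(a, dtype=float) for a in (x, y, xq))
    d = pchip_slopes(x, y)
    # Index of the interval [x[i], x[i + 1]] that contains each query point.
    i = np.clip(np.searchsorted(x, xq) - 1, 0, len(x) - 2)
    h = x[i + 1] - x[i]
    t = (xq - x[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * y[i] + h10 * h * d[i] + h01 * y[i + 1] + h11 * h * d[i + 1]


def pchip_fill(values, timestamps):
    """Fill NaNs in a 1D series by PCHIP over its observed points (needs >= 2)."""
    values = np.asarray(values, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    observed = ~np.isnan(values)
    filled = values.copy()
    filled[~observed] = pchip_eval(
        timestamps[observed], values[observed], timestamps[~observed]
    )
    return filled
```

PCHIP reproduces linear data exactly and never overshoots between monotone knots. Unlike a plain cubic spline, it does not invent oscillations in gaps, which is one plausible reason to prefer it as an interpolation reference.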
Installation
pip install cur-estimator
For local development:
pip install .
To build source and wheel distributions locally (requires the build package, e.g. pip install build):
python -m build
Quick Start
import numpy as np
from cur_e import CurEImputer, make_holdout_validation
rng = np.random.default_rng(2024)
X = rng.normal(size=(32, 48, 8)).astype(np.float32)
mask = rng.random(X.shape) < 0.1
X_missing = X.copy()
X_missing[mask] = np.nan
val = make_holdout_validation(X_missing, holdout_rate=0.1, random_state=2024)
model = CurEImputer(
n_steps=48,
n_features=8,
rnn_hidden_size=128,
epochs=5,
alpha=1.2,
)
model.fit(
train_X=X_missing,
train_timestamps=None,  # optional absolute timestamps s, shape (num_samples, seq_len), (num_samples, seq_len, 1), or (num_samples, seq_len, feature_dim)
val_X=val["X"],
val_X_ori=val["X_ori"],
val_indicating_mask=val["indicating_mask"],
verbose=True,
)
imputed = model.predict(X_missing)
print(imputed.shape)
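The Quick Start relies on make_holdout_validation to build the validation triple. The sketch below illustrates, under assumptions, what such a split involves: hide a random fraction of the observed entries and record where they were hidden. holdout_sketch is a hypothetical stand-in whose returned keys mirror the ones used above. The package's own helper may sample or encode things differently.

```python
import numpy as np


def holdout_sketch(X_missing, holdout_rate=0.1, random_state=None):
    """Hypothetical holdout split; not the package's make_holdout_validation."""
    rng = np.random.default_rng(random_state)
    X_ori = np.asarray(X_missing, dtype=np.float32).copy()
    observed = ~np.isnan(X_ori)
    # Only values that are actually observed can be held out for validation.
    hide = observed & (rng.random(X_ori.shape) < holdout_rate)
    X_val = X_ori.copy()
    X_val[hide] = np.nan
    return {
        "X": X_val,                                  # model input with extra NaNs
        "X_ori": X_ori,                              # values before hiding (original NaNs kept)
        "indicating_mask": hide.astype(np.float32),  # 1 = artificially hidden, 0 elsewhere
    }
```

Validation error is then measured only where indicating_mask is 1, because those are the positions with a known ground truth that the model did not see.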
CLI / Demo
Run the standalone demo directly:
python demo.py
Run the demo with a CSV input:
python demo.py --csv your_data.csv --n-steps 48
The demo saves outputs into demo_outputs/, including:
- cur_e_demo_model.pt
- imputed.npy
- input_with_nan.npy
- input_full.npy
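One way to inspect the saved arrays is to score the imputation only on the entries that were NaN in the model input. The sketch assumes three things: input_full.npy holds the complete data, input_with_nan.npy holds the same data with missing entries, and imputed.npy is the model output. It also assumes all three share one shape. These are inferences from the file names, not documented behaviour. masked_errors is a hypothetical helper.

```python
from pathlib import Path

import numpy as np


def masked_errors(imputed, full, with_nan):
    """MAE and RMSE over entries that were missing in the input but known in full."""
    missing = np.isnan(with_nan) & ~np.isnan(full)
    diff = imputed[missing] - full[missing]
    return float(np.mean(np.abs(diff))), float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    out = Path("demo_outputs")
    if (out / "imputed.npy").exists():  # only runs after python demo.py
        mae, rmse = masked_errors(
            np.load(out / "imputed.npy"),
            np.load(out / "input_full.npy"),
            np.load(out / "input_with_nan.npy"),
        )
        print(f"MAE={mae:.4f}  RMSE={rmse:.4f}")
```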
Input Data Format
The core API expects NumPy arrays with shape (num_samples, seq_len, feature_dim).
- train_X, val_X, and test_X must be 3D arrays
- missing values must be represented by np.nan
- val_X_ori must contain the intact validation target
- val_indicating_mask must be 1 on artificially hidden validation positions and 0 elsewhere
- train_timestamps, val_timestamps, and test_timestamps are optional absolute timestamps (denoted s)
- the model derives delta internally from adjacent timestamp differences for temporal decay
- if timestamps are omitted, an equally spaced time axis 0, 1, 2, ... is used
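The timestamp rules above can be illustrated as follows. This is a sketch, assuming delta is the plain difference between adjacent absolute timestamps with 0 at the first step, broadcast to every feature. Some models in this family (e.g. BRITS) instead accumulate the gap since each feature's last observation, and this README does not say whether Cur-E does. The decay shown is the common GRU-D form exp(-max(0, w * delta + b)), with w and b fixed here for illustration; in such models they are learned. timestamps_to_delta and decay_factor are hypothetical names.

```python
import numpy as np


def timestamps_to_delta(timestamps, n_samples, seq_len, n_features):
    """Adjacent time gaps from absolute timestamps; first step gets 0."""
    if timestamps is None:
        # Default: equally spaced axis 0, 1, 2, ... for every sample.
        timestamps = np.broadcast_to(
            np.arange(seq_len, dtype=np.float32), (n_samples, seq_len)
        )
    s = np.asarray(timestamps, dtype=np.float32)
    if s.ndim == 2:
        s = s[:, :, None]  # (N, T) -> (N, T, 1)
    s = np.broadcast_to(s, (n_samples, seq_len, n_features))
    delta = np.zeros((n_samples, seq_len, n_features), dtype=np.float32)
    delta[:, 1:, :] = s[:, 1:, :] - s[:, :-1, :]
    return delta


def decay_factor(delta, w=1.0, b=0.0):
    """GRU-D style decay: 1 for no gap, shrinking toward 0 as the gap grows."""
    return np.exp(-np.maximum(0.0, w * delta + b))
```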
Minimal example:
import numpy as np
X = np.array(
[
[
[1.0, 2.0],
[np.nan, 2.1],
[1.2, np.nan],
],
[
[0.8, 1.5],
[0.9, np.nan],
[1.0, 1.7],
],
],
dtype=np.float32,
)
This example has:
- num_samples = 2
- seq_len = 3
- feature_dim = 2
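Missingness in this format is carried entirely by np.nan. A quick check of the example's shape and its observed/missing split needs only NumPy; observed_mask is just a local name used here.

```python
import numpy as np

X = np.array(
    [
        [[1.0, 2.0], [np.nan, 2.1], [1.2, np.nan]],
        [[0.8, 1.5], [0.9, np.nan], [1.0, 1.7]],
    ],
    dtype=np.float32,
)

num_samples, seq_len, feature_dim = X.shape  # unpacking fails unless X is 3D
observed_mask = ~np.isnan(X)                 # True where a value is present
print(X.shape, int((~observed_mask).sum()))  # (2, 3, 2) 3
```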
CSV Demo Input Format
When using python demo.py --csv your_data.csv --n-steps 48, the CSV is interpreted as a continuous table:
- each row is one time step
- each column is one feature
- if a column named timestamp exists, it is used as the absolute timestamp axis and is not treated as a feature column
- the total row count must be at least n_steps
- rows are reshaped into samples of shape (n_steps, feature_dim)
- if the total number of rows is not divisible by n_steps, the tail rows are dropped
When a timestamp column is present, the PCHIP regularization term used during training is also evaluated along that timestamp axis rather than assuming equally spaced steps.
For example, a CSV with 480 rows and 8 columns and --n-steps 48 becomes:
- num_samples = 10
- seq_len = 48
- feature_dim = 8
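The CSV rules above can be sketched with the standard-library csv module and NumPy. csv_to_samples is a hypothetical helper, not the parser in demo.py. It assumes numeric timestamps and treats empty cells as NaN, which may differ from how the demo actually reads files.

```python
import csv
import io

import numpy as np


def csv_to_samples(csv_text, n_steps):
    """Reshape a continuous table into (num_samples, n_steps, feature_dim)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Empty cells become NaN; blank lines are skipped.
    rows = [[float(v) if v.strip() else np.nan for v in row] for row in reader if row]
    table = np.asarray(rows, dtype=np.float32)

    timestamps = None
    if "timestamp" in header:
        col = header.index("timestamp")
        timestamps = table[:, col]
        table = np.delete(table, col, axis=1)  # timestamp is not a feature

    if len(table) < n_steps:
        raise ValueError("the CSV needs at least n_steps rows")
    num_samples = len(table) // n_steps  # tail rows beyond a full sample are dropped
    usable = num_samples * n_steps
    X = table[:usable].reshape(num_samples, n_steps, table.shape[1])
    if timestamps is not None:
        timestamps = timestamps[:usable].reshape(num_samples, n_steps)
    return X, timestamps
```

Under these assumptions, the 480-row, 8-column example yields X with shape (10, 48, 8) and no timestamps. A 100-row file with n_steps 48 would keep 96 rows and drop the last 4.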
Configuration Notes
from cur_e import CurEImputer
model = CurEImputer(
n_steps=48,
n_features=8,
rnn_hidden_size=128,
batch_size=16,
epochs=30,
patience=3,
learning_rate=1e-3,
alpha=1.2,
)
- n_steps is the sequence length per sample
- n_features is the feature dimension per time step
- rnn_hidden_size controls the recurrent hidden-state size
- batch_size controls training and inference batch size
- epochs and patience control stopping behavior
- alpha controls the strength of Cur-E interpolation regularization
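This README does not spell out the exact semantics of epochs and patience. The sketch below shows the usual convention, which is an assumption here: epochs caps the number of training epochs, and patience is how many consecutive epochs without a validation improvement are tolerated before stopping. stopping_epoch is a hypothetical helper that replays a list of validation losses.

```python
def stopping_epoch(val_losses, epochs, patience):
    """Return (last_epoch_run, best_epoch), both 0-based, under patience-based stopping."""
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch in range(epochs):  # assumes len(val_losses) >= epochs
        loss = val_losses[epoch]
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch  # stop early
    return epochs - 1, best_epoch  # ran the full epoch budget
```

With losses [1.0, 0.8, 0.85, 0.9, 0.95, 0.7] and patience=3, this convention stops after epoch 4. The later improvement at epoch 5 is never seen. That is the usual trade-off of choosing a small patience.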
Notes
This repository is the source distribution of the cur-estimator package, intended for research, reproduction, and further development. The implementation here is a distilled package extracted from a larger experimental codebase, rather than the complete original research environment.
- It should not be read as a claim that this package reproduces every detail of the full published paper system.
- It is a release-oriented distillation inspired by the algs04/Propose_Alg.py direction.
- It is not the complete original research environment or experiment pipeline.
Because the code has been extracted and simplified for packaging, it may contain engineering adaptations relative to the broader experimental system. If you need dataset-specific preprocessing or experiment orchestration, those should be added explicitly on top of this package.
License
This project is released under the Apache License 2.0.
Download files
File details
Details for the file cur_estimator-0.1.0.tar.gz.
File metadata
- Download URL: cur_estimator-0.1.0.tar.gz
- Upload date:
- Size: 15.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb47ba58e8e7bc6a17cbc7eca5f2b3f00df8832f4b35bacd5577683ab663b96b |
| MD5 | 1de7b3454f772412ef5bdf983c17d28b |
| BLAKE2b-256 | 0cef02fe1eca6b5267f71acfdab969284891fa315a6d4d340f0c719c9219b703 |
File details
Details for the file cur_estimator-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cur_estimator-0.1.0-py3-none-any.whl
- Upload date:
- Size: 14.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b963f64eb62cf6a11ef889e2f775d6a70660147976ed836ffe5ca5f54824a456 |
| MD5 | c99d9241c9488295182e7b6dbdd7d36b |
| BLAKE2b-256 | bb7e7c04c19839136a91cf73eec82f52054d8f87f3da1f42ba9e33fffcc8dcf9 |