A small library to preprocess diary/transaction data for time series models.
Time Series Cleaning Package
This repository contains a small Python package, timeseries_cleaner,
designed to simplify the transformation of diary or transactional data
into a format suitable for time-series forecasting models. The goal of
the package is to encapsulate the repetitive data munging steps so you
can focus on building and evaluating your models.
Features
- Flexible column mapping – Configure which columns in your raw DataFrame represent dates, values and identifiers using a PreprocessConfig, so you can reuse the same functions across different datasets without editing any code.
- Automatic weekly aggregation – Raw events are aggregated by week, missing weeks are filled with zeros and the resulting timeline is aligned across entities.
- Sliding window generation – Fixed-length sequences of past observations (lags) are extracted along with the next value as the target. Windows containing all zeros or missing targets are automatically discarded.
- Demographic merging – Seamlessly join static attributes (e.g. gender, age) to the generated sequences via a single helper.
- Train/test splitting – Hold out the most recent weeks for evaluation, ensuring that each test window has sufficient history behind it.
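The aggregation and windowing steps listed above can be sketched in plain pandas as follows. This is a minimal illustration under stated assumptions, not the package's own code: `weekly_aggregate` and `make_windows` are hypothetical names, weeks are assumed to start on Monday, and "all zeros" is interpreted as a window whose lags are all zero.

```python
import pandas as pd


def weekly_aggregate(df: pd.DataFrame, date_col: str = "date",
                     value_col: str = "amount", id_col: str = "id") -> pd.DataFrame:
    """Hypothetical helper: sum events per entity per week on a shared, zero-filled grid."""
    df = df.copy()
    df[date_col] = pd.to_datetime(df[date_col])
    # Label each event with the Monday that starts its week (assumption: Monday-start weeks).
    df["week"] = df[date_col].dt.to_period("W-SUN").dt.start_time
    weekly = df.groupby([id_col, "week"])[value_col].sum()
    # Build one common weekly timeline so every entity covers the same weeks.
    weeks = weekly.index.get_level_values("week")
    all_weeks = pd.date_range(weeks.min(), weeks.max(), freq="W-MON")
    full_index = pd.MultiIndex.from_product(
        [weekly.index.get_level_values(id_col).unique(), all_weeks],
        names=[id_col, "week"],
    )
    # Weeks with no events become explicit zeros.
    return weekly.reindex(full_index, fill_value=0).reset_index()


def make_windows(weekly: pd.DataFrame, window: int = 6, value_col: str = "amount",
                 id_col: str = "id") -> pd.DataFrame:
    """Hypothetical helper: turn each weekly series into (lag_window..lag_1, target) rows."""
    rows = []
    for entity, group in weekly.sort_values([id_col, "week"]).groupby(id_col):
        values = group[value_col].to_numpy()
        weeks = group["week"].to_numpy()
        for end in range(window, len(values)):
            lags = values[end - window:end]  # oldest first
            target = values[end]
            # Discard windows with a missing target or no activity in the lags.
            if pd.isna(target) or (lags == 0).all():
                continue
            row = {id_col: entity, "target_week": weeks[end], "target": target}
            # lag_1 is the week just before the target, lag_<window> the oldest.
            row.update({f"lag_{window - i}": lags[i] for i in range(window)})
            rows.append(row)
    return pd.DataFrame(rows)
```

Filling gaps before windowing means every window spans exactly `window` consecutive calendar weeks, which is what makes the fixed-length lag columns comparable across entities.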
Basic Usage
```python
import pandas as pd
from timeseries_cleaner import load_data, preprocess_data, merge_demographics, train_test_split, PreprocessConfig

# Load your data (CSV or Excel). Column names will be normalised to lower case.
df = load_data("Income Report - Mon Apr 3 2023.xlsx", sheet_name="Income Reports")

# Select the relevant columns from the full report and give them short names
dt = df[[
    "respondent id", "gender", "age", "number of children",
    "marital status", "country of residence", "income report amount",
    "income report date created"
]].rename(columns={
    "respondent id": "id",
    "number of children": "children",
    "marital status": "marital",
    "country of residence": "country",
    "income report amount": "amount",
    "income report date created": "date",
})  # "gender" and "age" keep their names

# Static attributes: one row per respondent
demographic_cols = ["id", "gender", "age", "children", "marital", "country"]
demographics = dt[demographic_cols].drop_duplicates("id")

# Tell the package which columns hold dates, values and entity ids,
# and how many past weeks (lags) each window should contain
config = PreprocessConfig(
    date_col="date",
    value_col="amount",
    id_col="id",
    window=6,
)

# Hold out the 3 most recent weeks for testing and process both sets
train, test, full = train_test_split(dt, config=config, weeks_back=3, demographics=demographics)
print(train.head())
```
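The `weeks_back` argument above suggests a time-based hold-out. The sketch below is one hedged way to express that idea and is not the package's `train_test_split`: `holdout_split` is a hypothetical helper, it assumes each windowed row carries a `target_week` column, and it assumes demographics are attached with a left join on the id column.

```python
from typing import Optional

import pandas as pd


def holdout_split(windows: pd.DataFrame, weeks_back: int = 3,
                  demographics: Optional[pd.DataFrame] = None,
                  id_col: str = "id", week_col: str = "target_week"):
    """Hypothetical helper: windows whose target falls in the last `weeks_back` weeks form the test set."""
    if demographics is not None:
        # Left join: every window is kept even when an entity has no demographic row.
        windows = windows.merge(demographics, on=id_col, how="left")
    weeks = sorted(windows[week_col].unique())
    if not 0 < weeks_back < len(weeks):
        raise ValueError("weeks_back must be positive and leave at least one week for training")
    cutoff = weeks[-weeks_back]  # first held-out week
    train = windows[windows[week_col] < cutoff].reset_index(drop=True)
    test = windows[windows[week_col] >= cutoff].reset_index(drop=True)
    return train, test, windows
```

Splitting on the target week rather than on rows keeps the evaluation strictly in the future relative to training targets. Because each window already holds its full set of lags, every test window has complete history, although those lags may cover weeks that also appear as training targets.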
Updating the Package
The package is intentionally small and easy to extend. You can add
helper functions or modify existing ones simply by editing
the modules inside timeseries_cleaner/. No special tooling is
required: the package does not depend on any external libraries beyond
pandas, which comes preinstalled in most data science environments.
Reading .xlsx files through pandas additionally requires an Excel
engine such as openpyxl.
If you wish to distribute or install this package into your own
projects, consider adding a minimal setup.py or pyproject.toml.
For the purposes of this exercise, the files have been arranged so that
you can import directly from the local directory without installation.
File details
Details for the file lift_timeseries_cleaner-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lift_timeseries_cleaner-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3da51836d5170e03e3fe0a9e7217e9ab3941d1e560ed1b944fdadb2844278663 |
| MD5 | fd76103b5e42c47e3fa2ebc9d8d00499 |
| BLAKE2b-256 | 046299ccbdade3f9a85394ceab5b57da0286185344a9511bc4effd8fbf662248 |