A small library to preprocess diary/transaction data for time series models.

These details have not been verified by PyPI

Project links

Homepage

Project description

Time Series Cleaning Package

This repository contains a small Python package, timeseries_cleaner, that turns diary or transactional data into a format suitable for time-series forecasting models. It encapsulates the repetitive data-munging steps so you can focus on building and evaluating your models.

Features

Flexible column mapping – Configure which columns in your raw DataFrame represent dates, values and identifiers using a PreprocessConfig. This means you can reuse the same functions across different datasets without editing any code.
Automatic weekly aggregation – Raw events are aggregated by week, missing weeks are filled with zeros and the resulting timeline is aligned across entities.
Sliding window generation – Fixed-length sequences of past observations (lags) are extracted, with the value that follows each sequence as the target. Windows containing all zeros or missing targets are automatically discarded.
Demographic merging – Seamlessly join static attributes (e.g. gender, age) to the generated sequences via a single helper.
Train/test splitting – Hold out the most recent weeks for evaluation, ensuring that each test window has sufficient history behind it.
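
To make the aggregation and windowing features concrete, here is a minimal plain-pandas sketch of the same ideas. It is not the package's implementation: the function names weekly_totals and sliding_windows, the lag_N column names, the Monday week start, and the rule of dropping windows whose lags are all zero are assumptions made for illustration.

```python
import pandas as pd


def weekly_totals(df, date_col, value_col, id_col):
    """Sum values per entity per week; weeks with no events become 0."""
    df = df.copy()
    df[date_col] = pd.to_datetime(df[date_col])
    # Label each event with the Monday that starts its week (assumed convention).
    df["week"] = df[date_col].dt.to_period("W").dt.start_time
    weekly = df.groupby([id_col, "week"])[value_col].sum().unstack(fill_value=0)
    # Reindex onto one shared weekly timeline so every entity is aligned.
    all_weeks = pd.date_range(weekly.columns.min(), weekly.columns.max(), freq="7D")
    return weekly.reindex(columns=all_weeks, fill_value=0)


def sliding_windows(weekly, window):
    """Turn each entity's weekly row into (lags, target) records."""
    rows = []
    for entity, series in weekly.iterrows():
        values = series.to_numpy()
        for end in range(window, len(values)):
            lags = values[end - window:end]
            # Assumed rule: skip windows with no activity in the lag period
            # and windows whose target is missing.
            if not lags.any() or pd.isna(values[end]):
                continue
            row = {"id": entity, "target_week": weekly.columns[end]}
            # lag_<window> is the oldest observation, lag_1 the most recent.
            row.update({f"lag_{window - i}": lags[i] for i in range(window)})
            row["target"] = values[end]
            rows.append(row)
    return pd.DataFrame(rows)


# Made-up events for two entities, using the column roles from the features list.
events = pd.DataFrame({
    "id": ["a", "a", "a", "b"],
    "date": ["2023-01-02", "2023-01-04", "2023-01-23", "2023-01-10"],
    "amount": [10.0, 5.0, 7.0, 3.0],
})
weekly = weekly_totals(events, "date", "amount", "id")
windows = sliding_windows(weekly, window=2)
print(windows)
```

Passing the column names as arguments mirrors the column-mapping idea: the same two functions work on any DataFrame once you say which columns hold the date, the value and the identifier.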

Basic Usage

import pandas as pd
from timeseries_cleaner import load_data, preprocess_data, merge_demographics, train_test_split, PreprocessConfig

# Load your data (CSV or Excel). Column names will be normalised to lower case.
df = load_data("Income Report - Mon Apr 3 2023.xlsx", sheet_name="Income Reports")

# Select and rename the relevant columns from the full report
dt = df[[
    "respondent id", "gender", "age", "number of children",
    "marital status", "country of residence", "income report amount",
    "income report date created"
]].rename(columns={
    "respondent id": "id",
    "gender": "gender",
    "age": "age",
    "number of children": "children",
    "marital status": "marital",
    "country of residence": "country",
    "income report amount": "amount",
    "income report date created": "date"
})

demographic_cols = ["id", "gender", "age", "children", "marital", "country"]
demographics = dt[demographic_cols].drop_duplicates("id")

config = PreprocessConfig(
    date_col="date",
    value_col="amount",
    id_col="id",
    window=6,
)

# Split into training and testing sets and process each
train, test, full = train_test_split(dt, config=config, weeks_back=3, demographics=demographics)

print(train.head())
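
For intuition about what a time-based hold-out with a demographic join involves, the following plain-pandas sketch may help. It does not show how timeseries_cleaner implements train_test_split: the helper holdout_last_weeks, the target_week column and the sample data are hypothetical, and the sketch assumes windows are built before splitting so that every test row already carries its own history.

```python
import pandas as pd

# Hypothetical window table: one row per (id, target_week), as a windowing
# step might produce. Lag columns are omitted to keep the example short.
weeks = pd.date_range("2023-01-02", periods=8, freq="7D")
windows = pd.DataFrame({
    "id": ["a"] * 8,
    "target_week": weeks,
    "target": range(8),
})
demographics = pd.DataFrame({"id": ["a"], "gender": ["f"], "age": [34]})


def holdout_last_weeks(windows, weeks_back, week_col="target_week"):
    """Put windows whose target falls in the final `weeks_back` weeks into test."""
    cutoff = windows[week_col].max() - pd.Timedelta(weeks=weeks_back)
    is_test = windows[week_col] > cutoff
    return windows[~is_test], windows[is_test]


# A left join keeps every window even when an id has no demographic row;
# validate= raises if the demographics table has duplicate ids.
enriched = windows.merge(demographics, on="id", how="left", validate="many_to_one")
train_rows, test_rows = holdout_last_weeks(enriched, weeks_back=3)
print(len(train_rows), len(test_rows))
```

Splitting on the target week rather than at random keeps every test target later than every training target, which is the property a forecasting evaluation needs.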

Updating the Package

The package is intentionally small and easy to extend. You can add helper functions or modify existing ones by editing the modules inside timeseries_cleaner/. No special tooling is required: the package does not depend on any external libraries beyond pandas, which is installed by default in most data science environments.

If you wish to distribute or install this package into your own projects, consider adding a minimal setup.py or pyproject.toml. For the purposes of this exercise the files have been arranged so that you can import directly from the local directory without installation.

Project details

Release history

0.1.1 (Oct 18, 2025)

0.1.0 (Oct 18, 2025), this version

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

lift_timeseries_cleaner-0.1.0-py3-none-any.whl (11.6 kB)

Uploaded Oct 18, 2025 Python 3

File details

Details for the file lift_timeseries_cleaner-0.1.0-py3-none-any.whl.

File metadata

Download URL: lift_timeseries_cleaner-0.1.0-py3-none-any.whl
Upload date: Oct 18, 2025
Size: 11.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for lift_timeseries_cleaner-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3da51836d5170e03e3fe0a9e7217e9ab3941d1e560ed1b944fdadb2844278663`
MD5	`fd76103b5e42c47e3fa2ebc9d8d00499`
BLAKE2b-256	`046299ccbdade3f9a85394ceab5b57da0286185344a9511bc4effd8fbf662248`
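
If you download the wheel by hand, you can check it against the SHA256 digest listed above before installing. The sketch below uses only the standard library; the helper names and the local file path are assumptions, and the check only runs if a file actually exists at that path.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the hash table above.
EXPECTED_SHA256 = "3da51836d5170e03e3fe0a9e7217e9ab3941d1e560ed1b944fdadb2844278663"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


# Assumed download location; adjust to wherever the wheel was saved.
wheel = Path("lift_timeseries_cleaner-0.1.0-py3-none-any.whl")
if wheel.exists():
    print("hash OK" if matches_expected(wheel) else "hash MISMATCH")
```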

lift-timeseries-cleaner 0.1.0
