SOTA unsupervised auto-annotation SDK for time series classification

AutoAnnotate-TimeSeries 📊

State-of-the-art unsupervised auto-annotation SDK for time series classification with GUI

Python 3.10+ | License: MIT | Code style: black

AutoAnnotate-TimeSeries automatically clusters and organizes unlabeled time series datasets using Amazon's Chronos foundation models. It provides a GUI, an interactive HTML preview with Plotly charts for visual cluster inspection, and a CLI tool.

✨ Features

  • 🎨 Graphical User Interface: Easy file browser and visual controls via autoannotate-ts
  • 📈 Interactive Plotly Charts: View cluster samples in browser before labeling
  • 🤖 SOTA Foundation Models: Chronos-T5, Chronos-2
  • 🔬 Multiple Clustering: K-means, HDBSCAN, Spectral, DBSCAN
  • 📁 Smart Organization: CSV files named after cluster names for easy identification
  • 🕐 Flexible Timestamp Handling: Auto-detect or specify timestamp column (GUI uses indices, CLI uses names)
  • 📂 Clean Output: HTML preview files saved in output folder alongside results
  • ✂️ Auto Splits: Train/val/test dataset splitting
  • 💾 Export: CSV, JSON formats
  • 📊 Single CSV Input: All time series in one file
  • 🔌 Python API: Full programmatic control

🚀 Installation

pip install autoannotate-timeseries

Optional Dependencies

HDBSCAN Clustering (Optional):

If you want to use the HDBSCAN clustering method:

# Option 1: Install with the package
pip install autoannotate-timeseries[hdbscan]

# Option 2: Install separately before running autoannotate
pip install hdbscan

Note: HDBSCAN is not required for K-means (the default), Spectral, or DBSCAN. Install it only if you specifically need HDBSCAN clustering.

Development Tools:

pip install -e .[dev]

After Installation

Two commands are available:

  • autoannotate-ts - Launch the graphical user interface
  • autoannotate-ts-cli - Command-line interface for automation

Check installation:

autoannotate-ts-cli --version
autoannotate-ts-cli --help

📁 Input Data Format

Your CSV Structure

INPUT: One CSV file with multiple time series as columns

timestamp,series_1,series_2,series_3,series_4,series_5
2024-01-01 00:00:00,10.5,20.1,15.3,18.2,22.5
2024-01-01 01:00:00,11.2,19.8,14.9,17.8,23.1
2024-01-01 02:00:00,9.8,21.2,15.7,18.5,21.8
2024-01-01 03:00:00,10.1,19.5,16.1,18.0,22.2
...

Key Points:

  • First column can be timestamp (auto-detected or specify explicitly)
  • Each column = one time series to be clustered
  • Column names are preserved as series identifiers
  • Variable-length series are supported
  • Missing values are handled automatically
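
To try the tool without real data, a small file in this layout can be generated with the standard library. This is only a sketch: the file name sample_timeseries.csv, the helper write_sample_csv, and the synthetic series shapes are illustrative assumptions, not part of the package.

```python
import csv
import math
from datetime import datetime, timedelta
from pathlib import Path


def write_sample_csv(path, n_rows=48):
    """Write an hourly CSV with a timestamp column and three series columns."""
    start = datetime(2024, 1, 1)
    header = ["timestamp", "rising", "falling", "daily_cycle"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for i in range(n_rows):
            ts = (start + timedelta(hours=i)).strftime("%Y-%m-%d %H:%M:%S")
            rising = 10.0 + 0.5 * i                               # upward trend
            falling = 50.0 - 0.3 * i                              # downward trend
            cycle = 20.0 + 5.0 * math.sin(2 * math.pi * i / 24)   # 24-hour cycle
            writer.writerow([ts, round(rising, 2), round(falling, 2), round(cycle, 2)])
    return header


columns = write_sample_csv(Path("sample_timeseries.csv"))
print(columns)
```

The resulting file has one timestamp column followed by one column per series, so it can be passed directly as the input file.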

Timestamp Column Handling:

  • Auto-detect (recommended): Leave empty in GUI or omit --timestamp-column in CLI
  • GUI: Use column index (0 = first column, 1 = second column, etc.)
  • CLI: Use column name (e.g., --timestamp-column "timestamp")

Specify timestamp column:

autoannotate-ts-cli annotate data.csv output --timestamp-column "datetime" --n-clusters 5
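
Because the GUI expects a column index and the CLI expects a column name, it can help to look up one from the other in the CSV header. A minimal sketch using only the standard library; timestamp_column_index is a hypothetical helper, not part of the package.

```python
import csv
import io


def timestamp_column_index(csv_text, column_name):
    """Return the 0-based position of column_name in the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return header.index(column_name)  # raises ValueError if the name is absent


sample = "datetime,series_1,series_2\n2024-01-01 00:00:00,10.5,20.1\n"
# The index is what the GUI field expects; the name is what the CLI flag expects.
print(timestamp_column_index(sample, "datetime"))
```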

Output Structure

output/
├── increasing_trend/
│   └── increasing_trend.csv    # Contains series_1, series_4 (all rows)
├── decreasing_trend/
│   └── decreasing_trend.csv    # Contains series_2 (all rows)
├── seasonal/
│   └── seasonal.csv            # Contains series_3, series_5 (all rows)
├── unclustered/
│   └── unclustered.csv         # Outliers/noise
├── splits/                     # Available with a CLI parameter
│   ├── train/
│   │   ├── increasing_trend/
│   │   │   └── increasing_trend.csv
│   │   └── ...
│   ├── val/
│   └── test/
├── cluster_0_preview.html      # HTML preview files (saved in output folder)
├── cluster_1_preview.html
├── cluster_2_preview.html
├── metadata.json
└── labels.csv

Key Points:

  • Each class folder contains ONE CSV file named after the class
  • CSV file includes timestamp column and all time series belonging to that class
  • HTML preview files are saved in the output folder for reference
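
Given that layout, the organized result can be read back into a mapping from class name to series names. The sketch below assumes only what is described above (one <class>/<class>.csv per class folder, containing a timestamp column plus the series). read_classes is a hypothetical helper, and the timestamp column name is an assumption you may need to change. The example builds a throwaway folder so it runs on its own.

```python
import csv
import tempfile
from pathlib import Path


def read_classes(output_dir, timestamp_column="timestamp", skip=("splits",)):
    """Map each class folder to the series columns found in its CSV."""
    classes = {}
    for folder in sorted(Path(output_dir).iterdir()):
        csv_path = folder / f"{folder.name}.csv"
        # Ignore plain files (previews, metadata), the splits folder, and empty folders
        if not folder.is_dir() or folder.name in skip or not csv_path.exists():
            continue
        with open(csv_path, newline="") as f:
            header = next(csv.reader(f))
        classes[folder.name] = [c for c in header if c != timestamp_column]
    return classes


# Build a throwaway folder that mimics the layout, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo = {"seasonal": ["series_3", "series_5"], "decreasing_trend": ["series_2"]}
    for name, cols in demo.items():
        class_dir = Path(tmp) / name
        class_dir.mkdir()
        (class_dir / f"{name}.csv").write_text("timestamp," + ",".join(cols) + "\n")
    (Path(tmp) / "metadata.json").write_text("{}")
    result = read_classes(tmp)

print(result)
```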

🎨 Quick Start - GUI

The easiest way to use AutoAnnotate-TimeSeries:

autoannotate-ts

Workflow:

  1. 📁 Select input CSV file (with multiple time series as columns)
  2. 📂 Select output folder
  3. 🔢 Set number of classes
  4. 🤖 Choose model
  5. 📏 Configure context length (512 for typical series, 1024+ for long series)
  6. 📊 [Optional] Specify timestamp column index (e.g., 0 for first column, leave empty for auto-detect)
  7. ▶️ Click "Start Auto-Annotation"

The app will:

  • Cluster your time series automatically
  • Open interactive HTML previews in your browser with Plotly charts for each cluster
  • Save all preview files in the output folder (not project root)
  • Prompt you to label each cluster interactively

💻 CLI Usage

Basic Command

autoannotate-ts-cli annotate /path/to/data.csv /path/to/output \
    --n-clusters 5 \
    --model chronos-t5-tiny \
    --create-splits

Advanced CLI Options

autoannotate-ts-cli annotate ./data/sensors.csv ./output \
    --n-clusters 8 \
    --method hdbscan \
    --model chronos-2 \
    --context-length 512 \
    --timestamp-column "datetime" \
    --create-splits \
    --export-format json

Available models: chronos-t5-tiny, chronos-t5-small, chronos-2

Note: The CLI uses column names for the timestamp (e.g., --timestamp-column "timestamp"), while the GUI uses column indices (e.g., 0 for the first column).

CLI Options Reference

autoannotate-ts-cli annotate INPUT_FILE OUTPUT_DIR [OPTIONS]

Options:
  --n-clusters, -n INTEGER        Number of clusters (required for kmeans/spectral)
  --method, -m [kmeans|hdbscan|spectral|dbscan]
                                  Clustering method (default: kmeans)
  --model [chronos-t5-tiny|chronos-t5-small|chronos-2]
                                  Embedding model (default: chronos-2)
  --batch-size, -b INTEGER        Batch size for embedding extraction (default: 16)
  --n-samples INTEGER             Representative samples per cluster (default: 5)
  --context-length INTEGER        Context length for models (default: 512)
  --timestamp-column TEXT         Timestamp column name (auto-detected if not specified)
  --create-splits                 Create train/val/test splits
  --export-format [csv|json]      Export labels format (default: csv)
  --help                          Show this message and exit

Technical Details:

  • Batch Size: Default is 16 for both GUI and CLI, optimized for memory efficiency
  • Dimensionality Reduction: Automatically applied when dataset has more than 50 time series
  • Context Length: Number of time steps processed by the model (512 for typical series, up to 8192 for chronos-2 with long time series)
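
For scripted runs, the options above can be assembled into an argument list and handed to subprocess.run. This is a sketch that assumes autoannotate-ts-cli is on the PATH; build_annotate_command is a hypothetical helper that uses only the flags listed in the reference, with the documented defaults.

```python
import shlex


def build_annotate_command(input_file, output_dir, n_clusters=None, method="kmeans",
                           model="chronos-2", batch_size=16, context_length=512,
                           timestamp_column=None, create_splits=False):
    """Assemble an `autoannotate-ts-cli annotate` call from the documented options."""
    cmd = ["autoannotate-ts-cli", "annotate", str(input_file), str(output_dir),
           "--method", method, "--model", model,
           "--batch-size", str(batch_size), "--context-length", str(context_length)]
    if n_clusters is not None:          # required for kmeans and spectral
        cmd += ["--n-clusters", str(n_clusters)]
    if timestamp_column is not None:    # omit to let the tool auto-detect
        cmd += ["--timestamp-column", timestamp_column]
    if create_splits:
        cmd.append("--create-splits")
    return cmd


cmd = build_annotate_command("data.csv", "output", n_clusters=5, timestamp_column="datetime")
print(shlex.join(cmd))
# To execute the command: subprocess.run(cmd, check=True)
```

Passing a list instead of a shell string avoids quoting problems with paths or column names that contain spaces.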

🐍 Python API

from autoannotate import AutoAnnotator
from pathlib import Path

annotator = AutoAnnotator(
    input_file=Path("./data/timeseries.csv"),
    output_dir=Path("./output"),
    model="chronos-t5-tiny",
    clustering_method="kmeans",
    n_clusters=5,
    batch_size=16,
    context_length=512,
    timestamp_column="timestamp"  # Optional
)

result = annotator.run_full_pipeline(
    n_samples=7,
    create_splits=True,
    export_format="csv"
)

print(f"Processed {result['n_timeseries']} time series")
print(f"Created {result['n_clusters']} classes")

Manual Pipeline Control

annotator.load_timeseries()
annotator.extract_embeddings()
annotator.cluster()

stats = annotator.get_cluster_stats()
print(f"Found {stats['n_clusters']} clusters")

class_names = {
    0: "increasing_trend",
    1: "decreasing_trend",
    2: "seasonal_pattern",
    3: "stationary"
}

annotator.organize_dataset(class_names)
annotator.export_labels(format="json")

📊 Example: Real-World Sensor Data

Input CSV (sensors.csv):

timestamp,temp_A,temp_B,temp_C,humidity_A,humidity_B
2024-01-01 00:00,22.5,23.1,21.8,65.2,64.8
2024-01-01 01:00,22.8,23.0,21.9,65.5,64.9
2024-01-01 02:00,23.1,22.9,22.1,65.8,65.1
...

Command:

autoannotate-ts-cli annotate sensors.csv ./organized \
    --n-clusters 3 \
    --timestamp-column "timestamp"

Output:

organized/
├── stable_temperature/
│   └── stable_temperature.csv        # Contains: timestamp, temp_A, temp_C
├── variable_temperature/
│   └── variable_temperature.csv      # Contains: timestamp, temp_B
├── high_humidity/
│   └── high_humidity.csv             # Contains: timestamp, humidity_A, humidity_B
├── cluster_0_preview.html
├── cluster_1_preview.html
├── cluster_2_preview.html
├── metadata.json
└── labels.csv

🧠 Model Comparison

| Model | Context | Speed | Quality | Best For |
| --- | --- | --- | --- | --- |
| chronos-t5-tiny | 512 | ⚡⚡⚡ | ⭐⭐⭐ | Fast inference, small datasets |
| chronos-t5-small | 512 | ⚡⚡ | ⭐⭐⭐⭐ | Balanced (recommended) |
| chronos-2 | up to 8192 | ⚡ | ⭐⭐⭐⭐⭐ | Best quality, long series (v2 model) |

Important Notes:

  • chronos-2 is a completely new architecture (uses Chronos2Pipeline) with support for much longer time series (up to 8192 tokens vs 512)
  • chronos-2 requires chronos-forecasting>=2.0.0, which is already installed by pip install autoannotate-timeseries
  • For most use cases, chronos-t5-small offers the best balance of speed and quality
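
The table and notes above can be condensed into a rule of thumb. The helper below is a hypothetical sketch, not part of the package: suggest_model and its prefer_speed flag are assumptions for illustration, and the context limits are the ones listed in the table.

```python
# Maximum context per model, as listed in the comparison table
MAX_CONTEXT = {"chronos-t5-tiny": 512, "chronos-t5-small": 512, "chronos-2": 8192}


def suggest_model(longest_series, prefer_speed=False):
    """Suggest a (model, context_length) pair for the longest series in a dataset."""
    if longest_series > 512:
        # Only chronos-2 is listed with a context beyond 512 steps
        return "chronos-2", min(longest_series, MAX_CONTEXT["chronos-2"])
    model = "chronos-t5-tiny" if prefer_speed else "chronos-t5-small"
    return model, MAX_CONTEXT[model]


print(suggest_model(300))
print(suggest_model(3000))
```

Treat the result as a starting point: memory limits or quality requirements may justify a different choice.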

🔬 Clustering Methods

Select one of the following methods with the --method argument of autoannotate-ts-cli annotate.

| Method | Auto K | Handles Noise | Best For | Installation |
| --- | --- | --- | --- | --- |
| kmeans | ❌ | ❌ | Fast, spherical clusters | ✅ Included (default method used in GUI) |
| hdbscan | ✅ | ✅ | Complex shapes, outliers | ⚠️ Optional: pip install ...[hdbscan] |
| spectral | ❌ | ❌ | Non-convex shapes | ✅ Included |
| dbscan | ✅ | ✅ | Density-based | ✅ Included |

Note: HDBSCAN requires separate installation. See Optional Dependencies section.

✅ Quick Validation

Test if your CSV file is valid:

autoannotate-ts-cli validate ./your_data.csv

This shows:

  • Number of time series columns found
  • Column names
  • Auto-detected timestamp column (if present)

With explicit timestamp column:

autoannotate-ts-cli validate ./your_data.csv --timestamp-column "timestamp"
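
A similar check can be approximated before installing anything. The sketch below does not replicate the tool's own detection logic: precheck is a hypothetical helper that simply treats the first column as a timestamp when its first value parses as an ISO date.

```python
import csv
import io
from datetime import datetime


def precheck(csv_text):
    """Report series column names and whether the first column looks like a timestamp."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, first = rows[0], rows[1]
    try:
        datetime.fromisoformat(first[0])   # e.g. "2024-01-01 00:00:00"
        timestamp = header[0]
    except ValueError:
        timestamp = None                   # first column is treated as a series
    series = [c for c in header if c != timestamp]
    return {"n_series": len(series), "series": series, "timestamp": timestamp}


sample = "timestamp,temp_A,temp_B\n2024-01-01 00:00:00,22.5,23.1\n"
print(precheck(sample))
```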

🐛 Troubleshooting

Out of Memory?

Reduce batch size and context length for large datasets:

from pathlib import Path
from autoannotate import AutoAnnotator

annotator = AutoAnnotator(
    input_file=Path("./data.csv"),
    output_dir=Path("./output"),
    batch_size=8,  # Reduce from default 16 to 8
    context_length=256,  # Reduce from default 512 to 256
    model="chronos-t5-tiny"
)

Or for CLI:

autoannotate-ts-cli annotate data.csv output \
    --batch-size 8 \
    --context-length 256 \
    --model chronos-t5-tiny \
    --n-clusters 5

Too Many/Few Clusters?

Try HDBSCAN for automatic cluster detection:

autoannotate-ts-cli annotate data.csv output --method hdbscan

Note: HDBSCAN must be installed first:

pip install autoannotate-timeseries[hdbscan]

If you try to use HDBSCAN without installing it, you'll get an error: ImportError: HDBSCAN is not installed. Install it with: pip install autoannotate-timeseries[hdbscan]
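
To avoid that error in scripts, the optional dependency can be checked up front. A minimal sketch with a hypothetical helper, pick_auto_k_method, which falls back to DBSCAN (included by default, and also able to find the number of clusters on its own) when the hdbscan package is not importable.

```python
import importlib.util


def pick_auto_k_method():
    """Return "hdbscan" if the optional package is importable, else "dbscan"."""
    if importlib.util.find_spec("hdbscan") is not None:
        return "hdbscan"
    return "dbscan"  # included by default and also needs no cluster count


method = pick_auto_k_method()
print(f"autoannotate-ts-cli annotate data.csv output --method {method}")
```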

Need to specify timestamp column?

CLI (uses column name):

autoannotate-ts-cli annotate data.csv output --timestamp-column "datetime" --n-clusters 5

GUI (uses column index):

  • Enter 0 for first column, 1 for second column, etc.
  • Leave empty to auto-detect

🔄 Data Preparation Tips

If you have separate CSV files per time series:

Merge them first:

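import pandas as pd
from pathlib import Path

# Assumes every file stores its series in a column named "value"
# and that rows line up by position across files.
dfs = []
for csv_file in sorted(Path("./separate_files").glob("*.csv")):
    df = pd.read_csv(csv_file)
    series_name = csv_file.stem  # use the file name as the series name
    # Keep only the value column so shared columns are not duplicated
    dfs.append(df[["value"]].rename(columns={"value": series_name}))

# Place the series side by side, one column per file
merged_df = pd.concat(dfs, axis=1)
merged_df.to_csv("combined_timeseries.csv", index=False)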

If you have wide format with row-based time series:

Transpose it:

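import pandas as pd

# Assumes the first column holds the series names, one series per row
df = pd.read_csv("wide_format.csv", index_col=0)
df_transposed = df.T  # rows become time steps, columns become series
df_transposed.to_csv("column_format.csv")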

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make sure all checks in tests.yml pass
  4. Push and create a PR

📄 License

MIT License - see LICENSE file.

🙏 Acknowledgments

Built with PyTorch, scikit-learn, pandas, NumPy and more. Foundation models: Chronos-T5 and Chronos-2 (Amazon).

Made for the RAIDO Project, from MetaMind Innovations


Sister Project: AutoAnnotate-Vision - For image auto-annotation
