SOTA unsupervised auto-annotation SDK for time series classification

AutoAnnotate-TimeSeries 📊

State-of-the-art unsupervised auto-annotation SDK for time series classification with GUI

Python 3.10+ | License: MIT | Code style: Black

AutoAnnotate-TimeSeries automatically clusters and organizes unlabeled time series datasets using foundation models (Chronos, Moirai, Lag-Llama). It includes a graphical user interface and an interactive HTML preview with Plotly charts for visual cluster inspection.

✨ Features

  • 🎨 Graphical User Interface: Easy file browser and visual controls via autoannotate-ts
  • 📈 Interactive Plotly Charts: View cluster samples in browser before labeling
  • 🤖 SOTA Foundation Models: Chronos-T5, Moirai, Lag-Llama
  • 🔬 Multiple Clustering: K-means, HDBSCAN, Spectral, DBSCAN
  • 📁 Smart Organization: CSV files named after cluster names for easy identification
  • 🕐 Flexible Timestamp Handling: Auto-detect or specify timestamp column (GUI uses indices, CLI uses names)
  • 📂 Clean Output: HTML preview files saved in output folder alongside results
  • ✂️ Auto Splits: Train/val/test dataset splitting
  • 💾 Export: CSV, JSON formats
  • 📊 Single CSV Input: All time series in one file
  • 🔌 Python API: Full programmatic control

🚀 Installation

pip install autoannotate-timeseries

Optional Dependencies

HDBSCAN Clustering (Optional):

If you want to use the HDBSCAN clustering method:

# Option 1: Install with the package
pip install autoannotate-timeseries[hdbscan]

# Option 2: Install separately before running autoannotate
pip install hdbscan

Note: HDBSCAN is not required for the default K-means, Spectral, or DBSCAN methods. Only install it if you specifically need HDBSCAN clustering.

Development Tools:

pip install -e .[dev]

After Installation

Two commands are available:

  • autoannotate-ts - Launch the graphical user interface
  • autoannotate-ts-cli - Command-line interface for automation

Check installation:

autoannotate-ts-cli --version
autoannotate-ts-cli --help

๐Ÿ“ Input Data Format

Your CSV Structure

INPUT: One CSV file with multiple time series as columns

timestamp,series_1,series_2,series_3,series_4,series_5
2024-01-01 00:00:00,10.5,20.1,15.3,18.2,22.5
2024-01-01 01:00:00,11.2,19.8,14.9,17.8,23.1
2024-01-01 02:00:00,9.8,21.2,15.7,18.5,21.8
2024-01-01 03:00:00,10.1,19.5,16.1,18.0,22.2
...

Key Points:

  • The first column can be a timestamp (auto-detected or specified explicitly)
  • Each remaining column is one time series to be clustered
  • Column names are preserved as series identifiers
  • Variable-length series are supported
  • Missing values are handled automatically
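
To try the tool without real data, a small file in this layout can be generated with the standard library. This is only a sketch: the helper names `build_sample_rows` and `rows_to_csv_text` and the ramp-shaped values are illustrative assumptions, not part of the package.

```python
import csv
import io
from datetime import datetime, timedelta


def build_sample_rows(series_names, n_rows, start=datetime(2024, 1, 1)):
    """Return a header row plus n_rows hourly rows, one column per series."""
    rows = [["timestamp", *series_names]]
    for step in range(n_rows):
        stamp = (start + timedelta(hours=step)).strftime("%Y-%m-%d %H:%M:%S")
        # Illustrative values only: each series is a ramp with its own offset.
        values = [round(10.0 * (i + 1) + 0.5 * step, 2) for i in range(len(series_names))]
        rows.append([stamp, *values])
    return rows


def rows_to_csv_text(rows):
    """Serialize rows to CSV text; write this text to a file to use it as input."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


sample = build_sample_rows(["series_1", "series_2", "series_3"], n_rows=4)
print(rows_to_csv_text(sample))
```

Saving the printed text as a `.csv` file gives an input with a timestamp column followed by three series columns.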

Timestamp Column Handling:

  • Auto-detect (recommended): Leave empty in GUI or omit --timestamp-column in CLI
  • GUI: Use column index (0 = first column, 1 = second column, etc.)
  • CLI: Use column name (e.g., --timestamp-column "timestamp")
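
Because the GUI expects an index and the CLI expects a name, it can help to translate between the two from the CSV header. The sketch below does that with the standard `csv` module; the function names are hypothetical helpers, not part of the package.

```python
import csv
import io


def column_name_for_index(csv_text, index):
    """Return the header name at a zero-based column index (the GUI convention)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return header[index]


def column_index_for_name(csv_text, name):
    """Return the zero-based index of a header name (the CLI convention)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return header.index(name)


example = "datetime,series_1,series_2\n2024-01-01 00:00:00,10.5,20.1\n"
print(column_name_for_index(example, 0))
print(column_index_for_name(example, "datetime"))
```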

Specify timestamp column:

autoannotate-ts-cli annotate data.csv output --timestamp-column "datetime" --n-clusters 5

Output Structure

output/
├── increasing_trend/
│   └── increasing_trend.csv    # Contains series_1, series_4 (all rows)
├── decreasing_trend/
│   └── decreasing_trend.csv    # Contains series_2 (all rows)
├── seasonal/
│   └── seasonal.csv            # Contains series_3, series_5 (all rows)
├── unclustered/
│   └── unclustered.csv         # Outliers/noise
├── splits/                     # Available with a CLI parameter
│   ├── train/
│   │   ├── increasing_trend/
│   │   │   └── increasing_trend.csv
│   │   └── ...
│   ├── val/
│   └── test/
├── cluster_0_preview.html      # HTML preview files (saved in output folder)
├── cluster_1_preview.html
├── cluster_2_preview.html
├── metadata.json
└── labels.csv

Key Points:

  • Each class folder contains ONE CSV file named after the class
  • CSV file includes timestamp column and all time series belonging to that class
  • HTML preview files are saved in the output folder for reference
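
Given the layout described above, a short script can list which series ended up in which class. This is a sketch that assumes each class CSV has the timestamp in its first column; `summarize_output` is a hypothetical helper, and the temporary folder only stands in for a real output directory.

```python
import csv
import tempfile
from pathlib import Path


def summarize_output(output_dir):
    """Map each class folder to the series columns found in its CSV file."""
    summary = {}
    for class_dir in sorted(Path(output_dir).iterdir()):
        # Skip files (previews, metadata) and the optional splits/ folder.
        if not class_dir.is_dir() or class_dir.name == "splits":
            continue
        csv_path = class_dir / f"{class_dir.name}.csv"
        if not csv_path.exists():
            continue
        with csv_path.open(newline="") as handle:
            header = next(csv.reader(handle))
        # Assumption: the first column is the timestamp, the rest are series.
        summary[class_dir.name] = header[1:]
    return summary


# Build a tiny stand-in for an output folder so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    demo = {"seasonal": ["series_3", "series_5"], "unclustered": ["series_6"]}
    for name, cols in demo.items():
        folder = Path(tmp) / name
        folder.mkdir()
        (folder / f"{name}.csv").write_text("timestamp," + ",".join(cols) + "\n")
    (Path(tmp) / "labels.csv").write_text("placeholder\n")
    demo_summary = summarize_output(tmp)

print(demo_summary)
```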

🎨 Quick Start - GUI

The easiest way to use AutoAnnotate-TimeSeries:

autoannotate-ts

Workflow:

  1. ๐Ÿ“ Select input CSV file (with multiple time series as columns)
  2. ๐Ÿ“‚ Select output folder
  3. ๐Ÿ”ข Set number of classes
  4. ๐Ÿค– Choose model
  5. ๐Ÿ“ Configure context length (512 for typical series, 1024+ for long series)
  6. ๐Ÿ“Š [Optional] Specify timestamp column index (e.g., 0 for first column, leave empty for auto-detect)
  7. โ–ถ๏ธ Click "Start Auto-Annotation"

The app will:

  • Cluster your time series automatically
  • Open interactive HTML previews in your browser with Plotly charts for each cluster
  • Save all preview files in the output folder (not project root)
  • Prompt you to label each cluster interactively

💻 CLI Usage

Basic Command

autoannotate-ts-cli annotate /path/to/data.csv /path/to/output \
    --n-clusters 5 \
    --model chronos-t5-tiny \
    --create-splits
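
The `--create-splits` flag in the command above produces train/val/test folders. The split ratios the tool uses are not stated here, so the following is only a conceptual sketch of a seeded split over series names, assuming 70/15/15 percentages; `split_series` is a hypothetical helper.

```python
import random


def split_series(names, train_pct=70, val_pct=15, seed=0):
    """Shuffle series names with a fixed seed and cut them into train/val/test lists."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # the remainder goes to test
    }


names = [f"series_{i}" for i in range(1, 21)]
splits = split_series(names)
print({part: len(members) for part, members in splits.items()})
```

Fixing the seed keeps the assignment reproducible across runs, which matters when the splits are later used for model comparison.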

Advanced CLI Options

autoannotate-ts-cli annotate ./data/sensors.csv ./output \
    --n-clusters 8 \
    --method hdbscan \
    --model chronos-2 \
    --context-length 512 \
    --timestamp-column "datetime" \
    --create-splits \
    --export-format json

Available models: chronos-t5-tiny, chronos-t5-small, chronos-2

Note: The CLI takes the timestamp column by name (e.g., --timestamp-column "timestamp"), while the GUI takes a column index (e.g., 0 for the first column).

CLI Options Reference

autoannotate-ts-cli annotate INPUT_FILE OUTPUT_DIR [OPTIONS]

Options:
  --n-clusters, -n INTEGER        Number of clusters (required for kmeans/spectral)
  --method, -m [kmeans|hdbscan|spectral|dbscan]
                                  Clustering method (default: kmeans)
  --model [chronos-t5-tiny|chronos-t5-small|chronos-2]
                                  Embedding model (default: chronos-2)
  --batch-size, -b INTEGER        Batch size for embedding extraction (default: 16)
  --n-samples INTEGER             Representative samples per cluster (default: 5)
  --context-length INTEGER        Context length for models (default: 512)
  --timestamp-column TEXT         Timestamp column name (auto-detected if not specified)
  --create-splits                 Create train/val/test splits
  --export-format [csv|json]      Export labels format (default: csv)
  --help                          Show this message and exit
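
For automation, the options listed above can be assembled into an argument list from Python. The sketch below uses only the flags shown in the reference; `build_annotate_command` is a hypothetical helper, and the defaults mirror the documented ones.

```python
import shlex


def build_annotate_command(input_file, output_dir, n_clusters=None, method="kmeans",
                           model="chronos-2", batch_size=16, context_length=512,
                           timestamp_column=None, create_splits=False,
                           export_format="csv"):
    """Assemble an argument list for `autoannotate-ts-cli annotate`."""
    command = ["autoannotate-ts-cli", "annotate", str(input_file), str(output_dir),
               "--method", method,
               "--model", model,
               "--batch-size", str(batch_size),
               "--context-length", str(context_length),
               "--export-format", export_format]
    # Optional flags are appended only when they were requested.
    if n_clusters is not None:
        command += ["--n-clusters", str(n_clusters)]
    if timestamp_column is not None:
        command += ["--timestamp-column", timestamp_column]
    if create_splits:
        command.append("--create-splits")
    return command


command = build_annotate_command("data.csv", "output", n_clusters=5,
                                 timestamp_column="datetime", create_splits=True)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the package is installed.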

Technical Details:

  • Batch Size: Default is 16 for both GUI and CLI, optimized for memory efficiency
  • Dimensionality Reduction: Automatically applied when dataset has more than 50 time series
  • Context Length: Number of time steps processed by the model (512 for typical series, up to 8192 for chronos-2 and long time series)
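
How the package treats a series longer than the context length is not documented here. One common convention, shown below purely as an assumption, is to keep the most recent observations; `fit_to_context` is a hypothetical helper.

```python
def fit_to_context(values, context_length=512):
    """Keep at most the last `context_length` observations of one series."""
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    # Negative slicing returns the whole list when it is already short enough.
    return list(values[-context_length:])


long_series = list(range(1000))
window = fit_to_context(long_series, context_length=512)
print(len(window), window[0], window[-1])
```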

๐Ÿ Python API

from autoannotate import AutoAnnotator
from pathlib import Path

annotator = AutoAnnotator(
    input_file=Path("./data/timeseries.csv"),
    output_dir=Path("./output"),
    model="chronos-t5-tiny",
    clustering_method="kmeans",
    n_clusters=5,
    batch_size=16,
    context_length=512,
    timestamp_column="timestamp"  # Optional
)

result = annotator.run_full_pipeline(
    n_samples=7,
    create_splits=True,
    export_format="csv"
)

print(f"Processed {result['n_timeseries']} time series")
print(f"Created {result['n_clusters']} classes")

Manual Pipeline Control

annotator.load_timeseries()
annotator.extract_embeddings()
annotator.cluster()

stats = annotator.get_cluster_stats()
print(f"Found {stats['n_clusters']} clusters")

class_names = {
    0: "increasing_trend",
    1: "decreasing_trend",
    2: "seasonal_pattern",
    3: "stationary"
}

annotator.organize_dataset(class_names)
annotator.export_labels(format="json")

📊 Example: Real-World Sensor Data

Input CSV (sensors.csv):

timestamp,temp_A,temp_B,temp_C,humidity_A,humidity_B
2024-01-01 00:00,22.5,23.1,21.8,65.2,64.8
2024-01-01 01:00,22.8,23.0,21.9,65.5,64.9
2024-01-01 02:00,23.1,22.9,22.1,65.8,65.1
...

Command:

autoannotate-ts-cli annotate sensors.csv ./organized \
    --n-clusters 3 \
    --timestamp-column "timestamp"

Output:

organized/
├── stable_temperature/
│   └── stable_temperature.csv        # Contains: timestamp, temp_A, temp_C
├── variable_temperature/
│   └── variable_temperature.csv      # Contains: timestamp, temp_B
├── high_humidity/
│   └── high_humidity.csv             # Contains: timestamp, humidity_A, humidity_B
├── cluster_0_preview.html
├── cluster_1_preview.html
├── cluster_2_preview.html
├── metadata.json
└── labels.csv

🧠 Model Comparison

| Model | Context | Speed | Quality | Best For |
|---|---|---|---|---|
| chronos-t5-tiny | 512 | ⚡⚡⚡ | ⭐⭐⭐ | Fast inference, small datasets |
| chronos-t5-small | 512 | ⚡⚡ | ⭐⭐⭐⭐ | Balanced (recommended) |
| chronos-2 | up to 8192 | ⚡ | ⭐⭐⭐⭐⭐ | Best quality, long series (v2 model) |

Important Notes:

  • chronos-2 is a completely new architecture (uses Chronos2Pipeline) with support for much longer time series (up to 8192 tokens vs 512)
  • chronos-2 requires chronos-forecasting>=2.0.0
  • For most use cases, chronos-t5-small offers the best balance of speed and quality
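
The guidance above can be condensed into a small selection rule. This is a sketch of one reasonable reading of the comparison, not logic taken from the package; `suggest_model` and its `prefer_speed` parameter are hypothetical.

```python
def suggest_model(series_length, prefer_speed=False):
    """Pick a model name from the comparison above based on series length."""
    if series_length > 512:
        # chronos-2 is the only listed model with a context above 512.
        return "chronos-2"
    return "chronos-t5-tiny" if prefer_speed else "chronos-t5-small"


print(suggest_model(300))
print(suggest_model(300, prefer_speed=True))
print(suggest_model(4000))
```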

🔬 Clustering Methods

| Method | Auto K | Handles Noise | Best For | Installation |
|---|---|---|---|---|
| kmeans | ❌ | ❌ | Fast, spherical clusters | ✅ Included |
| hdbscan | ✅ | ✅ | Complex shapes, outliers | ⚠️ Optional: pip install ...[hdbscan] |
| spectral | ❌ | ❌ | Non-convex shapes | ✅ Included |
| dbscan | ✅ | ✅ | Density-based | ✅ Included |

Note: HDBSCAN requires separate installation. See Optional Dependencies section.
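
Since kmeans and spectral need an explicit cluster count while hdbscan and dbscan determine it themselves, a script can check its arguments before launching a long run. The sketch below encodes that rule; `check_clustering_args` is a hypothetical helper and does not reproduce the package's own validation.

```python
NEEDS_N_CLUSTERS = {"kmeans", "spectral"}       # no automatic K
FINDS_K_AUTOMATICALLY = {"hdbscan", "dbscan"}   # may also mark series as noise


def check_clustering_args(method, n_clusters=None):
    """Return the normalized method name, or raise if the arguments do not fit."""
    method = method.lower()
    if method not in NEEDS_N_CLUSTERS | FINDS_K_AUTOMATICALLY:
        raise ValueError(f"unknown clustering method: {method!r}")
    if method in NEEDS_N_CLUSTERS and (n_clusters is None or n_clusters < 1):
        raise ValueError(f"{method} needs an explicit, positive n_clusters")
    return method


print(check_clustering_args("hdbscan"))
print(check_clustering_args("kmeans", n_clusters=5))
```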

✅ Quick Validation

Test if your CSV file is valid:

autoannotate-ts-cli validate ./your_data.csv

This shows:

  • Number of time series columns found
  • Column names
  • Auto-detected timestamp column (if present)

With explicit timestamp column:

autoannotate-ts-cli validate ./your_data.csv --timestamp-column "timestamp"
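
A comparable check can be done in plain Python before the package is installed. The sketch below treats the first column as a timestamp when every value in it parses as an ISO-style date/time; this heuristic is an assumption and may differ from the tool's own auto-detection. `inspect_csv` and `looks_like_timestamp` are hypothetical helpers.

```python
import csv
import io
from datetime import datetime


def looks_like_timestamp(text):
    """True if the text parses as an ISO-style date/time such as 2024-01-01 00:00:00."""
    try:
        datetime.fromisoformat(text.strip())
        return True
    except ValueError:
        return False


def inspect_csv(csv_text):
    """Report the series columns and a guessed timestamp column for CSV text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, body = rows[0], rows[1:]
    timestamp_column = None
    # Heuristic: the first column is a timestamp if all of its values parse.
    if body and all(looks_like_timestamp(row[0]) for row in body):
        timestamp_column = header[0]
    series = [name for name in header if name != timestamp_column]
    return {"n_series": len(series), "series": series,
            "timestamp_column": timestamp_column}


example = (
    "timestamp,temp_A,temp_B\n"
    "2024-01-01 00:00:00,22.5,23.1\n"
    "2024-01-01 01:00:00,22.8,23.0\n"
)
report = inspect_csv(example)
print(report)
```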

๐Ÿ” Pre-Push Checklist

Before pushing code:

# Format code with Black
black src/autoannotate tests

# Run tests
pytest tests/ -v

๐Ÿ› Troubleshooting

Out of Memory?

Reduce batch size and context length for large datasets:

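from pathlib import Path

from autoannotate import AutoAnnotator

# Smaller batches and a shorter context both lower peak memory use.
annotator = AutoAnnotator(
    input_file=Path("./data.csv"),
    output_dir=Path("./output"),
    batch_size=8,  # Reduce from default 16 to 8
    context_length=256,  # Reduce from default 512 to 256
    model="chronos-t5-tiny"
)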

Or for CLI:

autoannotate-ts-cli annotate data.csv output \
    --batch-size 8 \
    --context-length 256 \
    --model chronos-t5-tiny \
    --n-clusters 5
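
A smaller batch size helps because fewer series are held by the model at once. The sketch below shows the general chunking idea only; how the package batches internally is not documented here, and `iter_batches` is a hypothetical helper.

```python
def iter_batches(items, batch_size=16):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


series_names = [f"series_{i}" for i in range(50)]
batch_sizes = [len(batch) for batch in iter_batches(series_names, batch_size=8)]
print(batch_sizes)
```

With 50 series and a batch size of 8, only 8 series are processed at a time and the final batch holds the remaining 2.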

Too Many/Few Clusters?

Try HDBSCAN for automatic cluster detection:

autoannotate-ts-cli annotate data.csv output --method hdbscan

Note: HDBSCAN must be installed first:

pip install autoannotate-timeseries[hdbscan]

If you try to use HDBSCAN without installing it, you'll get an error: ImportError: HDBSCAN is not installed. Install it with: pip install autoannotate-timeseries[hdbscan]
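
To avoid hitting that error in the middle of a script, the optional dependency can be checked up front with `importlib.util.find_spec`. This is a sketch; `hdbscan_available` and `pick_method` are hypothetical helpers, and falling back to kmeans is an illustrative choice that still requires a cluster count.

```python
import importlib.util


def hdbscan_available():
    """True if the optional `hdbscan` package can be imported in this environment."""
    return importlib.util.find_spec("hdbscan") is not None


def pick_method(preferred="hdbscan", fallback="kmeans"):
    """Return the preferred method unless it is hdbscan and hdbscan is missing."""
    if preferred == "hdbscan" and not hdbscan_available():
        return fallback
    return preferred


print(pick_method())
```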

Need to specify timestamp column?

CLI (uses column name):

autoannotate-ts-cli annotate data.csv output --timestamp-column "datetime" --n-clusters 5

GUI (uses column index):

  • Enter 0 for first column, 1 for second column, etc.
  • Leave empty to auto-detect

🔄 Data Preparation Tips

If you have separate CSV files per time series:

Merge them first:

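import pandas as pd
from pathlib import Path

# Assumes each file has a timestamp in its first column and a "value" column.
series = []
for csv_file in sorted(Path("./separate_files").glob("*.csv")):
    df = pd.read_csv(csv_file, index_col=0, parse_dates=True)
    # Name each series after its file so the column names stay unique.
    series.append(df["value"].rename(csv_file.stem))

# Align on the timestamp index so all series share a single timestamp column.
merged_df = pd.concat(series, axis=1).sort_index()
merged_df.to_csv("combined_timeseries.csv", index_label="timestamp")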

If you have wide format with row-based time series:

Transpose it:

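import pandas as pd

# Assumes the first column holds the series names and each row is one series.
df = pd.read_csv("wide_format.csv", index_col=0)
# After transposing, each column is one series and each row is one time step.
df_transposed = df.T
df_transposed.to_csv("column_format.csv", index_label="timestamp")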

๐Ÿค Contributing

  1. Fork the repository
  2. Create feature branch
  3. Format with Black: black src/autoannotate tests
  4. Run tests: pytest tests/ -v
  5. Push and create PR

📄 License

MIT License - see LICENSE file.

๐Ÿ™ Acknowledgments

Built with PyTorch, Transformers, scikit-learn, Plotly. Foundation models: Chronos-T5 (Amazon), Moirai (Salesforce), Lag-Llama.

Made for the RAIDO Project, from MetaMind Innovations


Sister Project: AutoAnnotate-Vision - For image classification
