
SOTA unsupervised auto-annotation SDK for time series classification

Project description

AutoAnnotate-TimeSeries 📊

State-of-the-art unsupervised auto-annotation SDK for time series classification with GUI

Python 3.10+ | License: MIT | Code style: black

AutoAnnotate-TimeSeries automatically clusters and organizes unlabeled time series datasets using Amazon's Chronos foundation models. It provides a GUI, a CLI tool, and interactive HTML previews with Plotly charts for visual cluster inspection.

✨ Features

  • 🎨 Graphical User Interface: File browser and visual controls, launched with autoannotate-ts
  • 📈 Interactive Plotly Charts: View cluster samples in the browser before labeling
  • 🤖 SOTA Foundation Models: Chronos-T5, Chronos-2
  • 🔬 Multiple Clustering Methods: K-means, HDBSCAN, Spectral, DBSCAN
  • 📁 Smart Organization: Output CSV files are named after their class names for easy identification
  • 🕐 Flexible Timestamp Handling: Auto-detect or specify the timestamp column (GUI uses indices, CLI uses names)
  • 📂 Clean Output: HTML preview files are saved in the output folder alongside the results
  • ✂️ Auto Splits: Train/val/test dataset splitting
  • 💾 Export: Labels in CSV or JSON format
  • 📊 Single CSV Input: All time series in one file
  • 🔌 Python API: Full programmatic control

🚀 Installation

pip install autoannotate-timeseries

Optional Dependencies

HDBSCAN Clustering (Optional):

If you want to use the HDBSCAN clustering method:

# Option 1: Install with the package (quotes keep shells such as zsh from expanding the brackets)
pip install "autoannotate-timeseries[hdbscan]"

# Option 2: Install separately before running autoannotate
pip install hdbscan

Note: HDBSCAN is not required for the default K-means, Spectral, or DBSCAN methods. Only install it if you specifically need HDBSCAN clustering.

Development Tools:

pip install -e ".[dev]"

After Installation

Two commands are available:

  • autoannotate-ts - Launch the graphical user interface
  • autoannotate-ts-cli - Command-line interface for automation

Check installation:

autoannotate-ts-cli --version
autoannotate-ts-cli --help

📁 Input Data Format

Your CSV Structure

INPUT: One CSV file with multiple time series as columns

timestamp,series_1,series_2,series_3,series_4,series_5
2024-01-01 00:00:00,10.5,20.1,15.3,18.2,22.5
2024-01-01 01:00:00,11.2,19.8,14.9,17.8,23.1
2024-01-01 02:00:00,9.8,21.2,15.7,18.5,21.8
2024-01-01 03:00:00,10.1,19.5,16.1,18.0,22.2
...

Key Points:

  • The first column can hold timestamps (auto-detected, or specified explicitly)
  • Each remaining column is one time series to be clustered
  • Column names are preserved as series identifiers
  • Variable-length series are supported
  • Missing values are handled automatically
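The input rules above (one series per column, variable lengths, missing values) can be illustrated with a small standard-library reader. This is a sketch, not the package's loader. The function name `load_series` is hypothetical, and it assumes missing cells are simply dropped. The package may instead interpolate or pad them.

```python
import csv
import io
import math
from typing import Optional


def load_series(csv_text: str, timestamp_col: Optional[str] = "timestamp") -> dict:
    """Read a CSV with one time series per column into {column_name: values}.

    Empty, non-numeric, or NaN cells are skipped, so the resulting series
    can have different lengths (hypothetical handling of missing values).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    # Every column except the timestamp column is treated as a series
    series = {name: [] for name in fieldnames if name != timestamp_col}
    for row in reader:
        for name, values in series.items():
            cell = (row.get(name) or "").strip()
            try:
                value = float(cell)
            except ValueError:
                continue  # empty or non-numeric cell: treat as missing
            if not math.isnan(value):
                values.append(value)
    return series


data = "timestamp,series_1,series_2\n2024-01-01,1.0,2.0\n2024-01-02,,3.5\n2024-01-03,2.5,\n"
print(load_series(data))  # {'series_1': [1.0, 2.5], 'series_2': [2.0, 3.5]}
```

Dropping missing cells is the simplest option. It keeps each series' values in order but loses their alignment to the shared timestamps.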

Timestamp Column Handling:

  • Auto-detect (recommended): Leave the field empty in the GUI, or omit --timestamp-column in the CLI
  • GUI: Use column index (0 = first column, 1 = second column, etc.)
  • CLI: Use column name (e.g., --timestamp-column "timestamp")

Specify timestamp column:

autoannotate-ts-cli annotate data.csv output --timestamp-column "datetime" --n-clusters 5
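The GUI takes a column index, the CLI takes a column name, and both fall back to auto-detection. All three cases can be sketched as one resolver. Everything here is hypothetical: `resolve_timestamp_column` is not part of the package. Its auto-detection rule (the first column whose values all parse with `datetime.fromisoformat`) is an assumed heuristic, not the package's documented logic.

```python
from datetime import datetime
from typing import List, Optional, Union


def _is_timestamp_column(values: List[str]) -> bool:
    """True if every non-empty value parses as an ISO 8601 date/time."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return False
    try:
        for cell in cells:
            datetime.fromisoformat(cell)
    except ValueError:
        return False
    return True


def resolve_timestamp_column(
    header: List[str],
    rows: List[List[str]],
    spec: Optional[Union[int, str]] = None,
) -> Optional[int]:
    """Return the index of the timestamp column, or None if there is none.

    spec is an int -> column index, GUI style (0 = first column)
            a str  -> column name, CLI style (e.g. "datetime")
            None   -> auto-detect: first column whose values all parse as dates
    """
    if isinstance(spec, int):
        if not 0 <= spec < len(header):
            raise IndexError(f"timestamp column index {spec} is out of range")
        return spec
    if isinstance(spec, str):
        return header.index(spec)  # raises ValueError for an unknown name
    for i in range(len(header)):
        if _is_timestamp_column([row[i] for row in rows if i < len(row)]):
            return i
    return None


header = ["timestamp", "series_1", "series_2"]
rows = [["2024-01-01 00:00:00", "10.5", "20.1"], ["2024-01-01 01:00:00", "11.2", "19.8"]]
print(resolve_timestamp_column(header, rows))               # 0 (auto-detected)
print(resolve_timestamp_column(header, rows, 0))            # 0 (GUI index)
print(resolve_timestamp_column(header, rows, "timestamp"))  # 0 (CLI name)
```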

Output Structure

output/
├── increasing_trend/
│   └── increasing_trend.csv    # Contains series_1, series_4 (all rows)
├── decreasing_trend/
│   └── decreasing_trend.csv    # Contains series_2 (all rows)
├── seasonal/
│   └── seasonal.csv            # Contains series_3, series_5 (all rows)
├── unclustered/
│   └── unclustered.csv         # Outliers/noise
├── splits/                     # Created with the --create-splits CLI option
│   ├── train/
│   │   ├── increasing_trend/
│   │   │   └── increasing_trend.csv
│   │   └── ...
│   ├── val/
│   └── test/
├── cluster_0_preview.html      # HTML preview files (saved in the output folder)
├── cluster_1_preview.html
├── cluster_2_preview.html
├── metadata.json
└── labels.csv

Key Points:

  • Each class folder contains ONE CSV file named after the class
  • That CSV includes the timestamp column and every time series assigned to the class
  • HTML preview files are saved in the output folder for reference
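This minimal standard-library sketch makes the per-class CSV layout concrete. It writes `<output>/<class>/<class>.csv` with the timestamp column followed by that class's series. The helper `write_class_csvs` and its arguments are hypothetical, and it assumes all series share the same timestamps. The package's own writer (behind `organize_dataset`) may behave differently.

```python
import csv
from pathlib import Path


def write_class_csvs(output_dir, timestamps, series, labels):
    """Write <output_dir>/<class>/<class>.csv for each class.

    series: {series_name: values}, each the same length as timestamps
    labels: {series_name: class_name}
    Returns {class_name: [series_name, ...]}.
    """
    # Group series names by their assigned class, keeping input order
    by_class = {}
    for name in series:
        by_class.setdefault(labels[name], []).append(name)

    for class_name, members in by_class.items():
        class_dir = Path(output_dir) / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        with open(class_dir / f"{class_name}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            # Header: timestamp first, then one column per member series
            writer.writerow(["timestamp", *members])
            for i, ts in enumerate(timestamps):
                writer.writerow([ts, *(series[m][i] for m in members)])
    return by_class
```

For the example output above, `labels` would map `series_1` and `series_4` to `increasing_trend`, so both end up as columns of `increasing_trend/increasing_trend.csv`.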

🎨 Quick Start - GUI

The easiest way to use AutoAnnotate-TimeSeries:

autoannotate-ts

Workflow:

  1. 📁 Select the input CSV file (with multiple time series as columns)
  2. 📂 Select the output folder
  3. 🔢 Set the number of classes
  4. 🤖 Choose a model
  5. 📏 Configure the context length (512 for typical series; 1024+ for long series, which requires chronos-2)
  6. 📊 [Optional] Specify the timestamp column index (e.g., 0 for the first column; leave empty to auto-detect)
  7. ▶️ Click "Start Auto-Annotation"

The app will:

  • Cluster your time series automatically
  • Open interactive HTML previews in your browser with Plotly charts for each cluster
  • Save all preview files in the output folder (not the project root)
  • Prompt you to label each cluster interactively

💻 CLI Usage

Basic Command

autoannotate-ts-cli annotate /path/to/data.csv /path/to/output \
    --n-clusters 5 \
    --model chronos-t5-tiny \
    --create-splits
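`--create-splits` fills the `splits/` folder shown in the output structure. The split ratios are not documented here, so the sketch below assumes 70/15/15 and a per-class (stratified) split with a fixed seed. `split_per_class` is a hypothetical helper that works on series names; it is not the package's implementation.

```python
import random


def split_per_class(labels, ratios=(0.7, 0.15, 0.15), seed=42):
    """Assign each series to train/val/test, splitting within each class
    so every class keeps roughly the same proportions (assumed ratios).

    labels: {series_name: class_name}
    Returns {"train": [...], "val": [...], "test": [...]}.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible splits
    by_class = {}
    for name, cls in sorted(labels.items()):
        by_class.setdefault(cls, []).append(name)

    splits = {"train": [], "val": [], "test": []}
    for members in by_class.values():
        rng.shuffle(members)
        n_train = round(len(members) * ratios[0])
        n_val = round(len(members) * ratios[1])
        splits["train"] += members[:n_train]
        splits["val"] += members[n_train:n_train + n_val]
        splits["test"] += members[n_train + n_val:]  # remainder goes to test
    return splits


labels = {f"series_{i}": ("seasonal" if i % 2 else "increasing_trend") for i in range(20)}
print(split_per_class(labels))
```

Splitting within each class rather than over the whole dataset keeps small classes from disappearing entirely from the validation or test split.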

Advanced CLI Options

autoannotate-ts-cli annotate ./data/sensors.csv ./output \
    --n-clusters 8 \
    --method hdbscan \
    --model chronos-2 \
    --context-length 512 \
    --timestamp-column "datetime" \
    --create-splits \
    --export-format json

Available models: chronos-t5-tiny, chronos-t5-small, chronos-2

Note: The CLI identifies the timestamp column by name (e.g., --timestamp-column "timestamp"), while the GUI uses column indices (e.g., 0 for the first column).

CLI Options Reference

autoannotate-ts-cli annotate INPUT_FILE OUTPUT_DIR [OPTIONS]

Options:
  --n-clusters, -n INTEGER        Number of clusters (required for kmeans/spectral)
  --method, -m [kmeans|hdbscan|spectral|dbscan]
                                  Clustering method (default: kmeans)
  --model [chronos-t5-tiny|chronos-t5-small|chronos-2]
                                  Embedding model (default: chronos-2)
  --batch-size, -b INTEGER        Batch size for embedding extraction (default: 16)
  --n-samples INTEGER             Representative samples per cluster (default: 5)
  --context-length INTEGER        Context length for models (default: 512)
  --timestamp-column TEXT         Timestamp column name (auto-detected if not specified)
  --create-splits                 Create train/val/test splits
  --export-format [csv|json]      Export labels format (default: csv)
  --help                          Show this message and exit
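`--n-samples` controls how many representative series each cluster preview shows. How the package picks them is not documented. One common choice, sketched here as an assumption, is to take the series whose embeddings lie closest to the cluster centroid. `representative_samples` is hypothetical and uses plain lists in place of model embeddings.

```python
import math


def representative_samples(embeddings, labels, n_samples=5):
    """For each cluster, return the n_samples series closest (Euclidean)
    to the cluster centroid.

    embeddings: {series_name: list of floats}
    labels: {series_name: cluster_id}
    """
    clusters = {}
    for name, cid in labels.items():
        clusters.setdefault(cid, []).append(name)

    result = {}
    for cid, members in clusters.items():
        dim = len(embeddings[members[0]])
        # Centroid = per-dimension mean of the member embeddings
        centroid = [sum(embeddings[m][d] for m in members) / len(members) for d in range(dim)]
        members.sort(key=lambda m: math.dist(embeddings[m], centroid))
        result[cid] = members[:n_samples]
    return result


emb = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [10.0, 0.0],
       "d": [5.0, 5.0], "e": [5.0, 6.0], "f": [5.0, 4.0]}
labels = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(representative_samples(emb, labels, n_samples=2))  # {0: ['b', 'a'], 1: ['d', 'e']}
```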

Technical Details:

  • Batch Size: Defaults to 16 in both the GUI and the CLI, to keep memory use moderate
  • Dimensionality Reduction: Applied automatically when the dataset has more than 50 time series
  • Context Length: Number of time steps the model processes (512 for typical series; up to 8192 with chronos-2 for long series)
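The batch-size, context-length, and dimensionality-reduction rules above can be written as three small functions. These are hypothetical helpers, not package code. In particular, `last_context` assumes that a series longer than the context length is cut down to its most recent points. That is a common choice for forecasting models, but the package does not document it.

```python
def last_context(values, context_length=512):
    """Keep only the most recent `context_length` points (assumed truncation rule)."""
    # Guard: values[-0:] would return the whole list instead of nothing
    return values[-context_length:] if context_length > 0 else []


def batches(items, batch_size=16):
    """Yield consecutive chunks of at most `batch_size` items for embedding."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def needs_reduction(n_series, threshold=50):
    """Documented rule: reduce dimensionality when there are more than 50 series."""
    return n_series > threshold


print(len(last_context(list(range(1000)))))      # 512
print([len(b) for b in batches(list(range(40)))])  # [16, 16, 8]
print(needs_reduction(51), needs_reduction(50))    # True False
```

Halving the batch size, as suggested under Troubleshooting, doubles the number of batches but roughly halves the peak memory needed per batch.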

🐍 Python API

from autoannotate import AutoAnnotator
from pathlib import Path

annotator = AutoAnnotator(
    input_file=Path("./data/timeseries.csv"),
    output_dir=Path("./output"),
    model="chronos-t5-tiny",
    clustering_method="kmeans",
    n_clusters=5,
    batch_size=16,
    context_length=512,
    timestamp_column="timestamp"  # Optional
)

result = annotator.run_full_pipeline(
    n_samples=7,
    create_splits=True,
    export_format="csv"
)

print(f"Processed {result['n_timeseries']} time series")
print(f"Created {result['n_clusters']} classes")

Manual Pipeline Control

annotator.load_timeseries()
annotator.extract_embeddings()
annotator.cluster()

stats = annotator.get_cluster_stats()
print(f"Found {stats['n_clusters']} clusters")

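# One entry per cluster ID (the example above uses n_clusters=5)
class_names = {
    0: "increasing_trend",
    1: "decreasing_trend",
    2: "seasonal_pattern",
    3: "stationary",
    4: "irregular",  # hypothetical name for the fifth cluster
}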

annotator.organize_dataset(class_names)
annotator.export_labels(format="json")
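Because `organize_dataset()` takes a mapping from cluster ID to class name, it helps to confirm that every cluster has a name before calling it. The sketch below is hypothetical; it is not `get_cluster_stats()`. It assumes the cluster labels are available as a list of integer IDs, and it treats -1 as noise, the convention used by scikit-learn's DBSCAN and the hdbscan package.

```python
from collections import Counter


def check_class_names(labels, class_names):
    """Count series per cluster and list cluster IDs that have no class name.

    labels: one cluster ID per series (-1 = noise, left unnamed)
    class_names: {cluster_id: name}
    """
    counts = Counter(labels)
    missing = sorted(cid for cid in counts if cid != -1 and cid not in class_names)
    return dict(counts), missing


names = {0: "increasing_trend", 1: "decreasing_trend", 2: "seasonal_pattern", 3: "stationary"}
counts, missing = check_class_names([0, 1, 2, 3, 4, 4, -1], names)
print(missing)  # [4]
```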

📊 Example: Real-World Sensor Data

Input CSV (sensors.csv):

timestamp,temp_A,temp_B,temp_C,humidity_A,humidity_B
2024-01-01 00:00,22.5,23.1,21.8,65.2,64.8
2024-01-01 01:00,22.8,23.0,21.9,65.5,64.9
2024-01-01 02:00,23.1,22.9,22.1,65.8,65.1
...

Command:

autoannotate-ts-cli annotate sensors.csv ./organized \
    --n-clusters 3 \
    --timestamp-column "timestamp"

Output:

organized/
├── stable_temperature/
│   └── stable_temperature.csv        # Contains: timestamp, temp_A, temp_C
├── variable_temperature/
│   └── variable_temperature.csv      # Contains: timestamp, temp_B
├── high_humidity/
│   └── high_humidity.csv             # Contains: timestamp, humidity_A, humidity_B
├── cluster_0_preview.html
├── cluster_1_preview.html
├── cluster_2_preview.html
├── metadata.json
└── labels.csv

🧠 Model Comparison

| Model | Context | Speed | Quality | Best For |
|---|---|---|---|---|
| chronos-t5-tiny | 512 | ⚡⚡⚡ | ⭐⭐⭐ | Fast inference, small datasets |
| chronos-t5-small | 512 | ⚡⚡ | ⭐⭐⭐⭐ | Balanced (recommended) |
| chronos-2 | up to 8192 | ⚡ | ⭐⭐⭐⭐⭐ | Best quality, long series (v2 model) |

Important Notes:

  • chronos-2 is a new architecture (loaded via Chronos2Pipeline) that supports much longer inputs: a context length of up to 8192, versus 512 for the Chronos-T5 models
  • chronos-2 requires chronos-forecasting>=2.0.0
  • For most use cases, chronos-t5-small offers the best balance of speed and quality
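The table and notes above reduce to a rule of thumb: the T5 models handle at most 512 steps, so longer series need chronos-2. The helper below, `suggest_model`, is hypothetical and not part of the package. It encodes that rule, with thresholds taken from the table.

```python
T5_MAX_CONTEXT = 512         # chronos-t5-tiny / chronos-t5-small
CHRONOS2_MAX_CONTEXT = 8192  # chronos-2


def suggest_model(max_series_length, prefer_speed=False):
    """Return (model, context_length) based on the longest series."""
    if max_series_length > T5_MAX_CONTEXT:
        # Only chronos-2 can see more than 512 steps; cap at its maximum
        return "chronos-2", min(max_series_length, CHRONOS2_MAX_CONTEXT)
    model = "chronos-t5-tiny" if prefer_speed else "chronos-t5-small"
    return model, T5_MAX_CONTEXT


print(suggest_model(300))    # ('chronos-t5-small', 512)
print(suggest_model(2000))   # ('chronos-2', 2000)
print(suggest_model(20000))  # ('chronos-2', 8192)
```

The returned pair maps directly onto the `--model` and `--context-length` CLI options.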

🔬 Clustering Methods

| Method | Auto K | Handles Noise | Best For | Installation |
|---|---|---|---|---|
| kmeans | ❌ | ❌ | Fast, spherical clusters | ✅ Included |
| hdbscan | ✅ | ✅ | Complex shapes, outliers | ⚠️ Optional: pip install ...[hdbscan] |
| spectral | ❌ | ❌ | Non-convex shapes | ✅ Included |
| dbscan | ✅ | ✅ | Density-based | ✅ Included |

Note: HDBSCAN requires a separate installation. See the Optional Dependencies section.
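The "Auto K" and "Handles Noise" columns for dbscan follow from how density-based clustering works. Below is a minimal, unoptimized (O(n²)) DBSCAN sketch on plain 2-D points. It is not the package's code. In practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would be run on the embeddings; it takes the same `eps` and `min_samples` parameters and also labels noise as -1.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one cluster label per point, -1 for noise.

    A point is a core point if at least `min_samples` points (itself
    included) lie within distance `eps`; clusters grow outward from core points.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not yet visited
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may be relabeled as a border point later
            continue
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
        cluster_id += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The number of clusters is never passed in; it emerges from `eps` and `min_samples`. Isolated points stay at -1, which is conceptually what ends up in the `unclustered/` folder, although the package's exact noise handling is not documented.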

✅ Quick Validation

Test if your CSV file is valid:

autoannotate-ts-cli validate ./your_data.csv

This shows:

  • Number of time series columns found
  • Column names
  • Auto-detected timestamp column (if present)

With explicit timestamp column:

autoannotate-ts-cli validate ./your_data.csv --timestamp-column "timestamp"

🔍 Pre-Push Checklist

Before pushing code:

# Format code with Black
black src/autoannotate tests

# Run tests
pytest tests/ -v

🐛 Troubleshooting

Out of Memory?

Reduce batch size and context length for large datasets:

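from pathlib import Path
from autoannotate import AutoAnnotator

annotator = AutoAnnotator(
    input_file=Path("./data.csv"),
    output_dir=Path("./output"),
    n_clusters=5,  # same as the CLI example below (the default kmeans method needs it)
    batch_size=8,  # Reduce from default 16 to 8
    context_length=256,  # Reduce from default 512 to 256
    model="chronos-t5-tiny"
)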

Or for CLI:

autoannotate-ts-cli annotate data.csv output \
    --batch-size 8 \
    --context-length 256 \
    --model chronos-t5-tiny \
    --n-clusters 5

Too Many/Few Clusters?

Try HDBSCAN for automatic cluster detection:

autoannotate-ts-cli annotate data.csv output --method hdbscan

Note: HDBSCAN must be installed first:

pip install "autoannotate-timeseries[hdbscan]"

If you try to use HDBSCAN without installing it, you'll get an error: ImportError: HDBSCAN is not installed. Install it with: pip install autoannotate-timeseries[hdbscan]
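An error like the one above typically comes from the lazy-import pattern for optional dependencies. Here is a minimal sketch of that pattern using `importlib.import_module`. It assumes nothing about the package's internals; `load_hdbscan` is a hypothetical name, and only the message text is taken from this README.

```python
import importlib

INSTALL_HINT = "pip install autoannotate-timeseries[hdbscan]"


def load_hdbscan():
    """Import hdbscan on first use, turning a missing package into a clear error."""
    try:
        return importlib.import_module("hdbscan")
    except ImportError as exc:
        raise ImportError(
            f"HDBSCAN is not installed. Install it with: {INSTALL_HINT}"
        ) from exc
```

Deferring the import to the moment HDBSCAN is requested lets the rest of a package load without the optional dependency, so K-means, Spectral, and DBSCAN keep working.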

Need to specify timestamp column?

CLI (uses column name):

autoannotate-ts-cli annotate data.csv output --timestamp-column "datetime" --n-clusters 5

GUI (uses column index):

  • Enter 0 for first column, 1 for second column, etc.
  • Leave empty to auto-detect

🔄 Data Preparation Tips

If you have separate CSV files per time series:

Merge them first:

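import pandas as pd
from pathlib import Path

# Assumes each file has a "timestamp" column and a "value" column
series = []
for csv_file in sorted(Path("./separate_files").glob("*.csv")):
    df = pd.read_csv(csv_file, parse_dates=["timestamp"])
    # Index by timestamp so the series align on time rather than row position,
    # and name each series after its file
    series.append(df.set_index("timestamp")["value"].rename(csv_file.stem))

# Outer-join on timestamp: steps missing from a file become empty cells
merged_df = pd.concat(series, axis=1).sort_index()
merged_df.index.name = "timestamp"
merged_df.to_csv("combined_timeseries.csv")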

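If each row holds one time series (series in rows, time steps in columns):

Transpose it:

import pandas as pd

# Assumes the first column holds the series names and the remaining
# column headers are the time steps
df = pd.read_csv("wide_format.csv", index_col=0)
# After transposing, each series is a column and the former headers
# (time steps) become the first column, written out as "timestamp"
df.T.to_csv("column_format.csv", index_label="timestamp")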

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Format with Black: black src/autoannotate tests
  4. Run tests: pytest tests/ -v
  5. Push and open a pull request

📄 License

MIT License - see LICENSE file.

🙏 Acknowledgments

Built with PyTorch, scikit-learn, pandas, NumPy, and more. Foundation models: Chronos-T5 and Chronos-2 (Amazon).

Made for the RAIDO Project, from MetaMind Innovations


Sister Project: AutoAnnotate-Vision - For image auto-annotation



Download files


Source Distribution

autoannotate_timeseries-0.1.4.tar.gz (41.8 kB)

Uploaded Source

Built Distribution


autoannotate_timeseries-0.1.4-py3-none-any.whl (32.8 kB)

Uploaded Python 3

File details

Details for the file autoannotate_timeseries-0.1.4.tar.gz.

File metadata

  • Download URL: autoannotate_timeseries-0.1.4.tar.gz
  • Size: 41.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for autoannotate_timeseries-0.1.4.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | d8cadcdbfe3b0ae9125c865f5b8dda3a4e57e1d96f6071d93914e7d394b83a83 |
| MD5 | f32a2f11fd99b35ee9ac1d24932e53d0 |
| BLAKE2b-256 | b7b541971563267d399d6eeecf90466f7482c3cf48882646c812fdd1491deb5d |


File details

Details for the file autoannotate_timeseries-0.1.4-py3-none-any.whl.

File hashes

Hashes for autoannotate_timeseries-0.1.4-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | d737428def662e04fedb844e63c72ed86c5f64aa45c358b71d9b406f54437401 |
| MD5 | ea40c733363a477355963af630d98fe3 |
| BLAKE2b-256 | 63aab38393ebea238b7239f09f15cc4373a55c541d94c65e475ed0fea6f3e77b |

