SOTA unsupervised auto-annotation SDK for time series classification

AutoAnnotate-TimeSeries 📊

State-of-the-art unsupervised auto-annotation SDK for time series classification with GUI

AutoAnnotate-TimeSeries automatically clusters and organizes unlabeled time series datasets using Amazon's Chronos foundation models. It ships with a GUI, a CLI tool, and interactive HTML previews with Plotly charts for visual cluster inspection.

✨ Features

🎨 Graphical User Interface: Easy file browser and visual controls via autoannotate-ts
📈 Interactive Plotly Charts: View cluster samples in browser before labeling
🤖 SOTA Foundation Models: Chronos-T5, Chronos-2
🔬 Multiple Clustering: K-means, HDBSCAN, Spectral, DBSCAN
📁 Smart Organization: CSV files named after cluster names for easy identification
🕐 Flexible Timestamp Handling: Auto-detect or specify timestamp column (GUI uses indices, CLI uses names)
📂 Clean Output: HTML preview files saved in output folder alongside results
✂️ Auto Splits: Train/val/test dataset splitting
💾 Export: CSV, JSON formats
📊 Single CSV Input: All time series in one file
🔌 Python API: Full programmatic control

🚀 Installation

pip install autoannotate-timeseries

Optional Dependencies

HDBSCAN Clustering (Optional):

If you want to use the HDBSCAN clustering method:

# Option 1: Install with the package
pip install autoannotate-timeseries[hdbscan]

# Option 2: Install separately before running autoannotate
pip install hdbscan

Note: HDBSCAN is not required for the default K-means, Spectral, or DBSCAN methods. Only install it if you specifically need HDBSCAN clustering.

Development Tools:

pip install -e .[dev]

After Installation

Two commands are available:

autoannotate-ts - Launch the graphical user interface
autoannotate-ts-cli - Command-line interface for automation

Check installation:

autoannotate-ts-cli --version
autoannotate-ts-cli --help

📝 Input Data Format

Your CSV Structure

INPUT: One CSV file with multiple time series as columns

timestamp,series_1,series_2,series_3,series_4,series_5
2024-01-01 00:00:00,10.5,20.1,15.3,18.2,22.5
2024-01-01 01:00:00,11.2,19.8,14.9,17.8,23.1
2024-01-01 02:00:00,9.8,21.2,15.7,18.5,21.8
2024-01-01 03:00:00,10.1,19.5,16.1,18.0,22.2
...

Key Points:

First column can be timestamp (auto-detected or specify explicitly)
Each column = one time series to be clustered
Column names are preserved as series identifiers
Variable length series supported
Missing values automatically handled
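
As a rough illustration of the layout described above, the following sketch builds a small input table with pandas. The series shapes, the 48-hour range, and the choice to represent a shorter series with trailing empty cells are assumptions made for this example, not documented behavior of the package.

```python
import numpy as np
import pandas as pd

# 48 hourly timestamps; a Timedelta is used as the frequency to avoid
# version-specific frequency aliases
timestamps = pd.date_range("2024-01-01", periods=48, freq=pd.Timedelta(hours=1))
steps = np.arange(48)

frame = pd.DataFrame({
    "timestamp": timestamps,                              # first column: timestamps
    "series_1": 10 + 0.5 * steps,                         # upward trend
    "series_2": 30 - 0.3 * steps,                         # downward trend
    "series_3": 15 + 3 * np.sin(2 * np.pi * steps / 24),  # daily cycle
})

# Assumption: a shorter series is stored with empty cells at the end
frame.loc[40:, "series_3"] = np.nan

csv_text = frame.to_csv(index=False)
print(csv_text.splitlines()[0])  # header row with the column names
```

Writing `csv_text` to a file (or calling `frame.to_csv("sample.csv", index=False)`) gives a file with one timestamp column followed by one column per series.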

Timestamp Column Handling:

Auto-detect (recommended): Leave empty in GUI or omit --timestamp-column in CLI
GUI: Use column index (0 = first column, 1 = second column, etc.)
CLI: Use column name (e.g., --timestamp-column "timestamp")

Specify timestamp column:

autoannotate-ts-cli annotate data.csv output --timestamp-column "datetime" --n-clusters 5
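
Because the GUI expects a column index and the CLI expects a column name, it can help to translate one into the other. The helper below is a hypothetical convenience function (it is not part of the package) that reads only the CSV header with pandas.

```python
import io
import pandas as pd

def timestamp_column_index(source, column_name):
    """Return the zero-based position of column_name in the CSV header.

    Hypothetical helper for illustration; `source` can be a path or a
    file-like object.
    """
    # nrows=0 parses the header row only, so large files stay cheap to inspect
    header = pd.read_csv(source, nrows=0).columns.tolist()
    if column_name not in header:
        raise ValueError(f"column {column_name!r} not found in {header}")
    return header.index(column_name)

sample = io.StringIO("id,datetime,series_1\n1,2024-01-01 00:00:00,10.5\n")
print(timestamp_column_index(sample, "datetime"))  # index to enter in the GUI
```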

Output Structure

output/
├── increasing_trend/
│   └── increasing_trend.csv    # Contains series_1, series_4 (all rows)
├── decreasing_trend/
│   └── decreasing_trend.csv    # Contains series_2 (all rows)
├── seasonal/
│   └── seasonal.csv            # Contains series_3, series_5 (all rows)
├── unclustered/
│   └── unclustered.csv         # Outliers/noise
├── splits/                     # Available with a CLI parameter
│   ├── train/
│   │   ├── increasing_trend/
│   │   │   └── increasing_trend.csv
│   │   └── ...
│   ├── val/
│   └── test/
├── cluster_0_preview.html      # HTML preview files (saved in output folder)
├── cluster_1_preview.html
├── cluster_2_preview.html
├── metadata.json
└── labels.csv

Key Points:

Each class folder contains ONE CSV file named after the class
CSV file includes timestamp column and all time series belonging to that class
HTML preview files are saved in the output folder for reference
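
Given that layout, the organized dataset can be read back with a few lines of pandas. This is a minimal sketch that relies only on the folder structure shown above (one `<class>/<class>.csv` per class); the function name and the default timestamp column name are assumptions.

```python
from pathlib import Path
import pandas as pd

def series_by_class(output_dir, timestamp_column="timestamp"):
    """Map each class folder name to the series columns in its CSV file."""
    mapping = {}
    for class_dir in sorted(Path(output_dir).iterdir()):
        csv_path = class_dir / f"{class_dir.name}.csv"
        # Skips plain files (HTML previews, metadata.json, labels.csv) and
        # folders such as splits/ that have no CSV named after themselves
        if not class_dir.is_dir() or not csv_path.exists():
            continue
        header = pd.read_csv(csv_path, nrows=0).columns
        mapping[class_dir.name] = [c for c in header if c != timestamp_column]
    return mapping
```

Calling `series_by_class("output")` on the tree above would return a dictionary keyed by class name, for example `increasing_trend`, with the series column names as values.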

🎨 Quick Start - GUI

The easiest way to use AutoAnnotate-TimeSeries:

autoannotate-ts

Workflow:

📁 Select input CSV file (with multiple time series as columns)
📂 Select output folder
🔢 Set number of classes
🤖 Choose model
📏 Configure context length (512 for typical series, 1024+ for long series)
📊 [Optional] Specify timestamp column index (e.g., 0 for first column, leave empty for auto-detect)
▶️ Click "Start Auto-Annotation"

The app will:

Cluster your time series automatically
Open interactive HTML previews in your browser with Plotly charts for each cluster
Save all preview files in the output folder (not project root)
Prompt you to label each cluster interactively

💻 CLI Usage

Basic Command

autoannotate-ts-cli annotate /path/to/data.csv /path/to/output \
    --n-clusters 5 \
    --model chronos-t5-tiny \
    --create-splits
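
The split ratios and strategy behind --create-splits are not described here. To give an idea of what a per-class train/val/test split looks like, here is a sketch that assumes a 70/15/15 ratio and a seeded shuffle; both are illustrative choices, not the package's documented behavior.

```python
import random

def split_series(names, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle series names and cut them into train/val/test lists.

    The ratios and seed are assumptions made for this example.
    """
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # whatever remains
    }

splits = split_series([f"series_{i}" for i in range(20)])
print({name: len(members) for name, members in splits.items()})
```

Applying the function separately to the series of each class keeps every class represented in all three subsets, which matches the per-class folders under splits/ in the output structure.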

Advanced CLI Options

autoannotate-ts-cli annotate ./data/sensors.csv ./output \
    --n-clusters 8 \
    --method hdbscan \
    --model chronos-2 \
    --context-length 512 \
    --timestamp-column "datetime" \
    --create-splits \
    --export-format json

Available models: chronos-t5-tiny, chronos-t5-small, chronos-2

Note: The CLI takes the timestamp column by name (e.g., --timestamp-column "timestamp"), while the GUI takes a column index (e.g., 0 for the first column).

CLI Options Reference

autoannotate-ts-cli annotate INPUT_FILE OUTPUT_DIR [OPTIONS]

Options:
  --n-clusters, -n INTEGER        Number of clusters (required for kmeans/spectral)
  --method, -m [kmeans|hdbscan|spectral|dbscan]
                                  Clustering method (default: kmeans)
  --model [chronos-t5-tiny|chronos-t5-small|chronos-2]
                                  Embedding model (default: chronos-2)
  --batch-size, -b INTEGER        Batch size for embedding extraction (default: 16)
  --n-samples INTEGER             Representative samples per cluster (default: 5)
  --context-length INTEGER        Context length for models (default: 512)
  --timestamp-column TEXT         Timestamp column name (auto-detected if not specified)
  --create-splits                 Create train/val/test splits
  --export-format [csv|json]      Export labels format (default: csv)
  --help                          Show this message and exit

Technical Details:

Batch Size: Default is 16 for both GUI and CLI, optimized for memory efficiency
Dimensionality Reduction: Automatically applied when dataset has more than 50 time series
Context Length: Number of time steps processed by the model (512 for typical series, up to 8192 for chronos-2 with long time series)
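
The reduction technique is not named above. As one plausible reading, the sketch below applies PCA from scikit-learn only when there are more than 50 series. PCA itself, the 32-component target, and the function name are assumptions for illustration; the embedding matrix is random stand-in data.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_if_large(embeddings, threshold=50, n_components=32):
    """Apply PCA only when there are more than `threshold` series (rows).

    PCA and n_components=32 are illustrative assumptions.
    """
    n_series, n_features = embeddings.shape
    if n_series <= threshold:
        return embeddings  # small datasets are passed through unchanged
    # PCA cannot produce more components than min(n_samples, n_features)
    k = min(n_components, n_series, n_features)
    return PCA(n_components=k, random_state=0).fit_transform(embeddings)

rng = np.random.default_rng(0)
small = rng.normal(size=(40, 256))   # 40 series: left as-is
large = rng.normal(size=(120, 256))  # 120 series: reduced
print(reduce_if_large(small).shape, reduce_if_large(large).shape)
```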

🐍 Python API

from autoannotate import AutoAnnotator
from pathlib import Path

annotator = AutoAnnotator(
    input_file=Path("./data/timeseries.csv"),
    output_dir=Path("./output"),
    model="chronos-t5-tiny",
    clustering_method="kmeans",
    n_clusters=5,
    batch_size=16,
    context_length=512,
    timestamp_column="timestamp"  # Optional
)

result = annotator.run_full_pipeline(
    n_samples=7,
    create_splits=True,
    export_format="csv"
)

print(f"Processed {result['n_timeseries']} time series")
print(f"Created {result['n_clusters']} classes")

Manual Pipeline Control

annotator.load_timeseries()
annotator.extract_embeddings()
annotator.cluster()

stats = annotator.get_cluster_stats()
print(f"Found {stats['n_clusters']} clusters")

class_names = {
    0: "increasing_trend",
    1: "decreasing_trend",
    2: "seasonal_pattern",
    3: "stationary"
}

annotator.organize_dataset(class_names)
annotator.export_labels(format="json")

📊 Example: Real-World Sensor Data

Input CSV (sensors.csv):

timestamp,temp_A,temp_B,temp_C,humidity_A,humidity_B
2024-01-01 00:00,22.5,23.1,21.8,65.2,64.8
2024-01-01 01:00,22.8,23.0,21.9,65.5,64.9
2024-01-01 02:00,23.1,22.9,22.1,65.8,65.1
...

Command:

autoannotate-ts-cli annotate sensors.csv ./organized \
    --n-clusters 3 \
    --timestamp-column "timestamp"

Output:

organized/
├── stable_temperature/
│   └── stable_temperature.csv        # Contains: timestamp, temp_A, temp_C
├── variable_temperature/
│   └── variable_temperature.csv      # Contains: timestamp, temp_B
├── high_humidity/
│   └── high_humidity.csv             # Contains: timestamp, humidity_A, humidity_B
├── cluster_0_preview.html
├── cluster_1_preview.html
├── cluster_2_preview.html
├── metadata.json
└── labels.csv

🧠 Model Comparison

Model	Context	Speed	Quality	Best For
chronos-t5-tiny	512	⚡⚡⚡	⭐⭐⭐	Fast inference, small datasets
chronos-t5-small	512	⚡⚡	⭐⭐⭐⭐	Balanced (recommended)
chronos-2	up to 8192	⚡	⭐⭐⭐⭐⭐	Best quality, long series (v2 model)

Important Notes:

chronos-2 is a completely new architecture (uses Chronos2Pipeline) with support for much longer time series (up to 8192 tokens vs 512)
chronos-2 requires chronos-forecasting>=2.0.0
For most use cases, chronos-t5-small offers the best balance of speed and quality

🔬 Clustering Methods

Method	Auto K	Handles Noise	Best For	Installation
kmeans	❌	❌	Fast, spherical clusters	✅ Included
hdbscan	✅	✅	Complex shapes, outliers	⚠️ Optional: `pip install ...[hdbscan]`
spectral	❌	❌	Non-convex shapes	✅ Included
dbscan	✅	✅	Density-based	✅ Included

Note: HDBSCAN requires separate installation. See Optional Dependencies section.
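
The "Auto K" and "Handles Noise" columns can be seen on toy data with scikit-learn directly. This sketch uses synthetic 2-D points as a stand-in for model embeddings; it does not call the package, and the eps and min_samples values are chosen for this toy data only.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))  # tight group near (0, 0)
blob_b = rng.normal(loc=5.0, scale=0.1, size=(20, 2))  # tight group near (5, 5)
outlier = np.array([[50.0, 50.0]])                      # one far-away point
points = np.vstack([blob_a, blob_b, outlier])

# K-means needs the number of clusters up front and assigns every point
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)

# DBSCAN infers the number of clusters and marks noise with the label -1
dbscan_labels = DBSCAN(eps=1.0, min_samples=3).fit_predict(points)

print(sorted(set(kmeans_labels.tolist())))
print(sorted(set(dbscan_labels.tolist())))
```

K-means has no noise label, so the outlier always ends up in one of the requested clusters, while DBSCAN finds the two groups on its own and flags the outlier. Noise points of this kind are presumably what lands in the unclustered folder of the output.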

✅ Quick Validation

Test if your CSV file is valid:

autoannotate-ts-cli validate ./your_data.csv

This shows:

Number of time series columns found
Column names
Auto-detected timestamp column (if present)

With explicit timestamp column:

autoannotate-ts-cli validate ./your_data.csv --timestamp-column "timestamp"
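
For a similar check inside a script or notebook, the sketch below reports the same three pieces of information with pandas. The detection rule (a non-numeric first column whose values all parse as datetimes) is an assumption made for this example and may differ from the package's own auto-detection.

```python
import io
import pandas as pd

def summarize_csv(source):
    """Report the series columns and a guessed timestamp column of a CSV."""
    df = pd.read_csv(source)
    first = df.columns[0]
    timestamp_column = None
    # Heuristic (assumption): only a non-numeric first column is tried as dates
    if not pd.api.types.is_numeric_dtype(df[first]):
        parsed = pd.to_datetime(df[first], errors="coerce")
        if parsed.notna().all():
            timestamp_column = first
    series = [c for c in df.columns if c != timestamp_column]
    return {
        "n_series": len(series),
        "series": series,
        "timestamp_column": timestamp_column,
    }

sample = io.StringIO(
    "timestamp,temp_A,temp_B\n"
    "2024-01-01 00:00,22.5,23.1\n"
    "2024-01-01 01:00,22.8,23.0\n"
)
print(summarize_csv(sample))
```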

🔍 Pre-Push Checklist

Before pushing code:

# Format code with Black
black src/autoannotate tests

# Run tests
pytest tests/ -v

🐛 Troubleshooting

Out of Memory?

Reduce batch size and context length for large datasets:

from pathlib import Path

from autoannotate import AutoAnnotator

annotator = AutoAnnotator(
    input_file=Path("./data.csv"),
    output_dir=Path("./output"),
    batch_size=8,  # Reduce from default 16 to 8
    context_length=256,  # Reduce from default 512 to 256
    model="chronos-t5-tiny"
)

Or for CLI:

autoannotate-ts-cli annotate data.csv output \
    --batch-size 8 \
    --context-length 256 \
    --model chronos-t5-tiny \
    --n-clusters 5
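
To see why these two settings matter, note that each batch handed to the model is roughly a batch_size × context_length array. The sketch below is a simplified illustration of that idea, not the package's internal code; keeping the most recent steps and left-padding with NaN are assumptions.

```python
import numpy as np

def iter_batches(series_list, batch_size=16, context_length=512):
    """Yield fixed-width float32 batches built from variable-length series."""
    for start in range(0, len(series_list), batch_size):
        chunk = series_list[start:start + batch_size]
        batch = np.full((len(chunk), context_length), np.nan, dtype=np.float32)
        for row, values in enumerate(chunk):
            # Keep at most the last `context_length` steps (assumption)
            tail = np.asarray(values, dtype=np.float32)[-context_length:]
            # Right-align the values; shorter series stay NaN-padded on the left
            batch[row, context_length - len(tail):] = tail
        yield batch

data = [np.arange(600, dtype=np.float32) for _ in range(40)]
default_sizes = [b.shape for b in iter_batches(data)]
reduced_sizes = [b.shape for b in iter_batches(data, batch_size=8, context_length=256)]
print(default_sizes)  # three batches: (16, 512), (16, 512), (8, 512)
print(reduced_sizes)  # five batches of (8, 256)
```

Halving both values cuts the size of each batch to a quarter, at the cost of more batches and a shorter window per series.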

Too Many/Few Clusters?

Try HDBSCAN for automatic cluster detection:

autoannotate-ts-cli annotate data.csv output --method hdbscan

Note: HDBSCAN must be installed first:

pip install autoannotate-timeseries[hdbscan]

If you try to use HDBSCAN without installing it, you'll get an error: ImportError: HDBSCAN is not installed. Install it with: pip install autoannotate-timeseries[hdbscan]

Need to specify timestamp column?

CLI (uses column name):

autoannotate-ts-cli annotate data.csv output --timestamp-column "datetime" --n-clusters 5

GUI (uses column index):

Enter 0 for first column, 1 for second column, etc.
Leave empty to auto-detect

🔄 Data Preparation Tips

If you have separate CSV files per time series:

Merge them first:

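import pandas as pd
from pathlib import Path

# Assumes each file stores its readings in a column named "value"
dfs = []
for csv_file in sorted(Path("./separate_files").glob("*.csv")):
    df = pd.read_csv(csv_file)
    series_name = csv_file.stem  # e.g. sensor_a.csv -> "sensor_a"
    df_renamed = df.rename(columns={"value": series_name})
    dfs.append(df_renamed)

# Side-by-side concat aligns rows by position, so files should cover the same time steps
merged_df = pd.concat(dfs, axis=1)
# Shared columns (such as a timestamp) appear once per file; keep the first copy only
merged_df = merged_df.loc[:, ~merged_df.columns.duplicated()]
merged_df.to_csv("combined_timeseries.csv", index=False)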

If you have wide format with row-based time series:

Transpose it:

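import pandas as pd

# Assumes the first column holds each series' name, so the names become
# the column headers after transposing
df = pd.read_csv("wide_format.csv", index_col=0)
df_transposed = df.T  # rows become time steps, columns become series
# The original column headers (the time steps) are written as the first column
df_transposed.to_csv("column_format.csv", index_label="timestamp")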

🤝 Contributing

Fork the repository
Create feature branch
Format with Black: black src/autoannotate tests
Run tests: pytest tests/ -v
Push and create PR

📄 License

MIT License - see LICENSE file.

🙏 Acknowledgments

Built with PyTorch, scikit-learn, pandas, numpy and more. Foundation models: Chronos-T5 and Chronos-2 (Amazon)

Made for the RAIDO Project, from MetaMind Innovations

Sister Project: AutoAnnotate-Vision - For image auto-annotation

Release history

0.1.6 (Nov 26, 2025)
0.1.5 (Nov 25, 2025)
0.1.4 (Nov 25, 2025)
0.1.3 (Nov 25, 2025), this version
0.1.2 (Nov 25, 2025)
0.1.1 (Nov 25, 2025)
0.1.0 (Nov 25, 2025)

Download files

Source Distribution

autoannotate_timeseries-0.1.3.tar.gz (41.8 kB)

Uploaded Nov 25, 2025 Source

Built Distribution

autoannotate_timeseries-0.1.3-py3-none-any.whl (32.8 kB)

Uploaded Nov 25, 2025 Python 3

File details

Details for the file autoannotate_timeseries-0.1.3.tar.gz.

File metadata

Download URL: autoannotate_timeseries-0.1.3.tar.gz
Upload date: Nov 25, 2025
Size: 41.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for autoannotate_timeseries-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`69df2090af90cc589a0fbba1c2fab6fb10977f74f505c52be7dd42efd3a68204`
MD5	`8288956c0661b6075c7b6b3134775591`
BLAKE2b-256	`420e2cf5f32fed52f3bfd314d055b7c2ab4ed95f42a0904156ef2bea09769d67`

File details

Details for the file autoannotate_timeseries-0.1.3-py3-none-any.whl.

File metadata

Download URL: autoannotate_timeseries-0.1.3-py3-none-any.whl
Upload date: Nov 25, 2025
Size: 32.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for autoannotate_timeseries-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3c1906e6daab7ce5045b52bd64ddaecf1ca7654e5fe89481175fcff3c12713a7`
MD5	`05bacd6d490bd0f72cdb81de05461e8c`
BLAKE2b-256	`290ab97f79852034c111a1ac6a03d1885656937d408657fce143a792163ac37c`
