SOTA unsupervised auto-annotation SDK for time series classification
AutoAnnotate-TimeSeries
State-of-the-art unsupervised auto-annotation SDK for time series classification with GUI
AutoAnnotate-TimeSeries automatically clusters and organizes unlabeled time series datasets using Amazon's Chronos foundation models. It provides a GUI, interactive HTML previews with Plotly charts for visual cluster inspection, and a CLI tool.
Features
- Graphical User Interface: Easy file browser and visual controls via autoannotate-ts
- Interactive Plotly Charts: View cluster samples in the browser before labeling
- SOTA Foundation Models: Chronos-T5, Chronos-2
- Multiple Clustering Methods: K-means, HDBSCAN, Spectral, DBSCAN
- Smart Organization: CSV files named after cluster names for easy identification
- Flexible Timestamp Handling: Auto-detect or specify timestamp column (GUI uses indices, CLI uses names)
- Clean Output: HTML preview files saved in output folder alongside results
- Auto Splits: Train/val/test dataset splitting
- Export: CSV, JSON formats
- Single CSV Input: All time series in one file
- Python API: Full programmatic control
Installation
pip install autoannotate-timeseries
Optional Dependencies
HDBSCAN Clustering (Optional):
If you want to use the HDBSCAN clustering method:
# Option 1: Install with the package
pip install autoannotate-timeseries[hdbscan]
# Option 2: Install separately before running autoannotate
pip install hdbscan
Note: HDBSCAN is not required for the default K-means, Spectral, or DBSCAN methods. Only install it if you specifically need HDBSCAN clustering.
Development Tools:
pip install -e .[dev]
After Installation
Two commands are available:
- autoannotate-ts - Launch the graphical user interface
- autoannotate-ts-cli - Command-line interface for automation
Check installation:
autoannotate-ts-cli --version
autoannotate-ts-cli --help
Input Data Format
Your CSV Structure
INPUT: One CSV file with multiple time series as columns
timestamp,series_1,series_2,series_3,series_4,series_5
2024-01-01 00:00:00,10.5,20.1,15.3,18.2,22.5
2024-01-01 01:00:00,11.2,19.8,14.9,17.8,23.1
2024-01-01 02:00:00,9.8,21.2,15.7,18.5,21.8
2024-01-01 03:00:00,10.1,19.5,16.1,18.0,22.2
...
Key Points:
- First column can be timestamp (auto-detected or specify explicitly)
- Each column = one time series to be clustered
- Column names are preserved as series identifiers
- Variable length series supported
- Missing values automatically handled
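To try the tool without real data, a file in this layout can be generated with the standard library. The following is a minimal sketch: the series names, their shapes, and the file name sample_timeseries.csv are made up for illustration and are not part of the package.

```python
import csv
import math
from datetime import datetime, timedelta
from pathlib import Path


def write_sample_csv(path, n_rows=48):
    """Write a CSV with a timestamp column followed by one column per series."""
    start = datetime(2024, 1, 1)
    header = ["timestamp", "rising", "falling", "daily_cycle"]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for i in range(n_rows):
            stamp = (start + timedelta(hours=i)).strftime("%Y-%m-%d %H:%M:%S")
            rising = round(10 + 0.5 * i, 2)   # linear upward trend
            falling = round(40 - 0.5 * i, 2)  # linear downward trend
            cycle = round(20 + 5 * math.sin(2 * math.pi * i / 24), 2)  # 24-hour period
            writer.writerow([stamp, rising, falling, cycle])
    return header


columns = write_sample_csv(Path("sample_timeseries.csv"))
print(columns)
```

The resulting file has one hourly timestamp column and three series columns, so it can be passed directly as the input CSV.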
Timestamp Column Handling:
- Auto-detect (recommended): Leave empty in GUI or omit --timestamp-column in CLI
- GUI: Use column index (0 = first column, 1 = second column, etc.)
- CLI: Use column name (e.g., --timestamp-column "timestamp")
Specify timestamp column:
autoannotate-ts-cli annotate data.csv output --timestamp-column "datetime" --n-clusters 5
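The three ways of identifying the timestamp column (auto-detect, GUI index, CLI name) can be illustrated with a small resolver. This is only a sketch under assumptions: the function names are hypothetical, and the auto-detect rule shown here (first column whose first value parses as an ISO-style datetime) is a guess at a reasonable heuristic, not the package's actual logic.

```python
from datetime import datetime


def looks_like_timestamp(value):
    """Return True if the value parses as an ISO-style date or datetime."""
    try:
        datetime.fromisoformat(value.strip())
        return True
    except ValueError:
        return False


def resolve_timestamp_column(header, first_row, column=None):
    """Return the timestamp column name, or None if there is none.

    column may be a name (CLI style), an index (GUI style), or None (auto-detect).
    """
    if isinstance(column, int):
        return header[column]
    if isinstance(column, str):
        if column not in header:
            raise ValueError(f"Column {column!r} not found in {header}")
        return column
    # Auto-detect (assumed heuristic): first column whose first value is a datetime.
    for name, value in zip(header, first_row):
        if looks_like_timestamp(value):
            return name
    return None


header = ["timestamp", "series_1", "series_2"]
first_row = ["2024-01-01 00:00:00", "10.5", "20.1"]
print(resolve_timestamp_column(header, first_row))               # auto-detect
print(resolve_timestamp_column(header, first_row, 0))            # GUI-style index
print(resolve_timestamp_column(header, first_row, "timestamp"))  # CLI-style name
```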
Output Structure
output/
├── increasing_trend/
│   └── increasing_trend.csv    # Contains series_1, series_4 (all rows)
├── decreasing_trend/
│   └── decreasing_trend.csv    # Contains series_2 (all rows)
├── seasonal/
│   └── seasonal.csv            # Contains series_3, series_5 (all rows)
├── unclustered/
│   └── unclustered.csv         # Outliers/noise
├── splits/                     # Created only with --create-splits
│   ├── train/
│   │   ├── increasing_trend/
│   │   │   └── increasing_trend.csv
│   │   └── ...
│   ├── val/
│   └── test/
├── cluster_0_preview.html      # HTML preview files (saved in output folder)
├── cluster_1_preview.html
├── cluster_2_preview.html
├── metadata.json
└── labels.csv
Key Points:
- Each class folder contains ONE CSV file named after the class
- CSV file includes timestamp column and all time series belonging to that class
- HTML preview files are saved in the output folder for reference
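Because every class folder holds a single CSV named after the class, the organized output is easy to read back. The sketch below lists the series stored in each class. It assumes the first column of each class CSV is the timestamp, and the helper name series_by_class is hypothetical.

```python
import csv
from pathlib import Path


def series_by_class(output_dir):
    """Map each class folder to the series columns stored in its CSV."""
    result = {}
    for folder in sorted(Path(output_dir).iterdir()):
        # Skip files (previews, metadata) and the optional splits/ tree.
        if not folder.is_dir() or folder.name == "splits":
            continue
        csv_path = folder / f"{folder.name}.csv"
        if not csv_path.exists():
            continue
        with open(csv_path, newline="") as handle:
            header = next(csv.reader(handle))
        # Assumption: the first column is the timestamp, the rest are series.
        result[folder.name] = header[1:]
    return result


if Path("output").is_dir():
    print(series_by_class("output"))
```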
Quick Start - GUI
The easiest way to use AutoAnnotate-TimeSeries:
autoannotate-ts
Workflow:
- Select input CSV file (with multiple time series as columns)
- Select output folder
- Set number of classes
- Choose model
- Configure context length (512 for typical series, 1024+ for long series)
- [Optional] Specify timestamp column index (e.g., 0 for first column, leave empty for auto-detect)
- Click "Start Auto-Annotation"
The app will:
- Cluster your time series automatically
- Open interactive HTML previews in your browser with Plotly charts for each cluster
- Save all preview files in the output folder (not project root)
- Prompt you to label each cluster interactively
CLI Usage
Basic Command
autoannotate-ts-cli annotate /path/to/data.csv /path/to/output \
--n-clusters 5 \
--model chronos-t5-tiny \
--create-splits
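The --create-splits flag produces the train/val/test folders. The split ratios and strategy are not documented in this README, so the following only sketches one common approach: shuffle the series of a class with a fixed seed and cut the list by ratio. The 70/15/15 ratios, the seed, and the function name are assumptions.

```python
import random


def split_series(names, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle series names and split them into train/val/test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # whatever remains
    }


splits = split_series([f"series_{i}" for i in range(1, 21)])
print({name: len(members) for name, members in splits.items()})
```

Splitting within each class, rather than across the whole dataset, keeps every class represented in all three subsets.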
Advanced CLI Options
autoannotate-ts-cli annotate ./data/sensors.csv ./output \
--n-clusters 8 \
--method hdbscan \
--model chronos-2 \
--context-length 512 \
--timestamp-column "datetime" \
--create-splits \
--export-format json
Available models: chronos-t5-tiny, chronos-t5-small, chronos-2
Note: CLI uses column names for timestamp (e.g., --timestamp-column "timestamp"), while GUI uses column indices (e.g., 0 for first column).
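For batch jobs it can be convenient to assemble the CLI call from Python. The helper below is a sketch, not part of the package: it only turns keyword arguments into the flags listed in this README and leaves running the command to subprocess.

```python
import shlex


def build_annotate_command(input_file, output_dir, **options):
    """Build an argument list for `autoannotate-ts-cli annotate`.

    Keyword names map to flags: n_clusters=5 -> --n-clusters 5.
    Boolean True values become bare flags such as --create-splits.
    """
    command = ["autoannotate-ts-cli", "annotate", str(input_file), str(output_dir)]
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            command.append(flag)
        elif value is not False and value is not None:
            command.extend([flag, str(value)])
    return command


command = build_annotate_command(
    "data.csv",
    "output",
    n_clusters=5,
    model="chronos-t5-tiny",
    timestamp_column="datetime",
    create_splits=True,
)
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```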
CLI Options Reference
autoannotate-ts-cli annotate INPUT_FILE OUTPUT_DIR [OPTIONS]
Options:
  --n-clusters, -n INTEGER        Number of clusters (required for kmeans/spectral)
  --method, -m [kmeans|hdbscan|spectral|dbscan]
                                  Clustering method (default: kmeans)
  --model [chronos-t5-tiny|chronos-t5-small|chronos-2]
                                  Embedding model (default: chronos-2)
  --batch-size, -b INTEGER        Batch size for embedding extraction (default: 16)
  --n-samples INTEGER             Representative samples per cluster (default: 5)
  --context-length INTEGER        Context length for models (default: 512)
  --timestamp-column TEXT         Timestamp column name (auto-detected if not specified)
  --create-splits                 Create train/val/test splits
  --export-format [csv|json]      Export labels format (default: csv)
  --help                          Show this message and exit
Technical Details:
- Batch Size: Default is 16 for both GUI and CLI, optimized for memory efficiency
- Dimensionality Reduction: Automatically applied when dataset has more than 50 time series
- Context Length: Number of time steps processed by the model (512 for typical series, up to 8192 for chronos-2 with long time series)
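Batch size and context length both bound how much data reaches the model at once. The sketch below shows the two ideas in plain Python. It assumes that a series longer than the context length is cut to its most recent values, which is a common convention but is not confirmed by this README, and both helper names are hypothetical.

```python
def truncate_to_context(series, context_length=512):
    """Keep only the most recent `context_length` values of a series."""
    return series[-context_length:]


def batches(items, batch_size=16):
    """Yield consecutive chunks of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


all_series = [list(range(length)) for length in (100, 600, 2000)]
trimmed = [truncate_to_context(s) for s in all_series]
print([len(s) for s in trimmed])  # [100, 512, 512]

series_ids = [f"series_{i}" for i in range(40)]
print([len(chunk) for chunk in batches(series_ids)])  # [16, 16, 8]
```

Halving either value roughly halves the size of each model call, which is why both appear in the out-of-memory advice.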
Python API
from autoannotate import AutoAnnotator
from pathlib import Path
annotator = AutoAnnotator(
    input_file=Path("./data/timeseries.csv"),
    output_dir=Path("./output"),
    model="chronos-t5-tiny",
    clustering_method="kmeans",
    n_clusters=5,
    batch_size=16,
    context_length=512,
    timestamp_column="timestamp",  # Optional
)
result = annotator.run_full_pipeline(
    n_samples=7,
    create_splits=True,
    export_format="csv",
)
print(f"Processed {result['n_timeseries']} time series")
print(f"Created {result['n_clusters']} classes")
Manual Pipeline Control
annotator.load_timeseries()
annotator.extract_embeddings()
annotator.cluster()
stats = annotator.get_cluster_stats()
print(f"Found {stats['n_clusters']} clusters")
class_names = {
    0: "increasing_trend",
    1: "decreasing_trend",
    2: "seasonal_pattern",
    3: "stationary",
}
annotator.organize_dataset(class_names)
annotator.export_labels(format="json")
Example: Real-World Sensor Data
Input CSV (sensors.csv):
timestamp,temp_A,temp_B,temp_C,humidity_A,humidity_B
2024-01-01 00:00,22.5,23.1,21.8,65.2,64.8
2024-01-01 01:00,22.8,23.0,21.9,65.5,64.9
2024-01-01 02:00,23.1,22.9,22.1,65.8,65.1
...
Command:
autoannotate-ts-cli annotate sensors.csv ./organized \
--n-clusters 3 \
--timestamp-column "timestamp"
Output:
organized/
├── stable_temperature/
│   └── stable_temperature.csv      # Contains: timestamp, temp_A, temp_C
├── variable_temperature/
│   └── variable_temperature.csv    # Contains: timestamp, temp_B
├── high_humidity/
│   └── high_humidity.csv           # Contains: timestamp, humidity_A, humidity_B
├── cluster_0_preview.html
├── cluster_1_preview.html
├── cluster_2_preview.html
├── metadata.json
└── labels.csv
Model Comparison
| Model | Context | Speed | Quality | Best For |
|---|---|---|---|---|
| chronos-t5-tiny | 512 | ⚡⚡⚡ | ⭐⭐⭐ | Fast inference, small datasets |
| chronos-t5-small | 512 | ⚡⚡ | ⭐⭐⭐⭐ | Balanced (recommended) |
| chronos-2 | up to 8192 | ⚡ | ⭐⭐⭐⭐⭐ | Best quality, long series (v2 model) |
Important Notes:
- chronos-2 is a completely new architecture (uses Chronos2Pipeline) with support for much longer time series (up to 8192 tokens vs 512)
- chronos-2 requires chronos-forecasting>=2.0.0
- For most use cases, chronos-t5-small offers the best balance of speed and quality
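The comparison above reduces to a simple rule of thumb, sketched here as a helper. The function name and the prefer_speed switch are hypothetical. The 512 threshold comes from the context column of the table.

```python
def suggest_model(max_series_length, prefer_speed=False):
    """Suggest a model name based on the comparison table."""
    if max_series_length > 512:
        return "chronos-2"        # only model listed with context beyond 512
    if prefer_speed:
        return "chronos-t5-tiny"  # fastest option in the table
    return "chronos-t5-small"     # listed as the balanced choice


print(suggest_model(300))                     # chronos-t5-small
print(suggest_model(300, prefer_speed=True))  # chronos-t5-tiny
print(suggest_model(4000))                    # chronos-2
```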
Clustering Methods
| Method | Auto K | Handles Noise | Best For | Installation |
|---|---|---|---|---|
| kmeans | ❌ | ❌ | Fast, spherical clusters | ✅ Included |
| hdbscan | ✅ | ✅ | Complex shapes, outliers | ⚠️ Optional: pip install ...[hdbscan] |
| spectral | ❌ | ❌ | Non-convex shapes | ✅ Included |
| dbscan | ✅ | ✅ | Density-based | ✅ Included |
Note: HDBSCAN requires separate installation. See Optional Dependencies section.
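Density-based methods can leave some series unassigned. scikit-learn's DBSCAN and the hdbscan library both mark such points with the label -1. The sketch below groups series names by cluster label and routes -1 to an "unclustered" group. That this mirrors how the package fills its unclustered folder is an assumption, and the helper name is hypothetical.

```python
from collections import defaultdict


def group_by_label(series_names, labels, class_names=None):
    """Group series names by cluster label; label -1 goes to 'unclustered'."""
    class_names = class_names or {}
    groups = defaultdict(list)
    for name, label in zip(series_names, labels):
        if label == -1:
            key = "unclustered"  # noise label used by DBSCAN/HDBSCAN
        else:
            key = class_names.get(label, f"cluster_{label}")
        groups[key].append(name)
    return dict(groups)


names = [f"series_{i}" for i in range(1, 7)]
labels = [0, 1, 2, 0, 2, -1]
class_names = {0: "increasing_trend", 1: "decreasing_trend", 2: "seasonal"}
print(group_by_label(names, labels, class_names))
```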
Quick Validation
Test if your CSV file is valid:
autoannotate-ts-cli validate ./your_data.csv
This shows:
- Number of time series columns found
- Column names
- Auto-detected timestamp column (if present)
With explicit timestamp column:
autoannotate-ts-cli validate ./your_data.csv --timestamp-column "timestamp"
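A similar pre-check can be done in plain Python before invoking the tool. The sketch below is a hypothetical helper, not what the validate command runs internally. It lists the series columns and counts cells that are empty or not numeric.

```python
import csv
import io


def summarize_csv(handle, timestamp_column=None):
    """Report series columns and count empty or non-numeric cells per column."""
    reader = csv.DictReader(handle)
    series = [name for name in reader.fieldnames if name != timestamp_column]
    bad_cells = {name: 0 for name in series}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in series:
            try:
                float(row[name])
            except (TypeError, ValueError):
                bad_cells[name] += 1  # empty, missing, or non-numeric cell
    return {"n_series": len(series), "series": series, "n_rows": n_rows, "bad_cells": bad_cells}


sample = (
    "timestamp,series_1,series_2\n"
    "2024-01-01 00:00:00,10.5,20.1\n"
    "2024-01-01 01:00:00,,19.8\n"
)
summary = summarize_csv(io.StringIO(sample), timestamp_column="timestamp")
print(summary)
```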
Pre-Push Checklist
Before pushing code:
# Format code with Black
black src/autoannotate tests
# Run tests
pytest tests/ -v
Troubleshooting
Out of Memory?
Reduce batch size and context length for large datasets:
from pathlib import Path
from autoannotate import AutoAnnotator

annotator = AutoAnnotator(
    input_file=Path("./data.csv"),
    output_dir=Path("./output"),
    batch_size=8,        # Reduce from default 16 to 8
    context_length=256,  # Reduce from default 512 to 256
    model="chronos-t5-tiny",
)
Or for CLI:
autoannotate-ts-cli annotate data.csv output \
--batch-size 8 \
--context-length 256 \
--model chronos-t5-tiny \
--n-clusters 5
Too Many/Few Clusters?
Try HDBSCAN for automatic cluster detection:
autoannotate-ts-cli annotate data.csv output --method hdbscan
Note: HDBSCAN must be installed first:
pip install autoannotate-timeseries[hdbscan]
If you try to use HDBSCAN without installing it, you'll get an error:
ImportError: HDBSCAN is not installed. Install it with: pip install autoannotate-timeseries[hdbscan]
Need to specify timestamp column?
CLI (uses column name):
autoannotate-ts-cli annotate data.csv output --timestamp-column "datetime" --n-clusters 5
GUI (uses column index):
- Enter 0 for first column, 1 for second column, etc.
- Leave empty to auto-detect
Data Preparation Tips
If you have separate CSV files per time series:
Merge them first:
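import pandas as pd
from pathlib import Path

dfs = []
for csv_file in sorted(Path("./separate_files").glob("*.csv")):
    # Assumes each file has the timestamp in its first column and a "value" column
    df = pd.read_csv(csv_file, index_col=0)
    # Name the series after its file (e.g. sensor_a.csv -> "sensor_a")
    dfs.append(df["value"].rename(csv_file.stem))

# Align all series on the timestamp index, one column per series
merged_df = pd.concat(dfs, axis=1)
merged_df.to_csv("combined_timeseries.csv", index_label="timestamp")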
If you have wide format with row-based time series:
Transpose it:
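import pandas as pd

# Assumes one series per row, with the series name in the first column
df = pd.read_csv("wide_format.csv", index_col=0)
# After transposing, each series is a column and each row is a time step
df_transposed = df.T
df_transposed.to_csv("column_format.csv", index_label="timestamp")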
Contributing
- Fork the repository
- Create feature branch
- Format with Black: black src/autoannotate tests
- Run tests: pytest tests/ -v
- Push and create PR
License
MIT License - see LICENSE file.
Acknowledgments
Built with PyTorch, scikit-learn, pandas, numpy and more. Foundation models: Chronos-T5 and Chronos-2 (Amazon)
Made for the RAIDO Project, from MetaMind Innovations
Sister Project: AutoAnnotate-Vision - For image auto-annotation