Automatic classification and localization of fluctuating signals in spectrograms
TokEye
TokEye is an open-source Python application for automatic classification and localization of fluctuating signals. It is designed for plasma physics but can be applied to any type of fluctuating signal.
Check out this poster from APS DPP 2025 or this preprint for more information.
Example Demonstration
Expected processing time:
- V100: < 0.5 seconds on any size spectrogram after warmup.
- CPU: ~5-10 seconds.
Quickstart
pip install tokeye # or: uv tool install tokeye
tokeye app # opens web app on http://localhost:7860
- The default model downloads automatically from Hugging Face on first use (~30 MB, cached — no manual setup).
- No data on hand? Click "Load Example Signal" in the app, or generate one from the shell with tokeye example.
- pip install requires Python >= 3.13; uvx and uv tool install fetch a compatible Python automatically.
Zero-install trial: uvx tokeye app runs the app without installing anything into your environment.
Python API
To use TokEye inside your own program, import the TokEye class and call it — no configuration needed:
import numpy as np
from tokeye import TokEye
eye = TokEye() # loads the default model (auto-downloads on first use)
mask = eye(signal) # 1D time series → STFT → inference
mask = eye(spectrogram) # 2D spectrogram → inference directly
coherent, transient = mask # (2, H, W) sigmoid scores in [0, 1]
Input is auto-detected by shape: a 1D array is treated as a raw time series (TokEye computes the spectrogram), a 2D array as a ready spectrogram. Standardization happens internally — no preprocessing needed.
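The shape-based dispatch described above can be pictured with a small NumPy sketch. This is not TokEye's internal code: the helper names to_spectrogram and stft_log_magnitude, the Hann window, and the unpadded framing are assumptions made for illustration. Only the n_fft=1024 / hop=256 defaults, the log-scaling of 1D input, and the standardization step come from the description above.

```python
import numpy as np


def stft_log_magnitude(signal, n_fft=1024, hop=256):
    """Hypothetical helper: log-scaled STFT magnitude, shape (n_fft // 2 + 1, frames)."""
    window = np.hanning(n_fft)
    # Slice the signal into overlapping frames of length n_fft, one every `hop` samples.
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    spectrum = np.fft.rfft(frames * window, axis=-1)  # (frames, freq)
    return np.log1p(np.abs(spectrum)).T  # (freq, frames), row 0 = lowest frequency


def to_spectrogram(x, n_fft=1024, hop=256):
    """Hypothetical helper: route 1D input through the STFT, pass 2D input through."""
    x = np.asarray(x, dtype=np.float32)
    if x.ndim == 1:
        spec = stft_log_magnitude(x, n_fft=n_fft, hop=hop)
    elif x.ndim == 2:
        spec = x
    else:
        raise ValueError(f"expected a 1D or 2D array, got {x.ndim}D")
    # Standardize to zero mean and unit variance (epsilon avoids division by zero).
    return (spec - spec.mean()) / (spec.std() + 1e-8)


rng = np.random.default_rng(0)
signal = rng.standard_normal(16_384)
spec = to_spectrogram(signal)
print(spec.shape)
```

Anything other than a 1D or 2D array is rejected in this sketch; how TokEye itself handles such input is not stated here.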
If your 2D spectrogram is stored in linear scale (raw STFT magnitude/power), pass log=True so TokEye applies log1p first — the model expects log-scaled input:
mask = eye(linear_spectrogram, log=True) # per call
eye = TokEye(log=True) # or for every call
log is off by default and ignored for 1D inputs (the STFT already log-scales). Everything is configurable through the constructor, but the defaults just work:
eye = TokEye(
model="big_tf_unet", # registry name or path to a local .pt/.pt2
device="auto", # "cpu", "cuda", or "auto"
n_fft=1024, hop=256, # STFT settings (1D inputs only)
clip_dc=True, clip_low=1.0, clip_high=99.0,
log=False, # log1p for linear-scale 2D spectrograms
)
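As a rough picture of what the clip_dc, clip_low / clip_high, and log options describe, the following NumPy sketch drops the DC bin, optionally applies log1p, and clips to percentile bounds. The function name preprocess_spectrogram, the order of operations, and the assumption that row 0 is the DC bin are illustrative guesses, not TokEye's actual implementation.

```python
import numpy as np


def preprocess_spectrogram(spec, clip_dc=True, clip_low=1.0, clip_high=99.0, log=False):
    """Hypothetical preprocessing sketch for a (freq, time) spectrogram."""
    spec = np.asarray(spec, dtype=np.float32)
    if clip_dc:
        spec = spec[1:]  # assumption: row 0 holds the DC (0 Hz) bin
    if log:
        spec = np.log1p(spec)  # compress linear magnitude/power
    # Clip outliers to the [clip_low, clip_high] percentile range.
    lo, hi = np.percentile(spec, [clip_low, clip_high])
    return np.clip(spec, lo, hi)


rng = np.random.default_rng(0)
linear_spec = rng.exponential(scale=1.0, size=(513, 200)).astype(np.float32)
out = preprocess_spectrogram(linear_spec, log=True)
print(out.shape, float(out.min()), float(out.max()))
```

Percentile clipping of this kind limits the influence of a few very bright pixels on the later standardization step.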
Batch processing (CLI)
For headless / scripted use (no browser needed), run inference directly. For example:
tokeye run "files/*.npy" --output-dir results
INPUT arguments can be files, directories (all *.npy files inside are used), or quoted glob patterns. Each input is interpreted by its shape:
- 1D array — a raw time series. TokEye computes its STFT spectrogram using the flags below before running inference.
- 2D array — a precomputed spectrogram, fed to the model directly.
For each input file, tokeye run writes:
- <stem>_mask.npy: float32 array, shape (2, H, W), sigmoid scores per pixel (channel 0 = coherent, channel 1 = transient).
- <stem>_preview.png: a grayscale spectrogram with the mask overlaid (green = coherent, red = transient), unless --no-png is passed.
The process exit code is the number of files that failed.
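The saved masks are plain NumPy arrays, so downstream analysis needs nothing beyond np.load. The sketch below thresholds a mask file and reports the fraction of pixels flagged in each channel. The function name summarize_mask and the file name are hypothetical, and the random array only stands in for a real file written by tokeye run.

```python
import tempfile
from pathlib import Path

import numpy as np


def summarize_mask(mask_path, threshold=0.5):
    """Load a (2, H, W) score array and report the fraction of pixels above threshold."""
    scores = np.load(mask_path)
    if scores.ndim != 3 or scores.shape[0] != 2:
        raise ValueError(f"expected shape (2, H, W), got {scores.shape}")
    coherent, transient = scores >= threshold  # boolean masks, one per channel
    return {"coherent": float(coherent.mean()), "transient": float(transient.mean())}


# Stand-in for a file written by `tokeye run`: random scores in [0, 1).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "myshot_mask.npy"
    fake_scores = np.random.default_rng(0).random((2, 64, 128), dtype=np.float32)
    np.save(path, fake_scores)
    summary = summarize_mask(path)

print(summary)
```

The 0.5 threshold mirrors the default of the preview overlay; the stored scores are unthresholded, so any cutoff can be applied after the fact.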
Flags:
| Flag | Default | Description |
|---|---|---|
| --model | big_tf_unet | Registry name or path to a .pt/.pt2 checkpoint. |
| --output-dir | tokeye_output | Directory for masks and previews. |
| --n-fft | 1024 | STFT window size (1D inputs only). |
| --hop | 256 | STFT hop size (1D inputs only). |
| --keep-dc | off | Keep the DC bin (dropped by default). |
| --clip-low / --clip-high | 1.0 / 99.0 | Percentile clip bounds applied to the spectrogram. |
| --log | off | Apply log1p to 2D spectrogram inputs stored in linear scale (1D signals are always log-scaled during the STFT). |
| --threshold | 0.5 | Mask threshold used only for the preview PNG overlay. |
| --no-png | off | Skip preview PNGs; write masks only. |
| --device | auto | cpu, cuda, or auto. |
The released model was trained on spectrograms built with hop=128; for closest match to the training configuration use --hop 128.
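To see what changing the hop does to the spectrogram size, a back-of-the-envelope calculation is enough. The helper below is hypothetical and assumes unpadded framing of a one-sided STFT; TokEye's own framing (for example, centered or padded windows) may differ by a few frames.

```python
def spectrogram_shape(n_samples, n_fft=1024, hop=256, keep_dc=False):
    """Hypothetical helper: (freq_bins, frames) for an unpadded one-sided STFT."""
    if n_samples < n_fft:
        raise ValueError("signal is shorter than one STFT window")
    freq_bins = n_fft // 2 + 1  # one-sided spectrum of a real signal
    if not keep_dc:
        freq_bins -= 1  # the DC bin is dropped by default
    frames = 1 + (n_samples - n_fft) // hop
    return freq_bins, frames


n_samples = 1_000_000
default_shape = spectrogram_shape(n_samples, hop=256)
training_shape = spectrogram_shape(n_samples, hop=128)
print(default_shape, training_shape)  # (512, 3903) (512, 7805)
```

Halving the hop roughly doubles the number of time frames, and with it the width of the output mask.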
On HPC clusters where compute nodes have no internet access, pre-fetch the weights on the login node, then run the batch job on the compute node:
tokeye download big_tf_unet # on the login node; prints the cached path
tokeye run ... --model big_tf_unet # on the compute node — model is already cached
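When the batch step is driven from a Python job script rather than typed by hand, the CLI can be wrapped with subprocess. The helper names build_run_command and run_batch are hypothetical; the flags are the ones listed in the table above, and the return value relies on the documented convention that the exit code equals the number of failed files.

```python
import shlex
import subprocess


def build_run_command(inputs, output_dir="tokeye_output", model="big_tf_unet",
                      hop=None, no_png=False):
    """Hypothetical helper: assemble a `tokeye run` argument list."""
    cmd = ["tokeye", "run", *inputs, "--output-dir", output_dir, "--model", model]
    if hop is not None:
        cmd += ["--hop", str(hop)]
    if no_png:
        cmd.append("--no-png")
    return cmd


def run_batch(inputs, **kwargs):
    """Run the CLI and return its exit code (the number of files that failed)."""
    cmd = build_run_command(inputs, **kwargs)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


cmd = build_run_command(["shots/"], output_dir="results", hop=128, no_png=True)
print(shlex.join(cmd))
# On a machine with tokeye installed:
# failed = run_batch(["shots/"], output_dir="results")
```

Passing check=False keeps a nonzero exit code from raising, so the caller can log the failure count and decide whether to retry.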
Web app guide
tokeye app (or python -m tokeye.app) launches a Gradio interface with three tabs:
- Analyze: load a signal, compute its spectrogram, run a model, and visualize the result. Guided for first-time use: the model dropdown defaults to the bundled big_tf_unet model, the STFT transform has working defaults, and "Load Example Signal" generates a synthetic demo signal so a brand-new user needs zero files. "Analyze" runs the whole load-model → infer → visualize pipeline in one click. View modes: Original, Enhanced (percentile-clipped amplitude), Mask (thresholded model output), Amplitude.
- Annotate: manually draw and save mask annotations over a read-only backdrop image.
- Utilities: audio-format conversion and .npy file inspection.
Flags: tokeye app [--port 7860] [--share] [--open] — --share creates a public Gradio link, --open opens a browser tab on launch.
If you're on a remote server (e.g. an HPC login node), forward the port over SSH instead of using --share:
ssh -L 7860:localhost:7860 user@remote
Then open http://localhost:7860 in your local browser.
Verified Datatypes
- DIII-D Fast Magnetics (cite)
- DIII-D CO2 Interferometer (cite)
- DIII-D Electron Cyclotron Emission (cite)
- DIII-D Beam Emission Spectroscopy (cite)
Evaluation
Recall Scores:
- TJII2021: 0.8254
- DCLDE2011 (Delphinus capensis): 0.7708
- DCLDE2011 (Delphinus delphis): 0.7953
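For reference, recall is the fraction of ground-truth positives that are recovered, TP / (TP + FN). The sketch below computes a pixel-wise version on a thresholded mask. Whether the scores above were computed per pixel or per event is not stated here, so the pixel-wise choice, the 0.5 threshold, and the function name pixel_recall are all assumptions.

```python
import numpy as np


def pixel_recall(scores, truth, threshold=0.5):
    """Pixel-wise recall: TP / (TP + FN) for one mask channel."""
    predicted = np.asarray(scores) >= threshold
    truth = np.asarray(truth, dtype=bool)
    true_positives = np.logical_and(predicted, truth).sum()
    positives = truth.sum()  # TP + FN
    if positives == 0:
        return float("nan")  # recall is undefined without positive pixels
    return float(true_positives / positives)


truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]])
scores = np.array([[0.9, 0.8, 0.7, 0.1],
                   [0.6, 0.2, 0.1, 0.0]])
print(pixel_recall(scores, truth))  # 3 of 4 true pixels recovered -> 0.75
```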
More data makes for better models. Please contribute to the project!
Installation (from source / development)
uv is the dev tool for this repo:
git clone git@github.com:PlasmaControl/TokEye.git
cd TokEye
uv sync # core deps
uv sync --dev # + pytest, ruff, etc.
uv sync --group train # + training deps (lightning, h5py, etc.)
This creates a .venv/; activate it with source .venv/bin/activate, or prefix commands with uv run.
Models
| Registry name | HF file | Description |
|---|---|---|
| big_tf_unet | big_tf_unet_251210.pt | Transformer U-Net trained on multiscale (multiwindow, multihop) spectrograms. |
Weights are hosted on Hugging Face and download automatically the first time a registry name is used (cached in ~/.cache/huggingface). Override the source repo with the TOKEYE_HF_REPO environment variable.
To use a local checkpoint instead, put .pt/.pt2 files in a model/ directory (picked up by the app's model dropdown) or pass a path directly via --model PATH.
Input should be a tensor of shape (B, 1, H, W), where B, H, and W can vary. Output is a tensor of shape (B, 2, H, W).
Performance is best when spectrograms are oriented so that, when plotted with matplotlib using origin='lower', the lowest frequency bin is at the bottom (that is, row 0 holds the lowest frequency). Spectrograms should be standardized (mean = 0, std = 1). If baseline activity is very strong, clipping the input may help, but it is generally not needed.
The first output channel responds preferentially to coherent activity (useful for most tasks). The second output channel responds preferentially to transient activity.
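When calling a checkpoint directly rather than through the TokEye class, the input layout above has to be built by hand. The NumPy sketch below stacks equally sized spectrograms into a (B, 1, H, W) array and standardizes each one. The helper name to_model_batch and the choice of per-spectrogram (rather than global) statistics are assumptions.

```python
import numpy as np


def to_model_batch(spectrograms):
    """Hypothetical helper: stack equally sized (H, W) spectrograms into (B, 1, H, W)."""
    batch = np.stack([np.asarray(s, dtype=np.float32) for s in spectrograms])  # (B, H, W)
    # Standardize each spectrogram independently to mean 0, std 1.
    mean = batch.mean(axis=(1, 2), keepdims=True)
    std = batch.std(axis=(1, 2), keepdims=True)
    batch = (batch - mean) / (std + 1e-8)
    return batch[:, np.newaxis]  # insert the channel axis -> (B, 1, H, W)


rng = np.random.default_rng(0)
specs = [rng.normal(loc=5.0, scale=3.0, size=(512, 256)) for _ in range(4)]
batch = to_model_batch(specs)
print(batch.shape, batch.dtype)
```

The resulting float32 array can be converted with torch.from_numpy before being passed to a PyTorch model.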
Data
Keep signals as 1D numpy float arrays (raw time series) — no need to normalize or preprocess them. The CLI also accepts 2D arrays (precomputed spectrograms) directly. The app scans a signal directory for .npy files (default data/input, configurable in the Analyze tab).
Bringing your own data takes two lines:
import numpy as np
signal = ... # any 1D float array: tokamak diagnostic, hydrophone, etc.
np.save("shots/myshot.npy", signal)
tokeye run shots/myshot.npy --output-dir results
No data yet? tokeye example writes a synthetic demo signal you can run immediately, and the web app has a matching "Load Example Signal" button.
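If you would rather build a test input yourself, a few lines of NumPy are enough. The generator below is a hypothetical stand-in, unrelated to what tokeye example produces: it mixes broadband noise with a steady tone (a coherent feature) and a short burst (a transient feature), using made-up frequencies and a made-up sample rate.

```python
import tempfile
from pathlib import Path

import numpy as np


def make_demo_signal(duration_s=1.0, sample_rate=100_000, seed=0):
    """Hypothetical generator: noise + a steady tone + a short burst."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    signal = 0.5 * rng.standard_normal(t.size)  # broadband background
    signal += np.sin(2 * np.pi * 12_000 * t)  # continuous 12 kHz tone
    burst = (t > 0.50) & (t < 0.51)  # 10 ms window
    signal[burst] += 3.0 * np.sin(2 * np.pi * 30_000 * t[burst])  # 30 kHz burst
    return signal.astype(np.float32)


out_dir = Path(tempfile.mkdtemp())  # swap in your own directory, e.g. Path("shots")
out_dir.mkdir(parents=True, exist_ok=True)  # np.save does not create directories
signal = make_demo_signal()
np.save(out_dir / "demo.npy", signal)
print(signal.shape, signal.dtype)
```

The saved file is a 1D float array, so it can be passed straight to tokeye run.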
Development
uv sync --dev
uv run ruff check .
uv run pytest
Citation
If you use this code in your research, please cite:
@article{chen_TokEye_2026,
title={TokEye: Fast Signal Extraction for Fluctuating Time Series via Offline Self-Supervised Learning From Fusion Diagnostics to Bioacoustics},
author={Chen, Nathaniel},
year={2026},
publisher={ArXiv},
doi={10.48550/arXiv.2602.20317},
url={https://www.arxiv.org/abs/2602.20317}
}
Contact
Nathaniel Chen — nathaniel [at] princeton [dot] edu — https://nathanielchen.net