Automated ICA component classification using OpenAI Vision API for EEG data

Autoclean EEG ICVision (Standalone)

Automated ICA component classification for EEG data using OpenAI's Vision API.

Overview

ICVision automates the tedious process of classifying ICA components from EEG data by generating component visualizations and sending them to OpenAI's Vision API for intelligent artifact identification.

Workflow: Raw EEG + ICA → Generate component plots → OpenAI Vision classification → Automated artifact removal → Clean EEG data

Key Features:

Automated classification of 7 component types (brain, eye, muscle, heart, line noise, channel noise, other)
🔄 Drop-in replacement for MNE-ICALabel: Same API, enhanced with OpenAI Vision
Multi-panel component plots (topography, time series, PSD, ERP-image)
MNE-Python integration with .fif and .set file support
EEGLAB .set file auto-detection: Single file input with automatic ICA detection
Smart file organization: Basename-prefixed output files prevent overwrites when processing multiple datasets
Continuous data only: Graceful error handling for epoched data with helpful conversion instructions
Enhanced PDF reports: Professional dual-header layout with color-coded classification results
OpenAI cost tracking: Automatic cost estimation and logging for budget monitoring
Parallel processing with configurable batch sizes
Command-line and Python API interfaces
Comprehensive PDF reports and CSV results

Installation

pip install autocleaneeg-icvision

Requirements: Python 3.8+ and an OpenAI API key with vision model access (e.g., gpt-4.1)

export OPENAI_API_KEY='your_api_key_here'
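
Before starting a long run, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, assuming an explicit key takes precedence over the OPENAI_API_KEY environment variable; `resolve_api_key` is a hypothetical helper for illustration, not part of ICVision.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise fall back to OPENAI_API_KEY."""
    # Assumption: an explicitly passed key wins over the environment variable.
    key = explicit_key or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "No OpenAI API key found: pass one explicitly or export OPENAI_API_KEY."
        )
    return key
```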

Usage

Command-Line Interface (CLI)

The primary way to use ICVision is through its command-line interface.

Basic Usage:

Single EEGLAB .set file (Recommended):

autoclean-icvision /path/to/your_data.set
# or legacy command: icvision /path/to/your_data.set

Separate files:

autoclean-icvision /path/to/your_raw_data.set /path/to/your_ica_decomposition.fif
# or legacy command: icvision /path/to/your_raw_data.set /path/to/your_ica_decomposition.fif

ICVision can automatically detect and read ICA data from EEGLAB .set files, making single-file usage possible when your .set file contains both raw data and ICA decomposition.

This command will:

Load the raw EEG data and ICA solution (auto-detected from .set file or from separate files).
Classify components using the default settings.
Create an autoclean_icvision_results/ directory in your current working directory.
Save the following into the output directory (with input filename prefix for organization):
- Cleaned raw data (artifacts removed): {basename}_icvis_cleaned_raw.{format}
- Updated ICA object with component labels: {basename}_icvis_classified_ica.fif
- {basename}_icvis_results.csv detailing classifications for each component.
- {basename}_icvis_summary.txt with overall statistics.
- {basename}_icvis_report_all_comps.pdf (comprehensive PDF report with visualizations).

Note: {basename} is extracted from your input filename (e.g., sub-01_task-rest_eeg.set → sub-01_task-rest_eeg prefix). This prevents file overwrites when processing multiple datasets.
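
The naming scheme can be sketched with pathlib. This is only an illustration of the pattern described above: `expected_outputs` is a hypothetical helper, and it assumes the cleaned data keeps the input file's extension.

```python
from pathlib import Path


def expected_outputs(input_path, output_dir="autoclean_icvision_results"):
    """List the output paths described above for one input file."""
    src = Path(input_path)
    basename = src.stem            # "sub-01_task-rest_eeg.set" -> "sub-01_task-rest_eeg"
    fmt = src.suffix.lstrip(".")   # assumption: cleaned data keeps the input format
    out = Path(output_dir)
    return [
        out / f"{basename}_icvis_cleaned_raw.{fmt}",
        out / f"{basename}_icvis_classified_ica.fif",
        out / f"{basename}_icvis_results.csv",
        out / f"{basename}_icvis_summary.txt",
        out / f"{basename}_icvis_report_all_comps.pdf",
    ]


for path in expected_outputs("data/sub-01_task-rest_eeg.set"):
    print(path.name)
```

Because every name starts with the input basename, two inputs collide only if they share the same filename stem.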

Recent Improvements

Enhanced File Organization (v2024.12):

Shared workspace: All results now saved to autoclean_icvision_results/ directory by default
Smart naming: Input filename prefixes (e.g., sub-01_task-rest_eeg_icvis_results.csv) prevent conflicts
Multi-file friendly: Process multiple datasets without overwrites - perfect for batch processing subjects

Improved User Experience:

Epoched data handling: Clear error messages with EEGLAB conversion instructions for unsupported epoched data
Enhanced PDF reports: Professional layout with IC Component titles and color-coded Vision Classification results
Clean logging output: Professional, user-focused logging with optional verbose mode for debugging
Better error messages: Informative CLI output with suggested solutions

Common Options (with defaults):

--api-key YOUR_API_KEY: Specify OpenAI API key (default: OPENAI_API_KEY env variable)
--output-dir /path/to/output/: Output directory (default: ./autoclean_icvision_results)
--model MODEL_NAME: OpenAI model (default: gpt-4.1)
--confidence-threshold 0.8: Confidence threshold for auto-exclusion (default: 0.8)
--psd-fmax 40: Maximum frequency for PSD plots in Hz (default: 80 or Nyquist)
--labels-to-exclude eye muscle heart: Artifact labels to exclude (default: all non-brain types)
--batch-size 10: Components per API request (default: 10)
--max-concurrency 4: Max parallel requests (default: 4)
--no-auto-exclude: Disable auto-exclusion (default: auto-exclude enabled)
--prompt-file /path/to/prompt.txt: Custom classification prompt (default: built-in prompt)
--no-report: Disable PDF report (default: report generation enabled)
--verbose: Enable detailed logging (default: standard logging)
--version: Show ICVision version
--help: Show full list of commands and options
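
To make the interaction between --batch-size and --max-concurrency concrete, here is a rough sketch of batched, concurrent dispatch. It is not ICVision's implementation: `make_batches`, `classify_batch`, and `classify_all` are hypothetical names, and the classifier is a stub that returns placeholder values instead of calling the API.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(component_indices, batch_size=10):
    """Split component indices into consecutive chunks of at most batch_size."""
    return [
        component_indices[i:i + batch_size]
        for i in range(0, len(component_indices), batch_size)
    ]


def classify_batch(batch):
    """Stand-in for one API request; a real call would send the component plots."""
    return {idx: ("brain", 0.0) for idx in batch}  # placeholder label and confidence


def classify_all(component_indices, batch_size=10, max_concurrency=4):
    """Run the batches on a thread pool limited to max_concurrency workers."""
    results = {}
    batches = make_batches(component_indices, batch_size)
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        for batch_result in pool.map(classify_batch, batches):
            results.update(batch_result)
    return results


batches = make_batches(list(range(25)), batch_size=10)
print([len(b) for b in batches])  # [10, 10, 5]
```

Under this model, 25 components with the default settings produce three requests, all of which fit within the four concurrent slots.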

Examples with options:

Single .set file usage:

autoclean-icvision data/subject01_eeg.set \
    --api-key sk-xxxxxxxxxxxxxxxxxxxx \
    --confidence-threshold 0.9 \
    --verbose

Traditional separate files:

autoclean-icvision data/subject01_raw.fif data/subject01_ica.fif \
    --api-key sk-xxxxxxxxxxxxxxxxxxxx \
    --model gpt-4.1 \
    --confidence-threshold 0.8 \
    --labels-to-exclude eye muscle line_noise channel_noise \
    --batch-size 8 \
    --verbose

For ERP studies with low-pass filtered data:

autoclean-icvision data/erp_study.set \
    --psd-fmax 40 \
    --confidence-threshold 0.85 \
    --verbose

Multi-file batch processing:

# Process multiple subjects - all results go to shared directory
autoclean-icvision data/sub-01_task-rest_eeg.set --verbose
autoclean-icvision data/sub-02_task-rest_eeg.set --verbose
autoclean-icvision data/sub-03_task-rest_eeg.set --verbose

# Results organized in autoclean_icvision_results/ with prefixed filenames
ls autoclean_icvision_results/
# sub-01_task-rest_eeg_icvis_results.csv
# sub-01_task-rest_eeg_icvis_classified_ica.fif
# sub-02_task-rest_eeg_icvis_results.csv
# sub-02_task-rest_eeg_icvis_classified_ica.fif
# ...
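
The same batch can be driven from Python. The sketch below only assembles and prints the commands (a dry run); `build_command` and `batch_commands` are hypothetical helpers, the glob pattern is an assumption about how the files are named, and only flags listed under Common Options are used.

```python
from pathlib import Path


def build_command(set_file, confidence_threshold=None, verbose=False):
    """Assemble the argv list for one autoclean-icvision call."""
    cmd = ["autoclean-icvision", str(set_file)]
    if confidence_threshold is not None:
        cmd += ["--confidence-threshold", str(confidence_threshold)]
    if verbose:
        cmd.append("--verbose")
    return cmd


def batch_commands(data_dir, pattern="sub-*_task-rest_eeg.set", **options):
    """Build one command per matching .set file, in sorted order."""
    return [build_command(p, **options) for p in sorted(Path(data_dir).glob(pattern))]


# Dry run: print the commands. To execute them, pass each list to
# subprocess.run(cmd, check=True) instead of printing it.
for cmd in batch_commands("data", verbose=True):
    print(" ".join(cmd))
```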

Python API

You can also use ICVision programmatically within your Python scripts.

Single .set file usage (NEW):

from pathlib import Path
from icvision.core import label_components

# --- Configuration ---
API_KEY = "your_openai_api_key"  # Or set as environment variable OPENAI_API_KEY
DATA_PATH = "path/to/your_data.set"  # EEGLAB .set file with ICA
OUTPUT_DIR = Path("icvision_output")

# --- Run ICVision (ICA auto-detected from .set file) ---
try:
    raw_cleaned, ica_updated, results_df = label_components(
        raw_data=DATA_PATH,              # EEGLAB .set file path
        # ica_data parameter is optional - auto-detected from .set file
        api_key=API_KEY,                 # Optional if OPENAI_API_KEY env var is set
        output_dir=OUTPUT_DIR,
    )
    print(f"Number of components classified: {len(results_df)}")
except Exception as e:
    print(f"An error occurred: {e}")

Traditional separate files:

from pathlib import Path
from icvision.core import label_components

# --- Configuration ---
API_KEY = "your_openai_api_key"  # Or set as environment variable OPENAI_API_KEY
RAW_DATA_PATH = "path/to/your_raw_data.set"
ICA_DATA_PATH = "path/to/your_ica_data.fif"
OUTPUT_DIR = Path("icvision_output")

# --- Run ICVision with all parameters ---
try:
    raw_cleaned, ica_updated, results_df = label_components(
        raw_data=RAW_DATA_PATH,          # Can be MNE object or path string/Path object
        ica_data=ICA_DATA_PATH,          # Can be MNE object, path, or None for auto-detection
        api_key=API_KEY,                 # Optional if OPENAI_API_KEY env var is set
        output_dir=OUTPUT_DIR,
        model_name="gpt-4.1",            # Default: "gpt-4.1"
        confidence_threshold=0.80,       # Default: 0.8
        labels_to_exclude=["eye", "muscle", "heart", "line_noise", "channel_noise"],  # Default: all non-brain
        generate_report=True,            # Default: True
        batch_size=5,                    # Default: 10
        max_concurrency=3,               # Default: 4
        auto_exclude=True,               # Default: True
        custom_prompt=None,              # Default: None (uses built-in prompt)
        psd_fmax=40.0                    # Default: None (uses 80 Hz); useful for ERP studies
    )

    print("\n--- ICVision Processing Complete ---")
    print(f"Cleaned raw data channels: {raw_cleaned.info['nchan']}")
    print(f"Updated ICA components: {ica_updated.n_components_}")
    print(f"Number of components classified: {len(results_df)}")

    if not results_df.empty:
        print(f"Number of components marked for exclusion: {results_df['exclude_vision'].sum()}")
        print("\nClassification Summary:")
        print(results_df[['component_name', 'label', 'confidence', 'exclude_vision']].head())

    print(f"\nResults saved in: {OUTPUT_DIR.resolve()}")

except Exception as e:
    print(f"An error occurred: {e}")

🔄 ICLabel Drop-in Replacement

ICVision can serve as a drop-in replacement for MNE-ICALabel, with an identical API and output format. You can switch existing ICLabel workflows to the OpenAI Vision API without changing any other code.

Quick Migration

Before (using MNE-ICALabel):

from mne_icalabel import label_components

# Classify components with ICLabel
result = label_components(raw, ica, method='iclabel')
print(result['labels'])  # ['brain', 'eye blink', 'other', ...]
print(ica.labels_scores_.shape)  # (n_components, 7)

After (using ICVision):

from icvision.compat import label_components  # <-- Only line that changes!

# Classify components with ICVision (same API!)
result = label_components(raw, ica, method='icvision')
print(result['labels'])  # Same format: ['brain', 'eye blink', 'other', ...]
print(ica.labels_scores_.shape)  # Same shape: (n_components, 7)

What You Get

🎯 Identical API: Same function signature, same return format
📊 Same Output: Returns dict with 'y_pred_proba' and 'labels' keys
⚙️ Same ICA Modifications: Sets ica.labels_scores_ and ica.labels_ exactly like ICLabel
🚀 Enhanced Intelligence: OpenAI Vision API instead of fixed neural network
💡 Detailed Reasoning: Each classification includes explanation (available in full API)

Why Use ICVision over ICLabel?

Feature	ICLabel	ICVision
Classification Method	Fixed neural network (2019)	OpenAI Vision API (latest models)
Accuracy	Good on typical datasets	Enhanced with modern vision AI
Reasoning	No explanations	Detailed reasoning for each decision
Customization	Fixed model	Customizable prompts and models
Updates	Static model	Benefits from OpenAI improvements
API Compatibility	✅ Original	✅ Drop-in replacement

Integration Example

The compatibility layer works seamlessly with existing MNE workflows:

def analyze_ica_components(raw, ica, method='icvision'):
    """Generic function that works with both ICLabel and ICVision"""

    if method == 'icvision':
        from icvision.compat import label_components
    else:
        from mne_icalabel import label_components

    # Same API for both!
    result = label_components(raw, ica, method=method)

    # Same return format for both
    print(f"Classified {len(result['labels'])} components")

    # Same ICA object modifications for both
    brain_components = ica.labels_['brain']
    artifact_components = [idx for key, indices in ica.labels_.items()
                          if key != 'brain' for idx in indices]

    print(f"Brain components: {brain_components}")
    print(f"Artifact components: {artifact_components}")

    return result

# Works with either classifier
result = analyze_ica_components(raw, ica, method='icvision')

Two APIs, Same Power

ICVision provides two complementary interfaces:

Original ICVision API: Rich output with detailed results and file generation

from icvision.core import label_components
raw_cleaned, ica_updated, results_df = label_components(...)

ICLabel-Compatible API: Simple output matching ICLabel exactly

from icvision.compat import label_components
result = label_components(raw, ica, method='icvision')

Choose the API that best fits your workflow - both use the same underlying OpenAI Vision classification.

Configuration Details

Input File Support

EEGLAB .set files:

Raw data: Supports EEGLAB .set files for raw EEG data
ICA data: Now supports automatic ICA detection from .set files using mne.preprocessing.read_ica_eeglab()
Single file mode: Use just a .set file when it contains both raw data and ICA decomposition

MNE formats: Other supported formats include:

Raw data: .fif, .edf, .raw
ICA data: .fif files containing MNE ICA objects

Default Parameter Values

Parameter	Default Value	Description
`model_name`	`"gpt-4.1"`	OpenAI model for classification
`confidence_threshold`	`0.8`	Minimum confidence for auto-exclusion
`auto_exclude`	`True`	Automatically exclude artifact components
`labels_to_exclude`	`["eye", "muscle", "heart", "line_noise", "channel_noise", "other_artifact"]`	Labels to exclude (all non-brain)
`output_dir`	`"./autoclean_icvision_results"`	Output directory for results
`generate_report`	`True`	Generate PDF report
`batch_size`	`10`	Components per API request
`max_concurrency`	`4`	Maximum parallel API requests
`api_key`	`None`	Uses `OPENAI_API_KEY` environment variable
`custom_prompt`	`None`	Uses built-in classification prompt
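
How `confidence_threshold`, `labels_to_exclude`, and `auto_exclude` might combine can be sketched as a single predicate. This is an assumed rule inferred from the parameter descriptions, not ICVision's code; in particular, treating the threshold as inclusive (>=) is a guess, and `should_exclude` is a hypothetical name.

```python
DEFAULT_LABELS_TO_EXCLUDE = [
    "eye", "muscle", "heart", "line_noise", "channel_noise", "other_artifact",
]


def should_exclude(label, confidence, confidence_threshold=0.8,
                   labels_to_exclude=None, auto_exclude=True):
    """Assumed rule: exclude only listed labels at or above the threshold."""
    if not auto_exclude:
        return False
    if labels_to_exclude is None:
        labels_to_exclude = DEFAULT_LABELS_TO_EXCLUDE
    return label in labels_to_exclude and confidence >= confidence_threshold


print(should_exclude("eye", 0.93))    # True: listed label, above threshold
print(should_exclude("eye", 0.55))    # False: below the 0.8 threshold
print(should_exclude("brain", 0.99))  # False: brain is not in the default list
```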

Component Labels

ICVision uses (and expects from the API) the following standard labels:

brain - Neural brain activity (retained)
eye - Eye movement artifacts
muscle - Muscle artifacts
heart - Cardiac artifacts
line_noise - Electrical line noise
channel_noise - Channel-specific noise
other_artifact - Other artifacts

These are defined in src/icvision/config.py.

Output Files

ICVision creates organized output files with input filename prefixes to prevent overwrites when processing multiple datasets:

{basename}_icvis_classified_ica.fif: MNE ICA object with labels and exclusions
{basename}_icvis_results.csv: Detailed classification results per component
{basename}_icvis_cleaned_raw.{format}: Cleaned EEG data with artifacts removed
{basename}_icvis_summary.txt: Summary statistics by label type
{basename}_icvis_report_all_comps.pdf: Comprehensive PDF report (if enabled)
component_IC{N}_vision_analysis.webp: Individual component plots used for API classification

Example: Processing sub-01_task-rest_eeg.set creates files like:

sub-01_task-rest_eeg_icvis_results.csv
sub-01_task-rest_eeg_icvis_classified_ica.fif
sub-01_task-rest_eeg_icvis_cleaned_raw.set

Multi-file Processing: All results are saved to the same autoclean_icvision_results/ directory, with basename prefixes ensuring no conflicts:

autoclean_icvision_results/
├── sub-01_task-rest_eeg_icvis_results.csv
├── sub-01_task-rest_eeg_icvis_classified_ica.fif
├── sub-02_task-rest_eeg_icvis_results.csv
├── sub-02_task-rest_eeg_icvis_classified_ica.fif
└── pilot_data_icvis_results.csv
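
A results CSV can be summarized with the standard library alone. The sketch below assumes the CSV has the same columns as the `results_df` shown in the Python API example (`component_name`, `label`, `confidence`, `exclude_vision`) and that the boolean column is written as the text True/False; the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_results(csv_file):
    """Count components per label and list the ones flagged for exclusion."""
    label_counts = Counter()
    excluded = []
    for row in csv.DictReader(csv_file):
        label_counts[row["label"]] += 1
        # Assumption: the boolean column is serialized as "True"/"False".
        if row["exclude_vision"].strip().lower() == "true":
            excluded.append(row["component_name"])
    return label_counts, excluded


# Made-up rows standing in for a real {basename}_icvis_results.csv.
sample = io.StringIO(
    "component_name,label,confidence,exclude_vision\n"
    "IC0,brain,0.95,False\n"
    "IC1,eye,0.91,True\n"
    "IC2,muscle,0.62,False\n"
)
counts, excluded = summarize_results(sample)
print(dict(counts))
print(excluded)
```

For a real file, replace the StringIO object with `open(path, newline="")`.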

Custom Classification Prompt

The default prompt is optimized for EEG component classification on EGI128 nets. You can customize it by:

CLI: --prompt-file /path/to/custom_prompt.txt
Python API: custom_prompt="Your custom prompt here"
View default: Check src/icvision/config.py

OpenAI API Costs

ICVision automatically tracks and estimates OpenAI API costs during processing:

Typical Costs (2025-05-29 pricing):

gpt-4.1: ~$0.0012 per component
gpt-4.1-mini: ~$0.0002 per component (recommended)
gpt-4.1-nano: ~$0.0001 per component (budget option)

Example costs for full ICA analysis:

10 components: $0.0006-0.012 depending on model
30 components: $0.002-0.036 depending on model
64 components: $0.004-0.077 depending on model

Cost estimates are automatically logged during processing. Use --verbose flag to see detailed per-component cost tracking.
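
For quick budgeting, the per-component figures above can be turned into a rough estimate. This is a back-of-the-envelope sketch, not the tool's own cost tracker: `estimate_cost` is a hypothetical helper, and because the listed rates are rounded, its results may differ slightly from the ranges quoted above, particularly at the low end.

```python
# Approximate per-component rates quoted above (2025-05-29 pricing).
COST_PER_COMPONENT = {
    "gpt-4.1": 0.0012,
    "gpt-4.1-mini": 0.0002,
    "gpt-4.1-nano": 0.0001,
}


def estimate_cost(n_components, model="gpt-4.1"):
    """Rough cost in USD for classifying n_components with the given model."""
    return n_components * COST_PER_COMPONENT[model]


for model in COST_PER_COMPONENT:
    print(f"{model}: ${estimate_cost(64, model):.4f} for 64 components")
```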

Logging and Verbosity

ICVision provides two logging modes for different use cases:

Normal Mode (Default - Clean output for researchers):

autoclean-icvision data.set
# Output:
# 2025-05-29 13:33:43 - INFO - Starting ICVision CLI v0.1.0
# 2025-05-29 13:33:44 - INFO - OpenAI classification complete. Processed 20/20 components
# 2025-05-29 13:33:45 - INFO - ICVision workflow completed successfully!

Verbose Mode (Detailed debugging information):

autoclean-icvision data.set --verbose
# Output:
# 2025-05-29 13:33:43 - icvision - INFO - Verbose logging enabled - showing module details
# 2025-05-29 13:33:44 - icvision.core - DEBUG - Loading and validating input data...
# 2025-05-29 13:33:45 - icvision.api - DEBUG - Response ID: resp_123..., Tokens: 400/50, Cost: $0.001200
# 2025-05-29 13:33:45 - icvision.plotting - DEBUG - Plotting progress: 10/20 components completed

Verbose mode provides:

Module-level debugging information
Detailed OpenAI API cost tracking per component
Progress indicators for long-running operations
External library logging (httpx, openai, etc.)
Full error stack traces for troubleshooting

Use verbose mode when:

Debugging processing issues
Monitoring API costs in detail
Contributing to development
Troubleshooting unexpected behavior
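
The two output styles shown above can be approximated with the standard logging module. This is a sketch inferred from the sample output, not ICVision's actual logging setup; `configure_logging` is a hypothetical name, and the format strings are assumptions chosen to match the example lines.

```python
import logging


def configure_logging(verbose=False):
    """Configure root logging in a compact or a verbose, module-aware style."""
    if verbose:
        level = logging.DEBUG
        fmt = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    else:
        level = logging.INFO
        fmt = "%(asctime)s - %(levelname)s - %(message)s"
    # force=True replaces handlers installed by an earlier call (Python 3.8+).
    logging.basicConfig(level=level, format=fmt,
                        datefmt="%Y-%m-%d %H:%M:%S", force=True)
    return logging.getLogger("icvision")


log = configure_logging(verbose=True)
log.debug("Loading and validating input data...")
```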

Development

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use ICVision in your research, please consider citing it (details to be added upon publication/DOI generation).

Acknowledgements

This project relies heavily on the MNE-Python library.
It also uses the OpenAI API.
