Skip to main content

Automated ICA component classification using OpenAI Vision API for EEG data

Project description

Autoclean EEG ICVision (Standalone)

PyPI version Python versions License: MIT Code style: black

Automated ICA component classification for EEG data using OpenAI's Vision API.

Overview

ICVision automates the tedious process of classifying ICA components from EEG data by generating component visualizations and sending them to OpenAI's Vision API for intelligent artifact identification.

Workflow: Raw EEG + ICA → Generate component plots → OpenAI Vision classification → Automated artifact removal → Clean EEG data

Key Features:

  • Automated classification of 7 component types (brain, eye, muscle, heart, line noise, channel noise, other)
  • 🔄 Drop-in replacement for MNE-ICALabel: Same API, enhanced with OpenAI Vision
  • Multi-panel component plots (topography, time series, PSD, ERP-image)
  • MNE-Python integration with .fif and .set file support
  • EEGLAB .set file auto-detection: Single file input with automatic ICA detection
  • Smart file organization: Basename-prefixed output files prevent overwrites when processing multiple datasets
  • Continuous data only: Graceful error handling for epoched data with helpful conversion instructions
  • Enhanced PDF reports: Professional dual-header layout with color-coded classification results
  • OpenAI cost tracking: Automatic cost estimation and logging for budget monitoring
  • Parallel processing with configurable batch sizes
  • Command-line and Python API interfaces
  • Comprehensive PDF reports and CSV results

Installation

pip install autocleaneeg-icvision

Requirements: Python 3.8+ and OpenAI API key with vision model access (e.g., gpt-4.1)

export OPENAI_API_KEY='your_api_key_here'

Usage

Command-Line Interface (CLI)

The primary way to use ICVision is through its command-line interface.

Basic Usage:

Single EEGLAB .set file (Recommended):

autoclean-icvision /path/to/your_data.set
# or legacy command: icvision /path/to/your_data.set

Separate files:

autoclean-icvision /path/to/your_raw_data.set /path/to/your_ica_decomposition.fif
# or legacy command: icvision /path/to/your_raw_data.set /path/to/your_ica_decomposition.fif

ICVision can automatically detect and read ICA data from EEGLAB .set files, making single-file usage possible when your .set file contains both raw data and ICA decomposition.

This command will:

  1. Load the raw EEG data and ICA solution (auto-detected from .set file or from separate files).
  2. Classify components using the default settings.
  3. Create an autoclean_icvision_results/ directory in your current working directory.
  4. Save the following into the output directory (with input filename prefix for organization):
    • Cleaned raw data (artifacts removed): {basename}_icvis_cleaned_raw.{format}
    • Updated ICA object with component labels: {basename}_icvis_classified_ica.fif
    • {basename}_icvis_results.csv detailing classifications for each component.
    • {basename}_icvis_summary.txt with overall statistics.
    • {basename}_icvis_report_all_comps.pdf (comprehensive PDF report with visualizations).

Note: {basename} is extracted from your input filename (e.g., sub-01_task-rest_eeg.setsub-01_task-rest_eeg prefix). This prevents file overwrites when processing multiple datasets.

Recent Improvements

Enhanced File Organization (v2024.12):

  • Shared workspace: All results now saved to autoclean_icvision_results/ directory by default
  • Smart naming: Input filename prefixes (e.g., sub-01_task-rest_eeg_icvis_results.csv) prevent conflicts
  • Multi-file friendly: Process multiple datasets without overwrites - perfect for batch processing subjects

Improved User Experience:

  • Epoched data handling: Clear error messages with EEGLAB conversion instructions for unsupported epoched data
  • Enhanced PDF reports: Professional layout with IC Component titles and color-coded Vision Classification results
  • Clean logging output: Professional, user-focused logging with optional verbose mode for debugging
  • Better error messages: Informative CLI output with suggested solutions

Common Options (with defaults):

  • --api-key YOUR_API_KEY: Specify OpenAI API key (default: OPENAI_API_KEY env variable)
  • --output-dir /path/to/output/: Output directory (default: ./autoclean_icvision_results)
  • --model MODEL_NAME: OpenAI model (default: gpt-4.1)
  • --confidence-threshold 0.8: Confidence threshold for auto-exclusion (default: 0.8)
  • --psd-fmax 40: Maximum frequency for PSD plots in Hz (default: 80 or Nyquist)
  • --labels-to-exclude eye muscle heart: Artifact labels to exclude (default: all non-brain types)
  • --batch-size 10: Components per API request (default: 10)
  • --max-concurrency 4: Max parallel requests (default: 4)
  • --no-auto-exclude: Disable auto-exclusion (default: auto-exclude enabled)
  • --prompt-file /path/to/prompt.txt: Custom classification prompt (default: built-in prompt)
  • --no-report: Disable PDF report (default: report generation enabled)
  • --verbose: Enable detailed logging (default: standard logging)
  • --version: Show ICVision version
  • --help: Show full list of commands and options

Examples with options:

Single .set file usage:

autoclean-icvision data/subject01_eeg.set \
    --api-key sk-xxxxxxxxxxxxxxxxxxxx \
    --confidence-threshold 0.9 \
    --verbose

Traditional separate files:

autoclean-icvision data/subject01_raw.fif data/subject01_ica.fif \
    --api-key sk-xxxxxxxxxxxxxxxxxxxx \
    --model gpt-4.1 \
    --confidence-threshold 0.8 \
    --labels-to-exclude eye muscle line_noise channel_noise \
    --batch-size 8 \
    --verbose

For ERP studies with low-pass filtered data:

autoclean-icvision data/erp_study.set \
    --psd-fmax 40 \
    --confidence-threshold 0.85 \
    --verbose

Multi-file batch processing:

# Process multiple subjects - all results go to shared directory
autoclean-icvision data/sub-01_task-rest_eeg.set --verbose
autoclean-icvision data/sub-02_task-rest_eeg.set --verbose
autoclean-icvision data/sub-03_task-rest_eeg.set --verbose

# Results organized in autoclean_icvision_results/ with prefixed filenames
ls autoclean_icvision_results/
# sub-01_task-rest_eeg_icvis_results.csv
# sub-01_task-rest_eeg_icvis_classified_ica.fif
# sub-02_task-rest_eeg_icvis_results.csv
# sub-02_task-rest_eeg_icvis_classified_ica.fif
# ...

Python API

You can also use ICVision programmatically within your Python scripts.

Single .set file usage (NEW):

from pathlib import Path
from icvision.core import label_components

# --- Configuration ---
API_KEY = "your_openai_api_key"  # Or set as environment variable OPENAI_API_KEY
DATA_PATH = "path/to/your_data.set"  # EEGLAB .set file with ICA
OUTPUT_DIR = Path("icvision_output")

# --- Run ICVision (ICA auto-detected from .set file) ---
try:
    raw_cleaned, ica_updated, results_df = label_components(
        raw_data=DATA_PATH,              # EEGLAB .set file path
        # ica_data parameter is optional - auto-detected from .set file
        api_key=API_KEY,                 # Optional if OPENAI_API_KEY env var is set
        output_dir=OUTPUT_DIR,
    )

Traditional separate files:

from pathlib import Path
from icvision.core import label_components

# --- Configuration ---
API_KEY = "your_openai_api_key"  # Or set as environment variable OPENAI_API_KEY
RAW_DATA_PATH = "path/to/your_raw_data.set"
ICA_DATA_PATH = "path/to/your_ica_data.fif"
OUTPUT_DIR = Path("icvision_output")

# --- Run ICVision with all parameters ---
try:
    raw_cleaned, ica_updated, results_df = label_components(
        raw_data=RAW_DATA_PATH,          # Can be MNE object or path string/Path object
        ica_data=ICA_DATA_PATH,          # Can be MNE object, path, or None for auto-detection
        api_key=API_KEY,                 # Optional if OPENAI_API_KEY env var is set
        output_dir=OUTPUT_DIR,
        model_name="gpt-4.1",            # Default: "gpt-4.1"
        confidence_threshold=0.80,       # Default: 0.8
        labels_to_exclude=["eye", "muscle", "heart", "line_noise", "channel_noise"],  # Default: all non-brain
        generate_report=True,            # Default: True
        batch_size=5,                    # Default: 10
        max_concurrency=3,               # Default: 4
        auto_exclude=True,               # Default: True
        custom_prompt=None,              # Default: None (uses built-in prompt)
        psd_fmax=40.0                    # Default: None (uses 80 Hz); useful for ERP studies
    )

    print("\n--- ICVision Processing Complete ---")
    print(f"Cleaned raw data channels: {raw_cleaned.info['nchan']}")
    print(f"Updated ICA components: {ica_updated.n_components_}")
    print(f"Number of components classified: {len(results_df)}")

    if not results_df.empty:
        print(f"Number of components marked for exclusion: {results_df['exclude_vision'].sum()}")
        print("\nClassification Summary:")
        print(results_df[['component_name', 'label', 'confidence', 'exclude_vision']].head())

    print(f"\nResults saved in: {OUTPUT_DIR.resolve()}")

except Exception as e:
    print(f"An error occurred: {e}")

🔄 ICLabel Drop-in Replacement

ICVision can serve as a drop-in replacement for MNE-ICALabel with identical API and output format. This means you can upgrade existing ICLabel workflows to use OpenAI Vision API without changing any other code.

Quick Migration

Before (using MNE-ICALabel):

from mne_icalabel import label_components

# Classify components with ICLabel
result = label_components(raw, ica, method='iclabel')
print(result['labels'])  # ['brain', 'eye blink', 'other', ...]
print(ica.labels_scores_.shape)  # (n_components, 7)

After (using ICVision):

from icvision.compat import label_components  # <-- Only line that changes!

# Classify components with ICVision (same API!)
result = label_components(raw, ica, method='icvision')
print(result['labels'])  # Same format: ['brain', 'eye blink', 'other', ...]
print(ica.labels_scores_.shape)  # Same shape: (n_components, 7)

What You Get

  • 🎯 Identical API: Same function signature, same return format
  • 📊 Same Output: Returns dict with 'y_pred_proba' and 'labels' keys
  • ⚙️ Same ICA Modifications: Sets ica.labels_scores_ and ica.labels_ exactly like ICLabel
  • 🚀 Enhanced Intelligence: OpenAI Vision API instead of fixed neural network
  • 💡 Detailed Reasoning: Each classification includes explanation (available in full API)

Why Use ICVision over ICLabel?

Feature ICLabel ICVision
Classification Method Fixed neural network (2019) OpenAI Vision API (latest models)
Accuracy Good on typical datasets Enhanced with modern vision AI
Reasoning No explanations Detailed reasoning for each decision
Customization Fixed model Customizable prompts and models
Updates Static model Benefits from OpenAI improvements
API Compatibility ✅ Original ✅ Drop-in replacement

Integration Example

The compatibility layer works seamlessly with existing MNE workflows:

def analyze_ica_components(raw, ica, method='icvision'):
    """Generic function that works with both ICLabel and ICVision"""

    if method == 'icvision':
        from icvision.compat import label_components
    else:
        from mne_icalabel import label_components

    # Same API for both!
    result = label_components(raw, ica, method=method)

    # Same return format for both
    print(f"Classified {len(result['labels'])} components")

    # Same ICA object modifications for both
    brain_components = ica.labels_['brain']
    artifact_components = [idx for key, indices in ica.labels_.items()
                          if key != 'brain' for idx in indices]

    print(f"Brain components: {brain_components}")
    print(f"Artifact components: {artifact_components}")

    return result

# Works with either classifier
result = analyze_ica_components(raw, ica, method='icvision')

Two APIs, Same Power

ICVision provides two complementary interfaces:

  1. Original ICVision API: Rich output with detailed results and file generation

    from icvision.core import label_components
    raw_cleaned, ica_updated, results_df = label_components(...)
    
  2. ICLabel-Compatible API: Simple output matching ICLabel exactly

    from icvision.compat import label_components
    result = label_components(raw, ica, method='icvision')
    

Choose the API that best fits your workflow - both use the same underlying OpenAI Vision classification.


Configuration Details

Input File Support

EEGLAB .set files:

  • Raw data: Supports EEGLAB .set files for raw EEG data
  • ICA data: Now supports automatic ICA detection from .set files using mne.preprocessing.read_ica_eeglab()
  • Single file mode: Use just a .set file when it contains both raw data and ICA decomposition

MNE formats: Other supported formats include:

  • Raw data: .fif, .edf, .raw
  • ICA data: .fif files containing MNE ICA objects

Default Parameter Values

Parameter Default Value Description
model_name "gpt-4.1" OpenAI model for classification
confidence_threshold 0.8 Minimum confidence for auto-exclusion
auto_exclude True Automatically exclude artifact components
labels_to_exclude ["eye", "muscle", "heart", "line_noise", "channel_noise", "other_artifact"] Labels to exclude (all non-brain)
output_dir "./autoclean_icvision_results" Output directory for results
generate_report True Generate PDF report
batch_size 10 Components per API request
max_concurrency 4 Maximum parallel API requests
api_key None Uses OPENAI_API_KEY environment variable
custom_prompt None Uses built-in classification prompt

Component Labels

The standard set of labels ICVision uses (and expects from the API) are:

  • brain - Neural brain activity (retained)
  • eye - Eye movement artifacts
  • muscle - Muscle artifacts
  • heart - Cardiac artifacts
  • line_noise - Electrical line noise
  • channel_noise - Channel-specific noise
  • other_artifact - Other artifacts

These are defined in src/icvision/config.py.

Output Files

ICVision creates organized output files with input filename prefixes to prevent overwrites when processing multiple datasets:

  • {basename}_icvis_classified_ica.fif: MNE ICA object with labels and exclusions
  • {basename}_icvis_results.csv: Detailed classification results per component
  • {basename}_icvis_cleaned_raw.{format}: Cleaned EEG data with artifacts removed
  • {basename}_icvis_summary.txt: Summary statistics by label type
  • {basename}_icvis_report_all_comps.pdf: Comprehensive PDF report (if enabled)
  • component_IC{N}_vision_analysis.webp: Individual component plots used for API classification

Example: Processing sub-01_task-rest_eeg.set creates files like:

  • sub-01_task-rest_eeg_icvis_results.csv
  • sub-01_task-rest_eeg_icvis_classified_ica.fif
  • sub-01_task-rest_eeg_icvis_cleaned_raw.set

Multi-file Processing: All results are saved to the same autoclean_icvision_results/ directory, with basename prefixes ensuring no conflicts:

autoclean_icvision_results/
├── sub-01_task-rest_eeg_icvis_results.csv
├── sub-01_task-rest_eeg_icvis_classified_ica.fif
├── sub-02_task-rest_eeg_icvis_results.csv
├── sub-02_task-rest_eeg_icvis_classified_ica.fif
└── pilot_data_icvis_results.csv

Custom Classification Prompt

The default prompt is optimized for EEG component classification on EGI128 nets. You can customize it by:

  • CLI: --prompt-file /path/to/custom_prompt.txt
  • Python API: custom_prompt="Your custom prompt here"
  • View default: Check src/icvision/config.py

OpenAI API Costs

ICVision automatically tracks and estimates OpenAI API costs during processing:

Typical Costs (2025-05-29 pricing):

  • gpt-4.1: ~$0.0012 per component
  • gpt-4.1-mini: ~$0.0002 per component (recommended)
  • gpt-4.1-nano: ~$0.0001 per component (budget option)

Example costs for full ICA analysis:

  • 10 components: $0.0006-0.012 depending on model
  • 30 components: $0.002-0.036 depending on model
  • 64 components: $0.004-0.077 depending on model

Cost estimates are automatically logged during processing. Use --verbose flag to see detailed per-component cost tracking.

Logging and Verbosity

ICVision provides two logging modes for different use cases:

Normal Mode (Default - Clean output for researchers):

autoclean-icvision data.set
# Output:
# 2025-05-29 13:33:43 - INFO - Starting ICVision CLI v0.1.0
# 2025-05-29 13:33:44 - INFO - OpenAI classification complete. Processed 20/20 components
# 2025-05-29 13:33:45 - INFO - ICVision workflow completed successfully!

Verbose Mode (Detailed debugging information):

autoclean-icvision data.set --verbose
# Output:
# 2025-05-29 13:33:43 - icvision - INFO - Verbose logging enabled - showing module details
# 2025-05-29 13:33:44 - icvision.core - DEBUG - Loading and validating input data...
# 2025-05-29 13:33:45 - icvision.api - DEBUG - Response ID: resp_123..., Tokens: 400/50, Cost: $0.001200
# 2025-05-29 13:33:45 - icvision.plotting - DEBUG - Plotting progress: 10/20 components completed

Verbose mode provides:

  • Module-level debugging information
  • Detailed OpenAI API cost tracking per component
  • Progress indicators for long-running operations
  • External library logging (httpx, openai, etc.)
  • Full error stack traces for troubleshooting

Use verbose mode when:

  • Debugging processing issues
  • Monitoring API costs in detail
  • Contributing to development
  • Troubleshooting unexpected behavior

Development

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use ICVision in your research, please consider citing it (details to be added upon publication/DOI generation).

Acknowledgements

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

autocleaneeg_icvision-0.2.0.tar.gz (83.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

autocleaneeg_icvision-0.2.0-py3-none-any.whl (57.1 kB view details)

Uploaded Python 3

File details

Details for the file autocleaneeg_icvision-0.2.0.tar.gz.

File metadata

File hashes

Hashes for autocleaneeg_icvision-0.2.0.tar.gz
Algorithm Hash digest
SHA256 3ccf95f8b69910014b05665b92118125c05b2f5b278bc29de8f5de0b9b09f53f
MD5 914c94b5ce84cbeba7345377a2fd937f
BLAKE2b-256 b6903d1fa5d578faf2cea5331c0b4856c3d3c956c9ee73ca389642a3e21f63fc

See more details on using hashes here.

File details

Details for the file autocleaneeg_icvision-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for autocleaneeg_icvision-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 962ce657bc8d77f285d99400ad575175f6bf35fd021cf60ab81216d2357bd675
MD5 66eb9ecd6cebb1db2d9c6236aca32320
BLAKE2b-256 1b32fe331522f529da880b0c314e9d78fd550319bd4c5123f0f6e4581a616007

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page