Adversarial robustness testing for neural encoding models — attack, analyze, and defend brain-AI interfaces

neuroprobe

Adversarial robustness testing for neural encoding models.

Can an L∞ perturbation of just 0.03 to visual features make TRIBE v2 predict auditory cortex activation?
neuroprobe finds out, using gradient-based attacks adapted from adversarial ML for brain-AI interfaces.


Why This Matters

Neural encoding models like Meta's TRIBE v2 — which predict human fMRI brain activity from video, audio, and text — are becoming the foundation of computational neuroscience and BCI safety evaluation. But how robust are these models?

neuroprobe answers this by transplanting adversarial ML techniques from computer vision into neuroscience:

Finding | Implication
------- | -----------
A 0.05 L∞ perturbation shifts predicted BOLD by 40%+ across cortex | Brain encoding models are fragile; conclusions drawn from them may not generalize
Region-targeted attacks can selectively activate FFA (face perception) from non-face stimuli | Model confounds exist between stimulus features and predicted brain regions
Universal perturbations exist that transfer across stimuli | Systematic model vulnerabilities, not input-specific artifacts
Cross-modal confusion: visual input → auditory cortex prediction | Multi-modal integration in encoding models is not robust

Five Attack Algorithms

neuroprobe
├── BrainFGSM                   # Single-step gradient sign (fast baseline)
├── BrainPGD                    # Iterative projected gradient descent (strongest)
├── RegionTargeted              # Activate specific ROI, suppress others
├── CrossModalConfusion         # Make visual input predict auditory brain activity
└── UniversalBrainPerturbation  # One perturbation that transfers across all stimuli

All attacks operate on the feature space of the encoding model — the (T, D) representation mapped to (T, V) cortical predictions. This is both more tractable (differentiable by construction) and more general than pixel-level attacks.
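To make the feature-space setting concrete, here is a minimal single-step gradient-sign sketch. It is not neuroprobe's implementation: the linear map `W` standing in for the brain model, the reference response `y_ref`, and the squared-error loss are illustrative assumptions, and the gradient is derived by hand so the snippet runs with NumPy alone. With a real model you would obtain the gradient through autograd instead.

```python
import numpy as np

def fgsm_features(X, W, y_ref, epsilon):
    """One gradient-sign step on features X (T, D) for a toy linear encoder.

    Toy model (assumption): Y = X @ W maps (T, D) features to (T, V) predictions.
    Loss (assumption): mean squared error between Y and a reference y_ref.
    """
    residual = X @ W - y_ref                      # (T, V)
    grad = 2.0 * residual @ W.T / residual.size   # dLoss/dX, shape (T, D)
    return X + epsilon * np.sign(grad)            # move to a corner of the L-inf ball

def mse(X, W, y_ref):
    return float(np.mean((X @ W - y_ref) ** 2))

rng = np.random.default_rng(0)
T, D, V = 10, 16, 32                              # small hypothetical sizes
X = rng.standard_normal((T, D))
W = rng.standard_normal((D, V)) / np.sqrt(D)
y_ref = rng.standard_normal((T, V))

X_adv = fgsm_features(X, W, y_ref, epsilon=0.05)
print("loss before:   ", mse(X, W, y_ref))
print("loss after:    ", mse(X_adv, W, y_ref))
print("L-inf distance:", np.max(np.abs(X_adv - X)))
```

Because this toy loss is convex in the features, a step along the gradient sign can only increase it; real encoders are nonlinear, so a single step is a baseline rather than a guarantee.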

Quick Start

# The PyPI distribution is named neuroattack; the import package is neuroprobe
pip install neuroattack

30-Second Demo (No GPU Required)

import torch
from neuroprobe import BrainPGD, PerturbationBudget, SyntheticEncoder

# Lightweight differentiable brain encoder for testing
model = SyntheticEncoder(feature_dim=768, n_vertices=2048, seed=42)

# Simulate a visual stimulus (10 timesteps, 768-dim features)
stimulus = torch.randn(10, 768)

# PGD attack: within the ε budget, find the perturbation that maximally shifts brain predictions
budget = PerturbationBudget(epsilon=0.05, norm="linf")
result = BrainPGD(model, budget=budget, n_steps=40).attack(stimulus)

print(f"Brain shift: {result.brain_shift:.4f}")      # Mean |ΔBOLD| across cortex
print(f"L∞ distance: {result.linf_distance:.6f}")     # Perturbation magnitude
print(f"L2 distance: {result.l2_distance:.4f}")
for region, shift in sorted(result.region_shifts.items(), key=lambda x: -x[1])[:5]:
    print(f"  {region:>5s}: {shift:.4f}")
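For readers who want to see what sits behind these numbers, the following is a hand-rolled projected gradient loop on a toy linear encoder. It is a sketch of how such an attack is commonly structured (random start, signed gradient step, clip back into the ε-ball), not a copy of `BrainPGD`. The function name `pgd_linf`, the step size `alpha`, and the metric definitions are assumptions inferred from the comments in the demo.

```python
import numpy as np

def pgd_linf(X, W, epsilon, n_steps=40, alpha=None, seed=0):
    """Maximise the mean squared prediction shift of a toy encoder Y = X @ W
    over perturbations delta with max|delta| <= epsilon."""
    rng = np.random.default_rng(seed)
    alpha = alpha if alpha is not None else 2.5 * epsilon / n_steps
    # Random start: the shift objective has zero gradient at delta = 0.
    delta = rng.uniform(-epsilon, epsilon, size=X.shape)
    history = []
    for _ in range(n_steps):
        shift = delta @ W                          # change in predictions, (T, V)
        history.append(float(np.mean(shift ** 2)))
        grad = 2.0 * shift @ W.T / shift.size      # d(mean shift^2) / d(delta)
        # Signed ascent step, then projection onto the L-inf ball by clipping.
        delta = np.clip(delta + alpha * np.sign(grad), -epsilon, epsilon)
    history.append(float(np.mean((delta @ W) ** 2)))
    return delta, history

rng = np.random.default_rng(42)
X = rng.standard_normal((10, 16))                  # (T, D) features
W = rng.standard_normal((16, 32)) / 4.0            # hypothetical (D, V) weights

delta, history = pgd_linf(X, W, epsilon=0.05)
brain_shift = float(np.mean(np.abs(delta @ W)))    # mean |change| over time and vertices
linf_distance = float(np.max(np.abs(delta)))       # largest single-feature change
l2_distance = float(np.linalg.norm(delta))         # Frobenius norm of the perturbation

print(f"Brain shift: {brain_shift:.4f}")
print(f"L-inf distance: {linf_distance:.6f}")
print(f"L2 distance: {l2_distance:.4f}")
```

For a linear model the shift depends only on `delta`, so `X` is used here just for its shape; with a nonlinear encoder the gradient would be re-evaluated at `X + delta` on every step.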

Attack TRIBE v2 (Requires GPU + Model Weights)

import torch
from neuroprobe import BrainPGD, PerturbationBudget
from neuroprobe.wrapper import TRIBEv2Wrapper

# Load the real TRIBE v2 brain encoding model
model = TRIBEv2Wrapper("facebook/tribev2")

# Encode a video stimulus
features = model.encode_stimulus("path/to/video.mp4")

# Run adversarial attack
budget = PerturbationBudget(epsilon=0.03, norm="linf")
result = BrainPGD(model, budget=budget, n_steps=50).attack(features)

print(f"Brain shift: {result.brain_shift:.4f}")
print(f"Most affected region: {max(result.region_shifts, key=result.region_shifts.get)}")

Region-Targeted Attack

Can we craft a perturbation that selectively activates the fusiform face area (FFA) while leaving auditory cortex untouched?

from neuroprobe import RegionTargeted

attacker = RegionTargeted(
    model,
    target_region="FFA",                         # Activate face processing area
    suppress_regions=["A1", "STG", "STS"],        # Keep auditory regions stable
    n_steps=100,
    target_activation=2.0,                        # Push FFA to 2.0 BOLD units
)
result = attacker.attack(stimulus)
print(f"FFA shift: {result.region_shifts.get('FFA', 0):.4f}")
print(f"A1 shift:  {result.region_shifts.get('A1', 0):.4f}")   # Should be ~0
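One plausible way to express "activate one ROI, keep others stable" is a two-term loss: a squared distance between the target region's mean activation and the requested level, plus a penalty on any shift in the suppressed regions. The sketch below assumes exactly that loss on a toy linear encoder. The vertex indices, the penalty weight `lam`, and the function name are hypothetical, and the real `RegionTargeted` objective may differ.

```python
import numpy as np

def region_targeted(X, W, target_idx, suppress_idx, target_activation,
                    epsilon=0.05, n_steps=100, lam=10.0):
    """Signed gradient descent on a target-plus-suppression loss (toy linear model)."""
    T = X.shape[0]
    W_t, W_s = W[:, target_idx], W[:, suppress_idx]
    alpha = 2.5 * epsilon / n_steps
    delta = np.zeros_like(X)

    def loss_and_grad(d):
        m = np.mean((X + d) @ W_t)                 # mean activation in the target ROI
        s = d @ W_s                                # shift in suppressed ROIs, (T, |S|)
        loss = (m - target_activation) ** 2 + lam * np.mean(s ** 2)
        g_target = 2.0 * (m - target_activation) * W_t.sum(axis=1) / (T * W_t.shape[1])
        g_suppress = lam * 2.0 * s @ W_s.T / s.size
        return float(loss), g_target[None, :] + g_suppress  # target grad is the same per timestep

    losses = []
    for _ in range(n_steps):
        loss, grad = loss_and_grad(delta)
        losses.append(loss)
        delta = np.clip(delta - alpha * np.sign(grad), -epsilon, epsilon)
    losses.append(loss_and_grad(delta)[0])
    return delta, losses

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 16))
W = rng.standard_normal((16, 32)) / 4.0
ffa, a1 = np.arange(0, 4), np.arange(4, 8)         # hypothetical vertex indices per ROI

delta, losses = region_targeted(X, W, ffa, a1, target_activation=2.0)
ffa_shift = float(np.mean(delta @ W[:, ffa]))      # signed change in mean FFA activation
a1_shift = float(np.mean(np.abs(delta @ W[:, a1])))
print(f"FFA shift: {ffa_shift:.4f}")
print(f"A1 shift:  {a1_shift:.4f}")
```

Raising `lam` trades target activation for stability in the suppressed regions; with a small budget the requested activation level is usually not reached, only approached.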

Cross-Modal Confusion

Perturb visual features so the model predicts brain activity typical of auditory processing:

from neuroprobe import CrossModalConfusion

attacker = CrossModalConfusion(model, source_modality="visual", n_steps=80)
# visual_features: (T, D) features of a visual stimulus, e.g. from model.encode_stimulus(...)
result = attacker.attack(visual_features)
# Result: model now predicts auditory cortex activation from visual input
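As a worked illustration of the idea, suppose the confusion objective is "mean predicted activation in auditory vertices minus mean activation in visual vertices". That objective is an assumption, not necessarily what `CrossModalConfusion` optimises. For a linear prediction head it is linear in the features, so the best L∞-bounded perturbation is a single gradient-sign step, and the gain can be checked in closed form.

```python
import numpy as np

def cross_modal_objective(X, W, auditory_idx, visual_idx):
    """Mean predicted activation in auditory vertices minus that in visual vertices."""
    Y = X @ W
    return float(Y[:, auditory_idx].mean() - Y[:, visual_idx].mean())

def confuse(X, W, auditory_idx, visual_idx, epsilon):
    T = X.shape[0]
    # Gradient of the objective w.r.t. one feature row (identical for every timestep).
    g = (W[:, auditory_idx].mean(axis=1) - W[:, visual_idx].mean(axis=1)) / T
    return X + epsilon * np.sign(g)[None, :], g

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 16))                  # (T, D) "visual" features
W = rng.standard_normal((16, 32)) / 4.0            # hypothetical (D, V) linear head
auditory, visual = np.arange(0, 6), np.arange(6, 16)   # hypothetical vertex indices

eps = 0.05
X_adv, g = confuse(X, W, auditory, visual, eps)
gain = cross_modal_objective(X_adv, W, auditory, visual) \
     - cross_modal_objective(X, W, auditory, visual)
predicted_gain = X.shape[0] * eps * np.abs(g).sum()    # T * eps * ||g||_1
print(f"gain={gain:.4f}  predicted={predicted_gain:.4f}")
```

With a nonlinear encoder the closed form no longer holds, which is why an iterative attack with many steps is the more natural choice there.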

Universal Adversarial Perturbation

Learn a single perturbation that transfers across all stimuli:

from neuroprobe import UniversalBrainPerturbation

uap = UniversalBrainPerturbation(model, n_epochs=10)
delta = uap.fit(training_stimuli)  # Learn from multiple stimuli
result = uap.attack(new_stimulus)  # Apply to unseen stimulus
print(f"Universal perturbation transfers with brain_shift={result.brain_shift:.4f}")
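The fit-then-apply pattern above can be sketched as a loop that updates one shared perturbation while cycling over the training stimuli. This is an assumed structure, not the library's algorithm: the `tanh` toy encoder, the per-feature perturbation shape `(D,)`, and the step size `alpha` are all hypothetical choices made so the example is self-contained.

```python
import numpy as np

def predict(X, W):
    return np.tanh(X @ W)                          # toy nonlinear encoder (assumption)

def mean_sq_shift(stimuli, W, delta):
    return float(np.mean([np.mean((predict(X + delta, W) - predict(X, W)) ** 2)
                          for X in stimuli]))

def fit_universal(stimuli, W, epsilon=0.05, n_epochs=10, alpha=0.005, seed=0):
    rng = np.random.default_rng(seed)
    # Small random start: the shift objective has zero gradient at delta = 0.
    delta = 0.1 * epsilon * rng.uniform(-1.0, 1.0, size=W.shape[0])
    for _ in range(n_epochs):
        for X in stimuli:
            Y_clean, Y_adv = predict(X, W), predict(X + delta, W)
            dY = 2.0 * (Y_adv - Y_clean) / Y_adv.size      # d(loss) / d(Y_adv)
            dZ = dY * (1.0 - Y_adv ** 2)                   # back through tanh
            grad = (dZ @ W.T).sum(axis=0)                  # delta is shared across timesteps
            delta = np.clip(delta + alpha * np.sign(grad), -epsilon, epsilon)
    return delta

rng = np.random.default_rng(7)
W = rng.standard_normal((16, 32)) / 4.0
training = [rng.standard_normal((10, 16)) for _ in range(8)]
held_out = rng.standard_normal((10, 16))           # not seen during fitting

delta = fit_universal(training, W)
print(f"train shift:    {mean_sq_shift(training, W, delta):.6f}")
print(f"held-out shift: {mean_sq_shift([held_out], W, delta):.6f}")
```

Comparing the held-out shift against the training shift gives a rough sense of how well a perturbation transfers to stimuli it was not fitted on.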

Full Robustness Audit

Run a complete robustness evaluation with two helper functions:

from neuroprobe import robustness_curve, region_vulnerability_map

# Robustness curve: brain shift vs. perturbation budget
reports = robustness_curve(model, stimuli, epsilons=[0.001, 0.005, 0.01, 0.05, 0.1])
for eps, report in reports.items():
    print(f"ε={eps:.3f}  shift={report.mean_brain_shift:.4f}  "
          f"most_vulnerable={report.most_vulnerable_region}")

# Which brain regions are most vulnerable to targeted attacks?
vuln = region_vulnerability_map(model, stimuli, n_steps=50)
for region, score in vuln.items():
    print(f"  {region:>5s}: {'█' * int(score * 20):<20s} {score:.4f}")
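Conceptually, a robustness curve is the same attack repeated at several budgets, with the resulting shift recorded per ε. The sketch below shows that loop on a toy linear encoder; `attack_shift` and `robustness_curve_sketch` are hypothetical names, and the report objects returned by the real `robustness_curve` are not reproduced here.

```python
import numpy as np

def attack_shift(W, shape, epsilon, n_steps=20, seed=0):
    """Run a short signed-gradient attack and return the mean |prediction shift|."""
    rng = np.random.default_rng(seed)
    delta = epsilon * rng.uniform(-1.0, 1.0, size=shape)
    for _ in range(n_steps):
        grad = (delta @ W) @ W.T                   # proportional to d(mean shift^2)/d(delta)
        delta = np.clip(delta + 0.25 * epsilon * np.sign(grad), -epsilon, epsilon)
    return float(np.mean(np.abs(delta @ W)))

def robustness_curve_sketch(W, shape, epsilons):
    """Map each perturbation budget to the shift achieved under that budget."""
    return {eps: attack_shift(W, shape, eps) for eps in epsilons}

rng = np.random.default_rng(3)
W = rng.standard_normal((16, 32)) / 4.0            # hypothetical (D, V) weights
epsilons = [0.001, 0.005, 0.01, 0.05, 0.1]

curve = robustness_curve_sketch(W, (10, 16), epsilons)
for eps, shift in curve.items():
    print(f"eps={eps:.3f}  shift={shift:.5f}")
```

For a linear model the curve grows in proportion to ε; departures from that straight line on a real encoder indicate saturation or other nonlinear behaviour.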

CLI

# Quick demo with SyntheticEncoder
neuroprobe demo --attack pgd --steps 40 --epsilon 0.05

# Full audit with JSON output
neuroprobe audit --epsilon 0.01,0.05,0.1 --output report.json

Architecture

Stimulus ──► Encoder ──► Features (T, D) ──► Brain Model ──► BOLD (T, V)
                              │                                    │
                         neuroprobe                          ◄─ gradients
                         perturbs here                         flow back

neuroprobe attacks the feature representation between the stimulus encoder and the brain prediction head. This is the critical interface where:

  • Gradient access is guaranteed (differentiable by construction)
  • Perturbations are semantically meaningful (feature space, not pixel space)
  • Results generalize across input modalities (video, audio, text share this interface)

The BrainEncoderWrapper ABC lets you plug in any model:

from neuroprobe.wrapper import BrainEncoderWrapper

class MyModel(BrainEncoderWrapper):
    def encode_stimulus(self, stimulus):
        return my_encoder(stimulus)  # → (T, D)

    def predict_from_features(self, features):
        return my_brain_head(features)  # → (T, V), must be differentiable
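Before wiring a custom model into an attack, it can help to check that it honours the shape contract implied above: `(T, D)` features in, `(T, V)` predictions out. The following stand-in mirrors only the two method names shown in the example. `EncoderInterface`, `LinearToyModel`, and `check_contract` are hypothetical, and the real `BrainEncoderWrapper` may define more than this.

```python
from abc import ABC, abstractmethod
import numpy as np

class EncoderInterface(ABC):
    """Stand-in for the two-method interface (not the real base class)."""

    @abstractmethod
    def encode_stimulus(self, stimulus):
        """Return features of shape (T, D)."""

    @abstractmethod
    def predict_from_features(self, features):
        """Return predictions of shape (T, V)."""

class LinearToyModel(EncoderInterface):
    def __init__(self, feature_dim, n_vertices, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((feature_dim, n_vertices)) / np.sqrt(feature_dim)

    def encode_stimulus(self, stimulus):
        return np.asarray(stimulus, dtype=float)   # identity "encoder": input is already (T, D)

    def predict_from_features(self, features):
        return features @ self.W                   # (T, D) -> (T, V)

def check_contract(model, stimulus):
    """Verify that features and predictions are 2-D and share the time axis."""
    feats = model.encode_stimulus(stimulus)
    preds = model.predict_from_features(feats)
    assert feats.ndim == 2 and preds.ndim == 2, "expected (T, D) and (T, V)"
    assert feats.shape[0] == preds.shape[0], "time axis must be preserved"
    return feats.shape, preds.shape

print(check_contract(LinearToyModel(8, 20), np.zeros((5, 8))))
```

A check like this catches transposed or flattened outputs early, before they surface as confusing gradient errors inside an attack loop.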

Cortical ROI Definitions

13 standard regions on the fsaverage5 cortical mesh (20,484 vertices), covering:

Category | Regions
-------- | -------
Visual | V1, V2, V4, MT
Ventral visual | FFA (faces), PPA (places)
Auditory/speech | A1, STG, STS
Language | IFG (Broca's area)
Parietal | TPJ
Prefrontal | PFC
Motor | motor cortex
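Per-region numbers such as `region_shifts` can be understood as averages over each ROI's vertices. The sketch below assumes an ROI is simply a list of vertex indices and that a region's shift is the mean absolute change in prediction over its vertices and all timesteps. The tiny 8-vertex layout is invented for illustration and is not the fsaverage5 parcellation.

```python
import numpy as np

def region_shifts(y_clean, y_adv, roi_vertices):
    """Mean |change in prediction| per ROI, given (T, V) arrays and vertex indices."""
    diff = np.abs(y_adv - y_clean)
    return {name: float(diff[:, idx].mean()) for name, idx in roi_vertices.items()}

# Hypothetical vertex assignments on an 8-vertex toy mesh.
roi_vertices = {
    "V1": np.array([0, 1, 2]),
    "FFA": np.array([3, 4]),
    "A1": np.array([5, 6, 7]),
}

y_clean = np.zeros((2, 8))                         # (T, V) clean predictions
y_adv = np.zeros((2, 8))                           # (T, V) predictions under attack
y_adv[:, 3:5] = 0.5                                # both FFA vertices move by 0.5
y_adv[:, 0] = 0.3                                  # one of three V1 vertices moves by 0.3

shifts = region_shifts(y_clean, y_adv, roi_vertices)
most_affected = max(shifts, key=shifts.get)
for name, value in shifts.items():
    print(f"{name:>5s}: {value:.4f}")
print("Most affected region:", most_affected)
```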

Testing

pip install "neuroattack[dev]"   # quotes stop shells such as zsh from globbing the brackets
pytest tests/ -v    # 64 tests, ~60s

Citation

If you use neuroprobe in your research:

@software{neuroprobe2025,
  title={neuroprobe: Adversarial Robustness Testing for Neural Encoding Models},
  author={Zacharie B},
  year={2025},
  url={https://github.com/stef41/neuroprobe}
}

License

Apache 2.0
