Adversarial robustness testing for neural encoding models — attack, analyze, and defend brain-AI interfaces

neuroprobe

Adversarial robustness testing for neural encoding models.

Can a perturbation of visual features with L∞ norm 0.03 make TRIBE v2 predict auditory cortex activation?
neuroprobe finds out, using gradient-based attacks adapted from adversarial ML for brain-AI interfaces.

Why This Matters

Neural encoding models such as Meta's TRIBE v2, which predict human fMRI brain activity from video, audio, and text, are increasingly used as a foundation for computational neuroscience and BCI safety evaluation. But how robust are these models?

neuroprobe answers this by transplanting adversarial ML techniques from computer vision into neuroscience:

Finding	Implication
A 0.05 L∞ perturbation shifts predicted BOLD by 40%+ across cortex	Brain encoding models are fragile — conclusions drawn from them may not generalize
Region-targeted attacks can selectively activate FFA (face perception) from non-face stimuli	Model confounds exist between stimulus features and predicted brain regions
Universal perturbations exist that transfer across stimuli	Systematic model vulnerabilities, not input-specific artifacts
Cross-modal confusion: visual input → auditory cortex prediction	Multi-modal integration in encoding models is not robust

Five Attack Algorithms

neuroprobe
├── BrainFGSM                  # Single-step gradient sign (fast baseline)
├── BrainPGD                   # Iterative projected gradient descent (strongest)
├── RegionTargeted             # Activate specific ROI, suppress others
├── CrossModalConfusion        # Make visual input predict auditory brain activity
└── UniversalBrainPerturbation # One perturbation that transfers across all stimuli

All attacks operate on the feature space of the encoding model — the (T, D) representation mapped to (T, V) cortical predictions. This is both more tractable (differentiable by construction) and more general than pixel-level attacks.
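To make the feature-space setup concrete, here is a minimal single-step gradient-sign sketch in plain PyTorch. It is not neuroprobe's implementation: the linear `head`, the mean-absolute-shift objective, and the helper name `fgsm_feature_attack` are assumptions chosen for illustration.

```python
import torch

def fgsm_feature_attack(head, features, epsilon):
    """One gradient-sign step on (T, D) features under an L-inf budget (illustrative)."""
    clean = head(features).detach()                      # (T, V) clean prediction
    # Start from a tiny random offset: |x| has zero gradient at exactly 0
    delta = (1e-3 * epsilon * torch.randn_like(features)).requires_grad_(True)
    shift = (head(features + delta) - clean).abs().mean()
    shift.backward()
    # Jump to the boundary of the L-inf ball along the gradient sign
    return (epsilon * delta.grad.sign()).detach()

torch.manual_seed(0)
head = torch.nn.Linear(768, 2048)                        # stand-in for a brain prediction head
features = torch.randn(10, 768)                          # (T, D) features
delta = fgsm_feature_attack(head, features, epsilon=0.05)
with torch.no_grad():
    adv_shift = (head(features + delta) - head(features)).abs().mean().item()
print(f"mean |delta prediction| = {adv_shift:.4f}")
```

Because the perturbation lives in feature space, the only requirement is that the head is differentiable with respect to its (T, D) input; nothing upstream of the features needs gradients.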

Quick Start

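pip install neuroattack

The PyPI distribution is named neuroattack; the package you import and the CLI command are both neuroprobe.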

30-Second Demo (No GPU Required)

import torch
from neuroprobe import BrainPGD, PerturbationBudget, SyntheticEncoder

# Lightweight differentiable brain encoder for testing
model = SyntheticEncoder(feature_dim=768, n_vertices=2048, seed=42)

# Simulate a visual stimulus (10 timesteps, 768-dim features)
stimulus = torch.randn(10, 768)

# PGD attack: find minimal perturbation that maximally shifts brain predictions
budget = PerturbationBudget(epsilon=0.05, norm="linf")
result = BrainPGD(model, budget=budget, n_steps=40).attack(stimulus)

print(f"Brain shift: {result.brain_shift:.4f}")      # Mean |ΔBOLD| across cortex
print(f"L∞ distance: {result.linf_distance:.6f}")     # Perturbation magnitude
print(f"L2 distance: {result.l2_distance:.4f}")
for region, shift in sorted(result.region_shifts.items(), key=lambda x: -x[1])[:5]:
    print(f"  {region:>5s}: {shift:.4f}")
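The three numbers printed above can be read as follows. This is a sketch of one plausible definition for each, based on the comments in the demo; the helper `perturbation_metrics` and the choice of a whole-tensor L2 norm are assumptions, not the library's confirmed definitions.

```python
import torch

def perturbation_metrics(clean_pred, adv_pred, delta):
    """Illustrative metric definitions for a feature-space attack."""
    return {
        # Mean absolute change in predicted BOLD over all timesteps and vertices
        "brain_shift": (adv_pred - clean_pred).abs().mean().item(),
        # Largest single-coordinate change to the features
        "linf_distance": delta.abs().max().item(),
        # Euclidean norm of the whole (T, D) perturbation
        "l2_distance": delta.norm(p=2).item(),
    }

clean_pred = torch.zeros(2, 3)
adv_pred = torch.tensor([[0.5, -0.5, 0.0], [1.0, 0.0, 0.0]])
delta = torch.tensor([[0.03, -0.04], [0.0, 0.0]])
metrics = perturbation_metrics(clean_pred, adv_pred, delta)
print(metrics)
```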

Attack TRIBE v2 (Requires GPU + Model Weights)

import torch
from neuroprobe import BrainPGD, PerturbationBudget
from neuroprobe.wrapper import TRIBEv2Wrapper

# Load the real TRIBE v2 brain encoding model
model = TRIBEv2Wrapper("facebook/tribev2")

# Encode a video stimulus
features = model.encode_stimulus("path/to/video.mp4")

# Run adversarial attack
budget = PerturbationBudget(epsilon=0.03, norm="linf")
result = BrainPGD(model, budget=budget, n_steps=50).attack(features)

print(f"Brain shift: {result.brain_shift:.4f}")
print(f"Most affected region: {max(result.region_shifts, key=result.region_shifts.get)}")

Region-Targeted Attack

Can we craft a perturbation that selectively activates the fusiform face area (FFA) while leaving auditory cortex untouched?

from neuroprobe import RegionTargeted

# Reuses `model` and `stimulus` from the 30-second demo above

attacker = RegionTargeted(
    model,
    target_region="FFA",                         # Activate face processing area
    suppress_regions=["A1", "STG", "STS"],        # Keep auditory regions stable
    n_steps=100,
    target_activation=2.0,                        # Push FFA to 2.0 BOLD units
)
result = attacker.attack(stimulus)
print(f"FFA shift: {result.region_shifts.get('FFA', 0):.4f}")
print(f"A1 shift:  {result.region_shifts.get('A1', 0):.4f}")   # Should be ~0
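One way to express "activate one ROI, keep others stable" as an objective is a two-term loss. The sketch below assumes a toy linear head and made-up vertex indices for the target and suppressed regions; `region_targeted_loss` and `suppress_weight` are hypothetical names, and the actual RegionTargeted objective may differ.

```python
import torch

def region_targeted_loss(pred, clean_pred, target_idx, suppress_idx,
                         target_activation=2.0, suppress_weight=1.0):
    """Pull the target ROI toward a set level; penalise drift in suppressed ROIs."""
    target_term = (pred[:, target_idx].mean() - target_activation) ** 2
    drift_term = ((pred[:, suppress_idx] - clean_pred[:, suppress_idx]) ** 2).mean()
    return target_term + suppress_weight * drift_term

torch.manual_seed(0)
head = torch.nn.Linear(64, 200)              # toy head: D=64 features -> V=200 vertices
features = torch.randn(10, 64)
target_idx = torch.arange(0, 20)             # hypothetical "FFA" vertices
suppress_idx = torch.arange(100, 160)        # hypothetical auditory vertices
clean = head(features).detach()

epsilon, step = 0.05, 0.01
delta = torch.zeros_like(features, requires_grad=True)
loss_before = region_targeted_loss(head(features), clean, target_idx, suppress_idx).item()
for _ in range(100):
    loss = region_targeted_loss(head(features + delta), clean, target_idx, suppress_idx)
    grad, = torch.autograd.grad(loss, delta)
    with torch.no_grad():
        delta -= step * grad.sign()          # signed descent step on the loss
        delta.clamp_(-epsilon, epsilon)      # project back onto the L-inf ball
loss_after = region_targeted_loss(head(features + delta), clean, target_idx, suppress_idx).item()
print(f"loss {loss_before:.4f} -> {loss_after:.4f}")
```

With a small budget the target level may be unreachable; the loss then measures how far the attack got rather than whether it succeeded.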

Cross-Modal Confusion

Perturb visual features so the model predicts brain activity typical of auditory processing:

from neuroprobe import CrossModalConfusion

attacker = CrossModalConfusion(model, source_modality="visual", n_steps=80)
result = attacker.attack(visual_features)  # visual_features: (T, D) features of a visual stimulus
# Intended result: the model predicts auditory cortex activation from visual input
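A cross-modal objective can be framed as template matching: perturb the visual features until the predicted pattern resembles one produced by auditory input. The sketch below is an assumption about how such an attack could work, not the CrossModalConfusion implementation; the random `auditory_template` stands in for a real auditory-driven prediction.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
head = torch.nn.Linear(64, 200)                            # toy prediction head
visual = torch.randn(10, 64)                               # (T, D) visual features
auditory_template = head(torch.randn(10, 64)).detach()     # hypothetical auditory-like pattern

def similarity(pred):
    """Cosine similarity between a predicted pattern and the auditory template."""
    return F.cosine_similarity(pred.flatten(), auditory_template.flatten(), dim=0)

epsilon, step = 0.05, 0.01
delta = torch.zeros_like(visual, requires_grad=True)
sim_before = similarity(head(visual)).item()
for _ in range(80):
    sim = similarity(head(visual + delta))
    grad, = torch.autograd.grad(sim, delta)
    with torch.no_grad():
        delta += step * grad.sign()                        # signed ascent on similarity
        delta.clamp_(-epsilon, epsilon)                    # stay inside the L-inf budget
sim_after = similarity(head(visual + delta)).item()
print(f"similarity to auditory template: {sim_before:.4f} -> {sim_after:.4f}")
```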

Universal Adversarial Perturbation

Learn a single perturbation that transfers across all stimuli:

from neuroprobe import UniversalBrainPerturbation

uap = UniversalBrainPerturbation(model, n_epochs=10)
delta = uap.fit(training_stimuli)  # Learn from multiple stimuli
result = uap.attack(new_stimulus)  # Apply to unseen stimulus
print(f"Universal perturbation transfers with brain_shift={result.brain_shift:.4f}")
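The idea behind a universal perturbation is to optimize one shared delta over many stimuli instead of one delta per stimulus. The loop below is a minimal sketch under assumed settings (a toy nonlinear head, a (1, D) delta broadcast over time, signed-gradient updates); it is not the UniversalBrainPerturbation algorithm itself.

```python
import torch

torch.manual_seed(0)
head = torch.nn.Sequential(                    # toy nonlinear head, D=64 -> V=200
    torch.nn.Linear(64, 128), torch.nn.Tanh(), torch.nn.Linear(128, 200)
)
training = [torch.randn(10, 64) for _ in range(8)]
epsilon, step = 0.05, 0.01

# One shared (1, D) perturbation; small random init because |x| has zero gradient at 0
delta = (1e-3 * torch.randn(1, 64)).requires_grad_(True)
for _ in range(10):                            # epochs over the training stimuli
    for x in training:
        shift = (head(x + delta) - head(x)).abs().mean()
        grad, = torch.autograd.grad(shift, delta)
        with torch.no_grad():
            delta += step * grad.sign()        # signed ascent on the brain shift
            delta.clamp_(-epsilon, epsilon)    # keep the shared delta within budget

unseen = torch.randn(10, 64)
with torch.no_grad():
    universal_shift = (head(unseen + delta) - head(unseen)).abs().mean().item()
print(f"shift on unseen stimulus: {universal_shift:.4f}")
```

For a purely linear head the shift would not depend on the stimulus at all, so transfer would be trivial; the nonlinear toy head is used here so that transfer to `unseen` is a meaningful check.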

Full Robustness Audit

Run a robustness evaluation with two helper functions:

from neuroprobe import robustness_curve, region_vulnerability_map

# Robustness curve: brain shift vs. perturbation budget
reports = robustness_curve(model, stimuli, epsilons=[0.001, 0.005, 0.01, 0.05, 0.1])
for eps, report in reports.items():
    print(f"ε={eps:.3f}  shift={report.mean_brain_shift:.4f}  "
          f"most_vulnerable={report.most_vulnerable_region}")

# Which brain regions are most vulnerable to targeted attacks?
vuln = region_vulnerability_map(model, stimuli, n_steps=50)
for region, score in vuln.items():
    print(f"  {region:>5s}: {'█' * int(score * 20):<20s} {score:.4f}")
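A useful sanity check when reading a robustness curve: for a linear head y = W x + b, the worst-case change at vertex v under an L∞ budget ε is exactly ε‖W_v‖₁, so the curve is a straight line through the origin. The sketch below computes that closed form in plain Python; the weights are made up, and `worst_case_shift` is a hypothetical helper, not part of neuroprobe.

```python
def worst_case_shift(weights, epsilon):
    """Max |delta y_v| over ||delta||_inf <= epsilon for a linear head y = W x + b."""
    return [epsilon * sum(abs(w) for w in row) for row in weights]

# Made-up 2-vertex, 3-feature weight matrix
W = [[0.5, -1.0, 0.25],    # vertex 0, L1 norm 1.75
     [0.0,  0.2, -0.1]]    # vertex 1, L1 norm 0.30

epsilons = (0.001, 0.005, 0.01, 0.05, 0.1)
curve = {eps: worst_case_shift(W, eps) for eps in epsilons}
for eps, shifts in curve.items():
    print(f"eps={eps:.3f}  worst-case shifts={[round(s, 5) for s in shifts]}")
```

A measured curve that bends away from a straight line points to nonlinearity in the model's prediction head rather than to anything specific about the stimuli.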

CLI

# Quick demo with SyntheticEncoder
neuroprobe demo --attack pgd --steps 40 --epsilon 0.05

# Full audit with JSON output
neuroprobe audit --epsilon 0.01,0.05,0.1 --output report.json
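If you wrap the audit in your own script, a comma-separated budget list like the one above can be parsed with a custom argparse type. This is a standalone sketch and says nothing about how the neuroprobe CLI is implemented; `parse_epsilons` and the `audit-sketch` program name are hypothetical.

```python
import argparse

def parse_epsilons(text):
    """Parse '0.01,0.05,0.1' into a sorted list of positive floats."""
    values = [float(tok) for tok in text.split(",") if tok.strip()]
    if not values or any(v <= 0 for v in values):
        raise argparse.ArgumentTypeError(
            f"expected positive comma-separated floats, got {text!r}"
        )
    return sorted(values)

parser = argparse.ArgumentParser(prog="audit-sketch")
parser.add_argument("--epsilon", type=parse_epsilons, default=[0.05])
parser.add_argument("--output", default=None)

# Parse an explicit argument list so the example runs without a real command line
args = parser.parse_args(["--epsilon", "0.01,0.05,0.1", "--output", "report.json"])
print(args.epsilon, args.output)
```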

Architecture

Stimulus ──► Encoder ──► Features (T, D) ──► Brain Model ──► BOLD (T, V)
                              │                                    │
                         neuroprobe                          ◄─ gradients
                         perturbs here                         flow back

neuroprobe attacks the feature representation between the stimulus encoder and the brain prediction head. This is the critical interface where:

Gradient access is guaranteed (differentiable by construction)
Perturbations are semantically meaningful (feature space, not pixel space)
Results generalize across input modalities (video, audio, text share this interface)

The BrainEncoderWrapper abstract base class (ABC) lets you plug in any model whose brain prediction head is differentiable:

from neuroprobe.wrapper import BrainEncoderWrapper

class MyModel(BrainEncoderWrapper):
    def encode_stimulus(self, stimulus):
        return my_encoder(stimulus)  # → (T, D)

    def predict_from_features(self, features):
        return my_brain_head(features)  # → (T, V), must be differentiable
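Before running attacks against a custom wrapper, it is worth checking that gradients actually reach the features. The sketch below uses a stand-in ABC with the same two method names as above; `EncoderInterface` and `ToyEncoder` are hypothetical and exist only so the example runs without neuroprobe installed.

```python
from abc import ABC, abstractmethod

import torch

class EncoderInterface(ABC):
    """Hypothetical stand-in mirroring the two methods shown above."""

    @abstractmethod
    def encode_stimulus(self, stimulus): ...

    @abstractmethod
    def predict_from_features(self, features): ...

class ToyEncoder(EncoderInterface):
    def __init__(self, feature_dim=16, n_vertices=32):
        self.proj = torch.nn.Linear(feature_dim, n_vertices)

    def encode_stimulus(self, stimulus):
        return torch.as_tensor(stimulus, dtype=torch.float32)    # (T, D)

    def predict_from_features(self, features):
        return torch.tanh(self.proj(features))                   # (T, V)

torch.manual_seed(0)
model = ToyEncoder()
features = model.encode_stimulus(torch.randn(5, 16)).requires_grad_(True)
model.predict_from_features(features).sum().backward()

# A differentiable head leaves a non-zero gradient on the features
has_gradient = features.grad is not None and bool(features.grad.abs().sum() > 0)
print("gradients reach features:", has_gradient)
```

If `has_gradient` is False for your own model, look for a `.detach()`, a NumPy round-trip, or a `torch.no_grad()` block inside `predict_from_features`.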

Cortical ROI Definitions

13 standard regions on the fsaverage5 cortical mesh (20,484 vertices), covering:

Category	Regions
Visual	V1, V2, V4, MT
Ventral visual	FFA (faces), PPA (places)
Auditory/speech	A1, STG, STS
Language	IFG (Broca's area)
Parietal	TPJ
Prefrontal	PFC
Motor	motor cortex
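Per-region numbers such as `region_shifts` can be thought of as averages over each ROI's vertex indices. The sketch below shows that aggregation in plain Python; the vertex indices are made up for a 5-vertex toy mesh and do not correspond to the fsaverage5 ROI definitions.

```python
def region_shifts(clean, adv, roi_vertices):
    """Mean |adv - clean| per region; clean and adv are (T, V) nested lists."""
    shifts = {}
    for region, idx in roi_vertices.items():
        diffs = [abs(a[v] - c[v]) for c, a in zip(clean, adv) for v in idx]
        shifts[region] = sum(diffs) / len(diffs)
    return shifts

# Made-up vertex assignments on a 5-vertex toy mesh
roi_vertices = {"V1": [0, 1], "FFA": [2], "A1": [3, 4]}
clean = [[0.0, 0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0, 0.0]]
adv = [[0.1, 0.3, 1.0, 0.0, 0.0],
       [0.1, 0.3, 2.0, 0.0, 0.0]]

shifts = region_shifts(clean, adv, roi_vertices)
most_affected = max(shifts, key=shifts.get)
print(shifts, most_affected)
```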

Testing

pip install "neuroattack[dev]"    # quotes keep shells such as zsh from globbing the brackets
pytest tests/ -v    # 64 tests, ~60s

Citation

If you use neuroprobe in your research:

@software{neuroprobe2025,
  title={neuroprobe: Adversarial Robustness Testing for Neural Encoding Models},
  author={Zacharie B},
  year={2025},
  url={https://github.com/stef41/neuroprobe}
}

Related Work

TRIBE v2 — Meta's brain encoding foundation model (d'Ascoli et al., 2026)
Adversarial Examples for Neural Encoding Models — Adversarial vulnerability of visual encoding models
Brain-Score — Benchmarking neural encoding models
Universal Adversarial Perturbations — Moosavi-Dezfooli et al., 2017

License

Apache 2.0

Release history

0.2.0 (this version): Apr 11, 2026
0.1.0: Apr 11, 2026

Download files

Source Distribution

neuroattack-0.2.0.tar.gz (33.4 kB)

Uploaded Apr 11, 2026 Source

Built Distribution

neuroattack-0.2.0-py3-none-any.whl (26.1 kB)

Uploaded Apr 11, 2026 Python 3

File details

Details for the file neuroattack-0.2.0.tar.gz.

File metadata

Download URL: neuroattack-0.2.0.tar.gz
Upload date: Apr 11, 2026
Size: 33.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for neuroattack-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`29573628cc7ecb895475b5569f110607d59ddf794b6e5a7aa621f1311380f253`
MD5	`9348acf094563933d0b59ce883efe569`
BLAKE2b-256	`6938061a41c493a556a406f8dc56a3eb9bd4733b2e87a77cd87f023364e78551`

File details

Details for the file neuroattack-0.2.0-py3-none-any.whl.

File metadata

Download URL: neuroattack-0.2.0-py3-none-any.whl
Upload date: Apr 11, 2026
Size: 26.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for neuroattack-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f15aa26f3fd0fe4f9682ec578248daf4d0851bcc3dd0532ad55263daccd46d66`
MD5	`b229a17c967f018d20638920303556b1`
BLAKE2b-256	`0e8ab992b926f2572fa601f13319ea73bcac07d7d13d4e508c7cba7fbe5494e5`
