A Jupyter-compatible plugin that detects risky ML model and dataset loads.
MAIS - ML Model Audit & Inspection System
A Python notebook plugin that watches for potentially risky model or dataset loads in Jupyter notebooks. MAIS analyzes code in real time, as cells run, to detect loads of models that might require special permissions or licensing.
Detection Architecture - V1 vs V2
MAIS offers two detection architectures, selected with a feature flag:
🔄 V1: Legacy Baseline Detection (Default)
- Production-safe default for backward compatibility
- Uses configuration-based pattern matching
- Watches predefined function lists in config.py
- Best for: Stable production environments
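To make the idea of configuration-based pattern matching concrete, here is a minimal standard-library sketch that flags calls whose dotted name appears in a watch list. It is not MAIS's implementation: `WATCHED_CALLS`, `dotted_name`, and `find_watched_calls` are hypothetical names, and the real lists in config.py may look quite different.

```python
import ast

# Hypothetical watch list for illustration; the real one lives in MAIS's config.py.
WATCHED_CALLS = {"torch.load", "AutoModel.from_pretrained"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_watched_calls(source):
    """Return sorted (line number, call name) pairs for calls on the watch list."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in WATCHED_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)


cell = "import torch\nmodel = torch.load('weights.pt')\nprint(len('x'))\n"
print(find_watched_calls(cell))
```

A fixed list like this is predictable, which fits the production-safe framing, but it only sees calls that are spelled exactly as listed.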
🚀 V2: Provider-Based Detection (Enhanced)
- Specialized detectors for major ML/AI providers
- Comprehensive coverage including patterns V1 misses
- Provider-specific intelligence for better accuracy
- Best for: Development and comprehensive model monitoring
| Provider | V1 Detection | V2 Detection |
|---|---|---|
| HuggingFace | ✅ Basic patterns | ✅ Advanced + Hub integration |
| OpenAI | ❌ Missed patterns | ✅ Full API coverage |
| PyTorch | ✅ torch.load | ✅ Extended patterns |
| Anthropic | ❌ Not detected | ✅ Claude API detection |
| LangChain | ❌ Framework blind | ✅ Full framework support |
| LlamaIndex | ❌ Not detected | ✅ Document processing |
Architecture Overview
MAIS uses a flexible, strategy-based architecture built from several specialized components, described under Core Components.
Additional Architecture Views
| View | Purpose | Link |
|---|---|---|
| 📊 Dependencies | Component relationships & data flow | MAIS_DEPENDENCY.svg |
| ⚡ Process Flow | End-to-end analysis workflow | MAIS_PROCESS.svg |
| 🏗️ DDD Layers | Domain-driven design structure | MAIS_ARCHITECTURE.svg |
Core Components
📥 Input Layer
Processes various types of source code inputs:
- Source Code: Direct Python code analysis
- Notebooks: Jupyter notebook cell analysis
- Requirements: Dependency file scanning
- Python Files: Static file analysis
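As an illustration of the notebook input path, the sketch below pulls the code cells out of an .ipynb document using only the standard library. The function name `code_cells` is hypothetical and this is not how MAIS necessarily reads notebooks; it relies only on the public notebook JSON layout, where each cell has a `cell_type` and a `source` stored as a string or a list of lines.

```python
import json


def code_cells(notebook_json):
    """Return the source text of each code cell in an .ipynb document, in order."""
    nb = json.loads(notebook_json)
    cells = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # The notebook format stores source as one string or as a list of lines.
        cells.append(src if isinstance(src, str) else "".join(src))
    return cells


demo = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None,
         "source": ["import torch\n", "m = torch.load('w.pt')"]},
    ],
})
print(code_cells(demo))
```

Each returned string can then be handed to whichever analyzer is in use.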
🔍 Provider-Specific Detectors
Specialized detectors for different ML/AI providers and frameworks:
- OpenAI: Detects GPT, DALL-E, and OpenAI API usage
- HuggingFace: Identifies Transformers, Datasets, and Hub model loads
- Anthropic: Catches Claude API integrations
- LangChain: Finds LangChain components and chains
- LlamaIndex: Detects LlamaIndex document processing
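One simple signal a provider-specific detector could use is which packages a cell imports. The sketch below maps top-level module names to the providers listed above. `PROVIDER_MODULES` and `providers_imported` are hypothetical names chosen for this example, and the actual MAIS detectors likely look at much more than imports.

```python
import ast

# Illustrative mapping from importable package names to provider labels.
PROVIDER_MODULES = {
    "openai": "OpenAI",
    "transformers": "HuggingFace",
    "datasets": "HuggingFace",
    "huggingface_hub": "HuggingFace",
    "anthropic": "Anthropic",
    "langchain": "LangChain",
    "llama_index": "LlamaIndex",
}


def providers_imported(source):
    """Return the sorted provider labels whose packages are imported in source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        else:
            continue
        for module in modules:
            # Only the top-level package decides the provider.
            provider = PROVIDER_MODULES.get(module.split(".")[0])
            if provider:
                found.add(provider)
    return sorted(found)


sample = "import openai\nfrom transformers import AutoModel\nimport os\n"
print(providers_imported(sample))
```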
⚙️ Detection Strategies
Pluggable analysis approaches that detectors can use:
- AST Strategy: Advanced parsing with variable resolution for complex code analysis
- Regex Strategy: Fast pattern matching for simple detection scenarios
- LLM-based Strategy: Future AI-powered code understanding
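The difference between the regex and AST strategies is easiest to see on code that passes the model name through a variable. The following is a rough sketch under that assumption, not the MAIS strategies themselves: `regex_strategy` and `ast_strategy` are hypothetical functions, and the variable resolution here is deliberately naive (module-wide string constants only, ignoring control flow and reassignment order).

```python
import ast
import re


def regex_strategy(source):
    """Fast path: from_pretrained calls whose first argument is a string literal."""
    return re.findall(r"from_pretrained\(\s*['\"]([^'\"]+)['\"]", source)


def ast_strategy(source):
    """Slower path: also resolves first arguments that are simple string variables."""
    tree = ast.parse(source)
    # Collect `name = "literal"` assignments as a crude symbol table.
    constants = {}
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            constants[node.targets[0].id] = node.value.value
    models = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                and node.func.attr == "from_pretrained" and node.args):
            arg = node.args[0]
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                models.append(arg.value)
            elif isinstance(arg, ast.Name) and arg.id in constants:
                models.append(constants[arg.id])
    return models


code = 'name = "bert-base-uncased"\nmodel = AutoModel.from_pretrained(name)\n'
print(regex_strategy(code))  # the literal-only pattern finds nothing here
print(ast_strategy(code))    # the variable is resolved to its string value
```

The regex pass is cheap and good enough for literal arguments; the AST pass costs a parse but can follow simple indirection.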
📊 Intermediate Output
Analysis results from provider detectors:
- Model Findings: Detected model usage with metadata
- Risk Assessment: Security and compliance evaluation
- Inventory Mapping: Model-to-provider relationship mapping
📋 JSON Schema Standardization
Converts findings into structured format:
- AI Detection JSON Schema: Standardized detection results format
- Provider Attribution: Links findings to specific ML providers
- Risk Categorization: Security and compliance classifications
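The standardization step can be pictured as turning in-memory findings into a stable JSON document. The sketch below is only an illustration: the `Finding` fields and the `to_report` helper are assumptions made for this example, and the actual AI Detection JSON Schema used by MAIS is not reproduced here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    # Hypothetical fields; the real schema may name and nest these differently.
    provider: str
    model: str
    line: int
    risk: str


def to_report(findings):
    """Serialize findings to JSON with sorted keys so output is stable across runs."""
    payload = {"findings": [asdict(f) for f in findings]}
    return json.dumps(payload, indent=2, sort_keys=True)


report = to_report([Finding("HuggingFace", "bert-base-uncased", 2, "review-license")])
print(report)
```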
📦 SBOM Generation
Creates comprehensive software bills of materials:
- manifest-cli Integration: Uses external SBOM generation tools
- SBOM Builder: Internal component for SBOM creation
- Dependency Analysis: Maps AI/ML dependencies
📤 Output Formats
Multiple standard formats for integration:
- CycloneDX JSON: Industry-standard SBOM format
- SPDX JSON: Open-source license compliance format
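For orientation, a CycloneDX JSON document is a plain JSON object with a few fixed top-level fields and a list of components. The sketch below builds a minimal one by hand, assuming CycloneDX 1.5, which includes a `machine-learning-model` component type. The helper `cyclonedx_bom` is hypothetical, the result is not validated against the schema, and it is far sparser than what `create_sbom` or manifest-cli would produce.

```python
import json


def cyclonedx_bom(model_names):
    """Build a bare-bones CycloneDX 1.5 document listing models as components."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {"type": "machine-learning-model", "name": name}
            for name in model_names
        ],
    }


bom = cyclonedx_bom(["bert-base-uncased"])
print(json.dumps(bom, indent=2))
```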
Installation
# Using pip
pip install mais
# Import and initialize the MAIS plugin
from mais import MAIS
# V1: Default legacy detection (production-safe)
m = MAIS(api_token="<manifest-api-token>")
# V2: Enhanced provider-based detection (recommended for dev/comprehensive monitoring)
m = MAIS(api_token="<manifest-api-token>", use_v2_detectors=True)
# Now run your notebook as normal
# MAIS will monitor for potentially risky model loads
Detection Architecture Configuration
Constructor Parameter (Per Instance)
from mais import MAIS
# Enable V2 provider-based detection
m = MAIS(api_token="token", use_v2_detectors=True)
# Use legacy V1 detection (the default when the parameter is omitted)
m = MAIS(api_token="token")
# Explicitly select legacy V1 detection
m = MAIS(api_token="token", use_v2_detectors=False)
Google Colab Usage
Passing the token and detection flag to the constructor works well in environments such as Colab, where you can't set environment variables:
from google.colab import userdata
api_token = userdata.get('MANIFEST_API_KEY')
from mais import MAIS
# Use V2 for comprehensive OpenAI + HuggingFace detection
m = MAIS(api_token=api_token, use_v2_detectors=True)
Advanced Usage
MAIS supports different detection strategies and provider combinations:
from mais.application.services.ast_analyzer import ASTAnalyzer
from mais.domain.model_analysis.detectors.baseline_detector import BaselineDetector
# Use default baseline detection (backward compatible)
analyzer = ASTAnalyzer()
# Or pass an explicit list of detectors
analyzer = ASTAnalyzer(detectors=[BaselineDetector()])
# Example input: any Python source held in a string
your_code = "import torch\nmodel = torch.load('model.pt')"
# Analyze code for model usage
findings = analyzer.analyze_code(your_code)
SBOM Generation
# Generate an SBOM for your project or notebook environment.
m.create_sbom(path=".", publish=False)
SBOM Publishing
m.create_sbom(path=".", publish=True)
Environment Variables
MAIS supports configuration through environment variables:
Core Configuration
- MANIFEST_API_TOKEN: API token for MOSAIC/Manifest integration
- MAIS_MOSAIC_API_URL: Override default API URL
- MAIS_DEFAULT_VERBOSITY: Set default logging level
- MAIS_API_TIMEOUT: API request timeout in seconds
All configuration values can be overridden with environment variables that use the MAIS_ prefix.
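As a sketch of how these variables could be consumed, the snippet below reads them from a mapping (defaulting to `os.environ`) and leaves unset values as `None` rather than guessing at MAIS's defaults. `load_settings` and the dictionary keys are hypothetical; only the environment variable names come from the list above.

```python
import os


def load_settings(env=None):
    """Read the documented variables; unset values stay None (no defaults assumed)."""
    env = os.environ if env is None else env
    timeout = env.get("MAIS_API_TIMEOUT")
    return {
        "api_token": env.get("MANIFEST_API_TOKEN"),
        "api_url": env.get("MAIS_MOSAIC_API_URL"),
        "verbosity": env.get("MAIS_DEFAULT_VERBOSITY"),
        # The timeout is documented in seconds, so parse it as a number.
        "api_timeout": float(timeout) if timeout is not None else None,
    }


settings = load_settings({"MANIFEST_API_TOKEN": "example-token",
                          "MAIS_API_TIMEOUT": "30"})
print(settings)
```

Passing a plain dictionary instead of the real environment keeps the function easy to test.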
Detection Mode Information
from mais import MAIS
m = MAIS(api_token="token", use_v2_detectors=True)
# Check current detection mode
print(m.get_detection_mode()) # "new" or "legacy"
# Get detailed detection information
info = m.get_detection_info()
print(info["detection_mode"]) # Current mode
print(info["source"]) # "constructor parameter" or "config/environment"
print(info["feature_flag"]) # Environment variable name
print(info["current_value"]) # Boolean value of feature flag
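The two `source` values suggest that an explicit constructor argument takes precedence over the configured flag. The sketch below illustrates that precedence rule only; it is not the MAIS implementation. `resolve_detection` is a hypothetical function, `FLAG_NAME` is a made-up variable name (the real feature-flag name is whatever `info["feature_flag"]` reports), and the accepted truthy strings are an assumption.

```python
import os

# Hypothetical flag name for illustration; check info["feature_flag"] for the real one.
FLAG_NAME = "MAIS_USE_V2_DETECTORS"


def resolve_detection(use_v2_detectors=None, env=None):
    """Prefer the explicit constructor-style argument, else fall back to the flag."""
    env = os.environ if env is None else env
    if use_v2_detectors is not None:
        enabled = bool(use_v2_detectors)
        source = "constructor parameter"
    else:
        enabled = env.get(FLAG_NAME, "").strip().lower() in {"1", "true", "yes"}
        source = "config/environment"
    return {
        "detection_mode": "new" if enabled else "legacy",
        "source": source,
        "feature_flag": FLAG_NAME,
        "current_value": enabled,
    }


print(resolve_detection(use_v2_detectors=True, env={}))
print(resolve_detection(env={FLAG_NAME: "false"}))
```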