
A Jupyter-compatible plugin that detects risky ML model and dataset loads.

MAIS - ML Model Audit & Inspection System

A Python notebook plugin that watches for potentially risky model or dataset loads in Jupyter notebooks. MAIS analyzes code in real time, as cells run, to detect loads of models that might require special permissions or licensing.

Detection Architecture - V1 vs V2

MAIS offers two detection architectures that can be toggled via a feature flag:

🔄 V1: Legacy Baseline Detection (Default)

  • Production-safe default for backward compatibility
  • Uses configuration-based pattern matching
  • Watches predefined function lists in config.py
  • Best for: Stable production environments
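
To make "configuration-based pattern matching" concrete, here is a minimal sketch of a watch-list scan. It is not MAIS's implementation: the `WATCHED_FUNCTIONS` entries and the `scan` helper are hypothetical, and the real list in config.py may look different.

```python
import re

# Hypothetical watch list for illustration; the real entries live in config.py.
WATCHED_FUNCTIONS = ["torch.load", "from_pretrained", "load_dataset"]

# One regex that matches any watched name followed by an opening parenthesis.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in WATCHED_FUNCTIONS) + r")\s*\("
)


def scan(source):
    """Return (line number, function name) for each watched call in the source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


sample = (
    "import torch\n"
    "weights = torch.load('model.pt')\n"
    "client = OpenAI()\n"
)
print(scan(sample))
```

Only the `torch.load` call is reported. The `OpenAI()` call goes unnoticed because nothing on the list names it, which is the kind of gap a fixed list leaves open.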

🚀 V2: Provider-Based Detection (Enhanced)

  • Specialized detectors for major ML/AI providers
  • Comprehensive coverage including patterns V1 misses
  • Provider-specific intelligence for better accuracy
  • Best for: Development and comprehensive model monitoring

Provider    | V1 Detection      | V2 Detection
HuggingFace | ✅ Basic patterns | ✅ Advanced + Hub integration
OpenAI      | Missed patterns   | Full API coverage
PyTorch     | ✅ torch.load     | ✅ Extended patterns
Anthropic   | Not detected      | Claude API detection
LangChain   | Framework blind   | Full framework support
LlamaIndex  | Not detected      | Document processing

Architecture Overview

MAIS uses a flexible, strategy-based architecture with multiple specialized components:

[Figure: MAIS Architecture]

Additional Architecture Views

View            | Purpose                             | Link
📊 Dependencies | Component relationships & data flow | MAIS_DEPENDENCY.svg
⚡ Process Flow | End-to-end analysis workflow        | MAIS_PROCESS.svg
🏗️ DDD Layers   | Domain-driven design structure      | MAIS_ARCHITECTURE.svg

Core Components

📥 Input Layer

Processes various types of source code inputs:

  • Source Code: Direct Python code analysis
  • Notebooks: Jupyter notebook cell analysis
  • Requirements: Dependency file scanning
  • Python Files: Static file analysis
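
As an illustration of the notebook input path, the sketch below pulls the code cells out of a notebook file using only the standard library. The `code_cells` helper is hypothetical and assumes the nbformat v4 JSON layout; it does not show how MAIS reads notebooks internally.

```python
import json


def code_cells(notebook_json):
    """Return the source of each code cell in an nbformat v4 notebook."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat allows "source" to be one string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
            "source": ["import torch\n", "torch.load('model.pt')"],
        },
    ],
})
print(code_cells(notebook_text))
```

Each returned string can then be handed to whichever detector is in use.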

🔍 Provider-Specific Detectors

Specialized detectors for different ML/AI providers and frameworks:

  • OpenAI: Detects GPT, DALL-E, and OpenAI API usage
  • HuggingFace: Identifies Transformers, Datasets, and Hub model loads
  • Anthropic: Catches Claude API integrations
  • LangChain: Finds LangChain components and chains
  • LlamaIndex: Detects LlamaIndex document processing
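
One simple way to attribute code to a provider is to look at its imports. The sketch below is an assumption about how such attribution could work, not the detectors MAIS ships: `PROVIDER_BY_PACKAGE` and `providers_used` are hypothetical names, while the package names are the public import names of each library.

```python
import ast

# Top-level import name -> provider label (illustrative mapping only).
PROVIDER_BY_PACKAGE = {
    "openai": "OpenAI",
    "transformers": "HuggingFace",
    "datasets": "HuggingFace",
    "huggingface_hub": "HuggingFace",
    "anthropic": "Anthropic",
    "langchain": "LangChain",
    "llama_index": "LlamaIndex",
}


def providers_used(source):
    """Return the sorted provider labels implied by the imports in the source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        else:
            continue
        for module in modules:
            # Only the top-level package decides the provider.
            provider = PROVIDER_BY_PACKAGE.get(module.split(".")[0])
            if provider:
                found.add(provider)
    return sorted(found)


sample = (
    "import openai\n"
    "from transformers import AutoModel\n"
    "from langchain.chains import LLMChain\n"
    "import numpy as np\n"
)
print(providers_used(sample))
```

Import-level attribution is coarse: it says a provider's library is present, not which model is loaded. A real detector would also inspect the calls.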

⚙️ Detection Strategies

Pluggable analysis approaches that detectors can use:

  • AST Strategy: Advanced parsing with variable resolution for complex code analysis
  • Regex Strategy: Fast pattern matching for simple detection scenarios
  • LLM-based Strategy: Future AI-powered code understanding
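
The difference between the regex and AST strategies shows up when a model name is stored in a variable before it is used. The following sketch contrasts the two on one snippet. Both functions are hypothetical illustrations written for this README, and the variable resolution handles only simple `name = "literal"` assignments, ignoring scope and reassignment.

```python
import ast
import re

SAMPLE = (
    "model_id = 'bert-base-uncased'\n"
    "model = AutoModel.from_pretrained(model_id)\n"
)


def regex_strategy(source):
    """Fast path: only sees model names written as string literals in the call."""
    return re.findall(r"from_pretrained\(\s*['\"]([^'\"]+)['\"]", source)


def ast_strategy(source):
    """Slower path: resolves simple string assignments before reading calls."""
    tree = ast.parse(source)

    # Pass 1: remember every `name = "literal"` assignment.
    constants = {}
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        ):
            constants[node.targets[0].id] = node.value.value

    # Pass 2: read the first argument of each from_pretrained(...) call.
    found = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "from_pretrained"
            and node.args
        ):
            arg = node.args[0]
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                found.append(arg.value)
            elif isinstance(arg, ast.Name) and arg.id in constants:
                found.append(constants[arg.id])
    return found


print(regex_strategy(SAMPLE))
print(ast_strategy(SAMPLE))
```

The regex finds nothing because the call site contains no string literal, while the AST pass recovers `bert-base-uncased` from the assignment.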

📊 Intermediate Output

Analysis results from provider detectors:

  • Model Findings: Detected model usage with metadata
  • Risk Assessment: Security and compliance evaluation
  • Inventory Mapping: Model-to-provider relationship mapping

📋 JSON Schema Standardization

Converts findings into structured format:

  • AI Detection JSON Schema: Standardized detection results format
  • Provider Attribution: Links findings to specific ML providers
  • Risk Categorization: Security and compliance classifications
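
To show what a standardized finding might look like, here is a small sketch that serializes detections to JSON. The `Finding` fields, the `findings` key, and the `license-review` risk label are all assumptions made for this example; the actual AI Detection JSON Schema is not reproduced here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    """One detected model load (hypothetical field names)."""
    provider: str
    model: str
    line: int
    risk: str


def to_detection_json(findings):
    """Serialize findings into one JSON document with stable key order."""
    payload = {"findings": [asdict(finding) for finding in findings]}
    return json.dumps(payload, indent=2, sort_keys=True)


report = to_detection_json([
    Finding(
        provider="HuggingFace",
        model="bert-base-uncased",
        line=2,
        risk="license-review",
    ),
])
print(report)
```

Keeping provider attribution and risk category on every record lets downstream tools filter findings without re-parsing the source code.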

📦 SBOM Generation

Creates comprehensive software bills of materials:

  • manifest-cli Integration: Uses external SBOM generation tools
  • SBOM Builder: Internal component for SBOM creation
  • Dependency Analysis: Maps AI/ML dependencies

📤 Output Formats

Multiple standard formats for integration:

  • CycloneDX JSON: Industry-standard SBOM format
  • SPDX JSON: Open-source license compliance format
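
For orientation, the sketch below builds a minimal CycloneDX JSON document that lists models as components. It assumes CycloneDX specification version 1.5, which defines the `machine-learning-model` component type. The `minimal_cyclonedx` helper is hypothetical, and the SBOMs MAIS produces will contain more fields than this.

```python
import json


def minimal_cyclonedx(model_names):
    """Build a bare-bones CycloneDX 1.5 document listing models as components."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {"type": "machine-learning-model", "name": name}
            for name in model_names
        ],
    }


bom = minimal_cyclonedx(["bert-base-uncased"])
print(json.dumps(bom, indent=2))
```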

Installation

# Using pip (run in a terminal, or prefix with "!" in a notebook cell)
pip install mais

# Import and initialize the MAIS plugin (Python)
from mais import MAIS

# Choose ONE of the two initializations below.

# V1: Default legacy detection (production-safe)
m = MAIS(api_token="<manifest-api-token>")

# V2: Enhanced provider-based detection (recommended for dev/comprehensive monitoring)
m = MAIS(api_token="<manifest-api-token>", use_v2_detectors=True)

# Now run your notebook as normal
# MAIS will monitor for potentially risky model loads

Detection Architecture Configuration

Constructor Parameter (Per Instance)

from mais import MAIS

# Enable V2 provider-based detection
m = MAIS(api_token="token", use_v2_detectors=True)

# Use legacy V1 detection (the default when the parameter is omitted)
m = MAIS(api_token="token")

# Explicitly request legacy V1 detection
m = MAIS(api_token="token", use_v2_detectors=False)

Google Colab Usage

The constructor parameter is useful in environments where you can't easily set environment variables:

from google.colab import userdata
api_token = userdata.get('MANIFEST_API_KEY')

from mais import MAIS
# Use V2 for comprehensive OpenAI + HuggingFace detection
m = MAIS(api_token=api_token, use_v2_detectors=True)

Advanced Usage

MAIS supports different detection strategies and provider combinations:

from mais.application.services.ast_analyzer import ASTAnalyzer
from mais.domain.model_analysis.detectors.baseline_detector import BaselineDetector

# Use default baseline detection (backward compatible)
analyzer = ASTAnalyzer()

# Or pass an explicit list of detectors
analyzer = ASTAnalyzer(detectors=[BaselineDetector()])

# Analyze a string of Python source for model usage
your_code = "import torch\nmodel = torch.load('model.pt')"
findings = analyzer.analyze_code(your_code)

SBOM Generation

# Generate an SBOM for your project or notebook environment.
m.create_sbom(path=".", publish=False)

SBOM Publishing

m.create_sbom(path=".", publish=True)

Environment Variables

MAIS supports configuration through environment variables:

Core Configuration

  • MANIFEST_API_TOKEN - API token for MOSAIC/Manifest integration
  • MAIS_MOSAIC_API_URL - Override default API URL
  • MAIS_DEFAULT_VERBOSITY - Set default logging level
  • MAIS_API_TIMEOUT - API request timeout in seconds

Any configuration value can be overridden by setting an environment variable with the MAIS_ prefix.
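
A sketch of reading these variables is shown below. It only illustrates the variable names listed above: the `load_settings` helper and its dictionary keys are hypothetical, unset values are returned as `None` rather than guessing MAIS's defaults, and the timeout is parsed as a number of seconds.

```python
import os


def load_settings(env=None):
    """Collect the MAIS-related variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    timeout = env.get("MAIS_API_TIMEOUT")
    return {
        "api_token": env.get("MANIFEST_API_TOKEN"),
        "api_url": env.get("MAIS_MOSAIC_API_URL"),
        "verbosity": env.get("MAIS_DEFAULT_VERBOSITY"),
        # Seconds; left as None when the variable is not set.
        "api_timeout": float(timeout) if timeout is not None else None,
    }


settings = load_settings({
    "MANIFEST_API_TOKEN": "<manifest-api-token>",
    "MAIS_API_TIMEOUT": "30",
})
print(settings["api_timeout"])
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching the real environment.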

Detection Mode Information

from mais import MAIS

m = MAIS(api_token="token", use_v2_detectors=True)

# Check current detection mode
print(m.get_detection_mode())  # "new" or "legacy"

# Get detailed detection information
info = m.get_detection_info()
print(info["detection_mode"])      # Current mode
print(info["source"])              # "constructor parameter" or "config/environment"
print(info["feature_flag"])        # Environment variable name
print(info["current_value"])       # Boolean value of feature flag
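
The `source` field suggests the mode can come from either the constructor or the configuration. The sketch below shows one plausible resolution order, with the constructor parameter taking precedence. This precedence, the `resolve_detection_mode` helper, and the `EXAMPLE_USE_V2_DETECTORS` flag name are all assumptions for illustration; read the real flag name from `get_detection_info()["feature_flag"]`.

```python
# Hypothetical flag name used only in this sketch.
FLAG_NAME = "EXAMPLE_USE_V2_DETECTORS"


def resolve_detection_mode(use_v2_detectors=None, env=None):
    """Pick the detection mode, preferring an explicit constructor argument."""
    env = {} if env is None else env
    if use_v2_detectors is not None:
        enabled = bool(use_v2_detectors)
        source = "constructor parameter"
    else:
        # Treat a few common truthy spellings as "enabled" (an assumption).
        enabled = env.get(FLAG_NAME, "").strip().lower() in {"1", "true", "yes"}
        source = "config/environment"
    return {
        "detection_mode": "new" if enabled else "legacy",
        "source": source,
        "feature_flag": FLAG_NAME,
        "current_value": enabled,
    }


print(resolve_detection_mode(use_v2_detectors=True))
print(resolve_detection_mode(env={FLAG_NAME: "true"}))
print(resolve_detection_mode())
```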
