A Jupyter-compatible plugin that detects risky ML model and dataset loads.

MAIS - ML Model Audit & Inspection System

A Python notebook plugin that watches for potentially risky model or dataset loads in Jupyter notebooks. MAIS analyzes code in real time to detect when you load models or datasets that might require special permissions or licensing.

Detection Architecture - V1 vs V2

MAIS offers two detection architectures, selected with a feature flag:

🔄 V1: Legacy Baseline Detection (Default)

  • Production-safe default for backward compatibility
  • Uses configuration-based pattern matching
  • Watches predefined function lists in config.py
  • Best for: Stable production environments

🚀 V2: Provider-Based Detection (Enhanced)

  • Specialized detectors for major ML/AI providers
  • Comprehensive coverage including patterns V1 misses
  • Provider-specific intelligence for better accuracy
  • Best for: Development and comprehensive model monitoring

Provider      V1 Detection        V2 Detection
HuggingFace   ✅ Basic patterns   ✅ Advanced + Hub integration
OpenAI        Missed patterns     Full API coverage
PyTorch       ✅ torch.load       ✅ Extended patterns
Anthropic     Not detected        Claude API detection
LangChain     Framework blind     Full framework support
LlamaIndex    Not detected        Document processing

Architecture Overview

MAIS uses a flexible, strategy-based architecture with multiple specialized components:

[Figure: MAIS Architecture]

Additional Architecture Views

View              Purpose                               Link
📊 Dependencies   Component relationships & data flow   MAIS_DEPENDENCY.svg
⚡ Process Flow   End-to-end analysis workflow          MAIS_PROCESS.svg
🏗️ DDD Layers     Domain-driven design structure        MAIS_ARCHITECTURE.svg

Core Components

📥 Input Layer

Processes various types of source code inputs:

  • Source Code: Direct Python code analysis
  • Notebooks: Jupyter notebook cell analysis
  • Requirements: Dependency file scanning
  • Python Files: Static file analysis
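
As a rough illustration of the notebook input path, the sketch below pulls code cells out of a notebook's JSON using only the standard library. It assumes the nbformat v4 layout (a top-level `cells` list whose entries carry `cell_type` and `source`). `extract_code_cells` is a hypothetical helper written for this example, not part of the MAIS API.

```python
import json


def extract_code_cells(notebook_json: str) -> list[str]:
    """Return the source text of each code cell in an nbformat v4 notebook."""
    notebook = json.loads(notebook_json)
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat allows `source` to be a single string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        cells.append(source)
    return cells


# A tiny in-memory notebook: one markdown cell, one code cell.
sample_notebook = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["import torch\n", "model = torch.load('model.pt')"]},
    ],
})

code_cells = extract_code_cells(sample_notebook)
```

Each returned string could then be handed to whatever analyzer is in use, one cell at a time.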

🔍 Provider-Specific Detectors

Specialized detectors for different ML/AI providers and frameworks:

  • OpenAI: Detects GPT, DALL-E, and OpenAI API usage
  • HuggingFace: Identifies Transformers, Datasets, and Hub model loads
  • Anthropic: Catches Claude API integrations
  • LangChain: Finds LangChain components and chains
  • LlamaIndex: Detects LlamaIndex document processing
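
One simple way to attribute code to a provider is to look at which packages it imports. The sketch below does that with the standard `ast` module. The module-to-provider mapping and the `providers_imported` helper are assumptions made for illustration; the actual MAIS detectors may work quite differently.

```python
import ast

# Hypothetical mapping from top-level import names to provider labels.
PROVIDER_BY_MODULE = {
    "openai": "OpenAI",
    "transformers": "HuggingFace",
    "datasets": "HuggingFace",
    "huggingface_hub": "HuggingFace",
    "anthropic": "Anthropic",
    "langchain": "LangChain",
    "llama_index": "LlamaIndex",
}


def providers_imported(code: str) -> set[str]:
    """Return the providers whose packages are imported anywhere in `code`."""
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as `from .openai import x`.
            modules = [node.module]
        else:
            continue
        for module in modules:
            top_level = module.split(".")[0]
            if top_level in PROVIDER_BY_MODULE:
                found.add(PROVIDER_BY_MODULE[top_level])
    return found


example = "import openai\nfrom transformers import AutoModel\nimport numpy as np\n"
detected = providers_imported(example)
```

Import scanning only shows that a provider's package is present; deciding which model is actually loaded needs call-level analysis.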

⚙️ Detection Strategies

Pluggable analysis approaches that detectors can use:

  • AST Strategy: Advanced parsing with variable resolution for complex code analysis
  • Regex Strategy: Fast pattern matching for simple detection scenarios
  • LLM-based Strategy: Future AI-powered code understanding
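
To make the trade-off between the regex and AST strategies concrete, here is a minimal sketch of both applied to `from_pretrained` calls. It is not the MAIS implementation: the function names are hypothetical, and the "variable resolution" shown is deliberately naive (it only follows names assigned a string constant and ignores scoping and reassignment order).

```python
import ast
import re

# Matches .from_pretrained("name") only when the name is a string literal.
LOAD_PATTERN = re.compile(r"\.from_pretrained\(\s*['\"]([^'\"]+)['\"]")


def regex_find_models(code: str) -> list[str]:
    """Regex strategy: fast, but blind to names passed through variables."""
    return LOAD_PATTERN.findall(code)


def ast_find_models(code: str) -> list[str]:
    """AST strategy: also resolves names bound to string constants."""
    tree = ast.parse(code)
    constants = {}
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    constants[target.id] = node.value.value
    models = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "from_pretrained"
                and node.args):
            arg = node.args[0]
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                models.append(arg.value)
            elif isinstance(arg, ast.Name) and arg.id in constants:
                models.append(constants[arg.id])
    return models


snippet = (
    "name = 'bert-base-uncased'\n"
    "a = AutoModel.from_pretrained(name)\n"
    "b = AutoModel.from_pretrained('gpt2')\n"
)
regex_hits = regex_find_models(snippet)
ast_hits = ast_find_models(snippet)
```

On this snippet the regex sees only the literal `'gpt2'`, while the AST pass also recovers the model name passed through the `name` variable.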

📊 Intermediate Output

Analysis results from provider detectors:

  • Model Findings: Detected model usage with metadata
  • Risk Assessment: Security and compliance evaluation
  • Inventory Mapping: Model-to-provider relationship mapping

📋 JSON Schema Standardization

Converts findings into structured format:

  • AI Detection JSON Schema: Standardized detection results format
  • Provider Attribution: Links findings to specific ML providers
  • Risk Categorization: Security and compliance classifications
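
The source does not show the AI Detection JSON Schema itself, so the following is only a sketch of what converting a finding into a structured record might look like. Every field name (`provider`, `model`, `line`, `risk`) and the `Finding` class are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Finding:
    # Illustrative fields only; these are not the MAIS schema.
    provider: str
    model: str
    line: int
    risk: str


def to_json(findings: list[Finding]) -> str:
    """Serialize findings into a JSON document with stable key order."""
    payload = {"findings": [asdict(f) for f in findings]}
    return json.dumps(payload, indent=2, sort_keys=True)


report = to_json([
    Finding(provider="HuggingFace", model="gpt2", line=3, risk="unreviewed"),
])
```

Keeping the provider on each record is what lets later stages group or filter findings by provider.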

📦 SBOM Generation

Creates comprehensive software bills of materials:

  • manifest-cli Integration: Uses external SBOM generation tools
  • SBOM Builder: Internal component for SBOM creation
  • Dependency Analysis: Maps AI/ML dependencies

📤 Output Formats

Multiple standard formats for integration:

  • CycloneDX JSON: Industry-standard SBOM format
  • SPDX JSON: Open-source license compliance format
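
For orientation, a CycloneDX JSON document is an object with a `bomFormat`, a `specVersion`, and a list of `components`. The sketch below builds a bare-bones one by hand, assuming CycloneDX 1.5 and its `machine-learning-model` component type. `minimal_cyclonedx` is a hypothetical helper; what MAIS and manifest-cli actually emit will contain far more detail.

```python
import json


def minimal_cyclonedx(model_names: list[str]) -> dict:
    """Build a minimal CycloneDX 1.5 document listing ML models as components."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {"type": "machine-learning-model", "name": name}
            for name in model_names
        ],
    }


bom = minimal_cyclonedx(["gpt2"])
bom_json = json.dumps(bom, indent=2)
```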

Installation

# Install with pip (run in a shell, or prefix with ! in a notebook cell)
pip install mais

# Then, in Python: import and initialize the MAIS plugin
from mais import MAIS

# Option 1 - V1: default legacy detection (production-safe)
m = MAIS(api_token="<manifest-api-token>")

# Option 2 - V2: enhanced provider-based detection (recommended for dev/comprehensive monitoring)
m = MAIS(api_token="<manifest-api-token>", use_v2_detectors=True)

# Now run your notebook as normal
# MAIS will monitor for potentially risky model loads

Detection Architecture Configuration

Constructor Parameter (Per Instance)

# Select the detection architecture per instance
from mais import MAIS

# Enable V2 provider-based detection
m = MAIS(api_token="token", use_v2_detectors=True)

# Use legacy V1 detection (the default when the parameter is omitted)
m = MAIS(api_token="token")

# Explicitly use legacy V1 detection
m = MAIS(api_token="token", use_v2_detectors=False)

Google Colab Usage

Passing the token through the constructor suits environments such as Google Colab, where you can't set environment variables:

from google.colab import userdata
api_token = userdata.get('MANIFEST_API_KEY')

from mais import MAIS
# Use V2 for comprehensive OpenAI + HuggingFace detection
m = MAIS(api_token=api_token, use_v2_detectors=True)

Advanced Usage

MAIS supports different detection strategies and provider combinations:

from mais.application.services.ast_analyzer import ASTAnalyzer

# Use default baseline detection (backward compatible)
analyzer = ASTAnalyzer()

# Or use with custom detectors
from mais.domain.model_analysis.detectors.baseline_detector import BaselineDetector
analyzer = ASTAnalyzer(detectors=[BaselineDetector()])

# Analyze code for model usage (your_code is any Python source string)
your_code = "import torch\nmodel = torch.load('model.pt')"
findings = analyzer.analyze_code(your_code)

SBOM Generation

# Generate an SBOM for your project or notebook environment.
m.create_sbom(path=".", publish=False)

SBOM Publishing

m.create_sbom(path=".", publish=True)

Environment Variables

MAIS supports configuration through environment variables:

Core Configuration

  • MANIFEST_API_TOKEN - API token for MOSAIC/Manifest integration
  • MAIS_MOSAIC_API_URL - Override default API URL
  • MAIS_DEFAULT_VERBOSITY - Set default logging level
  • MAIS_API_TIMEOUT - API request timeout in seconds

Any configuration value can be overridden by setting an environment variable with the MAIS_ prefix.
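
A minimal sketch of setting these variables from Python before MAIS is constructed. The values are placeholders, and `mais_overrides` is a hypothetical helper that only shows which variables carry the MAIS_ prefix; how MAIS itself reads them is not described here.

```python
import os

# Placeholder values; set these before creating the MAIS instance.
os.environ["MANIFEST_API_TOKEN"] = "<manifest-api-token>"
os.environ["MAIS_API_TIMEOUT"] = "30"  # seconds


def mais_overrides(environ) -> dict:
    """Collect MAIS_-prefixed variables, keyed by the name without the prefix."""
    prefix = "MAIS_"
    return {
        key[len(prefix):]: value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


overrides = mais_overrides(os.environ)
```

Note that `MANIFEST_API_TOKEN` does not use the MAIS_ prefix, so it is not picked up by the prefix scan above.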

Detection Mode Information

from mais import MAIS

m = MAIS(api_token="token", use_v2_detectors=True)

# Check current detection mode
print(m.get_detection_mode())  # "new" or "legacy"

# Get detailed detection information
info = m.get_detection_info()
print(info["detection_mode"])      # Current mode
print(info["source"])              # "constructor parameter" or "config/environment"
print(info["feature_flag"])        # Environment variable name
print(info["current_value"])       # Boolean value of feature flag
