
A Jupyter-compatible plugin that detects risky ML model and dataset loads.

MAIS - ML Model Audit & Inspection System

A Python notebook plugin that watches for potentially risky model or dataset loads in Jupyter notebooks. MAIS analyzes code in real time, as cells run, to detect loads of models that might require special permissions or licensing.

Detection Architecture - V1 vs V2

MAIS offers two detection architectures that can be toggled via a feature flag:

🔄 V1: Legacy Baseline Detection (Default)

  • Production-safe default for backward compatibility
  • Uses configuration-based pattern matching
  • Watches predefined function lists in config.py
  • Best for: Stable production environments
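
As a rough illustration of configuration-based pattern matching, the sketch below checks every call in a cell against a predefined list of function names. The `WATCHED_FUNCTIONS` set and both helper names are assumptions made for this example; the actual list in MAIS's config.py and its matching logic may differ.

```python
import ast

# Hypothetical watch list; the real entries live in MAIS's config.py.
WATCHED_FUNCTIONS = {"torch.load", "AutoModel.from_pretrained", "load_dataset"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_watched_calls(source):
    """List (line number, function name) for each call to a watched function."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in WATCHED_FUNCTIONS:
                hits.append((node.lineno, name))
    return sorted(hits)


cell = 'import torch\nmodel = torch.load("weights.pt")\nprint(len("abc"))\n'
print(find_watched_calls(cell))
```

A fixed list like this is predictable, which fits a conservative default, but it only flags names someone has already added to the list.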

🚀 V2: Provider-Based Detection (Enhanced)

  • Specialized detectors for major ML/AI providers
  • Comprehensive coverage including patterns V1 misses
  • Provider-specific intelligence for better accuracy
  • Best for: Development and comprehensive model monitoring

Provider     | V1 Detection      | V2 Detection
HuggingFace  | ✅ Basic patterns | ✅ Advanced + Hub integration
OpenAI       | Missed patterns   | Full API coverage
PyTorch      | ✅ torch.load     | ✅ Extended patterns
Anthropic    | Not detected      | Claude API detection
LangChain    | Framework blind   | Full framework support
LlamaIndex   | Not detected      | Document processing

Architecture Overview

MAIS uses a flexible, strategy-based architecture with multiple specialized components:

[Figure: MAIS Architecture]

Additional Architecture Views

View             | Purpose                                | Link
📊 Dependencies  | Component relationships & data flow    | MAIS_DEPENDENCY.svg
⚡ Process Flow  | End-to-end analysis workflow           | MAIS_PROCESS.svg
🏗️ DDD Layers    | Domain-driven design structure         | MAIS_ARCHITECTURE.svg

Core Components

📥 Input Layer

Processes various types of source code inputs:

  • Source Code: Direct Python code analysis
  • Notebooks: Jupyter notebook cell analysis
  • Requirements: Dependency file scanning
  • Python Files: Static file analysis
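
For the notebook input type, the sketch below shows one way to pull the code cells out of a notebook file before analysis. It relies only on the public nbformat v4 JSON layout (`cells`, `cell_type`, `source`); the `code_cells` helper is a name invented for this example and is not part of MAIS.

```python
import json


def code_cells(notebook_json):
    """Return the source of every code cell in an nbformat v4 notebook string."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source as either one string or a list of lines.
        sources.append("".join(source) if isinstance(source, list) else source)
    return sources


example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["import torch\n", "model = torch.load('m.pt')"]},
    ],
})
print(code_cells(example))
```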

🔍 Provider-Specific Detectors

Specialized detectors for different ML/AI providers and frameworks:

  • OpenAI: Detects GPT, DALL-E, and OpenAI API usage
  • HuggingFace: Identifies Transformers, Datasets, and Hub model loads
  • Anthropic: Catches Claude API integrations
  • LangChain: Finds LangChain components and chains
  • LlamaIndex: Detects LlamaIndex document processing
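
One simple signal a provider-specific detector could use is which packages a cell imports. The sketch below maps top-level import names to the providers listed above. The `PROVIDER_MODULES` table and the `providers_imported` function are assumptions for illustration and do not reflect MAIS's actual detector classes.

```python
import ast

# Illustrative mapping from top-level import names to providers (an assumption,
# not MAIS's actual detector registry).
PROVIDER_MODULES = {
    "openai": "OpenAI",
    "transformers": "HuggingFace",
    "datasets": "HuggingFace",
    "huggingface_hub": "HuggingFace",
    "anthropic": "Anthropic",
    "langchain": "LangChain",
    "llama_index": "LlamaIndex",
}


def providers_imported(source):
    """Return the sorted list of providers whose packages the source imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        else:
            continue
        for module in modules:
            provider = PROVIDER_MODULES.get(module.split(".")[0])
            if provider:
                found.add(provider)
    return sorted(found)


sample = "import openai\nfrom transformers import AutoModel\nimport os\n"
print(providers_imported(sample))
```

An import only shows that a provider's package is present; a fuller detector would also inspect the calls made through it.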

⚙️ Detection Strategies

Pluggable analysis approaches that detectors can use:

  • AST Strategy: Advanced parsing with variable resolution for complex code analysis
  • Regex Strategy: Fast pattern matching for simple detection scenarios
  • LLM-based Strategy: Future AI-powered code understanding
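
To make the difference between the regex and AST strategies concrete, the sketch below runs both over a cell in which the model name is held in a variable. Everything here is a simplified assumption written for this example (it resolves only module-level string assignments and ignores reassignment order); it is not MAIS's implementation.

```python
import ast
import re

SOURCE = 'name = "bert-base-uncased"\nmodel = AutoModel.from_pretrained(name)\n'

# Regex strategy: fast, but only sees a model name written inline as a literal.
LITERAL_CALL = re.compile(r"""from_pretrained\(\s*["']([^"']+)["']""")


def regex_models(source):
    """Return model names passed to from_pretrained as string literals."""
    return LITERAL_CALL.findall(source)


def ast_models(source):
    """AST strategy: also resolve simple string variables passed to from_pretrained."""
    tree = ast.parse(source)
    constants = {}
    # First pass: remember module-level `name = "literal"` assignments.
    for node in tree.body:
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            constants[node.targets[0].id] = node.value.value
    # Second pass: read the first positional argument of each from_pretrained call.
    models = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                and node.func.attr == "from_pretrained" and node.args):
            arg = node.args[0]
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                models.append(arg.value)
            elif isinstance(arg, ast.Name) and arg.id in constants:
                models.append(constants[arg.id])
    return models


print(regex_models(SOURCE))  # the regex does not see the variable's value
print(ast_models(SOURCE))    # the AST pass resolves it
```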

📊 Intermediate Output

Analysis results from provider detectors:

  • Model Findings: Detected model usage with metadata
  • Risk Assessment: Security and compliance evaluation
  • Inventory Mapping: Model-to-provider relationship mapping

📋 JSON Schema Standardization

Converts findings into structured format:

  • AI Detection JSON Schema: Standardized detection results format
  • Provider Attribution: Links findings to specific ML providers
  • Risk Categorization: Security and compliance classifications
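
The sketch below illustrates the general idea of turning findings into a structured JSON document. The `Finding` fields and the top-level `findings` key are hypothetical, chosen only for this example; the real AI Detection JSON Schema is not reproduced in this description and may look different.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    # All field names here are hypothetical, chosen only for illustration.
    provider: str
    model: str
    line: int
    risk: str


def to_detection_json(findings):
    """Serialize findings into one JSON document under a 'findings' key."""
    payload = {"findings": [asdict(f) for f in findings]}
    return json.dumps(payload, indent=2, sort_keys=True)


doc = to_detection_json([Finding("HuggingFace", "bert-base-uncased", 2, "review-license")])
print(doc)
```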

📦 SBOM Generation

Creates comprehensive software bills of materials:

  • manifest-cli Integration: Uses external SBOM generation tools
  • SBOM Builder: Internal component for SBOM creation
  • Dependency Analysis: Maps AI/ML dependencies

📤 Output Formats

Multiple standard formats for integration:

  • CycloneDX JSON: Industry-standard SBOM format
  • SPDX JSON: Open-source license compliance format
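
For orientation, the sketch below builds a minimal CycloneDX 1.5 JSON document by hand, using the `machine-learning-model` component type defined by that version of the specification. It is not the output of MAIS or manifest-cli, whose SBOMs will contain more fields; the `cyclonedx_bom` helper is an assumption for this example.

```python
import json


def cyclonedx_bom(models):
    """Build a minimal CycloneDX 1.5 BOM listing (name, supplier) pairs as models."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {
                "type": "machine-learning-model",
                "name": name,
                "supplier": {"name": supplier},
            }
            for name, supplier in models
        ],
    }


bom = cyclonedx_bom([("bert-base-uncased", "HuggingFace")])
print(json.dumps(bom, indent=2))
```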

Installation

# Using pip
pip install mais

# Import and initialize the MAIS plugin
from mais import MAIS

# V1: Default legacy detection (production-safe)
m = MAIS(api_token="<manifest-api-token>")

# V2: Enhanced provider-based detection (recommended for dev/comprehensive monitoring)
m = MAIS(api_token="<manifest-api-token>", use_v2_detectors=True)

# Now run your notebook as normal
# MAIS will monitor for potentially risky model loads

Detection Architecture Configuration

Constructor Parameter (Per Instance)

from mais import MAIS

# Enable V2 provider-based detection
m = MAIS(api_token="token", use_v2_detectors=True)

# Use legacy V1 detection (the default when use_v2_detectors is omitted)
m = MAIS(api_token="token")

# Explicitly select legacy V1 detection
m = MAIS(api_token="token", use_v2_detectors=False)

Google Colab Usage

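The constructor parameter is useful in environments such as Colab, where you can't set environment variables. Read the token from Colab secrets and pass it in directly: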

from google.colab import userdata
api_token = userdata.get('MANIFEST_API_KEY')

from mais import MAIS
# Use V2 for comprehensive OpenAI + HuggingFace detection
m = MAIS(api_token=api_token, use_v2_detectors=True)

Advanced Usage

MAIS supports different detection strategies and provider combinations:

from mais.application.services.ast_analyzer import ASTAnalyzer

# Use default baseline detection (backward compatible)
analyzer = ASTAnalyzer()

# Or use with custom detectors
from mais.domain.model_analysis.detectors.baseline_detector import BaselineDetector
analyzer = ASTAnalyzer(detectors=[BaselineDetector()])

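# Analyze code for model usage (your_code is a string of Python source;
# the snippet below is only an example input)
your_code = 'import torch\nmodel = torch.load("model.pt")'
findings = analyzer.analyze_code(your_code)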

SBOM Generation

# Generate an SBOM for your project or notebook environment.
m.create_sbom(path=".", publish=False)

SBOM Publishing

m.create_sbom(path=".", publish=True)

Environment Variables

MAIS supports configuration through environment variables:

Core Configuration

  • MANIFEST_API_TOKEN - API token for MOSAIC/Manifest integration
  • MAIS_MOSAIC_API_URL - Override default API URL
  • MAIS_DEFAULT_VERBOSITY - Set default logging level
  • MAIS_API_TIMEOUT - API request timeout in seconds

All configuration values can be overridden by environment variables that use the MAIS_ prefix.
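
Before constructing MAIS, a notebook can check which of these variables are set. The sketch below does this with the standard library only; `read_settings` is a hypothetical helper for this example, and it deliberately returns None for unset variables instead of guessing at MAIS's built-in defaults.

```python
import os


def read_settings(env=os.environ):
    """Collect the documented variables from an environment mapping."""
    return {
        "api_token": env.get("MANIFEST_API_TOKEN"),
        "api_url": env.get("MAIS_MOSAIC_API_URL"),
        "verbosity": env.get("MAIS_DEFAULT_VERBOSITY"),
        # The timeout is documented as seconds, so convert it to a number.
        "timeout": float(env["MAIS_API_TIMEOUT"]) if "MAIS_API_TIMEOUT" in env else None,
    }


# Example input mapping; the values are placeholders, not real defaults.
settings = read_settings({"MANIFEST_API_TOKEN": "<manifest-api-token>", "MAIS_API_TIMEOUT": "30"})
print(settings)
```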

Detection Mode Information

from mais import MAIS

m = MAIS(api_token="token", use_v2_detectors=True)

# Check current detection mode
print(m.get_detection_mode())  # "new" or "legacy"

# Get detailed detection information
info = m.get_detection_info()
print(info["detection_mode"])      # Current mode
print(info["source"])              # "constructor parameter" or "config/environment"
print(info["feature_flag"])        # Environment variable name
print(info["current_value"])       # Boolean value of feature flag
