
A powerful 'git diff' tool for language models, providing deep behavioral and structural analysis between fine-tuned and base models.


Semantic Model Diff


A "git diff" for language models. semantic-model-diff compares a base language model with its fine-tuned counterpart, both behaviorally and structurally, and produces human-readable capability diff reports.

Instead of a single aggregate benchmark score, it shows how the model's behavior changed on each capability after fine-tuning.

What is it?

When you fine-tune an LLM (e.g., with LoRA, QLoRA, or full fine-tuning), you change its weights and, with them, its capabilities. Validation loss alone tells only part of the story. semantic-model-diff runs both models side by side on a built-in benchmark suite to measure shifts in reasoning, instruction following, creativity, and other capabilities.

Key Features

  • 10 Core Dimensions: Evaluates models across ten capabilities:
    • Instruction Following
    • Mathematical Reasoning
    • Creative Variance
    • Reasoning Depth
    • Code Quality
    • Factual Recall
    • Context Retention
    • Safety Adherence
    • Response Conciseness
    • Structured Output
  • Layer-by-Layer Weight Analysis: Shows which layers of the network your adapter or fine-tuning changed.
  • Statistical Significance: Reports 95% confidence intervals computed by bootstrap resampling.
  • Local & Fast: Runs entirely on your own hardware; with 4-bit quantization, it fits in 16 GB of CPU RAM if needed.
  • Rich Reporting: Generates terminal reports, Markdown files, HTML pages, and machine-readable JSON reports.
  • Gradio UI & Docker Integration: Ships with a full Gradio web UI and a ready-to-use Docker Compose setup.
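The statistical-significance feature can be illustrated with a percentile bootstrap over paired per-prompt scores. This is a minimal sketch of the general technique, assuming each dimension yields one numeric score per prompt for each model; `bootstrap_diff_ci` and its parameters are hypothetical names, not part of semantic-model-diff's API, and the library's exact resampling scheme may differ.

```python
import math
import random


def bootstrap_diff_ci(base_scores, finetuned_scores, n_resamples=10_000,
                      confidence=0.95, seed=0):
    """Percentile-bootstrap CI for the mean score change (fine-tuned minus base).

    Hypothetical helper, not semantic-model-diff's API. Assumes paired
    scores: base_scores[i] and finetuned_scores[i] come from the same prompt.
    """
    if not base_scores or len(base_scores) != len(finetuned_scores):
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)  # fixed seed -> reproducible intervals
    deltas = [f - b for b, f in zip(base_scores, finetuned_scores)]
    n = len(deltas)
    point = math.fsum(deltas) / n

    # Resample prompts with replacement; record the mean delta of each resample.
    boot_means = sorted(
        math.fsum(rng.choices(deltas, k=n)) / n for _ in range(n_resamples)
    )

    # Percentile method: cut (1 - confidence) / 2 of the resamples from each tail.
    tail = int((1 - confidence) / 2 * n_resamples)
    return point, (boot_means[tail], boot_means[n_resamples - 1 - tail])


# Example: hypothetical per-prompt scores (0 = fail, 1 = pass) for one dimension.
base = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
tuned = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
delta, (lo, hi) = bootstrap_diff_ci(base, tuned)
print(f"mean change {delta:+.2f}, 95% CI [{lo:+.2f}, {hi:+.2f}]")
```

Resampling whole prompts keeps each prompt's base and fine-tuned scores together, so the interval reflects the change itself rather than differences in prompt difficulty. An interval that excludes zero suggests the shift is not just sampling noise.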

Installation

You can install the package directly from PyPI. For the complete feature set (including local analysis, UI, and reporting), install the [full] extra:

pip install "semantic-model-diff[full]"

Quick Start

Command Line Interface

You can run the analysis from the CLI. Point it at your Hugging Face base model and at your fine-tuned adapter or model, given either as a local path or as a Hugging Face model ID.

semantic-diff analyze \
  --base Qwen/Qwen2.5-3B \
  --finetuned Qwen/Qwen2.5-3B-Instruct \
  --dimensions instruction_following,mathematical_reasoning,creative_variance,reasoning_depth \
  --tier quick \
  --device cuda \
  --format terminal

Options:

  • --base: Base model ID on Hugging Face (e.g., google/gemma-2-2b).
  • --finetuned: Path to local adapter/model, or Hugging Face model ID.
  • --dimensions: Comma-separated list of capabilities to test.
  • --tier: Evaluation depth (quick, standard, comprehensive).
  • --device: Target compute device (cuda, cpu, mps).
  • --format: Output format (terminal, html, json, markdown).
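If you drive the CLI from Python (for example, to sweep several checkpoints), a small wrapper can check arguments against the values listed above before launching anything. This is a sketch: `build_analyze_command` is a hypothetical helper, and it uses only the flags and values documented in the options list.

```python
import shlex

# Accepted values, as listed in the options above.
TIERS = {"quick", "standard", "comprehensive"}
DEVICES = {"cuda", "cpu", "mps"}
FORMATS = {"terminal", "html", "json", "markdown"}


def build_analyze_command(base, finetuned, dimensions, *, tier, device, fmt):
    """Return the argv list for `semantic-diff analyze` (hypothetical helper)."""
    for flag, value, allowed in (("--tier", tier, TIERS),
                                 ("--device", device, DEVICES),
                                 ("--format", fmt, FORMATS)):
        if value not in allowed:
            raise ValueError(f"{flag}: {value!r} not in {sorted(allowed)}")
    if not dimensions:
        raise ValueError("--dimensions: at least one dimension is required")
    return [
        "semantic-diff", "analyze",
        "--base", base,
        "--finetuned", finetuned,
        "--dimensions", ",".join(dimensions),  # comma-separated list
        "--tier", tier,
        "--device", device,
        "--format", fmt,
    ]


cmd = build_analyze_command(
    "Qwen/Qwen2.5-3B", "Qwen/Qwen2.5-3B-Instruct",
    ["instruction_following", "mathematical_reasoning"],
    tier="quick", device="cuda", fmt="json",
)
print(shlex.join(cmd))  # inspect the exact command before running it
# To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Passing an argument list rather than a single shell string avoids quoting problems with model paths.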

Python API

You can also use the library programmatically in your own scripts or Jupyter notebooks:

from semantic_diff import analyze_models

report = analyze_models(
    base_model="Qwen/Qwen2.5-3B",
    finetuned_model="Qwen/Qwen2.5-3B-Instruct",
    dimensions=["instruction_following", "mathematical_reasoning"],
    device="cuda"
)

print(report.summary)

Web UI

If you prefer a graphical interface, you can launch the Gradio UI:

python -m semantic_diff.ui.app

Or run it with Docker Compose:

docker-compose up ui

Creating Custom Dimensions (Plugin System)

semantic-model-diff is extensible: you can define and register your own evaluation dimensions. See the examples/custom_dimension.py script in the repository for a complete example of how to define and load your own rules and evaluators.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request on GitHub.


Download files


Source Distribution

semantic_model_diff-0.3.2.tar.gz (51.0 kB)

Uploaded Source

Built Distribution


semantic_model_diff-0.3.2-py3-none-any.whl (60.0 kB)

Uploaded Python 3

File details

Details for the file semantic_model_diff-0.3.2.tar.gz.

File metadata

  • Download URL: semantic_model_diff-0.3.2.tar.gz
  • Size: 51.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for semantic_model_diff-0.3.2.tar.gz
Algorithm Hash digest
SHA256 7942f4cd30c9e69373e5141a684a78dd1c20b7ed8b86458ac47241b913bce708
MD5 f86a5e989c9ef8e4e0e9bd3d58ce7c0e
BLAKE2b-256 fa88bdb45d7353ddb7fe84b691fabaaca1588f90042abd087dea5c697a5c7728


File details

Details for the file semantic_model_diff-0.3.2-py3-none-any.whl.

File hashes

Hashes for semantic_model_diff-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 cdb7645b52b5a7787583a090cd885f28f0cb3c99c8964b67a010c2b85c1cfe36
MD5 a509b83fab5caa59a767b119f76992d0
BLAKE2b-256 3c7c99c4e12ca0c490f2a082bd5c2fb1e4476c825a947c141e3547fcaf5fbb6a
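To check a downloaded file against the published digests, hash it locally and compare. A minimal sketch using Python's standard hashlib module; the file names and SHA256 values are copied from the tables above, while `sha256_of` and `verify` are illustrative helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published in the "File hashes" tables above.
EXPECTED_SHA256 = {
    "semantic_model_diff-0.3.2.tar.gz":
        "7942f4cd30c9e69373e5141a684a78dd1c20b7ed8b86458ac47241b913bce708",
    "semantic_model_diff-0.3.2-py3-none-any.whl":
        "cdb7645b52b5a7787583a090cd885f28f0cb3c99c8964b67a010c2b85c1cfe36",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected


if __name__ == "__main__":
    import sys

    # Usage: python verify_hashes.py semantic_model_diff-0.3.2.tar.gz ...
    for arg in sys.argv[1:]:
        p = Path(arg)
        print(f"{p.name}: {'OK' if verify(p) else 'MISMATCH'}")
```

pip can enforce the same check at install time in hash-checking mode, by pinning the requirement with `--hash=sha256:<digest>` in a requirements file.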

