A powerful 'git diff' tool for language models, providing deep behavioral and structural analysis between fine-tuned and base models.

Semantic Model Diff

semantic-model-diff is a "git diff" for language models. It compares a base language model with its fine-tuned counterpart, both behaviorally and structurally, and produces human-readable capability diff reports.

It goes beyond standard benchmarks to tell you exactly how a model's behavior changed after fine-tuning.

What is it?

When you fine-tune an LLM (e.g., using LoRA, QLoRA, or full fine-tuning), you change its effective weights and, with them, its capabilities. Validation loss tells you only part of the story. semantic-model-diff runs both models side by side through a suite of internal benchmarks to measure shifts in reasoning, instruction following, creativity, and more.
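
To make the idea of a weight change concrete, here is a minimal sketch that ranks parameters by their relative L2 change between two checkpoints. It is not the package's implementation: the function name `layer_deltas`, the toy parameter names, and the use of flat Python lists instead of real tensors are all assumptions made for illustration.

```python
import math


def layer_deltas(base_weights, tuned_weights):
    """Rank parameters by relative L2 change, largest first.

    Both arguments map a parameter name to a flat list of floats.
    (Hypothetical layout; real checkpoints hold tensors.)
    """
    deltas = {}
    for name, base in base_weights.items():
        tuned = tuned_weights[name]  # assumes both models share parameter names
        if len(base) != len(tuned):
            raise ValueError(f"shape mismatch for {name}")
        diff_norm = math.sqrt(sum((t - b) ** 2 for b, t in zip(base, tuned)))
        base_norm = math.sqrt(sum(b ** 2 for b in base))
        if base_norm > 0:
            deltas[name] = diff_norm / base_norm
        else:
            # An all-zero base parameter has no meaningful relative change.
            deltas[name] = 0.0 if diff_norm == 0 else float("inf")
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Toy example: only the second parameter was modified by fine-tuning.
base = {"layers.0.attn": [1.0, 2.0, 2.0], "layers.1.mlp": [3.0, 4.0]}
tuned = {"layers.0.attn": [1.0, 2.0, 2.0], "layers.1.mlp": [3.0, 4.5]}

ranked = layer_deltas(base, tuned)
for name, delta in ranked:
    print(f"{name}: {delta:.3f}")
```

A relative norm is used here rather than an absolute one so that large and small parameters can be compared on the same scale.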

Key Features

  • 10 Core Dimensions: Evaluates models across various capabilities:
    • Instruction Following
    • Mathematical Reasoning
    • Creative Variance
    • Reasoning Depth
    • Code Quality
    • Factual Recall
    • Context Retention
    • Safety Adherence
    • Response Conciseness
    • Structured Output
  • Layer-by-Layer Weight Analysis: Identifies exactly where in the network your adapter or fine-tuning made changes.
  • Statistical Significance: Robust 95% confidence intervals via bootstrap resampling.
  • Local & Fast: Runs entirely locally. With 4-bit quantization, it can fit in 16 GB of CPU RAM.
  • Rich Reporting: Generates terminal reports, Markdown files, HTML pages, and machine-readable JSON reports.
  • Gradio UI & Docker Integration: Full Gradio web UI and Docker Compose ready out of the box.
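
As an illustration of the bootstrap confidence intervals mentioned in the feature list, the sketch below computes a 95% percentile interval for the mean per-prompt score difference between two models. It uses only the standard library. The function name `bootstrap_ci`, the paired-score layout, and the example scores are assumptions; the package's own procedure may differ.

```python
import random
import statistics


def bootstrap_ci(base_scores, tuned_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (tuned - base)."""
    if len(base_scores) != len(tuned_scores):
        raise ValueError("scores must be paired, one per prompt")
    diffs = [t - b for b, t in zip(base_scores, tuned_scores)]
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(diffs)

    # Resample the per-prompt differences with replacement and record each mean.
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(sample))
    means.sort()

    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lower, upper


# Made-up scores for six prompts, for illustration only.
base_scores = [0.5, 0.6, 0.4, 0.7, 0.5, 0.6]
tuned_scores = [0.7, 0.8, 0.6, 0.9, 0.8, 0.7]

mean_diff, lower, upper = bootstrap_ci(base_scores, tuned_scores)
print(f"mean difference {mean_diff:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

If the interval excludes zero, the shift on that dimension is unlikely to be resampling noise alone.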

Installation

You can install the package directly from PyPI. For the complete set of features (including local analysis, UI, and reporting), install with the [full] extra:

pip install "semantic-model-diff[full]"

Quick Start

Command Line Interface

You can run the analysis via the CLI. Point the tool at your Hugging Face base model and your local or remote fine-tuned adapter or model.

semantic-diff analyze \
  --base Qwen/Qwen2.5-3B \
  --finetuned Qwen/Qwen2.5-3B-Instruct \
  --dimensions instruction_following,mathematical_reasoning,creative_variance,reasoning_depth \
  --tier quick \
  --device cuda \
  --format terminal

Options:

  • --base: Base model ID on Hugging Face (e.g., google/gemma-2-2b).
  • --finetuned: Path to local adapter/model, or Hugging Face model ID.
  • --dimensions: Comma-separated list of capabilities to test.
  • --tier: Determines the depth of the test (quick, standard, comprehensive).
  • --device: Target compute device (cuda, cpu, mps).
  • --format: Output format (terminal, html, json, markdown).
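
When the CLI is driven from a script, the options above can be assembled and validated before the tool is launched. The sketch below builds the argument list using only the flags and values documented here. The helper name `build_analyze_command` and its default values are assumptions, not part of the package.

```python
import shlex

# Allowed values as listed in the options above.
TIERS = {"quick", "standard", "comprehensive"}
DEVICES = {"cuda", "cpu", "mps"}
FORMATS = {"terminal", "html", "json", "markdown"}


def build_analyze_command(base, finetuned, dimensions,
                          tier="quick", device="cpu", fmt="terminal"):
    """Return the argv list for `semantic-diff analyze` (defaults are assumed)."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier}")
    if device not in DEVICES:
        raise ValueError(f"unknown device: {device}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    return [
        "semantic-diff", "analyze",
        "--base", base,
        "--finetuned", finetuned,
        "--dimensions", ",".join(dimensions),  # the CLI expects a comma-separated list
        "--tier", tier,
        "--device", device,
        "--format", fmt,
    ]


cmd = build_analyze_command(
    "Qwen/Qwen2.5-3B",
    "Qwen/Qwen2.5-3B-Instruct",
    ["instruction_following", "mathematical_reasoning"],
    fmt="json",
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```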

Python API

You can also use the library programmatically in your own scripts or Jupyter notebooks:

from semantic_diff import analyze_models

report = analyze_models(
    base_model="Qwen/Qwen2.5-3B",
    finetuned_model="Qwen/Qwen2.5-3B-Instruct",
    dimensions=["instruction_following", "mathematical_reasoning"],
    device="cuda"
)

print(report.summary)

Web UI

If you prefer a graphical interface, you can launch the Gradio UI:

python -m semantic_diff.ui.app

Or run it with Docker Compose:

docker-compose up ui

Creating Custom Dimensions (Plugin System)

semantic-model-diff is extensible: you can build and register your own evaluation dimensions. See the examples/custom_dimension.py script in the repository for a complete example of how to define and load your own rules and evaluators.
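
For readers unfamiliar with the plugin pattern, the sketch below shows the general shape of a registry of scoring functions. It is purely illustrative and does not use the semantic-model-diff plugin API: `DIMENSIONS`, `register_dimension`, and `polite_tone` are all hypothetical names. The repository example remains the reference for the real interface.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a dimension name to a scoring function.
# This is NOT the semantic-model-diff plugin API.
DIMENSIONS: Dict[str, Callable[[str, str], float]] = {}


def register_dimension(name: str):
    """Decorator that stores a scorer under the given dimension name."""
    def decorator(fn: Callable[[str, str], float]):
        DIMENSIONS[name] = fn
        return fn
    return decorator


@register_dimension("polite_tone")
def polite_tone(prompt: str, response: str) -> float:
    """Return 1.0 if the response contains a courtesy marker, else 0.0."""
    markers = ("please", "thank you", "you're welcome")
    text = response.lower()
    return 1.0 if any(marker in text for marker in markers) else 0.0


# Score the same prompt against two made-up responses.
prompt = "Can you summarize this paragraph?"
print(DIMENSIONS["polite_tone"](prompt, "Sure. Thank you for asking!"))
print(DIMENSIONS["polite_tone"](prompt, "No."))
```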

Contributing

Contributions are welcome! Please feel free to submit a Pull Request on GitHub.
