A powerful 'git diff' tool for language models, providing deep behavioral and structural analysis between fine-tuned and base models.

Semantic Model Diff

A powerful, production-grade "git diff" for language models. semantic-model-diff performs deep behavioral and structural comparisons between a base language model and its fine-tuned counterpart, producing human-readable capability diff reports.

It goes beyond standard benchmarks to tell you exactly how a model's behavior changed after fine-tuning.

What is it?

When you fine-tune an LLM (e.g., using LoRA, QLoRA, or full fine-tuning), you change its weights and, with them, its capabilities. Validation loss alone tells you only part of the story. semantic-model-diff runs both models side by side on a suite of built-in benchmarks to measure shifts in reasoning, instruction following, creativity, and more.

Key Features

  • 10 Core Dimensions: Evaluates models across ten capabilities:
    • Instruction Following
    • Mathematical Reasoning
    • Creative Variance
    • Reasoning Depth
    • Code Quality
    • Factual Recall
    • Context Retention
    • Safety Adherence
    • Response Conciseness
    • Structured Output
  • Layer-by-Layer Weight Analysis: Identifies exactly where in the network your adapter or fine-tuning made changes.
  • Statistical Significance: Robust 95% confidence intervals via bootstrap resampling.
  • Local & Fast: Runs entirely locally. With 4-bit quantization, it can fit in 16 GB of CPU RAM.
  • Rich Reporting: Generates beautiful Terminal reports, Markdown files, HTML pages, and machine-readable JSON reports.
  • Gradio UI & Docker Integration: Full Gradio web UI and Docker Compose ready out of the box.
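
To make the "Statistical Significance" feature above more concrete, here is a minimal sketch of a percentile bootstrap confidence interval for the mean per-prompt score change between two models. It is an illustration of the general technique, not the package's actual implementation: the function name `bootstrap_ci`, its parameters, and the scores are all made up for this example, and it assumes both models were scored on the same prompts.

```python
import random
import statistics


def bootstrap_ci(base_scores, tuned_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired score change (hypothetical helper)."""
    if len(base_scores) != len(tuned_scores):
        raise ValueError("scores must be paired: one base and one tuned score per prompt")

    # Paired design: work with the per-prompt difference (tuned - base).
    deltas = [t - b for b, t in zip(base_scores, tuned_scores)]

    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = []
    for _ in range(n_resamples):
        # Resample the differences with replacement, keeping the sample size.
        sample = rng.choices(deltas, k=len(deltas))
        means.append(statistics.fmean(sample))
    means.sort()

    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return statistics.fmean(deltas), means[lo_idx], means[hi_idx]


# Made-up scores, purely for illustration.
base = [0.52, 0.61, 0.48, 0.70, 0.55, 0.63, 0.58, 0.49]
tuned = [0.66, 0.72, 0.59, 0.74, 0.68, 0.71, 0.65, 0.62]

mean_delta, low, high = bootstrap_ci(base, tuned)
print(f"mean change = {mean_delta:+.3f}, 95% CI = [{low:+.3f}, {high:+.3f}]")
```

If the resulting interval excludes zero, the change on that dimension is unlikely to be resampling noise alone. With very few prompts the percentile interval can be too narrow, so treat small-sample results with caution.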

Installation

You can install the package directly from PyPI. For the complete set of features (including local analysis, UI, and reporting), install with the [full] extra:

pip install "semantic-model-diff[full]"

Quick Start

Command Line Interface

You can run the analysis via the CLI. Point the tool at your Hugging Face base model and your local or remote fine-tuned adapter/model.

semantic-diff analyze \
  --base Qwen/Qwen2.5-3B \
  --finetuned Qwen/Qwen2.5-3B-Instruct \
  --dimensions instruction_following,mathematical_reasoning,creative_variance,reasoning_depth \
  --tier quick \
  --device cuda \
  --format terminal

Options:

  • --base: Base model ID on Hugging Face (e.g., google/gemma-2-2b).
  • --finetuned: Path to local adapter/model, or Hugging Face model ID.
  • --dimensions: Comma-separated list of capabilities to test.
  • --tier: Determines the depth of the test (quick, standard, comprehensive).
  • --device: Target compute device (cuda, cpu, mps).
  • --format: Output format (terminal, html, json, markdown).
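
If you drive the CLI from a script, it can help to assemble the command from the options listed above and catch typos before a long run starts. The sketch below does only that. The helper `build_analyze_command` and its default values are hypothetical and not part of the package; the flag names and allowed values are taken from the options list.

```python
import shlex

# Allowed values, as listed in the options above.
TIERS = {"quick", "standard", "comprehensive"}
DEVICES = {"cuda", "cpu", "mps"}
FORMATS = {"terminal", "html", "json", "markdown"}


def build_analyze_command(base, finetuned, dimensions,
                          tier="quick", device="cpu", fmt="terminal"):
    """Build the argv list for `semantic-diff analyze` (hypothetical helper).

    The defaults here are this sketch's own choice, not the CLI's.
    """
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    if device not in DEVICES:
        raise ValueError(f"unknown device: {device!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if not dimensions:
        raise ValueError("at least one dimension is required")

    return [
        "semantic-diff", "analyze",
        "--base", base,
        "--finetuned", finetuned,
        "--dimensions", ",".join(dimensions),  # the CLI takes a comma-separated list
        "--tier", tier,
        "--device", device,
        "--format", fmt,
    ]


cmd = build_analyze_command(
    base="Qwen/Qwen2.5-3B",
    finetuned="Qwen/Qwen2.5-3B-Instruct",
    dimensions=["instruction_following", "mathematical_reasoning"],
    device="cpu",
    fmt="json",
)
print(shlex.join(cmd))  # shell-safe string, handy for logging
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the package is installed. Passing a list rather than a shell string avoids quoting problems with local adapter paths that contain spaces.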

Python API

You can also use the library programmatically in your own scripts or Jupyter notebooks:

from semantic_diff import analyze_models

report = analyze_models(
    base_model="Qwen/Qwen2.5-3B",
    finetuned_model="Qwen/Qwen2.5-3B-Instruct",
    dimensions=["instruction_following", "mathematical_reasoning"],
    device="cuda"
)

print(report.summary)

Web UI

If you prefer a graphical interface, you can launch the Gradio UI:

python -m semantic_diff.ui.app

Or run it instantly via Docker:

docker-compose up ui

Creating Custom Dimensions (Plugin System)

semantic-model-diff is extensible: you can build and register your own evaluation dimensions. See the examples/custom_dimension.py script in the repository for a complete example of how to define and load your own rules and evaluators.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request on GitHub.
