A powerful 'git diff' tool for language models, providing deep behavioral and structural analysis between fine-tuned and base models.
Semantic Model Diff
A powerful, production-grade "git diff" for language models. semantic-model-diff performs deep behavioral and structural comparisons between a base language model and its fine-tuned counterpart, producing human-readable capability diff reports.
Rather than a single benchmark score, the report shows which capabilities changed after fine-tuning and where in the network the changes occurred.
What is it?
When you fine-tune an LLM (e.g., using LoRA, QLoRA, or full fine-tuning), you change its weights and, with them, its capabilities. Validation loss alone tells only part of the story. semantic-model-diff runs both models side by side through a suite of built-in benchmarks to measure shifts in reasoning, instruction following, creativity, and more.
Key Features
- 10 Core Dimensions: Evaluates models across various capabilities:
  - Instruction Following
  - Mathematical Reasoning
  - Creative Variance
  - Reasoning Depth
  - Code Quality
  - Factual Recall
  - Context Retention
  - Safety Adherence
  - Response Conciseness
  - Structured Output
- Layer-by-Layer Weight Analysis: Identifies exactly where in the network your adapter or fine-tuning made changes.
- Statistical Significance: Robust 95% confidence intervals via bootstrap resampling.
- Local & Fast: Runs entirely locally. Fits in 16GB CPU RAM with 4-bit quantization if needed.
- Rich Reporting: Generates beautiful Terminal reports, Markdown files, HTML pages, and machine-readable JSON reports.
- Gradio UI & Docker Integration: Full Gradio web UI and Docker Compose ready out of the box.
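To make the "Statistical Significance" feature concrete, here is a minimal sketch of a percentile bootstrap for a 95% confidence interval on the mean score change. It is an illustration of the general technique, not the package's internal implementation; the function name `bootstrap_ci`, its parameters, and the per-prompt scores are all hypothetical.

```python
import random
import statistics


def bootstrap_ci(deltas, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for the mean of paired score differences."""
    if not deltas:
        raise ValueError("need at least one paired difference")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(deltas)
    # Resample with replacement, record the mean of each resample, then sort.
    means = sorted(
        statistics.fmean(rng.choices(deltas, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return lower, upper


# Hypothetical per-prompt scores for one dimension (not real results).
base_scores = [0.52, 0.61, 0.48, 0.70, 0.55, 0.63, 0.58, 0.49]
tuned_scores = [0.60, 0.66, 0.59, 0.74, 0.61, 0.70, 0.65, 0.57]
deltas = [t - b for b, t in zip(base_scores, tuned_scores)]

low, high = bootstrap_ci(deltas)
print(f"mean delta = {statistics.fmean(deltas):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

If the interval excludes zero, the change on that dimension is unlikely to be resampling noise. Working on paired differences (the same prompt scored under both models) keeps prompt difficulty from inflating the variance.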
Installation
You can install the package directly from PyPI. For the complete set of features (including local analysis, UI, and reporting), install with the [full] extra:
pip install "semantic-model-diff[full]"
Quick Start
Command Line Interface
You can run the analysis via the CLI. Point the tool to your Hugging Face base model and your local or remote fine-tuned adapter/model.
semantic-diff analyze \
  --base Qwen/Qwen2.5-3B \
  --finetuned Qwen/Qwen2.5-3B-Instruct \
  --dimensions instruction_following,mathematical_reasoning,creative_variance,reasoning_depth \
  --tier quick \
  --device cuda \
  --format terminal
Options:
- --base: Base model ID on Hugging Face (e.g., google/gemma-2-2b).
- --finetuned: Path to local adapter/model, or Hugging Face model ID.
- --dimensions: Comma-separated list of capabilities to test.
- --tier: Determines the depth of the test (quick, standard, comprehensive).
- --device: Target compute device (cuda, cpu, mps).
- --format: Output format (terminal, html, json, markdown).
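When the CLI is driven from a script (for example in CI), it can help to assemble the argument list in code and reject bad option values before launching a long run. The sketch below uses only the flags and values listed above. The helper `build_analyze_command` and its default values are hypothetical and are not part of the package; the CLI's own defaults may differ.

```python
import shlex

# Allowed values as listed in the options above.
ALLOWED_TIERS = {"quick", "standard", "comprehensive"}
ALLOWED_DEVICES = {"cuda", "cpu", "mps"}
ALLOWED_FORMATS = {"terminal", "html", "json", "markdown"}


def build_analyze_command(base, finetuned, dimensions,
                          tier="quick", device="cpu", fmt="terminal"):
    """Return the argv list for `semantic-diff analyze` (hypothetical helper)."""
    if tier not in ALLOWED_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"unknown device: {device!r}")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if not dimensions:
        raise ValueError("at least one dimension is required")
    return [
        "semantic-diff", "analyze",
        "--base", base,
        "--finetuned", finetuned,
        "--dimensions", ",".join(dimensions),  # CLI expects a comma-separated list
        "--tier", tier,
        "--device", device,
        "--format", fmt,
    ]


cmd = build_analyze_command(
    "Qwen/Qwen2.5-3B",
    "Qwen/Qwen2.5-3B-Instruct",
    ["instruction_following", "mathematical_reasoning"],
    fmt="json",
)
print(shlex.join(cmd))  # shell-safe rendering of the command
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues when a model path contains spaces.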
Python API
You can also use the library programmatically in your own scripts or Jupyter notebooks:
from semantic_diff import analyze_models
report = analyze_models(
    base_model="Qwen/Qwen2.5-3B",
    finetuned_model="Qwen/Qwen2.5-3B-Instruct",
    dimensions=["instruction_following", "mathematical_reasoning"],
    device="cuda"
)
print(report.summary)
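A common follow-up is comparing several fine-tuned checkpoints against the same base model. The sketch below loops over candidates and calls an analysis function with the same keyword arguments used in the example above. The helper `compare_candidates`, the stand-in `fake_analyze`, and the adapter paths are hypothetical; the stand-in exists only so the sketch runs without downloading models.

```python
def compare_candidates(base_model, candidates, dimensions, analyze, device="cpu"):
    """Run `analyze` once per candidate and collect the reports by candidate name.

    `analyze` is any callable that accepts the keyword arguments shown in the
    Python API example (base_model, finetuned_model, dimensions, device).
    """
    reports = {}
    for candidate in candidates:
        reports[candidate] = analyze(
            base_model=base_model,
            finetuned_model=candidate,
            dimensions=dimensions,
            device=device,
        )
    return reports


def fake_analyze(base_model, finetuned_model, dimensions, device):
    """Stand-in for the real analysis call; returns a short description string."""
    return f"{base_model} -> {finetuned_model} on {device}: {len(dimensions)} dimension(s)"


reports = compare_candidates(
    base_model="Qwen/Qwen2.5-3B",
    candidates=["./adapters/run-a", "./adapters/run-b"],  # hypothetical local paths
    dimensions=["instruction_following", "mathematical_reasoning"],
    analyze=fake_analyze,
)
for name, report in reports.items():
    print(name, "=>", report)
```

In real use, pass `analyze_models` (imported from `semantic_diff` as above) as the `analyze` argument and read `report.summary` on each result.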
Web UI
If you prefer a graphical interface, you can launch the Gradio UI:
python -m semantic_diff.ui.app
Or run it instantly via Docker:
docker-compose up ui
Creating Custom Dimensions (Plugin System)
semantic-model-diff is extensible: you can build and register your own evaluation dimensions. See the examples/custom_dimension.py script in the repository for a complete example of how to define and load your own rules and evaluators.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request on GitHub.
File details
Details for the file semantic_model_diff-0.3.2.tar.gz.
File metadata
- Download URL: semantic_model_diff-0.3.2.tar.gz
- Upload date:
- Size: 51.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7942f4cd30c9e69373e5141a684a78dd1c20b7ed8b86458ac47241b913bce708 |
| MD5 | f86a5e989c9ef8e4e0e9bd3d58ce7c0e |
| BLAKE2b-256 | fa88bdb45d7353ddb7fe84b691fabaaca1588f90042abd087dea5c697a5c7728 |
File details
Details for the file semantic_model_diff-0.3.2-py3-none-any.whl.
File metadata
- Download URL: semantic_model_diff-0.3.2-py3-none-any.whl
- Upload date:
- Size: 60.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cdb7645b52b5a7787583a090cd885f28f0cb3c99c8964b67a010c2b85c1cfe36 |
| MD5 | a509b83fab5caa59a767b119f76992d0 |
| BLAKE2b-256 | 3c7c99c4e12ca0c490f2a082bd5c2fb1e4476c825a947c141e3547fcaf5fbb6a |