Black-box LLM fingerprinting system for model identification
LLM Fingerprinting System
A black-box fingerprinting system that identifies the underlying model family (GPT, LLaMA, Mistral, etc.) by analysing response patterns across 31 carefully selected prompts. It can also identify fine-tuned models, tracing them back to their base model.
Note: Check config.py to see all identifiable model families.
A pre-trained classifier is bundled with the package in the model/ directory.
How It Works
Fingerprinting runs in three sequential layers:
- 31 prompts across 3 layers (discriminative → behavioral → stylistic):
  - Discriminative (11): Identity, knowledge cutoff, architecture, reasoning (most separating power)
  - Behavioral (7): Safety boundaries, jailbreak resistance, honesty, policy handling
  - Stylistic (13): Formatting, creativity, constraint following, default voice
- Feature extraction per response: 384-dim sentence embeddings + 12 linguistic features + 6 behavioral features = 402 dims per layer, 1206 dims total
- Embedding rebalancing: Per-layer PCA compresses 384-dim embeddings to 64 dims, giving 64 + 12 + 6 = 82 dims per layer → 246-dim working space
- Ensemble classification: Random Forest (45%) + SVM (45%) + MLP (10%)
- Two-stage identification: Ensemble → model family, Template classifier → specific model version
- Early stopping: After each layer the classifier checks confidence; if it exceeds the threshold (default 0.95), the remaining layers are skipped, saving API calls.
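As a rough illustration of the ensemble and early-stopping steps, the sketch below averages per-classifier class probabilities with the 45/45/10 weights and stops as soon as the top probability exceeds the threshold. It is not the package's code: the function names `ensemble_proba` and `identify_with_early_stopping` and the hard-coded probabilities are hypothetical. It also assumes the per-layer probabilities are already available, since how the real classifier scores a partially collected fingerprint is not described here.

```python
# Weights from the list above: Random Forest 45%, SVM 45%, MLP 10%.
WEIGHTS = {"rf": 0.45, "svm": 0.45, "mlp": 0.10}

# Dimension bookkeeping from the list above.
assert (384 + 12 + 6) * 3 == 1206   # raw features, three layers
assert (64 + 12 + 6) * 3 == 246     # after per-layer PCA


def ensemble_proba(probas):
    """Weighted average (soft voting) of per-classifier probability vectors."""
    n_classes = len(next(iter(probas.values())))
    combined = [0.0] * n_classes
    for name, vector in probas.items():
        for i, value in enumerate(vector):
            combined[i] += WEIGHTS[name] * value
    total = sum(combined)
    return [c / total for c in combined]


def identify_with_early_stopping(layer_probas, families, threshold=0.95):
    """Walk the layers in order and stop once confidence exceeds the threshold.

    layer_probas: one dict per layer, {classifier name: probability vector}.
    Returns (family, confidence, layers_used).
    """
    if not layer_probas:
        raise ValueError("need at least one layer of probabilities")
    for layers_used, probas in enumerate(layer_probas, start=1):
        combined = ensemble_proba(probas)
        best = max(range(len(combined)), key=combined.__getitem__)
        if combined[best] > threshold:
            break  # remaining layers are skipped
    return families[best], combined[best], layers_used


# Made-up probabilities for three layers and three families.
families = ["gpt", "llama", "mistral"]
layers = [
    {"rf": [0.60, 0.30, 0.10], "svm": [0.70, 0.20, 0.10], "mlp": [0.50, 0.30, 0.20]},
    {"rf": [0.98, 0.01, 0.01], "svm": [0.97, 0.02, 0.01], "mlp": [0.90, 0.05, 0.05]},
    {"rf": [1.00, 0.00, 0.00], "svm": [1.00, 0.00, 0.00], "mlp": [1.00, 0.00, 0.00]},
]
family, confidence, layers_used = identify_with_early_stopping(layers, families)
```

With these made-up numbers the first layer reaches a confidence of 0.635, the second reaches 0.9675, and the third layer is never consulted.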
Supported Backends
| Backend | Description | API Key Required |
|---|---|---|
| ollama | Local Ollama instance | ❌ No |
| ollama-cloud | Ollama Cloud API | ✅ OLLAMA_CLOUD_API_KEY |
| openai | OpenAI API (or compatible) | ✅ OPENAI_API_KEY |
| gemini | Gemini API | ✅ GEMINI_API_KEY |
| custom | Any HTTP-based LLM API | ✅ Optional |
About the Custom Backend
The custom backend is the most flexible option — use it with:
- Proprietary LLM APIs not natively supported
- Self-hosted LLMs behind HTTP endpoints
- API proxies and gateways
- Any HTTP-based LLM service
All you need is an HTTP request template file. See examples in ./example/.
Installation
From PyPI
# Core package
pip install llm-fingerprinter
# With OpenAI support (quotes keep shells such as zsh from globbing the brackets)
pip install "llm-fingerprinter[openai]"
# With Gemini support
pip install "llm-fingerprinter[gemini]"
# With all backends
pip install "llm-fingerprinter[all]"
Quick Start
1. Identify a Model (Pre-trained Classifier)
# Local Ollama
llm-fingerprinter identify -b ollama --model llama3.2
# OpenAI
export OPENAI_API_KEY="your-key"
llm-fingerprinter identify -b openai --model gpt-4o-mini
# Custom endpoint
llm-fingerprinter identify -b custom -r ./custom_request.txt
2. Train Your Own Classifier
# Step 1: Generate training fingerprints for each family
# Temperature is automatically varied across simulations for diversity
llm-fingerprinter simulate -b ollama --model llama3.2 --family llama --num-sims 5
llm-fingerprinter simulate -b openai --model gpt-4o-mini --family gpt --num-sims 5
# Step 2: Train the ensemble classifier
llm-fingerprinter train
# Step 3: Build template classifiers (for two-stage identification)
llm-fingerprinter build-templates
llm-fingerprinter build-model-templates
# Step 4: Identify unknown models
llm-fingerprinter identify -b ollama --model some-unknown-model
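When many families are involved, Steps 1 and 2 can be scripted. The sketch below only assembles the `simulate` and `train` invocations shown above, using the same flags. The helper names `simulate_command` and `run_all` and the `dry_run` switch are hypothetical conveniences, not part of the package.

```python
import subprocess

# (backend, model, family) triples; these two mirror the commands above.
TARGETS = [
    ("ollama", "llama3.2", "llama"),
    ("openai", "gpt-4o-mini", "gpt"),
]


def simulate_command(backend, model, family, num_sims=5):
    """Build the argument list for one `llm-fingerprinter simulate` call."""
    return [
        "llm-fingerprinter", "simulate",
        "-b", backend,
        "--model", model,
        "--family", family,
        "--num-sims", str(num_sims),
    ]


def run_all(targets, dry_run=True):
    """Return the simulate commands followed by `train`; execute them unless dry_run."""
    commands = [simulate_command(*target) for target in targets]
    commands.append(["llm-fingerprinter", "train"])
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # stop at the first failure
    return commands


commands = run_all(TARGETS)  # dry run: nothing is executed
```

Calling `run_all(TARGETS, dry_run=False)` would execute the commands in order, so it needs the CLI installed and any required API keys exported.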
build-templates — Build Family Template Classifier
Compute per-family mean vectors from training fingerprints for the open-set template classifier. Run after train.
llm-fingerprinter build-templates
The template classifier assigns a fingerprint to the family whose mean vector is nearest in cosine distance, so adding a new family does not require retraining.
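A nearest-mean classifier of this kind fits in a few lines. The version below is a pure-Python illustration under stated assumptions, not the package's implementation: the names `cosine_distance`, `build_templates` and `classify` are hypothetical, and toy 3-dimensional vectors stand in for real fingerprints.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def build_templates(fingerprints):
    """Map each family to the element-wise mean of its fingerprint vectors."""
    templates = {}
    for family, vectors in fingerprints.items():
        count = len(vectors)
        templates[family] = [sum(column) / count for column in zip(*vectors)]
    return templates


def classify(vector, templates):
    """Return (family, distance) for the template nearest to `vector`."""
    return min(
        ((family, cosine_distance(vector, mean)) for family, mean in templates.items()),
        key=lambda item: item[1],
    )


templates = build_templates({
    "gpt": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "llama": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
})

# Adding a family only adds one more mean vector; existing templates are untouched.
new_family = build_templates({
    "deepseek": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9], [0.0, 0.0, 1.0]],
})
templates.update(new_family)

family, distance = classify([0.05, 0.05, 0.95], templates)
```

Because each family is summarised by a single mean vector, a new family costs one averaging pass over its own samples and leaves the other templates untouched.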
build-model-templates — Build Model-Level Templates
Build templates at the specific model version level (e.g. gpt-4o-mini vs gpt-4.1) for two-stage identification.
llm-fingerprinter build-model-templates
Requires fingerprints whose metadata contains model_name (all fingerprints generated with simulate in this version do).
add-family — Add a New Family Without Retraining
Add a new model family to the template classifier from a few fingerprint samples, without retraining the full ensemble.
llm-fingerprinter add-family --model deepseek-chat --family deepseek --num-fps 3 -b deepseek
Recommended minimum: 3 fingerprints for a reliable mean template.
Environment Variables
| Variable | Backend | Description |
|---|---|---|
| OLLAMA_CLOUD_API_KEY | ollama-cloud | Ollama Cloud API key |
| OPENAI_API_KEY | openai | OpenAI API key |
| GEMINI_API_KEY | gemini | Gemini API key |
| DEEPSEEK_API_KEY | deepseek | DeepSeek API key |
| LOG_LEVEL | all | Logging level (DEBUG, INFO, WARNING) |
| LLM_FINGERPRINTER_DATA | all | Override data directory (fingerprints, model, logs) |
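Scripts that wrap the CLI may want to check that the key for the chosen backend is set before spending API calls. The sketch below encodes only the variable names from the table above. The helper `missing_api_key` is hypothetical and not part of the package, and it treats backends without a listed variable as needing no key.

```python
import os

# Backend -> environment variable, taken from the table above.
API_KEY_VARS = {
    "ollama-cloud": "OLLAMA_CLOUD_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def missing_api_key(backend, environ=None):
    """Return the unset variable name for `backend`, or None if nothing is missing."""
    environ = os.environ if environ is None else environ
    variable = API_KEY_VARS.get(backend)  # None for backends with no listed key
    if variable is not None and not environ.get(variable):
        return variable
    return None


# Example with an explicit mapping instead of the real environment.
example_env = {"OPENAI_API_KEY": "your-key"}
openai_missing = missing_api_key("openai", example_env)
gemini_missing = missing_api_key("gemini", example_env)
```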
License
MIT License