Black-box LLM fingerprinting system for model identification
LLM Fingerprinting System
A black-box fingerprinting system that identifies the underlying LLM model family (GPT, LLaMA, Mistral, etc.) by analyzing response patterns across 75 discriminative prompts. It can also identify fine-tuned models, tracing them back to their base model.
Note: Check config.py to see all identifiable model families
A pre-trained classifier is bundled with the package in the model directory.
Supported Backends
| Backend | Description | API Key Required |
|---|---|---|
| `ollama` | Local Ollama instance | ❌ No |
| `ollama-cloud` | Ollama Cloud API | ✅ `OLLAMA_CLOUD_API_KEY` |
| `openai` | OpenAI API (or compatible) | ✅ `OPENAI_API_KEY` |
| `gemini` | Gemini API (or compatible) | ✅ `GEMINI_API_KEY` |
| `deepseek` | DeepSeek API (or compatible) | ✅ `DEEPSEEK_API_KEY` |
| `custom` | Any HTTP-based LLM API | Optional |
About the Custom Backend
The custom backend is the most flexible option - use it with:
- Proprietary LLM APIs not natively supported
- Self-hosted LLMs behind HTTP endpoints
- API proxies and gateways
- Any HTTP-based LLM service
All you need is an HTTP request template file! See examples in ./example/ directory.
Installation
From PyPI
```shell
# Core package (Ollama + custom backends)
pip install llm-fingerprinter

# With OpenAI support
pip install llm-fingerprinter[openai]

# With Gemini support
pip install llm-fingerprinter[gemini]

# With all backends
pip install llm-fingerprinter[all]
```
From source (development)
```shell
git clone https://github.com/litemars/LLM-Fingerprinter.git
cd LLM-Fingerprinter
pip install -e ".[all,dev]"

# Optional: Download NLTK data for text processing
python -c "import nltk; nltk.download('punkt_tab'); nltk.download('stopwords')"
```
Quick Start
1. Identify a Model (Using Pre-trained Classifier)
```shell
# Custom endpoint
llm-fingerprinter identify -b custom -r ./custom_request.txt

# Local Ollama
llm-fingerprinter identify -b ollama --model llama3.2

# OpenAI
export OPENAI_API_KEY="your-key"
llm-fingerprinter identify -b openai --model gpt-4o-mini
```
2. Train Your Own Classifier
```shell
# Step 1: Generate fingerprints for a known model
llm-fingerprinter simulate -b ollama --model llama3.2 --family llama --num-sims 3

# Step 2: Train classifier from fingerprints
llm-fingerprinter train

# Step 3: Identify unknown models
llm-fingerprinter identify -b ollama --model some-other-model
```
3. Backend-Specific Examples
Ollama (Local)
```shell
# List available models
llm-fingerprinter list-models -b ollama

# Identify
llm-fingerprinter identify -b ollama --model llama3.2

# Generate fingerprints
llm-fingerprinter simulate -b ollama --model llama3.2 --family llama
```
Ollama Cloud
```shell
export OLLAMA_CLOUD_API_KEY="your-key"
llm-fingerprinter identify -b ollama-cloud --model llama3.2
llm-fingerprinter simulate -b ollama-cloud --model llama3.2 --family llama
```
OpenAI
```shell
export OPENAI_API_KEY="your-key"
llm-fingerprinter identify -b openai --model gpt-4o
llm-fingerprinter simulate -b openai --model gpt-4 --family gpt --num-sims 5
```
Gemini
```shell
export GEMINI_API_KEY="your-key"
llm-fingerprinter identify -b gemini --model gemini-2.5-pro
llm-fingerprinter simulate -b gemini --model gemini-2.5-pro --family gemini
```
DeepSeek
```shell
export DEEPSEEK_API_KEY="your-key"
llm-fingerprinter identify -b deepseek --model deepseek-chat
llm-fingerprinter simulate -b deepseek --model deepseek-chat --family deepseek
```
Custom API (Any HTTP Endpoint)
Works with any LLM API via HTTP request template. No native backend support needed!
```shell
export CUSTOM_API_KEY="your-api-key"
llm-fingerprinter identify -b custom -r ./custom_request.txt
llm-fingerprinter identify -b custom -r ./custom_request.txt -k my-api-key
llm-fingerprinter simulate -b custom -r ./custom_request.txt --family gpt
```
Python API
You can also use the library programmatically:
```python
from llm_fingerprinter import LLMFingerprinter, EnsembleClassifier, FeatureExtractor, PromptSuite
from llm_fingerprinter.ollama_client import OllamaClient

# Set up components
client = OllamaClient(endpoint="http://localhost:11434")
suite = PromptSuite()
extractor = FeatureExtractor()
classifier = EnsembleClassifier()

# Create the fingerprinter and fingerprint a model
fingerprinter = LLMFingerprinter("http://localhost:11434", client, suite, extractor, classifier)
fingerprint = fingerprinter.fingerprint_model("llama3.2")
```
Commands Reference
Global Options
| Option | Short | Description |
|---|---|---|
| `--verbose` | `-v` | Enable verbose output (debug logging) |
Backend Options (Common to all LLM commands)
These options are available for: identify, simulate, test, fingerprint, and list-models
| Option | Short | Default | Description |
|---|---|---|---|
| `--backend` | `-b` | `ollama` | Backend: `ollama`, `ollama-cloud`, `openai`, `deepseek`, `gemini`, `custom` |
| `--endpoint` | `-e` | auto | API endpoint URL (overrides default) |
| `--api-key` | `-k` | env var | API key (fallback to environment variable) |
| `--request-file` | `-r` | - | Request template file (required for custom backend) |
identify - Identify Unknown Model
Classify an unknown model using the trained classifier. Works with any LLM backend including custom HTTP endpoints.
```shell
llm-fingerprinter identify [OPTIONS]
```
Options:
| Option | Short | Default | Description |
|---|---|---|---|
| `--model` | `-m` | - | Model name (optional; may be in request template for custom backend) |
| `--repeats` | - | 1 | Number of times to repeat each prompt (increases confidence) |
| `--backend` | `-b` | `ollama` | LLM backend |
| `--endpoint` | `-e` | auto | API endpoint |
| `--api-key` | `-k` | env var | API key |
Examples:
```shell
# Local Ollama (simplest)
llm-fingerprinter identify -b ollama --model llama3.2

# With multiple repeats for higher confidence
llm-fingerprinter identify -b ollama --model llama3.2 --repeats 3

# OpenAI
export OPENAI_API_KEY="sk-..."
llm-fingerprinter identify -b openai --model gpt-4o-mini

# ⭐ Custom endpoint (e.g., proprietary LLM, local instance, proxy)
llm-fingerprinter identify -b custom -r ./example/openai_request.txt

# ⭐ Custom with API key
llm-fingerprinter identify -b custom -r ./example/openai_request.txt -k "your-api-key"

# ⭐ Any HTTP-based LLM (examples in ./example/)
llm-fingerprinter identify -b custom -r ./example/ollama_cloud_request.txt
```
Output:
```
═══════════════════════════════════════════════════════════════
                    IDENTIFICATION REPORT
═══════════════════════════════════════════════════════════════
Identified: GPT (or LLAMA, GEMINI, etc.)
Confidence: 92.5%
Probabilities:
  gpt      92.5% █████████████████████
  llama     5.2% █
  gemini    1.8%
  mistral   0.5%
═══════════════════════════════════════════════════════════════
```
simulate - Generate Training Fingerprints
Create fingerprints for known models to build/improve the classifier. Works with any backend including custom HTTP endpoints.
```shell
llm-fingerprinter simulate [OPTIONS]
```
Options:
| Option | Short | Default | Description |
|---|---|---|---|
| `--model` | `-m` | - | Model name (optional) |
| `--family` | `-f` | - | Required. Model family: gpt, claude, llama, gemini, mistral, qwen, gemma, deepseek |
| `--num-sims` | `-n` | 3 | Number of fingerprints to generate |
| `--repeats` | - | 2 | Prompt repeats per simulation |
| `--backend` | `-b` | `ollama` | LLM backend |
| `--endpoint` | `-e` | auto | API endpoint |
| `--api-key` | `-k` | env var | API key |
Examples:
```shell
# Basic simulation (3 fingerprints, 2 repeats each)
llm-fingerprinter simulate -b ollama --model llama3.2 --family llama

# More comprehensive (10 fingerprints, 5 repeats each)
llm-fingerprinter simulate -b ollama --model llama3.2 --family llama --num-sims 10 --repeats 5

# OpenAI
export OPENAI_API_KEY="sk-..."
llm-fingerprinter simulate -b openai --model gpt-4 --family gpt --num-sims 5

# OpenAI-compatible endpoint (e.g., Groq)
llm-fingerprinter simulate -b openai -e https://api.groq.com/openai/v1 -k $GROQ_KEY \
  --model llama-3.1-70b --family llama
```
train - Build Classifier
Train an ensemble classifier from saved fingerprints.
```shell
llm-fingerprinter train [OPTIONS]
```
Options:
| Option | Default | Description |
|---|---|---|
| `--augment` / `--no-augment` | `--augment` | Enable/disable data augmentation |
| `--use-pca` | false | Use PCA dimensionality reduction |
| `--pca-components` | 64 | Number of PCA components (if `--use-pca`) |
| `--cross-validate` / `-cv` | false | Run k-fold cross-validation |
| `--cv-folds` | 5 | Number of cross-validation folds |
Examples:
```shell
# Default: raw features (402-dim), with augmentation
llm-fingerprinter train

# With PCA reduction (faster, less accurate)
llm-fingerprinter train --use-pca

# Custom PCA components
llm-fingerprinter train --use-pca --pca-components 128

# With cross-validation
llm-fingerprinter train --cross-validate --cv-folds 5

# Disable augmentation
llm-fingerprinter train --no-augment
```
Output:
```
🧠 Training classifier (raw features (402-dim))...
📊 Training data:
  gpt: 15 samples (402 dims)
  llama: 12 samples (402 dims)
  gemini: 10 samples (402 dims)
  Total: 37
📈 Running 5-fold cross-validation...
  Mean accuracy: 94.6% (5 folds)
  Per-family metrics:
  Family     Prec   Recall   F1     Support
  ──────────────────────────────────────────────
  gpt        0.96   0.95     0.96   15
  llama      0.93   0.92     0.92   12
  gemini     0.92   0.90     0.91   10
  Fold accuracies: 93%, 95%, 94%, 96%, 95%
✅ Classifier trained and saved!
  Mode: raw features (402-dim)
  Input dim: 402
```
test - Test Backend Connection
Verify connectivity and generation with a backend.
```shell
llm-fingerprinter test [OPTIONS]
```
Options: the common backend options listed above, plus `-p` to supply a custom test prompt.
Examples:
```shell
# Test local Ollama
llm-fingerprinter test -b ollama --model llama3.2

# Test OpenAI
export OPENAI_API_KEY="sk-..."
llm-fingerprinter test -b openai --model gpt-4o

# Test with custom prompt
llm-fingerprinter test -b ollama --model llama3.2 -p "What is 2+2?"

# Test custom backend
llm-fingerprinter test -b custom -r ./custom_request.txt
```
fingerprint - Generate Standalone Fingerprint
Generate a fingerprint without using the classifier (useful for analysis).
```shell
llm-fingerprinter fingerprint [OPTIONS]
```
Options:
| Option | Short | Default | Description |
|---|---|---|---|
| `--model` | `-m` | - | Model name (optional) |
| `--repeats` | - | 1 | Prompt repeats |
| `--output` | - | `./fingerprints` | Output directory |
| `--backend` | `-b` | `ollama` | LLM backend |
| `--endpoint` | `-e` | auto | API endpoint |
| `--api-key` | `-k` | env var | API key |
Examples:
```shell
# Generate and save fingerprint
llm-fingerprinter fingerprint -b ollama --model llama3.2

# With custom output directory
llm-fingerprinter fingerprint -b ollama --model llama3.2 --output ./my_fingerprints

# Multiple repeats for better accuracy
llm-fingerprinter fingerprint -b openai --model gpt-4o --repeats 3
```
list-models - List Available Models
Show all models available on the backend.
```shell
llm-fingerprinter list-models [OPTIONS]
```
Options:
| Option | Short | Description |
|---|---|---|
| `--backend` | `-b` | LLM backend |
| `--endpoint` | `-e` | API endpoint |
| `--api-key` | `-k` | API key |
Examples:
```shell
# List Ollama models
llm-fingerprinter list-models -b ollama

# List OpenAI models
export OPENAI_API_KEY="sk-..."
llm-fingerprinter list-models -b openai

# Custom endpoint
llm-fingerprinter list-models -b openai -e https://api.groq.com/openai/v1 -k $GROQ_KEY
```
list-fingerprints - List Saved Fingerprints
Show count of fingerprints by model family.
```shell
llm-fingerprinter list-fingerprints
```
Output:
```
📚 Fingerprints:
  gpt      15 ████████████████████
  llama    12 ████████████████
  gemini   10 ██████████████
  mistral   8 ███████████
Total: 45
✅ Classifier trained (raw features, 402 dims)
```
info - Show System Information
Display configuration, installed backends, available families, and status.
```shell
llm-fingerprinter info
```
Output:
```
⚙️ Config:
  Fingerprints: /folder/fingerprints
  Embedding: all-MiniLM-L6-v2 (384d)
  Total dims: 402 (384 + 12 + 6)
🔌 Backends:
  ollama: http://localhost:11434
  ollama-cloud: https://api.ollama.ai
  openai: https://api.openai.com/v1
  deepseek: https://api.deepseek.com
  gemini: https://generativelanguage.googleapis.com
  custom: Via request template file (-r)
📋 Families: claude, deepseek, gemini, gemma, gpt, llama, mistral, qwen
📊 Status:
  Fingerprints: 45
  Classifier: ✅ trained (raw features, 402 dims)
💡 Training options:
  train              # Use raw 402-dim features (default)
  train --use-pca    # Use PCA reduction (64 dims)
```
Usage Workflow
Complete Training Workflow
```shell
# 1. Generate fingerprints for GPT models
llm-fingerprinter simulate -b openai --model gpt-4 --family gpt --num-sims 5 --repeats 3
llm-fingerprinter simulate -b openai --model gpt-4o --family gpt --num-sims 5 --repeats 3

# 2. Generate fingerprints for LLaMA models
llm-fingerprinter simulate -b ollama --model llama3.2 --family llama --num-sims 5 --repeats 3
llm-fingerprinter simulate -b ollama --model llama2 --family llama --num-sims 5 --repeats 3

# 3. List all fingerprints
llm-fingerprinter list-fingerprints

# 4. Train classifier with cross-validation
llm-fingerprinter train --cross-validate

# 5. Test on unknown models
llm-fingerprinter identify -b ollama --model some-unknown-model
llm-fingerprinter identify -b openai --model gpt-4o-mini --repeats 3
```
Quick Identification Workflow
```shell
# 1. Test connection
llm-fingerprinter test -b ollama --model llama3.2

# 2. Identify model
llm-fingerprinter identify -b ollama --model llama3.2

# 3. View results
llm-fingerprinter list-fingerprints
```
Common Patterns
Using Environment Variables for API Keys
```shell
# Set once, use multiple times
export OPENAI_API_KEY="sk-..."
export GEMINI_API_KEY="AIza..."

# No need to pass -k flag each time
llm-fingerprinter simulate -b openai --model gpt-4 --family gpt
llm-fingerprinter identify -b openai --model gpt-4o
llm-fingerprinter test -b gemini --model gemini-2.5-pro
```
⭐ Custom Backend with Request Template (Universal LLM Support)
The custom backend lets you use fingerprinting with any HTTP-based LLM API by providing a request template file.
```shell
# Use a request template file for custom APIs
llm-fingerprinter identify -b custom -r ./example/openai_request.txt

# Can also pass an API key
llm-fingerprinter identify -b custom -r ./example/openai_request.txt -k "api-key-here"

# Generate training fingerprints
llm-fingerprinter simulate -b custom -r ./example/openai_request.txt --family gpt --num-sims 5

# Test connection
llm-fingerprinter test -b custom -r ./example/openai_request.txt

# See example templates in ./example/ directory:
# - openai_request.txt (OpenAI-compatible APIs)
# - ollama_cloud_request.txt
# - ollama_local_request.txt
```
Why use custom backend?
- 🔓 Support for proprietary/closed LLMs not in native backends
- 🏠 Self-hosted LLM servers behind HTTP endpoints
- 🔀 API proxies, gateways, and load balancers
- 🌐 Any HTTP-based LLM service (local or remote)
- 🎯 Complete control over request format
Multi-Endpoint Configuration
```shell
# Test same model on different endpoints
llm-fingerprinter test -b openai -e https://api.openai.com/v1 --model gpt-4
llm-fingerprinter test -b openai -e https://api.groq.com/openai/v1 --model llama-3.1-70b -k $GROQ_KEY

# Identify via different providers
llm-fingerprinter identify -b openai --model gpt-4o
llm-fingerprinter identify -b openai -e https://my-proxy.com/v1 --model gpt-4o -k "proxy-key"
```
Improving Accuracy
```shell
# Use higher repeats for more confident predictions
llm-fingerprinter identify -b ollama --model llama3.2 --repeats 5

# Train with more simulations per model
llm-fingerprinter simulate -b ollama --model llama3.2 --family llama --num-sims 10 --repeats 5

# Use PCA for faster training with a slight accuracy trade-off
llm-fingerprinter train --use-pca --pca-components 128

# Cross-validate before deployment
llm-fingerprinter train --cross-validate --cv-folds 10
```
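One intuition for why higher `--repeats` increases confidence: averaging features over repeated responses shrinks sampling noise. The toy sketch below is not the package's code, just the statistics behind the flag; `noisy_feature` is a hypothetical stand-in for one fingerprint feature measurement.

```python
# Toy illustration (not the package's code): a single noisy measurement
# vs. the mean of many repeats of the same measurement.
import random

random.seed(0)

def noisy_feature(true_value=1.0, noise=0.5):
    # One noisy measurement of a hypothetical fingerprint feature.
    return true_value + random.uniform(-noise, noise)

single = noisy_feature()
averaged = sum(noisy_feature() for _ in range(25)) / 25

# The averaged estimate is typically far closer to the true value
# than a single draw, which is why repeats sharpen predictions.
print(abs(single - 1.0), abs(averaged - 1.0))
```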
Environment Variables
| Variable | Backend | Description |
|---|---|---|
| `OLLAMA_CLOUD_API_KEY` | ollama-cloud | Ollama Cloud API key |
| `OPENAI_API_KEY` | openai | OpenAI API key |
| `GEMINI_API_KEY` | gemini | Gemini API key |
| `DEEPSEEK_API_KEY` | deepseek | DeepSeek API key |
| `CUSTOM_API_KEY` | custom | Custom API key |
| `LOG_LEVEL` | all | Logging level (DEBUG, INFO, etc.) |
| `LLM_FINGERPRINTER_DATA` | all | Custom data directory path |
Data Storage
When installed via pip, runtime data (fingerprints, trained models, logs) is stored in ~/.llm-fingerprinter/. You can override this with the LLM_FINGERPRINTER_DATA environment variable. When running from a git checkout, data is stored in the project directory (backward compatible).
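The lookup order described above can be sketched as follows; `resolve_data_dir` is an illustrative helper, not part of the package's public API:

```python
# Sketch of the data-directory resolution described above; the helper
# name is illustrative, not the package's actual API.
import os
from pathlib import Path

def resolve_data_dir() -> Path:
    """Return the runtime data directory, honoring LLM_FINGERPRINTER_DATA."""
    override = os.environ.get("LLM_FINGERPRINTER_DATA")
    if override:
        return Path(override).expanduser()
    # Default for pip installs: ~/.llm-fingerprinter/
    return Path.home() / ".llm-fingerprinter"

os.environ["LLM_FINGERPRINTER_DATA"] = "/tmp/fp-data"
print(resolve_data_dir())  # -> /tmp/fp-data
```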
🔧 Custom Backend Deep Dive
The custom backend is the most powerful feature - it allows fingerprinting of any LLM accessible via HTTP, regardless of whether a native backend exists.
How It Works
1. Create an HTTP request template file (JSON format)
2. Include placeholders for `{model}` and `{prompt}`
3. Pass the template to the fingerprinter with `-b custom -r ./template.txt`
4. The system automatically sends requests and analyzes responses
Example: Creating a Custom Template
```json
{
  "url": "https://api.example.com/v1/completions",
  "method": "POST",
  "headers": {
    "Content-Type": "application/json",
    "Authorization": "Bearer {api_key}"
  },
  "body": {
    "model": "{model}",
    "prompt": "{prompt}",
    "max_tokens": 200,
    "temperature": 0.7
  }
}
```
Usage Examples
```shell
# Create your template file
cat > my_llm_template.txt << 'EOF'
{
  "url": "https://my-llm.com/api/generate",
  "method": "POST",
  "headers": {
    "Authorization": "Bearer your-key"
  },
  "body": {
    "model": "{model}",
    "prompt": "{prompt}",
    "max_tokens": 200
  }
}
EOF

# Identify models
llm-fingerprinter identify -b custom -r ./my_llm_template.txt

# Generate training fingerprints
llm-fingerprinter simulate -b custom -r ./my_llm_template.txt --family gpt --num-sims 5

# Test connectivity
llm-fingerprinter test -b custom -r ./my_llm_template.txt

# Pass API key via environment or CLI
export CUSTOM_API_KEY="your-secret-key"
llm-fingerprinter identify -b custom -r ./my_llm_template.txt

# Or pass directly
llm-fingerprinter identify -b custom -r ./my_llm_template.txt -k "your-secret-key"
```
Supported Template Placeholders
| Placeholder | Description | Example |
|---|---|---|
| `{model}` | Model name passed via CLI | `gpt-4`, `llama3.2` |
| `{prompt}` | The fingerprinting prompt | (automatically populated) |
| `{api_key}` | API key from environment or CLI | (injected automatically) |
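For illustration, filling in these placeholders might look like the sketch below; the package's real substitution logic is internal, and `fill_template` is a hypothetical helper:

```python
# Sketch (not the package's code) of placeholder substitution in a
# request template before it is sent to the LLM endpoint.
import json

template = """{
  "url": "https://api.example.com/v1/completions",
  "headers": {"Authorization": "Bearer {api_key}"},
  "body": {"model": "{model}", "prompt": "{prompt}"}
}"""

def fill_template(raw: str, model: str, prompt: str, api_key: str) -> dict:
    # Plain string replacement keeps the literal JSON braces intact,
    # unlike str.format(), which would choke on them.
    filled = (raw.replace("{model}", model)
                 .replace("{prompt}", prompt)
                 .replace("{api_key}", api_key))
    return json.loads(filled)

req = fill_template(template, "llama3.2", "What is 2+2?", "secret")
print(req["body"]["model"])  # -> llama3.2
```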
Pre-built Examples
See ./example/ directory for ready-to-use templates:
- openai_request.txt - OpenAI, Groq, and compatible APIs
- ollama_cloud_request.txt - Ollama Cloud
- ollama_local_request.txt - Local Ollama
Copy and adapt these for your use case!
How It Works
1. 75 prompts across 3 layers:
   - Stylistic: analyze writing style and formatting preferences
   - Behavioral: assess response patterns and decision-making behavior
   - Discriminative: identify model-specific characteristics and inconsistencies
2. Feature extraction: 384-dim embeddings + 12 linguistic + 6 behavioral features (402 dims total)
3. Optional PCA reduction to 64 dimensions
4. Ensemble classification: Random Forest (45%) + SVM (45%) + MLP (10%)
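The final ensemble step can be sketched as a fixed-weight average of per-classifier probabilities. Only the 45/45/10 weighting comes from the text above; the probability values below are made-up stand-ins:

```python
# Sketch of the weighted ensemble vote (45% RF + 45% SVM + 10% MLP);
# the per-classifier probabilities here are illustrative stand-ins.
def ensemble_probs(rf, svm, mlp, weights=(0.45, 0.45, 0.10)):
    """Combine per-family probability dicts with fixed weights."""
    return {family: weights[0] * rf[family]
                    + weights[1] * svm[family]
                    + weights[2] * mlp[family]
            for family in rf}

rf  = {"gpt": 0.80, "llama": 0.15, "gemini": 0.05}
svm = {"gpt": 0.70, "llama": 0.20, "gemini": 0.10}
mlp = {"gpt": 0.60, "llama": 0.30, "gemini": 0.10}

combined = ensemble_probs(rf, svm, mlp)
print(max(combined, key=combined.get))  # -> gpt
```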
Contributing
Contributions are welcome! Whether you're adding support for new models, improving accuracy, or extending to additional clients, please see CONTRIBUTING.md for guidelines.
License
MIT License
Download files
File details
Details for the file llm_fingerprinter-0.2.0.tar.gz.
File metadata
- Download URL: llm_fingerprinter-0.2.0.tar.gz
- Upload date:
- Size: 3.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `8f23243f0c73ab04cc79be2f9772b3ece05667c6510329017b663d11b3a23433` |
| MD5 | `64a12ba32dc9766738e0bc46c6245122` |
| BLAKE2b-256 | `4b7ba074c6d62db876e76f1c6dad62a71ca3f1f833aef96cfde39616ee77f97e` |
File details
Details for the file llm_fingerprinter-0.2.0-py3-none-any.whl.
File metadata
- Download URL: llm_fingerprinter-0.2.0-py3-none-any.whl
- Upload date:
- Size: 48.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e8c6215e95629d7679f0d0d01ca4591737bf7cc3beb3a259acce82e10c07cf57` |
| MD5 | `59b9d52a794f6b9a5def9be17a04e254` |
| BLAKE2b-256 | `5c8ef3c876ec4b5b93a872588cc7deabec8f707fdbf33f9e1d44ba7504bf5192` |