
Black-box LLM fingerprinting system for model identification


LLM Fingerprinting System

A black-box fingerprinting system that identifies the model family behind an LLM (GPT, LLaMA, Mistral, etc.) by analyzing its responses to 75 discriminative prompts. It can also identify fine-tuned models by tracing them back to their base model.

Note: See config.py for the full list of identifiable model families.

A pre-trained classifier model is included in the model directory.


Supported Backends

| Backend | Description | API Key Required |
|---|---|---|
| ollama | Local Ollama instance | ❌ No |
| ollama-cloud | Ollama Cloud API | OLLAMA_CLOUD_API_KEY |
| openai | OpenAI API (or compatible) | OPENAI_API_KEY |
| gemini | Gemini API (or compatible) | GEMINI_API_KEY |
| deepseek | DeepSeek API (or compatible) | DEEPSEEK_API_KEY |
| custom | Custom HTTP request | CUSTOM_API_KEY |

Installation

pip install -r requirements.txt

# Or install as a package
pip3 install -e .

# Optional: Download NLTK data for text processing
python -c "import nltk; nltk.download('punkt_tab'); nltk.download('stopwords')"

Quick Start

Ollama

# Identify the model family, including for fine-tuned models
llm-fingerprinter identify -b ollama --model some-model

# Train your own classifier
# 1. Fingerprint the LLM
llm-fingerprinter simulate -b ollama --model llama3.2 --family llama
# 2. Train on the saved fingerprints
llm-fingerprinter train

Custom - Interact with any LLM via HTTP request

llm-fingerprinter identify -r ./custom_request.txt --api-key <API_KEY>
# See the example folder for a sample custom request file

Ollama Cloud

export OLLAMA_CLOUD_API_KEY="your-key"
llm-fingerprinter simulate -b ollama-cloud --model llama3.2 --family llama

OpenAI

export OPENAI_API_KEY="your-key"
llm-fingerprinter simulate -b openai --model gpt-4 --family gpt

Gemini

export GEMINI_API_KEY="your-key"
llm-fingerprinter simulate -b gemini --model gemini-2.5-pro --family gemini

Deepseek

export DEEPSEEK_API_KEY="your-key"
llm-fingerprinter simulate -b deepseek --model deepseek-v3.2 --family deepseek

Custom API

export CUSTOM_API_KEY="your-key"
llm-fingerprinter simulate -b custom -e http://your-api.com/v1 --model your-model --family llama

Commands

Backend Options (all LLM commands)

| Option | Short | Default | Description |
|---|---|---|---|
| --backend | -b | custom | Backend: ollama, ollama-cloud, openai, deepseek, gemini, custom |
| --endpoint | -e | auto | API endpoint URL |
| --api-key | -k | env var | API key |

simulate

Run fingerprinting simulations for training data.

llm-fingerprinter simulate [OPTIONS]

| Option | Default | Description |
|---|---|---|
| --model | required | Model name |
| --family | required | Family, e.g. gpt, claude, llama, gemini, mistral, qwen, gemma (see config.py for the full list) |
| --num-sims | optional | Number of simulations |
| --repeats | optional | Prompt repeats per simulation |

Examples:

# Ollama local
llm-fingerprinter simulate -b ollama --model llama3.2 --family llama

# Ollama Cloud
llm-fingerprinter simulate -b ollama-cloud --model llama3.2 --family llama

# OpenAI
llm-fingerprinter simulate -b openai --model gpt-4 --family gpt --num-sims 5

# OpenAI-compatible endpoint (here Groq) via the openai backend
llm-fingerprinter simulate -b openai -e https://api.groq.com/openai/v1 -k "$GROQ_KEY" --model llama-3.1-70b --family llama
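
Before pointing simulate at a paid API, it can help to estimate the request volume. The sketch below assumes that every simulation sends each of the 75 prompts once per repeat; that is an assumption about the tool's behavior, not something this README states, and estimate_requests is a hypothetical helper rather than part of the package.

```python
# Rough request-count estimate for a `simulate` run.
# ASSUMPTION: each simulation sends every prompt `repeats` times.
PROMPT_COUNT = 75  # prompt count taken from "How It Works"


def estimate_requests(num_sims: int, repeats: int,
                      prompt_count: int = PROMPT_COUNT) -> int:
    """Return the assumed number of API requests for one simulate run."""
    if num_sims < 1 or repeats < 1 or prompt_count < 1:
        raise ValueError("num_sims, repeats and prompt_count must be >= 1")
    return num_sims * repeats * prompt_count


# 5 simulations, 1 repeat per prompt
print(estimate_requests(num_sims=5, repeats=1))  # 375
```

If the real request pattern differs, only the multiplication inside estimate_requests needs to change.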

train

Train classifier from saved fingerprints.

llm-fingerprinter train [--augment/--no-augment]
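
The simulate and train steps can be chained from a short script. The sketch below only assembles command lines that appear in this README and prints them; the helper names (simulate_cmd, build_pipeline, run_pipeline) and the dry_run switch are illustrative and not part of the package.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence, Tuple

CLI = "llm-fingerprinter"

# (model, family, backend) triples taken from the examples in this README
TARGETS = [
    ("llama3.2", "llama", "ollama"),
    ("gpt-4", "gpt", "openai"),
]


def simulate_cmd(model: str, family: str, backend: Optional[str] = None,
                 num_sims: Optional[int] = None) -> List[str]:
    """Build the argv list for one `simulate` run."""
    cmd = [CLI, "simulate"]
    if backend:
        cmd += ["-b", backend]
    cmd += ["--model", model, "--family", family]
    if num_sims is not None:
        cmd += ["--num-sims", str(num_sims)]
    return cmd


def build_pipeline(targets: Sequence[Tuple[str, str, str]],
                   augment: bool = True) -> List[List[str]]:
    """One simulate command per target, followed by a single train command."""
    cmds = [simulate_cmd(model, family, backend)
            for model, family, backend in targets]
    cmds.append([CLI, "train", "--augment" if augment else "--no-augment"])
    return cmds


def run_pipeline(cmds: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(list(cmd), check=True)  # stop on the first failure


run_pipeline(build_pipeline(TARGETS))  # dry run: prints the commands only
```

Calling run_pipeline(..., dry_run=False) would execute the commands in order, so the API keys for the chosen backends need to be exported first.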

identify

Identify model family using trained classifier.

llm-fingerprinter identify --model <model-name> [-b <backend>]

Other commands

list-models

List available models on the API.

llm-fingerprinter list-models [-b <backend>]

list-fingerprints

List saved fingerprints by family.

llm-fingerprinter list-fingerprints

info

Show configuration and status.

llm-fingerprinter info

Environment Variables

| Variable | Backend | Description |
|---|---|---|
| OLLAMA_CLOUD_API_KEY | ollama-cloud | Ollama Cloud API key |
| OPENAI_API_KEY | openai | OpenAI API key |
| GEMINI_API_KEY | gemini | Gemini API key |
| DEEPSEEK_API_KEY | deepseek | DeepSeek API key |
| CUSTOM_API_KEY | custom | Custom API key |
| LOG_LEVEL | all | Logging level (DEBUG, INFO, etc.) |
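
The mapping from backend to environment variable can be checked before starting a long run. The following is a minimal sketch, assuming that an explicit --api-key value takes precedence over the environment variable (suggested by the "env var" default of --api-key, but not confirmed here); resolve_api_key is a hypothetical helper, not the package's own code.

```python
import os
from typing import Mapping, Optional

# Backend -> environment variable, as listed in the table above
API_KEY_ENV = {
    "ollama-cloud": "OLLAMA_CLOUD_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "custom": "CUSTOM_API_KEY",
}


def resolve_api_key(backend: str, cli_key: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the key to use for `backend`, or None for local Ollama."""
    if backend == "ollama":
        return None  # the local Ollama backend needs no key
    if backend not in API_KEY_ENV:
        raise ValueError(f"unknown backend: {backend!r}")
    if cli_key:
        return cli_key  # ASSUMPTION: the flag overrides the environment
    var = API_KEY_ENV[backend]
    key = env.get(var)
    if not key:
        raise RuntimeError(f"set {var} or pass --api-key for backend {backend!r}")
    return key


print(resolve_api_key("ollama"))  # None
```

Passing a plain dict as env keeps the check easy to exercise without touching the real environment.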

How It Works

  1. 75 prompts across 3 layers:
    • Stylistic: Analyze writing style and formatting preferences
    • Behavioral: Assess response patterns and decision-making behavior
    • Discriminative: Identify model-specific characteristics and inconsistencies
  2. Feature extraction: 384-dim embeddings + 12 linguistic + 6 behavioral features (402 features in total)
  3. PCA reduction to 64 dimensions (optional)
  4. Ensemble classification: Random Forest (45%) + SVM (45%) + MLP (10%)
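
The feature and ensemble steps above can be sketched in plain Python. Two assumptions are made here that the README does not confirm: the three feature groups are concatenated in the order listed, and the percentages are weights in a soft vote over each classifier's per-family probabilities. The optional PCA step is left out, and all names (build_feature_vector, weighted_vote, predict_family) are illustrative.

```python
from typing import Dict, List, Mapping, Sequence

EMBEDDING_DIM, LINGUISTIC_DIM, BEHAVIORAL_DIM = 384, 12, 6

# Ensemble weights from step 4 (ASSUMED to be soft-voting weights)
WEIGHTS = {"random_forest": 0.45, "svm": 0.45, "mlp": 0.10}


def build_feature_vector(embedding: Sequence[float],
                         linguistic: Sequence[float],
                         behavioral: Sequence[float]) -> List[float]:
    """Concatenate the three feature groups into one 402-dim vector."""
    for name, values, expected in (
        ("embedding", embedding, EMBEDDING_DIM),
        ("linguistic", linguistic, LINGUISTIC_DIM),
        ("behavioral", behavioral, BEHAVIORAL_DIM),
    ):
        if len(values) != expected:
            raise ValueError(f"{name}: expected {expected} values, got {len(values)}")
    return [*embedding, *linguistic, *behavioral]


def weighted_vote(probas: Mapping[str, Mapping[str, float]],
                  weights: Mapping[str, float] = WEIGHTS) -> Dict[str, float]:
    """Sum each classifier's family probabilities, scaled by its weight."""
    combined: Dict[str, float] = {}
    for classifier, family_probs in probas.items():
        weight = weights[classifier]
        for family, prob in family_probs.items():
            combined[family] = combined.get(family, 0.0) + weight * prob
    return combined


def predict_family(probas: Mapping[str, Mapping[str, float]]) -> str:
    """Return the family with the highest weighted score."""
    combined = weighted_vote(probas)
    return max(combined, key=combined.get)


# Made-up probabilities, for illustration only
example = {
    "random_forest": {"llama": 0.6, "gpt": 0.4},
    "svm": {"llama": 0.3, "gpt": 0.7},
    "mlp": {"llama": 0.9, "gpt": 0.1},
}
print(predict_family(example))  # gpt
```

In the example the MLP strongly favors llama, but with only a 10% weight it cannot outvote the SVM, which is the practical effect of the uneven weighting.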


Contributing

Contributions are welcome! Whether you're adding support for new models, improving accuracy, or extending to additional clients, please see CONTRIBUTING.md for guidelines.


License

MIT License

