
🕵 LLM Fingerprint

Identify LLMs by their response fingerprints


Research Question

Is it possible to identify which LLM generated a response by analyzing its semantic patterns across multiple standardized prompts?

Approach

LLM Fingerprint uses semantic similarity patterns across multiple prompts to create model-specific "fingerprints":

  1. Fingerprint Creation: Generate multiple responses from known LLMs using standardized prompts

    • Fixed sampling parameters to ensure consistent generation behavior
    • Multiple response samples per prompt to account for sampling variance
    • Several distinct prompts to capture model characteristics
  2. Similarity Analysis: Measure semantic similarity within and between prompt response groups

    • Within-prompt similarity reveals consistency characteristics
    • Cross-prompt similarity patterns create a unique model signature
  3. Model Identification: Match patterns from unknown models against the fingerprint database

    • Generate responses from the unknown model using the same standardized prompts
    • Compare similarity patterns with known models
    • Identify the closest matching fingerprint
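The three steps above can be sketched end-to-end in Python. This is a toy illustration, not the package's implementation: hand-made 2-D vectors stand in for real embeddings, and a fingerprint is reduced to a vector of per-prompt consistency scores:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def fingerprint(responses_by_prompt):
    """Average pairwise similarity within each prompt's response group.

    responses_by_prompt maps prompt_id -> list of response embeddings.
    Returns one consistency score per prompt (the "fingerprint").
    """
    scores = []
    for embs in responses_by_prompt.values():
        pairs = [(i, j) for i in range(len(embs)) for j in range(i + 1, len(embs))]
        scores.append(sum(cosine(embs[i], embs[j]) for i, j in pairs) / len(pairs))
    return scores

# Step 1: fingerprints for known models (toy embeddings; model-a is
# very consistent across samples, model-b less so).
known = {
    "model-a": fingerprint({"p1": [[1, 0], [0.99, 0.1]], "p2": [[0, 1], [0.1, 0.99]]}),
    "model-b": fingerprint({"p1": [[1, 0], [0.5, 0.5]], "p2": [[0, 1], [0.7, 0.7]]}),
}

# Steps 2-3: fingerprint the unknown model with the same prompts,
# then pick the closest known fingerprint (Euclidean distance here).
unknown = fingerprint({"p1": [[1, 0], [0.98, 0.15]], "p2": [[0, 1], [0.12, 0.98]]})
best = min(known, key=lambda m: math.dist(known[m], unknown))
print(best)  # → model-a
```

In this sketch the unknown model's consistency pattern sits much closer to model-a's than to model-b's, so it is identified as model-a.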

Usage

Set required environment variables. See .envrc.example for more details.

Creating Model Fingerprints

# Generate samples for known model responses
llm-fingerprint generate \
  --language-model "model-1" "model-2" "model-3" \
  --prompts-path "./data/prompts/prompts_general_v1.jsonl" \
  --samples-path "samples.jsonl" \
  --samples-num 4

# Upload samples to ChromaDB
llm-fingerprint upload \
  --language-model "embedding-model" \
  --samples-path "samples.jsonl" \
  --collection-name "samples"
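The `generate` command above reads prompts from a JSONL file. As a minimal sketch of producing one, assuming a hypothetical schema of one JSON object per line with `id` and `prompt` keys (the actual fields expected by llm-fingerprint may differ; see the files under `./data/prompts/`):

```python
import json

# Hypothetical prompt entries; field names are an assumption.
prompts = [
    {"id": "p1", "prompt": "Explain recursion in one paragraph."},
    {"id": "p2", "prompt": "Summarize the plot of Hamlet."},
]

# JSONL: one JSON object per line.
with open("prompts_example.jsonl", "w") as f:
    for p in prompts:
        f.write(json.dumps(p) + "\n")
```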

Identifying Unknown Models

# Generate samples for unknown model (or use an external service)
# Let's suppose we don't know we are using model-2
llm-fingerprint generate \
  --language-model "model-2" \
  --prompts-path "./data/prompts/prompts_single_v1.jsonl" \
  --samples-path "unk-samples.jsonl" \
  --samples-num 1

# Query ChromaDB for model identification
llm-fingerprint query \
  --language-model "embedding-model" \
  --samples-path "unk-samples.jsonl" \
  --results-path "results.jsonl" \
  --results-num 2

# results.jsonl will contain the results
# {"model": "model-2", "score": ... }
# {"model": "model-1", "score": ... }

Installation

The preferred way to install llm-fingerprint is using uv (although you can also use pip).

# Clone the repository
git clone https://github.com/S1M0N38/llm-fingerprint.git
cd llm-fingerprint

# Create a virtual environment
uv venv

# Install the package
uv sync  # add --all-groups to also install the ml and dev groups

Requirements

  • Python 3.11+
  • OpenAI-compatible API endpoints (/chat/completions and /embeddings)
  • Access to ChromaDB (locally or hosted)

Contributing

This toy/research project is still in its early stages, and I welcome any feedback, suggestions, and contributions! If you're interested in discussing ideas or have questions about the approach, please start a conversation in GitHub Discussions.

For detailed information on setting up your development environment, understanding the project structure, and the contribution workflow, please refer to CONTRIBUTING.md.

