LLM-DNA

Requires Python 3.10+.

Extract LLM DNA vectors — low-dimensional, training-free representations that capture functional behavior and evolutionary relationships between language models.

📄 Paper: LLM DNA: Tracing Model Evolution via Functional Representations (ICLR 2026 Oral)

Overview

The explosive growth of large language models has created a vast but opaque landscape: millions of models exist, yet their evolutionary relationships through fine-tuning, distillation, or adaptation are often undocumented. LLM-DNA provides a general, scalable, training-free pipeline for extracting LLM DNA: mathematically grounded representations that satisfy inheritance and genetic determinism properties.

Key Features:

  • 🧬 Extract DNA vectors from any HuggingFace or local model
  • 🚀 Training-free, works across architectures and tokenizers
  • 📊 Tested on 305+ LLMs with superior or competitive performance
  • 🔍 Uncover undocumented relationships between models
  • 🌳 Build evolutionary trees using phylogenetic algorithms
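
To make the tree-building feature concrete, here is a minimal sketch of how a set of DNA vectors could be clustered into a tree. It uses UPGMA (average-linkage clustering) over Euclidean distances. Both are illustrative choices made for this example; this README does not state which phylogenetic algorithm or distance LLM-DNA uses. The model names and vectors are made-up stand-ins, and `upgma` is a hypothetical helper, not part of the llm_dna API.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(names, vectors):
    """Build a nested-tuple tree by average-linkage (UPGMA) clustering.

    Hypothetical helper for illustration; assumes at least one model.
    """
    # Each cluster id maps to (subtree, number of leaves in that subtree).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): euclidean(vectors[i], vectors[j])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        pair = min(
            (frozenset(p) for p in combinations(clusters, 2)),
            key=dist.__getitem__,
        )
        a, b = sorted(pair)
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to every remaining cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            dist[frozenset((next_id, c))] = (
                n_a * dist[frozenset((a, c))] + n_b * dist[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


# Made-up 2-D vectors standing in for real DNA vectors.
names = ["base", "finetune", "other"]
vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
tree = upgma(names, vectors)
print(tree)
```

In this toy input the two nearby vectors are merged first, so "base" and "finetune" end up as siblings and "other" joins at the root.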

Installation

pip install llm-dna

Install the package as llm-dna (hyphen) and import it in Python as llm_dna (underscore).

Quick Start

from llm_dna import DNAExtractionConfig, calc_dna

config = DNAExtractionConfig(
    model_name="distilgpt2",
    dataset="rand",
    gpu_id=0,
    max_samples=100,
)

result = calc_dna(config)
print(f"DNA shape: {result.vector.shape}")  # (128,)
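
Once two models have DNA vectors, a natural next step is to compare them. The sketch below uses cosine similarity, which is an assumed choice for illustration rather than a metric this README prescribes. `cosine_similarity` is a hypothetical helper, and the short lists stand in for `result.vector` from two separate `calc_dna` runs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-ins for result.vector from different models (made-up values).
dna_a = [0.2, -0.1, 0.4, 0.0]
dna_b = [0.4, -0.2, 0.8, 0.0]   # same direction as dna_a
dna_c = [-0.4, 0.1, 0.0, 0.3]

sim_ab = cosine_similarity(dna_a, dna_b)
sim_ac = cosine_similarity(dna_a, dna_c)
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

With real outputs, the same function could be called on `list(result.vector)` for each model, provided both vectors were extracted with the same `dna_dim`.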

Python API

from llm_dna import DNAExtractionConfig, calc_dna

config = DNAExtractionConfig(
    model_name="Qwen/Qwen2.5-0.5B-Instruct",
    dataset="rand",
    gpu_id=0,
    max_samples=100,
    dna_dim=128,
    reduction_method="random_projection",  # or "pca", "svd"
    trust_remote_code=True,
)

result = calc_dna(config)

# DNA vector (numpy.ndarray)
vector = result.vector

# Saved paths (when save=True)
print(result.output_path)
print(result.summary_path)
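
For readers unfamiliar with the `"random_projection"` option, the sketch below shows the general technique: multiplying a high-dimensional feature vector by a fixed Gaussian random matrix to get a `dna_dim`-sized vector. This is a generic illustration, not the library's implementation; the function name, the seed handling, and the scaling are assumptions.

```python
import math
import random


def random_projection(features, dna_dim, seed=0):
    """Project `features` down to `dna_dim` values with a Gaussian matrix.

    The matrix is regenerated from `seed` on every call, so the same seed
    always yields the same projection.
    """
    rng = random.Random(seed)
    # Scaling by 1/sqrt(dna_dim) keeps squared norms unchanged in expectation.
    scale = 1.0 / math.sqrt(dna_dim)
    return [
        scale * sum(rng.gauss(0.0, 1.0) * x for x in features)
        for _ in range(dna_dim)
    ]


features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up high-dimensional input
projected = random_projection(features, dna_dim=4, seed=0)
print(len(projected))
```

The point of the fixed seed is that vectors from different models are only comparable if they pass through the same projection matrix.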

CLI

# Single model
calc-dna --model-name distilgpt2 --dataset rand --gpus 0

# Multiple models with round-robin GPU assignment
calc-dna --llm-list ./configs/llm_list.txt --gpus 0,1

# With hyperparameters
calc-dna \
  --model-name mistralai/Mistral-7B-v0.1 \
  --dna-dim 256 \
  --max-samples 200 \
  --reduction-method pca \
  --load-in-8bit
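
The round-robin GPU assignment mentioned above can be pictured with the following sketch. It shows the generic meaning of round-robin only; how `calc-dna` schedules work internally is not documented here, and `assign_gpus` is a hypothetical helper.

```python
def assign_gpus(models, gpus):
    """Pair each model with a GPU id, cycling through `gpus` in order."""
    if not gpus:
        raise ValueError("at least one GPU id is required")
    return [(model, gpus[i % len(gpus)]) for i, model in enumerate(models)]


models = ["distilgpt2", "Qwen/Qwen2.5-0.5B-Instruct", "mistralai/Mistral-7B-v0.1"]
assignment = assign_gpus(models, gpus=[0, 1])
for model, gpu in assignment:
    print(f"{model} -> GPU {gpu}")
```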

Notes

  • Metadata auto-fetched: Model metadata is automatically retrieved from HuggingFace Hub and cached.
  • Auth token: Pass via token=... or set the HF_TOKEN environment variable.
  • Chat templates: Applied automatically when supported by the tokenizer.
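
As a sketch of the auth-token note, the helper below picks a token from an explicit argument or from the `HF_TOKEN` environment variable. The helper name and the precedence (explicit value first) are assumptions made for this example; the README does not say which source wins when both are set.

```python
import os


def resolve_hf_token(explicit_token=None, env=os.environ):
    """Return the explicit token if given, else HF_TOKEN from `env`, else None."""
    # Assumed precedence: an explicitly passed token overrides the environment.
    return explicit_token or env.get("HF_TOKEN")


token = resolve_hf_token()
print("token found" if token else "no token configured")
```

The resolved value could then be supplied wherever the library accepts `token=...`.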

Tests

# All tests (including integration tests with real model loading)
pytest tests/ -v

# Fast tests only (skip real model loading)
pytest tests/ -m "not slow"

Citation

If you use LLM-DNA in your research, please cite:

@inproceedings{wu2026llmdna,
  title={LLM DNA: Tracing Model Evolution via Functional Representations},
  author={Wu, Zhaomin and Zhao, Haodong and Wang, Ziyang and Guo, Jizhou and Wang, Qian and He, Bingsheng},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/pdf?id=UIxHaAqFqQ}
}

License

Apache 2.0
