Extract LLM DNA vectors — low-dimensional representations that capture functional behavior and model evolution.
This project has been archived.
The maintainers of this project have marked this project as archived. No new releases are expected.
Project description
RepTrace
Extract LLM DNA vectors — low-dimensional, training-free representations that capture functional behavior and evolutionary relationships between language models.
📄 Paper: LLM DNA: Tracing Model Evolution via Functional Representations (ICLR 2026 Oral)
Overview
The explosive growth of large language models has created a vast but opaque landscape: millions of models exist, yet their evolutionary relationships through fine-tuning, distillation, or adaptation are often undocumented. RepTrace provides a general, scalable, training-free pipeline for extracting LLM DNA — mathematically-grounded representations that satisfy inheritance and genetic determinism properties.
Key Features:
- 🧬 Extract DNA vectors from any HuggingFace or local model
- 🚀 Training-free, works across architectures and tokenizers
- 📊 Tested on 305+ LLMs with superior or competitive performance
- 🔍 Uncover undocumented relationships between models
- 🌳 Build evolutionary trees using phylogenetic algorithms
Installation
pip install reptrace
Quick Start
from reptrace import DNAExtractionConfig, calc_dna
config = DNAExtractionConfig(
model_name="distilgpt2",
dataset="rand",
gpu_id=0,
max_samples=100,
)
result = calc_dna(config)
print(f"DNA shape: {result.vector.shape}") # (128,)
Python API
from reptrace import DNAExtractionConfig, calc_dna
config = DNAExtractionConfig(
model_name="Qwen/Qwen2.5-0.5B-Instruct",
dataset="rand",
gpu_id=0,
max_samples=100,
dna_dim=128,
reduction_method="random_projection", # or "pca", "svd"
trust_remote_code=True,
)
result = calc_dna(config)
# DNA vector (numpy.ndarray)
vector = result.vector
# Saved paths (when save=True)
print(result.output_path)
print(result.summary_path)
CLI
# Single model
calc-dna --model-name distilgpt2 --dataset rand --gpus 0
# Multiple models with round-robin GPU assignment
calc-dna --llm-list ./configs/llm_list.txt --gpus 0,1
# With hyperparameters
calc-dna \
--model-name mistralai/Mistral-7B-v0.1 \
--dna-dim 256 \
--max-samples 200 \
--reduction-method pca \
--load-in-8bit
Notes
- Metadata auto-fetched: Model metadata is automatically retrieved from HuggingFace Hub and cached.
- Auth token: Pass via
token=...or setHF_TOKENenvironment variable. - Chat templates: Applied automatically when supported by the tokenizer.
Tests
# All tests (including integration tests with real model loading)
pytest tests/ -v
# Fast tests only (skip real model loading)
pytest tests/ -m "not slow"
Citation
If you use RepTrace in your research, please cite:
@inproceedings{wu2026llmdna,
title={LLM DNA: Tracing Model Evolution via Functional Representations},
author={Wu, Zhaomin and Zhao, Haodong and Wang, Ziyang and Guo, Jizhou and Wang, Qian and He, Bingsheng},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/pdf?id=UIxHaAqFqQ}
}
License
Apache 2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file reptrace-0.1.2b1.tar.gz.
File metadata
- Download URL: reptrace-0.1.2b1.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fdfb0f21cdaf8fd5a565944c4f9ba6ebc1e3d829d4259ee0f3ffb04ad283bb96
|
|
| MD5 |
689b2cfe04bed97a8a6c0195c67dd878
|
|
| BLAKE2b-256 |
c343cb2f076b1de6b5f2fd4dc867157a6cf6b832d44e44b3915c9097515900db
|
Provenance
The following attestation bundles were made for reptrace-0.1.2b1.tar.gz:
Publisher:
release.yml on JerryLife/RepTrace
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
reptrace-0.1.2b1.tar.gz -
Subject digest:
fdfb0f21cdaf8fd5a565944c4f9ba6ebc1e3d829d4259ee0f3ffb04ad283bb96 - Sigstore transparency entry: 930453972
- Sigstore integration time:
-
Permalink:
JerryLife/RepTrace@44e3194e770957a1af696891262cab5b3e9c91ff -
Branch / Tag:
refs/tags/v0.1.2b1 - Owner: https://github.com/JerryLife
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@44e3194e770957a1af696891262cab5b3e9c91ff -
Trigger Event:
push
-
Statement type:
File details
Details for the file reptrace-0.1.2b1-py3-none-any.whl.
File metadata
- Download URL: reptrace-0.1.2b1-py3-none-any.whl
- Upload date:
- Size: 10.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
35927ff3d72d4afb6cad67e355bbd95046bc4fd7f6ba710e55dd880782e0f154
|
|
| MD5 |
ff2f822f6abf4c250e8d558d17287d40
|
|
| BLAKE2b-256 |
3897ff9fa3d6ccb8c61c621dcabed82b1a79f715baf81ed09cbf3c4b1107fb10
|
Provenance
The following attestation bundles were made for reptrace-0.1.2b1-py3-none-any.whl:
Publisher:
release.yml on JerryLife/RepTrace
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
reptrace-0.1.2b1-py3-none-any.whl -
Subject digest:
35927ff3d72d4afb6cad67e355bbd95046bc4fd7f6ba710e55dd880782e0f154 - Sigstore transparency entry: 930453975
- Sigstore integration time:
-
Permalink:
JerryLife/RepTrace@44e3194e770957a1af696891262cab5b3e9c91ff -
Branch / Tag:
refs/tags/v0.1.2b1 - Owner: https://github.com/JerryLife
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@44e3194e770957a1af696891262cab5b3e9c91ff -
Trigger Event:
push
-
Statement type: