
Extract LLM DNA vectors — low-dimensional representations that capture functional behavior and model evolution.

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.


RepTrace

Requires Python 3.10+.

Extract LLM DNA vectors — low-dimensional, training-free representations that capture functional behavior and evolutionary relationships between language models.

📄 Paper: LLM DNA: Tracing Model Evolution via Functional Representations (ICLR 2026 Oral)

Overview

The explosive growth of large language models has created a vast but opaque landscape: millions of models exist, yet their evolutionary relationships through fine-tuning, distillation, or adaptation are often undocumented. RepTrace provides a general, scalable, training-free pipeline for extracting LLM DNA: mathematically grounded representations that satisfy inheritance and genetic determinism properties.

Key Features:

  • 🧬 Extract DNA vectors from any HuggingFace or local model
  • 🚀 Training-free, works across architectures and tokenizers
  • 📊 Tested on 305+ LLMs with superior or competitive performance
  • 🔍 Uncover undocumented relationships between models
  • 🌳 Build evolutionary trees using phylogenetic algorithms
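To make the tree-building idea concrete, here is a minimal sketch that clusters DNA vectors into a tree. It is not RepTrace's own method: this page does not say which phylogenetic algorithm the project uses, so UPGMA (average linkage) is swapped in as a simple, well-known stand-in, and Euclidean distance is an assumed metric. The model names and vectors are synthetic placeholders.

```python
import numpy as np


def upgma(names: list[str], dist: np.ndarray) -> str:
    """Build a tree with UPGMA (average linkage) and return its Newick topology."""
    # Each active cluster id maps to (newick_label, member_count).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances between active clusters, keyed by (smaller_id, larger_id).
    d = {
        (i, j): float(dist[i, j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(d, key=d.get)
        (label_a, size_a), (label_b, size_b) = clusters.pop(a), clusters.pop(b)
        # Distance to the merged cluster is the size-weighted mean of the old distances.
        new_d = {}
        for c in clusters:
            d_ac = d[tuple(sorted((a, c)))]
            d_bc = d[tuple(sorted((b, c)))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every distance involving a merged cluster, then add the new ones.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        next_id += 1
    label, _ = next(iter(clusters.values()))
    return label + ";"


# Synthetic DNA vectors: two "families", each with a slightly perturbed descendant.
rng = np.random.default_rng(0)
family_a = rng.normal(size=128)
family_b = rng.normal(size=128)
dna = {
    "model-a": family_a,
    "model-a-ft": family_a + 0.1 * rng.normal(size=128),
    "model-b": family_b,
    "model-b-ft": family_b + 0.1 * rng.normal(size=128),
}
names = list(dna)
vectors = np.stack([dna[n] for n in names])
# Pairwise Euclidean distances between all DNA vectors.
dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)
tree = upgma(names, dist)
print(tree)
```

The printed Newick string groups each base vector with its perturbed copy before joining the two families. UPGMA assumes a roughly constant rate of change along all branches, which may not hold for real model lineages, so treat the result as a quick visualization rather than a faithful phylogeny.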

Installation

pip install reptrace

Quick Start

from reptrace import DNAExtractionConfig, calc_dna

config = DNAExtractionConfig(
    model_name="distilgpt2",
    dataset="rand",
    gpu_id=0,
    max_samples=100,
)

result = calc_dna(config)
print(f"DNA shape: {result.vector.shape}")  # (128,)
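Once two models have DNA vectors, a natural next step is to compare them. The sketch below is not part of RepTrace's documented API. It assumes each `result.vector` is a 1-D NumPy array of equal length and uses cosine similarity as one reasonable choice of metric. The vectors here are synthetic stand-ins for real `calc_dna` output.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors of equal length."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("expected two 1-D vectors of equal length")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)


# Synthetic stand-ins for result.vector (hypothetical data, not real model DNA).
rng = np.random.default_rng(0)
base_dna = rng.normal(size=128)
perturbed_dna = base_dna + 0.1 * rng.normal(size=128)  # small perturbation of the base
independent_dna = rng.normal(size=128)

sim_related = cosine_similarity(base_dna, perturbed_dna)
sim_unrelated = cosine_similarity(base_dna, independent_dna)
print(f"base vs. perturbed copy:     {sim_related:.3f}")
print(f"base vs. independent vector: {sim_unrelated:.3f}")
```

With real models, you would replace the synthetic arrays with the `vector` attribute of two `calc_dna` results extracted under the same configuration, so that both vectors live in the same space.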

Python API

from reptrace import DNAExtractionConfig, calc_dna

config = DNAExtractionConfig(
    model_name="Qwen/Qwen2.5-0.5B-Instruct",
    dataset="rand",
    gpu_id=0,
    max_samples=100,
    dna_dim=128,
    reduction_method="random_projection",  # or "pca", "svd"
    trust_remote_code=True,
)

result = calc_dna(config)

# DNA vector (numpy.ndarray)
vector = result.vector

# Saved paths (when save=True)
print(result.output_path)
print(result.summary_path)
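The `reduction_method="random_projection"` option names a standard dimensionality-reduction technique. The sketch below shows a generic Gaussian random projection, not RepTrace's internal implementation. The 4096-dimensional input, the `random_projection` helper, and the fixed seed are all assumptions for illustration. The seed matters: vectors are only comparable if every model is projected with the same matrix.

```python
import numpy as np


def random_projection(features: np.ndarray, dna_dim: int = 128, seed: int = 0) -> np.ndarray:
    """Project a 1-D feature vector down to dna_dim with a seeded Gaussian matrix."""
    features = np.asarray(features, dtype=np.float64)
    if features.ndim != 1:
        raise ValueError("expected a 1-D feature vector")
    rng = np.random.default_rng(seed)
    # Entries ~ N(0, 1/dna_dim), so squared norms are preserved in expectation.
    projection = rng.normal(
        scale=1.0 / np.sqrt(dna_dim), size=(dna_dim, features.shape[0])
    )
    return projection @ features


# Hypothetical high-dimensional features for two models.
feature_rng = np.random.default_rng(42)
x = feature_rng.normal(size=4096)
y = feature_rng.normal(size=4096)

px, py = random_projection(x), random_projection(y)
print(px.shape)  # (128,)

# Pairwise distances are approximately preserved by the projection.
ratio = np.linalg.norm(px - py) / np.linalg.norm(x - y)
print(f"distance ratio after projection: {ratio:.2f}")
```

Unlike PCA or SVD, a random projection needs no fitting step over a collection of models, which is consistent with a training-free pipeline. The trade-off is that distances are preserved only approximately.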

CLI

# Single model
calc-dna --model-name distilgpt2 --dataset rand --gpus 0

# Multiple models with round-robin GPU assignment
calc-dna --llm-list ./configs/llm_list.txt --gpus 0,1

# With hyperparameters
calc-dna \
  --model-name mistralai/Mistral-7B-v0.1 \
  --dna-dim 256 \
  --max-samples 200 \
  --reduction-method pca \
  --load-in-8bit
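For readers unfamiliar with round-robin assignment, the following sketch shows the general idea behind `--llm-list` combined with `--gpus 0,1`. How RepTrace schedules jobs internally is not documented on this page, so the helper names `parse_gpus` and `assign_round_robin` are hypothetical.

```python
from itertools import cycle


def parse_gpus(arg: str) -> list[int]:
    """Parse a comma-separated value such as "0,1" into [0, 1]."""
    return [int(part) for part in arg.split(",") if part.strip()]


def assign_round_robin(models: list[str], gpu_ids: list[int]) -> list[tuple[str, int]]:
    """Pair each model with a GPU id, cycling through gpu_ids in order."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    return list(zip(models, cycle(gpu_ids)))


models = [
    "distilgpt2",
    "Qwen/Qwen2.5-0.5B-Instruct",
    "mistralai/Mistral-7B-v0.1",
]
assignment = assign_round_robin(models, parse_gpus("0,1"))
for name, gpu in assignment:
    print(f"{name} -> GPU {gpu}")
```

With three models and two GPUs, the first and third models land on GPU 0 and the second on GPU 1. Round-robin ignores model size, so a list that mixes small and large models may leave one GPU far busier than the other.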

Notes

  • Metadata auto-fetched: Model metadata is retrieved automatically from the HuggingFace Hub and cached.
  • Auth token: Pass it via token=... or set the HF_TOKEN environment variable.
  • Chat templates: Applied automatically when supported by the tokenizer.
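The two ways of supplying an auth token can be pictured as a simple lookup. The sketch below is an assumption about precedence (an explicit argument wins over the environment variable), not a description of RepTrace's code. `resolve_hf_token` is a hypothetical helper, and the token strings are placeholders.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(
    token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the explicit token if given, else HF_TOKEN from env, else None."""
    if token:
        return token
    # Fall back to the process environment unless a mapping is supplied (useful in tests).
    source = os.environ if env is None else env
    return source.get("HF_TOKEN") or None


# Placeholder values only; never hard-code a real token.
fake_env = {"HF_TOKEN": "hf_placeholder_from_env"}
print(resolve_hf_token(env=fake_env))
print(resolve_hf_token("hf_placeholder_explicit", env=fake_env))
print(resolve_hf_token(env={}))
```

Setting `HF_TOKEN` in the shell keeps credentials out of scripts and notebooks, which is usually the safer option when code is shared.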

Tests

# All tests (including integration tests with real model loading)
pytest tests/ -v

# Fast tests only (skip real model loading)
pytest tests/ -m "not slow"

Citation

If you use RepTrace in your research, please cite:

@inproceedings{wu2026llmdna,
  title={LLM DNA: Tracing Model Evolution via Functional Representations},
  author={Wu, Zhaomin and Zhao, Haodong and Wang, Ziyang and Guo, Jizhou and Wang, Qian and He, Bingsheng},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/pdf?id=UIxHaAqFqQ}
}

License

Apache 2.0
