Transformer Cloner

Clone and prune transformer models with new tokenizers. Create smaller, more efficient models by mapping vocabularies, reducing dimensions, and pruning layers.

Features

🔄 Vocabulary Mapping: Map tokens from a new tokenizer to original model embeddings
📉 Model Pruning: Reduce hidden size, layers, attention heads, and more
🎯 Multiple Strategies: Combine embeddings using mean, sum, first, last, weighted, max, or min
✅ Validation: Automatic config validation to prevent incompatible architectures
🚀 Fast: Batch processing for efficient token ID mapping

Installation

pip install transformer-cloner

Quick Start

Clone with New Tokenizer

from transformer_cloner import TransformerCloner, EmbeddingStrategy

cloner = TransformerCloner(
    org_model_id="google/gemma-3-270m-it",
    target_tokenizer_id="your-username/custom-tokenizer",
)

# Clone with mean embedding strategy
model = cloner.clone(strategy=EmbeddingStrategy.MEAN)
model.save_pretrained("cloned-model")
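
To make the vocabulary mapping idea concrete, here is a toy sketch of how a token from a new tokenizer might be mapped to one or more tokens of the original model. Everything in it is an assumption for illustration: `greedy_encode`, `build_token_map`, and the two tiny vocabularies are hypothetical, and the library's actual mapping logic may work differently.

```python
# Toy illustration only: real tokenizers are far more involved, and
# transformer-cloner's actual mapping logic may differ from this sketch.

def greedy_encode(text, vocab):
    """Encode text by greedy longest-match against a {piece: id} vocabulary."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"cannot encode {text[i]!r} with this vocabulary")
    return ids


def build_token_map(target_vocab, source_vocab):
    """Map each target token ID to the source token IDs that spell the same text."""
    return {
        target_id: greedy_encode(piece, source_vocab)
        for piece, target_id in target_vocab.items()
    }


# Hypothetical vocabularies: the source model only knows short pieces,
# while the new tokenizer has a longer merged piece.
source_vocab = {"trans": 0, "form": 1, "er": 2, "s": 3}
target_vocab = {"transformer": 0, "trans": 1, "s": 2}

token_map = build_token_map(target_vocab, source_vocab)
print(token_map)
```

In this sketch the target token "transformer" maps to three source tokens, which is the case where an embedding strategy is needed to combine the source embeddings into one vector.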

Prune Model Architecture

from transformer_cloner import TransformerCloner, PruningConfig, EmbeddingStrategy

cloner = TransformerCloner(
    org_model_id="google/gemma-3-270m-it",
    target_tokenizer_id="your-username/custom-tokenizer",
)

# Create a smaller model
pruning_config = PruningConfig(
    hidden_size=320,           # Reduce embedding dimension
    num_hidden_layers=9,       # Fewer layers
    intermediate_size=1024,    # Smaller FFN
    num_attention_heads=2,     # Fewer attention heads
)

model = cloner.clone_pruned(
    pruning_config=pruning_config,
    strategy=EmbeddingStrategy.MEAN,
)
model.save_pretrained("pruned-model")

Vocabulary Pruning (Direct 1:1 Mapping)

from transformer_cloner import TransformerCloner

cloner = TransformerCloner(
    org_model_id="google/gemma-3-270m-it",
    target_tokenizer_id="google/gemma-3-270m-it",  # Same tokenizer
)

# Keep only the first 16,000 tokens
model, tokenizer = cloner.clone_with_vocab_pruning(vocab_size=16000)

model.save_pretrained("vocab-pruned-model")
tokenizer.save_pretrained("vocab-pruned-model")
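
As a rough mental model of direct 1:1 vocabulary pruning, the sketch below keeps the first `vocab_size` rows of an embedding table so that token ID `i` still points at row `i`. It uses plain Python lists and a hypothetical helper, `prune_vocab_rows`; it is not the library's implementation, which also has to produce a matching tokenizer.

```python
def prune_vocab_rows(embedding_rows, vocab_size):
    """Keep the first `vocab_size` rows; token ID i still points at row i."""
    if not 0 < vocab_size <= len(embedding_rows):
        raise ValueError("vocab_size must be between 1 and the current vocabulary size")
    # Copy each kept row so the original table is left untouched.
    return [row[:] for row in embedding_rows[:vocab_size]]


# Hypothetical 4-token embedding table with 2 dimensions per token.
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

pruned = prune_vocab_rows(embeddings, vocab_size=2)
print(pruned)
```

Because the mapping is 1:1, no embedding strategy is involved here: each surviving token keeps its original vector.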

Embedding Strategies

When a target token maps to multiple source tokens, choose how to combine them:

Strategy	Description
`MEAN`	Average of all source embeddings (default)
`SUM`	Sum of all source embeddings
`FIRST`	Use only the first token's embedding
`LAST`	Use only the last token's embedding
`WEIGHTED`	Weighted average (earlier tokens get more weight)
`MAX`	Element-wise maximum
`MIN`	Element-wise minimum
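
The strategies in the table can be sketched in plain Python as follows. This is an illustration, not the library's code: the `combine` function and its string strategy names are hypothetical, and the weights used for the weighted case (1, 1/2, 1/3, ... normalized to sum to 1) are an assumption, since the exact weighting scheme is not documented here.

```python
def combine(vectors, strategy="mean"):
    """Combine a list of equal-length embedding vectors into one vector."""
    if not vectors:
        raise ValueError("need at least one source embedding")
    columns = list(zip(*vectors))  # group values by dimension
    n = len(vectors)
    if strategy == "mean":
        return [sum(col) / n for col in columns]
    if strategy == "sum":
        return [sum(col) for col in columns]
    if strategy == "first":
        return list(vectors[0])
    if strategy == "last":
        return list(vectors[-1])
    if strategy == "max":
        return [max(col) for col in columns]
    if strategy == "min":
        return [min(col) for col in columns]
    if strategy == "weighted":
        # ASSUMPTION: weights 1, 1/2, 1/3, ... so earlier tokens count more.
        raw = [1.0 / (i + 1) for i in range(n)]
        total = sum(raw)
        return [sum(w * v for w, v in zip(raw, col)) / total for col in columns]
    raise ValueError(f"unknown strategy: {strategy}")


# Two hypothetical source embeddings for a single target token.
vectors = [[1.0, 4.0], [3.0, 2.0]]

for name in ("mean", "sum", "first", "last", "weighted", "max", "min"):
    print(name, combine(vectors, name))
```

Note that sum changes the scale of the resulting vector relative to the source embeddings, while mean and the weighted average keep it in the same range.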

Pruning Options

Parameter	Description
`hidden_size`	Embedding dimension
`num_hidden_layers`	Number of transformer layers
`intermediate_size`	FFN intermediate dimension
`num_attention_heads`	Number of attention heads
`num_key_value_heads`	Number of KV heads (for GQA)
`head_dim`	Dimension per attention head

License

MIT License - see LICENSE for details.

Release history

Version	Release date
0.2.10	Dec 30, 2025
0.2.9	Dec 30, 2025
0.2.6	Dec 29, 2025
0.2.5	Dec 29, 2025
0.2.4	Dec 22, 2025
0.2.1	Dec 22, 2025
0.2.0	Dec 21, 2025
0.1.6	Dec 20, 2025
0.1.5	Dec 20, 2025
0.1.4	Dec 20, 2025
0.1.3	Dec 20, 2025
0.1.2	Dec 20, 2025
0.1.1	Dec 20, 2025
0.1.0 (this version)	Dec 20, 2025

Download files

Source Distribution

transformer_cloner-0.1.0.tar.gz (10.7 kB)

Uploaded Dec 20, 2025 Source

Built Distribution

transformer_cloner-0.1.0-py3-none-any.whl (11.7 kB)

Uploaded Dec 20, 2025 Python 3

File details

Details for the file transformer_cloner-0.1.0.tar.gz.

File metadata

Download URL: transformer_cloner-0.1.0.tar.gz
Upload date: Dec 20, 2025
Size: 10.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for transformer_cloner-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`14af8fe1b8dfcbd20fe52f195daa954b216982eda22037f3b11a93a056eebf55`
MD5	`32f577dce1760d53579dece5965e4673`
BLAKE2b-256	`ed5533929ac0f1cd8efb320d980a472173097f953c564b7d750d6c97fbc1aa01`

File details

Details for the file transformer_cloner-0.1.0-py3-none-any.whl.

File metadata

Download URL: transformer_cloner-0.1.0-py3-none-any.whl
Upload date: Dec 20, 2025
Size: 11.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for transformer_cloner-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f88eef07ebea60138d4f32316f39c3d4b25ea56db2ecffda683a99a9130a9f0c`
MD5	`eb8dd9efdb3e35b44ba3033269c10b4c`
BLAKE2b-256	`6c547a6f1074b37a2c87c92c0a9213fc7d9ba584bf1028cbbc76e5ad25278c5d`
