Clone and prune transformer models with new tokenizers

🔄 Transformer Cloner

Clone and prune transformer models with new tokenizers. Create smaller, more efficient models by mapping vocabularies, reducing dimensions, and pruning layers.

Use Cases

🌍 Language Adaptation: Use a custom tokenizer optimized for your language
📉 Model Compression: Create smaller models for edge deployment
🎓 Knowledge Distillation: Generate student models from teacher models
🔬 Research: Experiment with different model architectures

📦 Installation

pip install transformer-cloner

Requirements:

Python 3.10+
PyTorch 2.0+
Transformers 4.40+

📖 Complete API Reference

TransformerCloner

The main class for cloning and pruning transformer models.

from transformer_cloner import TransformerCloner

cloner = TransformerCloner(
    org_model_id: str,         # HuggingFace model ID or local path to original model
    target_tokenizer_id: str,  # HuggingFace tokenizer ID or local path to target tokenizer
    token: Optional[str] = None,  # Optional HuggingFace API token for gated models
)

Attributes after initialization:

cloner.org_model - The loaded original model
cloner.org_tokenizer - The original model's tokenizer
cloner.target_tokenizer - The target tokenizer
cloner.token - The HuggingFace API token (if provided)

Method: `build_token_id_map()`

Build a mapping from target tokenizer IDs to original tokenizer IDs.

token_map = cloner.build_token_id_map(
    batch_size: int = 5000,   # Number of tokens to process per batch (higher = faster but more memory)
    verbose: bool = True,     # Whether to print progress
) -> dict[int, list[int]]     # Returns {target_token_id: [source_token_ids]}

Example:

cloner = TransformerCloner(
    org_model_id="google/gemma-3-270m-it",
    target_tokenizer_id="alibayram/turkish-tokenizer",
)

# Build the token mapping
token_map = cloner.build_token_id_map(batch_size=10000, verbose=True)
# Output: Building token ID map for 65536 tokens...
#         Processed 10000/65536 tokens
#         ...
#         Token ID map built with 65536 entries

print(token_map[100])  # [234, 567] - target token 100 maps to source tokens 234, 567
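
One way to picture the mapping is to take each target token's text and re-encode it with the source tokenizer. The sketch below does this with toy, hand-written vocabularies. `SOURCE_VOCAB`, `TARGET_VOCAB`, and the greedy longest-match `toy_source_encode` are hypothetical stand-ins, and the library's actual procedure may differ.

```python
# Toy illustration of a target -> source token ID map.
# Both vocabularies and the greedy encoder are hypothetical stand-ins,
# not the library's implementation.

SOURCE_VOCAB = {"hel": 567, "lo": 890, "h": 1, "e": 2, "l": 3, "o": 4}
TARGET_VOCAB = {1234: "hello", 1235: "lo", 1236: "he"}


def toy_source_encode(text: str) -> list[int]:
    """Greedy longest-match encoding against SOURCE_VOCAB."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining substring first, then shrink.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in SOURCE_VOCAB:
                ids.append(SOURCE_VOCAB[piece])
                pos = end
                break
        else:
            raise ValueError(f"cannot encode {text[pos:]!r}")
    return ids


def build_toy_token_id_map() -> dict[int, list[int]]:
    """Map every target token ID to the source IDs that spell the same text."""
    return {tid: toy_source_encode(text) for tid, text in TARGET_VOCAB.items()}


token_map = build_toy_token_id_map()
print(token_map[1234])  # [567, 890]
```

A target token that already exists in the source vocabulary maps to a single ID, while one that does not is spelled out with several source tokens, which is the case the embedding strategies are meant to handle.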

Method: `clone()`

Clone the model with a new tokenizer, mapping embeddings from the original model.

model = cloner.clone(
    strategy: EmbeddingStrategy = EmbeddingStrategy.MEAN,  # How to combine multiple source embeddings
    verbose: bool = True,                                   # Whether to print progress
) -> AutoModelForCausalLM                                   # Returns the cloned model

Example:

from transformer_cloner import TransformerCloner, EmbeddingStrategy

cloner = TransformerCloner(
    org_model_id="google/gemma-3-270m-it",
    target_tokenizer_id="alibayram/turkish-tokenizer",
)

# Clone with mean embedding strategy
model = cloner.clone(strategy=EmbeddingStrategy.MEAN, verbose=True)
# Output: Building token ID map for 65536 tokens...
#         Cloning model with strategy: mean
#         Model vocab size: 65536, Tokenizer vocab size: 65536
#         Copying weights from original model...
#         Mapping embeddings...
#         Mapped 1000/65536 embeddings
#         ...
#         Model cloning complete!

# Save the cloned model
model.save_pretrained("./cloned-model")

Method: `clone_with_lm_head()`

Clone the model including the language modeling head (for models with untied weights).

model = cloner.clone_with_lm_head(
    strategy: EmbeddingStrategy = EmbeddingStrategy.MEAN,  # How to combine embeddings
    verbose: bool = True,                                   # Whether to print progress
) -> AutoModelForCausalLM                                   # Returns the cloned model

Example:

# For models where lm_head is NOT tied to embeddings
model = cloner.clone_with_lm_head(strategy=EmbeddingStrategy.WEIGHTED)
# Output: ... (same as clone)
#         Mapping lm_head weights...
#         lm_head mapping complete!

model.save_pretrained("./cloned-with-lm-head")
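
Whether `clone()` or `clone_with_lm_head()` is the better fit depends on whether the output head shares weights with the input embeddings. Hugging Face model configs expose this as `tie_word_embeddings`. The helper below is a hypothetical convenience (not part of transformer-cloner) that reads that flag from any config-like object. `SimpleNamespace` objects stand in for a real config so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def has_untied_lm_head(config) -> bool:
    """Return True when the config says lm_head is NOT tied to the embeddings.

    Hypothetical helper. A missing attribute is treated as "tied", which
    matches the default of PretrainedConfig in Transformers 4.x.
    """
    return not getattr(config, "tie_word_embeddings", True)


# Stand-in configs; in practice pass cloner.org_model.config instead.
tied = SimpleNamespace(tie_word_embeddings=True)
untied = SimpleNamespace(tie_word_embeddings=False)

print(has_untied_lm_head(tied))    # False
print(has_untied_lm_head(untied))  # True
```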

Method: `clone_pruned()`

Clone the model with architecture pruning (smaller hidden size, fewer layers, etc.).

model = cloner.clone_pruned(
    pruning_config: PruningConfig,                          # Configuration for pruned dimensions
    strategy: EmbeddingStrategy = EmbeddingStrategy.MEAN,   # How to combine embeddings
    verbose: bool = True,                                    # Whether to print progress
) -> AutoModelForCausalLM                                    # Returns the pruned model

Example:

from transformer_cloner import TransformerCloner, PruningConfig, EmbeddingStrategy

cloner = TransformerCloner(
    org_model_id="google/gemma-3-270m-it",
    target_tokenizer_id="alibayram/turkish-tokenizer",
)

# Create a smaller model
pruning_config = PruningConfig(
    hidden_size=320,           # Reduce from 640 to 320
    num_hidden_layers=9,       # Reduce from 18 to 9
    intermediate_size=1024,    # Reduce from 2048 to 1024
    num_attention_heads=2,     # Reduce from 4 to 2
    num_key_value_heads=1,     # Keep at 1
)

model = cloner.clone_pruned(
    pruning_config=pruning_config,
    strategy=EmbeddingStrategy.MEAN,
    verbose=True,
)
# Output: Original: hidden=640, layers=18, intermediate=2048, heads=4, kv_heads=1
#         Pruned:   hidden=320, layers=9, intermediate=1024, heads=2, kv_heads=1
#         Model vocab size: 65536
#         Copying and pruning weights from original model...
#         Mapping embeddings with pruning...
#         Pruned model cloning complete!

model.save_pretrained("./pruned-model")
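
The README does not spell out how weights are reduced to the pruned shapes. One simple approach, shown here purely as an assumption, is truncation: keep the first `num_hidden_layers` layers and the leading rows and columns of each weight matrix. The nested-list matrices and the `truncate_matrix` and `truncate_layers` helpers are illustrative only.

```python
# Truncation-style pruning on plain nested lists (illustrative only;
# transformer-cloner may select layers and weights differently).

Matrix = list[list[float]]


def truncate_matrix(weight: Matrix, rows: int, cols: int) -> Matrix:
    """Keep the top-left rows x cols block of a weight matrix."""
    if rows > len(weight) or cols > len(weight[0]):
        raise ValueError("pruned shape must not exceed the original shape")
    return [row[:cols] for row in weight[:rows]]


def truncate_layers(layers: list[Matrix], keep: int, rows: int, cols: int) -> list[Matrix]:
    """Keep the first `keep` layers, truncating each to rows x cols."""
    return [truncate_matrix(w, rows, cols) for w in layers[:keep]]


# Four toy "layers", each a 4x4 matrix whose entries encode (layer, row, col).
layers = [
    [[float(layer * 100 + r * 10 + c) for c in range(4)] for r in range(4)]
    for layer in range(4)
]

pruned = truncate_layers(layers, keep=2, rows=2, cols=2)
print(len(pruned), len(pruned[0]), len(pruned[0][0]))  # 2 2 2
print(pruned[1])  # [[100.0, 101.0], [110.0, 111.0]]
```

Truncation is cheap but discards whatever the dropped rows, columns, and layers had learned, so a model pruned this way would normally be expected to need further training.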

Method: `clone_with_vocab_pruning()`

Clone model with a reduced embedding table (fewer tokens).

model, tokenizer, id_mapping = cloner.clone_with_vocab_pruning(
    keep_token_ids: Optional[list[int]] = None,   # Specific token IDs to keep
    vocab_size: Optional[int] = None,             # Keep first N tokens (ignored if keep_token_ids provided)
    pruning_config: Optional[PruningConfig] = None,  # Optional architecture pruning
    verbose: bool = True,                          # Whether to print progress
) -> tuple[AutoModelForCausalLM, AutoTokenizer, dict[int, int]]
# Returns: (model, original_tokenizer, id_mapping)
# id_mapping: {old_token_id: new_embedding_index}

Note: The original tokenizer is returned unchanged because modifying SentencePiece/BPE vocabularies breaks them. Use id_mapping to convert token IDs to embedding indices.

Example 1: Keep first N tokens

from transformer_cloner import TransformerCloner

cloner = TransformerCloner(
    org_model_id="google/gemma-3-270m-it",
    target_tokenizer_id="google/gemma-3-270m-it",  # Same tokenizer
)

# Keep only first 8000 tokens
model, tokenizer, id_mapping = cloner.clone_with_vocab_pruning(
    vocab_size=8000,
    verbose=True,
)
# Output: Cloning with vocab pruning: 8000 tokens
#         New vocab size: 8000
#         Creating model with vocab_size=8000
#         Copying weights from original model...
#         Mapping embeddings (direct 1:1)...
#         Mapped 8000 embeddings directly
#         Vocab-pruned model cloning complete!

model.save_pretrained("./vocab-pruned-model")

# Use id_mapping to convert token IDs
print(id_mapping)  # {0: 0, 1: 1, 2: 2, ..., 7999: 7999}

Example 2: Keep specific tokens

# Keep only specific token IDs
important_tokens = [0, 1, 2, 100, 200, 500, 1000, 2000, 5000]

model, tokenizer, id_mapping = cloner.clone_with_vocab_pruning(
    keep_token_ids=important_tokens,
    verbose=True,
)

print(id_mapping)
# {0: 0, 1: 1, 2: 2, 100: 3, 200: 4, 500: 5, 1000: 6, 2000: 7, 5000: 8}
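
Because the tokenizer still emits the original IDs, inputs need to be translated through `id_mapping` before they reach the pruned embedding table. The `remap_ids` helper and the choice to fall back to a single `unk_index` for dropped tokens are assumptions for illustration, not part of the package.

```python
def remap_ids(token_ids: list[int], id_mapping: dict[int, int], unk_index: int) -> list[int]:
    """Translate original token IDs into indices of the pruned embedding table.

    IDs that were pruned away are replaced by `unk_index` (an assumed policy).
    """
    return [id_mapping.get(tid, unk_index) for tid in token_ids]


# Mapping from Example 2 above.
id_mapping = {0: 0, 1: 1, 2: 2, 100: 3, 200: 4, 500: 5, 1000: 6, 2000: 7, 5000: 8}

# 999 was not kept, so it falls back to index 0 here.
print(remap_ids([2, 100, 999, 5000], id_mapping, unk_index=0))  # [2, 3, 0, 8]
```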

Example 3: Combined vocab + architecture pruning

from transformer_cloner import TransformerCloner, PruningConfig

cloner = TransformerCloner(
    org_model_id="google/gemma-3-270m-it",
    target_tokenizer_id="google/gemma-3-270m-it",
)

# Combine vocab pruning with architecture pruning
model, tokenizer, id_mapping = cloner.clone_with_vocab_pruning(
    vocab_size=8000,
    pruning_config=PruningConfig(
        hidden_size=320,
        num_hidden_layers=6,
        intermediate_size=1024,
    ),
    verbose=True,
)

# Result: Tiny model with 8000 tokens and smaller architecture
model.save_pretrained("./tiny-model")

Method: `get_token_info()`

Get information about how a specific token is mapped.

info = cloner.get_token_info(
    token: str,  # The token string to look up
) -> dict      # Returns token mapping information

Example:

cloner.build_token_id_map()

info = cloner.get_token_info("hello")
print(info)
# {
#     'token': 'hello',
#     'target_id': 1234,
#     'source_ids': [567, 890],
#     'source_tokens': ['hel', 'lo']
# }

# Token not found
info = cloner.get_token_info("xyz123")
# {'error': "Token 'xyz123' not found in target tokenizer"}

Method: `print_vocab_samples()`

Print sample vocabulary entries from both tokenizers.

cloner.print_vocab_samples(
    n: int = 10,  # Number of samples to print
) -> None

Example:

cloner.print_vocab_samples(n=5)
# Output:
# Original tokenizer samples:
#   0: '<bos>'   0
#   1: '<eos>'   1
#   2: '<pad>'   2
#   3: '▁'       3
#   4: '▁the'    4
# Total: 262144 tokens
#
# Target tokenizer samples:
#   0: '<bos>'   0
#   1: '<eos>'   1
#   2: '<pad>'   2
#   3: '▁'       3
#   4: '▁ve'     4
# Total: 65536 tokens

🎯 EmbeddingStrategy

When a target token maps to multiple source tokens, choose how to combine their embeddings:

from transformer_cloner import EmbeddingStrategy

Strategy	Value	Description	Best For
`MEAN`	`"mean"`	Average of all embeddings	Default, balanced representation
`SUM`	`"sum"`	Sum of all embeddings	Preserving total magnitude
`FIRST`	`"first"`	First token's embedding only	Prefix-focused tokens
`LAST`	`"last"`	Last token's embedding only	Suffix-focused tokens
`WEIGHTED`	`"weighted"`	Weighted average (first tokens weighted more)	Morphological priority
`MAX`	`"max"`	Element-wise maximum	Preserving dominant features
`MIN`	`"min"`	Element-wise minimum	Preserving minimal features

Example:

# Use different strategies
model = cloner.clone(strategy=EmbeddingStrategy.MEAN)     # Average
model = cloner.clone(strategy=EmbeddingStrategy.WEIGHTED) # First tokens matter more
model = cloner.clone(strategy=EmbeddingStrategy.FIRST)    # Only first token
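
To make the differences between strategies concrete, here is a pure-Python sketch that combines two toy source embeddings under each one. The `combine` function is hypothetical, and the `weighted` weights (n, n-1, ..., 1, normalized) are an assumption: the table only says that earlier tokens are weighted more.

```python
# Pure-Python sketch of the combination strategies on toy vectors.
# The WEIGHTED weights (n, n-1, ..., 1, normalized) are an assumption.

Vector = list[float]


def combine(vectors: list[Vector], strategy: str) -> Vector:
    """Combine several source embeddings into one target embedding."""
    if not vectors:
        raise ValueError("need at least one source embedding")
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    n = len(vectors)
    if strategy == "mean":
        return [sum(col) / n for col in columns]
    if strategy == "sum":
        return [sum(col) for col in columns]
    if strategy == "first":
        return list(vectors[0])
    if strategy == "last":
        return list(vectors[-1])
    if strategy == "max":
        return [max(col) for col in columns]
    if strategy == "min":
        return [min(col) for col in columns]
    if strategy == "weighted":
        weights = [n - i for i in range(n)]  # assumed: n, n-1, ..., 1
        total = sum(weights)
        return [sum(w * x for w, x in zip(weights, col)) / total for col in columns]
    raise ValueError(f"unknown strategy: {strategy}")


first_piece = [3.0, 0.0]
second_piece = [0.0, 3.0]
for name in ("mean", "sum", "first", "last", "weighted", "max", "min"):
    print(name, combine([first_piece, second_piece], name))
# mean [1.5, 1.5]
# sum [3.0, 3.0]
# first [3.0, 0.0]
# last [0.0, 3.0]
# weighted [2.0, 1.0]
# max [3.0, 3.0]
# min [0.0, 0.0]
```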

⚙️ PruningConfig

Configuration dataclass for model architecture pruning.

from transformer_cloner import PruningConfig

config = PruningConfig(
    hidden_size: Optional[int] = None,           # Embedding dimension
    num_hidden_layers: Optional[int] = None,     # Number of transformer layers
    intermediate_size: Optional[int] = None,     # FFN intermediate dimension
    num_attention_heads: Optional[int] = None,   # Number of attention heads
    num_key_value_heads: Optional[int] = None,   # Number of KV heads (for GQA)
    head_dim: Optional[int] = None,              # Dimension per attention head
)

Set any value to None to keep the original model's value.

Example configurations:

# Halve the layers only
config = PruningConfig(num_hidden_layers=9)  # 18 -> 9

# Halve all dimensions
config = PruningConfig(
    hidden_size=320,          # 640 -> 320
    num_hidden_layers=9,      # 18 -> 9
    intermediate_size=1024,   # 2048 -> 1024
)

# Tiny model
config = PruningConfig(
    hidden_size=128,
    num_hidden_layers=3,
    intermediate_size=512,
    num_attention_heads=2,
    num_key_value_heads=1,
)

Validation

Use validate() to check if your config is valid before cloning:

errors = config.validate(cloner.org_model.config)
if errors:
    print("Validation errors:", errors)
else:
    print("Config is valid!")

Validation checks:

✅ Dimensions don't exceed the original model's
✅ All values are positive
✅ num_attention_heads is divisible by num_key_value_heads
✅ hidden_size is compatible with attention configuration
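
As a rough picture of what such checks involve, the sketch below re-implements the first three on plain dicts. It is not the package's `validate()`: `validate_sketch` and the error wording are hypothetical, and the `hidden_size` compatibility check is left out because the README does not define it.

```python
# Illustrative re-implementation of the first three checks listed above,
# using plain dicts in place of PruningConfig and the model config.

FIELDS = (
    "hidden_size",
    "num_hidden_layers",
    "intermediate_size",
    "num_attention_heads",
    "num_key_value_heads",
    "head_dim",
)


def validate_sketch(pruned: dict, original: dict) -> list[str]:
    """Return a list of human-readable problems (empty when none are found)."""
    errors = []
    for name in FIELDS:
        value = pruned.get(name)
        if value is None:  # None means "keep the original value"
            continue
        if value <= 0:
            errors.append(f"{name} must be positive, got {value}")
        elif value > original[name]:
            errors.append(f"{name}={value} exceeds original {original[name]}")

    # Resolve the effective head counts after applying the overrides.
    merged = {name: original[name] for name in FIELDS}
    merged.update({k: v for k, v in pruned.items() if v is not None})
    heads = merged["num_attention_heads"]
    kv_heads = merged["num_key_value_heads"]
    if heads > 0 and kv_heads > 0 and heads % kv_heads != 0:
        errors.append(
            f"num_attention_heads={heads} not divisible by num_key_value_heads={kv_heads}"
        )
    return errors


# Values for google/gemma-3-270m-it as given in this README.
gemma = {
    "hidden_size": 640, "num_hidden_layers": 18, "intermediate_size": 2048,
    "num_attention_heads": 4, "num_key_value_heads": 1, "head_dim": 256,
}

print(validate_sketch({"hidden_size": 320, "num_hidden_layers": 9}, gemma))  # []
print(validate_sketch({"num_hidden_layers": 24}, gemma))
# ['num_hidden_layers=24 exceeds original 18']
```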

📊 Gemma-3-270m Architecture Reference

For google/gemma-3-270m-it:

Parameter	Original Value
`hidden_size`	640
`num_hidden_layers`	18
`intermediate_size`	2048
`num_attention_heads`	4
`num_key_value_heads`	1
`head_dim`	256
`vocab_size`	262144

Example pruned configs:

# ~50% size (9 layers, same hidden)
PruningConfig(num_hidden_layers=9)

# ~25% size (half dimensions)
PruningConfig(
    hidden_size=320,
    num_hidden_layers=9,
    intermediate_size=1024,
)

# Tiny (~12% size)
PruningConfig(
    hidden_size=160,
    num_hidden_layers=6,
    intermediate_size=512,
    num_attention_heads=2,
)
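
The size percentages in these comments depend on what is being counted. The back-of-the-envelope estimate below uses the table values for a Gemma-style decoder, ignores norm weights, and assumes tied input/output embeddings. `estimate_params` is a hypothetical helper, not something the package provides. With the original 262144-token vocabulary the embedding table dominates, so halving the layers halves the transformer blocks but keeps roughly 81% of the total. A smaller target vocabulary shifts that balance.

```python
# Rough parameter estimate for a Gemma-style decoder, ignoring norm weights
# and assuming tied input/output embeddings (an approximation for illustration).


def estimate_params(
    vocab_size: int,
    hidden_size: int,
    num_hidden_layers: int,
    intermediate_size: int,
    num_attention_heads: int,
    num_key_value_heads: int,
    head_dim: int,
) -> dict[str, int]:
    """Split an estimated parameter count into embeddings and transformer blocks."""
    q_and_o = 2 * hidden_size * num_attention_heads * head_dim
    k_and_v = 2 * hidden_size * num_key_value_heads * head_dim
    mlp = 3 * hidden_size * intermediate_size  # gate, up and down projections
    blocks = num_hidden_layers * (q_and_o + k_and_v + mlp)
    embeddings = vocab_size * hidden_size
    return {"embeddings": embeddings, "blocks": blocks, "total": embeddings + blocks}


original = dict(
    vocab_size=262144, hidden_size=640, num_hidden_layers=18,
    intermediate_size=2048, num_attention_heads=4,
    num_key_value_heads=1, head_dim=256,
)

full = estimate_params(**original)
half_layers = estimate_params(**{**original, "num_hidden_layers": 9})

print(full)
# {'embeddings': 167772160, 'blocks': 100270080, 'total': 268042240}
print(f"blocks kept: {half_layers['blocks'] / full['blocks']:.0%}")  # blocks kept: 50%
print(f"total kept:  {half_layers['total'] / full['total']:.0%}")    # total kept:  81%
```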

🔧 Complete Workflow Example

from transformer_cloner import TransformerCloner, PruningConfig, EmbeddingStrategy

# 1. Initialize
cloner = TransformerCloner(
    org_model_id="google/gemma-3-270m-it",
    target_tokenizer_id="my-org/turkish-gemma-tokenizer",
)

# 2. Explore the vocabularies
cloner.print_vocab_samples(n=5)

# 3. Build token mapping
token_map = cloner.build_token_id_map()

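# 4. Check a specific token (an 'error' key is returned if it is not in the target tokenizer)
info = cloner.get_token_info("merhaba")
if "error" in info:
    print(info["error"])
else:
    print(f"'merhaba' maps to source tokens: {info['source_tokens']}")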

# 5. Create pruned model
pruning_config = PruningConfig(
    hidden_size=320,
    num_hidden_layers=9,
    intermediate_size=1024,
)

# Validate first
errors = pruning_config.validate(cloner.org_model.config)
if errors:
    raise ValueError(f"Invalid config: {errors}")

# 6. Clone with pruning
model = cloner.clone_pruned(
    pruning_config=pruning_config,
    strategy=EmbeddingStrategy.MEAN,
)

# 7. Save
model.save_pretrained("./turkish-gemma-small")
cloner.target_tokenizer.save_pretrained("./turkish-gemma-small")

print("Done! Model saved to ./turkish-gemma-small")

🧪 Test Scripts

The repository includes test scripts to validate the package with various models:

Script	Description
`test_multi_model_cloning.py`	Test vocab pruning across multiple model families
`test_generation.py`	Test text generation with cloned models
`test_cross_tokenizer_cloning.py`	Test cloning with a custom tokenizer

Run locally:

# Clone with vocab pruning and save locally
python scripts/test_multi_model_cloning.py

# Test generation on saved models
python scripts/test_generation.py

# Clone with a custom Turkish tokenizer
python scripts/test_cross_tokenizer_cloning.py

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE for details.

🙏 Acknowledgments

Built on top of 🤗 Transformers
Inspired by vocabulary adaptation research in multilingual NLP

Release history

0.2.10 (Dec 30, 2025)
0.2.9 (Dec 30, 2025)
0.2.6 (Dec 29, 2025)
0.2.5 (Dec 29, 2025)
0.2.4 (Dec 22, 2025)
0.2.1 (Dec 22, 2025, this version)
0.2.0 (Dec 21, 2025)
0.1.6 (Dec 20, 2025)
0.1.5 (Dec 20, 2025)
0.1.4 (Dec 20, 2025)
0.1.3 (Dec 20, 2025)
0.1.2 (Dec 20, 2025)
0.1.1 (Dec 20, 2025)
0.1.0 (Dec 20, 2025)
