Token translation for language models.
transtokenizers
- Documentation: https://ipieter.github.io/transtokenizer
- GitHub: https://github.com/LAGoM-NLP/transtokenizer
- PyPI: https://pypi.org/project/trans-tokenizers/
- Licence: MIT
Features
- TODO
Usage
from transtokenizers import transform_model
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer and weights of the source model (Mistral-7B).
source_tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
source_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
# Load the tokenizer whose vocabulary the new model should use (Dutch RobBERT-2023).
target_tokenizer = AutoTokenizer.from_pretrained("pdelobelle/robbert-2023-dutch-base")
# Build a model that works with the target tokenizer instead of the source one.
target_model = transform_model(source_model, source_tokenizer=source_tokenizer, target_tokenizer=target_tokenizer)
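To give an intuition for what "token translation" can mean, here is a toy sketch in plain Python. It is not the transtokenizers implementation, and every name in it (`translate_embeddings`, `token_mapping`, the example tokens and weights) is hypothetical. It assumes one simple strategy: each target token receives the weighted average of the embeddings of the source tokens it is mapped to.

```python
# Toy sketch only: NOT the transtokenizers implementation.
# Assumption: a target token's embedding is the weighted average of the
# embeddings of the source tokens it maps to. All names are hypothetical.

def translate_embeddings(source_embeddings, token_mapping, dim):
    """Build target-token embeddings from source-token embeddings.

    source_embeddings: dict of source token -> list of `dim` floats
    token_mapping: dict of target token -> list of (source token, weight)
    """
    target_embeddings = {}
    for target_token, candidates in token_mapping.items():
        # Keep only source tokens that have an embedding and a positive weight.
        known = [(s, w) for s, w in candidates if s in source_embeddings and w > 0]
        total = sum(w for _, w in known)
        if total == 0:
            # No usable mapping: fall back to a zero vector.
            target_embeddings[target_token] = [0.0] * dim
            continue
        vector = [0.0] * dim
        for source_token, weight in known:
            for i, value in enumerate(source_embeddings[source_token]):
                # Normalise the weights so that they sum to one.
                vector[i] += (weight / total) * value
        target_embeddings[target_token] = vector
    return target_embeddings


# Made-up two-dimensional embeddings for three English tokens.
source_embeddings = {"house": [1.0, 0.0], "home": [0.0, 1.0], "cat": [2.0, 2.0]}

# Made-up mapping from Dutch tokens to weighted English tokens.
token_mapping = {
    "huis": [("house", 3.0), ("home", 1.0)],
    "kat": [("cat", 1.0)],
    "xyz": [],  # a token with no known translation
}

target_embeddings = translate_embeddings(source_embeddings, token_mapping, dim=2)
print(target_embeddings["huis"])
```

In this sketch "huis" ends up three quarters of the way towards "house" and one quarter towards "home", while the unmapped token "xyz" falls back to a zero vector. A real system would need a better fallback and a principled way to obtain the mapping weights.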
Credits
If this repo was useful to you, please cite the following paper
Built Distribution
Hashes for trans_tokenizers-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 4f129ed9d74858914788dd4a4ed53ef7e86ad5c0f641e9e1a1e21b17229ad7fa
MD5 | c91900e827dd5d94dbc5edbdf96981ec
BLAKE2b-256 | 13eb120f6991d921a40af783718a341b81ce0c70a5b6c4c2b84f3adae9f46ef6