Trimming the vocabulary of pre-trained multilingual language models for language localization.

Project description

[WIP] LM-Vocab-Trimmer: A Simple Model Compression by Trimming Embedding Matrix

LM-Vocab-Trimmer, a.k.a. vocabtrimmer, is a model compression tool that reduces the parameter size of multilingual LMs by trimming unused tokens from the embedding matrix. The library assumes that you want to use, or have already fine-tuned, a multilingual LM in a few specific languages, and that the LM no longer needs to cover the other languages. vocabtrimmer then removes the tokens of the out-of-scope languages from both the input and the output embedding matrices.
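
The trimming idea described above can be illustrated with a minimal sketch. This is not the vocabtrimmer API: the function name `trim_embedding`, the toy vocabulary, and the plain-list embedding matrix are all assumptions made for illustration. It only shows the core step of keeping the rows that belong to in-scope tokens and reassigning contiguous token ids.

```python
from typing import Dict, List, Tuple


def trim_embedding(
    vocab: Dict[str, int],
    embedding: List[List[float]],
    keep_tokens: List[str],
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Hypothetical helper: keep only the embedding rows of `keep_tokens`."""
    keep = set(keep_tokens)
    # Sort by the original id so the kept tokens preserve their relative order.
    kept = sorted((idx, tok) for tok, idx in vocab.items() if tok in keep)
    # Reassign contiguous ids 0..len(kept)-1 to the surviving tokens.
    new_vocab = {tok: new_id for new_id, (_, tok) in enumerate(kept)}
    # Copy the surviving rows; all other rows are dropped.
    new_embedding = [list(embedding[old_id]) for old_id, _ in kept]
    return new_vocab, new_embedding


# Toy data (assumed): a five-token vocabulary with 2-dimensional embeddings.
vocab = {"<pad>": 0, "hello": 1, "bonjour": 2, "world": 3, "monde": 4}
embedding = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

# Keep only the special token and the French tokens.
new_vocab, new_embedding = trim_embedding(
    vocab, embedding, ["<pad>", "bonjour", "monde"]
)
print(new_vocab)
print(len(embedding), "->", len(new_embedding), "rows")
```

In a real model the same row selection would presumably be applied to both the input and the output embedding matrices, and the tokenizer would need to be updated to match the new ids.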

Multilingual LMs (mT5, mBART, XLM-R, etc) are

is a Python

Download files

Source Distribution

vocabtrimmer-0.0.1.tar.gz (10.3 kB)

Uploaded Source

File details

Details for the file vocabtrimmer-0.0.1.tar.gz.

File metadata

  • Download URL: vocabtrimmer-0.0.1.tar.gz
  • Upload date:
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.10

File hashes

Hashes for vocabtrimmer-0.0.1.tar.gz

Algorithm    Hash digest
SHA256       8b5f912ce596fd7bdcdee2ba47f19f294bd66671f144a66e1bb52aadc51a52a2
MD5          9150fcc6ed5ea80cd7a9f5f24cef5099
BLAKE2b-256  09e2438d343f6b29df40a8fe15c76877f9a1df95491862f1cc7e68a287ad28a2
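
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches_expected` are hypothetical, and the local path of the archive is an assumption.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for vocabtrimmer-0.0.1.tar.gz.
EXPECTED_SHA256 = "8b5f912ce596fd7bdcdee2ba47f19f294bd66671f144a66e1bb52aadc51a52a2"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the archive sits in the current directory.
    archive = Path("vocabtrimmer-0.0.1.tar.gz")
    if archive.exists():
        print("SHA256 OK" if matches_expected(archive) else "SHA256 mismatch")
```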
