mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models

mplm-sim is a language similarity tool providing:

  • Loader: loads precomputed, high-quality language similarity results directly.
  • Executor: computes similarity results from scratch for your own model and corpus.

Quickstart

Clone the repo to use it directly, or install from PyPI:

pip install mplm_sim

or install directly from GitHub with pip:

pip install --upgrade git+https://github.com/cisnlp/mPLM-Sim.git#egg=mplm_sim

Loader

from mplm_sim import Loader

# Load precomputed results for a given model_name and corpus_name
loader = Loader.from_pretrained(model_name='cis-lmu/glot500-base', corpus_name='flores200')
# Or load results from your own similarity file
# loader = Loader.from_tsv('your_similarity_file.tsv')

# Get the similarity for a language pair,
# identified either by iso3_script code
sim = loader.get_sim('eng_Latn', 'cmn_Hani')
# or by language name
sim = loader.get_sim('English', 'Chinese')
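
A common use of pairwise similarity is picking the closest source languages for a given target. The sketch below is one way to do that; it is not part of mplm-sim. It assumes `get_sim` returns a number where larger means more similar. The helper `rank_by_similarity`, the stand-in `fake_get_sim`, and the scores are all hypothetical and exist only so the snippet runs without downloading anything.

```python
from typing import Callable, Iterable, List, Tuple


def rank_by_similarity(
    target: str,
    candidates: Iterable[str],
    get_sim: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k candidates most similar to target, best first.

    Hypothetical helper: assumes get_sim(a, b) returns a score where
    a larger value means the two languages are more similar.
    """
    scored = [(lang, get_sim(lang, target)) for lang in candidates if lang != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Stand-in for loader.get_sim. The scores are made up for illustration
# and are NOT real mPLM-Sim results.
_FAKE_SCORES = {
    frozenset(("eng_Latn", "deu_Latn")): 0.9,
    frozenset(("eng_Latn", "fra_Latn")): 0.8,
    frozenset(("eng_Latn", "cmn_Hani")): 0.5,
}


def fake_get_sim(lang_a: str, lang_b: str) -> float:
    return _FAKE_SCORES[frozenset((lang_a, lang_b))]


ranked = rank_by_similarity(
    "eng_Latn",
    ["cmn_Hani", "deu_Latn", "fra_Latn", "eng_Latn"],
    fake_get_sim,
    top_k=2,
)
print(ranked)
```

With a real loader, passing `loader.get_sim` in place of `fake_get_sim` should work as long as it accepts two language identifiers and returns a comparable score.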

Executor

from mplm_sim import Executor

# model_name: any text/speech language model supported by Hugging Face
# corpus_name: name under which the results for this corpus are saved
# corpus_path: path to the multi-parallel corpora, see corpora_demo for file formatting
# corpus_type: 'text' or 'speech'
executor = Executor(model_name='cis-lmu/glot500-base', corpus_name='own',
                    corpus_path='corpora/own', corpus_type='text')

# Run
executor.run()

Citation

@article{DBLP:journals/corr/abs-2305-13684,
  author       = {Peiqin Lin and
                  Chengzhi Hu and
                  Zheyu Zhang and
                  Andr{\'{e}} F. T. Martins and
                  Hinrich Sch{\"{u}}tze},
  title        = {mPLM-Sim: Unveiling Better Cross-Lingual Similarity and Transfer in
                  Multilingual Pretrained Language Models},
  journal      = {CoRR},
  volume       = {abs/2305.13684},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2305.13684},
  doi          = {10.48550/ARXIV.2305.13684},
  eprinttype    = {arXiv},
  eprint       = {2305.13684},
  timestamp    = {Mon, 05 Jun 2023 15:42:15 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2305-13684.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
