mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
mplm-sim is a language similarity tool providing:
- Loader: access high-quality, precomputed language similarity results directly.
- Executor: compute similarity results from scratch.
Quickstart
Clone the repository, or install the package from PyPI:
```bash
pip install mplm_sim
```
or install it directly from GitHub with pip:
```bash
pip install --upgrade git+https://github.com/cisnlp/mPLM-Sim.git#egg=mplm_sim
```
Loader
```python
from mplm_sim import Loader

# Load existing results for a given model_name and corpus_name
loader = Loader.from_pretrained(model_name='cis-lmu/glot500-base', corpus_name='flores200')
# Or load results from your own similarity file
# loader = Loader.from_tsv('your_similarity_file.tsv')

# Get the similarity of a language pair, identified either by
# iso3_script code (ISO 639-3 language code plus script, e.g. eng_Latn)
sim = loader.get_sim('eng_Latn', 'cmn_Hani')
# or by language name
sim = loader.get_sim('English', 'Chinese')
```
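To illustrate what a pair lookup over a stored similarity file involves, here is a minimal, self-contained sketch. The square-matrix TSV layout, the `NAME_TO_CODE` alias table, and the `SimilarityTable` class are hypothetical assumptions for illustration only; they are not the actual file format or internals of `mplm_sim`'s `Loader`.

```python
import csv
import io
from typing import Dict, Tuple

# Hypothetical alias table: mplm_sim's real name-to-code mapping may differ.
NAME_TO_CODE = {"English": "eng_Latn", "Chinese": "cmn_Hani"}


class SimilarityTable:
    """Hypothetical stand-in for a pairwise similarity lookup (not mplm_sim's Loader)."""

    def __init__(self, scores: Dict[Tuple[str, str], float]):
        self.scores = scores

    @classmethod
    def from_tsv_text(cls, text: str) -> "SimilarityTable":
        # Assumed layout: a header row of iso3_script codes, then one row per
        # language whose first cell is its code and whose other cells are scores.
        rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
        header = rows[0][1:]
        scores = {}
        for row in rows[1:]:
            for other, value in zip(header, row[1:]):
                scores[(row[0], other)] = float(value)
        return cls(scores)

    def get_sim(self, lang_a: str, lang_b: str) -> float:
        # Accept either an iso3_script code or a language name from the alias table.
        code_a = NAME_TO_CODE.get(lang_a, lang_a)
        code_b = NAME_TO_CODE.get(lang_b, lang_b)
        return self.scores[(code_a, code_b)]


# Toy values for illustration only, not real mPLM-Sim results.
toy_tsv = "\teng_Latn\tcmn_Hani\neng_Latn\t1.0\t0.42\ncmn_Hani\t0.42\t1.0\n"
table = SimilarityTable.from_tsv_text(toy_tsv)
print(table.get_sim("English", "Chinese"))  # 0.42
```

Resolving names to codes before the lookup keeps a single canonical key per language, so both call styles shown above hit the same stored score.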
Executor
```python
from mplm_sim import Executor

# model_name: any text or speech language model supported by Hugging Face
# corpus_name: name of the corpus, used when saving the results
# corpus_path: path to the multi-parallel corpus; see corpora_demo for the file format
# corpus_type: 'text' or 'speech'
executor = Executor(model_name='cis-lmu/glot500-base', corpus_name='own',
                    corpus_path='corpora/own', corpus_type='text')
# Run the similarity computation
executor.run()
```
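Computing similarities from scratch means embedding every sentence of the multi-parallel corpus and then comparing languages. The sketch below shows one plausible scoring step: average the cosine similarity of line-aligned sentence embeddings over all parallel sentences. This is an assumption for illustration, not necessarily what `Executor.run()` does internally (see the paper for the actual method); the helper names are hypothetical, and the toy two-dimensional vectors stand in for sentence embeddings that a model such as `cis-lmu/glot500-base` would produce for each corpus line.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def language_similarity(emb_a: List[Vector], emb_b: List[Vector]) -> float:
    """Average cosine similarity over line-aligned (parallel) sentence embeddings."""
    if len(emb_a) != len(emb_b) or not emb_a:
        raise ValueError("languages must have the same, non-zero number of parallel sentences")
    return sum(cosine(u, v) for u, v in zip(emb_a, emb_b)) / len(emb_a)


def similarity_matrix(embeddings: Dict[str, List[Vector]]) -> Dict[Tuple[str, str], float]:
    """Similarity for every ordered pair of languages (hypothetical helper)."""
    return {
        (a, b): language_similarity(embeddings[a], embeddings[b])
        for a in embeddings
        for b in embeddings
    }


# Toy embeddings: two parallel sentences per language, for illustration only.
emb = {
    "eng_Latn": [[1.0, 0.0], [0.0, 1.0]],
    "deu_Latn": [[1.0, 0.0], [0.0, 1.0]],
    "cmn_Hani": [[0.0, 1.0], [0.0, 1.0]],
}
sims = similarity_matrix(emb)
print(sims[("eng_Latn", "cmn_Hani")])  # 0.5
```

Keeping the embedding step separate from the scoring step means the same scoring code applies to text and speech models alike; only the per-sentence vectors change.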
Citation
```bibtex
@article{DBLP:journals/corr/abs-2305-13684,
  author     = {Peiqin Lin and
                Chengzhi Hu and
                Zheyu Zhang and
                Andr{\'{e}} F. T. Martins and
                Hinrich Sch{\"{u}}tze},
  title      = {mPLM-Sim: Unveiling Better Cross-Lingual Similarity and Transfer in
                Multilingual Pretrained Language Models},
  journal    = {CoRR},
  volume     = {abs/2305.13684},
  year       = {2023},
  url        = {https://doi.org/10.48550/arXiv.2305.13684},
  doi        = {10.48550/ARXIV.2305.13684},
  eprinttype = {arXiv},
  eprint     = {2305.13684},
  timestamp  = {Mon, 05 Jun 2023 15:42:15 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2305-13684.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```
Download files
File details for mplm_sim-0.1.0.tar.gz (source distribution)
- Download URL: mplm_sim-0.1.0.tar.gz
- Size: 9.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.8

| Algorithm | Hash digest |
|---|---|
| SHA256 | 2dd40f40c27ace8f745d0ab1d59b1e28eca8fda2cb67667c2b5284c7449ea776 |
| MD5 | 291db56c68e5494e4d4a9f471ecd3ace |
| BLAKE2b-256 | 43d9ddc3fca946d7beec4e7a554c71a703192c7f7bc616f3e921c4ac748c0bff |
File details for mplm_sim-0.1.0-py2.py3-none-any.whl (built distribution)
- Download URL: mplm_sim-0.1.0-py2.py3-none-any.whl
- Size: 9.8 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.8

| Algorithm | Hash digest |
|---|---|
| SHA256 | 8c222f2cc84892a04afb8f53132f0afb4b57006117396e71ffbe589bac404f31 |
| MD5 | 4b4f330aadd664345d5067af4bf6b217 |
| BLAKE2b-256 | 48aeb658e7624a475148d0cb1dba809fee8ca7fe5e881c2ee4a8b893ba0a1b20 |
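The published SHA256 digests let you confirm that a downloaded distribution file is intact. A minimal sketch using Python's standard `hashlib`; `sha256_of_file` and `verify_sha256` are hypothetical helper names, not part of `mplm_sim`.

```python
import hashlib
import pathlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with pathlib.Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """True if the file's SHA256 digest matches the published one (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example (assumes the wheel was downloaded into the current directory):
# verify_sha256("mplm_sim-0.1.0-py2.py3-none-any.whl",
#               "8c222f2cc84892a04afb8f53132f0afb4b57006117396e71ffbe589bac404f31")
```

pip can enforce the same check itself in hash-checking mode, via `--hash=sha256:...` entries in a requirements file.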