mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
mplm-sim is a language similarity tool providing:
- Loader: Accessing high-quality language similarity results directly.
- Executor: Obtaining similarity results from scratch.
Quickstart
Download the repo for use, or alternatively install from PyPI
pip install mplm_sim
or directly with pip from GitHub
pip install --upgrade git+https://github.com/cisnlp/mPLM-Sim.git#egg=mplm_sim
Loader
from mplm_sim import Loader
# loading existing results given model_name and corpus_name
loader = Loader.from_pretrained(model_name='cis-lmu/glot500-base', corpus_name='flores200')
# Or loading results given similarity file
# loader = Loader.from_tsv('your_similarity_file.tsv')
# Getting similarity given language pairs
# iso3_script
sim = loader.get_sim('eng_Latn', 'cmn_Hani')
# or language name
sim = loader.get_sim('English', 'Chinese')
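As an illustration of how pairwise scores might be used when picking source languages for transfer, the sketch below ranks candidate languages by their similarity to a target. It relies only on a `get_sim(lang_a, lang_b)` method like the one shown above, and assumes that the method returns a number where higher means more similar. `rank_sources`, `StubLoader`, and the scores inside it are hypothetical stand-ins so the example runs without downloading anything; in practice a real `Loader` would be passed instead.

```python
from typing import List, Tuple


def rank_sources(loader, target: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Return (language, score) pairs sorted from most to least similar to target."""
    scored = [(lang, float(loader.get_sim(lang, target))) for lang in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


class StubLoader:
    """Hypothetical stand-in for a Loader; the scores are made up for illustration."""

    _scores = {
        frozenset(("deu_Latn", "nld_Latn")): 0.9,
        frozenset(("eng_Latn", "nld_Latn")): 0.8,
        frozenset(("cmn_Hani", "nld_Latn")): 0.3,
    }

    def get_sim(self, lang_a: str, lang_b: str) -> float:
        # Look the pair up regardless of argument order.
        return self._scores[frozenset((lang_a, lang_b))]


ranking = rank_sources(StubLoader(), "nld_Latn", ["cmn_Hani", "eng_Latn", "deu_Latn"])
print(ranking)
```

Because `rank_sources` only needs an object with a `get_sim` method, the same function should work with a loader built from a pretrained result set or from a TSV file.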
Executor
from mplm_sim import Executor
# model_name: any text/speech language model supported by Hugging Face
# corpus_name: specific corpus name for saving
# corpus_path: path for multi-parallel corpora, see corpora_demo for file formatting
# corpus_type: text or speech
executor = Executor(model_name='cis-lmu/glot500-base', corpus_name='own',
                    corpus_path='corpora/own', corpus_type='text')
# Run
executor.run()
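To give a rough idea of what computing similarity "from scratch" over a multi-parallel corpus can involve, the sketch below scores a language pair as the average cosine similarity between embeddings of aligned sentences. This is a simplified illustration under stated assumptions, not necessarily what `Executor` does internally: the embeddings are tiny hand-written vectors rather than model outputs, and `cosine` and `language_similarity` are hypothetical helper names.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equally long, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def language_similarity(emb_a, emb_b) -> float:
    """Average cosine similarity over aligned sentence embeddings."""
    if len(emb_a) != len(emb_b) or not emb_a:
        raise ValueError("expected two non-empty, equally long lists of embeddings")
    return sum(cosine(u, v) for u, v in zip(emb_a, emb_b)) / len(emb_a)


# Toy embeddings: row i in each list stands for the same sentence in a different language.
lang_x = [[1.0, 0.0], [0.0, 1.0]]
lang_y = [[1.0, 0.0], [0.0, 2.0]]  # same directions as lang_x
lang_z = [[0.0, 1.0], [1.0, 0.0]]  # orthogonal to lang_x, sentence by sentence

sim_xy = language_similarity(lang_x, lang_y)
sim_xz = language_similarity(lang_x, lang_z)
print(sim_xy, sim_xz)
```

The alignment requirement is why the corpus needs to be multi-parallel: each sentence must have a counterpart in every language being compared.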
Citation
@article{DBLP:journals/corr/abs-2305-13684,
author = {Peiqin Lin and
Chengzhi Hu and
Zheyu Zhang and
Andr{\'{e}} F. T. Martins and
Hinrich Sch{\"{u}}tze},
title = {mPLM-Sim: Unveiling Better Cross-Lingual Similarity and Transfer in
Multilingual Pretrained Language Models},
journal = {CoRR},
volume = {abs/2305.13684},
year = {2023},
url = {https://doi.org/10.48550/arXiv.2305.13684},
doi = {10.48550/ARXIV.2305.13684},
eprinttype = {arXiv},
eprint = {2305.13684},
timestamp = {Mon, 05 Jun 2023 15:42:15 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2305-13684.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}