A toolbox for various tasks in the area of vector space models of computational linguistics.

Project description

A library for distributed pretraining and finetuning of language models.

Supported features:

  • vanilla pre-training of BERT-like models

  • distributed training on multi-node/multi-GPU systems

  • benchmarking/finetuning on the following tasks:
    • all GLUE

    • MNLI + additional validation on HANS

    • more coming soon

  • using siamese architectures for finetuning

Pretraining

Pretraining a model:

mpirun -np N python -m langmo.pretraining config.yaml
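A hypothetical minimal pretraining config could look like the sketch below; the keys are borrowed from the finetuning example further down in this page, and the actual set of options accepted for pretraining is defined by langmo itself:

model_name: "bert-base-uncased"
batch_size: 32
cnt_epochs: 1
path_results: ./logs
max_lr: 0.0001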

langmo saves two types of snapshots, including checkpoints in pytorch_lightning format.

To resume a crashed or aborted pretraining session:

mpirun -np N python -m langmo.pretraining.resume path_to_run

Finetuning/Evaluation

Finetuning on one of the GLUE tasks:

mpirun -np N python -m langmo.benchmarks.GLUE config.yaml glue_task

supported tasks: cola, rte, stsb, mnli, mnli-mm, mrpc, sst2, qqp, qnli
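For example, a finetuning run on RTE with 4 MPI processes (the process count and config file name are placeholders):

mpirun -np 4 python -m langmo.benchmarks.GLUE config.yaml rte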

The NLI task has an additional, special implementation which supports validation on the adversarial HANS dataset, as well as additional statistics for each label/heuristic.

To perform finetuning on NLI, run:

mpirun -np N python -m langmo.benchmarks.NLI config.yaml

Finetuning on extractive question-answering tasks:

mpirun -np N python -m langmo.benchmarks.QA config.yaml qa_task

supported tasks: squad, squad_v2

example config file:

model_name: "roberta-base"
batch_size: 32
cnt_epochs: 4
path_results: ./logs
max_lr: 0.0005
siamese: true
freeze_encoder: false
encoder_wrapper: pooler
shuffle: true

Automatic evaluation

langmo supports automatic scheduling of evaluation runs for a model saved in a given location, or for all snapshots found in the ./snapshots folder. To configure this, the user has to create the following files:

./configs/langmo.yaml with an entry “submit_command” corresponding to the job submission command of a given cluster. If this file is not present, the jobs will not be submitted to the job queue, but executed immediately, one by one, on the same node.
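For example, on a Slurm cluster this file could contain just the submission command (sbatch is used here purely as an illustration):

submit_command: "sbatch"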

./configs/auto_finetune.inc - the content of this file will be copied to the beginning of the job scripts. Place directives for e.g. the Slurm job scheduler here, such as which resource group to use, how many nodes to allocate, the time limit, etc. Set all necessary environment variables, in particular NUM_GPUS_PER_NODE and PL_TORCH_DISTRIBUTED_BACKEND (MPI, NCCL or GLOO). Finally, add the mpirun command with the necessary options and end the file with a newline. The command to invoke langmo in the right way will be appended automatically.
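A hypothetical auto_finetune.inc for a Slurm cluster might look like the sketch below; the partition, node count, GPU count, backend and process count are placeholders, not values prescribed by langmo:

#!/bin/bash
#SBATCH --partition=regular
#SBATCH --nodes=2
#SBATCH --time=01:00:00
export NUM_GPUS_PER_NODE=4
export PL_TORCH_DISTRIBUTED_BACKEND=NCCL
# the generated langmo invocation will be appended after this line
mpirun -np 8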

./configs/auto_finetune.yaml - any parameters, such as batch size etc., to override the defaults in a fine-tuning run.
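For instance, to override some of the defaults shown in the example config above (the values are arbitrary):

batch_size: 64
cnt_epochs: 3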

To schedule evaluation jobs, run from the login node:

python -m langmo.benchmarks path_to_model task_name

The results will be saved in the eval/task_name/run_name/ subfolder of the folder in which the model is saved.
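For example, assuming a model saved under ./logs/my_model (a placeholder path), scheduling an RTE evaluation and locating its results could look like this:

python -m langmo.benchmarks ./logs/my_model rte
ls ./logs/my_model/eval/rte/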

Fugaku notes

Add these lines before the return statement of _compare_version in pytorch_lightning/utilities/imports.py:

if str(pkg_version).startswith(version):
    return True

This sed command should do the trick:

sed -i -e '/pkg_version = Version(pkg_version.base_version/a\    if str(pkg_version).startswith(version):\n\        return True' \
  ~/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/imports.py
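To verify that the lines were actually inserted, one can grep for them (same path as above):

grep -n "startswith(version)" ~/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/imports.py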
