
CompassionAI Project Garland - machine translation for classical Tibetan


CompassionAI project Garland - neural machine translation from classical Tibetan

Machine translation from classical literary Tibetan. The current focus is on:

  • Custom neural machine translation models that build on Hugging Face Transformers.
  • Using short-sentence translation models published by major research labs, such as FAIR, as backbone models for translating long texts.

Eventually, these techniques may generalize to other low-resource languages.
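To illustrate the general idea of using a short-input model as a backbone for long texts, here is a minimal sketch: split the text on the Tibetan shad (U+0F0D), greedily pack consecutive segments into chunks under a length budget, and translate each chunk. The character budget (a stand-in for a token budget), the splitting rule, and the function names are assumptions for illustration only; they are not garland's segmenters or its contextual algorithm.

```python
from typing import Callable, List

TIBETAN_SHAD = "\u0f0d"  # the Tibetan shad, a common clause/sentence delimiter


def split_on_shad(text: str) -> List[str]:
    """Split Tibetan text after each shad, keeping the delimiter with its segment."""
    segments: List[str] = []
    current = ""
    for ch in text:
        current += ch
        if ch == TIBETAN_SHAD:
            segments.append(current.strip())
            current = ""
    if current.strip():  # trailing text without a final shad
        segments.append(current.strip())
    return segments


def pack_segments(segments: List[str], max_chars: int) -> List[str]:
    """Greedily pack consecutive segments into chunks of at most max_chars characters.

    A single segment longer than max_chars becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current = ""
    for seg in segments:
        candidate = f"{current} {seg}" if current else seg
        if current and len(candidate) > max_chars:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = seg
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_long_text(text: str, translate: Callable[[str], str], max_chars: int = 200) -> str:
    """Translate a long text by running a short-input model (any callable) over packed chunks."""
    chunks = pack_segments(split_on_shad(text), max_chars)
    return " ".join(translate(chunk) for chunk in chunks)
```

Here translate is any callable wrapping a short-sentence model. Chunks are translated independently, which is exactly the limitation that contextual approaches try to address.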

Installation

This library supports two modes: inference and research. The instructions below are for Linux.

  • Inference should work on macOS and Windows, mutatis mutandis.
  • We very strongly recommend doing research only on Linux. We will not provide support for research tasks on other operating systems.

Virtual environment

We strongly recommend using a virtual environment for all your Python package installations, including anything from CompassionAI. To facilitate this, we provide a simple Conda environment YAML file in the CompassionAI/common repo. We recommend first installing Miniconda (see https://docs.conda.io/en/main/miniconda.html) and then Mamba (see https://github.com/mamba-org/mamba).

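# Run the Miniconda installer downloaded from the link above, then open a new shell
bash Miniconda3-latest-Linux-x86_64.sh
# Install Mamba into the base environment
conda install -n base -c conda-forge mamba
# Create the environment from env-minimal.yml in a local clone of CompassionAI/common
cd compassionai/common
mamba env create -f env-minimal.yml -n my-env
conda activate my-env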

Inference

Just install with pip:

pip install compassionai-garland
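To confirm the installation from Python, a small check with the standard library is sketched below. The distribution name is the one used with pip; the import name cai_garland is an assumption inferred from the source path mentioned under Research.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name as used with pip
    print("compassionai-garland:", installed_version("compassionai-garland"))
    # Import name assumed to be cai_garland (inferred from the repository layout)
    print("cai_garland importable:", find_spec("cai_garland") is not None)
```

From the shell, pip show compassionai-garland gives the same version information.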

Research

Begin by installing for inference. Then install the CompassionAI data registry repo and set two environment variables:

$CAI_TEMP_PATH
$CAI_DATA_BASE_PATH

We strongly recommend setting them with conda in your virtual environment:

conda activate my-env
# Directory on a mount point with plenty of space; does not need to be fast
conda env config vars set CAI_TEMP_PATH=/path/to/temp/dir
# Absolute path to the CompassionAI data registry
conda env config vars set CAI_DATA_BASE_PATH=/path/to/data-registry
conda activate my-env  # reactivate so the new variables take effect

Our code uses these environment variables to load datasets from the registry, output processed datasets and store training results.
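A minimal sketch of reading these two variables with early validation, so that a missing setting fails with a clear message instead of a confusing path error later. The helper name cai_paths is hypothetical and not part of cai_garland; how the library itself reads the variables is not described here.

```python
import os
from pathlib import Path
from typing import Dict

# Variable names from the instructions above
REQUIRED_VARS = ("CAI_TEMP_PATH", "CAI_DATA_BASE_PATH")


def cai_paths() -> Dict[str, Path]:
    """Read the CompassionAI path variables, raising early if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing environment variable(s): {', '.join(missing)}. "
            "Set them with `conda env config vars set` and reactivate the environment."
        )
    return {name: Path(os.environ[name]) for name in REQUIRED_VARS}
```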

Usage

Inference

This is a supporting library for our main inference repos, such as Lotsawa. You shouldn't need to use it directly.

Research

This library implements neural machine translation models from classical Tibetan to English, with experiments for other target languages as well. It contains:

  • Dataset preparation code; in particular, see cai_garland/data/parallel_dataset_prep.py.
  • Implementations of modified tokenizers and neural model architectures that build on Hugging Face Transformers.
  • Training drivers to fine-tune models on tasks relevant to translation, such as translation itself or text segmentation.
  • Utility code for the above, including simple libraries of preprocessors and segmenters, as well as a translation utility class that implements the core loops of our contextual translation algorithms.
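As a rough illustration of what a contextual translation loop can look like, the sketch below translates segments in order and passes a sliding window of preceding source segments to the model as context. The Translator interface, the window size, and the choice of source-side context are assumptions made for this example; they are not taken from garland's translation utility class.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

# Hypothetical model interface: (preceding source segments, current segment) -> translation
Translator = Callable[[Sequence[str], str], str]


def contextual_translate(
    segments: Sequence[str], translate: Translator, context_window: int = 2
) -> List[str]:
    """Translate segments in order, giving the model the preceding source segments as context."""
    context: Deque[str] = deque(maxlen=context_window)  # keeps only the most recent segments
    outputs: List[str] = []
    for segment in segments:
        outputs.append(translate(list(context), segment))
        context.append(segment)  # the oldest segment drops out once the window is full
    return outputs
```

A real implementation would also have to fit the context and the current segment into the backbone model's input length, for example by shrinking the window when the combined input is too long.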


Download files


Source Distribution

compassionai-garland-0.1.0.tar.gz (77.5 kB)

Uploaded Source

Built Distribution


compassionai_garland-0.1.0-py3-none-any.whl (116.1 kB)

Uploaded Python 3

File details

Details for the file compassionai-garland-0.1.0.tar.gz.

File metadata

  • Download URL: compassionai-garland-0.1.0.tar.gz
  • Upload date:
  • Size: 77.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for compassionai-garland-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3e5a304d8b944a03fc4b414740d5db18da718f2119ce1e766d7d6c2369ed65e4
MD5 82106159a6815a87b60904988b58d4a3
BLAKE2b-256 a9d428cd4ff96a108ac312e88d2cf6a98c37baf22b0eb0ef0c8a8fe1c5ff75a0
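To check a downloaded source distribution against the SHA256 digest listed above, a minimal sketch using only the standard library could look like this. The function names are illustrative, and the file path in the usage note assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for compassionai-garland-0.1.0.tar.gz
EXPECTED_SDIST_SHA256 = "3e5a304d8b944a03fc4b414740d5db18da718f2119ce1e766d7d6c2369ed65e4"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

Usage: verify(Path("compassionai-garland-0.1.0.tar.gz"), EXPECTED_SDIST_SHA256). pip's hash-checking mode, enabled by --hash entries in a requirements file, is a built-in alternative that rejects any download whose digest does not match.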


File details

Details for the file compassionai_garland-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for compassionai_garland-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e6e7e6cf8fbd9863cfa31ec79aceb0b109820e85ddff2e60f7440dc8a6691244
MD5 f9646df2542662af8e4bc044516c371f
BLAKE2b-256 181f3f75df37f1f420281c99c566dc5c20f31c04ae790092fca09f459218ac19

