CompassionAI Project Manas - a bidirectional Tibetan transformer

CompassionAI project Manas - classical Tibetan language understanding models

Monolingual classical literary Tibetan modeling. The current focus is on pretrained transformer models for:

  • Monolingual tasks that are useful for teaching people to read Tibetan, especially word segmentation, part-of-speech tagging, and named entity recognition.
  • Use as an encoder for the machine translation model.

Installation

There are two modes for this library - inference and research. We provide instructions for Linux.

  • Inference should work on macOS and Windows mutatis mutandis.
  • We very strongly recommend doing research only on Linux. We will not provide any support to people trying to perform research tasks without installing Linux.

Virtual environment

We strongly recommend using a virtual environment for all your Python package installations, including anything from CompassionAI. To facilitate this, we provide a simple Conda environment YAML file in the CompassionAI/common repo. We recommend first installing Miniconda (see https://docs.conda.io/en/main/miniconda.html) and then installing Mamba (see https://github.com/mamba-org/mamba).

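# Install Miniconda using the downloaded installer script
bash Miniconda3-latest-Linux-x86_64.sh
# Install Mamba from the conda-forge channel
conda install mamba -c conda-forge
# From your copy of the CompassionAI/common repo, create and activate the environment
cd compassionai/common
mamba env create -f env-minimal.yml -n my-env
conda activate my-env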

Inference

Just install with pip:

pip install compassionai-manas
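
To confirm that the package landed in the active environment, one option is to query the installed distribution metadata. The following is a minimal sketch using only the Python standard library. The helper name `installed_version` is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="compassionai-manas"):
    """Return the installed version string of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not part of the library.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("compassionai-manas is not installed in this environment")
    else:
        print(f"compassionai-manas {found} is installed")
```

Running this inside the activated environment reports whichever version pip resolved there.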

Research

Begin by installing for inference. Then install the CompassionAI data registry repo and set two environment variables:

$CAI_TEMP_PATH
$CAI_DATA_BASE_PATH

We strongly recommend setting them with conda in your virtual environment:

conda activate my-env
conda env config vars set CAI_TEMP_PATH=<directory on a mountpoint with plenty of space; does not need to be fast>
conda env config vars set CAI_DATA_BASE_PATH=<absolute path to the CompassionAI data registry>

Our code uses these environment variables to load datasets from the registry, output processed datasets and store training results.
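
Before starting a long research run, it can help to check that both variables are visible to Python. The following is a minimal sketch under stated assumptions: the helper name `check_cai_env` is hypothetical, and the rule that each variable must point at an existing directory is an assumption made for illustration, not necessarily what the library itself enforces.

```python
import os
from pathlib import Path

# The two variable names come from the instructions above.
REQUIRED_VARS = ("CAI_TEMP_PATH", "CAI_DATA_BASE_PATH")


def check_cai_env(environ=os.environ):
    """Return {name: Path} for both variables.

    Raises RuntimeError if a variable is unset, empty, or does not point at an
    existing directory (an assumed sanity check, not the library's own logic).
    """
    resolved = {}
    for name in REQUIRED_VARS:
        value = environ.get(name)
        if not value:
            raise RuntimeError(f"{name} is not set")
        path = Path(value)
        if not path.is_dir():
            raise RuntimeError(f"{name}={value!r} is not an existing directory")
        resolved[name] = path
    return resolved


if __name__ == "__main__":
    try:
        for var_name, var_path in check_cai_env().items():
            print(f"{var_name} -> {var_path}")
    except RuntimeError as err:
        print(f"Environment check failed: {err}")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.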

Usage

Inference

This is a supporting library for our main inference repos, such as Lotsawa. You shouldn't need to use it directly.

Research

This library implements language understanding for classical Tibetan:

  • Tokenization.
  • Pre-training code.
  • Fine-tuning on language understanding tasks, such as word segmentation and part-of-speech tagging.


Source Distribution

compassionai-manas-0.2.2.tar.gz (27.0 kB)

Uploaded Source

Built Distribution

compassionai_manas-0.2.2-py3-none-any.whl (29.6 kB)

Uploaded Python 3

File details

Details for the file compassionai-manas-0.2.2.tar.gz.

File metadata

  • Download URL: compassionai-manas-0.2.2.tar.gz
  • Upload date:
  • Size: 27.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.0

File hashes

Hashes for compassionai-manas-0.2.2.tar.gz
Algorithm Hash digest
SHA256 55381d4636cde28f9adce0a6185c1698510f3e6ebbb331bdd3503c61c59727c9
MD5 04abc7e45d07eed6c4ce1518679d3839
BLAKE2b-256 55c21f4e01ca4d0cbb95a72c2bdc8572194d1c173455b2659750994324681889
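
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library's `hashlib`. The helper names `sha256_of` and `matches_published` are hypothetical, and the local file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("compassionai-manas-0.2.2.tar.gz", "<SHA256 value from the listing>")` returns `True` only when the local file's digest equals the published one.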

File details

Details for the file compassionai_manas-0.2.2-py3-none-any.whl.

File hashes

Hashes for compassionai_manas-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 973cdb0a402fa44f7d28f8230a4922e0a2a56c60a74297795883f7b7e7d74ecd
MD5 c8eb64b6b4efb7d907db26b514353a1f
BLAKE2b-256 c679303b1792789e13dd2ef5ca900ac6340bc19aad879d1287e4828a23de8f39
