calamanCy: NLP pipelines for Tagalog

calamanCy is a Tagalog natural language preprocessing framework built on spaCy. Its goal is to provide pipelines and datasets for downstream NLP tasks. This repository contains material for using calamanCy, for reproducing its results, and guides on usage.

calamanCy takes inspiration from other language-specific spaCy Universe frameworks such as DaCy, huSpaCy, and graCy. The name is based on calamansi, a citrus fruit native to the Philippines and used in traditional Filipino cuisine.

🔧 Installation

To get started with calamanCy, simply install it using pip by running the following line in your terminal:

pip install calamanCy
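
To confirm that the install succeeded, one option is to ask Python's packaging metadata for the installed version. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, not part of calamanCy, and it returns None rather than raising when the package is absent.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("calamanCy"))
```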

Development

If you are developing calamanCy, first clone the repository:

git clone git@github.com:ljvmiranda921/calamanCy.git

Then, create a virtual environment and install the dependencies:

python -m venv venv
venv/bin/pip install -e .  # requires pip>=23.0
venv/bin/pip install ".[dev]"  # quoted so shells such as zsh do not expand the brackets

# Activate the virtual environment
source venv/bin/activate

Alternatively, run make dev.

👩‍💻 Usage

To use calamanCy, you first have to download the medium, large, or transformer model. To see a list of all available models, run:

import calamancy
for model in calamancy.models():
    print(model)

# ..
# tl_calamancy_md-0.1.0
# tl_calamancy_lg-0.1.0
# tl_calamancy_trf-0.1.0
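
The names listed above look like a pipeline name and a version joined by a hyphen. Assuming that pattern holds (it is inferred from the three names shown, not from documented behaviour), a small helper can split them, which may be handy when choosing a model programmatically. `split_model_name` is a hypothetical name and is not part of calamancy.

```python
from typing import Tuple


def split_model_name(model: str) -> Tuple[str, str]:
    """Split a name such as 'tl_calamancy_md-0.1.0' into (pipeline name, version).

    Assumes the version follows the last hyphen; raises ValueError otherwise.
    """
    name, sep, version = model.rpartition("-")
    if not sep or not name or not version:
        raise ValueError(f"unexpected model name: {model!r}")
    return name, version


for model in ["tl_calamancy_md-0.1.0", "tl_calamancy_lg-0.1.0", "tl_calamancy_trf-0.1.0"]:
    print(split_model_name(model))
```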

To download and load a model, run:

import calamancy

nlp = calamancy.load("tl_calamancy_md-0.1.0")
doc = nlp("Ako si Juan de la Cruz")

The nlp object is an instance of spaCy's Language class, so you can use it like any other spaCy pipeline.
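
Because the result of calling nlp is a regular spaCy Doc, the usual token and entity attributes apply. The following is a minimal sketch, assuming the loaded pipeline includes a tagger and an entity recognizer; if a component is missing, spaCy leaves `pos_` as an empty string and `doc.ents` empty. `describe` is a hypothetical helper, not part of calamancy.

```python
def describe(doc):
    """Collect (text, part-of-speech) pairs and (text, label) entity pairs from a spaCy Doc."""
    tokens = [(token.text, token.pos_) for token in doc]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return {"tokens": tokens, "entities": entities}


# Usage, assuming calamancy and a model are installed:
#   import calamancy
#   nlp = calamancy.load("tl_calamancy_md-0.1.0")
#   print(describe(nlp("Ako si Juan de la Cruz")))
```

The helper only reads attributes, so it works with any spaCy pipeline, not just the Tagalog ones.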

📦 Models and Datasets

calamanCy provides Tagalog models and datasets that you can use in your spaCy pipelines. You can download them directly or access them through the calamancy Python library. The training procedure for each pipeline is in the models/ directory, which is subdivided by version. Each version folder is a spaCy project.

Download files

Source Distribution

calamanCy-0.1.0.tar.gz (5.1 kB)

Uploaded Source

Built Distribution

calamanCy-0.1.0-py3-none-any.whl (5.2 kB)

Uploaded Python 3

File details

Details for the file calamanCy-0.1.0.tar.gz.

File metadata

  • Download URL: calamanCy-0.1.0.tar.gz
  • Upload date:
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for calamanCy-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        3b5f8cf49ca0adb771acebe29f3769afd890215eb20c51699274f8117cba0075
MD5           bf008999ad5ac7bc43458ff0a5e900d0
BLAKE2b-256   d93dc86dc47eda1a477b9275865577b07252405a95f6196f4efd162b68711251
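
A downloaded file can be checked against the SHA256 digest listed above. The sketch below uses only Python's hashlib; the helper names and the local file path are assumptions for illustration, and the file is read in chunks so that large archives do not need to fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digest copied from the listing for calamanCy-0.1.0.tar.gz.
EXPECTED = "3b5f8cf49ca0adb771acebe29f3769afd890215eb20c51699274f8117cba0075"

# Usage, assuming the sdist was downloaded to the current directory:
#   print(matches_sha256("calamanCy-0.1.0.tar.gz", EXPECTED))
```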

File details

Details for the file calamanCy-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: calamanCy-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for calamanCy-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        ca9e4499c5c6188ae9c22be25735610544ff6304d30286337397cfda89101e25
MD5           7de05e20f40dc12bea02b9c2fc1f9a6b
BLAKE2b-256   1c01d3d0bc595c2541ede7bd3acba64f07a57ffe30da92a660071b0e87682b54
