Python package for Natural Language Processing (NLP), focused on low-resource languages spoken in Mexico.

Project description

Py-Elotl

This is a project of Comunidad Elotl.

Developed by:

Requires python>=3.8

Installation

Using pip

pip install elotl

From source

git clone https://github.com/ElotlMX/py-elotl.git
cd py-elotl
pip install -e .

Use

Working with corpora

import elotl.corpus

Listing available corpora

print("Name\t\tDescription")
list_of_corpus = elotl.corpus.list_of_corpus()
for row in list_of_corpus:
    print(row)

Output:

Name		Description
['axolotl', 'Is a Spanish-Nahuatl parallel corpus']
['tsunkua', 'Is a Spanish-otomí parallel corpus']
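Because each row is a two-item list holding a name and a description, the listing can be turned into a lookup table. The following is a minimal sketch that uses the two rows shown above as stand-in data instead of calling the package; `rows` and `descriptions` are illustrative names, not part of the elotl API.

```python
# Stand-in for the value returned by elotl.corpus.list_of_corpus(),
# copied from the output shown above.
rows = [
    ["axolotl", "Is a Spanish-Nahuatl parallel corpus"],
    ["tsunkua", "Is a Spanish-otomí parallel corpus"],
]

# Map each corpus name to its description.
descriptions = {name: description for name, description in rows}

for name in sorted(descriptions):
    print(f"{name}\t{descriptions[name]}")
```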

Loading a corpus

If a non-existent corpus is requested, a value of 0 is returned.

axolotl = elotl.corpus.load('axolotlr')
if axolotl == 0:
    print("The name entered does not correspond to any corpus")
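Callers that prefer an exception over checking for 0 could wrap the loader. This is only a sketch: `load_or_raise` and `fake_load` are hypothetical helpers, and `fake_load` merely imitates the behaviour described above (a list for a known name, 0 otherwise) so the example runs without the package installed. In real use, `elotl.corpus.load` would be passed as `loader`.

```python
def load_or_raise(name, loader):
    """Call loader(name) and raise if it signals an unknown corpus with 0."""
    corpus = loader(name)
    # Check the type too, so only the integer sentinel triggers the error.
    if isinstance(corpus, int) and corpus == 0:
        raise ValueError(f"No corpus named {name!r}")
    return corpus


def fake_load(name):
    """Hypothetical stand-in for elotl.corpus.load (assumption, not the real API)."""
    known = {"axolotl": [["es", "nah", "", "doc"]]}
    return known.get(name, 0)


try:
    load_or_raise("axolotlr", fake_load)
except ValueError as error:
    print(error)
```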

If an existing corpus is entered, a list is returned.

axolotl = elotl.corpus.load('axolotl')
for row in axolotl:
    print(row)

Output:

[
    'Hay que adivinar: un pozo, a la mitad del cerro, te vas a encontrar.',
    'See tosaasaanil, see tosaasaanil. Tias iipan see tepeetl, iitlakotian tepeetl, tikoonextis san see aameyalli.',
    '',
    'Adivinanzas nahuas'
]

Each element of the list has four indices:

  • non_original_language
  • original_language
  • variant
  • document_name

tsunkua = elotl.corpus.load('tsunkua')
for row in tsunkua:
    print(row[0])  # non_original_language
    print(row[1])  # original_language
    print(row[2])  # variant
    print(row[3])  # document_name

Output:

Una vez una señora se emborrachó
nándi na ra t'u̱xú bintí
Otomí del Estado de México (ots)
El otomí de toluca, Yolanda Lastra
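Positional indices are easy to mix up, so the four fields can be given names. The sketch below assumes each row is a four-item list in the order listed above and uses a sample row copied from the tsunkua output; `CorpusRow` is an illustrative type, not something the package provides.

```python
from typing import NamedTuple


class CorpusRow(NamedTuple):
    """Named view over one corpus row (illustrative, not part of elotl)."""
    non_original_language: str
    original_language: str
    variant: str
    document_name: str


# Sample row copied from the tsunkua output above.
raw_row = [
    "Una vez una señora se emborrachó",
    "nándi na ra t'u̱xú bintí",
    "Otomí del Estado de México (ots)",
    "El otomí de toluca, Yolanda Lastra",
]

row = CorpusRow(*raw_row)
print(row.original_language)
print(row.variant)
```

A `NamedTuple` still supports indexing, so `row[0]` and `row.non_original_language` refer to the same value.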

Package structure

The following structure is a reference. As the package grows it will be better documented.

├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── dist
├── docs
├── elotl                           Top-level package
    ├── corpora                     Corpus data
    ├── corpus                      Subpackage to load corpora
    ├── huave                       Huave language subpackage
        └── orthography.py          Module to normalize Huave orthography and phonemes
    ├── __init__.py                 Initializes the package
    ├── nahuatl                     Nahuatl language subpackage
        └── orthography.py          Module to normalize Nahuatl orthography and phonemes
    ├── otomi                       Otomi language subpackage
        └── orthography.py          Module to normalize Otomi orthography and phonemes
    ├── __pycache__
    └── utils                       Subpackage with common functions and files
        └── fst                     Finite State Transducer functions
            └── att                 Module with static .att files
├── LICENSE
├── Makefile
├── MANIFEST.in
├── pyproject.toml
├── README.md
└── tests

Development

Requirements

  • python>=3.8
  • HFST
  • GNU make
  • poetry
    • For python packaging backend and virtualenvs

Quick build

poetry env use 3.x
poetry shell
make all

Where 3.x is your local Python version. See the Poetry documentation on managing environments.

Step by step

Build FSTs

Build the FSTs with make.

make fst

Create a virtual environment and activate it.

poetry env use 3.x
poetry shell

Update pip and generate distribution files.

python -m pip install --upgrade pip
poetry build

Testing the package locally

python -m pip install -e .

Send to PyPI

poetry publish

Remember to configure your PyPI credentials first.

License

Mozilla Public License 2.0 (MPL 2.0)
