Utility library for analysis & (pre)processing of Yorùbá text

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
Programming Language
Topic
- Software Development
- Utilities

Project description

Ìrànlọ́wọ́

Ìrànlọ́wọ́ is a set of utilities to analyze & process Yorùbá text for NLP tasks. The focus is on helping software developers build large, clean text datasets for (further) diacritic restoration and machine translation tasks.

Features

ADR tools

Strip all diacritics from word-types
Verify that text is NFC or NFD
Normalize a corpus (from MS Word or elsewhere) → NFC
Split long sentences on certain characters such as ; , and :
Automatically restore correct diacritics using a pre-trained model
Find all variants of each word-type in a given corpus
Partially strip diacritics from word-types
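
The ideas behind several of the ADR tools above can be sketched with Python's standard library alone. This is a minimal illustration, not the iranlowo API: every function name below is hypothetical, and treating "partial" stripping as "remove tone marks, keep the under-dot" is an assumption made for the example.

```python
import re
import unicodedata

# Assumption for this sketch: tone marks are the combining grave, acute and
# macron; the combining dot below (U+0323) is not in this set, so it is kept.
TONE_MARKS = {"\u0300", "\u0301", "\u0304"}


def is_nfc(text):
    """Return True if the text is already in NFC form."""
    return unicodedata.normalize("NFC", text) == text


def to_nfc(text):
    """Normalize text from any source to NFC."""
    return unicodedata.normalize("NFC", text)


def strip_all_diacritics(text):
    """Decompose to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_tone_marks(text):
    """Partially strip: drop tone marks but keep under-dots, then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def split_long_sentence(text, delimiters=";,:"):
    """Split on any of the delimiter characters and drop empty pieces."""
    pattern = "[" + re.escape(delimiters) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]
```

Working on the NFD form makes stripping a simple filter over combining marks; recomposing to NFC afterwards keeps the output consistent with the normalized corpus.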

Ready to use webpage scrapers

Bíbélì Mímọ́ (Biblica, Bible Society of Nigeria)
Yorùbá Blog
BBC Yorùbá

Corpus analysis tools

Dataset character distribution
Dataset ambiguity statistics → Lexdif, etc. for a given corpus
Dataset scoring (proximity to correctly diacritized text, LM perplexity, KL divergence)
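
As a rough illustration of what these statistics measure, the sketch below computes a character distribution, a KL divergence between two such distributions, and a simple ambiguity ratio (diacritized word-types per undiacritized form). It uses only the standard library; the function names are hypothetical, and the ratio is only in the spirit of Lexdif, not necessarily the definition the package implements.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def char_distribution(text):
    """Relative frequency of each non-whitespace character, after NFC."""
    chars = [ch for ch in unicodedata.normalize("NFC", text) if not ch.isspace()]
    counts = Counter(chars)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def kl_divergence(p, q, epsilon=1e-9):
    """KL(p || q) in nats; characters missing from q get a small floor."""
    return sum(pv * math.log(pv / q.get(ch, epsilon)) for ch, pv in p.items())


def strip_diacritics(word):
    """Decompose to NFD and drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def word_variants(tokens):
    """Group the distinct word-types by their undiacritized form."""
    groups = defaultdict(set)
    for token in tokens:
        groups[strip_diacritics(token)].add(unicodedata.normalize("NFC", token))
    return groups


def ambiguity_ratio(tokens):
    """Average number of distinct word-types per undiacritized form."""
    groups = word_variants(tokens)
    if not groups:
        return 0.0
    return sum(len(variants) for variants in groups.values()) / len(groups)
```

A ratio of 1.0 would mean that stripping diacritics never merges two distinct word-types; higher values indicate more forms that become ambiguous once diacritics are removed.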

Installation

Obtainable from the Python Package Index (PyPI) → pip install iranlowo

Example

Show computing environment and installation process

Diacritize a phrase

$ python
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import iranlowo.adr as ránlọ
>>> ránlọ.diacritize_text("lootoo ni pe ojo gbogbo ni ti ole")
PRED AVG SCORE: -0.0037, PRED PPL: 1.0037
'lóòtóọ́ ni pé ọjọ́ gbogbo ni ti olè'
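
The two numbers printed before the result appear to be related: the reported perplexity matches the exponential of the negated average score. This page does not document how the scores are produced, so the snippet below is an assumption about that relationship, with a hypothetical helper name, rather than a description of the library's internals.

```python
import math


def perplexity_from_avg_score(avg_log_prob):
    """Perplexity as exp(-average log-probability); an assumed relationship."""
    return math.exp(-avg_log_prob)


# Using the average score shown in the session above.
print(round(perplexity_from_avg_score(-0.0037), 4))  # prints 1.0037
```

Under this reading, a perplexity close to 1 means the model assigned high probability to its own prediction.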

Diacritize phrases. Note that we use IPython only because it renders nicer, easier-to-read text colours in the terminal.

Disclaimer

This is beta software. If you pass the diacritizer out-of-domain text, English, Pidgin, or any other non-Yorùbá text, you will experience very marvelous, black-box results.

Since this is a work in progress and we are steadily improving it, if you encounter any problems with correctness or performance, please submit a pull request with corrections or file an issue.

License

This project is licensed under the MIT License.

Release history

Version	Release date
0.0.8.3	Jul 7, 2019 (this version)
0.0.8	Jun 28, 2019
0.0.7	Jun 19, 2019
0.0.6	May 28, 2019
0.0.5.4	May 27, 2019
0.0.4	May 22, 2019
0.0.0	Apr 6, 2019

Download files

Source Distribution

iranlowo-0.0.8.3.tar.gz (87.9 MB)

Uploaded Jul 7, 2019 Source

Built Distribution

iranlowo-0.0.8.3-py3-none-any.whl (87.9 MB)

Uploaded Jul 7, 2019 Python 3

File details

Details for the file iranlowo-0.0.8.3.tar.gz.

File metadata

Download URL: iranlowo-0.0.8.3.tar.gz
Upload date: Jul 7, 2019
Size: 87.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.7

File hashes

Hashes for iranlowo-0.0.8.3.tar.gz
Algorithm	Hash digest
SHA256	`ae62ea57b96b9d27bcd3e768655f7faffb3df7a1fd4f78f49db1ac9402dca619`
MD5	`22e2aa01ff4918ff850ada8fa482c76d`
BLAKE2b-256	`b0e37516f763688cc1bae9e71db3b33c53d5313e16a52caeb2a89a2774e203a1`

File details

Details for the file iranlowo-0.0.8.3-py3-none-any.whl.

File metadata

Download URL: iranlowo-0.0.8.3-py3-none-any.whl
Upload date: Jul 7, 2019
Size: 87.9 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.7

File hashes

Hashes for iranlowo-0.0.8.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5679c3421f4092033bd86c60efeebf0273910c2b2a8c5fb3358518efb2ba72df`
MD5	`e19836c57f28ca0a929c9fd9641bd1a1`
BLAKE2b-256	`3984fb9e39f146f3128c4976b851b92d230ef0de47fab051c92f56f5e69e762a`
