
Trinidad dialect to Standard English


Caribe

This Python library converts Trinidadian dialect to Standard English. Future updates will add conversion of other Caribbean dialects to Standard English, along with additional natural language processing methods.


Installation

Use the command below to install the package:

pip install Caribe 


Usage

Sample 1: Checks the dialect input against known phrases, then decodes the sentence into a more standardized form of English. A corrector then checks for and fixes small grammatical errors.

# Sample 1
import Caribe as cb
from Caribe import trinidad_decode, trinidad_decode_split, caribe_corrector

sentence = "Ah wah mi modda phone"
standard = cb.phrase_decode(sentence)
standard = cb.trinidad_decode(standard)
fixed = cb.caribe_corrector(standard)
print(fixed) #Output: I want my mother phone

Sample 2: Checks the dialect input against known phrases.

# Sample 2 
import Caribe as cb
from Caribe import trinidad_decode, trinidad_decode_split, caribe_corrector

sentence = "Waz de scene"
standard = cb.phrase_decode(sentence)

print(standard) # Outputs: How are you

Sample 3: Checks the sentence for grammatical errors or incomplete words and corrects them.

#Sample 3
import Caribe as cb
from Caribe import trinidad_decode, trinidad_decode_split, caribe_corrector

sentence = "I am playin fotball outsde"
standard = cb.caribe_corrector(sentence)

print(standard) # Outputs: I am playing football outside

Sample 4: Performs part-of-speech tagging on dialect words.

#Sample 4
import Caribe as cb
from Caribe import trinidad_decode, trinidad_decode_split, caribe_corrector

sentence = "wat iz de time there"
analyse = cb.nlp()
output = analyse.caribe_pos(sentence)

print(output) # Outputs: ["('wat', 'PRON')", "('iz', 'VERB')", "('de', 'DET')", "('time', 'NOUN')", "('there', 'ADV')"]
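
The output shown above is a list of strings, each of which looks like a tuple. The changelog notes that from version 0.1.2 the members are tuples instead. The sketch below is one way to handle either form. It does not call Caribe; `to_pairs` is a hypothetical helper name and the sample data is copied from the output above.

```python
import ast

# Output in the string form shown in Sample 4.
raw = ["('wat', 'PRON')", "('iz', 'VERB')", "('de', 'DET')",
       "('time', 'NOUN')", "('there', 'ADV')"]

def to_pairs(tagged):
    """Return (word, tag) tuples, parsing any members that are still strings."""
    pairs = []
    for item in tagged:
        # literal_eval safely parses a Python literal; real tuples pass through.
        pair = ast.literal_eval(item) if isinstance(item, str) else tuple(item)
        pairs.append(pair)
    return pairs

pairs = to_pairs(raw)
nouns = [word for word, tag in pairs if tag == "NOUN"]
print(nouns)  # ['time']
```

Once the members are real tuples they can be unpacked directly, as in the list comprehension that filters for nouns.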

Sample 5: Removes punctuation marks.

#Sample 5
import Caribe as cb
from Caribe import trinidad_decode, trinidad_decode_split, caribe_corrector

sentence = "My aunt, Shelly is a lawyer!"
analyse = cb.remove_signs(sentence)


print(analyse) # Outputs: My aunt Shelly is a lawyer
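
For readers curious about what punctuation removal involves, here is a minimal sketch using only the Python standard library. It is not Caribe's implementation, and `remove_signs()` may keep or drop a different set of marks. `strip_punctuation` is a hypothetical name.

```python
import string

def strip_punctuation(sentence):
    """Remove ASCII punctuation and collapse any doubled spaces left behind."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(sentence.translate(table).split())

print(strip_punctuation("My aunt, Shelly is a lawyer!"))
# My aunt Shelly is a lawyer
```

Note that this sketch also strips apostrophes, so "don't" becomes "dont". A real tool may want to treat them separately.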

  • Additional Information

    • trinidad_decode() : Decodes the sentence as a whole string.
    • trinidad_decode_split(): Decodes the sentence word by word.
    • phrase_decode(): Decodes the sentence against known dialect phrases.
    • caribe_corrector(): Corrects grammatical errors in a sentence.
    • trinidad_encode(): Encodes a sentence into Trinidadian dialect.
    • caribe_pos(): Tags the parts of speech in a dialect sentence.
    • pos_report(): Tags the parts of speech in an English sentence.
    • remove_signs(): Takes any sentence and removes unnecessary punctuation marks.
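
To illustrate the difference between phrase-level and word-level decoding, and why Sample 1 calls phrase_decode() before trinidad_decode(), here is a minimal dictionary-based sketch. It is an assumption about the general technique, not Caribe's actual implementation. The lookup tables and function names are hypothetical and hold only the few entries suggested by the samples above.

```python
# Hypothetical, tiny lookup tables for illustration only; not Caribe's corpus.
PHRASES = {"waz de scene": "how are you"}
WORDS = {"ah": "i", "wah": "want", "mi": "my", "modda": "mother", "de": "the"}

def decode_phrases(sentence):
    """Replace the whole sentence if it matches a known phrase."""
    return PHRASES.get(sentence.lower().strip(), sentence)

def decode_words(sentence):
    """Replace each word independently, leaving unknown words as they are."""
    return " ".join(WORDS.get(word.lower(), word) for word in sentence.split())

def decode(sentence):
    """Phrase lookup first, then word-by-word decoding."""
    return decode_words(decode_phrases(sentence))

print(decode("Waz de scene"))           # how are you
print(decode("Ah wah mi modda phone"))  # i want my mother phone
```

The order matters in this sketch: decoding words first would turn "Waz de scene" into "Waz the scene", which no longer matches the known phrase.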

  • File Encodings on NLP datasets

Caribe introduces file encoding (Beta) in version 0.1.0. This allows a dataset in a supported file type (text, JSON, or CSV) to be creolised into Trinidadian dialect.

  • Usage of File Encodings:

import Caribe as cb

convert = cb.file_encode("test.txt", "text")
# Generates a translated text file
convert = cb.file_encode("test.json", "json")
# Generates a translated json file
convert = cb.file_encode("test.csv", "csv")
# Generates a translated csv file
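
The idea behind file encoding can be sketched as applying a sentence encoder to every cell of a dataset. The example below does this for CSV data with the standard csv module. It is an illustration under stated assumptions, not how file_encode() works internally: `encode_sentence` is a hypothetical stand-in for an encoder such as trinidad_encode(), its word list is made up for the example, and in-memory buffers are used in place of real files.

```python
import csv
import io

def encode_sentence(text):
    """Hypothetical stand-in for a dialect encoder such as trinidad_encode()."""
    swaps = {"the": "de", "my": "mi", "mother": "modda"}
    return " ".join(swaps.get(word.lower(), word) for word in text.split())

def encode_csv(source, target, encoder=encode_sentence):
    """Copy CSV rows from source to target, encoding every cell except the header."""
    reader = csv.reader(source)
    writer = csv.writer(target, lineterminator="\n")
    header = next(reader, None)
    if header is not None:
        writer.writerow(header)
    for row in reader:
        writer.writerow([encoder(cell) for cell in row])

source = io.StringIO("id,text\n1,my mother is at the shop\n")
target = io.StringIO()
encode_csv(source, target)
print(target.getvalue())
# id,text
# 1,mi modda is at de shop
```

Passing the encoder as a parameter keeps the file handling separate from the translation step, so the same loop could be reused with a different encoder.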

  • Contact

For any concerns or issues with this library, or if you would like to collaborate on this project, get in touch:

Email: keston.smith@my.uwi.edu


Changelog

Version 0.0.1 (16/09/2021)

  • Initial Release

Version 0.0.2 (16/09/2021)

  • Minor bugs fixed
  • More words added

Version 0.0.3 (16/09/2021)

  • Minor bugs fixed
  • More words added
  • phrase_decode method created

Version 0.0.4 (17/09/2021)

  • More words added
  • caribe_corrector method created

Version 0.0.5 (17/09/2021)

  • Minor dependency issues resolved

Version 0.0.6 (17/09/2021)

  • More words and phrases added

Version 0.0.7 (21/09/2021)

  • Major bug fixed where individual letters in words were translated randomly
  • More words added to the corpus.

Version 0.0.8 (30/09/2021)

  • caribe_pos tagging method introduced.
  • pos_report method introduced.
  • remove_signs method introduced.

Version 0.1.0 (14/10/2021)

  • trinidad_encode method converts a Standard English sentence to a creolised form.
  • Caribe introduces dialect file encoding for text, JSON and CSV files. This makes it possible to creolise NLP datasets.

Version 0.1.1 (20/10/2021)

  • More words added to the corpus.

Version 0.1.2 (27/10/2021)

  • caribe_pos members converted from string to tuple.
  • More words added to the corpus.

