A library for processing Code Mixed Text. Still in development!

Project description


CMTT (Code Mixed Text Toolkit) is a wrapper library for working with code-mixed text: loading and downloading datasets, tokenizing, and searching a corpus for a target word. More documentation is on the way.

Installation

pip install code-mixed-text-toolkit
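
To confirm the install worked, one option is to ask Python's standard `importlib.metadata` for the installed version. This is a minimal sketch and not part of CMTT; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed version string, or None if the package is missing.
    print(installed_version("code-mixed-text-toolkit"))
```

A `None` result usually means pip installed the package into a different environment than the interpreter being run.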

Get started

How to use this library:

import code_mixed_text_toolkit.data as cmtt_data
import code_mixed_text_toolkit.preprocessing as cmtt_pp

# Loading json files
result_json = cmtt_data.load('https://world.openfoodfacts.org/api/v0/product/5060292302201.json')

# Loading csv files
result_csv = cmtt_data.load('https://gist.githubusercontent.com/rnirmal/e01acfdaf54a6f9b24e91ba4cae63518/raw/b589a5c5a851711e20c5eb28f9d54742d1fe2dc/datasets.csv')

# List all datasets available
cmtt_data.list_datasets(show_key="url")

# Download specific datasets
cmtt_data.download("openfoodfacts")
cmtt_data.download("rnirmal")

# Load and preprocess txt dataset
result_txt = cmtt_data.load('https://www.w3.org/TR/PNG/iso_8859-1.txt')
result_txt_tokenized = cmtt_pp.tokenizer.word_tokenize(result_txt)

# Search target word in txt corpus
cmtt_pp.search.search_word(result_txt, 'with', tokenize=True, width=3)
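
The `width` argument above suggests a keyword-in-context search, where each hit is shown with a few tokens on either side. The sketch below illustrates that general idea using only the standard library. It is an assumption about the technique, not CMTT's implementation: the names `simple_tokenize` and `keyword_in_context` are hypothetical, and the regex tokenizer is only a stand-in for `word_tokenize`.

```python
import re
from typing import List


def simple_tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def keyword_in_context(tokens: List[str], target: str, width: int = 3) -> List[str]:
    """Return each case-insensitive match with up to `width` tokens on each side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append(" ".join(left + [tok] + right))
    return hits


if __name__ == "__main__":
    # Hypothetical Hindi-English code-mixed sample sentence.
    sample = "Main office ja raha hoon with my friends, phir milte hain."
    for line in keyword_in_context(simple_tokenize(sample), "with", width=3):
        print(line)
```

Working on a token list rather than raw characters keeps the context window aligned to word boundaries, which matters when the two languages in a sentence have very different word lengths.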

Download files

Source Distribution

code_mixed_text_toolkit-0.3.5.tar.gz (7.0 kB)

Uploaded Source

Built Distribution

code_mixed_text_toolkit-0.3.5-py3-none-any.whl (9.0 kB)

Uploaded Python 3

File details

Details for the file code_mixed_text_toolkit-0.3.5.tar.gz.

File metadata

  • Download URL: code_mixed_text_toolkit-0.3.5.tar.gz
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for code_mixed_text_toolkit-0.3.5.tar.gz
Algorithm Hash digest
SHA256 9949688ab306c7d146778a33b49be2663f9c5670bc60af60513c96fd43adf81f
MD5 c6aa7aa3ead56e22ae4578e35dda28d3
BLAKE2b-256 56abf503c734b411e3728621a21a8772f7cb6990aa47dcd6b598430a7e76e64b
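
The SHA256 digest listed above can be checked against a downloaded copy of the archive. A minimal sketch with the standard `hashlib` module follows; the helper name `sha256_of` and the local file path are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for code_mixed_text_toolkit-0.3.5.tar.gz
EXPECTED_SHA256 = "9949688ab306c7d146778a33b49be2663f9c5670bc60af60513c96fd43adf81f"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed location: the archive sits in the current working directory.
    archive = Path("code_mixed_text_toolkit-0.3.5.tar.gz")
    if archive.exists():
        print("match" if sha256_of(archive) == EXPECTED_SHA256 else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```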


File details

Details for the file code_mixed_text_toolkit-0.3.5-py3-none-any.whl.

File hashes

Hashes for code_mixed_text_toolkit-0.3.5-py3-none-any.whl
Algorithm Hash digest
SHA256 b52bb39e4a64592869008801d76f567812738a6905dd8c27db0562add78da7c3
MD5 3635b5eb91ba489b4be876e0f83f549e
BLAKE2b-256 31e0d48959fbddfc04f102e5ecc99f2f7d94b25412d079b9b76fe8b9e489412d
