De-concatenate strings that do not have white-spaces.

Decat

thisisawesome --> ['this', 'is', 'awesome']



Decat is a Python package that de-concatenates strings containing no white-space; in other words, it lets you infer spaces programmatically. It is a simple utility that comes in handy for Natural Language Processing (NLP) tasks such as cleaning, exploring, or manipulating text. Zipf's Law is at the core of this project. The aim is to provide an easy interface for programmers to extract meaningful information from deformed pieces of text.
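To make the role of Zipf's Law concrete, here is a minimal sketch of the general technique: words are ranked by frequency, each word gets a cost that grows with its rank, and dynamic programming picks the cheapest split. This is not decat's actual source. The names `TOY_VOCAB`, `build_costs` and `segment` are hypothetical, and the nine-word vocabulary is made up for illustration.

```python
from math import log

# Hypothetical toy vocabulary, ordered from most to least frequent.
TOY_VOCAB = ["a", "is", "this", "some", "text", "awesome", "weird", "we", "awe"]


def build_costs(ranked_words):
    """Assign each word a Zipf-style cost: the rarer the word, the higher the cost."""
    n = len(ranked_words)  # needs at least 3 words so that log(n) > 1
    return {w: log((rank + 1) * log(n)) for rank, w in enumerate(ranked_words)}


def segment(text, costs):
    """Split text into the sequence of known words with the lowest total cost."""
    max_len = max(len(w) for w in costs)
    # best[i] holds (total cost, length of last word) for the prefix text[:i].
    best = [(0.0, 0)]
    for i in range(1, len(text) + 1):
        candidates = (
            (best[i - k][0] + costs.get(text[i - k:i], float("inf")), k)
            for k in range(1, min(i, max_len) + 1)
        )
        best.append(min(candidates))
    # Walk backwards through the table to recover the chosen words.
    words = []
    i = len(text)
    while i > 0:
        k = best[i][1]
        words.append(text[i - k:i])
        i -= k
    return words[::-1]


costs = build_costs(TOY_VOCAB)
print(segment("thisisawesome", costs))  # ['this', 'is', 'awesome']
```

With this toy vocabulary, "awesome" wins over "awe" + "some" because one moderately rare word is cheaper than two words whose costs add up.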

Get Started

Install It

>> pip install decat

Play With It

>> decat -i someweirdtext
['some', 'weird', 'text']

or

>> python -m decat -i justanotherstring
['just', 'another', 'string']

Use It In Your Projects

Sample Code

from decat import decat


weird_text = '“AnyfoolcanwritecodethatacomputercanunderstandGoodprogrammerswritecodethathumanscanunderstand.”–MartinFowler'
weird_text_simplified = decat(weird_text)
print(weird_text_simplified)

Console

['any', 'fool', 'can', 'write', 'code', 'that', 'a', 'computer', 'can', 'understand', 'good', 'programmers', 'write', 'code', 'that', 'humans', 'can', 'understand', 'martin', 'fowler']

Features

🪶 A lightweight package, built using only features available in the Python standard library

📚 An ever-expanding vocabulary, knows more than 300K English words

🪃 Simple design that allows for easy expansion to new languages and custom vocabulary sets

Dependencies

⭕️ None 🎉

Limitations

❗ Requires Python >= 3.6

❗️ All input is treated as lower-case

>> ATitleCaseString --> ['a', 'title', 'case', 'string']

❗️ Punctuation marks, numbers and special characters are stripped from the input and do not appear in the output

>> dummy.email1234@gmail.com --> ['dummy', 'email', 'gmail', 'com']
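The two limitations above amount to a normalization step that runs before any splitting. A rough sketch of such a step, assuming plain ASCII letters are the only characters kept, might look like this. The helper `normalize` is hypothetical and is not part of decat's API; decat's internal handling may differ.

```python
import re


def normalize(text):
    """Lower-case the text and drop every character that is not a letter a-z."""
    return re.sub(r"[^a-z]", "", text.lower())


print(normalize("ATitleCaseString"))           # atitlecasestring
print(normalize("dummy.email1234@gmail.com"))  # dummyemailgmailcom
```

Because casing and punctuation are discarded at this stage, a caller who needs them has to capture them before passing the text on.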

Credits

Generic Human

Rachael Tatman

License

MIT
