De-concatenate strings that do not have white-spaces.

Decat

thisisawesome --> ['this', 'is', 'awesome']



Decat is a Python package that de-concatenates strings containing no white-space; in other words, it lets you infer the missing spaces programmatically. It is a simple utility that comes in handy for Natural Language Processing (NLP) tasks such as cleaning, exploring, or manipulating text. Zipf's Law is at the core of this project, and the aim is to provide an easy interface for programmers to extract meaningful information from malformed pieces of text.
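To give a feel for how Zipf's Law can drive this kind of splitting, here is a minimal sketch. It is not taken from Decat's source; it assumes a frequency-ranked word list and scores each word with a Zipf-style cost, then uses dynamic programming to pick the cheapest split. The tiny `WORDS` list, the `UNKNOWN_COST` penalty and the `segment` function are all hypothetical names made up for this illustration.

```python
from math import log

# Hypothetical mini-vocabulary, ordered from most to least frequent.
# A real vocabulary would hold many thousands of ranked words.
WORDS = ["the", "is", "a", "this", "some", "text", "awesome", "weird"]

# Zipf's Law: the word at rank r has probability of roughly 1 / (r * log N),
# so its cost (negative log probability) is log(r * log N).
WORD_COST = {
    word: log((rank + 1) * log(len(WORDS))) for rank, word in enumerate(WORDS)
}
MAX_WORD_LEN = max(len(word) for word in WORDS)

# Assumed penalty per character for text that is not in the vocabulary.
UNKNOWN_COST = 1000.0


def segment(text):
    """Split `text` into the lowest-cost sequence of words."""
    # best[i] holds (total cost, length of last word) for text[:i].
    best = [(0.0, 0)]
    for i in range(1, len(text) + 1):
        candidates = []
        for k in range(1, min(i, MAX_WORD_LEN) + 1):
            word = text[i - k:i]
            cost = WORD_COST.get(word, UNKNOWN_COST * k)
            candidates.append((best[i - k][0] + cost, k))
        best.append(min(candidates))

    # Walk backwards through the table to recover the chosen words.
    words = []
    i = len(text)
    while i > 0:
        k = best[i][1]
        words.append(text[i - k:i])
        i -= k
    return words[::-1]


print(segment("thisisawesome"))  # ['this', 'is', 'awesome']
```

Frequent words get a low cost, so the search prefers splits made of common words over splits that leave unknown fragments behind.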

Get Started

Install It

>> pip install decat

Play With It

>> decat -i someweirdtext
['some', 'weird', 'text']

or

>> python -m decat -i justanotherstring
['just', 'another', 'string']

Use It In Your Projects

Sample Code

from decat import decat


weird_text = '“AnyfoolcanwritecodethatacomputercanunderstandGoodprogrammerswritecodethathumanscanunderstand.”–MartinFowler'
weird_text_simplified = decat(weird_text)
print(weird_text_simplified)

Console

['any', 'fool', 'can', 'write', 'code', 'that', 'a', 'computer', 'can', 'understand', 'good', 'programmers', 'write', 'code', 'that', 'humans', 'can', 'understand', 'martin', 'fowler']

Features

🪶 A lightweight package, built around features available in the standard library

📚 An ever-expanding vocabulary, knows more than 300K English words

🪃 Simple design that allows for easy expansion to new languages and custom vocabulary sets
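As a rough idea of what preparing a custom vocabulary could involve, the sketch below builds a frequency-ranked word list from a sample corpus using only the standard library. How Decat actually loads or stores its vocabulary is not described here, so `build_ranked_vocabulary` is a hypothetical helper and not part of the package's API.

```python
import re
from collections import Counter


def build_ranked_vocabulary(corpus):
    """Return the words in `corpus`, most frequent first."""
    # Keep only runs of ASCII letters, lower-cased.
    words = re.findall(r"[a-z]+", corpus.lower())
    counts = Counter(words)
    # most_common() sorts by count; ties keep first-seen order.
    return [word for word, _ in counts.most_common()]


sample = "the cat and the dog and the bird"
print(build_ranked_vocabulary(sample))  # ['the', 'and', 'cat', 'dog', 'bird']
```

A ranked list like this is the kind of input a Zipf-based cost function needs, since the cost depends only on each word's rank.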

Dependencies

⭕️ None 🎉

Limitations

❗ Requires Python >= 3.6

❗️ All input will be treated as lower-case

>> ATitleCaseString --> ['a', 'title', 'case', 'string']

❗️ Punctuation marks, numbers and special characters will be stripped from the input and will not be preserved in the output

>> dummy.email1234@gmail.com --> ['dummy', 'email', 'gmail', 'com']
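The two input limitations above amount to a normalization pass that runs before any splitting. A minimal sketch of such a pass follows; it assumes that only the ASCII letters a to z survive, which matches the examples shown here but is not confirmed for accented or non-Latin characters. The `normalize` function is a hypothetical name, not part of Decat's API.

```python
import string


def normalize(text):
    """Lower-case `text` and drop everything except the letters a-z."""
    lowered = text.lower()
    return "".join(ch for ch in lowered if ch in string.ascii_lowercase)


print(normalize("ATitleCaseString"))           # atitlecasestring
print(normalize("dummy.email1234@gmail.com"))  # dummyemailgmailcom
```

If you need the original casing, digits or punctuation, keep a copy of the raw string, since they cannot be recovered from the output.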

Credits

Generic Human

Rachael Tatman

License

MIT
