De-concatenate strings that contain no whitespace.

Decat

thisisawesome --> ['this', 'is', 'awesome']



Decat is a Python package that de-concatenates strings containing no whitespace; in other words, it lets you infer spaces programmatically. It is a simple utility that comes in handy for Natural Language Processing (NLP) tasks such as cleaning, exploring, or manipulating text. Zipf's Law is at the core of this project. The aim is to provide an easy interface for programmers to extract meaningful information from malformed pieces of text.
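To make the role of Zipf's Law concrete, here is a minimal sketch of Zipf-based word segmentation. It is not Decat's actual source. It assumes a tiny, hypothetical word list ordered from most to least frequent, approximates each word's cost as log(rank * log(N)), and uses dynamic programming to pick the cheapest split. The names `WORDS`, `WORD_COST` and `infer_spaces` are illustrative only.

```python
from math import log

# Hypothetical toy vocabulary, ordered from most to least frequent.
# A real vocabulary would hold hundreds of thousands of words.
WORDS = ["the", "is", "a", "this", "some", "text", "awesome", "weird"]

# Zipf's Law: the probability of the word with rank r is roughly
# proportional to 1 / r, so its cost (negative log-probability) is
# approximated here as log(r * log(N)).
WORD_COST = {w: log((i + 1) * log(len(WORDS))) for i, w in enumerate(WORDS)}
MAX_WORD_LEN = max(len(w) for w in WORDS)


def infer_spaces(s):
    """Split s into the sequence of known words with the lowest total cost."""
    # cost[i] is the minimal cost of segmenting s[:i];
    # back[i] is the length of the last word in that segmentation.
    cost = [0.0]
    back = [0]
    for i in range(1, len(s) + 1):
        candidates = []
        for k in range(1, min(i, MAX_WORD_LEN) + 1):
            word = s[i - k:i]
            # Unknown substrings get an infinite cost.
            candidates.append((cost[i - k] + WORD_COST.get(word, float("inf")), k))
        best_cost, best_k = min(candidates)
        cost.append(best_cost)
        back.append(best_k)

    # Walk backwards through the recorded word lengths to recover the words.
    words = []
    i = len(s)
    while i > 0:
        k = back[i]
        words.append(s[i - k:i])
        i -= k
    return words[::-1]


print(infer_spaces("thisisawesome"))
```

With this toy vocabulary, the only split made entirely of known words is this / is / awesome, so that is the one the sketch returns. Substrings that match no known word fall back to single characters.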

Get Started

Install It

>> pip install decat

Play With It

>> decat -i someweirdtext
>> ['some', 'weird', 'text']

or

>> python -m decat -i justanotherstring
>> ['just', 'another', 'string']

Use It In Your Projects

Sample Code

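from decat import decat

# A quotation with its spaces removed; casing and punctuation are dropped in the output.
weird_text = '“AnyfoolcanwritecodethatacomputercanunderstandGoodprogrammerswritecodethathumanscanunderstand.”–MartinFowler'
# decat returns the inferred words as a list of lowercase strings.
weird_text_simplified = decat(weird_text)
print(weird_text_simplified)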

Console

['any', 'fool', 'can', 'write', 'code', 'that', 'a', 'computer', 'can', 'understand', 'good', 'programmers', 'write', 'code', 'that', 'humans', 'can', 'understand', 'martin', 'fowler']

Features

🪶 A lightweight package, built on features available in the standard library

📚 An ever-expanding vocabulary that covers more than 300K English words

🪃 Simple design that allows easy expansion to new languages and custom vocabulary sets

Dependencies

⭕️ None 🎉

Limitations

❗ Requires Python >= 3.6

❗️ All input is treated as lowercase

>> ATitleCaseString --> ['a', 'title', 'case', 'string']

❗️ Punctuation marks, numbers and special characters are stripped from the input and are not preserved in the output

>> dummy.email1234@gmail.com --> ['dummy', 'email', 'gmail', 'com']
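
The two limitations above amount to a normalization step applied before any words are inferred. The sketch below reproduces that observable behavior with the standard library. It assumes that only the letters a to z survive, and the helper name `normalize` is hypothetical; Decat's internal implementation may differ.

```python
import re


def normalize(text):
    """Lowercase the input and drop every character that is not a-z."""
    return re.sub(r"[^a-z]", "", text.lower())


print(normalize("ATitleCaseString"))           # atitlecasestring
print(normalize("dummy.email1234@gmail.com"))  # dummyemailgmailcom
```

If the original casing, digits or punctuation matter to you, keep a copy of the raw string, since they cannot be recovered from the output list.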

Credits

Generic Human

Rachael Tatman

License

MIT
