De-concatenate strings that do not have white-spaces.

Decat

thisisawesome --> ['this', 'is', 'awesome']



Decat is a Python package that de-concatenates strings containing no white-space; in other words, it lets you infer the missing spaces programmatically. It is a simple utility that comes in handy for Natural Language Processing (NLP) tasks such as cleaning, exploring or manipulating text. Zipf's Law is at the core of this project. The aim is to provide an easy interface for programmers to extract meaningful information from deformed pieces of text.
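To give a feel for how Zipf's Law can drive this kind of splitting, here is a minimal sketch. It is not Decat's actual implementation: the tiny `WORDS` list, the `UNKNOWN_COST` penalty and the `segment` function are all hypothetical, and the approach shown (dynamic programming over rank-based word costs) is just one common way to apply the idea.

```python
import math

# Hypothetical toy vocabulary, ordered from most to least frequent.
WORDS = ["the", "is", "a", "this", "some", "text", "weird", "awesome"]

# Zipf's Law: the word at rank r has probability of roughly 1 / (r * H),
# where H normalizes the distribution. Cost is the negative log-probability.
H = sum(1.0 / r for r in range(1, len(WORDS) + 1))
COST = {word: math.log(rank * H) for rank, word in enumerate(WORDS, start=1)}
MAX_LEN = max(len(word) for word in WORDS)
UNKNOWN_COST = 1e9  # assumed penalty for chunks missing from the vocabulary


def segment(text):
    """Split text into the cheapest sequence of chunks under COST."""
    # best[i] holds (total cost, start of last chunk) for text[:i].
    best = [(0.0, 0)]
    for i in range(1, len(text) + 1):
        candidates = [
            (best[j][0] + COST.get(text[j:i], UNKNOWN_COST), j)
            for j in range(max(0, i - MAX_LEN), i)
        ]
        best.append(min(candidates))

    # Walk the stored start positions backwards to recover the chunks.
    words = []
    i = len(text)
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return words[::-1]


print(segment("thisisawesome"))
```

Frequent words get a low cost, so the cheapest split favours a few common words over many rare or unknown fragments.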

Get Started

Install It

>> pip install decat

Play With It

>> decat -i someweirdtext
>> ['some', 'weird', 'text']

or

>> python -m decat -i justanotherstring
>> ['just', 'another', 'string']

Use It In Your Projects

Sample Code

from decat import decat


weird_text = '“AnyfoolcanwritecodethatacomputercanunderstandGoodprogrammerswritecodethathumanscanunderstand.”–MartinFowler'
weird_text_simplified = decat(weird_text)
print(weird_text_simplified)

Console

['any', 'fool', 'can', 'write', 'code', 'that', 'a', 'computer', 'can', 'understand', 'good', 'programmers', 'write', 'code', 'that', 'humans', 'can', 'understand', 'martin', 'fowler']
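Since `decat` returns a list of words, a small wrapper can turn a batch of squashed strings back into spaced phrases. The helper below is a sketch only: `restore_spaces` is a hypothetical name, not part of Decat, and it takes the segmenter as a parameter so it does not depend on the package being installed.

```python
from typing import Callable, Iterable, List


def restore_spaces(texts: Iterable[str],
                   segment: Callable[[str], List[str]]) -> List[str]:
    """Segment each string and join the words with single spaces."""
    return [" ".join(segment(text)) for text in texts]


# With Decat installed, its segmenter can be passed in directly:
#   from decat import decat
#   restore_spaces(["thisisawesome", "someweirdtext"], decat)
```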

Features

🪶 A lightweight package, built around features available in the standard library

📚 An ever-expanding vocabulary, knows more than 300K English words

🪃 Simple design, allows for easy expansion to new languages and custom vocabulary sets

Dependencies

⭕️ None 🎉

Limitations

❗ Requires Python >= 3.6

❗️ All input will be treated as lower-case

>> ATitleCaseString --> ['a', 'title', 'case', 'string']

❗️ Punctuation marks, numbers and special characters will be stripped from the input and will not be preserved in the output

>> dummy.email1234@gmail.com --> ['dummy', 'email', 'gmail', 'com']
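The two limitations above amount to a normalization pass before any splitting happens. A rough sketch of such a pass, assuming only ASCII letters are kept (Decat's own preprocessing may differ, and `normalize` is a hypothetical name):

```python
import re


def normalize(text):
    """Lower-case the input and keep only runs of ASCII letters."""
    # Digits, punctuation and other symbols act as separators and are dropped.
    return re.findall(r"[a-z]+", text.lower())


print(normalize("dummy.email1234@gmail.com"))  # ['dummy', 'email', 'gmail', 'com']
print(normalize("ATitleCaseString"))           # ['atitlecasestring']
```

Each surviving run of letters would then be handed to the word splitter, which is why the case and punctuation of the original input cannot be recovered from the output.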

License

MIT

Download files

Source Distribution

decat-1.0.3.tar.gz (2.0 MB)

Uploaded Source

Built Distribution

decat-1.0.3-py3-none-any.whl (2.0 MB)

Uploaded Python 3

File details

Details for the file decat-1.0.3.tar.gz.

File metadata

  • Download URL: decat-1.0.3.tar.gz
  • Size: 2.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.7

File hashes

Hashes for decat-1.0.3.tar.gz
Algorithm Hash digest
SHA256 f4a3c894a0d9ed5282c5c567605ce6fea354afc39250f85db02800aa7a80ff8b
MD5 5d8089595f2365e3cb9381ccca05d795
BLAKE2b-256 658915fcb3572e7ba49f98dc15ceb83108e5a85b5d6c286e4271a1c9b2bb3a8a

File details

Details for the file decat-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: decat-1.0.3-py3-none-any.whl
  • Size: 2.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.7

File hashes

Hashes for decat-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 26d9aa8de3e310ccaa4c669b3a4d6e6175c5258886c7f8620d055631299b2956
MD5 af491a9ec7dffa3967180a2b458cb66d
BLAKE2b-256 c55ecf9290c0fe206fc29be4d30a734b6592d0d4b2fa2cd88fabaaa3a05a032b
