De-concatenate strings that do not have white-spaces.
Project description
Decat
thisisawesome --> ['this', 'is', 'awesome']
Decat is a Python package that de-concatenates strings containing no white-space; in other words, it lets the user infer spaces programmatically. It is a simple utility that comes in handy for modern Natural Language Processing (NLP) tasks such as cleaning, exploring, or manipulating text. Zipf's Law is at the core of this project. The aim is to provide an easy interface for programmers to extract meaningful information from malformed pieces of text.
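To make the Zipf's Law idea concrete, here is a minimal sketch of one well-known way to infer spaces from word frequency ranks. It is not taken from Decat's source; the names `build_costs`, `segment`, `unknown_cost` and the tiny word list are all assumptions made for illustration. Each word gets a cost based on its rank, and dynamic programming picks the split with the lowest total cost.

```python
from math import log

def build_costs(words_by_frequency):
    """Map each word to a cost derived from its frequency rank."""
    n = len(words_by_frequency)
    # Zipf's Law: the word at rank r has probability of roughly 1 / (r * ln(n)),
    # so its cost (negative log-probability) is log(r * ln(n)).
    return {
        word: log((rank + 1) * log(n))
        for rank, word in enumerate(words_by_frequency)
    }

def segment(text, costs, unknown_cost=100.0):
    """Split `text` into the word sequence with the lowest total cost.

    `unknown_cost` is a hypothetical penalty for a character that matches
    no vocabulary word; it keeps the search going on unfamiliar input.
    """
    max_len = max(map(len, costs))
    # best[i] = (lowest cost of segmenting text[:i], length of its last word)
    best = [(0.0, 0)]
    for i in range(1, len(text) + 1):
        candidates = []
        for k in range(1, min(i, max_len) + 1):
            piece = text[i - k:i]
            if piece in costs:
                candidates.append((best[i - k][0] + costs[piece], k))
            elif k == 1:
                candidates.append((best[i - 1][0] + unknown_cost, 1))
        best.append(min(candidates))
    # Walk backwards through the recorded word lengths to recover the words.
    words = []
    i = len(text)
    while i > 0:
        k = best[i][1]
        words.append(text[i - k:i])
        i -= k
    return words[::-1]

# Toy vocabulary, ordered from most to least frequent (illustrative only).
TOY_WORDS = ['the', 'is', 'a', 'this', 'awesome', 'some', 'awe', 'we']
TOY_COSTS = build_costs(TOY_WORDS)
print(segment('thisisawesome', TOY_COSTS))  # ['this', 'is', 'awesome']
```

With a realistic frequency-ordered word list in place of the toy one, the same search prefers common words over rare ones, which is why 'awesome' wins over 'awe' followed by 'some' here.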
Get Started
Install It
>> pip install decat
Play With It
>> decat -i someweirdtext
>> ['some', 'weird', 'text']
or
>> python -m decat -i justanotherstring
>> ['just', 'another', 'string']
Use It In Your Projects
Sample Code
from decat import decat

weird_text = '“AnyfoolcanwritecodethatacomputercanunderstandGoodprogrammerswritecodethathumanscanunderstand.”–MartinFowler'
weird_text_simplified = decat(weird_text)
print(weird_text_simplified)
Console
['any', 'fool', 'can', 'write', 'code', 'that', 'a', 'computer', 'can', 'understand', 'good', 'programmers', 'write', 'code', 'that', 'humans', 'can', 'understand', 'martin', 'fowler']
Features
🪶 A lightweight package, built around features available in the standard library
📚 An ever-expanding vocabulary, knows more than 300K English words
🪃 Simple design that allows for easy expansion to new languages and custom vocabulary sets
Dependencies
⭕️ None 🎉
Limitations
❗ Requires Python >= 3.6
❗️ All input will be treated as lower-case
>> ATitleCaseString --> ['a', 'title', 'case', 'string']
❗️ Punctuation marks, numbers and special characters will be stripped from the input and will not be preserved in the output
>> dummy.email1234@gmail.com --> ['dummy', 'email', 'gmail', 'com']
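The two input limitations above amount to a normalization step. The sketch below shows one way such a step could look; the helper name `normalize` is hypothetical, and whether Decat removes these characters outright or also treats them as word boundaries is not stated here, so this version simply drops them.

```python
import re

def normalize(text):
    """Lower-case the input and drop everything except the letters a-z."""
    return re.sub(r"[^a-z]", "", text.lower())

print(normalize('ATitleCaseString'))           # atitlecasestring
print(normalize('dummy.email1234@gmail.com'))  # dummyemailgmailcom
```

If the original casing, digits or punctuation matter for a task, keep a copy of the raw string, since none of them can be recovered from the output.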
License
MIT
Download files
File details
Details for the file decat-1.0.3.tar.gz.
File metadata
- Download URL: decat-1.0.3.tar.gz
- Upload date:
- Size: 2.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.7
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | f4a3c894a0d9ed5282c5c567605ce6fea354afc39250f85db02800aa7a80ff8b |
MD5 | 5d8089595f2365e3cb9381ccca05d795 |
BLAKE2b-256 | 658915fcb3572e7ba49f98dc15ceb83108e5a85b5d6c286e4271a1c9b2bb3a8a |
File details
Details for the file decat-1.0.3-py3-none-any.whl.
File metadata
- Download URL: decat-1.0.3-py3-none-any.whl
- Upload date:
- Size: 2.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.7
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 26d9aa8de3e310ccaa4c669b3a4d6e6175c5258886c7f8620d055631299b2956 |
MD5 | af491a9ec7dffa3967180a2b458cb66d |
BLAKE2b-256 | c55ecf9290c0fe206fc29be4d30a734b6592d0d4b2fa2cd88fabaaa3a05a032b |
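
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_hash` with the path of a downloaded decat-1.0.3-py3-none-any.whl and the SHA256 value from the table above should return True for an intact file.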