A Python package to extract Hindi characters.
Project description
About
A command-line tool for pre-processing and cleaning Hindi datasets. The package can:
- reduce a given file to Hindi characters only
- split paragraphs into sentences
- remove punctuation from the dataset (if required)
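The three operations listed above can be sketched in plain Python. This is a minimal illustration, not the package's own code, which is not shown here. The function names (`keep_hindi`, `split_sentences`, `strip_punctuation`) are hypothetical, and the sketch assumes that "Hindi characters" means the Devanagari Unicode block (U+0900 to U+097F) and that sentences end with a danda, double danda, `?` or `!` followed by whitespace.

```python
import re

# Assumption: "Hindi characters" = Devanagari block U+0900-U+097F.
# Whitespace and a few ASCII punctuation marks are also kept at this stage.
_NOT_HINDI = re.compile(r"[^\u0900-\u097F\s.,!?;:-]")

# Punctuation to drop on request: ASCII marks plus danda (U+0964) and
# double danda (U+0965), which live inside the Devanagari block.
_PUNCT = re.compile(r"[\u0964\u0965.,!?;:-]")

# A sentence boundary: whitespace that follows a danda, double danda, ? or !
_SENT_END = re.compile(r"(?<=[\u0964\u0965?!])\s+")

_SPACES = re.compile(r"\s+")


def keep_hindi(text):
    """Replace every non-Devanagari character with a space, then tidy spacing."""
    cleaned = _NOT_HINDI.sub(" ", text)
    return _SPACES.sub(" ", cleaned).strip()


def split_sentences(text):
    """Split a paragraph into sentences at the assumed sentence boundaries."""
    return [part.strip() for part in _SENT_END.split(text) if part.strip()]


def strip_punctuation(text):
    """Remove punctuation (including danda) and tidy spacing."""
    return _SPACES.sub(" ", _PUNCT.sub(" ", text)).strip()


if __name__ == "__main__":
    raw = "यह वाक्य है। (abc) आप कैसे हैं?"
    for sentence in split_sentences(keep_hindi(raw)):
        print(strip_punctuation(sentence))
```

Replacing unwanted characters with a space rather than deleting them keeps neighbouring Hindi words from being fused together. Characters outside the Devanagari block that sometimes appear in Hindi text, such as the zero-width joiner, are dropped by this sketch.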
Usage
extract -l -p <y/yes (to keep punctuation)>
Download files
Source Distribution
textcleaner_hi-1.0.0.tar.gz (2.5 kB)
Built Distribution
textcleaner_hi-1.0.0-py3-none-any.whl (3.9 kB)
File details
Details for the file textcleaner_hi-1.0.0.tar.gz.
File metadata
- Download URL: textcleaner_hi-1.0.0.tar.gz
- Upload date:
- Size: 2.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf0d097430b76df76cdab3990f51a0521ee6ba8334fbe271804f75756ff4d3ad |
| MD5 | e4062c36b6d959523db9c21433f8e22f |
| BLAKE2b-256 | f0145cf56001d68456e6d2106006eb033a09b44881df0b602ce5d24231683eba |
File details
Details for the file textcleaner_hi-1.0.0-py3-none-any.whl.
File metadata
- Download URL: textcleaner_hi-1.0.0-py3-none-any.whl
- Upload date:
- Size: 3.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b9cff503e0ce7c43dfa58936c2d04ad9b9c054f3522b53ee20f1582afa0709fd |
| MD5 | a179ab56f8c29a92875372f7d06497e4 |
| BLAKE2b-256 | f826c5d07ae4acbd936225dcd39fad6da92f3823b914633f23f44b3d2eb511b5 |
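
A downloaded file can be checked against the digests listed in the file hashes sections. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `file_digests` and `verify_sha256` are hypothetical, and the sketch assumes the BLAKE2b-256 value is a BLAKE2b hash with a 32-byte digest. MD5 is left out because it is not suitable for integrity checks against tampering.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return the SHA256 and BLAKE2b-256 hex digests of the file at `path`."""
    sha256 = hashlib.sha256()
    # Assumption: "BLAKE2b-256" means BLAKE2b with a 32-byte (256-bit) digest.
    blake2b_256 = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
            blake2b_256.update(chunk)
    return {
        "sha256": sha256.hexdigest(),
        "blake2b_256": blake2b_256.hexdigest(),
    }


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest matches `expected_hex`."""
    return file_digests(path)["sha256"] == expected_hex.strip().lower()
```

For example, `verify_sha256("textcleaner_hi-1.0.0.tar.gz", "bf0d097430b76df76cdab3990f51a0521ee6ba8334fbe271804f75756ff4d3ad")` should return `True` for an intact copy of the source distribution, assuming the file sits in the current directory.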