A lib for Chinese text preprocessing

Chinese text preprocessing

You can extract numbers, email addresses, websites, emoji, and TeX, and delete spaces and punctuation.

Install

pip install cnprep

Usage

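from cnprep import Extractor

# args selects what to extract; limit is the parameter for get_number (blur).
ext = Extractor(args=['email', 'number'], limit=5)
# message is the input text to process.
ext.extract(message)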
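args: the options to extract, given as a list or as a comma-separated string,
    e.g. ['email', 'telephone'] or 'email, telephone'
    email
    telephone
    web
    QQ
    tex
    wechat
    message (the text without punctuation)
    blur (numerals in variant forms such as Ⅰ, ①, 壹, ...)
limit: parameter for get_number (used with blur)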
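
The blur option above lists variant numerals (Ⅰ①壹...). As a rough illustration of what handling such characters can involve, the sketch below maps single-digit variant numerals to ASCII digits using only the standard library. It is not cnprep's implementation, and the name normalize_digits is hypothetical.

```python
import unicodedata


def normalize_digits(text):
    """Replace characters whose Unicode numeric value is 0-9 with ASCII digits.

    Hypothetical helper for illustration; not part of cnprep.
    """
    out = []
    for ch in text:
        try:
            value = unicodedata.numeric(ch)
        except ValueError:
            # The character has no numeric value; keep it unchanged.
            out.append(ch)
            continue
        if value.is_integer() and 0 <= value <= 9:
            out.append(str(int(value)))
        else:
            # Values such as 十 (10) or ½ are left as they are.
            out.append(ch)
    return "".join(out)


print(normalize_digits("电话①③⑧"))
```

Characters whose value is outside 0 to 9 are deliberately left alone, because converting them correctly needs positional logic that this sketch does not attempt.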

You can also call ``ext.reset_param()`` to reset the parameters.
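
The usage notes say args may be given either as a list or as a comma-separated string. A minimal sketch of how both forms could be reduced to one list is shown here; normalize_args is a hypothetical helper, and cnprep's own parsing may differ.

```python
def normalize_args(args):
    """Return a list of option names from a list or a comma-separated string.

    Hypothetical helper for illustration; not part of cnprep.
    """
    if isinstance(args, str):
        parts = args.split(",")
    else:
        parts = list(args)
    # Strip surrounding whitespace and drop empty entries.
    return [p.strip() for p in parts if p.strip()]


print(normalize_args("email, telephone"))
print(normalize_args(["email", "telephone"]))
```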

Attention

The URL extractor only supports ASCII.
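
To show what an ASCII-only limitation means in practice, here is a small sketch with a regular expression that matches only ASCII URL characters. The pattern and the name find_ascii_urls are assumptions made for illustration and are not taken from cnprep.

```python
import re

# Matches http(s):// or www. followed by ASCII URL characters only.
# Assumed pattern for illustration; not cnprep's actual expression.
ASCII_URL = re.compile(r"(?:https?://|www\.)[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def find_ascii_urls(text):
    """Return all ASCII-only URLs found in text."""
    return ASCII_URL.findall(text)


# The match stops at the first non-ASCII character, so adjacent Chinese text is excluded.
print(find_ascii_urls("访问http://example.com/a?b=1查看"))
# A URL written with non-ASCII characters is not matched at all.
print(find_ascii_urls("http://例子.测试/路径"))
```

With a pattern like this, internationalized domain names written in their native script are missed, which is the kind of behavior the note above warns about.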

Built Distribution

cnprep-0.1.11-py2.py3-none-any.whl (6.6 kB), uploaded for Python 2 and Python 3

Hashes for cnprep-0.1.11-py2.py3-none-any.whl

Algorithm     Hash digest
SHA256        27dc1392d7f91c7d81b6eb95252ec1db3e5ad71cee30618ca3654a93ceb91713
MD5           aaf6dd1415323bfd70672abe98d892a5
BLAKE2b-256   f79cc92d5a0dd98e12d87a35ce281d6b20ed48bd71e6be01a17d06ab7b1e7ced
