
A library for Chinese text preprocessing

Project description


Chinese text preprocessing

You can extract numbers, email addresses, websites, emoji and tex, and delete spaces and punctuation.
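As an illustration of the "delete spaces and punctuation" step, here is a minimal sketch using only the standard library. It is not cnprep's implementation; the function name strip_spaces_and_punctuation is hypothetical. It assumes that "punctuation" means any character in a Unicode punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po), which covers both ASCII marks and full-width Chinese marks such as ，。、「」.

```python
import unicodedata


def strip_spaces_and_punctuation(text: str) -> str:
    """Hypothetical helper (not part of cnprep): drop whitespace, including the
    ideographic space U+3000, and every character whose Unicode category starts
    with 'P', which covers ASCII and full-width CJK punctuation alike."""
    return "".join(
        ch
        for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


print(strip_spaces_and_punctuation("你好， 世界！ Hello, world."))
```

Relying on Unicode categories rather than a hand-written list of marks means full-width and ASCII punctuation are handled the same way. Symbols such as $ or + (category S*) are kept.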

Install

$ pip install cnprep

Usage

from cnprep import Extractor
message = "联系邮箱 someone@example.com，编号 42"  # sample input text
ext = Extractor(args=['email', 'number'], limit=5)  # choose what to extract
result = ext.extract(message)
args: the items to extract, given as a list or a comma-separated string
    e.g. ['email', 'telephone'] or 'email, telephone'
    email
    telephone
    web
    QQ
    tex
    wechat
    message (without punctuation)
    blur (Ⅰ①壹...)
limit: parameter for get_number (blur)
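The two accepted forms of args, and the kind of characters the blur option targets, can be illustrated with a short standard-library sketch. Both functions are hypothetical and are not cnprep's API. normalize_args shows how a list and a comma-separated string can describe the same options. blur_to_digits assumes that "blur" numbers (Ⅰ, ①, 壹, ...) can be mapped through the numeric values recorded in the Unicode database. The limit parameter is not modelled here.

```python
import unicodedata
from typing import List, Union


def normalize_args(args: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: accept ['email', 'telephone'] or 'email, telephone'
    and return the same list of option names either way."""
    if isinstance(args, str):
        args = args.split(",")
    return [a.strip() for a in args if a.strip()]


def blur_to_digits(text: str) -> List[int]:
    """Hypothetical helper: map numeric-looking characters such as Ⅰ, ① or 壹
    to integers using unicodedata.numeric; characters without a numeric value,
    or with a fractional one such as ½, are skipped."""
    values = []
    for ch in text:
        value = unicodedata.numeric(ch, None)  # None when ch has no numeric value
        if value is not None and value == int(value):
            values.append(int(value))
    return values


print(normalize_args("email, telephone"))
print(blur_to_digits("Ⅰ①壹"))
```

Using the Unicode numeric property avoids maintaining separate tables for Roman numerals, circled digits and Chinese financial numerals, since all three carry a numeric value in the database.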

You can also call ext.reset_param() to reset the parameters.

Attention

The URL extractor only supports ASCII URLs.
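To see what an ASCII-only URL extractor implies in practice, here is a minimal sketch. The pattern ASCII_URL and the function find_ascii_urls are hypothetical and are not the regular expression cnprep uses. Because the match accepts only printable ASCII after the scheme, it stops at the first Chinese character or full-width punctuation mark, so a URL with a non-ASCII path is cut short.

```python
import re
from typing import List

# Hypothetical ASCII-only pattern: after "http://" or "https://", accept only
# printable ASCII characters (U+0021 '!' through U+007E '~'). Spaces, Chinese
# characters and full-width punctuation all end the match.
ASCII_URL = re.compile(r"https?://[!-~]+")


def find_ascii_urls(text: str) -> List[str]:
    return ASCII_URL.findall(text)


print(find_ascii_urls("访问 https://pypi.org/project/cnprep/ 了解更多"))
print(find_ascii_urls("见 https://example.com/路径，谢谢"))
```

The first call returns the whole URL; the second returns only https://example.com/, because the path segment 路径 is not ASCII.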

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

cnprep-0.1.12-py2.py3-none-any.whl (6.6 kB)

Python versions: Python 2, Python 3

File details

Details for the file cnprep-0.1.12-py2.py3-none-any.whl.


File hashes

Hashes for cnprep-0.1.12-py2.py3-none-any.whl
Algorithm     Hash digest
SHA256        483ae57599402fd4421ed1b436d3fe7e7a29c813bb54586d49b48b98f6ce5323
MD5           41a345c15399eb7c32d4dc6a37c3f596
BLAKE2b-256   a8f57940a7544d04cfbaf3f892e2e7b9019a60d54f5f1120916feae94bcebfd7
