A library for Chinese text preprocessing

Chinese text preprocessing

You can extract numbers, email addresses, websites, emoji, and TeX, and delete spaces and punctuation.

Install

>> pip install cnprep

Usage

from cnprep import Extractor
ext = Extractor(delete=True, args=['email', 'number'], blur=True, limit=5)
ext.extract(message)
delete: delete the extracted information from the text (except blur)
args: the items to extract, given as a list or a comma-separated string,
    e.g. ['email', 'telephone'] or 'email, telephone'
    email
    telephone
    web
    QQ
    tex
    wechat
    blur (Ⅰ①壹...)
limit: parameter for get_number (blur)
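
The options above can be illustrated without the library. The following is a minimal standalone sketch, not cnprep's implementation: normalize_args and extract_emails are hypothetical helpers, and the email pattern is a deliberately simple assumption. It shows how args could be accepted either as a list or as a comma-separated string, and what delete=True means for the returned text.

```python
import re

# Deliberately simple email pattern (an assumption for illustration only).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def normalize_args(args):
    """Accept ['email', 'telephone'] or 'email, telephone' and return a list."""
    if isinstance(args, str):
        args = args.split(",")
    return [a.strip() for a in args if a.strip()]


def extract_emails(message, delete=False):
    """Return (found, text); text has the matches removed when delete is True."""
    found = EMAIL_RE.findall(message)
    text = EMAIL_RE.sub("", message) if delete else message
    return found, text


print(normalize_args("email, telephone"))
print(extract_emails("联系我 test@example.com 谢谢", delete=True))
```

With delete=False the message comes back unchanged, so the caller can inspect the matches without altering the text.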

You can also call ext.reset_param() to reset the parameters.
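
The blur option listed above refers to numbers written with non-ASCII numerals such as Ⅰ, ① or 壹. As a rough illustration of the idea, and not of how cnprep handles it, the sketch below uses Python's standard unicodedata module to map any single character with a whole numeric value from 0 to 9 onto the matching ASCII digit. The name normalize_blur_digits is hypothetical, and which characters are recognized depends on the Unicode database shipped with the Python interpreter.

```python
import unicodedata


def normalize_blur_digits(text):
    """Replace characters whose Unicode numeric value is 0-9 with ASCII digits."""
    out = []
    for ch in text:
        # numeric() returns the default (None here) for non-numeric characters.
        value = unicodedata.numeric(ch, None)
        if value is not None and value == int(value) and 0 <= value <= 9:
            out.append(str(int(value)))
        else:
            out.append(ch)
    return "".join(out)


print(normalize_blur_digits("Ⅰ①壹"))
```

Characters with fractional or larger values, such as ½ or 十, are left untouched, so multi-character Chinese numbers would need extra handling beyond this sketch.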
