A lib for Chinese text preprocessing

Chinese text preprocess

You can extract numbers, email addresses, websites, emoji, and tex from a message, and strip spaces and punctuation.

Install

pip install cnprep

Usage

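from cnprep import Extractor

# Placeholder input for illustration; replace with your own text.
message = 'Contact me at someone@example.com or 12345678'

ext = Extractor(args=['email', 'number'], limit=5)
ext.extract(message)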
args: options, given as a list or as a comma-separated string
    e.g. ['email', 'telephone'] or 'email, telephone'
    email
    telephone
    web
    QQ
    tex
    wechat
    message (without punctuation)
    blur (Ⅰ①壹...)
limit: parameter for get_number (blur)
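
The blur option above lists look-alike digit forms (Ⅰ①壹...). As a rough illustration of what normalizing such forms can involve, here is a minimal standard-library sketch. It is not cnprep's implementation: the names build_digit_table and normalize_digits are hypothetical, and the set of variants covered (full-width, circled, Roman, and Chinese numerals) is an assumption.

```python
def build_digit_table():
    """Map common look-alike digit characters to ASCII digits.

    Hypothetical helper for illustration; cnprep's own mapping may differ.
    """
    table = {}
    for d in range(10):
        table[0xFF10 + d] = str(d)        # full-width digits, U+FF10..U+FF19
    for d in range(1, 10):
        table[0x2460 + d - 1] = str(d)    # circled digits 1-9, U+2460..U+2468
        table[0x2160 + d - 1] = str(d)    # Roman numerals 1-9, U+2160..U+2168
    # Everyday and accounting-style Chinese numerals, index = digit value.
    for d, ch in enumerate("零一二三四五六七八九"):
        table[ord(ch)] = str(d)
    for d, ch in enumerate("零壹贰叁肆伍陆柒捌玖"):
        table[ord(ch)] = str(d)
    return table


DIGIT_TABLE = build_digit_table()


def normalize_digits(text):
    """Replace each known digit variant with its ASCII digit, one character at a time."""
    return text.translate(DIGIT_TABLE)


if __name__ == "__main__":
    print(normalize_digits("Ⅰ①壹"))
```

The sketch substitutes character by character, so it does not interpret positional words such as 十 or 百; handling those would need extra logic.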

You can also call ext.reset_param() to reset the parameters.

Attention

The URL extractor only supports ASCII URLs.
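
To show what an ASCII-only limitation means in practice, the sketch below uses a simple regular expression that accepts only ASCII URL characters. The pattern and the function name find_ascii_urls are assumptions made for illustration; cnprep's actual pattern is not shown in this README and may behave differently.

```python
import re

# Hypothetical pattern: scheme, then one or more ASCII characters allowed in URLs.
# Any non-ASCII character (Chinese text, full-width punctuation) ends the match.
URL_RE = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def find_ascii_urls(text):
    """Return all ASCII-only URL substrings found in text."""
    return URL_RE.findall(text)


if __name__ == "__main__":
    print(find_ascii_urls("访问 http://example.com/path?q=1 获取"))
    print(find_ascii_urls("见 http://example.com/中文"))
```

With a pattern like this, a URL that contains non-ASCII characters is cut off at the first such character, and a URL whose host is entirely non-ASCII is not matched at all.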
