A library for Chinese text preprocessing

Project description

Chinese text preprocessing

cnprep extracts numbers, email addresses, websites, emoji, and TeX from text, and can delete spaces and punctuation.
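
As a rough illustration of the kind of extraction described above, here is a minimal standard-library sketch. It does not use cnprep and is not its implementation; the email pattern, the PUNCTUATION set, and the function names extract_emails and strip_spaces_and_punctuation are all assumptions made for this example.

```python
import re

# Deliberately simple, ASCII-only email pattern (an assumption for this sketch,
# not the pattern cnprep uses).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# A small hand-picked set of ASCII and full-width punctuation marks.
PUNCTUATION = set(",.!?;:'\"()，。！？；：、（）“”‘’")


def extract_emails(text):
    """Return every substring that looks like an email address."""
    return EMAIL_RE.findall(text)


def strip_spaces_and_punctuation(text):
    """Drop whitespace and the punctuation marks listed in PUNCTUATION."""
    return "".join(
        ch for ch in text if not ch.isspace() and ch not in PUNCTUATION
    )
```

With these definitions, extract_emails("联系我：someone@example.com，谢谢") returns ['someone@example.com'], because the full-width punctuation on either side falls outside the ASCII character classes.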

Install

pip install cnprep

Usage

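from cnprep import Extractor

# Sample input; any string can be passed to extract().
message = 'Contact me at someone@example.com'
ext = Extractor(args=['email', 'number'], limit=5)
ext.extract(message)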
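args: what to extract, given as a list or a comma-separated string
    e.g. ['email', 'telephone'] or 'email, telephone'
    email
    telephone
    web
    QQ
    tex
    wechat
    message (the text without punctuation)
    blur (number look-alikes such as Ⅰ, ①, 壹, ...)
limit: parameter for get_number (used by blur)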
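
The blur option above concerns number look-alikes such as Ⅰ, ① and 壹. One plausible way to handle such characters is to map them to ASCII digits before searching for numbers. The sketch below shows that idea with the standard library only; the LOOKALIKE_DIGITS table and the normalize_digits function are hypothetical and say nothing about how cnprep implements blur or how it uses limit.

```python
# Hypothetical mapping from digit look-alikes to ASCII digits.
# Only a handful of characters are covered; a real table would be much larger.
LOOKALIKE_DIGITS = {
    "Ⅰ": "1", "①": "1", "壹": "1",
    "Ⅱ": "2", "②": "2", "贰": "2",
    "Ⅲ": "3", "③": "3", "叁": "3",
}

# str.maketrans accepts a dict whose keys are single characters.
_TABLE = str.maketrans(LOOKALIKE_DIGITS)


def normalize_digits(text):
    """Replace each known look-alike with its ASCII digit; keep the rest."""
    return text.translate(_TABLE)
```

For example, normalize_digits("QQ: Ⅰ②叁") returns "QQ: 123", after which an ordinary digit search can find the number.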

You can also call ext.reset_param() to reset the parameters.

Attention

The URL extractor only supports ASCII characters.
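
To show what an ASCII-only restriction means in practice, here is a small sketch of an ASCII-only URL matcher. The pattern and the name extract_ascii_urls are assumptions for illustration, not cnprep's actual extractor; the point is only that a match ends at the first non-ASCII character.

```python
import re

# ASCII-only URL pattern (an assumption for this sketch): an http(s) scheme
# followed by ASCII characters that commonly appear in URLs.
ASCII_URL_RE = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def extract_ascii_urls(text):
    """Return the ASCII portion of every http(s) URL found in text."""
    return ASCII_URL_RE.findall(text)
```

Here extract_ascii_urls("访问 https://example.com/路径 了解更多") returns ['https://example.com/']: the Chinese path segment is cut off because it lies outside the ASCII character class.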

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cnprep-0.1.9.tar.gz (6.3 kB)

Uploaded Source

File details

Details for the file cnprep-0.1.9.tar.gz.

File metadata

  • Download URL: cnprep-0.1.9.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for cnprep-0.1.9.tar.gz
Algorithm    Hash digest
SHA256       613180ddc03a17c0a55a0c3cd28059ce4ac2ee72aa6fdab7d7c137a70941bec2
MD5          119b822040bfb1259e0c3cb2f112c168
BLAKE2b-256  cc033b775d68f9ef8c57f0764f12aaf0812651a21f45be3245ce87ff10a9ea2e
