Word and sentence tokenization.

XML cleaner

Word and sentence tokenization in Python. Tested in Python 3.4.3 and 2.7.12.

[![PyPI version](https://badge.fury.io/py/xml-cleaner.svg)](https://badge.fury.io/py/xml-cleaner) ![Jonathan Raiman, author](https://img.shields.io/badge/Author-Jonathan%20Raiman%20-blue.svg)

[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE.md)

Usage

Use this package to split strings at sentence and word boundaries. For instance, to break a string into tokens:

` tokenize("Joey was a great sailor.") #=> ["Joey ", "was ", "a ", "great ", "sailor ", "."] `

To also detect sentence boundaries:

` sent_tokenize("Cat sat mat. Cat's named Cool.", keep_whitespace=True) #=> [["Cat ", "sat ", "mat", ". "], ["Cat ", "'s ", "named ", "Cool", "."]] `

`sent_tokenize` can keep whitespace as-is when called with the flags `keep_whitespace=True` and `normalize_ascii=False`.
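
The example output above is a list of sentences, each of which is a list of tokens that carry their trailing whitespace. Downstream code may want to strip or flatten that structure. The following is a minimal sketch that works on the literal output shown above instead of calling the library; `strip_tokens` and `flatten` are hypothetical helper names for illustration and are not part of xml_cleaner.

```python
# Output copied from the sent_tokenize example above (not recomputed here).
sentences = [
    ["Cat ", "sat ", "mat", ". "],
    ["Cat ", "'s ", "named ", "Cool", "."],
]


def strip_tokens(sentences):
    """Return the same nested structure with surrounding whitespace removed from each token."""
    return [[token.strip() for token in sentence] for sentence in sentences]


def flatten(sentences):
    """Merge the per-sentence token lists into a single flat list of tokens."""
    return [token for sentence in sentences for token in sentence]


clean = strip_tokens(sentences)   # hypothetical helper, see above
flat = flatten(clean)             # hypothetical helper, see above

print(clean)
print(len(flat), "tokens in", len(clean), "sentences")
```

Keeping the whitespace on the tokens until the last step means the spacing information is still available if it is needed later; stripping is a one-way operation.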

Installation

` pip3 install xml_cleaner `

Testing

Run nose2.
