Treebank tokenizer for English

Project description

Penn Treebank tokenizer

This is a simple fork of the famous Penn Treebank tokenizer. It is forked from DetectorMorse via NLTK.

  • It is appropriate for English, but not other languages.
  • It is appropriate when applied one sentence at a time, but should not be applied to paragraphs or documents.

Unlike the NLTK equivalent, it has no library or data dependencies beyond the built-in re module, and it is not hostilely polymorphic.
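To illustrate what a Treebank-style tokenizer built only on re can look like, and why it should be fed one sentence at a time, here is a minimal sketch. It is not the code shipped in this package: the function name sketch_tokenize and the small rule list are hypothetical, chosen for illustration, and cover only a handful of the real Treebank conventions.

```python
import re

# Hypothetical, heavily simplified rules in the spirit of Treebank
# tokenization. These are NOT the rules used by ptbtok itself.
_RULES = [
    # Put spaces around most punctuation marks.
    (re.compile(r'([,;:@#$%&?!()\[\]{}"])'), r" \1 "),
    # Split off a period only when it ends the string (optionally followed
    # by closing quotes or brackets). This is the rule that assumes the
    # input is a single sentence.
    (re.compile(r'([^.])\.(\s*[\])}"]*)\s*$'), r"\1 . \2"),
    # Split English clitics: "can't" -> "ca n't", "Smith's" -> "Smith 's".
    (re.compile(r"([a-z])(n't)\b", re.IGNORECASE), r"\1 \2"),
    (re.compile(r"([a-z])('(?:s|m|d|re|ve|ll))\b", re.IGNORECASE), r"\1 \2"),
]


def sketch_tokenize(sentence):
    """Apply each rewrite rule in order, then split on whitespace."""
    text = sentence
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text.split()


# One sentence: the abbreviation "Mr." keeps its period.
print(sketch_tokenize("They can't go to Mr. Smith's house, can they?"))
# ['They', 'ca', "n't", 'go', 'to', 'Mr.', 'Smith', "'s", 'house', ',', 'can', 'they', '?']

# Two sentences at once: only the last period is split off.
print(sketch_tokenize("It works. It's fine."))
# ['It', 'works.', 'It', "'s", 'fine', '.']
```

Because the period rule only fires at the end of the string, the first sentence in the second call keeps "works." as a single token. Splitting text into sentences first, then tokenizing each one, avoids this.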


This version

0.1

Download files


Source Distribution

ptbtok-0.1.tar.gz (3.1 kB)

Uploaded Source
