prepare text for statistical processing

Installation

pip install pnu-prep

PREP(1)

NAME

prep - prepare text for statistical processing

SYNOPSIS

prep [-a|--ascii] [-d|--number] [-h|--hyphen] [-i|--ignore FILE] [-o|--only FILE] [-p|--ponctuate] [--debug] [--help|-?] [--version] [--] [file] [...]

DESCRIPTION

prep reads each file in sequence and writes it on the standard output, one lowercase `word' per line. A word is a string of alphabetic characters and embedded apostrophes, delimited by space or punctuation. Hyphenated words are broken apart; hyphens at the end of lines are removed and the hyphenated parts are joined. Strings of digits are discarded.
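The splitting rules above can be approximated in a few lines of Python. This is only a minimal sketch of the default behaviour as described here, not the actual prep source; the function name `prep_words` is hypothetical, and only the ASCII apostrophe (') is treated as an embedded apostrophe.

```python
import re


def prep_words(text):
    """Yield lowercase words from text (hypothetical helper, not prep's own code)."""
    # Join words hyphenated across a line break: "statis-\ntical" -> "statistical".
    text = re.sub(r"-\r?\n\s*", "", text)
    # A word is a run of letters with optional embedded apostrophes.
    # [^\W\d_] matches any Unicode letter, so digits and the remaining
    # hyphens act as delimiters and are dropped.
    for match in re.finditer(r"[^\W\d_]+(?:'[^\W\d_]+)*", text):
        yield match.group(0).lower()


if __name__ == "__main__":
    sample = "The well-known statis-\ntical method isn't 42 new."
    for word in prep_words(sample):
        print(word)
```

Run on the sample sentence, the sketch prints the, well, known, statistical, method, isn't and new, one per line.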

When no files are given as arguments, standard input is read (until a Control-D (Unix) or Control-Z (Windows) character is sent).

The following option letters may appear in any order:

OPTIONS

Options          Use
-a|--ascii       Try to convert Unicode letters to ASCII.
-d|--number      Print the word number (in the input stream) with each word.
-h|--hyphen      Don't break words on hyphens.
-i|--ignore      Take the next file as an `ignore' file. These words will not appear in the output. (They will be counted, for purposes of the -d numbering.)
-o|--only        Take the next file as an `only' file. Only these words will appear in the output. (All other words will also be counted for the -d numbering.)
-p|--ponctuate   Include punctuation marks (single nonalphanumeric characters from the "!(),.:;?" set) as separate output lines. The punctuation marks are not counted for the -d numbering.
--debug          Enable debug mode.
--help|-?        Print usage and a short help message and exit.
--version        Print version and exit.
--               Options processing terminator.
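The interaction between -d, -i and -o is easier to see in code: every word is counted, whether or not it is printed. The sketch below illustrates that rule under two assumptions that the text above does not confirm: numbering starts at 1, and an `ignore' match takes precedence when both lists are given. The name `number_and_filter` is hypothetical.

```python
def number_and_filter(words, ignore=frozenset(), only=None):
    """Yield (position, word) pairs for the words that would be printed.

    Hypothetical helper: positions count every input word, including
    the ones filtered out by the ignore or only lists.
    """
    for position, word in enumerate(words, start=1):  # assumed to start at 1
        if word in ignore:
            continue  # counted, but not printed
        if only is not None and word not in only:
            continue  # counted, but not printed
        yield position, word


if __name__ == "__main__":
    words = ["the", "cat", "sat", "on", "the", "mat"]
    for position, word in number_and_filter(words, ignore={"the", "on"}):
        print(position, word)
```

With the ignore list {"the", "on"}, the remaining words keep their original positions: 2 cat, 3 sat, 6 mat.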

FILES

Ignore and only files contain words, one per line.
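Reading such a file takes very little code. The following sketch assumes that blank lines are skipped and that words are lowercased to match prep's lowercase output; neither detail is stated above, and `parse_word_list` is a hypothetical name.

```python
def parse_word_list(lines):
    """Return the set of words found in an iterable of lines, one word per line.

    Hypothetical helper: accepts an open text file or any list of strings.
    """
    words = set()
    for line in lines:
        word = line.strip().lower()  # assumption: case-insensitive matching
        if word:  # assumption: blank lines are ignored
            words.add(word)
    return words


if __name__ == "__main__":
    print(sorted(parse_word_list(["the\n", "\n", "  Of \n", "and"])))
```

An open file can be passed directly, for example `parse_word_list(open(path, encoding="utf-8"))`, where the UTF-8 encoding is again an assumption.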

The file /usr/local/etc/eign, originally provided in /usr/lib, is an example or default ignore file.

EXIT STATUS

The prep utility exits 0 on success, and >0 if an error occurs.

SEE ALSO

deroff(1)

STANDARDS

The prep utility is a deprecated UNIX 7th edition command (it also appeared in Unix V7M, Ultrix 3.1, 2.9BSD and 2.11BSD).

Our implementation tries to follow the PEP 8 style guide for Python code.

PORTABILITY

Tested OK under Windows.

HISTORY

This utility was made for the PNU project, out of historical curiosity and for fun, though it doesn't seem very useful...

Some features were added compared to the original command:

  • Unicode letters are now supported by default (the original command predated Unicode by 12 years).
  • It is now possible to use the -i and -o options at the same time.
  • The -h option was added to avoid breaking words on hyphens, which makes sense in French.
  • The -a option was added to try to convert Unicode accented letters to their ASCII equivalents.
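One common way to map accented letters to ASCII with the Python standard library is to decompose each character and drop the combining marks. The sketch below shows that technique only; it is not necessarily how the -a option is implemented, and `to_ascii` is a hypothetical name. Letters without a Unicode decomposition are left unchanged.

```python
import unicodedata


def to_ascii(word):
    """Strip accents from word (hypothetical helper, not prep's own code)."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", word)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ("élève", "Noël", "prep"):
        print(word, "->", to_ascii(word))
```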

Several bugs from the original prep command were corrected:

  • A display bug on hyphenated words inside a line when used with the combined -d and -p options.
  • A bug with lines starting with an apostrophe.
  • A bug with the character following an apostrophe.

LICENSE

This utility is available under the 3-clause BSD license.

AUTHORS

Hubert Tournier
