n-grams count

Installation

pip install pnu-ngc

ngc(1)

NAME

ngc - n-grams count

SYNOPSIS

ngc [-b|--block] [-c|--convert ARGS] [-d|--discard ARGS] [-l|--length ARG] [-p|--partial ARG] [-q|--quiet] [-s|--summary] [-t|--text] [-w|--word] [--debug] [--help|-?] [--version] [--] [filename ...]

DESCRIPTION

The ngc utility counts the occurrences of n-grams in a text and computes their frequencies, a common first step in cryptanalysis.

For n=1, the n-gram is simply a letter or character. For n=2, the n-gram is called a bigram or digraph. For n=3, it is a trigram, and so on.
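The counting itself can be pictured with a few lines of Python. This is only an illustrative sketch, not the code used by ngc; the function name `frequencies` is hypothetical, and it accepts any sequence of n-grams (for n=1, simply the characters of the text).

```python
from collections import Counter


def frequencies(ngrams):
    """Map each n-gram to a (count, relative frequency) pair."""
    counts = Counter(ngrams)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {gram: (count, count / total) for gram, count in counts.items()}


# For n=1 the n-grams are the individual characters of the text.
for gram, (count, freq) in sorted(frequencies("HELLO").items()):
    print(gram, count, f"{freq:.2f}")
```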

The -l option sets the length of the n-gram (1 character by default). The -b option uses a fixed window instead of the default sliding one: "ABCD" gives "AB" and "CD" instead of "AB", "BC" and "CD".
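The difference between the two windowing modes can be sketched as follows. The function names are hypothetical and this is not taken from the ngc source; it only reproduces the "ABCD" example above.

```python
def sliding_ngrams(text, n):
    """Advance one character at a time (the default behaviour described above)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def block_ngrams(text, n):
    """Advance n characters at a time (the fixed-window -b behaviour)."""
    # A trailing block shorter than n is kept as-is in this sketch.
    return [text[i:i + n] for i in range(0, len(text), n)]


print(sliding_ngrams("ABCD", 2))  # ['AB', 'BC', 'CD']
print(block_ngrams("ABCD", 2))    # ['AB', 'CD']
```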

The -c option applies conversions to the input data before counting: converting Unicode characters to ASCII (in particular, removing accents), converting upper case to lower case (or the reverse), and collapsing runs of space-like characters into a single space (this last conversion is performed after the others).
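A minimal sketch of such conversions, using only the Python standard library. The function name `convert` is hypothetical, and the use of NFKD normalization to strip accents is an assumption about one reasonable approach, not a description of what ngc does internally.

```python
import re
import unicodedata


def convert(text, ops):
    """Apply the conversions named by the letters in ops (a, l, u, s)."""
    if "l" in ops and "u" in ops:
        raise ValueError("l and u can't be used at the same time")
    if "a" in ops:
        # Assumption: decompose accented characters, then drop non-ASCII parts.
        decomposed = unicodedata.normalize("NFKD", text)
        text = decomposed.encode("ascii", "ignore").decode("ascii")
    if "l" in ops:
        text = text.lower()
    if "u" in ops:
        text = text.upper()
    if "s" in ops:
        # Performed last, as stated above.
        text = re.sub(r"\s+", " ", text)
    return text


print(convert("Été  chaud", "als"))  # ete chaud
```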

You can also use the -d option to discard selected categories of characters (for example if you only want to keep letters).
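Discarding by category amounts to filtering characters with a set of predicates. The sketch below covers only some of the category letters listed under OPTIONS; the predicate chosen for each letter is an assumption and may not match the exact character sets used by ngc.

```python
# Hypothetical mapping from some -d category letters to character tests.
DISCARD_TESTS = {
    "u": str.isupper,
    "l": str.islower,
    "L": str.isalpha,
    "d": str.isdigit,
    "c": lambda ch: ch in "'-",
    "p": lambda ch: ch in ".,;:?!",
    "s": lambda ch: ch in " \t\r\n\f\v",
}


def discard(text, categories):
    """Remove every character matching any of the selected categories."""
    tests = [DISCARD_TESTS[letter] for letter in categories]
    return "".join(ch for ch in text if not any(test(ch) for test in tests))


print(discard("Hello, World 42!", "dps"))  # HelloWorld
```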

If you want to check your input data after these transformations, you can use the -t option to print it.

To print only the transformed text, add the -q option, which suppresses the occurrences and frequencies by n-gram.

To print statistics on the remaining characters, use the -s option. The summary also includes the index of coincidence of your input text.
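For reference, the index of coincidence is commonly defined as the sum over all distinct characters of n_i(n_i - 1), divided by N(N - 1), where n_i is the count of character i and N the total number of characters. The sketch below implements that unnormalized form; some references additionally multiply by the alphabet size, and which variant ngc reports is not stated here.

```python
from collections import Counter


def index_of_coincidence(text):
    """Probability that two characters drawn without replacement are equal."""
    counts = Counter(text)
    total = sum(counts.values())
    if total < 2:
        return 0.0  # undefined for fewer than two characters; 0.0 by convention here
    matches = sum(count * (count - 1) for count in counts.values())
    return matches / (total * (total - 1))


print(round(index_of_coincidence("AABB"), 4))  # 0.3333
```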

Finally, the -w option processes your input word by word instead of line by line. If you selected the fixed-window -b option, the -p option decides what to do with partial blocks: keep them as-is, discard them, or pad them with spaces so that all n-grams have the same length.
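The three ways of handling a partial block can be sketched as below. The function name is hypothetical, and the side on which spaces are added is an assumption: the sketch right-justifies the partial block (spaces on the left), following the wording of the j argument in OPTIONS.

```python
def fixed_blocks(text, n, partial="k"):
    """Split text into blocks of n characters, handling a short last block.

    partial: "k" keeps it as-is, "d" discards it, "j" pads it with spaces.
    """
    blocks = [text[i:i + n] for i in range(0, len(text), n)]
    if blocks and len(blocks[-1]) < n:
        if partial == "d":
            blocks.pop()
        elif partial == "j":
            # Assumption: right-justify, i.e. pad with spaces on the left.
            blocks[-1] = blocks[-1].rjust(n)
    return blocks


print(fixed_blocks("ABCDE", 2, "k"))  # ['AB', 'CD', 'E']
print(fixed_blocks("ABCDE", 2, "d"))  # ['AB', 'CD']
print(fixed_blocks("ABCDE", 2, "j"))  # ['AB', 'CD', ' E']
```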

The ngc utility processes all the indicated file names as one file. If none are provided, it processes the standard input, thus behaving as a filter.
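The filter behaviour described above follows a common pattern, sketched here with a hypothetical `read_input` helper. The UTF-8 encoding is an assumption, and the actual ngc implementation may read its input differently.

```python
import io
import sys


def read_input(filenames, stdin=sys.stdin):
    """Return the content of all named files as one text, or stdin if none."""
    if not filenames:
        return stdin.read()
    chunks = []
    for name in filenames:
        # Assumption: files are UTF-8 encoded text.
        with open(name, encoding="utf-8") as handle:
            chunks.append(handle.read())
    return "".join(chunks)


# With no file names, the text comes from the given stream.
print(read_input([], io.StringIO("ATTACK AT DAWN")))
```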

OPTIONS

Options Use
-b|--block Use fixed- instead of sliding-windows blocks
-c|--convert ARGS Convert text input. A combination of:
  a / Unicode characters to ASCII (remove accents)
  l / Upper case letters to lower
  u / Lower case letters to upper
  s / Space-like characters to 1 space
  Warning: l and u can't be used at the same time
-d|--discard ARGS Discard characters. A combination of:
  U / Unicode characters
  u / Upper case letters
  l / Lower case letters
  L / All letters
  c / Connection symbols (apostrophe and hyphen)
  d / Digits
  p / Punctuation (.,;:?!)
  o / Other printable symbols
  s / Spaces (space, tab, return, formfeed, vtab)
  n / Non-printable control characters
-l|--length ARG Length of the n-gram. Defaults to 1
-p|--partial ARG What to do with partial blocks? One among:
  d / Discard
  k / Keep as-is (default)
  j / Keep but right-justify with spaces
-q|--quiet Don't show occurrences and frequency by n-gram
-s|--summary Show a summary of what was processed
-t|--text Show modified text input
-w|--word Respect word boundaries (delimited by spaces)
--debug Enable debug mode
--help|-? Print usage and a short help message and exit
--version Print version and exit
-- Options processing terminator

ENVIRONMENT

The NGC_DEBUG environment variable can also be set to any value to enable debug mode.
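"Set to any value" suggests a presence check rather than a test on the value. A minimal sketch of that logic follows; the function name is hypothetical, and whether ngc treats an empty value as set is an assumption.

```python
import os


def debug_enabled(cli_flag=False, environ=os.environ):
    """Debug mode is on if --debug was given or NGC_DEBUG is present."""
    # Assumption: any value, even an empty one, counts as "set".
    return cli_flag or "NGC_DEBUG" in environ


print(debug_enabled(False, {"NGC_DEBUG": "1"}))  # True
```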

EXIT STATUS

The ngc utility exits 0 on success, and >0 if an error occurs.

SEE ALSO

wc(1), caesar(1), Frequency analysis, Index of coincidence

STANDARDS

The ngc utility is not a standard UNIX/POSIX command.

It tries to follow the PEP 8 style guide for Python code.

HISTORY

This utility was made for the PNU project, while playing with a reimplementation of the caesar(1) utility.

LICENSE

This utility is available under the 3-clause BSD license.

AUTHORS

Hubert Tournier
