Generate a suggestion deterministically based on mistyped input and possible targets.

pygensuggestions

A backport of the _suggestions module, native to CPython since 3.12, to other versions of Python.

Quickstart

pip install pygensuggestions==1.2.0 # pip
uv pip install pygensuggestions==1.2.0 # uv

Usage

>>> import pygensuggestions # the module
>>> pygensuggestions.suggest( # the primary function
... ('foo', 'bar', 'baz'), # arg 1: the sized iterable over words from which suggestions are taken
... 'bar' # arg 2: the wrong word
... ) # notice that the exact word, if present in the sequence, is never returned
'baz'
>>> pygensuggestions.suggest(['abcd', 'efgh'], 'zyxd') # returns None because the target is too far from every candidate, so nothing is printed
>>> pygensuggestions.suggest({b'red', b'blue', b'yellow'}, b'blew') # all bytes are also OK, as long as data is homogeneous
b'blue'

Command line usage

$ echo "thousand
> hundred" > moreopts.txt
$ pygensuggestions tousand million atousend @moreopts.txt --outfile res.out # read candidates from arguments and optionally a newline-delimited file
$ echo $? # check the exit code; 1 means no suggestion could be generated
0
$ cat res.out # print the result
thousand

Background

The _suggestions module is implemented in C, at Modules/_suggestions.c. It is part of an effort to improve the user experience by enriching the traceback of errors about nonexistent attribute or module names that closely resemble a known one, as well as apparently mistyped keywords. The module contains a helper called _generate_suggestions, which is internal to the Python interpreter, as the underscore-prefixed name signifies. It takes an exact instance of list as the first argument and a string as the second, and returns the string in the list that is most similar to the target. It returns None if no candidate falls within a threshold determined by a Levenshtein distance-based metric with weighted move and case costs, or if there are too many strings in the list.

What does this library do?

This library provides a faithful translation of that sophisticated deterministic engine to pure Python, along with a simple command-line interface for calling the core function from the shell. The library itself is deliberately minimal, so it is best used as a building block for custom wrappers.
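As one illustration of such a wrapper, the sketch below builds a "Did you mean" style message. The helper name did_you_mean and the message wording are hypothetical. The suggestion function is passed in as a parameter and is only assumed to follow the suggest(candidates, wrong_word) call shape shown under Usage, so the sketch does not rely on anything else in the library.

```python
from typing import Callable, Collection, Optional


def did_you_mean(
    suggest: Callable[[Collection[str], str], Optional[str]],
    candidates: Collection[str],
    wrong_word: str,
) -> str:
    """Build an error message, adding a hint when a suggestion exists."""
    message = f"unknown name {wrong_word!r}"
    suggestion = suggest(candidates, wrong_word)
    if suggestion is not None:
        # Only mention a hint when the suggestion function found one.
        message += f". Did you mean: {suggestion!r}?"
    return message
```

With the library installed, pygensuggestions.suggest can be passed as the first argument.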

It is also highly portable: it supports Python 3.8 and above out of the box and is implementation-agnostic. It can take any sized iterable of candidates, not only a sequence as typical implementations require, and the candidates may be str or bytes as long as their type matches that of the target.

Since this is not exactly a long-running algorithm, I decided against writing it in C. Doing so would boil down to blatant copying of the CPython source, would probably fail on alternative Python implementations, and would suffer from the same pitfalls described in the next section.

The two functions used to assist in the implementation, lev_dist and sub_cost, are also exposed in the lib submodule. They may be particularly valuable on their own: the procedure is an unorthodox variant of the well-known 'edit distance' recipe that accounts for case and stores only one row of the traditional 2D dynamic-programming array, which avoids the memory overhead, albeit slight.
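The one-row idea can be sketched as follows. This is a minimal illustration and not the code of lev_dist or sub_cost: the function names, the cost values (2 for a full edit, 1 for a case-only change) and the str-only handling are all assumptions made for the example.

```python
MOVE_COST = 2  # assumed cost of inserting, deleting, or replacing a character
CASE_COST = 1  # assumed cost of a replacement that only changes case


def substitution_cost(a: str, b: str) -> int:
    """Return the cost of turning character a into character b."""
    if a == b:
        return 0
    if a.lower() == b.lower():
        return CASE_COST
    return MOVE_COST


def weighted_distance(source: str, target: str) -> int:
    """Case-aware edit distance that keeps a single row of the DP table."""
    if len(target) > len(source):
        # The costs are symmetric, so swap to keep the row as short as possible.
        source, target = target, source
    # row[j] is the cost of turning the processed prefix of source into target[:j].
    row = [j * MOVE_COST for j in range(len(target) + 1)]
    for i, ch_s in enumerate(source, start=1):
        diagonal = row[0]  # previous row, column j - 1
        row[0] = i * MOVE_COST
        for j, ch_t in enumerate(target, start=1):
            above = row[j]  # previous row, column j
            row[j] = min(
                diagonal + substitution_cost(ch_s, ch_t),  # replace ch_s by ch_t
                above + MOVE_COST,                          # drop ch_s
                row[j - 1] + MOVE_COST,                     # insert ch_t
            )
            diagonal = above
    return row[-1]


print(weighted_distance("blew", "blue"))  # 4
print(weighted_distance("Foo", "foo"))    # 1
```

Only the current row and one saved diagonal value are kept, so memory grows with the shorter string rather than with the product of both lengths.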

Why this module?

Alternatives to this module do exist. Their shortcomings are detailed below.

The _suggestions module itself is, of course, a contender. However, it only accepts exact instances of list for the first argument, and neither argument may contain bytes. Though it is the fastest option because it is written in C, it is not public and is not available on older Python versions.

While the traceback module does implement this in pure Python as a fallback, it again takes the form of an unstable, private function that is not available on all versions of Python. Worse still, the logic is difficult to separate out, because the helper function in question is built to handle an exception traceback. Predictable; after all, it is the traceback module!

I have yet to find modules on the Python Package Index that provide comparable functionality. That is not to say the existing ones are too simple; in fact, the algorithms they use may be overkill for some purposes, or they are wrapped in unrelated logic. The routine actually used by CPython, the reference implementation of Python, is likely accessible with such transparency only here.

Situations in which suggest returns None

  1. When you pass too many candidates (more than 750), since this library mirrors the Python source. If you must bypass this, set lib.MAX_CANDIDATE_ITEMS to float('inf') and suppress any type checker complaints.
  2. When the length of the target string exceeds 40 characters, for the same reason. Set lib.MAX_STRING_SIZE to float('inf') to alter this behaviour.
  3. When every candidate would require more than one-third of the characters to be modified.
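The three situations can be pictured with the sketch below. It is an illustration under stated assumptions, not the library's code: pick_suggestion and crude_distance are hypothetical names, crude_distance is a deliberately simple stand-in for the real metric, and the threshold formula is an assumed way of expressing the one-third rule in cost units. Only the limits 750 and 40 come from the list above.

```python
from typing import Callable, Collection, Optional

MAX_CANDIDATE_ITEMS = 750  # candidate limit quoted in the list above
MAX_STRING_SIZE = 40       # target-length limit quoted in the list above
MOVE_COST = 2              # assumption: cost of one full character edit


def crude_distance(a: str, b: str) -> int:
    """Stand-in metric for the demo: mismatched positions plus length gap."""
    mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return mismatches * MOVE_COST


def pick_suggestion(
    candidates: Collection[str],
    target: str,
    distance: Callable[[str, str], int] = crude_distance,
) -> Optional[str]:
    """Return the closest acceptable candidate, or None."""
    if len(candidates) > MAX_CANDIDATE_ITEMS:  # situation 1
        return None
    if len(target) > MAX_STRING_SIZE:          # situation 2
        return None
    best, best_distance = None, None
    for candidate in candidates:
        if candidate == target:
            continue  # the exact word is never suggested
        # Assumed threshold, in MOVE_COST units, standing in for the one-third rule.
        limit = (len(candidate) + len(target) + 3) * MOVE_COST // 6
        current = distance(target, candidate)
        if current > limit:                    # situation 3 for this candidate
            continue
        if best_distance is None or current < best_distance:
            best, best_distance = candidate, current
    return best


print(pick_suggestion(("foo", "bar", "baz"), "bar"))  # baz
print(pick_suggestion(["abcd", "efgh"], "zyxd"))      # None
```

Passing the metric in as a parameter keeps the guards separate from the distance computation, so a case-aware edit distance can be substituted for the stand-in.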
