
For filtering words by letters and type.

Project description

wordconstraints

A small Python module for filtering a list of words by letter inclusion and exclusion (over a whole word or at particular positions), and by word type (e.g. verb, plural noun).

About

Originally made for use in word-based puzzle solvers for things like crosswords or Wordle. Processing of word types relies heavily on LemmInflect.

The only public-facing function is find_words. To use:

pip install wordconstraints
import wordconstraints as wc
wc.find_words()

Example Uses

Wordle Filtering

This example demonstrates filtering by letter inclusion and exclusion, both over a whole word and at particular positions. Here we have a Wordle game (spoilers for 30/08/2022):

At this point we know quite a lot of information about the target word:

  • H, A, R, P, I, L, F, U and D are excluded from the word.
  • An E is included at the 4th position.
  • S, E, O and N are included somewhere in the word, but each is known not to be at the position where it was guessed.

Using find_words we can find the words that meet all these constraints.
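As a rough stand-in for that call, the sketch below re-implements the letter constraints in plain Python instead of calling find_words itself. The helper name matches_letters, the tiny candidate list, and the 0-based indexes are assumptions made for illustration. The position used in the second call is hypothetical and is not taken from the actual game.

```python
def matches_letters(word, includes=(), excludes=(),
                    includes_at_idxs=None, excludes_at_idxs=None):
    """Return True if `word` satisfies every letter constraint (hypothetical helper)."""
    includes_at_idxs = includes_at_idxs or {}
    excludes_at_idxs = excludes_at_idxs or {}
    # Every required letter must appear somewhere in the word.
    if not all(letter in word for letter in includes):
        return False
    # No forbidden letter may appear anywhere in the word.
    if any(letter in word for letter in excludes):
        return False
    # At each keyed index, the word must have one of the listed letters.
    for idx, letters in includes_at_idxs.items():
        if idx >= len(word) or word[idx] not in letters:
            return False
    # At each keyed index, the word must have none of the listed letters.
    for idx, letters in excludes_at_idxs.items():
        if idx < len(word) and word[idx] in letters:
            return False
    return True


# Illustrative candidates only; the package filters a much larger default list.
candidates = ["onset", "stone", "notes", "nosed", "honey"]

# Known facts: excluded letters, required letters, and E at the 4th position
# (index 3, assuming 0-based indexes).
remaining = [
    w for w in candidates
    if len(w) == 5 and matches_letters(
        w,
        includes=["s", "e", "o", "n"],
        excludes=list("harpilfud"),
        includes_at_idxs={3: ["e"]},
    )
]
print(remaining)  # ['onset', 'notes']

# Hypothetical positional exclusion: suppose N had been guessed at index 0.
narrowed = [
    w for w in remaining
    if matches_letters(w, excludes_at_idxs={0: ["n"]})
]
print(narrowed)  # ['onset']
```

Positional exclusions are what separate near-identical candidates such as "onset" and "notes", which share the same letters and the same E position.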

And we find that there is only one word (in the default word list) that meets these constraints:

['onset']

And sure enough that's the answer!

Crossword Clues

These examples demonstrate the use of universal parts of speech tags and Penn tags.

General Word Type (Noun, Verb etc.)

Universal parts of speech tags (or upos tags) like "NOUN", "ADJ", or "VERB" can be used to get words only of that type.

Using the upos_tag "NOUN" we can further narrow down the potential words that might be the answer to the crossword clue "Melee".

Using the default word list and the information we have, plus the "NOUN" upos tag, we get the following possible words ("brawl" seems like a pretty likely candidate here):

['Snows', 'blows', 'brawl', 'brawn', 'brews', 'brown', 'brows', 'chews', 'claws', 'clews', 'clown', 'crawl', 'craws', 'crews', 'crowd', 'crown', 'crows', 'draws', 'flaws', 'flows', 'frown', 'growl', 'known', 'plows', 'prawn', 'prowl', 'prows', 'scowl', 'shawl', 'shows', 'snows', 'spawn', 'stews', 'thaws', 'trawl', 'views']

More Specific Word Type (Singular vs Plural, Tense etc.)

Penn tags are like more specific versions of upos tags. Rather than just filtering for verbs or nouns, we can filter more specifically for things like verbs of a certain tense. This can be quite useful for situations like crossword puzzles where single word clues are conventionally given in the same tense/plurality as the answer.
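To make the relationship between the two tag sets concrete, the sketch below groups the Penn tags mentioned in this document under the universal tag they refine. The dictionary and the helper penn_tags_for are illustrative names, not part of wordconstraints, and the mapping is deliberately simplified to nouns and verbs.

```python
# Penn Treebank tags grouped under the universal POS tag they refine.
# Simplified on purpose: only the noun and verb tags are listed.
PENN_TO_UPOS = {
    "NN": "NOUN",   # singular or mass noun
    "NNS": "NOUN",  # plural noun
    "VB": "VERB",   # base form
    "VBD": "VERB",  # past tense
    "VBG": "VERB",  # gerund or present participle
    "VBN": "VERB",  # past participle
    "VBP": "VERB",  # non-3rd-person singular present
    "VBZ": "VERB",  # 3rd-person singular present
}


def penn_tags_for(upos_tag):
    """List the Penn tags in the table that fall under one universal tag."""
    return sorted(tag for tag, upos in PENN_TO_UPOS.items() if upos == upos_tag)


print(penn_tags_for("NOUN"))  # ['NN', 'NNS']
print(penn_tags_for("VERB"))  # ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']
```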

Here the clue "covers" is a non-3rd-person singular present verb, so we can expect the answer to be the same.

Using the corresponding penn_tag "VBP" we can narrow down the number of potential answers in the default word list from 56 to 10 ('coats' seems like the right answer here).

['chaps', 'chars', 'chats', 'clads', 'clams', 'claps', 'claws', 'coats', 'crabs', 'crams']

Full Details

The find_words function takes a few different parameters, all optional.

  • word_list
    • The list of words to filter with constraints.
    • Underscores, hyphens and apostrophes in provided strings will be removed.
    • If no word list is given, filtering will be done over the NLTK list of around 23,000 words. (See section 4.1 here for NLTK word list docs and source)
  • num_letters
    • If an integer is provided, words must have this length.
    • If a list of integers is provided, then words must match one of the provided lengths.
  • includes
    • If provided, words must include ALL of the listed letters somewhere in the word.
    • Letters should be lowercase.
    • Letters are only counted once, i.e. you currently can't filter specifically for words that have multiple copies of a letter.
  • excludes
    • If provided, words must not include ANY of the listed letters anywhere in the word.
    • Letters should be lowercase.
  • includes_at_idxs
    • If provided, words must include one of the listed letters at each keyed index.
    • Values should be a list containing lowercase letters.
    • Keys should be integer indexes.
  • excludes_at_idxs
    • If provided, words must exclude all of the listed letters at each keyed index.
    • Values should be a list containing lowercase letters.
    • Keys should be integer indexes.
  • upos_tag
    • If provided, words must have this universal part of speech tag.
    • Valid tags are "NOUN", "VERB", "ADJ", "ADV", "PROPN" (proper noun), and "AUX" (auxiliary verb).
    • Universal parts of speech info
    • Uses LemmInflect.
  • penn_tag
    • If provided, words must have this Penn Treebank tag.
    • Some helpful tags are "NN" and "NNS" for singular and plural nouns, and the verb tags "VBD", "VBG", "VBN", "VBP", and "VBZ" for various tenses.
    • More info on Penn Treebank tags here
    • Uses LemmInflect.
  • shorter_than
    • If provided, words must have a length which is less than the provided integer.
  • longer_than
    • If provided, words must have a length which is more than the provided integer.
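
The length parameters and the word_list clean-up described above can be sketched in plain Python as follows. This is an illustration of the documented behaviour under stated assumptions, not the package's actual code; clean_word and length_ok are hypothetical helper names.

```python
def clean_word(word):
    """Remove underscores, hyphens and apostrophes from a word."""
    return "".join(ch for ch in word if ch not in "_-'")


def length_ok(word, num_letters=None, shorter_than=None, longer_than=None):
    """Check the three length constraints; each one is optional."""
    if isinstance(num_letters, int):
        # A single integer means an exact length.
        if len(word) != num_letters:
            return False
    elif num_letters is not None:
        # A list of integers means any one of the listed lengths.
        if len(word) not in num_letters:
            return False
    # shorter_than and longer_than are strict bounds.
    if shorter_than is not None and len(word) >= shorter_than:
        return False
    if longer_than is not None and len(word) <= longer_than:
        return False
    return True


words = [clean_word(w) for w in ["well-being", "o'clock", "cat", "onset"]]
print(words)  # ['wellbeing', 'oclock', 'cat', 'onset']

print([w for w in words if length_ok(w, num_letters=[3, 5])])  # ['cat', 'onset']
print([w for w in words if length_ok(w, longer_than=5, shorter_than=9)])  # ['oclock']
```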

Issues

  • Filtering of strings with upper case letters might be buggy, so it's possible that filtering of proper nouns does not work correctly at the moment.

Possible Features to Add

  • Add the ability to match the type of a provided word (useful for crosswords, where for example, a plural noun clue means the answer is also a plural noun)
  • Make a more user friendly way to interact with universal parts of speech tags and Penn tags. For example, a more user friendly way to get all Penn tags for past tense verbs (without having to know what a past participle is)
  • Add the ability to require that a word have multiple copies of a single letter, e.g. the word must contain two of the letter 'e'.
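
One way the multiple-copies filter could work is to count letters with collections.Counter. This is only a sketch of the idea; the function has_letter_counts and its min_counts argument are hypothetical and do not exist in wordconstraints.

```python
from collections import Counter


def has_letter_counts(word, min_counts):
    """Return True if `word` has at least the given number of each letter."""
    counts = Counter(word)
    # Counter returns 0 for letters that are absent, so no KeyError here.
    return all(counts[letter] >= n for letter, n in min_counts.items())


words = ["geese", "onset", "eerie", "tepee"]

# Words with at least two copies of 'e'.
print([w for w in words if has_letter_counts(w, {"e": 2})])
# ['geese', 'eerie', 'tepee']

# Words with at least three copies of 'e' and at least one 't'.
print([w for w in words if has_letter_counts(w, {"e": 3, "t": 1})])
# ['tepee']
```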
