
For filtering words by letters and type.

Project description

wordconstraints

A small Python module for filtering a list of words by letter inclusion and exclusion (over a whole word or at particular positions), and by word type (e.g. verb, plural noun).

About

Originally made for use in word-based puzzle solvers for games like crosswords or Wordle.

Word-type processing relies heavily on LemmInflect.

Example Uses

Wordle Filtering

This example demonstrates filtering by letter inclusion and exclusion, both over a whole word and at particular positions. Here we have a Wordle game (spoilers for 30/08/2022):

At this point we know quite a lot of information about the target word:

  • H, A, R, P, I, L, F, U and D are excluded from the word.
  • An E is included at the 4th position.
  • S, E, O and N are included somewhere in the word, but not at the positions where they were guessed.

Using find_words we can find words that meet all these constraints:


And we find that there is only one word (in the default word list) that meets these constraints:

['onset']

And sure enough that's the answer!
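As a rough illustration of how these letter constraints combine, here is a minimal pure-Python sketch. letter_filter_sketch is a hypothetical stand-in, not the library's implementation, and it assumes zero-based indexes (as in Python string indexing). The exact positions of the earlier S, E, O and N guesses aren't recoverable from the text, so the excludes_at_idxs values and the short candidate list below are made up for the example.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def letter_filter_sketch(
    word_list: Iterable[str],
    includes: Sequence[str] = (),
    excludes: Sequence[str] = (),
    includes_at_idxs: Optional[Dict[int, Sequence[str]]] = None,
    excludes_at_idxs: Optional[Dict[int, Sequence[str]]] = None,
) -> List[str]:
    """Hypothetical stand-in for the letter constraints of find_words.

    Indexes are assumed to be zero-based, as in Python string indexing.
    """
    includes_at_idxs = includes_at_idxs or {}
    excludes_at_idxs = excludes_at_idxs or {}
    matches = []
    for word in word_list:
        # Every letter in `includes` must appear somewhere in the word.
        if not all(letter in word for letter in includes):
            continue
        # No letter in `excludes` may appear anywhere in the word.
        if any(letter in word for letter in excludes):
            continue
        # At each keyed index, the word must have one of the listed letters.
        if any(idx >= len(word) or word[idx] not in letters
               for idx, letters in includes_at_idxs.items()):
            continue
        # At each keyed index, the word must have none of the listed letters.
        if any(idx < len(word) and word[idx] in letters
               for idx, letters in excludes_at_idxs.items()):
            continue
        matches.append(word)
    return matches


# Constraints from the Wordle example; the positional exclusions are made up.
candidates = ["onset", "notes", "stone", "tones", "honey", "ozone"]
result = letter_filter_sketch(
    candidates,
    includes=["s", "e", "o", "n"],
    excludes=["h", "a", "r", "p", "i", "l", "f", "u", "d"],
    includes_at_idxs={3: ["e"]},                 # E in the 4th position
    excludes_at_idxs={0: ["n", "s"], 1: ["o"]},  # hypothetical yellow tiles
)
print(result)  # ['onset']
```

Without the hypothetical positional exclusions, "notes" and "tones" would also pass, which shows why the per-position information from yellow tiles matters.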

Crossword Clues

These examples demonstrate the use of universal part-of-speech tags and Penn Treebank tags.

General Word Type (Noun, Verb, etc.)

Universal part-of-speech tags (or upos tags) like "NOUN", "ADJ" or "VERB" can be used to get only words of that type.

Using the upos_tag "NOUN", we can further narrow down the possible answers to the crossword clue "Melee".

With the default word list, the information we have and the "NOUN" upos tag, we get the following possible words ("brawl" seems like a likely candidate here):

['Snows', 'blows', 'brawl', 'brawn', 'brews', 'brown', 'brows', 'chews', 'claws', 'clews', 'clown', 'crawl', 'craws', 'crews', 'crowd', 'crown', 'crows', 'draws', 'flaws', 'flows', 'frown', 'growl', 'known', 'plows', 'prawn', 'prowl', 'prows', 'scowl', 'shawl', 'shows', 'snows', 'spawn', 'stews', 'thaws', 'trawl', 'views']
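In a crossword, the known letters usually come from crossing entries. A small helper like the hypothetical pattern_to_kwargs below can turn a partly filled entry into keyword arguments named after the documented find_words parameters, assuming zero-based indexes. The pattern ???w? is only an example, chosen because every word in the list above is five letters long with 'w' as its fourth letter; the constraints actually used for the "Melee" example are not shown.

```python
from typing import Dict, List


def pattern_to_kwargs(pattern: str, unknown: str = "?") -> Dict[str, object]:
    """Hypothetical helper: turn a partly filled entry such as "???w?" into
    keyword arguments named after find_words' documented parameters.

    Assumes zero-based indexes, as in Python string indexing.
    """
    known: Dict[int, List[str]] = {
        idx: [ch.lower()]  # the docs ask for lowercase letters
        for idx, ch in enumerate(pattern)
        if ch != unknown
    }
    kwargs: Dict[str, object] = {"num_letters": len(pattern)}
    if known:
        kwargs["includes_at_idxs"] = known
    return kwargs


kwargs = pattern_to_kwargs("???W?")
kwargs["upos_tag"] = "NOUN"  # keep only nouns, as in the "Melee" example
print(kwargs)
# {'num_letters': 5, 'includes_at_idxs': {3: ['w']}, 'upos_tag': 'NOUN'}
```

The resulting dictionary could then be passed as wc.find_words(**kwargs), although this sketch has not been checked against the library itself.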

More Specific Word Type (Singular vs Plural, Tense, etc.)

Penn tags are more specific versions of upos tags. Rather than just filtering for verbs or nouns, we can filter for things like verbs of a particular tense. This is useful in crossword puzzles, where single-word clues are conventionally given in the same tense and plurality as the answer.

Here the clue "covers" is a 3rd person singular present verb, so we can expect the answer to be one too.

Using the corresponding penn_tag "VBZ", we can narrow down the number of potential answers in the default word list from 56 to 10 ("coats" seems like the right answer here).

['chaps', 'chars', 'chats', 'clads', 'clams', 'claps', 'claws', 'coats', 'crabs', 'crams']
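Since each Penn tag refines a upos tag, the two can be related with a simple lookup. The table below is the usual correspondence for common English tags and is not taken from wordconstraints or LemmInflect; it is approximate (for example, auxiliaries such as "is" carry verb Penn tags but the upos tag "AUX"). penn_tags_for is a hypothetical helper.

```python
from typing import Dict, List

# Common Penn Treebank tags, each mapped to the coarser upos tag it refines.
PENN_TO_UPOS: Dict[str, str] = {
    "NN": "NOUN",     # noun, singular or mass
    "NNS": "NOUN",    # noun, plural
    "NNP": "PROPN",   # proper noun, singular
    "NNPS": "PROPN",  # proper noun, plural
    "VB": "VERB",     # verb, base form
    "VBD": "VERB",    # verb, past tense
    "VBG": "VERB",    # verb, gerund or present participle
    "VBN": "VERB",    # verb, past participle
    "VBP": "VERB",    # verb, non-3rd person singular present
    "VBZ": "VERB",    # verb, 3rd person singular present
    "JJ": "ADJ",      # adjective
    "JJR": "ADJ",     # adjective, comparative
    "JJS": "ADJ",     # adjective, superlative
    "RB": "ADV",      # adverb
    "RBR": "ADV",     # adverb, comparative
    "RBS": "ADV",     # adverb, superlative
}


def penn_tags_for(upos_tag: str) -> List[str]:
    """Return the Penn tags in the table above that refine a upos tag."""
    return [penn for penn, upos in PENN_TO_UPOS.items() if upos == upos_tag]


print(penn_tags_for("NOUN"))  # ['NN', 'NNS']
print(penn_tags_for("VERB"))  # ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']
```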

Full Details

The only public-facing function is find_words. To use it:

import wordconstraints as wc
wc.find_words()

The find_words function takes a few different parameters, all optional.

  • word_list
    • The list of words to filter with constraints.
    • Underscores, hyphens and apostrophes in provided strings will be removed.
    • If no word list is given, filtering is done over the NLTK list of around 23,000 words (see section 4.1 of the NLTK documentation for the word list's docs and source).
  • num_letters
    • If an integer is provided, words must have this length.
    • If a list of integers is provided, then words must match one of the provided lengths.
  • includes
    • If provided, words must include ALL of the listed letters somewhere in the word.
    • Letters should be lowercase.
    • Letters are only counted once, i.e. you currently can't filter specifically for words that contain multiple copies of a letter.
  • excludes
    • If provided, words must not include ANY of the listed letters anywhere in the word.
    • Letters should be lowercase.
  • includes_at_idxs
    • If provided, words must include one of the listed letters at each keyed index.
    • Values should be a list containing lowercase letters.
    • Keys should be integer indexes.
  • excludes_at_idxs
    • If provided, words must exclude all of the listed letters at each keyed index.
    • Values should be a list containing lowercase letters.
    • Keys should be integer indexes.
  • upos_tag
    • If provided, words must have this universal part of speech tag.
    • Valid tags are "NOUN", "VERB", "ADJ", "ADV", "PROPN" (proper noun) and "AUX" (auxiliary verb).
    • See the Universal Dependencies documentation for more info on universal part-of-speech tags.
    • Uses LemmInflect.
  • penn_tag
    • If provided, words must have this Penn Treebank tag.
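    • Some helpful tags are "NN" and "NNS" for singular and plural nouns, and the verb tags "VBD" (past tense), "VBG" (gerund or present participle), "VBN" (past participle), "VBP" (non-3rd person singular present) and "VBZ" (3rd person singular present).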
    • See the Penn Treebank tag set documentation for more info on Penn tags.
    • Uses LemmInflect.
  • shorter_than
    • If provided, words must have a length less than the provided integer.
  • longer_than
    • If provided, words must have a length greater than the provided integer.
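The length constraints and input clean-up described in the list above can be sketched in a few lines. length_filter_sketch is a hypothetical re-implementation for illustration only: it returns the stripped form of each word, and whether find_words returns stripped or original strings is not stated here. It treats shorter_than and longer_than as strict bounds, matching "less than" and "greater than" above.

```python
from typing import Iterable, List, Optional, Sequence, Union


def length_filter_sketch(
    word_list: Iterable[str],
    num_letters: Optional[Union[int, Sequence[int]]] = None,
    shorter_than: Optional[int] = None,
    longer_than: Optional[int] = None,
) -> List[str]:
    """Hypothetical re-implementation of the documented length constraints."""
    if isinstance(num_letters, int):
        num_letters = [num_letters]  # a single length acts like a one-item list
    matches = []
    for raw in word_list:
        # Underscores, hyphens and apostrophes are stripped before filtering.
        word = raw.replace("_", "").replace("-", "").replace("'", "")
        if num_letters is not None and len(word) not in num_letters:
            continue
        if shorter_than is not None and len(word) >= shorter_than:
            continue
        if longer_than is not None and len(word) <= longer_than:
            continue
        matches.append(word)
    return matches


words = ["cat", "o'clock", "well-known", "brawl", "crossword"]
print(length_filter_sketch(words, num_letters=[3, 5]))              # ['cat', 'brawl']
print(length_filter_sketch(words, longer_than=4, shorter_than=9))   # ['oclock', 'brawl']
```

Note that stripping happens before the length check, so "o'clock" counts as six letters.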

Issues

  • Filtering of strings with uppercase letters might be buggy, so it's possible that filtering of proper nouns doesn't work correctly at the moment.

Possible Features to Add

  • Add the ability to match the type of a provided word (useful for crosswords, where for example, a plural noun clue means the answer is also a plural noun)
  • Provide a more user-friendly way to interact with universal part-of-speech tags and Penn tags, for example a simpler way to get all Penn tags for past tense verbs (without having to know what a past participle is).
  • Add the ability to require multiple copies of a single letter, e.g. the word must contain two of the letter 'e'.
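Counting letters with collections.Counter is one straightforward way the multiple-copies feature could work. This is a hypothetical sketch, not part of wordconstraints; it reads "two of the letter 'e'" as "at least two", which is a design choice (an exact-count check would compare with == instead).

```python
from collections import Counter
from typing import Dict, Iterable, List


def has_letter_counts(word: str, min_counts: Dict[str, int]) -> bool:
    """True if `word` has at least min_counts[letter] copies of each letter."""
    counts = Counter(word)  # missing letters count as 0
    return all(counts[letter] >= n for letter, n in min_counts.items())


def filter_by_letter_counts(word_list: Iterable[str], min_counts: Dict[str, int]) -> List[str]:
    """Keep only the words that satisfy every minimum letter count."""
    return [w for w in word_list if has_letter_counts(w, min_counts)]


print(filter_by_letter_counts(["geese", "green", "melee", "brawl"], {"e": 2}))
# ['geese', 'green', 'melee']
```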


Download files


Source Distribution

wordconstraints-0.0.1.tar.gz (7.1 kB)

Uploaded Source

Built Distribution

wordconstraints-0.0.1-py3-none-any.whl (7.7 kB)

Uploaded Python 3

File details

Details for the file wordconstraints-0.0.1.tar.gz.

File metadata

  • Download URL: wordconstraints-0.0.1.tar.gz
  • Upload date:
  • Size: 7.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.0

File hashes

Hashes for wordconstraints-0.0.1.tar.gz
Algorithm Hash digest
SHA256 fd060457e262e95d8a13938ed65385ae6e0171b1aa7a4580d83ffbb22fe37f33
MD5 d0849a8a6e9a2d0bcd1cc8608158080a
BLAKE2b-256 1022d8e3439b717934ed5174b37cee04d0c4723d788f97b4d572efd9d0a65bea


File details

Details for the file wordconstraints-0.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for wordconstraints-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e74bd9520526329e5378d85d399cd8a8c644fbdb489932d7e72389d2c7a06473
MD5 71b20a197f4338251c15778b616d7f28
BLAKE2b-256 b9937fd5fb2a7e86451ba40a357609b1c5ab054e3e436dd5b81768d407109162

