
Static Dictionaries for Rapid Wordnet Lookups


WordNet Lookup


Is this token a word? O(1) answer. No setup. No dependencies.

A simple question deserves a simple answer. This library gives you instant yes/no validation against roughly 88,000 common English words from the Princeton WordNet lexicon.

Quick Start

pip install wordnet-lookup

from wordnet_lookup import is_wordnet_term

# That's it. Start validating.
is_wordnet_term('alpha')        # True
is_wordnet_term('waddling')     # True
is_wordnet_term('myxovirus')    # True
is_wordnet_term('asdfgh')       # False

# Handles plurals automatically
is_wordnet_term('computers')    # True

# Case insensitive
is_wordnet_term('ALPHA')        # True

Features

  • Zero Dependencies - Pure Python, no external packages
  • Zero I/O - No filesystem access, no database queries
  • Zero Setup - No corpus downloads or configuration
  • Microsecond Lookups - O(1) dictionary access
  • Smart Plurals - Automatically checks singular forms
  • Simple API - One function does it all
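As an illustration of the case-insensitive and plural-aware behavior listed above, here is a minimal sketch of how such a fallback could work. The vocabulary `KNOWN`, the helper `candidate_singulars`, and the suffix rules are all hypothetical stand-ins chosen for this example; the library's actual rules may differ.

```python
# Hypothetical stand-in vocabulary; the real library ships its own data.
KNOWN = frozenset({"alpha", "box", "city", "computer"})


def candidate_singulars(token: str) -> list[str]:
    """Return naive singular candidates for a token (illustrative rules only)."""
    candidates = []
    if token.endswith("ies") and len(token) > 3:
        candidates.append(token[:-3] + "y")  # cities -> city
    if token.endswith("es") and len(token) > 2:
        candidates.append(token[:-2])        # boxes -> box
    if token.endswith("s") and len(token) > 1:
        candidates.append(token[:-1])        # computers -> computer
    return candidates


def is_known(token: str) -> bool:
    """Check the lowercased token first, then any singular candidates."""
    word = token.strip().lower()
    if word in KNOWN:
        return True
    return any(candidate in KNOWN for candidate in candidate_singulars(word))
```

Checking the exact form before any singular candidate means words that already end in "s" are matched directly when they are in the vocabulary.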

The Problem This Solves

In NLP, you frequently need to answer the question: "Is this token a real word?"

Not "what does it mean?" Not "give me synonyms." Just: is this a word?

is_wordnet_term('computer')     # True
is_wordnet_term('asdfgh')       # False

That's it. O(1) response. No ambiguity.

Why WordNet?

WordNet isn't the OED (too academic). It's not Urban Dictionary (too ephemeral). It's not a web scrape (too noisy).

It's a curated lexicon of ~88,000 common English words maintained by Princeton linguists. The kind of words that appear in newspapers, textbooks, and everyday conversation. If a token passes the WordNet test, you can be confident it's a legitimate, widely-recognized English word.

When to Use This

  • Tokenization filtering: Keep real words, discard garbage
  • Input validation: Reject nonsense in user input
  • NLP preprocessing: Filter candidates before expensive operations
  • Spell-check pre-filtering: Quick reject obvious non-words before fuzzy matching
  • Data cleaning: Identify malformed or corrupted text
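The tokenization-filtering use case above can be sketched as a small helper that takes any yes/no predicate. The names `keep_words` and `demo_is_word` are hypothetical, and the demo vocabulary exists only so the sketch runs without the package installed; in real use you would presumably pass `is_wordnet_term` as the predicate.

```python
from typing import Callable, Iterable


def keep_words(tokens: Iterable[str], is_word: Callable[[str], bool]) -> list[str]:
    """Return only the tokens the predicate accepts, preserving order."""
    return [token for token in tokens if is_word(token)]


# Stand-in predicate (assumption): a tiny vocabulary instead of WordNet.
_DEMO_VOCAB = frozenset({"the", "computer", "is", "fast"})


def demo_is_word(token: str) -> bool:
    """Case-insensitive membership test against the demo vocabulary."""
    return token.lower() in _DEMO_VOCAB


cleaned = keep_words(["The", "computer", "xqzt", "is", "fast", "asdfgh"], demo_is_word)
print(cleaned)  # ['The', 'computer', 'is', 'fast']
```

Passing the predicate as a parameter keeps the filter independent of any particular word list, so the same helper works for input validation or data cleaning.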

What This Doesn't Do

  • No definitions, synonyms, or semantic relationships (use a full WordNet interface, such as NLTK's, for that)
  • No slang, proper nouns, or recent coinages (the bundled data is WordNet 3.0, from 2006)
  • No spell-checking or suggestions (just yes/no)

Documentation

For detailed usage, performance benchmarks, and advanced features, see the API Documentation.

How It Works

WordNet terms are stored as MD5 hash suffixes in 256 frozenset buckets, keyed by the first two hex characters of the hash. A lookup hashes the input, routes to the matching bucket, and performs an average-case O(1) set membership test. Each bucket's module is lazy-loaded the first time that bucket is accessed.

For the gory details, see Implementation Notes.
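The bucketing scheme described in this section can be sketched roughly as follows. This is not the package's source: `TOY_TERMS`, `build_buckets`, and `is_term` are hypothetical names, the three-word vocabulary is a stand-in for the WordNet term list, and the lazy loading of per-bucket modules is left out for brevity.

```python
import hashlib

# Toy vocabulary standing in for the WordNet term list (assumption).
TOY_TERMS = ["alpha", "computer", "waddling"]


def _md5_hex(term: str) -> str:
    """Return the 32-character MD5 hex digest of the lowercased term."""
    return hashlib.md5(term.lower().encode("utf-8")).hexdigest()


def build_buckets(terms):
    """Group hash suffixes into frozensets keyed by the first two hex characters."""
    buckets = {}
    for term in terms:
        digest = _md5_hex(term)
        # digest[:2] selects one of 256 possible buckets; digest[2:] is the stored suffix.
        buckets.setdefault(digest[:2], set()).add(digest[2:])
    return {prefix: frozenset(suffixes) for prefix, suffixes in buckets.items()}


BUCKETS = build_buckets(TOY_TERMS)


def is_term(token: str) -> bool:
    """Hash the token, route to its bucket, and test suffix membership."""
    digest = _md5_hex(token)
    bucket = BUCKETS.get(digest[:2], frozenset())
    return digest[2:] in bucket
```

Storing only the 30-character suffix avoids repeating the two characters already implied by the bucket key, and splitting the data 256 ways means a lookup only ever needs one bucket in memory.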

Development

git clone https://github.com/craigtrim/wordnet-lookup.git
cd wordnet-lookup
make install  # Install dependencies
make test     # Run tests
make all      # Full build pipeline

See API Documentation for detailed development information.

License

This package is covered by two licenses:

  • Software: MIT License
  • WordNet Data: Princeton WordNet License

See LICENSE for complete terms.

Attribution

This package contains data derived from Princeton WordNet 3.0 (2006):

WordNet 3.0 Copyright 2006 by Princeton University. All rights reserved.

Note: This is a static snapshot of WordNet 3.0. The data is not automatically updated with newer WordNet releases.



