
Validator for BIP39 wordlists

Project description



BIP39 Validator is a small program for checking BIP39 wordlists written in languages that use the Latin alphabet. It checks wordlists for rule violations and implements three tests:

  • A minimum Levenshtein distance test

  • A minimum unique prefix length test

  • A maximum length test

It also has a Python API for running each test programmatically and interactively exploring the results.
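To make the first test concrete, here is a minimal pure-Python sketch of the Levenshtein (edit) distance and of a check that lists every word pair below a minimum distance. The function names `levenshtein` and `pairs_below` are hypothetical and this is not BIP39 Validator's own implementation; it only shows what the metric measures.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (classic dynamic programming)."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def pairs_below(words, min_distance=2):
    """Return every unordered word pair whose distance is < min_distance."""
    return [(w1, w2) for w1, w2 in combinations(words, 2)
            if levenshtein(w1, w2) < min_distance]


words = ["abandon", "ability", "able", "cable", "table"]
print(pairs_below(words, 2))
# → [('able', 'cable'), ('able', 'table'), ('cable', 'table')]
```

Checking all pairs is quadratic in the list size, which is still cheap for a 2048-word list.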

Description

BIP39 Validator checks that a wordlist follows the best practices described in the BIP39 standard. Maintainers frequently ask submitters to show compliance with these rules before merging a new wordlist, and this tool saves you from verifying them by hand.

Note that BIP39 Validator cannot check rules such as “Words cannot sound too similar” or “Wordlists cannot contain words from any other language's wordlist”. It also does not support wordlists in languages with non-Latin scripts, such as Arabic, Hebrew or the CJK languages.

Installing

You can install BIP39 Validator either from PyPI or from source on GitHub.

To install from PyPI:

pip3 install bip39validator

Alternatively, to install BIP39 Validator from source, head over to the Releases page, and download the version you want to install. Unzip the package, change into the newly created directory and then run:

python3 setup.py install

Running

You invoke BIP39 Validator like this:

bip39validator [OPTIONS] {INPUT_FILE | URL_OF_TEXT_FILE}

Exactly one of INPUT_FILE and URL_OF_TEXT_FILE must be specified. INPUT_FILE is a file on your local filesystem, while URL_OF_TEXT_FILE is an HTTP or HTTPS URL pointing to a wordlist file served with the MIME type text/plain. In both cases, the input must be a plain text file.
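The input rule above can be sketched as follows. `read_wordlist_source` is a hypothetical helper, not part of BIP39 Validator: it treats an `http://` or `https://` argument as a URL and anything else as a local path, and, as an assumption about how the MIME-type requirement might be enforced, it rejects responses whose `Content-Type` is not `text/plain`.

```python
import urllib.request
from urllib.parse import urlparse


def read_wordlist_source(source: str) -> list[str]:
    """Return the non-blank lines of a local file or an HTTP(S) URL."""
    if urlparse(source).scheme in ("http", "https"):
        with urllib.request.urlopen(source) as resp:
            # Assumption: the server must label the wordlist as text/plain
            content_type = resp.headers.get_content_type()
            if content_type != "text/plain":
                raise ValueError(f"expected text/plain, got {content_type}")
            charset = resp.headers.get_content_charset() or "utf-8"
            text = resp.read().decode(charset)
    else:
        # Anything that is not an HTTP(S) URL is treated as a local path
        with open(source, encoding="utf-8") as f:
            text = f.read()
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Dispatching on the URL scheme keeps the "exactly one input" rule simple: a single positional argument is interpreted either as a URL or as a path, never both.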

By default, BIP39 Validator displays rich, formatted status messages as validation progresses. It can also run with minimal diagnostic output, or log its status messages to a file. The complete list of command-line options is below:

Command-line options

Option                              Description
-d, --min-levenshtein-distance      set the minimum required Levenshtein distance between words (default: 2)
-u, --min-initial-unique            set the minimum number of initial characters that must be unique between words (default: 4)
-l, --max-length                    set the maximum length of each word (default: 8)
-D, --no-levenshtein-distance       do not run the Levenshtein distance test
-U, --no-initial-unique             do not run the unique initial characters test
-L, --no-max-length                 do not run the maximum length test
-o <FILE>, --output-file <FILE>     log all console output to an additional file
-a, --ascii                         turn off rich text formatting and progress bars for console output
-q, --quiet                         do not display details of test failures, only whether the tests succeeded or failed
--nosane                            suppress the wordlist sanity check (this might cause other tests to fail)
-v, --version                       print the version number and exit

BIP39 Validator displays which validation tests succeeded and the total number of tests that succeeded.
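As an illustration of the other two rules and of the pass/fail summary, the sketch below checks prefix uniqueness (the `-u` threshold, default 4) and maximum word length (the `-l` threshold, default 8), then reports how many checks succeeded. The function names and the output format are hypothetical and do not reproduce BIP39 Validator's console output; the Levenshtein test is left out for brevity.

```python
from collections import defaultdict


def shared_prefixes(words, prefix_len=4):
    """Map each initial prefix shared by two or more words to those words.
    Words shorter than prefix_len are grouped by the whole word."""
    groups = defaultdict(list)
    for word in words:
        groups[word[:prefix_len]].append(word)
    return {p: ws for p, ws in groups.items() if len(ws) >= 2}


def too_long(words, max_length=8):
    """Return the words longer than max_length characters."""
    return [w for w in words if len(w) > max_length]


def summarize(words, min_unique=4, max_length=8):
    """Run both checks, print each result and return the number that passed."""
    results = {
        "unique initial characters": not shared_prefixes(words, min_unique),
        "maximum length": not too_long(words, max_length),
    }
    for name, passed in results.items():
        print(f"{name}: {'passed' if passed else 'FAILED'}")
    passed_count = sum(results.values())
    print(f"{passed_count} of {len(results)} tests succeeded")
    return passed_count


summarize(["abandon", "ability", "abstract", "crystal", "crystals"])
```

Here "crystal" and "crystals" share the prefix "crys", so the prefix check fails, while every word is at most 8 characters long, so the length check passes.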

Using the API

BIP39 Validator comes with an API for querying the results of validation tests. The most basic class it provides is BIP39WordList, which creates a wordlist object from a file, a string buffer or even a URL. BIP39WordList objects are immutable: once a list is loaded, its words can't be changed, added or removed. To alter the wordlist, edit the file and then create a new BIP39WordList from it.
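The immutability described above can be illustrated with a frozen dataclass holding a tuple. `FrozenWordList` and its `from_string`/`from_handle` constructors are hypothetical stand-ins, not BIP39WordList's real implementation; they only show why changing a word means building a new object from the edited source.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenWordList:
    desc: str
    words: tuple  # a tuple, so the word sequence itself cannot be mutated

    @classmethod
    def from_string(cls, desc, text):
        """Build a list from a string, skipping blank lines."""
        return cls(desc, tuple(w.strip() for w in text.splitlines() if w.strip()))

    @classmethod
    def from_handle(cls, desc, handle):
        """Build a list from any open text file object."""
        return cls.from_string(desc, handle.read())


wl = FrozenWordList.from_handle("demo", io.StringIO("abandon\nability\n"))
try:
    wl.words = ("zoo",)  # assigning to a frozen dataclass field raises
except AttributeError as err:  # FrozenInstanceError subclasses AttributeError
    print("cannot modify:", type(err).__name__)

# To "change" the list, edit the source text and build a new object
wl2 = FrozenWordList.from_string("demo", "abandon\nability\nable\n")
```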

When a test fails, it raises a ValidationFailed exception. The exception has a member called status_obj, an object holding diagnostic information about the test that raised it. The same object is also returned by the test method when the test succeeds. There are two ways to reach it because users most commonly inspect the test state only when a test fails.
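The two ways of reaching the diagnostic object can be sketched with minimal stand-in classes. `LengthResult`, `DemoValidationFailed` and `check_max_length` are hypothetical and only mirror the pattern described above (the status object is returned on success and attached to the exception on failure); they are not the library's own classes.

```python
class LengthResult:
    """Diagnostic information for a hypothetical maximum-length test."""

    def __init__(self, words, max_length):
        self.too_long = [w for w in words if len(w) > max_length]


class DemoValidationFailed(Exception):
    """Stand-in for bip39validator's ValidationFailed, not the real class."""

    def __init__(self, status_obj):
        super().__init__("validation failed")
        self.status_obj = status_obj  # the same object a passing test returns


def check_max_length(words, max_length=8):
    status = LengthResult(words, max_length)
    if status.too_long:
        raise DemoValidationFailed(status)
    return status


try:
    check_max_length(["abandon", "abstraction"])
except DemoValidationFailed as e:
    # Most callers only inspect the status object in this branch
    print(e.status_obj.too_long)  # ['abstraction']
```

Attaching the status object to the exception lets the common "only look when it fails" case live entirely in the `except` branch, while callers that want statistics for a passing list can still use the return value.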

API Examples

Here are some of the anticipated uses of the BIP39 Validator API.

  • Validate that Levenshtein distances >= 2, then find all the word pairs with Levenshtein distance less than 2:

from bip39validator import BIP39WordList, InvalidWordList, ValidationFailed

try:
    with open('wordlist-en.txt') as f:
        wordlist = BIP39WordList('English wordlist', handle=f)
        wordlist.test_lev_distance(2)
        # At this point, no word pairs have Levenshtein distance < 2.
except ValidationFailed as e:
    # status_obj holds the diagnostic results of the failed test
    dists = e.status_obj.getwordpairs_lt(2)
    for word1, word2 in dists:
        # Do something with word1 and word2...
        print(word1, word2)
except InvalidWordList:
    print("Wordlist file is not well-formed")

  • Validate that Levenshtein distances >= 2, then calculate the number and percentage of word pairs with Levenshtein distance less than 2 (assuming a 2048-word list):

from bip39validator import BIP39WordList, InvalidWordList, ValidationFailed

try:
    with open('wordlist-en.txt') as f:
        wordlist = BIP39WordList('English wordlist', handle=f)
        wordlist.test_lev_distance(2)
        # At this point, the number and percentage of
        # word pairs with distance < 2 are both 0.
except ValidationFailed as e:
    dists = e.status_obj.getwordpairs_lt(2)
    n = len(dists)
    # Distinct unordered pairs in a 2048-word list; assumes that
    # getwordpairs_lt() reports each unordered pair once
    total_pairs = 2048 * 2047 // 2
    prct = 100 * n / total_pairs
except InvalidWordList:
    print("Wordlist file is not well-formed")

  • Validate that words are unique in at least 4 initial characters, then find all the words beginning with “str” (prefix-3 group “str”):

from bip39validator import BIP39WordList, InvalidWordList, ValidationFailed

try:
    with open('wordlist-en.txt') as f:
        wordlist = BIP39WordList('English wordlist', handle=f)
        wordlist.test_initial_chars(4)
        # At this point, all words are unique in at least 4 initial characters
except ValidationFailed as e:
    words = e.status_obj.similar_wordgroup("str")
    for word in words:
        # Do something with word...
        print(word)
except InvalidWordList:
    print("Wordlist file is not well-formed")

  • Validate that words are unique in at least 4 initial characters, then calculate the number and percentage of word prefix-4 groups with at least two words in them:

from bip39validator import BIP39WordList, InvalidWordList, ValidationFailed

try:
    with open('wordlist-en.txt') as f:
        wordlist = BIP39WordList('English wordlist', handle=f)
        wordlist.test_initial_chars(4)
        # At this point, the number and percentage of
        # groups with two or more words are both 0.
except ValidationFailed as e:
    groups = e.status_obj.similar_wordgroup_all(4)
    # Count the prefix-4 groups that contain two or more words
    n = sum(1 for group in groups.values() if len(group) >= 2)
    denom = len(groups)
    perc = 100 * n / denom
except InvalidWordList:
    print("Wordlist file is not well-formed")

  • Validate that words are no longer than 8 characters, then find all of the words longer than 8 characters:

from bip39validator import BIP39WordList, InvalidWordList, ValidationFailed

try:
    with open('wordlist-en.txt') as f:
        wordlist = BIP39WordList('English wordlist', handle=f)
        wordlist.test_max_length(8)
        # At this point, all words are no longer than 8 characters
except ValidationFailed as e:
    words = e.status_obj.getwords_gt(8)
    lines = e.status_obj.getlines_gt(8)
    for word, line in zip(words, lines):
        # Do something with word and line...
        print(line, word)
except InvalidWordList:
    print("Wordlist file is not well-formed")

  • Validate that words are no longer than 8 characters, then calculate the number and percentage of words longer than 8 characters:

from bip39validator import BIP39WordList, InvalidWordList, ValidationFailed

try:
    with open('wordlist-en.txt') as f:
        wordlist = BIP39WordList('English wordlist', handle=f)
        wordlist.test_max_length(8)
        # At this point, the number and percentage of
        # words longer than 8 characters are both 0.
except ValidationFailed as e:
    # getwords_gt(8) already contains only the words longer than 8 characters
    words = e.status_obj.getwords_gt(8)
    n = len(words)
    perc = 100 * n / 2048  # assumes a 2048-word list
except InvalidWordList:
    print("Wordlist file is not well-formed")
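For the pair-based percentage in the Levenshtein examples, the choice of denominator matters. Assuming each unordered pair of distinct words is counted once (an assumption, since the exact format returned by the status object is not documented here), a list of N words has N(N-1)/2 pairs, and for a 2048-word list with n offending pairs:

```latex
\binom{N}{2} = \frac{N(N-1)}{2},
\qquad
\binom{2048}{2} = \frac{2048 \cdot 2047}{2} = 2\,096\,128,
\qquad
\text{percentage} = 100 \cdot \frac{n}{2\,096\,128}
```

Dividing by 2048 × 2048 instead would also count each word paired with itself and each pair twice.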

Local Development

First, clone the master branch of this repository, and then make a new virtualenv:

python3 -m venv env-bip39validator
source env-bip39validator/bin/activate

Then install the module dependencies using:

pip3 install -r requirements.txt -r dev-requirements.txt

Contributing

See CONTRIBUTING.md for details on how to contribute issues and pull requests to this project.

License

BIP39 Validator is provided under the MIT license that can be found in the LICENSE file. By using, distributing, or contributing to this project, you agree to the terms and conditions of this license.

Download files


Source Distribution

bip39validator-1.0.7.tar.gz (26.9 kB)

Uploaded Source

Built Distribution

bip39validator-1.0.7-py3-none-any.whl (33.4 kB)

Uploaded Python 3

File details

Details for the file bip39validator-1.0.7.tar.gz.

File metadata

  • Download URL: bip39validator-1.0.7.tar.gz
  • Upload date:
  • Size: 26.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.1

File hashes

Hashes for bip39validator-1.0.7.tar.gz
Algorithm Hash digest
SHA256 3f84e18dd15c451232007a0ddbc455b0ea495e005a3c2afe2c5933e9ca81338c
MD5 afb69c83bc40ee5ee4684d9186135cd5
BLAKE2b-256 cc0c72d6ce41df525e04a1a347dfbcf2e8a18b26785e8e9419767ca04f48e0b2


File details

Details for the file bip39validator-1.0.7-py3-none-any.whl.

File metadata

  • Download URL: bip39validator-1.0.7-py3-none-any.whl
  • Upload date:
  • Size: 33.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.1

File hashes

Hashes for bip39validator-1.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 f1fbe67041d6e9f542e583d2b4dee8ebb85236cc335e1e66507940d4dd81b013
MD5 fdd64a4852e3d489a23c1b238edd9be8
BLAKE2b-256 a45f3f089cb5b957367a914109f869d790f56c0b7d922b0d763bb323a025ab94

