A command-line regex search engine for the English language
Project description
Lexitron
A command-line regex search engine for the English language.
Requirements
The only major requirement is Python.
I don't actually know which versions of Python this package will work on, I've only tested on my own system which is using Python 3.11. Any feedback about what works and doesn't would be helpful.
I did not write Lexitron to work on Windows, although it is a simple enough package that I don't see why it shouldn't.
If you try to install Lexitron and something goes wrong, let me know what your system details are and I'll try to get it fixed.
Installation
Lexitron is available on the Python Package Index (pip). To install, simply type
$ pip install lexitron
at the command line.
Once the install is complete, you can access Lexitron with the lx
command at
the terminal.
Usage
Usage syntax is
$ lx [options] expression
where expression
is a regular expression and [options]
are as follows.
option | function |
---|---|
-d |
Append start and end delimiters ^...$ to search query |
-n |
Print only the number of matches |
-u |
Search only for lowercase/common/uncapitalized words |
-U |
Search only for uppercase/proper/capitalized words |
-v |
Show version and exit |
-x |
Print unformatted output, one word per line |
Type $ lx -h
for full help text.
If you aren't familiar with regular expressions, it isn't too hard to learn the basics. There are many resources online. A good starting point is the Wikipedia article.
Output
By default, Lexitron will output a well-formatted (potentially multi-column) list of words, along with a header describing the results.
The results are separated into "proper" words (capitalized, like "France") and "common" words (lowercase, like "boat").
Using the -x
flag will return a more machine-readable output with one word
per line.
Examples
Example 1
A list of English words ending with "icide".
$ lx icide$
---------------------------------------------------------------------------
53 results for /.*icide/
0 proper ~ 53 common
---------------------------------------------------------------------------
aborticide foeticide matricide pesticide stillicide
acaricide fratricide medicide prolicide suicide
agricide fungicide menticide pulicide tyrannicide
algicide germicide miticide raticide uxoricide
aphicide giganticide molluscicide regicide vaticide
aphidicide herbicide nematicide rodenticide verbicide
bacillicide homicide ovicide scabicide vermicide
bactericide infanticide parasiticide silicide viricide
deicide insecticide parasuicide sororicide vulpicide
feticide larvicide parricide spermicide
filicide liberticide patricide sporicide
Example 2
A list of English words that contain the substring "rdb".
$ lx rdb
---------------------------------------------------------------------------
21 results for /rdb/
1 proper ~ 20 common
---------------------------------------------------------------------------
Standardbred
birdbath herdbook
birdbrain herdboy
cardboard leopardbane
hardback recordbook
hardbake standardbearer
hardball standardbred
hardbeam swordbill
hardboard thirdborough
hardboot wordbook
hardbound yardbird
Example 3
The number of lowercase English words that end in "tion".
$ lx -nxu ".*tion"
3837
(This number should be taken with a grain of salt, since no dictionary is perfect, and it depends on what you count as a valid english word, and which technical or niche jargons are included; etc etc.)
Example 4
A list of English words with the same double letter appearing twice, except
for those whose double letter is a vowel or the letter s
(to ignore
words of the form *lessness
).
$ lx "([^aeious])\1.*\1\1"
---------------------------------------------------------------------------
45 results for /([^aeious])\1.*\1\1/
9 proper ~ 36 common
---------------------------------------------------------------------------
Allhallowmas
Allhallows
Allhallowtide
Armillariella
Chancellorsville
Dullsville
Gallirallus
Hunnemannia
Llullaillaco
acciaccatura hillbilly pellmell shillyshally
bellpull huggermugger pizzazz skillfully
chiffchaff hullaballoo pralltriller snippersnapper
dillydallier jellyroll razzamatazz villanelle
dillydally kinnikinnic razzmatazz volleyball
dullsville kinnikinnick riffraff volleyballer
flibbertigibbet millefeuille rollcollar whippersnapper
granddaddy niffnaff rollerball willfully
hallalling parallelling scuttlebutt yellowbelly
Example 5
Compare the number of lowercase/uncapitalized words that end in "woman" with the number that end in "man".
$ lx -nxu ".*woman"
107
$ lx -nxu ".*(?<\!wo)man"
1145
Acknowledgements
For its dictionary, Lexitron uses the Automatically Generated Inflection Database (AGID) by Kevin Atkinson. See http://wordlist.sourceforge.net/.
License
Lexitron is licensed under GNU GPL Version 2.
Contact
Questions, bug reports, and feature requests can be filed on the Github issues tracker.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file Lexitron-2.0.4.tar.gz
.
File metadata
- Download URL: Lexitron-2.0.4.tar.gz
- Upload date:
- Size: 873.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.2
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | cdd8a7fb4ece6efb9c5253a0aaf456df22d6124b088699cb3179f3a7c62daaa9 |
|
MD5 | d80c9a786059c60809920fb48b47cce7 |
|
BLAKE2b-256 | 3dbaba3e386c319342d8c7007521e6ac3c26aef0588adb35d4986c07da575f01 |
File details
Details for the file Lexitron-2.0.4-py3-none-any.whl
.
File metadata
- Download URL: Lexitron-2.0.4-py3-none-any.whl
- Upload date:
- Size: 872.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.2
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 13c6210749054aa996bf97e4987339bd5fb509d3935272f26d2459e8d898fcbe |
|
MD5 | 7bfa1115fe3ef10fde490451d80b4bc5 |
|
BLAKE2b-256 | 8b39606aaba0079c90679f1cc076489c1953027260e6184268e578a1bca1750d |