A pip-installable library version of hfst-optimized-lookup from https://hfst.github.io/
hfst-optimized-lookup
A pip-installable library version of hfst-optimized-lookup, originally built for itwêwina.
Install
pip install hfst-optimized-lookup
This requires that the machine running pip have a working C++ compiler. If enough people ask us to upload binary ‘wheels’ so that you don’t need a compiler at install time, we could start doing so.
Usage
Import the library:
>>> import hfst_optimized_lookup
Then load an FST!
>>> fst = hfst_optimized_lookup.TransducerFile('crk-relaxed-analyzer-for-dictionary.hfstol')
Hint: Download crk-relaxed-analyzer-for-dictionary.hfstol to follow along!
Do an ordinary lookup to get a list of analyses for a wordform, each as a single concatenated string:
>>> fst.lookup('atim')
['atim+N+A+Sg', 'atimêw+V+TA+Imp+Imm+2Sg+3SgO']
Or get each analysis of the wordform parsed into a lemma and affixes:
>>> analysis = fst.lookup_lemma_with_affixes('atim')[0]
>>> analysis.lemma
'atim'
>>> analysis.suffixes
('+N', '+A', '+Sg')
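Because suffixes holds whole tags, you can test for a tag by tuple membership rather than by substring search. This is a minimal sketch that does not call the library: Analysis is a hypothetical stand-in namedtuple modelling only the lemma and suffixes attributes shown above, the suffixes of the second analysis are assumed by analogy with the first, and lemmas_with_tag is an illustrative helper, not part of the package.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects returned by
# fst.lookup_lemma_with_affixes('atim'); only the two attributes shown
# above are modelled, and the values are copied from this page
# (the second analysis's suffixes are assumed, not real FST output).
Analysis = namedtuple("Analysis", ["lemma", "suffixes"])

analyses = [
    Analysis("atim", ("+N", "+A", "+Sg")),
    Analysis("atimêw", ("+V", "+TA", "+Imp", "+Imm", "+2Sg", "+3SgO")),
]


def lemmas_with_tag(analyses, tag):
    """Lemmas of the analyses whose suffixes contain `tag` as a whole tag."""
    return [a.lemma for a in analyses if tag in a.suffixes]


print(lemmas_with_tag(analyses, "+N"))    # ['atim']
print(lemmas_with_tag(analyses, "+3Sg"))  # [] ('+3SgO' is a different tag)

# Substring search on the concatenated strings from fst.lookup('atim')
# cannot tell '+3Sg' apart from the start of '+3SgO':
concatenated = ["atim+N+A+Sg", "atimêw+V+TA+Imp+Imm+2Sg+3SgO"]
print([s for s in concatenated if "+3Sg" in s])  # ['atimêw+V+TA+Imp+Imm+2Sg+3SgO']
```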
You can also look up the analyses with their symbols separated:
>>> fst.lookup_symbols('atim')
[['a', 't', 'i', 'm', '+N', '+A', '+Sg'], ['a', 't', 'i', 'm', 'ê', 'w', '+V', '+TA', '+Imp', '+Imm', '+2Sg', '+3SgO']]
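In the atim example, joining each symbol list with the empty string gives back exactly the strings that fst.lookup returned. The sketch below hard-codes the outputs shown on this page instead of calling the library; that this holds for every analysis is an assumption drawn from this single example.

```python
# Output of fst.lookup_symbols('atim'), copied from the example above.
symbol_lists = [
    ["a", "t", "i", "m", "+N", "+A", "+Sg"],
    ["a", "t", "i", "m", "ê", "w", "+V", "+TA", "+Imp", "+Imm", "+2Sg", "+3SgO"],
]

# Joining each symbol list yields one concatenated analysis string,
# matching what fst.lookup('atim') returned.
concatenated = ["".join(symbols) for symbols in symbol_lists]
print(concatenated)  # ['atim+N+A+Sg', 'atimêw+V+TA+Imp+Imm+2Sg+3SgO']
```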
hfst is a great toolkit with all sorts of functionality, and is indispensable for building FSTs, but for Python applications that just want to do hfst lookups, this package may be easier to use.
The hfst-optimized-lookup binary is actually a standalone C++ program that doesn’t include or link against any other code in hfst, which makes it much easier to repackage as a small Python library.
Among other benefits, this package can return lists of individual symbols, including Multichar_Symbols, so that you don’t have to guess or try to parse out which parts of the analysis are tags.
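With only a concatenated string you would have to guess where the lemma ends and the tags begin; a symbol list makes a simple rule possible. The split_symbols helper below is hypothetical, not part of this package, and uses one heuristic among many: single-character symbols are treated as part of the lemma and longer symbols as tags. It would misclassify a single-character tag or a multicharacter symbol inside a lemma, so check it against your own FST.

```python
import unicodedata


def split_symbols(symbols):
    """Split one analysis from fst.lookup_symbols() into (lemma, tags).

    Heuristic, not the library's own rule: a symbol that is a single
    character (after NFC normalization, so a decomposed 'ê' still counts
    as one character) is part of the lemma; any longer symbol is treated
    as a multicharacter tag.
    """
    lemma_chars, tags = [], []
    for symbol in symbols:
        if len(unicodedata.normalize("NFC", symbol)) == 1:
            lemma_chars.append(symbol)
        else:
            tags.append(symbol)
    return "".join(lemma_chars), tags


# Second analysis of 'atim', copied from the lookup_symbols example above.
lemma, tags = split_symbols(
    ["a", "t", "i", "m", "ê", "w", "+V", "+TA", "+Imp", "+Imm", "+2Sg", "+3SgO"]
)
print(lemma)  # atimêw
print(tags)   # ['+V', '+TA', '+Imp', '+Imm', '+2Sg', '+3SgO']
```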
Acknowledgements
Thank you to:
- The authors of the Helsinki Finite-State Technology library and application suite
Releasing
(The script that automates the following is still a work in progress.)
Prepare release:
- Remove the .dev0 suffix from __version__ in hfst_optimized_lookup/__init__.py
- Update CHANGELOG.md, changing “Unreleased” to the release version and adding the date
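Until the release script is finished, the two text edits above could be done with a small helper like the one below. This is a sketch under assumptions: the function names are hypothetical, it assumes CHANGELOG.md marks the pending release with a Markdown header such as ## Unreleased, and the "<version> - <date>" header format and the example version are made up for illustration.

```python
import re
from datetime import date

DEV_SUFFIX = ".dev0"


def release_version(dev_version):
    """Drop the .dev0 suffix, e.g. '0.0.12.dev0' -> '0.0.12'."""
    if not dev_version.endswith(DEV_SUFFIX):
        raise ValueError(f"expected a {DEV_SUFFIX} version, got {dev_version!r}")
    return dev_version[: -len(DEV_SUFFIX)]


def date_changelog(changelog, version, day):
    """Turn the first 'Unreleased' Markdown header into '<version> - <ISO date>'."""
    new_text, count = re.subn(
        r"^(#+[ \t]*)Unreleased[ \t]*$",  # assumed header style, e.g. '## Unreleased'
        lambda m: f"{m.group(1)}{version} - {day.isoformat()}",
        changelog,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no 'Unreleased' header found")
    return new_text


# Example with made-up inputs:
print(release_version("0.0.12.dev0"))  # 0.0.12
print(date_changelog("## Unreleased\n\n- Fixed a bug\n", "0.0.12", date(2021, 1, 2)))
```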
Release:
- Run tests
- python3 setup.py sdist
- Commit, tag, and push
- python3 -m twine upload dist/hfst-optimized-lookup-$VERSION.tar.gz
Prepare for further development:
- Increment __version__, adding the .dev0 suffix
- Add an “Unreleased” header in CHANGELOG.md
- Commit and push
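The version bump and changelog header in the steps above could also be scripted. The helpers below are hypothetical, and the source does not say which part of __version__ to increment; this sketch assumes the last numeric component (0.0.12 becomes 0.0.13.dev0) and a "## " release-header style in CHANGELOG.md.

```python
import re


def next_dev_version(version):
    """Bump the last numeric component and add .dev0, e.g. '0.0.12' -> '0.0.13.dev0'."""
    if "dev" in version:
        raise ValueError(f"expected a release version, got {version!r}")
    parts = version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts) + ".dev0"


def add_unreleased_header(changelog, header="## Unreleased"):
    """Insert an Unreleased header above the first '## ' release header."""
    if re.search(r"^#+[ \t]*Unreleased\b", changelog, flags=re.MULTILINE):
        return changelog  # already present; don't add a duplicate
    match = re.search(r"^## ", changelog, flags=re.MULTILINE)
    if match is None:
        # No release headers yet: append the header at the end.
        return changelog.rstrip("\n") + "\n\n" + header + "\n"
    return changelog[: match.start()] + header + "\n\n" + changelog[match.start():]


print(next_dev_version("0.0.12"))  # 0.0.13.dev0
```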
Download files
Source Distribution
Hashes for hfst-optimized-lookup-0.0.11.tar.gz
Algorithm | Hash digest
---|---
SHA256 | ebab5614b91d33953a49567ffea67f5a0f82cd867e09ea898088d4edbeaee5f3
MD5 | ad2ab8e5c1617151e6c8bd5c85377eb2
BLAKE2b-256 | 27b5d598899f3adca2b2dff24bced39627c5b7c2891e0bdf705bb69ef58d4ffe