A Python reimplementation of hfst-optimized-lookup, including a wrapper around the original hfst-optimized-lookup.
hfstol
hfst-optimized-lookup in Python
pip install hfstol
All examples below are based on two .hfstol files: crk-descriptive-analyzer.hfstol and crk-normative-generator.hfstol
Use
Example with crk-descriptive-analyzer.hfstol:
from hfstol import HFSTOL
hfst = HFSTOL.from_file('crk-descriptive-analyzer.hfstol')
hfst.feed('niska')
# returns:
# (('niska', '+N', '+A', '+Sg'), ('niska', '+N', '+A', '+Obv'))
hfst.feed_in_bulk(['niska', 'kinipânânaw'])
# returns:
# {'niska': {('niska', '+N', '+A', '+Obv'), ('niska', '+N', '+A', '+Sg')}, 'kinipânânaw': {('nipâw', '+V', '+AI', '+Ind', '+Prs', '+12Pl')}}
hfst.feed_in_bulk_fast(['niska', 'kinipânânaw'], multi_process=4)
# returns:
# {'niska': {'niska+N+A+Obv', 'niska+N+A+Sg'}, 'kinipânânaw': {'nipâw+V+AI+Ind+Prs+12Pl'}}
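Each analysis returned by `feed` above is a tuple of symbols. As a minimal sketch, one way to separate the lemma from its tags is shown below. The helper `split_analysis` is hypothetical (it is not part of hfstol) and assumes every tag starts with `+` and everything else belongs to the lemma, which holds for the outputs shown above but may not hold for every analyzer.

```python
from typing import List, Tuple


def split_analysis(analysis: Tuple[str, ...]) -> Tuple[str, List[str]]:
    """Split one analysis tuple into (lemma, tags).

    Assumption: tags are exactly the symbols that start with '+'.
    """
    lemma = "".join(s for s in analysis if not s.startswith("+"))
    tags = [s for s in analysis if s.startswith("+")]
    return lemma, tags


print(split_analysis(('niska', '+N', '+A', '+Sg')))
# prints: ('niska', ['+N', '+A', '+Sg'])
```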
Example with crk-normative-generator.hfstol:
from hfstol import HFSTOL
hfst = HFSTOL.from_file('crk-normative-generator.hfstol')
hfst.feed('niska+N+A+Pl')
# returns:
# (('niskak',),)
hfst.feed_in_bulk(["niska+N+A+Pl", 'nipâw+V+AI+Ind+Prs+12Pl'])
# returns:
# {'niska+N+A+Pl': {('niskak',)}, 'nipâw+V+AI+Ind+Prs+12Pl': {('kinipânânaw',), ('kinipânaw',)}}
hfst.feed_in_bulk_fast(["niska+N+A+Pl", 'nipâw+V+AI+Ind+Prs+12Pl'], multi_process=4)
# returns:
# {'niska+N+A+Pl': {'niskak'}, 'nipâw+V+AI+Ind+Prs+12Pl': {'kinipânânaw', 'kinipânaw'}}
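As the generator output above shows, one analysis can map to more than one spelling. The following sketch lists the inputs that produced several outputs. The helper name `forms_with_variants` is hypothetical, and the dictionary is copied from the `feed_in_bulk_fast` result shown above rather than produced by a live call.

```python
from typing import Dict, List, Set


def forms_with_variants(result: Dict[str, Set[str]]) -> List[str]:
    """Return, in sorted order, the inputs that map to more than one output."""
    return sorted(key for key, outputs in result.items() if len(outputs) > 1)


# Result dictionary copied from the example above.
result = {
    'niska+N+A+Pl': {'niskak'},
    'nipâw+V+AI+Ind+Prs+12Pl': {'kinipânânaw', 'kinipânaw'},
}
print(forms_with_variants(result))
# prints: ['nipâw+V+AI+Ind+Prs+12Pl']
```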
For comprehensive API behaviour, including edge cases such as feed('absolute garbage'), see this test file.
API signatures
# HFSTOL.from_file
@classmethod
def from_file(cls, filename: Union[str, pathlib.Path]):
    """
    :param filename: the `.hfstol` file
    :return: an `HFSTOL` instance, which you can use to convert surface forms to deep forms
    """
    pass
# HFSTOL.feed
def feed(self, surface_form: str, concat: bool = True) -> Tuple[Tuple[str, ...], ...]:
    """
    Feed a surface form to hfst.
    :param surface_form: the surface form
    :param concat: whether to concatenate single characters
    example output for `surface_form` = 'niskak' with `crk-descriptive-analyzer.hfstol`
    - True: (('niska', '+N', '+A', '+Pl'), ('nîskâw', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))
    - False: (('n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'), ('n', 'î', 's', 'k', 'â', 'w', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))
    example output for `surface_form` = 'niska+N+A+Pl' with `crk-normative-generator.hfstol`
    - True: (('niskak',),)
    - False: (('n', 'i', 's', 'k', 'a', 'k'),)
    example output for `surface_form` = 'nipâw+V+AI+Ind+Prs+12Pl' with `crk-normative-generator.hfstol` (an inflection that has two spellings)
    - True: (('kinipânaw',), ('kinipânânaw',))
    - False: (('k', 'i', 'n', 'i', 'p', 'â', 'n', 'a', 'w'), ('k', 'i', 'n', 'i', 'p', 'â', 'n', 'â', 'n', 'a', 'w'))
    """
    pass
# HFSTOL.feed_in_bulk
def feed_in_bulk(self, surface_forms: List[str], concat: bool = True) -> Dict[str, Set[Tuple[str, ...]]]:
    """
    Feed multiple surface forms to hfst at once.
    :param surface_forms: the surface forms to look up
    :param concat: whether to concatenate single characters, as in `feed`
    :return: a dictionary whose keys are the surface forms fed in and whose values are their corresponding deep forms
    """
    pass
# HFSTOL.feed_in_bulk_fast
def feed_in_bulk_fast(self, strings: Iterable[str], multi_process: int = 1) -> Dict[str, Set[str]]:
    """
    Calls `hfst-optimized-lookup`. Evaluation is orders of magnitude faster. Note that the generated symbols are all concatenated.
    e.g. instead of ['n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'] it returns ['niska+N+A+Pl']
    :keyword multi_process: Defaults to 1. The number of parallel processes used to speed up computation. As a rule of thumb, use at most as many processes as your machine has cores.
    """
    pass
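The effect of the `concat` flag described in the `feed` docstring can be illustrated without a transducer file. The sketch below merges runs of single-character symbols and leaves multi-character symbols (the tags) untouched. `concat_symbols` is a hypothetical helper written for illustration. It reproduces the example outputs listed above, but it is not necessarily how hfstol implements `concat` internally.

```python
from itertools import groupby
from typing import Iterable, Tuple


def concat_symbols(symbols: Iterable[str]) -> Tuple[str, ...]:
    """Merge consecutive single-character symbols into one string.

    Assumption: a symbol of length 1 is a plain character, and any
    longer symbol (such as '+N') is a tag that must stay separate.
    """
    merged = []
    for is_char, group in groupby(symbols, key=lambda s: len(s) == 1):
        if is_char:
            merged.append("".join(group))  # join a run of characters
        else:
            merged.extend(group)  # keep each tag as its own symbol
    return tuple(merged)


print(concat_symbols(('n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl')))
# prints: ('niska', '+N', '+A', '+Pl')
```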
To use feed_in_bulk_fast
feed_in_bulk_fast calls compiled C code, which can be 100 times faster than feed_in_bulk.
It requires hfst-optimized-lookup to be installed. Version 1.2 is tested to work. On Linux, installation can be as easy as sudo apt install hfst. For other systems, see the installation guide.
If hfst-optimized-lookup is not found, calling feed_in_bulk_fast raises ImportError.
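Because the fast path depends on an external binary, a caller may want to fall back to the pure Python lookup when it is missing. The sketch below is one way to do that, under stated assumptions: `lookup_bulk` is a hypothetical helper, `hfst` is any object exposing `feed_in_bulk_fast` and `feed_in_bulk` with the signatures listed above, and the fallback joins each symbol tuple so both paths return the same shape. It also caps the process count at the machine's core count, following the rule of thumb in the docstring.

```python
import os
from typing import Dict, Iterable, Set


def lookup_bulk(hfst, forms: Iterable[str], processes: int = 4) -> Dict[str, Set[str]]:
    """Try the fast lookup first; fall back to feed_in_bulk on ImportError."""
    forms = list(forms)
    # Use at least one process and at most as many as there are cores.
    workers = max(1, min(processes, os.cpu_count() or 1))
    try:
        return hfst.feed_in_bulk_fast(forms, multi_process=workers)
    except ImportError:
        # hfst-optimized-lookup was not found: use the pure Python path
        # and join each symbol tuple to match the fast path's output shape.
        slow = hfst.feed_in_bulk(forms)
        return {
            form: {"".join(symbols) for symbols in analyses}
            for form, analyses in slow.items()
        }
```

With a real transducer this would be called as `lookup_bulk(HFSTOL.from_file('crk-normative-generator.hfstol'), ['niska+N+A+Pl'])`.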