HFST optimized lookup reimplemented in Python, including a wrapper around the original hfst-optimized-lookup.
hfstol
hfst-optimized-lookup in Python
pip install hfstol
All examples below are based on two .hfstol files: crk-descriptive-analyzer.hfstol and crk-normative-generator.hfstol.
Use
Example with crk-descriptive-analyzer.hfstol:

```python
from hfstol import HFSTOL

hfst = HFSTOL.from_file('crk-descriptive-analyzer.hfstol')

hfst.feed('niska')
# returns:
# (('niska', '+N', '+A', '+Sg'), ('niska', '+N', '+A', '+Obv'))

hfst.feed_in_bulk(['niska', 'kinipânânaw'])
# returns:
# {'niska': {('niska', '+N', '+A', '+Obv'), ('niska', '+N', '+A', '+Sg')},
#  'kinipânânaw': {('nipâw', '+V', '+AI', '+Ind', '+Prs', '+12Pl')}}

hfst.feed_in_bulk_fast(['niska', 'kinipânânaw'])
# returns:
# {'niska': {'niska+N+A+Obv', 'niska+N+A+Sg'},
#  'kinipânânaw': {'nipâw+V+AI+Ind+Prs+12Pl'}}
```
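Judging from the two bulk outputs above, each string returned by feed_in_bulk_fast looks like the symbols of the matching feed_in_bulk tuple joined together. The sketch below converts one format into the other. to_fast_format is a hypothetical helper, not part of hfstol, and it runs on sample data copied from the example rather than on a live transducer.

```python
from typing import Dict, Set, Tuple

# Sample output copied from the feed_in_bulk example above.
bulk_result: Dict[str, Set[Tuple[str, ...]]] = {
    'niska': {('niska', '+N', '+A', '+Obv'), ('niska', '+N', '+A', '+Sg')},
    'kinipânânaw': {('nipâw', '+V', '+AI', '+Ind', '+Prs', '+12Pl')},
}


def to_fast_format(result: Dict[str, Set[Tuple[str, ...]]]) -> Dict[str, Set[str]]:
    """Hypothetical helper: join each symbol tuple into one string.

    e.g. ('niska', '+N', '+A', '+Sg') -> 'niska+N+A+Sg'
    """
    return {
        form: {''.join(symbols) for symbols in analyses}
        for form, analyses in result.items()
    }


fast_like = to_fast_format(bulk_result)
```

Under this reading, fast_like should match the feed_in_bulk_fast output shown above.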
Example with crk-normative-generator.hfstol:

```python
from hfstol import HFSTOL

hfst = HFSTOL.from_file('crk-normative-generator.hfstol')

hfst.feed('niska+N+A+Pl')
# returns:
# (('niskak',),)

hfst.feed_in_bulk(['niska+N+A+Pl', 'nipâw+V+AI+Ind+Prs+12Pl'])
# returns:
# {'niska+N+A+Pl': {('niskak',)},
#  'nipâw+V+AI+Ind+Prs+12Pl': {('kinipânânaw',), ('kinipânaw',)}}

hfst.feed_in_bulk_fast(['niska+N+A+Pl', 'nipâw+V+AI+Ind+Prs+12Pl'], multi_process=4)
# returns:
# {'niska+N+A+Pl': {'niskak'},
#  'nipâw+V+AI+Ind+Prs+12Pl': {'kinipânânaw', 'kinipânaw'}}
```
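The generator takes analyses as single strings such as 'niska+N+A+Pl', the same form feed_in_bulk_fast returns from the analyzer. Assuming the lemma never contains '+' and every tag begins with '+' (true of every example on this page, but not something hfstol guarantees), such a string can be split into lemma and tags, or rebuilt from them, with plain string operations. split_analysis is a hypothetical helper, not part of hfstol.

```python
from typing import List, Tuple


def split_analysis(analysis: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: 'niska+N+A+Pl' -> ('niska', ['+N', '+A', '+Pl']).

    Assumes the lemma contains no '+' and every tag starts with '+'.
    """
    lemma, *tags = analysis.split('+')
    return lemma, ['+' + tag for tag in tags]


# Split an analyzer result into its parts...
lemma, tags = split_analysis('nipâw+V+AI+Ind+Prs+12Pl')

# ...or build generator input from a lemma and a list of tags.
generator_input = 'niska' + ''.join(['+N', '+A', '+Pl'])
```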
For comprehensive API behaviour, including edge cases (for example, what happens if I call feed('absolute garbage')?), see this test file.
API signatures
```python
import pathlib
from typing import Dict, Iterable, List, Set, Tuple, Union


class HFSTOL:
    # HFSTOL.from_file
    @classmethod
    def from_file(cls, filename: Union[str, pathlib.Path]):
        """
        :param filename: the `.hfstol` file
        :return: an `HFSTOL` instance, which you can use to convert surface forms to deep forms
        """
        pass

    # HFSTOL.feed
    def feed(self, surface_form: str, concat: bool = True) -> Tuple[Tuple[str, ...], ...]:
        """
        Feed a surface form to hfst.

        :param surface_form: the surface form
        :param concat: whether to concatenate consecutive single-character symbols

        Example output for `surface_form` = 'niskak' with `crk-descriptive-analyzer.hfstol`:
        - True: (('niska', '+N', '+A', '+Pl'), ('nîskâw', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))
        - False: (('n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'),
                  ('n', 'î', 's', 'k', 'â', 'w', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))

        Example output for `surface_form` = 'niska+N+A+Pl' with `crk-normative-generator.hfstol`:
        - True: (('niskak',),)
        - False: (('n', 'i', 's', 'k', 'a', 'k'),)

        Example output for `surface_form` = 'nipâw+V+AI+Ind+Prs+12Pl' with
        `crk-normative-generator.hfstol` (an inflection that has two spellings):
        - True: (('kinipânaw',), ('kinipânânaw',))
        - False: (('k', 'i', 'n', 'i', 'p', 'â', 'n', 'a', 'w'),
                  ('k', 'i', 'n', 'i', 'p', 'â', 'n', 'â', 'n', 'a', 'w'))
        """
        pass

    # HFSTOL.feed_in_bulk
    def feed_in_bulk(self, surface_forms: List[str], concat=True) -> Dict[str, Set[Tuple[str, ...]]]:
        """
        Feed multiple surface forms to hfst at once.

        :param surface_forms: the surface forms to look up
        :return: a dictionary whose keys are the surface forms fed in and whose values
            are the sets of their corresponding deep forms
        """
        pass

    # HFSTOL.feed_in_bulk_fast
    def feed_in_bulk_fast(self, strings: Iterable[str], multi_process: int = 1) -> Dict[str, Set[str]]:
        """
        Calls `hfst-optimized-lookup`. Evaluation is orders of magnitude faster.
        Note that the generated symbols are all concatenated: each result is a single
        string such as 'niska+N+A+Pl' rather than ['n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'].

        :keyword multi_process: defaults to 1. Specifies how many parallel processes to use
            to speed up computation. As a rule of thumb, use at most as many processes
            as your machine has CPU cores.
        """
        pass
```
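Going only by the docstring examples for feed, concat=True seems to merge each run of single-character symbols into one string while leaving multi-character tags such as '+N' untouched. The sketch below reproduces that behaviour with itertools.groupby. concat_symbols is a hypothetical helper, not hfstol's implementation, and it may differ from the library wherever symbols cannot be told apart by length alone (for example, a single-character tag).

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def concat_symbols(symbols: Sequence[str]) -> Tuple[str, ...]:
    """Hypothetical helper: merge runs of single-character symbols, keep longer symbols as-is."""
    merged: List[str] = []
    # groupby splits the sequence into alternating runs of
    # single-character symbols and multi-character symbols (tags).
    for is_single_char, run in groupby(symbols, key=lambda s: len(s) == 1):
        if is_single_char:
            merged.append(''.join(run))
        else:
            merged.extend(run)
    return tuple(merged)


# concat=False style output from the docstring, regrouped into concat=True style.
regrouped = concat_symbols(('n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'))
```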
To use feed_in_bulk_fast
feed_in_bulk_fast calls the compiled hfst-optimized-lookup program, which can be 100 times faster than feed_in_bulk.
It requires hfst-optimized-lookup to be installed; version 1.2 has been tested to work. On Debian-based Linux systems such as Ubuntu, installing it can be as easy as sudo apt install hfst. For other systems, see the installation guide.
If hfst-optimized-lookup is not found, calling feed_in_bulk_fast raises ImportError.
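Because feed_in_bulk_fast raises ImportError when hfst-optimized-lookup is missing, a caller can try the fast path first and fall back to feed_in_bulk. In this sketch, analyse_all is a hypothetical helper, the cap of 4 processes is an arbitrary choice, and os.cpu_count() keeps the process count within the core-count rule of thumb from the feed_in_bulk_fast docstring. The fallback joins each symbol tuple so that both branches return the same shape.

```python
import os
from typing import Dict, Iterable, Set


def analyse_all(hfst, words: Iterable[str]) -> Dict[str, Set[str]]:
    """Hypothetical helper: prefer feed_in_bulk_fast, fall back to feed_in_bulk."""
    words = list(words)
    # Never ask for more processes than the machine has cores (arbitrary cap of 4).
    processes = min(4, os.cpu_count() or 1)
    try:
        return hfst.feed_in_bulk_fast(words, multi_process=processes)
    except ImportError:
        # hfst-optimized-lookup is not installed: use the slower pure-Python
        # lookup and join each symbol tuple into one string, e.g. 'niska+N+A+Sg'.
        slow = hfst.feed_in_bulk(words)
        return {word: {''.join(analysis) for analysis in analyses}
                for word, analyses in slow.items()}
```

With hfstol installed, it might be called as analyse_all(HFSTOL.from_file('crk-descriptive-analyzer.hfstol'), ['niska', 'kinipânânaw']).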
Download files
| Filename | Size | File type | Python version |
|---|---|---|---|
| hfstol-1.2.4.tar.gz | 13.6 kB | Source | None |