
hfst-optimized-lookup reimplemented in Python, including a wrapper around the original hfst-optimized-lookup.

Project description



hfst-optimized-lookup in python

pip install hfstol

All examples below are based on two .hfstol files: crk-descriptive-analyzer.hfstol and crk-normative-generator.hfstol.


Example with crk-descriptive-analyzer.hfstol:

from hfstol import HFSTOL

hfst = HFSTOL.from_file('crk-descriptive-analyzer.hfstol')

hfst.feed('niska')
# returns:
# (('niska', '+N', '+A', '+Sg'), ('niska', '+N', '+A', '+Obv'))

hfst.feed_in_bulk(['niska', 'kinipânânaw'])
# returns: 
# {'niska': {('niska', '+N', '+A', '+Obv'), ('niska', '+N', '+A', '+Sg')}, 'kinipânânaw': {('nipâw', '+V', '+AI', '+Ind', '+Prs', '+12Pl')}}

hfst.feed_in_bulk_fast(['niska', 'kinipânânaw'], multi_process=4)
# returns:
# {'niska': {'niska+N+A+Obv', 'niska+N+A+Sg'}, 'kinipânânaw': {'nipâw+V+AI+Ind+Prs+12Pl'}}
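The two bulk methods differ mainly in output shape: feed_in_bulk returns tuples of symbols, while feed_in_bulk_fast returns already-concatenated strings. The sketch below converts one into the other, assuming (as the outputs above suggest) that the concatenated form is simply the symbols joined with no separator. `to_concatenated` is a hypothetical helper, not part of hfstol.

```python
from typing import Dict, Set, Tuple


def to_concatenated(results: Dict[str, Set[Tuple[str, ...]]]) -> Dict[str, Set[str]]:
    """Join each symbol tuple into one string, mirroring feed_in_bulk_fast's output shape."""
    return {form: {"".join(symbols) for symbols in analyses} for form, analyses in results.items()}


# Output of feed_in_bulk, copied from the example above
bulk = {
    "niska": {("niska", "+N", "+A", "+Obv"), ("niska", "+N", "+A", "+Sg")},
    "kinipânânaw": {("nipâw", "+V", "+AI", "+Ind", "+Prs", "+12Pl")},
}
print(to_concatenated(bulk))
```

The result should match the feed_in_bulk_fast output shown above, up to set ordering.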

Example with crk-normative-generator.hfstol:

from hfstol import HFSTOL

hfst = HFSTOL.from_file('crk-normative-generator.hfstol')

hfst.feed('niska+N+A+Pl')
# returns:
# (('niskak',),)

hfst.feed_in_bulk(["niska+N+A+Pl", 'nipâw+V+AI+Ind+Prs+12Pl'])
# returns: 
# {'niska+N+A+Pl': {('niskak',)}, 'nipâw+V+AI+Ind+Prs+12Pl': {('kinipânânaw',), ('kinipânaw',)}}

hfst.feed_in_bulk_fast(["niska+N+A+Pl", 'nipâw+V+AI+Ind+Prs+12Pl'], multi_process=4)
# returns:
# {'niska+N+A+Pl': {'niskak'}, 'nipâw+V+AI+Ind+Prs+12Pl': {'kinipânânaw', 'kinipânaw'}}

For comprehensive API behaviour, including edge cases (e.g. what happens on feed('absolute garbage')), see this test file.

API signatures

# HFSTOL.from_file

@classmethod
def from_file(cls, filename: Union[str, pathlib.Path]):
    """
    :param filename: path to the `.hfstol` file
    :return: an `HFSTOL` instance, which you can use to convert surface forms to deep forms (or, with a generator, deep forms to surface forms)
    """

# HFSTOL.feed

def feed(self, surface_form: str, concat: bool = True) -> Tuple[Tuple[str, ...], ...]:
    """
    Feed a single surface form to the transducer.

    :param surface_form: the surface form (for a generator, the tagged deep form)
    :param concat: whether to concatenate consecutive single-character symbols into one string
    :return: a tuple of results, each a tuple of symbols

        example output for `surface_form` = 'niskak', with `crk-descriptive-analyzer.hfstol`
        - True: (('niska', '+N', '+A', '+Pl'), ('nîskâw', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))
        - False: (('n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'), ('n', 'î', 's', 'k', 'â', 'w', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))

        example output for `surface_form` = 'niska+N+A+Pl', with `crk-normative-generator.hfstol`
        - True: (('niskak',),)
        - False: (('n', 'i', 's', 'k', 'a', 'k'),)

        example output for `surface_form` = 'nipâw+V+AI+Ind+Prs+12Pl', with `crk-normative-generator.hfstol` (an inflection that has two spellings)
        - True: (('kinipânaw',), ('kinipânânaw',))
        - False: (('k', 'i', 'n', 'i', 'p', 'â', 'n', 'a', 'w'), ('k', 'i', 'n', 'i', 'p', 'â', 'n', 'â', 'n', 'a', 'w'))
    """
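The `concat` examples suggest that with `concat=True`, runs of single-character symbols are merged into one string while multi-character tags such as '+N' stay separate. The sketch below reproduces that behaviour under this assumption; `concat_symbols` is a hypothetical helper inferred from the examples, not hfstol's own implementation.

```python
from typing import List, Tuple


def concat_symbols(symbols: Tuple[str, ...]) -> Tuple[str, ...]:
    """Merge consecutive single-character symbols; keep multi-character symbols (tags) as-is."""
    merged: List[str] = []
    buffer: List[str] = []  # pending single characters
    for symbol in symbols:
        if len(symbol) == 1:
            buffer.append(symbol)
        else:
            if buffer:  # a tag ends the current run of characters
                merged.append("".join(buffer))
                buffer = []
            merged.append(symbol)
    if buffer:  # flush trailing characters, e.g. a generated surface form
        merged.append("".join(buffer))
    return tuple(merged)


print(concat_symbols(("n", "i", "s", "k", "a", "+N", "+A", "+Pl")))
# ('niska', '+N', '+A', '+Pl')
```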

# HFSTOL.feed_in_bulk

def feed_in_bulk(self, surface_forms: List[str], concat: bool = True) -> Dict[str, Set[Tuple[str, ...]]]:
    """
    Feed multiple surface forms to hfst at once.

    :param surface_forms: the surface forms to look up
    :param concat: same as in `feed`
    :return: a dictionary whose keys are the surface forms fed in and whose values are sets of their corresponding deep forms
    """
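Judging from its return type and the examples, feed_in_bulk behaves like calling feed once per input and collecting each result into a set. A minimal sketch under that assumption; `feed_in_bulk_sketch` and `fake_feed` are hypothetical, and `fake_feed` is a lookup-table stand-in for a real transducer.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Analyses = Tuple[Tuple[str, ...], ...]


def feed_in_bulk_sketch(
    feed: Callable[[str], Analyses], surface_forms: Iterable[str]
) -> Dict[str, Set[Tuple[str, ...]]]:
    """Map every input to the set of results returned by a single-form `feed`."""
    return {form: set(feed(form)) for form in surface_forms}


def fake_feed(form: str) -> Analyses:
    """Hypothetical stand-in for HFSTOL.feed; unknown forms get no results (an assumption)."""
    table = {"niska+N+A+Pl": (("niskak",),)}
    return table.get(form, ())


print(feed_in_bulk_sketch(fake_feed, ["niska+N+A+Pl", "garbage"]))
# {'niska+N+A+Pl': {('niskak',)}, 'garbage': set()}
```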

# HFSTOL.feed_in_bulk_fast

def feed_in_bulk_fast(self, strings: Iterable[str], multi_process: int = 1) -> Dict[str, Set[str]]:
    """
    Call `hfst-optimized-lookup`. Evaluation is orders of magnitude faster. Note that the generated symbols are always concatenated:
    e.g. instead of ('niska', '+N', '+A', '+Pl') each result is the single string 'niska+N+A+Pl'.

    :param strings: the surface forms to look up
    :keyword multi_process: defaults to 1. The number of parallel processes used to speed up computation. As a rule of thumb, use at most as many processes as your machine has CPU cores.
    """
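One plausible way to spread a lookup over `multi_process` workers is to split the input into roughly equal chunks, process each chunk separately, and merge the per-chunk dictionaries. The sketch below shows only the splitting and merging steps; `split_into_chunks` and `merge_results` are hypothetical helpers, and this is not necessarily how hfstol distributes its work.

```python
from typing import Dict, Iterable, List, Sequence, Set


def split_into_chunks(items: Sequence[str], n: int) -> List[List[str]]:
    """Split items into at most n contiguous chunks whose sizes differ by at most one."""
    if not items:
        return [[]]
    n = max(1, min(n, len(items)))  # never create empty chunks
    size, extra = divmod(len(items), n)
    chunks: List[List[str]] = []
    start = 0
    for i in range(n):
        # the first `extra` chunks each take one leftover item
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def merge_results(partials: Iterable[Dict[str, Set[str]]]) -> Dict[str, Set[str]]:
    """Union per-chunk {input: outputs} dictionaries into one."""
    merged: Dict[str, Set[str]] = {}
    for partial in partials:
        for form, outputs in partial.items():
            merged.setdefault(form, set()).update(outputs)
    return merged


print(split_into_chunks(["a", "b", "c", "d", "e"], 2))
# [['a', 'b', 'c'], ['d', 'e']]
```

In a real pipeline, each chunk could be handed to `concurrent.futures.ProcessPoolExecutor.map`, with the worker function defined at module level so that it can be pickled.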

Using feed_in_bulk_fast

feed_in_bulk_fast calls the compiled hfst-optimized-lookup binary, which can be about 100 times faster than feed_in_bulk.
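Because feed_in_bulk_fast delegates to an external program, a wrapper of this kind has to parse that program's text output. A sketch of such a parser, assuming each result line is tab-separated with the input in the first column and one output in the second (any further columns, such as a weight, are ignored), and that blank lines separate inputs. `parse_lookup_output` is hypothetical, and the sample text is made up for illustration rather than captured from a real run.

```python
from typing import Dict, Set


def parse_lookup_output(text: str) -> Dict[str, Set[str]]:
    """Collect {input: {output, ...}} from tab-separated lookup output."""
    results: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate the results of different inputs
        columns = line.split("\t")
        if len(columns) < 2:
            continue  # skip anything that is not an input/output pair
        surface, analysis = columns[0], columns[1]
        results.setdefault(surface, set()).add(analysis)
    return results


sample = "niska\tniska+N+A+Sg\t0.000000\nniska\tniska+N+A+Obv\t0.000000\n\n"
print(parse_lookup_output(sample))
```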

It requires hfst-optimized-lookup to be installed; version 1.2 is tested to work. On Linux, installing it can be as easy as sudo apt install hfst. For other systems, see the installation guide.

If hfst-optimized-lookup is not found, calling feed_in_bulk_fast raises ImportError.
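Since feed_in_bulk_fast raises ImportError when the binary is missing, callers can fall back to the pure-Python feed_in_bulk. A sketch, assuming `hfst` is an HFSTOL instance created with HFSTOL.from_file; `lookup_all` and `StubWithoutBinary` are hypothetical names, and the stub only imitates the documented method names so that the example runs without a .hfstol file.

```python
from typing import Dict, List, Set


def lookup_all(hfst, forms: List[str], processes: int = 1) -> Dict[str, Set[str]]:
    """Try the fast, binary-backed lookup first; fall back to pure Python on ImportError."""
    try:
        return hfst.feed_in_bulk_fast(forms, multi_process=processes)
    except ImportError:
        slow = hfst.feed_in_bulk(forms)  # values are sets of symbol tuples
        # join each tuple so the result has the same shape as feed_in_bulk_fast
        return {form: {"".join(symbols) for symbols in analyses} for form, analyses in slow.items()}


class StubWithoutBinary:
    """Hypothetical stand-in for an HFSTOL instance on a machine without hfst installed."""

    def feed_in_bulk_fast(self, strings, multi_process=1):
        raise ImportError("hfst-optimized-lookup not found")

    def feed_in_bulk(self, surface_forms, concat=True):
        return {"niska": {("niska", "+N", "+A", "+Sg")}}


print(lookup_all(StubWithoutBinary(), ["niska"]))
# {'niska': {'niska+N+A+Sg'}}
```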


Download files

Source distribution: hfstol-1.2.11.tar.gz (14.3 kB)

Built distribution: hfstol-1.2.11-py3-none-any.whl (12.2 kB, py3)
