Skip to main content

hfst optimised lookup reimplemented in python. Including a wrapper to the original hfst-optimized-lookup

Project description

hfstol

travis-badge code-cov PyPI pyversions

hfst-optimized-lookup in python

pip install hfstol

All below examples are based on two .hfstol files

respectively: crk-descriptive-analyzer.hfstol crk-normative-generator.hfstol

Use

example with crk-descriptive-analyzer.hfstol :

from hfstol import HFSTOL

hfst = HFSTOL.from_file('crk-descriptive-analyzer.hfstol')

hfst.feed('niska')
# returns: 
# (('niska', '+N', '+A', '+Sg'), ('niska', '+N', '+A', '+Obv'))

hfst.feed_in_bulk(['niska', 'kinipânânaw'])
# returns: 
# {'niska': {('niska', '+N', '+A', '+Obv'), ('niska', '+N', '+A', '+Sg')}, 'kinipânânaw': {('nipâw', '+V', '+AI', '+Ind', '+Prs', '+12Pl')}}

hfst.feed_in_bulk_fast(['niska', 'kinipânânaw'], multi_process=4)
# returns:
# {'niska': {'niska+N+A+Obv', 'niska+N+A+Sg'}, 'kinipânânaw': {'nipâw+V+AI+Ind+Prs+12Pl'}}

example with crk-normative-generator.hfstol :

from hfstol import HFSTOL

hfst = HFSTOL.from_file('crk-normative-generator.hfstol')

hfst.feed('niska+N+A+Pl')
# returns: 
# (('niskak',),)

hfst.feed_in_bulk(["niska+N+A+Pl", 'nipâw+V+AI+Ind+Prs+12Pl'])
# returns: 
# {'niska+N+A+Pl': {('niskak',)}, 'nipâw+V+AI+Ind+Prs+12Pl': {('kinipânânaw',), ('kinipânaw',)}}

hfst.feed_in_bulk_fast(["niska+N+A+Pl", 'nipâw+V+AI+Ind+Prs+12Pl'], multi_process=4)
# returns:
# {'niska+N+A+Pl': {'niskak'}, 'nipâw+V+AI+Ind+Prs+12Pl': {'kinipânânaw', 'kinipânaw'}}

to see a comprehensive API behaviour including edge cases, see this test file (what if I feed('absolute garbage'))

API signatures

# HFSTOL.from_file

@classmethod
def from_file(cls, filename: Union[str, pathlib.Path]): 
    """
    :param filename: the `.hfstol` file
    :return: an `HFSTOL` instance, which you can use to convert surface forms to deep forms
    """
    pass


# HFSTOL.feed

def feed(self, surface_form: str, concat: bool = True) -> Tuple[Tuple[str, ...], ...]:
    """
    feed surface form to hfst

    :param surface_form: the surface form
    :param concat: whether to concatenate single characters

        example output for `surface_form` = 'niskak', with `crk-descriptive-analyzer.hfstol`
        - True: (('niska', '+N', '+A', '+Pl'), ('nîskâw', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))
        - False: (('n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'), ('n', 'î', 's', 'k', 'â', 'w', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))

        example output for `surface_form` = 'niska+N+A+Pl' with `crk-normative-generator.hfstol`
        - True: (('niskak',),)
        - False: (('n', 'i', 's', 'k', 'a', 'k'),)

        example output for `surface_form` = 'niska+N+A+Pl' with `crk-normative-generator.hfstol` (an inflection that has two spellings)
        - True: (('kinipânaw',), ('kinipânânaw',))
        -False: (('k', 'i', 'n', 'i', 'p', 'â', 'n', 'a', 'w'), ('k', 'i', 'n', 'i', 'p', 'â', 'n', 'â', 'n', 'a', 'w'))
    """
    pass
    
# HFSTOL.feed_in_bulk   

def feed_in_bulk(self, surface_forms: List[str], concat=True) -> Dict[str, Set[Tuple[str, ...]]]:
    """
    feed a multiple of surface forms to hfst at once

    :param surface_forms:
    :return: a dictionary with keys being each surface form fed in, values being their corresponding deep forms
    """
    pass

# HFSTOL.feed_in_bulk_fast

def feed_in_bulk_fast(self, strings: Iterable[str], multi_process: int = 1) -> Dict[str, Set[str]]:
    """
    calls `hfstol-optimized-lookup`. Evaluation is magnitudes faster. Note the generated symbols will all be all concatenated.
    e.g. instead of ['n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'] it returns ['niska+N+A+Pl']

    :keyword multi_process: Defaults to 1. Specify how many parallel processes you want to speed up computation. A rule is to have processes at most your machine core count.
    """

To Use feed_in_bulk_fast

feed_in_bulk_fast calls compiled C code, which can be 100 times faster than feed_in_bulk.

It requires hfst-optimized-lookup installed. Version 1.2 is tested to work. For linux system, installing can be as easy as sudo apt install hfst. For other systems see installation guide

If hfst-optimized-lookup is not found, calling feed_in_bulk_fast throws ImportError

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for hfstol, version 1.2.7
Filename, size File type Python version Upload date Hashes
Filename, size hfstol-1.2.7.tar.gz (12.3 kB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page