
HFST optimized lookup reimplemented in Python, including a wrapper around the original hfst-optimized-lookup.

Project description

hfstol


hfst-optimized-lookup in python

pip install hfstol

All examples below use two .hfstol files: crk-descriptive-analyzer.hfstol and crk-normative-generator.hfstol.

Use

Example with crk-descriptive-analyzer.hfstol:

from hfstol import HFSTOL

hfst = HFSTOL.from_file('crk-descriptive-analyzer.hfstol')

hfst.feed('niska')
# returns: 
# (('niska', '+N', '+A', '+Sg'), ('niska', '+N', '+A', '+Obv'))

hfst.feed_in_bulk(['niska', 'kinipânânaw'])
# returns: 
# {'niska': {('niska', '+N', '+A', '+Obv'), ('niska', '+N', '+A', '+Sg')}, 'kinipânânaw': {('nipâw', '+V', '+AI', '+Ind', '+Prs', '+12Pl')}}

hfst.feed_in_bulk_fast(['niska', 'kinipânânaw'], multi_process=4)
# returns:
# {'niska': {'niska+N+A+Obv', 'niska+N+A+Sg'}, 'kinipânânaw': {'nipâw+V+AI+Ind+Prs+12Pl'}}

Example with crk-normative-generator.hfstol:

from hfstol import HFSTOL

hfst = HFSTOL.from_file('crk-normative-generator.hfstol')

hfst.feed('niska+N+A+Pl')
# returns: 
# (('niskak',),)

hfst.feed_in_bulk(["niska+N+A+Pl", 'nipâw+V+AI+Ind+Prs+12Pl'])
# returns: 
# {'niska+N+A+Pl': {('niskak',)}, 'nipâw+V+AI+Ind+Prs+12Pl': {('kinipânânaw',), ('kinipânaw',)}}

hfst.feed_in_bulk_fast(["niska+N+A+Pl", 'nipâw+V+AI+Ind+Prs+12Pl'], multi_process=4)
# returns:
# {'niska+N+A+Pl': {'niskak'}, 'nipâw+V+AI+Ind+Prs+12Pl': {'kinipânânaw', 'kinipânaw'}}
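
The examples above suggest that `feed_in_bulk` and `feed_in_bulk_fast` return the same analyses in different shapes: tuples of symbols versus concatenated strings. The following is a minimal sketch of converting one shape into the other. `join_analyses` is a hypothetical helper, not part of hfstol, and the input literal is copied from the `feed_in_bulk` example rather than produced by a live lookup.

```python
from typing import Dict, Iterable, Set, Tuple


def join_analyses(
    bulk_result: Dict[str, Iterable[Tuple[str, ...]]]
) -> Dict[str, Set[str]]:
    """Concatenate each tuple of symbols into one string per analysis.

    Hypothetical helper (not part of hfstol): it reshapes a
    feed_in_bulk-style result into a feed_in_bulk_fast-style result.
    """
    return {
        form: {"".join(symbols) for symbols in analyses}
        for form, analyses in bulk_result.items()
    }


# Literal copied from the feed_in_bulk example, not a live lookup.
slow_shaped = {
    'niska': {('niska', '+N', '+A', '+Obv'), ('niska', '+N', '+A', '+Sg')},
    'kinipânânaw': {('nipâw', '+V', '+AI', '+Ind', '+Prs', '+12Pl')},
}

fast_shaped = join_analyses(slow_shaped)
```

Going the other way (splitting `'niska+N+A+Sg'` back into symbols) is not attempted here, because the concatenated string alone does not say where one symbol ends and the next begins.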

For comprehensive API behaviour, including edge cases such as feed('absolute garbage'), see this test file.

API signatures

# HFSTOL.from_file

@classmethod
def from_file(cls, filename: Union[str, pathlib.Path]): 
    """
    :param filename: the `.hfstol` file
    :return: an `HFSTOL` instance, which you can use to convert surface forms to deep forms
    """
    pass


# HFSTOL.feed

def feed(self, surface_form: str, concat: bool = True) -> Tuple[Tuple[str, ...], ...]:
    """
    feed surface form to hfst

    :param surface_form: the surface form
    :param concat: whether to concatenate single characters

        example output for `surface_form` = 'niskak', with `crk-descriptive-analyzer.hfstol`
        - True: (('niska', '+N', '+A', '+Pl'), ('nîskâw', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))
        - False: (('n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'), ('n', 'î', 's', 'k', 'â', 'w', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))

        example output for `surface_form` = 'niska+N+A+Pl' with `crk-normative-generator.hfstol`
        - True: (('niskak',),)
        - False: (('n', 'i', 's', 'k', 'a', 'k'),)

        example output for `surface_form` = 'nipâw+V+AI+Ind+Prs+12Pl' with `crk-normative-generator.hfstol` (an inflection that has two spellings)
        - True: (('kinipânaw',), ('kinipânânaw',))
        - False: (('k', 'i', 'n', 'i', 'p', 'â', 'n', 'a', 'w'), ('k', 'i', 'n', 'i', 'p', 'â', 'n', 'â', 'n', 'a', 'w'))
    """
    pass
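
The `concat` examples in the docstring above are consistent with a simple rule: runs of single-character symbols are joined, while multi-character symbols such as `'+N'` stay separate. The sketch below illustrates that rule only. It is an assumption inferred from the documented outputs, not the library's actual implementation, and `merge_single_characters` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def merge_single_characters(symbols: Iterable[str]) -> Tuple[str, ...]:
    """Join consecutive one-character symbols; keep longer symbols as-is.

    Assumed rule, inferred from the documented `concat` examples.
    """
    merged: List[str] = []
    run: List[str] = []  # pending single characters
    for symbol in symbols:
        if len(symbol) == 1:
            run.append(symbol)
            continue
        if run:  # flush the pending run before a multi-character symbol
            merged.append("".join(run))
            run = []
        merged.append(symbol)
    if run:  # flush a trailing run
        merged.append("".join(run))
    return tuple(merged)


unmerged = ('n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl')
merged = merge_single_characters(unmerged)
```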

# HFSTOL.feed_in_bulk   

def feed_in_bulk(self, surface_forms: List[str], concat=True) -> Dict[str, Set[Tuple[str, ...]]]:
    """
    feed multiple surface forms to hfst at once

    :param surface_forms: the surface forms to look up
    :return: a dictionary with keys being each surface form fed in, values being their corresponding deep forms
    """
    pass

# HFSTOL.feed_in_bulk_fast

def feed_in_bulk_fast(self, strings: Iterable[str], multi_process: int = 1) -> Dict[str, Set[str]]:
    """
    calls `hfst-optimized-lookup`. Evaluation is orders of magnitude faster. Note that the generated symbols are all concatenated.
    e.g. instead of ['n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'] it returns ['niska+N+A+Pl']

    :keyword multi_process: Defaults to 1. The number of parallel processes used to speed up computation. As a rule of thumb, use at most as many processes as your machine has cores.
    """
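
The rule of thumb for `multi_process` can be applied by capping the requested value at the core count. This is a minimal sketch using only the standard library. `pick_process_count` is a hypothetical helper, not part of hfstol.

```python
import os


def pick_process_count(requested: int) -> int:
    """Clamp a requested process count to the range [1, core count].

    Hypothetical helper. os.cpu_count() may return None, in which
    case a single process is assumed.
    """
    cores = os.cpu_count() or 1
    return max(1, min(requested, cores))


# Intended use (requires hfstol and an .hfstol file, so left as a comment):
# hfst.feed_in_bulk_fast(forms, multi_process=pick_process_count(4))
```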

To Use feed_in_bulk_fast

feed_in_bulk_fast calls compiled C code, which can be 100 times faster than feed_in_bulk.

It requires hfst-optimized-lookup to be installed. Version 1.2 is tested to work. On Linux, installation can be as easy as sudo apt install hfst. For other systems, see the installation guide.

If hfst-optimized-lookup is not found, calling feed_in_bulk_fast raises ImportError.
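
Because a missing hfst-optimized-lookup surfaces as `ImportError`, a caller can fall back to the pure-Python path. The following is a sketch under stated assumptions: `lookup_with_fallback` is a hypothetical wrapper, `hfst` is any object exposing `feed_in_bulk_fast` and `feed_in_bulk` as documented above, and the fallback joins symbols so both paths return concatenated strings.

```python
from typing import Dict, Iterable, Set


def lookup_with_fallback(hfst, forms: Iterable[str]) -> Dict[str, Set[str]]:
    """Try the fast lookup first; fall back to feed_in_bulk on ImportError.

    Hypothetical wrapper, not part of hfstol.
    """
    forms = list(forms)  # materialise so the input can be reused on fallback
    try:
        return hfst.feed_in_bulk_fast(forms)
    except ImportError:
        # hfst-optimized-lookup was not found: use the slower Python path
        # and join each tuple of symbols to match the fast path's shape.
        slow = hfst.feed_in_bulk(forms)
        return {
            form: {"".join(symbols) for symbols in analyses}
            for form, analyses in slow.items()
        }
```

The fallback trades speed for portability, so it is probably best kept for small inputs or for environments where installing hfst is not an option.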
