Skip to main content

Python library to parse Apertium stream format

Project description

Apertium Streamparser

Build Status Coverage Status PyPI PyPI - Python Version PyPI - Implementation

Python 3 library to parse Apertium stream format, generating LexicalUnits.

Installation

Streamparser is available through PyPi:

$ pip install apertium-streamparser
$ apertium-streamparser
$^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$
[[SReading(baseform='vino', tags=['n', 'm', 'sg'])], [SReading(baseform='venir', tags=['vblex', 'ifi', 'p3', 'sg'])]]

Installation through PyPi will also install the streamparser module.

Usage

As a library

With string input

>>> from streamparser import parse
>>> lexical_units = parse('^hypercholesterolemia/*hypercholesterolemia$\[\]\^\$[^ignoreme/yesreally$]^a\/s/a\/s<n><nt>$^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$.eefe^dímelo/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><m><sg>$')
>>> for lexical_unit in lexical_units:
        print('%s (%s) → %s' % (lexical_unit.wordform, lexical_unit.knownness, lexical_unit.readings))
hypercholesterolemia (<class 'streamparser.unknown'>) → [[SReading(baseform='*hypercholesterolemia', tags=[])]]
a\/s (<class 'streamparser.known'>) → [[SReading(baseform='a\\/s', tags=['n', 'nt'])]]
vino (<class 'streamparser.known'>) → [[SReading(baseform='vino', tags=['n', 'm', 'sg'])], [SReading(baseform='venir', tags=['vblex', 'ifi', 'p3', 'sg'])]]
dímelo (<class 'streamparser.known'>) → [[SReading(baseform='decir', tags=['vblex', 'imp', 'p2', 'sg']), SReading(baseform='me', tags=['prn', 'enc', 'p1', 'mf', 'sg']), SReading(baseform='lo', tags=['prn', 'enc', 'p3', 'nt'])], [SReading(baseform='decir', tags=['vblex', 'imp', 'p2', 'sg']), SReading(baseform='me', tags=['prn', 'enc', 'p1', 'mf', 'sg']), SReading(baseform='lo', tags=['prn', 'enc', 'p3', 'm', 'sg'])]]

With file input

>>> from streamparser import parse_file
>>> lexical_units = parse_file(open('~/Downloads/analyzed.txt'))
>>> for lexical_unit in lexical_units:
        print('%s (%s) → %s' % (lexical_unit.wordform, lexical_unit.knownness, lexical_unit.readings))
Høgre (<class 'streamparser.known'>) → [[SReading(baseform='Høgre', tags=['np'])], [SReading(baseform='høgre', tags=['n', 'nt', 'sp'])], [SReading(baseform='høg', tags=['un', 'sint', 'sp', 'comp', 'adj'])], [SReading(baseform='høgre', tags=['f', 'n', 'ind', 'sg'])], [SReading(baseform='høgre', tags=['f', 'n', 'ind', 'sg'])], [SReading(baseform='høgre', tags=['sg', 'nt', 'ind', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['mf', 'sg', 'ind', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['un', 'ind', 'pl', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['un', 'def', 'sp', 'posi', 'adj'])]]
kolonne (<class 'streamparser.known'>) → [[SReading(baseform='kolonne', tags=['m', 'n', 'ind', 'sg'])], [SReading(baseform='kolonne', tags=['m', 'n', 'ind', 'sg'])]]
Grunnprinsipp (<class 'streamparser.known'>) → [[SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'ind', 'sg'])], S[Reading(baseform='grunnprinsipp', tags=['n', 'nt', 'pl', 'ind'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'ind', 'sg'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'pl', 'ind'])]]
7 (<class 'streamparser.known'>) → [[SReading(baseform='7', tags=['qnt', 'pl', 'det'])]]
px (<class 'streamparser.unknown'>) → []

From the terminal

With standard input

$ bzcat ~/corpora/nnclean2.txt.bz2 | apertium-deshtml | lt-proc -we /usr/share/apertium/apertium-nno/nno.automorf.bin | python3 streamparser.py
[[SReading(baseform='Høgre', tags=['np'])],
 [SReading(baseform='høgre', tags=['n', 'sp', 'nt'])],
 [SReading(baseform='høg', tags=['un', 'sp', 'adj', 'comp', 'sint'])],
 [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
 [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'nt', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'mf', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'un', 'pl'])],
 [SReading(baseform='høgre', tags=['posi', 'def', 'sp', 'adj', 'un'])]]
[[SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])],
 [SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])]]
...

With file input in terminal

$ bzcat ~/corpora/nnclean2.txt.bz2 | apertium-deshtml | lt-proc -we /usr/share/apertium/apertium-nno/nno.automorf.bin > analyzed.txt
$ python3 streamparser.py analyzed.txt
[[SReading(baseform='Høgre', tags=['np'])],
 [SReading(baseform='høgre', tags=['n', 'sp', 'nt'])],
 [SReading(baseform='høg', tags=['un', 'sp', 'adj', 'comp', 'sint'])],
 [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
 [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'nt', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'mf', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'un', 'pl'])],
 [SReading(baseform='høgre', tags=['posi', 'def', 'sp', 'adj', 'un'])]]
[[SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])],
 [SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])]]
...

Contributing

Streamparser uses TravisCI for continous integration. Locally, use make test to run the same checks it does. Use pip install -r requirements.txt to install the requirements required for development, e.g. linters.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for apertium-streamparser, version 5.0.2
Filename, size File type Python version Upload date Hashes
Filename, size apertium_streamparser-5.0.2-py3-none-any.whl (5.7 kB) File type Wheel Python version py3 Upload date Hashes View
Filename, size apertium-streamparser-5.0.2.tar.gz (19.3 kB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page