# Apertium Streamparser
[![Build Status](https://travis-ci.org/apertium/streamparser.svg)](https://travis-ci.org/apertium/streamparser)
[![Coverage Status](https://coveralls.io/repos/github/apertium/streamparser/badge.svg?branch=master)](https://coveralls.io/github/apertium/streamparser?branch=master)
Python library to parse [Apertium stream format](http://wiki.apertium.org/wiki/Apertium_stream_format), generating `LexicalUnit`s.
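To make the format concrete, here is a minimal standard-library sketch of how one lexical unit breaks down into a surface form and its readings. It is not the library's implementation: `Reading` and `split_unit` are hypothetical names introduced for illustration, and the sketch assumes the unit contains no escaped characters.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the library's SReading; not imported from streamparser.
Reading = namedtuple('Reading', ['baseform', 'tags'])

# A tag is any run of characters enclosed in angle brackets, e.g. <n> or <vblex>.
TAG = re.compile(r'<([^<>]+)>')


def split_unit(unit):
    """Split '^wordform/analysis/analysis$' into (wordform, readings).

    Assumes no escaped characters. Each analysis may join several
    subreadings with '+', so every reading is a list of Reading tuples.
    """
    wordform, *analyses = unit.strip('^$').split('/')
    readings = []
    for analysis in analyses:
        subreadings = []
        for part in analysis.split('+'):
            baseform = part.split('<', 1)[0]
            subreadings.append(Reading(baseform, TAG.findall(part)))
        readings.append(subreadings)
    return wordform, readings


wordform, readings = split_unit('^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$')
print(wordform)
for reading in readings:
    print(reading)
```

The real parser additionally handles escapes, superblanks, and unknown-word markers, which this sketch leaves out on purpose.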
## Installation
Streamparser is available through [PyPI](https://pypi.org/project/apertium-streamparser/):

    $ pip install apertium-streamparser
    $ apertium-streamparser
    $^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$
    [[SReading(baseform='vino', tags=['n', 'm', 'sg'])], [SReading(baseform='venir', tags=['vblex', 'ifi', 'p3', 'sg'])]]

Installing from PyPI provides both the `apertium-streamparser` command shown above and the importable `streamparser` module.
## Usage
### As a library
#### With string input
```python
>>> from streamparser import parse
>>> # Raw string, so the backslash escapes reach the parser unchanged.
>>> lexical_units = parse(r'^hypercholesterolemia/*hypercholesterolemia$\[\]\^\$[^ignoreme/yesreally$]^a\/s/a\/s<n><nt>$^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$.eefe^dímelo/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><m><sg>$')
>>> for lexical_unit in lexical_units:
...     print('%s (%s) → %s' % (lexical_unit.wordform, lexical_unit.knownness, lexical_unit.readings))
...
hypercholesterolemia (<class 'streamparser.unknown'>) → [[SReading(baseform='*hypercholesterolemia', tags=[])]]
a\/s (<class 'streamparser.known'>) → [[SReading(baseform='a\\/s', tags=['n', 'nt'])]]
vino (<class 'streamparser.known'>) → [[SReading(baseform='vino', tags=['n', 'm', 'sg'])], [SReading(baseform='venir', tags=['vblex', 'ifi', 'p3', 'sg'])]]
dímelo (<class 'streamparser.known'>) → [[SReading(baseform='decir', tags=['vblex', 'imp', 'p2', 'sg']), SReading(baseform='me', tags=['prn', 'enc', 'p1', 'mf', 'sg']), SReading(baseform='lo', tags=['prn', 'enc', 'p3', 'nt'])], [SReading(baseform='decir', tags=['vblex', 'imp', 'p2', 'sg']), SReading(baseform='me', tags=['prn', 'enc', 'p1', 'mf', 'sg']), SReading(baseform='lo', tags=['prn', 'enc', 'p3', 'm', 'sg'])]]
```
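The input above mixes lexical units with escaped characters (`\[`, `\^`, `\$`), a superblank (`[^ignoreme/yesreally$]`), and plain text (`.eefe`), yet only four units come out. The following is a rough sketch of that scanning step, assuming a simple character-by-character state machine; `find_units` is a hypothetical helper, not part of `streamparser`, and it only extracts the raw text of each unit.

```python
def find_units(stream):
    """Yield the text between unescaped '^' and '$', skipping '[...]' superblanks."""
    unit = None            # characters of the unit being read, or None outside a unit
    in_superblank = False  # True while inside an unescaped '[...]' block
    escaped = False        # True right after a backslash
    for char in stream:
        if escaped:
            # Keep the backslash so the escape survives in the extracted unit.
            if unit is not None:
                unit.append('\\' + char)
            escaped = False
        elif char == '\\':
            escaped = True
        elif in_superblank:
            in_superblank = char != ']'
        elif unit is None:
            if char == '[':
                in_superblank = True
            elif char == '^':
                unit = []
        elif char == '$':
            yield ''.join(unit)
            unit = None
        else:
            unit.append(char)


# Shortened version of the example input, written as a raw string.
stream = r'^hypercholesterolemia/*hypercholesterolemia$\[\]\^\$[^ignoreme/yesreally$]^a\/s/a\/s<n><nt>$^vino/vino<n><m><sg>$.eefe'
units = list(find_units(stream))
for unit in units:
    print(unit)
```

Tracking the escape flag before anything else is what keeps `\^` and `\$` from opening or closing a unit, and the superblank flag is why `ignoreme` never appears in the results.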
#### With file input
```python
>>> import os
>>> from streamparser import parse_file
>>> # open() does not expand '~', so resolve the home directory explicitly.
>>> with open(os.path.expanduser('~/Downloads/analyzed.txt')) as f:
...     for lexical_unit in parse_file(f):
...         print('%s (%s) → %s' % (lexical_unit.wordform, lexical_unit.knownness, lexical_unit.readings))
...
Høgre (<class 'streamparser.known'>) → [[SReading(baseform='Høgre', tags=['np'])], [SReading(baseform='høgre', tags=['n', 'nt', 'sp'])], [SReading(baseform='høg', tags=['un', 'sint', 'sp', 'comp', 'adj'])], [SReading(baseform='høgre', tags=['f', 'n', 'ind', 'sg'])], [SReading(baseform='høgre', tags=['f', 'n', 'ind', 'sg'])], [SReading(baseform='høgre', tags=['sg', 'nt', 'ind', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['mf', 'sg', 'ind', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['un', 'ind', 'pl', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['un', 'def', 'sp', 'posi', 'adj'])]]
kolonne (<class 'streamparser.known'>) → [[SReading(baseform='kolonne', tags=['m', 'n', 'ind', 'sg'])], [SReading(baseform='kolonne', tags=['m', 'n', 'ind', 'sg'])]]
Grunnprinsipp (<class 'streamparser.known'>) → [[SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'ind', 'sg'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'pl', 'ind'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'ind', 'sg'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'pl', 'ind'])]]
7 (<class 'streamparser.known'>) → [[SReading(baseform='7', tags=['qnt', 'pl', 'det'])]]
px (<class 'streamparser.unknown'>) → []
```
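The sample output contains identical analyses more than once (both readings of `kolonne` are the same), so a common follow-up step is to deduplicate readings before counting anything. The sketch below shows one way to do that. To stay self-contained it uses a stand-in `SReading` namedtuple with the two fields printed above rather than importing the library, and `distinct_readings` is a hypothetical helper.

```python
from collections import Counter, namedtuple

# Stand-in with the same fields as the SReading tuples printed above;
# not imported from streamparser.
SReading = namedtuple('SReading', ['baseform', 'tags'])

# Readings copied from the "kolonne", "7" and "px" lines of the sample output.
units = {
    'kolonne': [[SReading('kolonne', ['m', 'n', 'ind', 'sg'])],
                [SReading('kolonne', ['m', 'n', 'ind', 'sg'])]],
    '7': [[SReading('7', ['qnt', 'pl', 'det'])]],
    'px': [],
}


def distinct_readings(readings):
    """Drop duplicate analyses while keeping their original order."""
    seen, unique = set(), []
    for reading in readings:
        # Lists are unhashable, so build a tuple-based key for each reading.
        key = tuple((s.baseform, tuple(s.tags)) for s in reading)
        if key not in seen:
            seen.add(key)
            unique.append(reading)
    return unique


# Count how often each tag occurs across the deduplicated readings.
tag_counts = Counter()
for readings in units.values():
    for reading in distinct_readings(readings):
        for subreading in reading:
            tag_counts.update(subreading.tags)

print(tag_counts)
```

Assuming the library's readings expose `baseform` and `tags` as the printed output suggests, the same helper could be applied to `lexical_unit.readings` inside the loop above.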
### From the terminal
#### With standard input
```bash
$ bzcat ~/corpora/nnclean2.txt.bz2 | apertium-deshtml | lt-proc -we /usr/share/apertium/apertium-nno/nno.automorf.bin | python3 streamparser.py
[[SReading(baseform='Høgre', tags=['np'])],
[SReading(baseform='høgre', tags=['n', 'sp', 'nt'])],
[SReading(baseform='høg', tags=['un', 'sp', 'adj', 'comp', 'sint'])],
[SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
[SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
[SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'nt', 'sg'])],
[SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'mf', 'sg'])],
[SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'un', 'pl'])],
[SReading(baseform='høgre', tags=['posi', 'def', 'sp', 'adj', 'un'])]]
[[SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])],
[SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])]]
...
```
#### With file input in terminal
```bash
$ bzcat ~/corpora/nnclean2.txt.bz2 | apertium-deshtml | lt-proc -we /usr/share/apertium/apertium-nno/nno.automorf.bin > analyzed.txt
$ python3 streamparser.py analyzed.txt
[[SReading(baseform='Høgre', tags=['np'])],
[SReading(baseform='høgre', tags=['n', 'sp', 'nt'])],
[SReading(baseform='høg', tags=['un', 'sp', 'adj', 'comp', 'sint'])],
[SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
[SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
[SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'nt', 'sg'])],
[SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'mf', 'sg'])],
[SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'un', 'pl'])],
[SReading(baseform='høgre', tags=['posi', 'def', 'sp', 'adj', 'un'])]]
[[SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])],
[SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])]]
...
```