Skip to main content

Rule-based facts extraction for Russian language

Project description

CI codecov

Yargy is an Earley parser similar to Tomita parser. Yargy uses rules and dictionaries to extract structured information from Russian texts.

Install

Yargy supports Python 3.5+, PyPy 3, depends only on Pymorphy2.

$ pip install yargy

Usage

from yargy import Parser, rule, and_, not_
from yargy.interpretation import fact
from yargy.predicates import gram
from yargy.relations import gnc_relation
from yargy.pipelines import morph_pipeline


Name = fact(
    'Name',
    ['first', 'last'],
)
Person = fact(
    'Person',
    ['position', 'name']
)

LAST = and_(
    gram('Surn'),
    not_(gram('Abbr')),
)
FIRST = and_(
    gram('Name'),
    not_(gram('Abbr')),
)

POSITION = morph_pipeline([
    'управляющий директор',
    'вице-мэр'
])

gnc = gnc_relation()
NAME = rule(
    FIRST.interpretation(
        Name.first
    ).match(gnc),
    LAST.interpretation(
        Name.last
    ).match(gnc)
).interpretation(
    Name
)

PERSON = rule(
    POSITION.interpretation(
        Person.position
    ).match(gnc),
    NAME.interpretation(
        Person.name
    )
).interpretation(
    Person
)

parser = Parser(PERSON)

match = parser.match('управляющий директор Иван Ульянов')
print(match)

Person(
    position='управляющий директор',
    name=Name(
        first='Иван',
        last='Ульянов'
)

Documentation

All materials are in Russian:

Support

Development

Test:

make test

Package:

make version
git push
git push --tags

make clean wheel upload

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for yargy, version 0.15.0
Filename, size File type Python version Upload date Hashes
Filename, size yargy-0.15.0-py3-none-any.whl (41.1 kB) File type Wheel Python version py3 Upload date Hashes View
Filename, size yargy-0.15.0.tar.gz (69.3 kB) File type Source Python version None Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring DigiCert DigiCert EV certificate Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page