Rule-based fact extraction for the Russian language

Project description

Yargy uses rules and dictionaries to extract structured information from Russian texts. It is similar to the Tomita parser.

Install

Yargy supports Python 3.7+ and PyPy 3, and depends only on Pymorphy2.

$ pip install yargy

Usage

from yargy import Parser, rule, and_, not_
from yargy.interpretation import fact
from yargy.predicates import gram
from yargy.relations import gnc_relation
from yargy.pipelines import morph_pipeline


Name = fact(
    'Name',
    ['first', 'last'],
)
Person = fact(
    'Person',
    ['position', 'name']
)

LAST = and_(
    gram('Surn'),
    not_(gram('Abbr')),
)
FIRST = and_(
    gram('Name'),
    not_(gram('Abbr')),
)

POSITION = morph_pipeline([
    'управляющий директор',
    'вице-мэр'
])

gnc = gnc_relation()
NAME = rule(
    FIRST.interpretation(
        Name.first
    ).match(gnc),
    LAST.interpretation(
        Name.last
    ).match(gnc)
).interpretation(
    Name
)

PERSON = rule(
    POSITION.interpretation(
        Person.position
    ).match(gnc),
    NAME.interpretation(
        Person.name
    )
).interpretation(
    Person
)

parser = Parser(PERSON)

match = parser.match('управляющий директор Иван Ульянов')
print(match.fact)

Person(
    position='управляющий директор',
    name=Name(
        first='Иван',
        last='Ульянов'
    )
)

Documentation

All materials are in Russian:

Support

Development

Dev env

brew install graphviz

python -m venv ~/.venvs/natasha-yargy
source ~/.venvs/natasha-yargy/bin/activate

pip install -r requirements/dev.txt
pip install -e .

python -m ipykernel install --user --name natasha-yargy

Test + lint

make test

Update docs

make exec-docs

# Manually check git diff docs/, commit

Release

# Update setup.py version

git commit -am 'Up version'
git tag v0.16.0

git push
git push --tags

# GitHub Action builds dist and publishes to PyPI

Download files

Source Distribution

yargy-0.16.0.tar.gz (68.2 kB)

Uploaded Source

Built Distribution

yargy-0.16.0-py3-none-any.whl (34.0 kB)

Uploaded Python 3

File details

Details for the file yargy-0.16.0.tar.gz.

File metadata

  • Download URL: yargy-0.16.0.tar.gz
  • Upload date:
  • Size: 68.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for yargy-0.16.0.tar.gz
Algorithm Hash digest
SHA256 c917eefb32a40c23c46b6ca88d68927072dd00ab94e90fd5dc6ab0a62b59b593
MD5 4d60e6f3ebc5567a69e85c752a61d29b
BLAKE2b-256 87ff0ac3b2ae6aca6026e1acc872c1c371182662e94b1c1ab0b9c68854472670

File details

Details for the file yargy-0.16.0-py3-none-any.whl.

File metadata

  • Download URL: yargy-0.16.0-py3-none-any.whl
  • Upload date:
  • Size: 34.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for yargy-0.16.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7ca469fa47b336367fab49e8f33ccc195584f69ab758e8196f2fdaa7492adf22
MD5 5ccec641d27d5fc53207666a83f2159f
BLAKE2b-256 b755d065a9812c619889fbe01a1863743ee45f7c60c462fc95b19576972ee9e4
