mawo-grammar
Russian grammar checker with 690+ rules for professional NLP quality. Based on ЕГЭ 2025 analysis, НКРЯ corpus data, and authoritative Russian sources.
Features
- Case Agreement - Validates adjective-noun, numeral-noun gender/case/number agreement
- Verb Aspect - Context-aware perfective/imperfective aspect checking
- Particle Usage - Validates particles (же, ли, бы, etc.)
- Preposition + Case - Checks correct case after prepositions (в/на + Acc/Loc)
- Register Consistency - Detects mixing formal (вы) and informal (ты)
Installation
pip install mawo-grammar
Quick Start
from mawo_grammar import RussianGrammarChecker
checker = RussianGrammarChecker()
# Check text
text = "красивая дом"
errors = checker.check(text)
for error in errors:
    print(f"{error.description} at {error.location}")
    print(f"Suggestion: {error.suggestion}")
Advanced Usage
Rule-based checking
# Specific rules
errors = checker.check(text, rules=[
    'case_agreement',  # Adjective-noun agreement
    'aspect_usage',    # Verb aspect validation
    'particle_usage',  # Particle correctness
    'register',        # ты/вы consistency
])
# With morphology context
from mawo_pymorphy3 import create_analyzer
morph = create_analyzer()
errors = checker.check_with_morphology(text, morph)
Custom rules
from mawo_grammar import GrammarError

# `checker` is the RussianGrammarChecker instance created in Quick Start
@checker.add_rule(category='style', severity='minor')
def no_bureaucratese(text: str) -> list[GrammarError]:
    """Detect bureaucratese (канцелярит)."""
    phrase = 'в связи с вышеизложенным'
    errors = []
    start = text.find(phrase)
    if start != -1:
        errors.append(GrammarError(
            type='bureaucratese',
            # Report the span of the matched phrase, not the whole text
            location=(start, start + len(phrase)),
            description='Avoid bureaucratic language',
            suggestion='Use simpler wording'
        ))
    return errors
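The same idea can be tried out without the library. The following is a minimal standalone sketch, assuming that a rule only needs to report the span of each match. The phrase list and the function name `find_bureaucratese` are illustrative and are not part of mawo-grammar.

```python
import re

# Illustrative phrase list; a real rule set would be much larger.
BUREAUCRATIC_PHRASES = [
    "в связи с вышеизложенным",
    "настоящим сообщаем",
]

# One case-insensitive pattern that matches any of the phrases literally.
PATTERN = re.compile(
    "|".join(re.escape(p) for p in BUREAUCRATIC_PHRASES),
    re.IGNORECASE,
)


def find_bureaucratese(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for every phrase occurrence."""
    return [(m.start(), m.end(), m.group(0)) for m in PATTERN.finditer(text)]


hits = find_bureaucratese("В связи с вышеизложенным прошу принять меры.")
```

Unlike a plain `in` check, `finditer` reports every occurrence with its own span, which is what a `location` field would normally carry.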
Error objects
for error in errors:
    print(error.type)         # 'case_agreement'
    print(error.location)     # (0, 14)
    print(error.severity)     # 'major'
    print(error.description)  # 'Adjective-noun gender mismatch'
    print(error.suggestion)   # 'красивый дом'
    print(error.rule_id)      # 'ADJ_NOUN_GENDER_AGREEMENT'
    print(error.confidence)   # 0.98
    print(error.morphology)   # Morphological context
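Downstream code will often want to filter and order errors before showing them. Here is a small sketch of that step, assuming only the field names listed above. `ErrorStub` is a stand-in dataclass for illustration, not the library's `GrammarError`, and `triage` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ErrorStub:
    """Stand-in with a subset of the fields shown above (not the library class)."""
    type: str
    location: Tuple[int, int]
    severity: str
    confidence: float


def triage(errors: List[ErrorStub], min_confidence: float = 0.9) -> List[ErrorStub]:
    """Drop low-confidence errors and sort the rest by start offset."""
    kept = [e for e in errors if e.confidence >= min_confidence]
    return sorted(kept, key=lambda e: e.location[0])


sample = [
    ErrorStub("register", (20, 23), "minor", 0.95),
    ErrorStub("case_agreement", (0, 12), "major", 0.98),
    ErrorStub("aspect_usage", (30, 38), "minor", 0.60),
]
selected = triage(sample)
```

The 0.9 threshold is an arbitrary choice for the example; pick one that matches how many false positives your use case tolerates.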
Rule Categories
Orthography (120 rules)
- НЕ/НИ particles - most common ЕГЭ 2025 error (50%+ fail rate)
- Verb endings (imperative vs. future: напишите vs напишете)
- Prefix rules (ПРЕ-/ПРИ-, З-/С-)
- Compound words (hyphenated, solid, or separate spelling)
- Soft sign in verbs (учиться vs учится)
- Double consonants (группа, программа)
- Ы/И after prefixes (разыскать)
Functional Stylistics (40 rules) 🆕
- Critical: 69% fail rate on ЕГЭ 2025 (up from 47% in 2024)
- Scientific vs colloquial style mixing
- Lexical collocations (играть роль, not *играть значение)
- Official vs artistic style conflicts
- Register consistency detection
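Register consistency can be approximated with a plain word lookup. The sketch below assumes that spotting second-person pronoun forms is enough for a first pass; the word sets and function names are illustrative and do not reflect how mawo-grammar implements the check. Note that вы is also the ordinary plural "you", so a real rule needs more context than this.

```python
import re

# Pronoun and possessive forms for informal (ты) and formal (вы) address.
INFORMAL = {"ты", "тебя", "тебе", "тобой", "тобою", "твой", "твоя", "твоё", "твои"}
FORMAL = {"вы", "вас", "вам", "вами", "ваш", "ваша", "ваше", "ваши"}


def register_forms(text: str) -> dict:
    """Collect informal and formal address forms found in the text."""
    words = re.findall(r"\w+", text.lower())
    return {
        "informal": [w for w in words if w in INFORMAL],
        "formal": [w for w in words if w in FORMAL],
    }


def mixes_registers(text: str) -> bool:
    """True when both ты-forms and вы-forms occur in the same text."""
    forms = register_forms(text)
    return bool(forms["informal"]) and bool(forms["formal"])


mixed = mixes_registers("Здравствуйте, вы получили письмо? Я тебе его отправил.")
```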
Paronyms (20 rules) 🆕
- Based on Gramota.ru Dictionary of Difficulties
- абонент vs абонемент
- оплатить vs заплатить (за)
- представить vs предоставить
- различать vs отличать
Prepositional Government (20 rules) 🆕
- From the Rozental and Belchikov-Razheva dictionaries
- отзыв О книге / отзыв НА иск
- уверенность В успехе / вера В успех
- скучать ПО дому / скучать по ВАС
- согласно приказУ (dative, not genitive)
Punctuation (165 rules)
- Comma before conjunctions (А, НО, ЧТОБЫ, ПОТОМУ ЧТО)
- Compound conjunctions (15 rules) 🆕 - благодаря тому что, ввиду того что, для того чтобы
- Introductory words with corpus frequency (15 rules) 🆕 - наверное (ipm=980), впрочем (ipm=720), based on НКРЯ 2.0
- Complex sentence punctuation
- Introductory words (конечно, возможно, кстати)
- Participle and gerund clauses
- Direct speech formatting
- Enumeration commas
- Comparative constructions (как)
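To illustrate the first item in this list, here is a rough sketch of a missing-comma check for a few conjunctions. It is a regex heuristic with known false positives and false negatives, not the library's rule; the conjunction list and the name `missing_commas` are assumptions made for the example.

```python
import re

# Conjunctions that usually take a preceding comma mid-sentence.
CONJUNCTIONS = ("но", "а", "чтобы")

# Whitespace run not preceded by punctuation, followed by a whole-word conjunction.
PATTERN = re.compile(
    r"(?<![\s,;:.!?])\s+(" + "|".join(CONJUNCTIONS) + r")\b",
    re.IGNORECASE,
)


def missing_commas(text: str) -> list:
    """Return (conjunction, offset) pairs where a comma is probably missing."""
    return [(m.group(1), m.start()) for m in PATTERN.finditer(text)]


found = missing_commas("Он пришёл но опоздал")
```

Each offset is the position where the comma would be inserted. Sentence-initial conjunctions are skipped because nothing precedes them or the previous character is sentence-final punctuation.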
Agreement (90 rules)
- Adjective-noun agreement (gender, case, number)
- Numeral-noun agreement (1 nom, 2-4 gen sg, 5+ gen pl)
- Compound numerals (10 rules) 🆕 - 21, 22-24, 25-30 with proper case
- Collective nouns (10 rules) 🆕 - большинство сдало/сдали (both forms acceptable)
- Subject-predicate agreement (number, gender in past tense)
- Pronoun-noun agreement
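The numeral-noun pattern above (1 nom, 2-4 gen sg, 5+ gen pl, plus compound numerals) comes down to a check on the last one or two digits. A minimal sketch for numerals in the nominative follows; the function names and the tiny `FORMS` table for дом are illustrative only.

```python
def numeral_noun_form(n: int) -> str:
    """Return the noun form required after cardinal numeral n (nominative context)."""
    n = abs(n)
    last_two = n % 100
    last = n % 10
    # 11-14 always take the genitive plural, even though they end in 1-4.
    if 11 <= last_two <= 14:
        return "gen_pl"
    if last == 1:
        return "nom_sg"   # 1, 21, 31, ...
    if 2 <= last <= 4:
        return "gen_sg"   # 2-4, 22-24, ...
    return "gen_pl"       # 0, 5-20, 25-30, ...


# Hand-written forms of one noun, used only for this illustration.
FORMS = {"nom_sg": "дом", "gen_sg": "дома", "gen_pl": "домов"}


def phrase(n: int) -> str:
    """Build 'numeral + noun' with the agreeing form of дом."""
    return f"{n} {FORMS[numeral_noun_form(n)]}"
```

For example, `phrase(21)` gives `21 дом` and `phrase(25)` gives `25 домов`, which matches the compound-numeral item in the list.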
Prepositions (60 rules)
- В + Accusative (motion) / Prepositional (location)
- НА + Accusative (motion) / Prepositional (location)
- С + Genitive / Instrumental
- К + Dative
- О + Prepositional
- ПО + Dative
- БЕЗ + Genitive
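The preposition list above maps naturally onto a lookup table. The sketch below encodes only the pairs listed here, using OpenCorpora-style case tags (`gent`, `datv`, `accs`, `ablt`, `loct`). The table and the function `case_allowed` are assumptions for illustration; real Russian allows more combinations (for example ПО with other cases), so this is not a complete rule set.

```python
from typing import Optional

# Cases each preposition governs, limited to the pairs listed above.
PREPOSITION_CASES = {
    "в": {"accs", "loct"},    # motion / location
    "на": {"accs", "loct"},   # motion / location
    "с": {"gent", "ablt"},    # genitive / instrumental
    "к": {"datv"},
    "о": {"loct"},
    "по": {"datv"},
    "без": {"gent"},
}


def case_allowed(preposition: str, case: str) -> Optional[bool]:
    """True/False if the preposition is in the table, None if it is unknown."""
    allowed = PREPOSITION_CASES.get(preposition.lower())
    if allowed is None:
        return None
    return case in allowed
```

Returning `None` for unknown prepositions keeps "not covered" separate from "wrong case", so a caller can skip rather than flag.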
Style (90 rules)
- Канцелярит detection (bureaucratic language)
- Verbose constructions (имеет место быть → есть)
- Pleonasm (свободная вакансия → вакансия)
- Tautology (repeated words with the same root)
- Paronyms (одеть vs надеть, оплатить vs заплатить)
- Colloquialisms in formal text
- Register consistency (ты/вы)
- Word order preferences
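The verbose-construction and pleonasm items above are simple phrase-to-replacement lookups. A minimal sketch, assuming exact phrase matching without inflection handling, follows. The `REWRITES` table reuses the two examples from the list, and `suggest_rewrites` is a hypothetical name.

```python
import re

# Wordy phrase -> shorter suggestion, taken from the examples above.
REWRITES = {
    "имеет место быть": "есть",
    "свободная вакансия": "вакансия",
}


def suggest_rewrites(text: str) -> list:
    """Return (start, end, phrase, suggestion) tuples sorted by position."""
    found = []
    for phrase, suggestion in REWRITES.items():
        for m in re.finditer(re.escape(phrase), text, re.IGNORECASE):
            found.append((m.start(), m.end(), phrase, suggestion))
    return sorted(found)


text = "У нас открыта свободная вакансия."
suggestions = suggest_rewrites(text)
```

Inflected forms such as свободной вакансии are not matched here; covering them would need lemmatization, for example through a morphological analyzer.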
Verb Aspect (40 rules)
- Perfective for completed actions
- Imperfective for ongoing/repeated actions
- Simultaneous actions
- Context-aware suggestions
Particles (30 rules)
- ЖЕ position rules
- ЛИ in questions
- БЫ with past tense (conditional mood)
- ТАКИ emphasis (with hyphen)
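As an illustration of the ТАКИ item, the sketch below restores the hyphen in a few fixed combinations such as всё-таки. The host-word list is a small illustrative sample, and the function `fix_taki` is hypothetical; it does not cover verbs followed by -таки and may not match the library's own rule.

```python
import re

# A few words after which -таки is written with a hyphen (not exhaustive).
TAKI_HOSTS = ("всё", "так", "опять", "довольно", "прямо")

TAKI_PATTERN = re.compile(
    r"\b(" + "|".join(TAKI_HOSTS) + r")\s+таки\b",
    re.IGNORECASE,
)


def fix_taki(text: str) -> str:
    """Replace 'host таки' with 'host-таки', keeping the host's original case."""
    return TAKI_PATTERN.sub(r"\1-таки", text)


fixed = fix_taki("Он всё таки пришёл.")
```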
Performance
- Precision: 95%+ (rule-based)
- Recall: 94%+ with 690 rules (up from 92%)
- Latency: <100ms per text
- No LLM required: Fast, deterministic, offline
- Coverage: All major ЕГЭ 2025 error patterns
- Version: v1.2.0 (690 rules across 15 categories)
- Research base: ФИПИ ЕГЭ 2025, НКРЯ 2.0, Gramota.ru, OpenCorpora
Development
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Code quality
black .
ruff check .
mypy mawo_grammar
License
MIT License - see LICENSE for details.
Credits
Primary Sources (v1.2.0)
- ФИПИ - ЕГЭ 2025 methodological recommendations and analysis of typical errors
- НКРЯ 2.0 - Russian National Corpus (frequency data: ipm metrics)
- Gramota.ru - Dictionary of Difficulties (Розенталь Д.Э., Бельчиков-Ражева)
- OpenCorpora - grammatical categories and disambiguation guidelines
- V.V. Vinogradov Russian Language Institute of the Russian Academy of Sciences (Институт русского языка РАН им. В.В. Виноградова)
- Dialog-21 - international conference on computational linguistics
Classical Sources
- Розенталь Д.Э. "Справочник по русскому языку" (Handbook of the Russian Language)
- Правила русской орфографии и пунктуации (Rules of Russian Orthography and Punctuation, 1956)
- LanguageTool Russian rules (adapted)
University Research
- МГУ - Faculty of Philology, Department of Russian Language
- СПбГУ - LII International Conference (2024)
- ВШЭ - School of Linguistics, diachronicon database
Part of the MAWO ecosystem:
- mawo-pymorphy3 - Morphological analysis
- mawo-razdel - Tokenization
- mawo-slovnet - NER and syntax
- mawo-core - Unified API