Checks to improve scientific writing. See https://github.com/markfullmer/grammark for more information.

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

This is a Python port of Grammark, the grammar checker developed by Mark Fullmer. Visit https://github.com/markfullmer/grammark to learn more about Grammark. All credit goes to markfullmer.

The contributions of this work can be summarized as follows:

- Keywords used by the grammar rules are collected in JSON files. This hopefully makes the expressions easier to manage, and the JSON files can be reused in other projects.
- The grammar rules are defined in (informal) logic, presented in the section An (Informal) Definition of the Grammar Rules of this README. This makes the workings of Grammark more transparent. However, I reverse engineered those rules from the Angular app, and it is entirely possible that I made mistakes.
- The different checks implemented by Grammark (e.g., passive voice, wordiness, academic style) are provided as functions in the Python package. The functions return the ratings as proposed by Grammark, together with offsets indicating the problematic positions.

Usage

Install the package with pip:

pip install py-grammark

Then import the different functions:

from grammark import (
    check_wordiness,
    check_nominalizations,
    check_passive_voice,
    check_sentences,
    check_academic,
    check_transitions,
    check_grammar,
    check_eggcorns,
)

text = "This is some string."

# Each check takes the text and returns a dictionary (described below).
check_wordiness(text)
check_nominalizations(text)
...

Every function is called with the text as its parameter. The text must be a string.

The return values look as follows:

{
	"findings": [
		{"start_pos": 10, "end_pos": 12, "remark": "Some remark, or None if there is none"},
		...
	],
	"score": 40
}

Each function returns a dictionary. The score is calculated as defined by Grammark (https://github.com/markfullmer/grammark). The findings contain the offsets at which each detected problem is located in the provided text.

The remarks are provided by Grammark. Again, all credit goes to https://github.com/markfullmer/grammark. The remark can be None if there are no remarks for a certain check.
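
As a minimal sketch of how such a return value might be consumed, the snippet below slices the text at the offsets of each finding. The result dictionary is written by hand for illustration and is not real package output, the helper name flagged_spans is hypothetical, and treating end_pos as exclusive is an assumption that should be checked against the actual output.

```python
def flagged_spans(text, result):
    """Return (snippet, remark) pairs for every finding in a result dictionary."""
    spans = []
    for finding in result["findings"]:
        # Assumption: end_pos is exclusive, as in Python slicing.
        snippet = text[finding["start_pos"]:finding["end_pos"]]
        spans.append((snippet, finding["remark"]))
    return spans


# Hand-written example in the documented shape, not real package output.
text = "The test was done by us."
result = {
    "findings": [{"start_pos": 9, "end_pos": 17, "remark": None}],
    "score": 40,
}

for snippet, remark in flagged_spans(text, result):
    print(repr(snippet), remark)
```

If the package turns out to report inclusive end offsets, the slice would need end_pos + 1 instead.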

An (Informal) Definition of the Grammar Rules

In the following, we define the workings of the different tools provided by Grammark.

Here, $W$ represents the set of words built by parsing the text provided by the user (see the section Parsing the Text for a detailed discussion). Furthermore, $|w|$ represents the length of the word $w \in W$, and we use $w[a:b]$ with $a,b \in \mathbb{N}$ to represent the substring of $w$ that starts at position $a$ and ends at position $b$. $pre(w) \in W$ denotes the preceding word, i.e., the word that occurs before $w$ in the original text provided by the user. We write $upper(w)$ for $w \in W$ to denote the word $w$ with its first letter capitalized.

Note that $w \in W$ does not necessarily represent a single word; it can also be a sequence of words when we try to match several consecutive words.

We use $s(w)$ to denote the sentence that contains the word $w \in W$, and $s(w)[i]$ to select the word at index $i \in \mathbb{N}$ in that sentence.

Passive voice

Let $I$ be the set of irregulars and $H$ be the set of helpers, both as defined in src/resources/passive_voice.json.

The passive voice check hits if, for a word $w \in W$,

$(w[|w| - 1:|w|] = "ed" \lor w \in I) \land pre(w) \in H$

In words: the check flags every word that ends with "ed" or is an irregular verb, provided that the preceding word is a helper word.
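
The rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's implementation: the two small sets are hypothetical stand-ins for $I$ and $H$ (the real ones live in src/resources/passive_voice.json), the function name is made up, and the word splitting is simplified.

```python
import re

# Hypothetical stand-ins for the sets I and H; the real sets live in
# src/resources/passive_voice.json.
IRREGULARS = {"done", "written", "taken"}
HELPERS = {"is", "are", "was", "were", "be", "been", "being"}


def passive_voice_hits(text):
    """Return (helper, word) pairs for which the passive voice rule matches."""
    # Simplified splitting on whitespace and a few punctuation marks.
    words = [w for w in re.split(r"[\s,.!?:]+", text) if w]
    hits = []
    # Pair every word with its predecessor, i.e. pre(w).
    for previous, word in zip(words, words[1:]):
        if (word.endswith("ed") or word in IRREGULARS) and previous in HELPERS:
            hits.append((previous, word))
    return hits


print(passive_voice_hits("The report was written and the data were analyzed."))
```

Note that the sketch compares words case-sensitively, so a capitalized helper at the start of a sentence would not match.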

Wordiness

Let $K$ be the set of keywords. It is constructed from the first elements of the set keywords in the file src/resources/wordiness.json.

The wordiness check hits if, for a word $w \in W$,

$\exists k \in K: w = k \lor w = upper(k)$

In words: we check whether one of the elements of $K$ occurs in the text, either as written or with its first letter capitalized.
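
A minimal sketch of this kind of keyword matching follows. The keywords are hypothetical examples (the real set $K$ comes from src/resources/wordiness.json), the function names are invented for illustration, and the plain substring search ignores word boundaries, which the package may handle differently.

```python
# Hypothetical keywords; the real set K comes from src/resources/wordiness.json.
KEYWORDS = {"in order to", "due to the fact that"}


def upper(k):
    """Capitalize only the first letter, mirroring upper(k) in the rule."""
    return k[:1].upper() + k[1:]


def keyword_hits(text, keywords=KEYWORDS):
    """Return sorted (start, end, keyword) triples for every occurrence."""
    hits = []
    for k in keywords:
        # Try the keyword as written and with a capitalized first letter.
        for variant in {k, upper(k)}:
            start = text.find(variant)
            while start != -1:
                hits.append((start, start + len(variant), k))
                start = text.find(variant, start + 1)
    return sorted(hits)


sample = "In order to win, we train in order to improve."
for start, end, keyword in keyword_hits(sample):
    print(start, end, repr(sample[start:end]))
```

The same matching scheme applies to every check that is defined purely by a keyword set.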

Nominalizations

Let $E$ be the set of suffixes taken from the file src/resources/normalizations.json.

The nominalization check hits if, for a word $w \in W$,

$\exists a \in \mathbb{N}: w[a:|w|] \in E \land |w| > 7$

In words: the rule checks whether the word $w$ ends with a suffix contained in $E$ and whether its length is greater than seven.
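
As a small illustration, the check can be written as a one-line predicate. The suffixes below are hypothetical examples and do not reproduce the contents of src/resources/normalizations.json, and the function name is made up for this sketch.

```python
# Hypothetical suffixes; the real set E comes from
# src/resources/normalizations.json.
SUFFIXES = ("tion", "ment", "ness", "ance")


def is_nominalization(word, suffixes=SUFFIXES):
    """True if the word ends with a suffix from E and has more than seven characters."""
    return len(word) > 7 and word.endswith(tuple(suffixes))


for candidate in ["implementation", "motion", "movement", "carefully"]:
    print(candidate, is_nominalization(candidate))
```

The length condition filters out short words such as "motion", which end with a listed suffix but are not flagged.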

Sentences

Let $K$ be the set of keywords as defined in the file src/resources/sentences.json.

The sentences check hits if, for a word $w \in W$,

$|s(w)| > 50 \lor s(w)[0] \in K$

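Here, $|s(w)|$ denotes the number of words in the sentence. In words: the check flags a sentence that contains more than 50 words or whose first word is one of the keywords in $K$.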
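
The following sketch illustrates the sentence rule. The opener keywords are hypothetical (the real set $K$ comes from src/resources/sentences.json), the function name is invented, and sentences are split naively on ".", "!" and "?", which is an assumption about how sentence boundaries are found.

```python
import re

# Hypothetical sentence openers; the real set K comes from
# src/resources/sentences.json.
OPENERS = {"There", "It"}
MAX_WORDS = 50


def sentence_hits(text, openers=OPENERS):
    """Return the sentences that are too long or start with a keyword."""
    hits = []
    # Naive sentence split on '.', '!' and '?'.
    for sentence in re.split(r"[.!?]+", text):
        words = sentence.split()
        if not words:
            continue
        if len(words) > MAX_WORDS or words[0] in openers:
            hits.append(" ".join(words))
    return hits


print(sentence_hits("There is a cat. The cat sleeps."))
```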

Transitions

Let $K$ be the set of keywords from the file src/resources/transitions.json.

The transitions check hits if, for a word $w \in W$,

$\exists k \in K: w = k \lor w = upper(k)$

In words: we check whether one of the elements of $K$ occurs in the text, either as written or with its first letter capitalized.

Academic

Let $K$ be the set of keywords from the file src/resources/academic.json.

The academic check hits if, for a word $w \in W$,

$\exists k \in K: w = k \lor w = upper(k)$

In words: we check whether one of the elements of $K$ occurs in the text, either as written or with its first letter capitalized.

Grammar

Let $K$ be the set of keywords from the file src/resources/grammar.json.

The grammar check hits if, for a word $w \in W$,

$\exists k \in K: w = k \lor w = upper(k)$

In words: we check whether one of the elements of $K$ occurs in the text, either as written or with its first letter capitalized.

Eggcorns

Let $K$ be the set of keywords from the file src/resources/eggcorns.json.

The eggcorns check hits if, for a word $w \in W$,

$\exists k \in K: w = k \lor w = upper(k)$

In words: we check whether one of the elements of $K$ occurs in the text, either as written or with its first letter capitalized.

Parsing the Text

We use two variants to work with the text. In the first variant, we check word by word: the text is split at the following characters " ,.!?:-\n'")({}". That means words are delimited by these characters, and each resulting piece is treated as a single word.
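
A minimal sketch of the word-wise split is shown below. It uses the delimiter characters listed above, but the use of re.split, the dropping of empty pieces, and the function name are assumptions made for this illustration rather than a description of the package's code.

```python
import re

# The delimiter characters listed above:
# space , . ! ? : - newline ' " ) ( { }
DELIMITERS = " ,.!?:-\n'\")({}"
SPLIT_PATTERN = re.compile("[" + re.escape(DELIMITERS) + "]+")


def split_words(text):
    """Split the text at the delimiter characters and drop empty pieces."""
    return [w for w in SPLIT_PATTERN.split(text) if w]


print(split_words("Well-known (and tested) words, aren't they?"))
```

Because the hyphen and the apostrophe are delimiters, "Well-known" and "aren't" are each split into two words.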

The other variant is based on regular expressions, in the hope that this is more efficient for certain operations.
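
One way such a regex-based variant could look is sketched here: all keywords are compiled into a single pattern and matched in one pass, yielding offsets in the shape of the findings described under Usage. The function names, the word-boundary anchors, and the use of None as the remark are assumptions for illustration only.

```python
import re


def build_pattern(keywords):
    """Compile one pattern that matches each keyword as written or capitalized."""
    variants = set()
    for k in keywords:
        variants.add(k)
        variants.add(k[:1].upper() + k[1:])
    # Longest first, so longer phrases win over their own prefixes.
    alternatives = sorted(variants, key=len, reverse=True)
    body = "|".join(re.escape(v) for v in alternatives)
    # Assumption: keywords start and end with word characters.
    return re.compile(r"\b(?:" + body + r")\b")


def regex_findings(text, keywords):
    """Return findings with start and end offsets for every keyword match."""
    pattern = build_pattern(keywords)
    return [
        {"start_pos": m.start(), "end_pos": m.end(), "remark": None}
        for m in pattern.finditer(text)
    ]


sample_text = "However, it is clear; however we continue."
for finding in regex_findings(sample_text, ["however"]):
    print(finding, repr(sample_text[finding["start_pos"]:finding["end_pos"]]))
```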

Development

Build

python3 -m build

Run tests

python3 -m unittest tests.test_grammark.TestGrammark

Release history

0.0.2 (Dec 16, 2021)
0.0.1 (Dec 14, 2021)
