
Checks to improve scientific writing. See https://github.com/markfullmer/grammark for more information.

Project description

This is a Python port of Grammark, the grammar checker developed by Mark Fullmer. Visit https://github.com/markfullmer/grammark to learn more about Grammark. All credit goes to markfullmer.

The contributions of this work can be summarized as follows:

  • The keywords used by the grammar rules are collected in JSON files. This should make the expressions easier to manage, and the JSON files can be reused in other projects.
  • The grammar rules are defined in (informal) logic, presented in the section "An (Informal) Definition of the Grammar Rules" of this README. This makes the workings of Grammark more transparent. However, I reverse engineered these rules from the Angular app, and it is entirely possible that I made mistakes.
  • The different checks implemented by Grammark (e.g., passive voice, wordiness, academic style) are provided as functions in the Python package. The functions return the ratings as proposed by Grammark, together with offsets indicating the problematic positions.

Usage

Install the package with pip:

pip install py-grammark

Then import the different functions:

from grammark import (
    check_wordiness,
    check_nominalizations,
    check_passive_voice,
    check_sentences,
    check_academic,
    check_transitions,
    check_grammar,
    check_eggcorns,
)

text = "This is some string."

check_wordiness(text)
check_nominalizations(text)
...

Every function is called with the text as its parameter. The text must be a string.

The return values look as follows:

{
	"findings": [
		{"start_pos": 10, "end_pos": 12, "remark": "Some remark or None, if there is no"},
		...
	],
	"score": 40
}

Each function returns a dictionary. The score is calculated as defined by Grammark (https://github.com/markfullmer/grammark). Each finding contains the offsets at which the detected problem is located in the provided text.

The remarks are provided by Grammark. Again, all credit goes to https://github.com/markfullmer/grammark. The remark can be None if a certain check provides no remark.
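As a minimal sketch of how such a result could be consumed, the helper below slices the flagged passages out of the text. The name `flagged_passages` is hypothetical and not part of py-grammark, the result dictionary is hand-written for illustration, and treating `end_pos` as exclusive is an assumption that should be checked against the real output.

```python
def flagged_passages(text, result):
    """Return (passage, remark) pairs for every finding in a result dict."""
    passages = []
    for finding in result["findings"]:
        # Assumption: start_pos is inclusive and end_pos is exclusive.
        passage = text[finding["start_pos"]:finding["end_pos"]]
        passages.append((passage, finding.get("remark")))
    return passages


# Hand-written result in the shape shown above (not real package output).
text = "It was decided that we leave."
example_result = {
    "findings": [{"start_pos": 3, "end_pos": 14, "remark": None}],
    "score": 40,
}

for passage, remark in flagged_passages(text, example_result):
    print(repr(passage), remark)
```

If the package reports inclusive end offsets instead, the slice would need `end_pos + 1`.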

An (Informal) Definition of the Grammar Rules

In the following, we define the workings of the different checks provided by Grammark.

Here, $W$ represents the set of words that is built by parsing the text provided by the user (see the section Parsing the Text for a detailed discussion). Furthermore, $|w|$ represents the length of a word $w \in W$, and we write $w[a:b]$ for $a,b \in \mathbb{N}$ to denote the substring of $w$ that starts at position $a$ and ends at position $b$. $pre(w) \in W$ denotes the word that precedes $w$ in the original text provided by the user. We write $upper(w)$ for $w \in W$ to denote the word $w$ with its first letter capitalized.

Note that $w \in W$ does not necessarily represent a single word; it can also be a sequence of words when we try to match several consecutive words.

We use $s(w)$ to denote the sentence that contains the word $w \in W$, and $s(w)[i]$ to select the word at index $i \in \mathbb{N}$ in that sentence.

Passive voice

Let $I$ be the set of irregulars and $H$ be the set of helpers, both as defined in src/resources/passive_voice.json.

The passive voice check hits for a word $w \in W$ if

$(w[|w| - 1:|w|] = "ed" \lor w \in I) \land pre(w) \in H$

In text: The check hits for every word that ends with "ed" or is an irregular verb, provided that the preceding word is a helper word.
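The rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's implementation: the function name `passive_voice_hits` is hypothetical, and the two small sets merely stand in for the real contents of src/resources/passive_voice.json.

```python
# Hypothetical sample sets; the real ones live in src/resources/passive_voice.json.
IRREGULARS = {"written", "taken", "done", "made"}
HELPERS = {"is", "are", "was", "were", "be", "been", "being"}


def passive_voice_hits(words):
    """Return indices i where words[i] looks like a participle preceded by a helper."""
    hits = []
    for i in range(1, len(words)):
        word = words[i]
        previous = words[i - 1]  # pre(w) in the notation above
        looks_like_participle = word.endswith("ed") or word in IRREGULARS
        if looks_like_participle and previous in HELPERS:
            hits.append(i)
    return hits


words = "The report was written and the data were analyzed".split()
print(passive_voice_hits(words))  # [3, 8]
```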

Wordiness

Let $K$ be the set of keywords. It is constructed from the first elements of the set keywords in the file src/resources/wordiness.json.

The wordiness check hits for a word $w \in W$ if

$\exists k \in K: w = k \lor w = upper(k)$

In text: We check whether one of the elements in $K$ occurs in the text. We also match the variant in which its first letter is capitalized.
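A minimal word-wise sketch of this keyword rule is shown below, including the case where a keyword spans several consecutive words. The names `upper_first` and `keyword_hits` and the sample keywords are hypothetical; the real keyword list comes from the JSON file, and the package may match differently.

```python
def upper_first(keyword):
    """upper(k): the keyword with its first letter capitalized."""
    return keyword[:1].upper() + keyword[1:]


def keyword_hits(words, keywords):
    """Return (start_index, length) for each run of words matching a keyword."""
    hits = []
    for keyword in keywords:
        variants = {keyword, upper_first(keyword)}
        n = len(keyword.split())  # number of consecutive words to compare
        for i in range(len(words) - n + 1):
            if " ".join(words[i:i + n]) in variants:
                hits.append((i, n))
    return sorted(hits)


# Hypothetical keywords; the real list comes from src/resources/wordiness.json.
keywords = ["in order to", "basically"]
words = "Basically we did this in order to save time".split()
print(keyword_hits(words, keywords))  # [(0, 1), (4, 3)]
```

The same sketch applies to every check below that uses this keyword rule; only the keyword file changes.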

Nominalizations

Let $E$ be the set of suffixes taken from the file src/resources/normalizations.json.

The nominalization check hits for a word $w \in W$ if

$\exists a \in \mathbb{N}: w[a:|w|] \in E \land |w| > 7$

In text: The rule checks whether the word $w$ ends with a suffix contained in $E$ and whether its length is greater than seven.
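As a rough sketch of this rule, assuming a small hypothetical sample of suffixes in place of the real JSON file (the function name `is_nominalization` is also an assumption):

```python
# Hypothetical sample of suffixes; the real ones are in src/resources/normalizations.json.
SUFFIXES = ("tion", "ment", "ence", "ance")


def is_nominalization(word, suffixes=SUFFIXES, min_length=7):
    """True if the word ends with a listed suffix and is longer than min_length."""
    return len(word) > min_length and word.endswith(tuple(suffixes))


for word in ["implementation", "station", "quickly"]:
    print(word, is_nominalization(word))
```

Here "station" ends with a listed suffix but has exactly seven letters, so it does not hit.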

Sentences

Let $K$ be the set of keywords as defined in file src/resources/sentences.json

The sentences check hits if for a word $w \in W$

$|s(w)| > 50 \lor s(w)[0] \in K$

Here $|s(w)|$ denotes the number of words in the sentence.
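The sentence rule can be sketched as follows. The sentence splitting on ".", "!" and "?" is an assumption made for this example, as are the function name `sentence_hits` and the sample keywords; the real keywords are in src/resources/sentences.json.

```python
import re

# Hypothetical sample; the real keywords are in src/resources/sentences.json.
STARTERS = {"And", "But", "So"}


def sentence_hits(text, starters=STARTERS, max_words=50):
    """Return the sentences that are too long or begin with a listed keyword."""
    # Assumption: sentences end at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    hits = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) > max_words or words[0] in starters:
            hits.append(sentence)
    return hits


text = "We ran the test. But it failed. " + "word " * 51 + "end."
print(len(sentence_hits(text)))  # 2
```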

Transitions

Let $K$ be the set of keywords from the file src/resources/transitions.json

The transition check hits for a word $w \in W$ if

$\exists k \in K: w = k \lor w = upper(k)$

In text: We check whether one of the elements in $K$ occurs in the text. We also match the variant in which its first letter is capitalized.

Academic

Let $K$ be the set of keywords from the file src/resources/academic.json

The academic check hits for a word $w \in W$ if

$\exists k \in K: w = k \lor w = upper(k)$

In text: We check whether one of the elements in $K$ occurs in the text. We also match the variant in which its first letter is capitalized.

Grammar

Let $K$ be the set of keywords from the file src/resources/grammar.json

The grammar check hits for a word $w \in W$ if

$\exists k \in K: w = k \lor w = upper(k)$

In text: We check whether one of the elements in $K$ occurs in the text. We also match the variant in which its first letter is capitalized.

Eggcorns

Let $K$ be the set of keywords from the file src/resources/eggcorns.json

The eggcorns check hits for a word $w \in W$ if

$\exists k \in K: w = k \lor w = upper(k)$

In text: We check whether one of the elements in $K$ occurs in the text. We also match the variant in which its first letter is capitalized.

Parsing the Text

We use two variants to work with the text. In the first variant, we check word by word: the text is split at the following characters " ,.!?:-\n'")({}". That means words are delimited by these characters, and each run of text between them is identified as a single word.
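A minimal sketch of such a word-wise split that also keeps track of offsets might look like this. The function name `split_words` and the returned triple format are assumptions for illustration, not the package's API.

```python
# The delimiter characters listed above.
DELIMITERS = set(" ,.!?:-\n'\")({}")


def split_words(text, delimiters=DELIMITERS):
    """Return (word, start, end) triples; end is exclusive."""
    words = []
    start = None  # start offset of the word currently being read
    for position, char in enumerate(text):
        if char in delimiters:
            if start is not None:
                words.append((text[start:position], start, position))
                start = None
        elif start is None:
            start = position
    if start is not None:
        # The text ended in the middle of a word.
        words.append((text[start:], start, len(text)))
    return words


print(split_words("Well-known (or not)!"))
# [('Well', 0, 4), ('known', 5, 10), ('or', 12, 14), ('not', 15, 18)]
```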

The second variant is based on regular expressions, in the hope that this is more efficient for certain operations.
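As an illustration of the regex-based variant, the sketch below compiles all keywords into a single pattern and reports character offsets. The function names, the sample keywords, and the use of word boundaries (`\b`) are assumptions; the package's actual patterns may differ.

```python
import re


def compile_keywords(keywords):
    """Build one pattern matching any keyword, plain or with a capitalized first letter."""
    variants = set()
    for keyword in keywords:
        variants.add(keyword)
        variants.add(keyword[:1].upper() + keyword[1:])
    # Longest variants first so that longer phrases win over their prefixes.
    ordered = sorted(variants, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b")


def find_keywords(text, pattern):
    """Return (start, end, match) for every occurrence; end is exclusive."""
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


# Hypothetical keywords for illustration only.
pattern = compile_keywords(["however", "in fact"])
print(find_keywords("However, it is in fact true.", pattern))
# [(0, 7, 'However'), (15, 22, 'in fact')]
```

Compiling the pattern once and reusing it across texts avoids rebuilding the alternation for every call.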

Development

Build

python3 -m build

Run tests

python3 -m unittest tests.test_grammark.TestGrammark
