An ASD-STE100 (Simplified Technical English) parser.
biz.dfch.AsdSte100Parser
This library implements:
- An EBNF grammar for Lark (with the Earley parser)
- A multi-pass transformer
The input text must be Markdown with a special structure.
Format
- There are top-level tokens. These tokens must be at the top-most hierarchical level of the text.
- There are tokens that can only appear inside other tokens.
- A text must end with two `NEWLINE` tokens.
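A minimal sketch, assuming plain Python strings as input: the hypothetical helper `ensure_two_newlines` (it is not part of this library) makes sure that a text ends with exactly two `NEWLINE` tokens before you give it to the parser.

```python
def ensure_two_newlines(text: str) -> str:
    """Return text that ends with exactly two NEWLINE tokens.

    Hypothetical helper, not part of biz.dfch.AsdSte100Parser.
    """
    # Remove all trailing CR and LF characters (both NEWLINE forms),
    # then add exactly two LF characters.
    return text.rstrip("\r\n") + "\n\n"


document = "# Heading level 1\n\nThis is a paragraph."
print(repr(ensure_two_newlines(document)))  # '# Heading level 1\n\nThis is a paragraph.\n\n'
```

Only the end of the text changes. A `\r\n` inside the text stays as it is.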
Whitespace (WS)
- Whitespace is a sequence of `\t` or space (` `) tokens. One `\t` is the same as eight space tokens.
This is TEXT with whitespace.
This is  TEXT  with   multiple    whitespace.
And\tthis\tis\talso\ttext\twith\twhitespace.
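The rule that one `\t` is the same as eight space tokens is important when you compare indentation. This sketch assumes a fixed width of eight for each tab (not tab stops, as in `str.expandtabs`). The names `ws_width` and `indentation` are hypothetical.

```python
import re

TAB_WIDTH = 8  # assumption: one "\t" always counts as eight spaces
WS_RE = re.compile(r"[ \t]+")  # WS: one or more spaces or tabs


def ws_width(ws: str) -> int:
    """Return the width of a WS token, counted in spaces."""
    if not WS_RE.fullmatch(ws):
        raise ValueError(f"not a WS token: {ws!r}")
    return ws.count(" ") + TAB_WIDTH * ws.count("\t")


def indentation(line: str) -> int:
    """Return the width of the WS at the start of a line (0 if there is none)."""
    match = WS_RE.match(line)
    return ws_width(match.group()) if match else 0


print(indentation("\t* list item"), indentation("  * list item"))  # 8 2
```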
Single space (SPACE)
- A `SPACE` is a delimiter token that only occurs inside other tokens.
- For example, in `1) Text`, the `SPACE` is the delimiter after `1)`.
1) A work step.
NEWLINE
- A `NEWLINE` is a top-level token.
- A `NEWLINE` is `\r\n` or `\n`.
TEXT
- `TEXT` is any character sequence that does not contain these characters: `"`, `'`, `*`, `_`, `(`, `)`, `` ` `` (backtick) or whitespace (`\s`). As a regex: `` [^"'*_()\s`]+ ``.
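You can approximate the `NEWLINE`, `WS` and `TEXT` terminals with Python regular expressions, for example to see how a line splits into tokens. This is a sketch, not the library's Lark grammar. The `OTHER` token type and the `tokenize` function are hypothetical, and the sketch assumes that the backtick is also excluded from `TEXT`.

```python
import re

# Hypothetical approximations of the terminals; the library's own Lark
# grammar can define them differently. The order of the alternatives matters.
TOKEN_PATTERNS = [
    ("NEWLINE", r"\r\n|\n"),
    ("WS", r"[ \t]+"),
    ("TEXT", r"[^\"'*_()\s`]+"),  # assumption: the backtick is excluded too
    ("OTHER", r"."),              # any other single character, e.g. "'" or "*"
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS),
    re.DOTALL,
)


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (token_type, value) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


print(tokenize("It's TEXT.\n"))
```

The `'` in `It's` becomes a separate `OTHER` token here. A real grammar must decide if it is an `APOSTROPHE` or the start of an `squote`.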
APOSTROPHE
- An `APOSTROPHE` is either `'s` or `'` when it comes directly after `TEXT`.
- You must not put an `APOSTROPHE` in an `squote`.
Heading
- A `heading` is a top-level token.
- A `heading` starts after a `NEWLINE` with one `#` (or multiple `#`), followed by one or more `TEXT` tokens.
- Two `NEWLINE` tokens stop a `heading`.
# Heading level 1
## Heading level 2
### Heading level 3
#### Heading level 4
##### Heading level 5
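A heading line can be recognized with a regular expression. This sketch assumes that exactly one space comes after the `#` characters, as in the examples above. The helper `parse_heading` is hypothetical.

```python
import re

# Assumption: one or more "#", one SPACE, then the heading text.
HEADING_RE = re.compile(r"^(#+) (\S.*)$")


def parse_heading(line: str):
    """Return (level, text) for a heading line, or None for other lines."""
    match = HEADING_RE.match(line)
    if match is None:
        return None
    # The level is the number of "#" characters.
    return len(match.group(1)), match.group(2)


for line in ["# Heading level 1", "### Heading level 3", "This is TEXT."]:
    print(parse_heading(line))
```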
Paragraph
- A `paragraph` is a top-level token.
- A `paragraph` starts after a `NEWLINE` when `TEXT` comes directly after the `NEWLINE` token.
- Two `NEWLINE` tokens stop a `paragraph`.
- A `paragraph` can have a `NEWLINE` token between `TEXT` tokens.
This is a paragraph. This is still the paragraph.

This is another paragraph. This is still the second paragraph.
This is still the second paragraph (after a LINEBREAK).

This is a new and the last paragraph.
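The paragraph rules (two `NEWLINE` tokens stop a paragraph, one `NEWLINE` does not) can be shown with `re.split`. This is a simplified sketch: `split_paragraphs` is a hypothetical helper, and it does not recognize headings, lists or other top-level tokens.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split text at two (or more) NEWLINE tokens.

    A single NEWLINE stays inside the paragraph.
    """
    blocks = re.split(r"(?:\r?\n){2,}", text)
    return [block for block in blocks if block.strip()]  # drop empty blocks


text = (
    "This is a paragraph. This is still the paragraph.\n\n"
    "This is another paragraph. This is still the second paragraph.\n"
    "This is still the second paragraph (after a LINEBREAK).\n\n"
    "This is a new and the last paragraph.\n\n"
)
for paragraph in split_paragraphs(text):
    print(repr(paragraph))
```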
Procedure (list of work steps)
- A `procedure` is a top-level token.
- A `procedure` is one or more work steps (`proc_item`).
- A `procedure` starts after a `NEWLINE` token when `[a-zA-Z0-9]+` (`proc_marker`) and `[.)]` (`PROC_DELIMITER`) come directly after the `NEWLINE` token.
- A `proc_item` can contain a vertical list.
- A `proc_item` can contain a `NOTE` or a safety instruction (`WARNING`, `CAUTION`).
- In contrast to other Markdown, you do not need two `NEWLINE` tokens to stop the vertical list, the `NOTE` or the safety instruction. A single `NEWLINE` stops one of these.
1. This is the first work step.
2. This is the second work step.
  * This is a list item in a work step.
  * Another list item in a work step.
3. This is the third work step.
NOTE: This is a note for the work step.
4. This is the fourth work step.
WARNING: This is a safety instruction for this work step of the type 'WARNING'.
5. This is the fifth work step.
CAUTION: This is a safety instruction for this work step of the type 'CAUTION'.
6. A work step can contain multiple:
  * 'NOTE'
  * 'WARNING'
  * 'CAUTION'.
NOTE: This is a note for the work step.
WARNING: This is a safety instruction for this work step of the type 'WARNING'.
CAUTION: This is a safety instruction for this work step of the type 'CAUTION'.
7. This is the last work step.
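The procedure rules above can be sketched as a line classifier. The patterns are assumptions based on this description (for example, that `NOTE:`, `WARNING:` and `CAUTION:` start a line), and `classify_procedure_line` is a hypothetical helper, not the library's transformer.

```python
import re

# Hypothetical line patterns (assumptions, not the library's grammar).
PROC_ITEM_RE = re.compile(r"^([a-zA-Z0-9]+)([.)]) (.+)$")        # proc_marker, PROC_DELIMITER, SPACE
NOTICE_RE = re.compile(r"^[ \t]*(NOTE|WARNING|CAUTION): (.+)$")  # note or safety instruction
LIST_ITEM_RE = re.compile(r"^[ \t]+(\S+) (.+)$")                 # WS+, list_marker, SPACE


def classify_procedure_line(line: str):
    """Return (kind, details) for one line of a procedure."""
    if m := PROC_ITEM_RE.match(line):
        return "proc_item", (m.group(1), m.group(2), m.group(3))
    # Test NOTICE_RE before LIST_ITEM_RE: an indented "NOTE: x" also has WS.
    if m := NOTICE_RE.match(line):
        return m.group(1), m.group(2)
    if m := LIST_ITEM_RE.match(line):
        return "list_item", (m.group(1), m.group(2))
    return "other", line


for line in ["1. This is the first work step.", "  * A list item.", "NOTE: A note."]:
    print(classify_procedure_line(line))
```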
Vertical list (list_item)
- A vertical list can occur in a `paragraph` or in a procedure (`proc_item`).
- A vertical list is one or more `list_item` tokens.
- A `NEWLINE` starts a `list_item` when `WS+`, a `list_marker` and a `SPACE` come directly after the `NEWLINE` token.
- Before the first `list_item`, there is `TEXT` that has a `:` as the last token.
- A numeric `list_marker` cannot contain a `.` or a `)`. A `.` or a `)` after the marker is only correct for a `proc_item`.
- There is `WS` (indentation) before a `list_marker`.
- You must not put a vertical list inside another vertical list.
This is a paragraph that starts a list:
  * Indented list item with "*" as the list marker
  * Another list item.

This is another paragraph that starts a list:
    * More indented list item with "*" as the list marker
    * Another list item.

This is a paragraph that starts a list:
  1 Indented list item with a number as the list marker
  2 Another list item.

This is a paragraph that starts a list:
  a Indented list item with a lower alpha as the list marker
  b Another list item.

This is a paragraph that starts a list:
  A Indented list item with an upper alpha as the list marker
  B Another list item.
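The following sketch checks some of the vertical-list rules for a list in a paragraph. It is not the library's implementation: `parse_vertical_list` is hypothetical, and it only checks the `:` at the end of the `TEXT`, the `WS` indentation and the numeric `list_marker`.

```python
import re

LIST_ITEM_RE = re.compile(r"^[ \t]+(\S+) (.+)$")   # WS+, list_marker, SPACE, TEXT
BAD_NUMERIC_MARKER_RE = re.compile(r"[0-9]+[.)]")  # only correct for a proc_item


def parse_vertical_list(lines: list[str]) -> list[tuple[str, str]]:
    """Return (list_marker, text) pairs; raise ValueError if a rule is broken.

    lines[0] is the TEXT before the list, lines[1:] are the list items.
    """
    lead_in, *items = lines
    if not lead_in.rstrip().endswith(":"):
        raise ValueError("the TEXT before a vertical list must end with ':'")
    result = []
    for line in items:
        match = LIST_ITEM_RE.match(line)
        if match is None:
            raise ValueError(f"not a list_item (no WS indentation?): {line!r}")
        marker, text = match.groups()
        if BAD_NUMERIC_MARKER_RE.fullmatch(marker):
            raise ValueError(f"numeric list_marker with '.' or ')': {marker!r}")
        result.append((marker, text))
    return result


print(parse_vertical_list([
    "This is a paragraph that starts a list:",
    "  1 First list item.",
    "  2 Another list item.",
]))
```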
Quote and cite
Double quote (dquote)
- This formatter shows text in "double quotes" (`dquote`).
- This token cannot contain `NEWLINE`.
- You must not nest `dquote`. A `dquote` can contain an `squote`. An `squote` can contain formatters.
"this is text in double quote"
Single quote (squote)
- This formatter shows text in 'single quotes' (`squote`).
- This token cannot contain `NEWLINE`.
- You must not nest `squote`. An `squote` can contain a `dquote`. An `squote` can contain formatters.
'this is text in single quote'
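The nesting rules for `dquote` and `squote` can be sketched with a stack. This sketch assumes that the text contains no `APOSTROPHE`, so that every `'` is an `squote` delimiter. The helper `check_quotes` is hypothetical. Because the same character opens and closes a quote, the sketch can only find a nested quote of the same kind when a quote of the other kind is between them.

```python
QUOTE_KINDS = {'"': "dquote", "'": "squote"}


def check_quotes(text: str) -> list[tuple[str, int]]:
    """Return (quote_kind, depth) for each quote that opens, in order.

    Assumption: the text has no APOSTROPHE, so each "'" is an squote delimiter.
    Raise ValueError if a quote contains NEWLINE, is nested in a quote of the
    same kind, or is not closed.
    """
    stack = []   # open quote kinds, innermost last
    opened = []
    for ch in text:
        if ch in "\r\n" and stack:
            raise ValueError(f"{stack[-1]} cannot contain NEWLINE")
        kind = QUOTE_KINDS.get(ch)
        if kind is None:
            continue
        if stack and stack[-1] == kind:
            stack.pop()  # this character closes the innermost quote
        elif kind in stack:
            raise ValueError(f"you must not nest {kind}")
        else:
            opened.append((kind, len(stack)))
            stack.append(kind)
    if stack:
        raise ValueError(f"{stack[-1]} is not closed")
    return opened


print(check_quotes("we have \"text in 'double' quotes\" here"))
```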
Citation (cite)
- A `cite` is a top-level token.
- This formatter shows text as a citation (`cite`).
- A `NEWLINE` starts a `cite` when a `>` comes directly after the `NEWLINE` token.
- A `cite` must not be empty. It must contain `TEXT` or `WS`.
- This token cannot contain `NEWLINE`.
- You must not nest `cite`.
> This is a citation line.
> This is another citation line.
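A `cite` line can be checked with a short function. This sketch assumes that `>` is the first character of the line and that the rest of the line is the content of the `cite`. The helper `parse_cite` is hypothetical.

```python
import re

# Assumption: ">" is the first character of the line, and the rest of the
# line (TEXT or WS, at least one character, no NEWLINE) is the content.
CITE_RE = re.compile(r"^>([^\r\n]+)$")


def parse_cite(line: str):
    """Return the content of a cite line, or None if the line is not a cite."""
    match = CITE_RE.match(line)
    if match is None:
        return None  # no ">" at the start, or an empty cite
    content = match.group(1)
    if content.lstrip().startswith(">"):
        raise ValueError("you must not nest cite")
    return content


print(repr(parse_cite("> This is a citation line.")))
```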
Formatters
Bold
- This formatter shows text in bold (`bold`).
- This token cannot contain `NEWLINE`.
*this is text in bold*
Emphasis
- This formatter shows text with emphasis (`emph`).
- This token cannot contain `NEWLINE`.
_this is text in emphasis_
Bold emphasis
- This formatter shows text in bold emphasis (`boldemph`).
- This token cannot contain `NEWLINE`.
*_this is text in bold emphasis_*
Code
- This formatter shows text in monospace (`code`).
- This token can contain `NEWLINE`.
`this is text in monospace`
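The four formatters can be approximated with one regular expression. This sketch is not the library's grammar, and `find_formatters` is a hypothetical helper. The order of the alternatives is important: `*_` must be tried before `*`, and only `code` can contain `NEWLINE`.

```python
import re

# Hypothetical approximation of the formatters (not the library's grammar).
FORMATTER_RE = re.compile(
    r"`(?P<code>[^`]+)`"                 # code: can contain NEWLINE
    r"|\*_(?P<boldemph>[^*_\r\n]+)_\*"   # bold emphasis: tried before bold
    r"|\*(?P<bold>[^*\r\n]+)\*"          # bold
    r"|_(?P<emph>[^_\r\n]+)_"            # emphasis
)


def find_formatters(text: str) -> list[tuple[str, str]]:
    """Return (formatter, content) pairs in the order they occur in text."""
    return [(m.lastgroup, m.group(m.lastgroup)) for m in FORMATTER_RE.finditer(text)]


print(find_formatters("In *this* paragraph we have _that_, *_both_* and `code`."))
```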
Examples:
You can find more examples in `./test/test_data/`.
Heading with paragraph
# This is a heading level 1

This is the start of a paragraph. And this is the end of the paragraph.

This is a new paragraph. A paragraph continues after a single NEWLINE.
This is still the same paragraph.
Paragraph with vertical lists
# This is a heading level 1

This is the start of a paragraph. This will start a new vertical list:
  * Note that the list delimiter '*' is indented by a minimum of one `WS`.
  * The next list item.
This continues the paragraph. This is not standard 'GitHub'-flavored Markdown.

This is a new paragraph. This will start a new vertical list:
  - This is another list delimiter.
  - Another list item.

This is a new paragraph. This will start a new vertical list:
  1 This is another list delimiter.
  2 Another list item.

This is another paragraph.
Paragraph with formatters, quotes and cite
# This is a heading level 1

## Text in quotes

This is a paragraph. In *this* paragraph we have "text in double quotes".

> Here is a citation. This is similar to a full line in "double quotes".

This is another paragraph. In _that_ paragraph we have 'text in single quotes'.

At last, this is another paragraph. In *_that_* paragraph we have "text in 'double' quotes" that contains "'single' quotes".
File details
Details for the file biz_dfch_ste100parser-0.1.3.tar.gz.
File metadata
- Download URL: biz_dfch_ste100parser-0.1.3.tar.gz
- Upload date:
- Size: 31.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 514787d4bc764f4139fa95f43ffb7afa4daecf19ad3c6a7d018ca879480b97f8 |
| MD5 | f30657e3b971774c03fa18d1c58ba227 |
| BLAKE2b-256 | 57ea7c3fa30d23c77d58205a0187b653eedca58bb084c096918ca73759502432 |
File details
Details for the file biz_dfch_ste100parser-0.1.3-py3-none-any.whl.
File metadata
- Download URL: biz_dfch_ste100parser-0.1.3-py3-none-any.whl
- Upload date:
- Size: 37.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 950f3ddbc3e8bdd07bc884fd72f73ef660ae77f8970961b855d2fa89936e078c |
| MD5 | c9ebcf8eb2ad7b1222539383eb9f6ba7 |
| BLAKE2b-256 | cb948f9d29113b67e75c3daf5ba6cc8c8f0e4800fd27faa07be84922a0f0f9f0 |