degrotesque — A web type setter.

Introduction

degrotesque beautifies the web.

degrotesque is a Python script. It loads a text/markdown/HTML/XML file from the disk (or several in batch) and, in each, replaces some commonly used non-typographic characters, such as ", ', and -, by their typographic counterparts to improve the pages' appearance.

E.g.:

"Well - that's not what I had expected."

will become:

“Well — that’s not what I had expected.”

I think it looks much better.

The starting and ending quotes have been replaced by “ and ”, respectively, the ' by ’, and the - by an em dash (—). Of course, the script skips HTML elements. It keeps the complete format as-is and replaces characters by their proper HTML entity name or the respective Unicode character.
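To illustrate what such replacements involve, here is a deliberately naive sketch with three regular-expression rules that reproduce the example above. It is not degrotesque's implementation: the function name naive_typeset is hypothetical, and the real action sets are more elaborate and also take HTML markup into account.

```python
import re


def naive_typeset(text):
    """Replace ASCII quotes, apostrophes and spaced hyphens by typographic characters."""
    # an apostrophe between two word characters, as in "that's"
    text = re.sub(r"(?<=\w)'(?=\w)", "\u2019", text)
    # a hyphen surrounded by spaces is treated as a dash
    text = text.replace(" - ", " \u2014 ")
    # a double quote at the start or after whitespace opens a quotation ...
    text = re.sub(r'(?:^|(?<=\s))"', "\u201c", text)
    # ... and every remaining double quote closes one
    text = text.replace('"', "\u201d")
    return text


print(naive_typeset('"Well - that\'s not what I had expected."'))
# “Well — that’s not what I had expected.”
```

The opening-quote rule only looks at the preceding character, so a quote following "(" would be taken as a closing quote; such corner cases are why a dedicated tool is worthwhile.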

It is meant to be a relatively reliable post-processing step for web pages before releasing them. Support for markdown files was added in version 3.0.0.

Background

I often write my texts and web pages using a plain editor. As such, the character " is always used for quotes, a dash is always a minus, etc.

I wanted a tool that automatically recognizes which characters should be replaced by their typographic counterparts and applies the corresponding rules.

I think it’s a pity that, while major desktop publishing applications do this on the fly, many web sites, even major ones, still show us plain ASCII characters.

degrotesque does the job pretty well. After I have written / built my pages, the tool converts them into a prettier and typographically more correct form. The structure and format of the pages are completely preserved. And, as said, it works reliably.

If you need any consulting, please let me know. If you know better, please let me know, too.

Download and Installation

The current version is degrotesque-3.0.0.

You may install degrotesque using

python -m pip install degrotesque

You may download a copy or fork the code at degrotesque's github page.

Besides, you may download the current release here:

License

degrotesque is licensed under the BSD license.

Documentation

Usage

degrotesque is implemented in Python. It is started on the command line.

The option -i <PATH> / --input <PATH> tells the script which file(s) shall be read; you may name a file or a folder here. If the option -r / --recursive is set, the given folder is processed recursively.

The tool processes plain text files, markdown files, HTML files, XML files, and their derivatives. By default, all files are processed when -i points to a folder. You may limit the files to process by their extension using the -e <EXTENSION>[,<EXTENSION>]* / --extensions <EXTENSION>[,<EXTENSION>]* option. The files are assumed to be encoded using UTF-8 by default. You may change the encoding using the option -E <ENCODING> / --encoding <ENCODING>.
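How -i, -r and -e interact could be sketched as follows. collect_files is a hypothetical helper written for illustration, not degrotesque's code; it assumes that extensions are matched case-insensitively and may be given with or without a leading dot.

```python
import os


def collect_files(path, recursive=False, extensions=None):
    """Return the files to process for an -i/--input value (hypothetical helper)."""
    if os.path.isfile(path):
        return [path]
    wanted = None
    if extensions:
        # "html,.md" -> {".html", ".md"}; matching is case-insensitive
        wanted = {"." + e.strip().lstrip(".").lower() for e in extensions.split(",") if e.strip()}
    files = []
    for root, dirs, names in os.walk(path):
        dirs.sort()  # visit subfolders in a stable order
        for name in sorted(names):
            if wanted is None or os.path.splitext(name)[1].lower() in wanted:
                files.append(os.path.join(root, name))
        if not recursive:
            break  # only the top-level folder was requested
    return files
```

Without -r, the walk stops after the top-level folder; with -e, files are filtered by their extension only, so the file type detection described below still decides how each file is parsed.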

The files are read one by one, and the replacement of plain ASCII characters by nicer ones is based on a chosen set of “actions”. Known and default actions are given in Appendix A. You may select the actions to apply using the -a <ACTION_NAME>[,<ACTION_NAME>]* / --actions <ACTION_NAME>[,<ACTION_NAME>]* option. The default actions are ‘masks’, ‘quotes.english’, ‘dashes’, ‘ellipsis’, ‘math’, ‘apostrophe’, and ‘commercial’.
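The idea of named, selectable action sets can be sketched with a small lookup table. The rules below (ACTIONS, select_actions, apply_actions) are simplified stand-ins chosen for illustration, not the definitions from Appendix A, and the real tool does more than plain string replacement.

```python
# Illustrative rules only; these are NOT degrotesque's actual action definitions.
ACTIONS = {
    "dashes": [(" - ", " \u2014 ")],
    "ellipsis": [("...", "\u2026")],
    "commercial": [("(c)", "\u00a9"), ("(r)", "\u00ae"), ("(tm)", "\u2122")],
}


def select_actions(names):
    """Turn an -a/--actions value such as "dashes,ellipsis" into a list of rules."""
    rules = []
    for name in (n.strip() for n in names.split(",")):
        if name not in ACTIONS:
            raise ValueError(f"unknown action: {name!r}")
        rules.extend(ACTIONS[name])
    return rules


def apply_actions(text, rules):
    """Apply the rules in order as plain string replacements."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


print(apply_actions("Wait... (c) 2023 - all rights reserved", select_actions("ellipsis,commercial")))
# Wait… © 2023 - all rights reserved
```

Because ‘dashes’ was not selected, the spaced hyphen is left alone.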

By default, numeric Unicode entities are inserted (e.g. ‘&#8212;’ for an ‘—’). You may change this using the option --format <FORMAT> / -f <FORMAT>. The following formats are currently supported:

‘unicode’: uses numeric entities (e.g. ‘&#8212;’ for an ‘—’);
‘html’: uses named entities (e.g. ‘&mdash;’ for an ‘—’);
‘text’: uses plain (utf-8) characters (e.g. ‘—’ for an ‘—’).
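The three formats differ only in how a replacement character is written into the file. A minimal sketch using Python's standard html.entities table, assuming that ‘html’ means named entities such as &mdash; (as the &ldquo; output in the API example below suggests); the helper name encode_char and the numeric fallback for characters without an HTML name are assumptions of this sketch.

```python
from html.entities import codepoint2name


def encode_char(char, fmt="unicode"):
    """Encode one typographic character in one of the three --format styles (sketch)."""
    code = ord(char)
    if fmt == "unicode":
        return f"&#{code};"  # numeric character reference
    if fmt == "html":
        name = codepoint2name.get(code)
        # fall back to a numeric reference if HTML defines no name for the character
        return f"&{name};" if name else f"&#{code};"
    if fmt == "text":
        return char  # the plain character, written in the file's encoding
    raise ValueError(f"unknown format: {fmt!r}")


for fmt in ("unicode", "html", "text"):
    print(fmt, encode_char("\u2014", fmt))
# unicode &#8212;
# html &mdash;
# text —
```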

degrotesque tries to determine whether the read files are plain text files, markdown files, or XML or HTML derivatives using the files' extensions and contents. Appendix B lists the extensions by which files are recognized as HTML / markdown files. To be safe, you may set --html / -H when processing HTML files, --markdown / -M when processing markdown files, or --text / -T when processing plain text files.
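A rough sketch of such a detection, first by extension and then by content. The extension sets and the content heuristic (a doctype or a closing tag) are illustrative assumptions; the lists degrotesque actually uses are those in Appendix B, and guess_type is a hypothetical name.

```python
import os
import re

# Illustrative extension sets; the lists degrotesque uses are given in Appendix B.
MARKUP_EXTENSIONS = {".html", ".htm", ".xhtml", ".xml", ".php", ".svg"}
MARKDOWN_EXTENSIONS = {".md", ".markdown"}
# A doctype or any closing tag is taken as a hint that the file contains markup.
MARKUP_HINT = re.compile(r"<!doctype|</[A-Za-z][\w:-]*\s*>", re.IGNORECASE)


def guess_type(filename, content):
    """Return "html", "markdown" or "text" for a file (hypothetical heuristic)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in MARKUP_EXTENSIONS:
        return "html"  # HTML and XML derivatives are handled alike
    if ext in MARKDOWN_EXTENSIONS:
        return "markdown"
    return "html" if MARKUP_HINT.search(content) else "text"
```

The explicit --html, --markdown and --text options would simply bypass such a guess.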

When parsing XML/HTML files, the script does not change quotation marks within the tags themselves (e.g. around attribute values), of course. Also, the contents of several elements, such as <code> or <pre>, are skipped. You may change the list of elements whose contents shall not be processed using the option -s <ELEMENT_NAME>[,<ELEMENT_NAME>]* / --skip <ELEMENT_NAME>[,<ELEMENT_NAME>]*. The list of elements that are skipped by default is given in Appendix C.
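The principle of leaving tags untouched and skipping the contents of selected elements can be sketched with a simple tokenizer. This is an illustration under assumptions: prettify_markup, DEFAULT_SKIP and curly_quotes are hypothetical names, the skip set is illustrative, and a regular expression is a much weaker tool than a real HTML parser. It does, however, reproduce the HTML case of the API example further below.

```python
import re

# Illustrative skip list; degrotesque's defaults are listed in Appendix C.
DEFAULT_SKIP = {"code", "pre", "script", "style"}
# A comment, or a tag whose name starts with a letter (so "i<0" stays text).
TAG = re.compile(r"(<!--.*?-->|</?[A-Za-z][^>]*>)", re.S)


def prettify_markup(doc, replace, skip=DEFAULT_SKIP):
    """Apply replace() to text outside of tags and outside of skipped elements."""
    out, open_skipped = [], []
    for i, part in enumerate(TAG.split(doc)):
        if i % 2 == 1:  # odd indices hold the captured tags and comments
            m = re.match(r"<(/?)([A-Za-z][\w:-]*)", part)
            name = m.group(2).lower() if m else None
            if name in skip:
                if m.group(1):  # closing tag of a skipped element
                    if open_skipped:
                        open_skipped.pop()
                elif not part.endswith("/>"):  # opening tag, not self-closing
                    open_skipped.append(name)
            out.append(part)  # tags, including attribute values, stay as-is
        elif open_skipped:
            out.append(part)  # text inside e.g. <code> or <script>
        else:
            out.append(replace(part))
    return "".join(out)


def curly_quotes(text):
    """Toy replacement for the demonstration: each pair of " becomes “ and ”."""
    return re.sub(r'"([^"]*)"', lambda m: "\u201c" + m.group(1) + "\u201d", text)


print(prettify_markup(' <script> if(i<0) echo "a"</script> "Hello World" ', curly_quotes))
```

Only tags whose name starts with a letter are treated as markup, which is why the i<0 inside the script is not mistaken for a tag.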

When parsing markdown files, code (both indented code and code marked using backticks, `) is skipped, and so are quotes.
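A line-based sketch of these markdown rules, assuming that ‘quotes’ refers to block quotes (lines starting with >). It is a simplification: prettify_markdown is a hypothetical name, and details of the CommonMark rules (e.g. that an indented code block cannot interrupt a paragraph) are ignored.

```python
import re

FENCES = ("`" * 3, "~" * 3)  # opening/closing markers of fenced code blocks
INLINE_CODE = re.compile(r"(`+[^`]*`+)")  # inline code spans, captured so they can be kept


def prettify_markdown(doc, replace):
    """Apply replace() to markdown text, leaving code and block quotes untouched."""
    out, in_fence = [], False
    for line in doc.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith(FENCES):
            in_fence = not in_fence  # opening or closing fence line
            out.append(line)
        elif in_fence or line.startswith(("    ", "\t")) or stripped.startswith(">"):
            out.append(line)  # fenced code, indented code, or a block quote
        else:
            parts = INLINE_CODE.split(line)
            # even indices are ordinary text, odd indices are captured code spans
            out.append("".join(p if i % 2 else replace(p) for i, p in enumerate(parts)))
    return "".join(out)
```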

After the actions have been applied to its contents, the file is saved. By default, a backup of the original file is saved under the same name with the suffix “.orig” appended. You may suppress the creation of these backup files using the option -B / --no-backup.
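The save-with-backup step could look like this. save_with_backup is a hypothetical helper, and overwriting an already existing .orig file is an assumption of this sketch, not documented behavior.

```python
import shutil
from pathlib import Path


def save_with_backup(path, new_text, backup=True, encoding="utf-8"):
    """Write new_text to path, first copying the original to <name>.orig."""
    path = Path(path)
    if backup:
        # e.g. index.html -> index.html.orig (an existing .orig is overwritten)
        shutil.copyfile(path, path.with_name(path.name + ".orig"))
    path.write_text(new_text, encoding=encoding)
```

Passing backup=False corresponds to -B / --no-backup.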

The option --help / -h prints a help screen. The option --version prints degrotesque's version number.

Please note that “masks” is a special action set that prevents some other actions from being applied so that, e.g., the hyphens in ISBNs are not replaced by –. The masks action set is given in Appendix D.
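Masking can be pictured as hiding matching spans behind placeholders before the other actions run and restoring them afterwards. The ISBN pattern, the placeholder scheme and the names apply_with_masks and en_dash_ranges are hypothetical; the actual mask definitions are those given in Appendix D.

```python
import re

# Hypothetical mask for ISBNs such as "ISBN 978-3-16-148410-0".
ISBN = re.compile(r"ISBN(?:-1[03])?:?\s*[\d-]{10,17}X?")


def apply_with_masks(text, replace, masks=(ISBN,)):
    """Hide masked spans behind placeholders, run replace(), then restore them."""
    protected = []

    def hide(match):
        protected.append(match.group(0))
        return f"\x00{len(protected) - 1}\x00"  # assumes the text contains no NUL characters

    for mask in masks:
        text = mask.sub(hide, text)
    text = replace(text)
    return re.sub(r"\x00(\d+)\x00", lambda m: protected[int(m.group(1))], text)


def en_dash_ranges(text):
    """Toy 'dashes' rule: a hyphen between two digits becomes an en dash."""
    return re.sub(r"(?<=\d)-(?=\d)", "\u2013", text)


print(apply_with_masks("pages 10-20, ISBN 978-3-16-148410-0", en_dash_ranges))
# pages 10–20, ISBN 978-3-16-148410-0
```

Without the mask, en_dash_ranges would also turn every hyphen of the ISBN into an en dash.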

Options

The script can be started on the command line with the following options:

--input/-i <PATH>: the file or the folder to process
--recursive/-r: Set if the folder — if given — shall be processed recursively
--extensions/-e <EXTENSION>[,<EXTENSION>]*: The extensions of files that shall be processed
--encoding/-E <ENCODING>: The assumed encoding of the files
--html/-H: Files are HTML/XML-derivatives
--text/-T: Files are plain text files
--markdown/-M: Files are markdown files
--format/-f <FORMAT>: Define the format of the replacements [‘html’, ‘unicode’, ‘text’]
--no-backup/-B: Set if no backup files shall be generated
--skip/-s <ELEMENT_NAME>[,<ELEMENT_NAME>]*: Elements whose contents shall not be changed
--actions/-a <ACTION_NAME>[,<ACTION_NAME>]*: Name the actions that shall be applied
--help: Prints the help screen
--version: Prints the version
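The option list above can be mirrored with Python's argparse, which also shows how the comma-separated values are meant to be split. The following parser is a hypothetical mirror of the documented options, not degrotesque's own code; its defaults (UTF-8, ‘unicode’, and the default action list) are taken from the description above.

```python
import argparse

# Defaults taken from the description above; this is not degrotesque's own parser.
DEFAULT_ACTIONS = "masks,quotes.english,dashes,ellipsis,math,apostrophe,commercial"


def build_parser():
    p = argparse.ArgumentParser(prog="degrotesque")
    p.add_argument("-i", "--input", help="the file or the folder to process")
    p.add_argument("-r", "--recursive", action="store_true")
    p.add_argument("-e", "--extensions", help="comma-separated extensions")
    p.add_argument("-E", "--encoding", default="utf-8")
    p.add_argument("-H", "--html", action="store_true")
    p.add_argument("-T", "--text", action="store_true")
    p.add_argument("-M", "--markdown", action="store_true")
    p.add_argument("-f", "--format", choices=["html", "unicode", "text"], default="unicode")
    p.add_argument("-B", "--no-backup", action="store_true")
    p.add_argument("-s", "--skip", help="comma-separated element names")
    p.add_argument("-a", "--actions", default=DEFAULT_ACTIONS)
    p.add_argument("--version", action="version", version="%(prog)s 3.0.0")
    return p


def split_list(value):
    """Split a comma-separated option value, ignoring empty entries."""
    return [v.strip() for v in value.split(",") if v.strip()] if value else []


args = build_parser().parse_args(["-i", "my_folder", "-r", "--no-backup"])
print(args.input, args.recursive, args.no_backup, split_list(args.actions))
```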

Usage Examples

degrotesque -i my_page.html -a quotes.german

Replaces single and double quotes within the file “my_page.html” by their typographic German counterparts.

degrotesque -i my_folder -r --no-backup

Applies the default actions to all files in the folder “my_folder” and all its subfolders. No backup files are generated. The format of each file is determined using its extension and contents.

Application Programming Interface — API

You may also embed degrotesque within your own applications. Its usage is very straightforward:

import degrotesque
# build the degrotesque instance with default values
degrotesque = degrotesque.Degrotesque()
# apply degrotesque, interpreting the string as HTML (second argument: True)
plain = ' <script> if(i<0) echo "a"</script> "Hello World" '
pretty_html = degrotesque.prettify(plain, True)
# apply degrotesque to the same string, interpreted as plain text (second argument: False)
pretty_text = degrotesque.prettify(plain, False)

The first call will deliver:

 <script> if(i<0) echo "a"</script> &ldquo;Hello World&rdquo;

while the second call, which interprets the string as plain text rather than HTML, will deliver:

 <script> if(i<0) echo &ldquo;a&rdquo;</script> &ldquo;Hello World&rdquo;

which is probably not what you want.

The default values can be changed using some of the class's methods:

# change the actions to apply (by naming them)
# here: apply french quotes and math symbols
degrotesque.setActions("quotes.french,math")
# change the elements whose contents shall be skipped
# here: skip the contents of "code",
#  "script", and "style" elements
degrotesque.setToSkip("code,script,style")

You may also consult degrotesque's pydoc code documentation.

Further Documentation

The complete documentation is located at:
- https://degrotesque.readthedocs.io/en/latest/ and
- https://krajzewicz.de/docs/degrotesque/index.html
Discussions are open at https://github.com/dkrajzew/degrotesque/discussions
The github repository is located at: https://github.com/dkrajzew/degrotesque
The issue tracker is located at: https://github.com/dkrajzew/degrotesque/issues
The PyPI page is located at: https://pypi.org/project/degrotesque/

Examples / Users

My own pages (https://www.krajzewicz.de/).
PaletteWB — a sophisticated palette editor for MS Windows.

Change Log

degrotesque-3.0.0 (26.03.2023)

Added support for degrotesquing markdown files (the contents of code and quotes are kept)
Added support for processing plain text files; whether a file is a plain text file or an HTML/XML derivative is determined using its extension (see Appendix B for the extensions used) and by evaluating its contents. Everything is replaced in text files; when a file is processed as an XML/HTML derivative, elements are skipped. Introduced the options --text / -T, --markdown / -M, and --html / -H to explicitly set the file type.
Supporting different target encodings for the replacements using the --format / -f <FORMAT> option (the option --unicode / -u was removed):
- ‘unicode’: uses numeric entities (e.g. ‘&#8212;’ for an ‘—’);
- ‘html’: uses named entities (e.g. ‘&mdash;’ for an ‘—’);
- ‘text’: uses plain (utf-8) characters (e.g. ‘—’ for an ‘—’).
100 % test coverage :-)
renamed master branch to main

degrotesque-2.0.6 (05.02.2023)

Patched documentation (return types)
Set proper formatting for readthedocs
It's not 2.0.4 due to caching by readthedocs

degrotesque-2.0.2 (04.02.2023)

Corrected installation and execution as a console script

degrotesque-2.0 (05.01.2023)

Changed the license to BSD.
Using github actions for testing on push instead of using Travis CI
Cleaned up project tree
Added mkdocs documentation

Older Versions

see ChangeLog

Summary

Well, have fun. If you have any comments / ideas / issues, please submit them to degrotesque's issue tracker on github or drop me a mail.

Release history

3.0.0 (this version): Mar 26, 2023
2.0.6: Feb 5, 2023
2.0.2: Feb 4, 2023
2.0: Jan 5, 2023
1.6: Jul 16, 2022
1.4: Jul 19, 2021
1.2: May 30, 2020
1.0: May 13, 2020
0.6: May 13, 2020
0.4: May 7, 2020
0.3: May 7, 2020

Download files

Source distribution: degrotesque-3.0.0.tar.gz (23.9 kB), uploaded Mar 26, 2023
Built distribution: degrotesque-3.0.0-py3-none-any.whl (14.6 kB, Python 3), uploaded Mar 26, 2023

Both files were uploaded via twine/3.4.1 importlib_metadata/5.2.0 pkginfo/1.7.1 requests/2.28.1 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.5 (not using Trusted Publishing).

Hashes for degrotesque-3.0.0.tar.gz
Algorithm	Hash digest
SHA256	`66344d3777876eb83bc997f23008913378bb9ca114494759344b55d4cab55c3c`
MD5	`fb7ee7120bd21e3e03baffb1153cf9a9`
BLAKE2b-256	`e380be93a02b1bdbfcaa8ce44b47001c521828d24c2492c61ae7e3ded4520c29`

Hashes for degrotesque-3.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`51193550f8cc2197267674f14c7ae37a3f7574d7623bec1182de55454428f65a`
MD5	`bb195dc6648f66a0b365ac86af24de29`
BLAKE2b-256	`f4571d6e81143f1b37bd84a95e8e9bc8d5d1fde91d1561a8f7fb47eafeec6fb8`
