
Add soft hyphens to HTML documents


shyster

This package exists because, although I can set hyphens: auto; in CSS, many browsers do a poor job of hyphenating Finnish. Even when they ship Finnish hyphenation patterns, they often fail to recognise compound words, which should preferably be broken at the compound boundary (saippua-kauppias) rather than syllable by syllable (saip-pua-kaup-pias). One solution is to set hyphens: manual; and insert soft hyphens at the acceptable break points, which is what this package does.
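
To make the idea concrete, here is a minimal sketch of what "adding a soft hyphen" means at the string level. It does not use shyster at all; the helper name join_with_soft_hyphens is made up for this illustration. The soft hyphen is the Unicode character U+00AD (written &shy; in HTML), which stays invisible unless the browser breaks the line at that point.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN, the character behind &shy; in HTML


def join_with_soft_hyphens(parts):
    """Join word parts, marking each boundary as an allowed break point."""
    return SOFT_HYPHEN.join(parts)


# Mark only the compound boundary of "saippuakauppias" as a break point.
word = join_with_soft_hyphens(["saippua", "kauppias"])

print(repr(word))                      # 'saippua\xadkauppias'
print(len(word))                       # 16: one extra, invisible character
print(word.replace(SOFT_HYPHEN, ""))   # saippuakauppias
```

With hyphens: manual; in effect, a browser may break the word only where such a character occurs.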

Install

pip install shyster

How to use

One top-level function does it all:

import shyster
shyster.hyphenate_html_file('input.html', 'output.html', 'patterns/hyphen.tex')

If more control is needed:

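# Assumption: hyphenator is importable from the top-level shyster package.
from shyster import hyphenator

hyph_fi = hyphenator('patterns/hyph-fi.tex', righthyphenmin=2)

text = 'Jukolan talo, eteläisessä Hämeessä, seisoo erään mäen pohjaisella rinteellä, liki Toukolan kylää'
# Strip the commas, split on whitespace and hyphenate each word.
[hyph_fi(word) for word in text.replace(',', '').split()]

This returns:

['Ju-ko-lan',
 'ta-lo',
 'ete-läi-ses-sä',
 'Hä-mees-sä',
 'sei-soo',
 'erään',
 'mäen',
 'poh-jai-sel-la',
 'rin-teel-lä',
 'li-ki',
 'Tou-ko-lan',
 'ky-lää']
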
A document that has already been parsed with BeautifulSoup can be hyphenated in place:

# Assumption: hyphenate_soup is importable from the top-level shyster package.
from bs4 import BeautifulSoup
from shyster import hyphenate_soup

html = """
<!doctype html><title>Seitsemän veljestä</title>
<script>var veljekset = 7;</script>
<body>
<p style="margin-top: 2em">Jukolan talo, eteläisessä Hämeessä, seisoo erään mäen pohjaisella
rinteellä, liki Toukolan kylää. Sen läheisin ympäristö on kivinen
tanner, mutta alempana alkaa pellot, joissa, ennenkuin talo oli häviöön
mennyt, aaltoili teräinen vilja.</p>
</body>
"""
soup = BeautifulSoup(html, 'lxml')
hyphenate_soup(soup, hyph_fi)  # hyph_fi is the Finnish hyphenator created earlier
print(str(soup))

This prints the following. The script contents and the style attribute are left untouched, while the title text is hyphenated:

<!DOCTYPE html>
<html><head><title>Seit-se-män vel-jes-tä</title>
<script>var veljekset = 7;</script>
</head><body>
<p style="margin-top: 2em">Ju-ko-lan ta-lo, ete-läi-ses-sä Hä-mees-sä, sei-soo erään mäen poh-jai-sel-la
rin-teel-lä, li-ki Tou-ko-lan ky-lää. Sen lä-hei-sin ym-pä-ris-tö on ki-vi-nen
tan-ner, mut-ta alem-pa-na al-kaa pel-lot, jois-sa, en-nen-kuin ta-lo oli hä-vi-öön
men-nyt, aal-toi-li te-räi-nen vil-ja.</p>
</body>
</html>

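
The example above hyphenates text nodes but leaves script contents and attribute values alone. The following sketch shows one way such a walk could be done. It is not how shyster does it: it swaps BeautifulSoup for the standard library's html.parser so that it runs without extra dependencies, and the names TextHyphenator and hyphenate_html are invented for this illustration. The word-level hyphenator is passed in as a plain function.

```python
import re
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style"}  # text inside these is code, not prose


class TextHyphenator(HTMLParser):
    """Re-emit HTML unchanged, except that words in text nodes are hyphenated."""

    def __init__(self, hyphenate_word):
        # Keep entities such as &amp; as they are instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.hyphenate_word = hyphenate_word
        self.skip_depth = 0
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        # Raw tag text, so attribute values are never touched.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            data = re.sub(r"\w+", lambda m: self.hyphenate_word(m.group(0)), data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def hyphenate_html(html, hyphenate_word):
    parser = TextHyphenator(hyphenate_word)
    parser.feed(html)
    parser.close()  # flush any trailing text
    return "".join(parser.out)


# Stand-in "hyphenator" that upper-cases words, to make the effect visible.
sample = '<p style="margin-top: 2em">talo</p><script>var talo = 7;</script>'
print(hyphenate_html(sample, str.upper))
```

Unlike the BeautifulSoup route, this sketch does not repair malformed markup or add missing html, head and body elements; it only rewrites the text it sees.
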
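Patterns and exceptions can also be loaded and edited by hand before building a hyphenator:

import re
import textwrap
# Assumption: these names are importable from the top-level shyster package.
from shyster import read_patterns, convert_patterns, convert_exceptions, hyphenator

with open('patterns/hyphen.tex') as f:
    pat, ex = read_patterns(f.readlines())
trie = convert_patterns(pat)
ex = convert_exceptions(ex)
del ex['present']  # remove an exception
ex['shyster'] = ('shy', 'ster')  # add or alter an exception
ex['lawyer'] = ('l', 'a', 'w', 'y', 'e', 'r')  # exceptions even override {left,right}hyphenmin

# Build an empty hyphenator, then attach the edited trie and exceptions.
hyph_en = hyphenator(None, hyphen='•')
hyph_en.trie = trie
hyph_en.exceptions = ex

# Hyphenate every word (punctuation is dropped by the regex) and wrap the result.
textwrap.wrap(' '.join(hyph_en(match.group(0))
                       for match in re.finditer(r'\w+', '''
shyster: noun; 1. someone, possibly a lawyer, who behaves in an unscrupulous way;
2. the present Python library
''')))

This returns:

['shy•ster noun 1 some•one pos•si•bly a l•a•w•y•e•r who be•haves in an',
 'un•scrupu•lous way 2 the pre•sent Python li•brary']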
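
For readers unfamiliar with TeX-style hyphenation patterns, the sketch below shows roughly how patterns, exceptions and the {left,right}hyphenmin limits interact. It is a simplified illustration, not shyster's code: it looks patterns up in a plain dict rather than a trie, the function names are invented, and the five toy patterns are made up for the demonstration rather than taken from patterns/hyph-fi.tex.

```python
def parse_pattern(pattern):
    """Split a pattern such as '1ta' into its letters and inter-letter levels."""
    letters, levels = "", [0]
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)   # digit applies to the gap before the next letter
        else:
            letters += ch
            levels.append(0)
    return letters, levels


def hyphenate(word, patterns, exceptions=None,
              lefthyphenmin=2, righthyphenmin=2, hyphen="-"):
    """Hyphenate one word; exceptions win over patterns and over the min limits."""
    exceptions = exceptions or {}
    if word.lower() in exceptions:
        return hyphen.join(exceptions[word.lower()])

    dotted = "." + word.lower() + "."      # dots mark the word boundaries
    scores = [0] * (len(dotted) + 1)       # scores[i] is the gap before dotted[i]
    for start in range(len(dotted)):
        for end in range(start + 1, len(dotted) + 1):
            levels = patterns.get(dotted[start:end])
            if levels:
                for offset, level in enumerate(levels):
                    pos = start + offset
                    scores[pos] = max(scores[pos], level)

    # An odd score allows a break; an even score (including 0) forbids it.
    pieces, last = [], 0
    for k in range(lefthyphenmin, len(word) - righthyphenmin + 1):
        if scores[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return hyphen.join(pieces)


# Toy patterns, invented for this example only.
PATTERNS = dict(parse_pattern(p) for p in ["1ta", "1lo", "1ko", "1la", "1ki"])

print(hyphenate("talo", PATTERNS))                    # ta-lo
print(hyphenate("Jukolan", PATTERNS))                 # Ju-ko-lan
print(hyphenate("liki", PATTERNS))                    # li-ki
print(hyphenate("liki", PATTERNS, righthyphenmin=3))  # liki
print(hyphenate("lawyer", PATTERNS,
                exceptions={"lawyer": ("l", "a", "w", "y", "e", "r")}))  # l-a-w-y-e-r
```

The last two calls mirror the behaviour described above: raising righthyphenmin suppresses a break too close to the end of the word, while an exception entry is used as given, even when it breaks after a single letter.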

Copying

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

The above does not apply to the files in patterns/, which are distributed with this program as example input files. The Finnish patterns are covered by the terms “Patterns may be freely distributed” and the English ones by “Unlimited copying and redistribution of this file are permitted as long as this file is not modified.”
