The hyphenation library of LibreOffice and Firefox wrapped for Python

(c) 2008-2021 PyHyphen developers

Contact: fhaxbox66@gmail.com

Project home: https://github.com/dr-leo/PyHyphen

Mailing list: https://groups.google.com/group/pyhyphen

0. Quickstart

With Python 3.7 or higher and a current version of pip, issue:

$ pip install pyhyphen
$ python
>>> from hyphen import Hyphenator
>>> # Download and install the hyphenation dict for German, if needed
>>> h = Hyphenator('de_DE')  # `language` defaults to 'en_US'
>>> s = 'Politikverdrossenheit'
>>> h.pairs(s)
[['Po', 'litikverdrossenheit'],
['Poli', 'tikverdrossenheit'],
['Politik', 'verdrossenheit'],
['Politikver', 'drossenheit'],
['Politikverdros', 'senheit'],
['Politikverdrossen', 'heit']]
>>> h.syllables(s)
['Po', 'li', 'tik', 'ver', 'dros', 'sen', 'heit']
>>> h.wrap(s, 5)
['Poli-', 'tikverdrossenheit']
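
The outputs above suggest how pairs() and syllables() relate for this word: each pair is the word split at one syllable boundary. The sketch below reproduces that relationship in plain Python. The helper name pairs_from_syllables is hypothetical and not part of PyHyphen. The library derives both results from the hyphenation dictionary, and for words that hyphenate with replacements the relationship may not be this simple.

```python
def pairs_from_syllables(syllables):
    """Split a word at every syllable boundary.

    Illustrative helper only; this is not a PyHyphen function.
    """
    pairs = []
    for i in range(1, len(syllables)):
        # Everything before boundary i, everything after it
        pairs.append(["".join(syllables[:i]), "".join(syllables[i:])])
    return pairs


# Syllables as returned by h.syllables('Politikverdrossenheit') above
syllables = ['Po', 'li', 'tik', 'ver', 'dros', 'sen', 'heit']
for first, second in pairs_from_syllables(syllables):
    print(first, second)
```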

1. Overview

PyHyphen is a Pythonic interface to the hyphenation library used in projects such as LibreOffice and the Mozilla suite. It comes with tools to download, install and uninstall hyphenation dictionaries from LibreOffice’s Git repository. PyHyphen provides the hyphen package.

hyphen.textwrap2 is a modified version of the familiar textwrap module that wraps text to a specified width, hyphenating words where needed. See the code examples in section 2.

PyHyphen supports Python 3.7 or higher.

1.1 Content of the hyphen package

The ‘hyphen’ package contains the following:

  • the hyphen.Hyphenator class: each instance of it can hyphenate and wrap words using a dictionary compatible with the hyphenation feature of LibreOffice and Mozilla. Required dictionaries are automatically downloaded at runtime, if not already installed.

  • the dictools module contains functions for downloading and installing dictionaries from a configurable repository, among others. After installation of PyHyphen, the LibreOffice repository is used by default. Dictionaries are stored in the platform-specific user’s app directory.

  • ‘hyphen.hnj’ is the C extension module that does all the groundwork. It contains the high-quality C library libhyphen, which supports hyphenation with replacements as well as compound words.

1.2 The ‘textwrap2’ module

This module is an enhanced, though backwards-compatible, version of the ‘textwrap’ module from the Python standard library. It adds hyphenation functionality to ‘textwrap’. To this end, a new keyword parameter use_hyphenator has been added to the constructor of the TextWrapper class. It defaults to None and accepts any hyphenator object.

2. Code examples

>>> from hyphen import Hyphenator
>>> # Create some hyphenators
>>> h_de = Hyphenator('de_DE')
>>> h_en = Hyphenator('en_US')

>>> # Now hyphenate some words
>>> h_en.pairs('beautiful')
[['beau', 'tiful'], ['beauti', 'ful']]

>>> h_en.wrap('beautiful', 6)
['beau-', 'tiful']

>>> h_en.wrap('beautiful', 7)
['beauti-', 'ful']

>>> h_en.syllables('beautiful')
['beau', 'ti', 'ful']
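
The wrap() results above are consistent with a simple rule: among the possible pairs, take the longest first part that still fits into the given width once a hyphen is appended. The following is a minimal sketch of that rule, assuming the pairs are already known. The helper wrap_from_pairs is hypothetical and is not how PyHyphen implements wrap(); returning an empty list when nothing fits is a choice made for this sketch.

```python
def wrap_from_pairs(pairs, width, hyphen="-"):
    """Pick the longest first part that fits in `width` including the hyphen.

    Illustrative helper only; this is not a PyHyphen function.
    Returns [] if no split fits.
    """
    best = []
    for first, second in pairs:
        candidate = first + hyphen
        if len(candidate) <= width and (not best or len(candidate) > len(best[0])):
            best = [candidate, second]
    return best


# Pairs as returned by h_en.pairs('beautiful') above
pairs = [['beau', 'tiful'], ['beauti', 'ful']]
print(wrap_from_pairs(pairs, 6))  # ['beau-', 'tiful']
print(wrap_from_pairs(pairs, 7))  # ['beauti-', 'ful']
```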

>>> from hyphen.textwrap2 import fill
>>> long_text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce vehicula rhoncus nulla et vulputate. In et risus dignissim erat dapibus iaculis ac ut nunc. Etiam vestibulum elit eget purus fermentum, eu finibus velit eleifend.'
>>> print(fill(long_text, width=40, use_hyphenator=h_en))
Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Fusce vehicula rhoncus
nulla et vulputate. In et risus dignis-
sim erat dapibus iaculis ac ut nunc.
Etiam vestibulum elit eget purus fermen-
tum, eu finibus velit eleifend.
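
To illustrate the idea behind hyphenated filling, here is a simplified greedy sketch in plain Python. It is not the textwrap2 implementation. StubHyphenator and fill_with_hyphenation are hypothetical names, and the stub's hyphenation points are hard-coded from the output above rather than read from a dictionary. The only thing the sketch assumes about a hyphenator is a wrap(word, width) method like the one shown earlier.

```python
class StubHyphenator:
    """Stand-in for a hyphenator; split points are hard-coded, not looked up."""

    def __init__(self, points):
        # Maps a word to its parts, e.g. {'dignissim': ['dignis', 'sim']}
        self.points = points

    def wrap(self, word, width):
        parts = self.points.get(word, [word])
        best = []
        for i in range(1, len(parts)):
            first = "".join(parts[:i]) + "-"
            if len(first) <= width:
                # Later splits are longer, so the last fitting one wins
                best = [first, "".join(parts[i:])]
        return best


def fill_with_hyphenation(text, width, hyphenator):
    """Greedy line filling that hyphenates a word when it does not fit."""
    lines, current = [], ""
    for word in text.split():
        while word:
            sep = " " if current else ""
            if len(current) + len(sep) + len(word) <= width:
                current += sep + word
                break
            # Try to put the first part of the word into the remaining room
            room = width - len(current) - len(sep)
            parts = hyphenator.wrap(word, room)
            if parts:
                current += sep + parts[0]
                word = parts[1]
            elif not current:
                # Unsplittable word wider than the line: let it overflow
                current, word = word, ""
            lines.append(current)
            current = ""
    if current:
        lines.append(current)
    return "\n".join(lines)


stub = StubHyphenator({'dignissim': ['dignis', 'sim']})
text = 'nulla et vulputate. In et risus dignissim erat dapibus iaculis ac ut nunc.'
print(fill_with_hyphenation(text, 40, stub))
```

With a real Hyphenator in place of the stub, the split points would come from the installed dictionary instead of the hard-coded mapping.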

Creating a Hyphenator object for a language automatically downloads the corresponding dictionary if it is not yet installed. For the HTTP connection to the LibreOffice server, PyHyphen uses the familiar requests library (https://www.python-requests.org). Requests are fully configurable, for example to handle proxies. Alternatively, dictionaries can be installed and listed manually with the dictools module:

>>> from hyphen.dictools import install, list_installed

>>> # Download and install some dictionaries in the default directory using the default
>>> # repository, usually the LibreOffice website
>>> for lang in ['de_DE', 'en_US']:
...     install(lang)  # provide kwargs to configure the HTTP request
...

>>> # Show locales of installed dictionaries
>>> list_installed()
['de', 'de_DE', 'en_PH', 'en_US']

3. Installation

PyHyphen is pip-installable from PyPI. In most scenarios the easiest way to install PyHyphen is to type from the shell prompt:

$ pip install pyhyphen

Besides the source distribution, there is a wheel on PyPI for Windows. As the C extension uses the limited C API, the wheel should work on all Python versions >= 3.7.

Building PyHyphen from source under Linux or macOS should be straightforward. On Windows, the wheel is installed by default, so no C compiler is needed.

4. Managing dictionaries

The dictools module contains a non-exhaustive list of available language strings that can be used to instantiate Hyphenator objects as shown above:

>>> from hyphen import dictools
>>> dictools.LANGUAGES
['af_ZA', 'an_ES', 'ar', 'be_BY', 'bg_BG', 'bn_BD', 'br_FR', 'ca', 'cs_CZ',
'da_DK', 'de', 'el_GR', 'en', 'es_ES', 'et_EE', 'fr_FR', 'gd_GB', 'gl',
'gu_IN', 'he_IL', 'hi_IN', 'hr_HR', 'hu_HU', 'it_IT', 'ku_TR', 'lt_LT',
'lv_LV', 'ne_NP', 'nl_NL', 'no', 'oc_FR', 'pl_PL', 'prj', 'pt_BR', 'pt_PT',
'ro', 'ru_RU', 'si_LK', 'sk_SK', 'sl_SI', 'sr', 'sv_SE', 'sw_TZ', 'te_IN',
'th_TH', 'uk_UA', 'zu_ZA']

The downloaded dictionary files are stored in a local data folder, along with a dictionaries.json file that lists the downloaded files and the associated locales:

$ ls ~/.local/share/pyhyphen
dictionaries.json  hyph_de_DE.dic  hyph_en_US.dic

$ cat ~/.local/share/pyhyphen/dictionaries.json
{
  "de": {
    "file": "hyph_de_DE.dic",
    "url": "http://cgit.freedesktop.org/libreoffice/dictionaries/plain/de/hyph_de_DE.dic"
  },
  "de_DE": {
    "file": "hyph_de_DE.dic",
    "url": "http://cgit.freedesktop.org/libreoffice/dictionaries/plain/de/hyph_de_DE.dic"
  },
  "en_PH": {
    "file": "hyph_en_US.dic",
    "url": "http://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/hyph_en_US.dic"
  },
  "en_US": {
    "file": "hyph_en_US.dic",
    "url": "http://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/hyph_en_US.dic"
  }
}

Each entry of the dictionaries.json file maps a locale to the name of its dictionary file and the URL from which that file was downloaded. Several locales can share one file, as 'en_PH' and 'en_US' do above.
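
As an illustration of this layout, the following sketch parses an abridged copy of the JSON shown above and groups locales by dictionary file. It uses only the standard library. The helper locales_by_file is hypothetical and not part of dictools, and the "url" fields are left out for brevity.

```python
import json
from collections import defaultdict

# Abridged copy of the dictionaries.json content shown above ("url" omitted)
REGISTRY_JSON = """
{
  "de":    {"file": "hyph_de_DE.dic"},
  "de_DE": {"file": "hyph_de_DE.dic"},
  "en_PH": {"file": "hyph_en_US.dic"},
  "en_US": {"file": "hyph_en_US.dic"}
}
"""


def locales_by_file(registry):
    """Group locale names by the dictionary file they point to."""
    grouped = defaultdict(list)
    for locale, entry in registry.items():
        grouped[entry["file"]].append(locale)
    return {name: sorted(locales) for name, locales in grouped.items()}


registry = json.loads(REGISTRY_JSON)
print(locales_by_file(registry))
# {'hyph_de_DE.dic': ['de', 'de_DE'], 'hyph_en_US.dic': ['en_PH', 'en_US']}
```

To inspect a real installation, the same function could be applied to the parsed dictionaries.json from the data folder, whose location depends on the platform.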

5. Contributing and reporting bugs

Questions can be asked in the Google group (https://groups.google.com/group/pyhyphen) or sent to the authors by e-mail.

Browse or fork the repository and report bugs at PyHyphen’s project site on GitHub.

Before submitting a PR, run the unit tests:

$ make test

6. License

Without prejudice to third party licenses, PyHyphen is distributed under the Apache 2.0 license. PyHyphen ships with third party code including the hyphenation library hyphen.c and a patched version of the Python standard module textwrap.

7. Changelog
