Sphinx "cjkspacer" extension
sphinxcontrib-cjkspacer
A Sphinx extension that inserts spacer elements between Chinese, Japanese, and Korean (CJK) characters and other characters.
Some word processors and typesetting systems, e.g., Microsoft® Word and TeX (at least pTeX), automatically adjust the distances (spaces) between CJK characters and other characters (cf. Requirements for Japanese Text Layout, "Spacing between characters").
Unfortunately, HTML with CSS has no such feature as of CSS3 (see the text-spacing property discussed in some older W3C® Working Drafts, e.g., those of 1 September 2011 and 19 January 2012).
This Sphinx extension provides an alternative way to adjust such distances.
Description for Japanese
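A Sphinx extension that adds spacing adjustment between CJK characters (including Japanese) and other kinds of characters to output formats that cannot adjust the spacing between different kinds of characters on their own. Used together with sphinxcontrib-trimblank or similar extensions, it aims to give HTML output a more natural finish than manually inserting or removing spaces between numerals/English words and Japanese text (demo in Japanese).
However, it currently does not perform the advanced adjustments described in Requirements for Japanese Text Layout (Japanese version); it introduces only one kind of spacing, decided by two criteria. If text-spacing, which was postponed in CSS3, is introduced in CSS4 or a later level, this extension will probably become unnecessary.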
Note
This extension is inspired by sphinxcontrib-trimblank.
The combination of sphinxcontrib-trimblank and sphinxcontrib-cjkspacer should work well for the HTML builders:
sphinxcontrib-trimblank removes redundant spaces caused by limitations of the reStructuredText syntax, and then sphinxcontrib-cjkspacer adjusts the distances between characters (see the Japanese demo).
Install
pip install sphinxcontrib-cjkspacer
Usage
Add sphinxcontrib.cjkspacer to the extensions list in conf.py.
extensions += ['sphinxcontrib.cjkspacer']
Example
- In conf.py:

  extensions += ['sphinxcontrib.trimblank', 'sphinxcontrib.cjkspacer']
  html_css_files = ['custom.css']

- In _static/custom.css:

  .cjkspacer:after { content: '\0020'; font-size: 50%; }
Configuration
- cjkspacer_spacer (default: {'html': '<span class="cjkspacer"></span>'})

  A dictionary of format: spacer_string pairs. When the format of the builder is format, spacer_string is inserted between CJK characters and other characters. With the default value, you can style the .cjkspacer class in your custom CSS as follows.

  The width of this kind of space should be 1/4 of the width of a CJK character, at least for Japanese (shibun-aki, "四分アキ"; see cl-19 (ideographic characters) : cl-27 (Western characters) and cl-27 : cl-19 in Table 1, "Spacing between characters"). Hence, with the ordinary (half-width) space character \u0020,

  .cjkspacer:after { content: '\0020'; font-size: 50%; }

  may be the most preferable solution. Using the full-width space \u3000,

  .cjkspacer:after { content: '\3000'; font-size: 25%; }

  may be closer to the definition we need. In most cases, however, we cannot specify such a small font-size value. Of course, you can use other space characters, such as the thin space \u2009. If needed, you can specify the width numerically, as in the following example:

  .cjkspacer { padding-right: 0.15em; }

  Note that the width of \u0020 depends on the font you use. For example:

  | font-family | Width of \u0020 (eye measurement) |
  |---|---|
  | Lucida Sans Unicode | 0.31em |
  | Verdana | 0.34em |
  | sans-serif | 0.33em |
  | Segoe UI | 0.28em |
  | Helvetica | 0.28em |
  | Arial | 0.33em |

  Finally, if you cannot edit CSS files, the following may be a possible solution (in conf.py):

  cjkspacer_spacer_str = {'html': ' '}

  This, however, makes the inserted spaces selectable in the text.
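- cjkspacer_cjk_characters, cjkspacer_before_exceptions, cjkspacer_after_exceptions

  These three values determine the boundaries between CJK characters and other characters. A position in the text is regarded as a boundary if it matches either of the regular expressions

  f'(?<![{cjkspacer_before_exceptions}{cjkspacer_cjk_characters}])(?=[{cjkspacer_cjk_characters}])'

  or

  f'(?<=[{cjkspacer_cjk_characters}])(?![{cjkspacer_after_exceptions}{cjkspacer_cjk_characters}])'

  The first matches the position just before a CJK character that is preceded neither by another CJK character nor by a character in cjkspacer_before_exceptions; the second matches the position just after a CJK character that is followed neither by another CJK character nor by a character in cjkspacer_after_exceptions.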
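The widths discussed for cjkspacer_spacer can be checked with simple arithmetic: a glyph set at font-size: N% is scaled by N/100, so its width in em of the surrounding text is its own width (in em of its own font) times that ratio. The sketch below applies this to the eye-measured widths from the table above. It assumes the ideographic space U+3000 is a full 1em wide, as in typical CJK fonts, and the helper effective_width is hypothetical, not part of the extension.

```python
# Target width of a shibun-aki space: 1/4 of a full-width CJK character (1em).
SHIBUN_AKI_EM = 0.25

# Eye-measured widths of U+0020 taken from the table above, in em.
SPACE_WIDTH_EM = {
    "Lucida Sans Unicode": 0.31,
    "Verdana": 0.34,
    "sans-serif": 0.33,
    "Segoe UI": 0.28,
    "Helvetica": 0.28,
    "Arial": 0.33,
}


def effective_width(glyph_width_em: float, font_size_ratio: float) -> float:
    """Width of a generated glyph, in em of the surrounding text.

    A glyph rendered at `font-size: N%` is scaled by N/100, so its width in
    the parent's em equals its own em width times that ratio.
    Hypothetical helper for illustration; not part of sphinxcontrib-cjkspacer.
    """
    return glyph_width_em * font_size_ratio


if __name__ == "__main__":
    # content: '\3000'; font-size: 25% (assumes U+3000 is exactly 1em wide).
    print(f"U+3000 at 25%: {effective_width(1.0, 0.25):.3f}em (target {SHIBUN_AKI_EM}em)")
    # content: '\0020'; font-size: 50% (the result depends on the font).
    for family, width in SPACE_WIDTH_EM.items():
        print(f"U+0020 at 50% in {family}: {effective_width(width, 0.5):.3f}em")
```

With these measurements, the '\0020' rule gives roughly 0.14em (Segoe UI, Helvetica) to 0.17em (Verdana), narrower than the 0.25em of shibun-aki, while '\3000' at 25% gives 0.25em exactly; the padding-right: 0.15em example falls within the '\0020' range.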
Default values of cjkspacer_cjk_characters, cjkspacer_before_exceptions, and cjkspacer_after_exceptions
In the default configuration, we employ relatively simple rules.
If a CJK character is preceded by a whitespace character (' ', \t, \f, \v), a newline (\n, \r), or an opening bracket ((, {, [), we do not insert a spacer before it.
If a CJK character is followed by a whitespace character (' ', \t, \f, \v), a newline (\n, \r), a closing bracket (), }, ]), or a punctuation mark (, . : ; ! ?), we do not insert a spacer after it.
We list ' \t\f\v' explicitly instead of using r'\s' because r'\s' also matches the Ideographic Space (\u3000).
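The boundary patterns and the simple rules above can be tried out with Python's re module. The sketch below joins the two documented patterns into one, builds it from settings that mirror the rules just listed, and inserts a spacer string at every match. The names build_boundary_pattern and insert_spacers are hypothetical, and the CJK class is deliberately cut down to Hiragana, Katakana, and CJK Unified Ideographs; this illustrates the patterns and is not the extension's own code.

```python
import re

# Simplified settings mirroring the rules above. The CJK class is cut down to
# Hiragana, Katakana, and CJK Unified Ideographs for illustration only; it is
# NOT the documented default of cjkspacer_cjk_characters.
DEMO_CJK = r'\u3040-\u30FF\u4E00-\u9FFF'
# ' \t\f\v' is spelled out rather than using \s, which would also match U+3000.
DEMO_BEFORE_EXCEPTIONS = ' \t\f\v\n\r' + r'({\['
DEMO_AFTER_EXCEPTIONS = ' \t\f\v\n\r' + r')}\],.:;!?'


def build_boundary_pattern(cjk: str, before_exc: str, after_exc: str) -> re.Pattern:
    """Join the two documented boundary regexes into a single pattern.

    Both alternatives consist only of lookarounds, so they match positions
    between characters (zero-width), never the characters themselves.
    """
    before_cjk = f'(?<![{before_exc}{cjk}])(?=[{cjk}])'
    after_cjk = f'(?<=[{cjk}])(?![{after_exc}{cjk}])'
    return re.compile(f'{before_cjk}|{after_cjk}')


def insert_spacers(text: str, pattern: re.Pattern, spacer: str) -> str:
    """Insert `spacer` at every boundary position (hypothetical helper).

    A function replacement is used so that backslashes in `spacer`
    are inserted literally instead of being treated as group references.
    """
    return pattern.sub(lambda _match: spacer, text)


DEMO_PATTERN = build_boundary_pattern(DEMO_CJK, DEMO_BEFORE_EXCEPTIONS, DEMO_AFTER_EXCEPTIONS)

if __name__ == '__main__':
    for sample in ['Sphinxで作るHTML', 'Python 3.12で動く.', 'Sphinx (スフィンクス)']:
        print(insert_spacers(sample, DEMO_PATTERN, '|'))
```

The first sample gets spacers on both sides of the Japanese run, the second only before it (the trailing '.' is an after-exception), and the third none (the brackets are exceptions). Because lookarounds also succeed at the very start and end of a string, this sketch puts a spacer before a leading CJK character and after a trailing one (作る becomes |作る|); how the extension treats the edges of text nodes is not covered here.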
The following Unicode blocks are adopted as the CJK characters in the default value of cjkspacer_cjk_characters:
- CJK characters

  | From | To | Example | Unicode block name |
  |---|---|---|---|
  | \u2E80 | \u2EFF | ⺀ | CJK Radicals Supplement |
  | \u2F00 | \u2FDF | ⼀ | Kangxi Radicals |
  | \u2FF0 | \u2FFF | ⿰ | Ideographic Description Characters |
  | \u3000 | \u303F | 々 | CJK Symbols and Punctuation |
  | \u3040 | \u309F | あ | Hiragana |
  | \u30A0 | \u30FF | ア | Katakana |
  | \u3100 | \u312F | ㄅ | Bopomofo |
  | \u3130 | \u318F | ㄱ | Hangul Compatibility Jamo |
  | \u3190 | \u319F | ㆐ | Kanbun |
  | \u31A0 | \u31BF | ㆠ | Bopomofo Extended |
  | \u31C0 | \u31EF | ㇀ | CJK Strokes |
  | \u31F0 | \u31FF | ㇰ | Katakana Phonetic Extensions |
  | \u3200 | \u32FF | ㉑ | Enclosed CJK Letters and Months |
  | \u3300 | \u33FF | ㎏ | CJK Compatibility |
  | \u3400 | \u4DBF | 㐀 | CJK Unified Ideographs Extension A |
  | \u4DC0 | \u4DFF | ䷀ | Yijing Hexagram Symbols |
  | \u4E00 | \u9FFF | 一 | CJK Unified Ideographs |
  | \uF900 | \uFAFF | 豈 | CJK Compatibility Ideographs |
  | \uFF00 | \uFF60 | ！ | Halfwidth and Fullwidth Forms (fullwidth forms) |
  | \uFFE0 | \uFFE6 | ￠ | Halfwidth and Fullwidth Forms (fullwidth forms) |
  | \U00020000 | \U0002A6DF | 𠀀 | CJK Unified Ideographs Extension B |
  | \U0002A700 | \U0002B73F | 𪜀 | CJK Unified Ideographs Extension C |
  | \U0002B740 | \U0002B81F | 𫝀 | CJK Unified Ideographs Extension D |
  | \U0002B820 | \U0002CEAF | 𫠠 | CJK Unified Ideographs Extension E |
  | \U0002CEB0 | \U0002EBEF | 𬺰 | CJK Unified Ideographs Extension F |
  | \U0002F800 | \U0002FA1F | 丽 | CJK Compatibility Ideographs Supplement |
  | \U00030000 | \U0003134F | 𰀀 | CJK Unified Ideographs Extension G |
The following block is also included in cjkspacer_cjk_characters for consistency with Enclosed CJK Letters and Months.
- Treated as CJK characters

  | From | To | Example | Unicode block name |
  |---|---|---|---|
  | \u2460 | \u24FF | ① | Enclosed Alphanumerics |
The following characters are excluded from cjkspacer_cjk_characters because they are spaces, punctuation marks, or brackets.
Instead, they are included in cjkspacer_before_exceptions and cjkspacer_after_exceptions.
- Exceptions among CJK Symbols and Punctuation (\u3000-\u303F)

  | Unicode | Character | Name |
  |---|---|---|
  | \u3000 | 　 | Ideographic Space |
  | \u3001 | 、 | Ideographic Comma |
  | \u3002 | 。 | Ideographic Full Stop |
  | \u3008 | 〈 | Left Angle Bracket |
  | \u3009 | 〉 | Right Angle Bracket |
  | \u300A | 《 | Left Double Angle Bracket |
  | \u300B | 》 | Right Double Angle Bracket |
  | \u300C | 「 | Left Corner Bracket |
  | \u300D | 」 | Right Corner Bracket |
  | \u300E | 『 | Left White Corner Bracket |
  | \u300F | 』 | Right White Corner Bracket |
  | \u3010 | 【 | Left Black Lenticular Bracket |
  | \u3011 | 】 | Right Black Lenticular Bracket |
  | \u3014 | 〔 | Left Tortoise Shell Bracket |
  | \u3015 | 〕 | Right Tortoise Shell Bracket |
  | \u3016 | 〖 | Left White Lenticular Bracket |
  | \u3017 | 〗 | Right White Lenticular Bracket |
  | \u3018 | 〘 | Left White Tortoise Shell Bracket |
  | \u3019 | 〙 | Right White Tortoise Shell Bracket |
  | \u301A | 〚 | Left White Square Bracket |
  | \u301B | 〛 | Right White Square Bracket |

- Exceptions among Katakana (\u30A0-\u30FF)

  | Unicode | Character | Name |
  |---|---|---|
  | \u30FB | ・ | Katakana Middle Dot |

- Exceptions among Halfwidth and Fullwidth Forms (\uFF00-\uFF60, \uFFE0-\uFFE6)

  | Unicode | Character | Name |
  |---|---|---|
  | \uFF01 | ！ | Fullwidth Exclamation Mark |
  | \uFF02 | ＂ | Fullwidth Quotation Mark |
  | \uFF07 | ＇ | Fullwidth Apostrophe |
  | \uFF08 | （ | Fullwidth Left Parenthesis |
  | \uFF09 | ） | Fullwidth Right Parenthesis |
  | \uFF0C | ， | Fullwidth Comma |
  | \uFF0E | ． | Fullwidth Full Stop |
  | \uFF0F | ／ | Fullwidth Solidus |
  | \uFF1A | ： | Fullwidth Colon |
  | \uFF1B | ； | Fullwidth Semicolon |
  | \uFF1F | ？ | Fullwidth Question Mark |
  | \uFF3B | ［ | Fullwidth Left Square Bracket |
  | \uFF3C | ＼ | Fullwidth Reverse Solidus |
  | \uFF3D | ］ | Fullwidth Right Square Bracket |
  | \uFF5B | ｛ | Fullwidth Left Curly Bracket |
  | \uFF5C | ｜ | Fullwidth Vertical Line |
  | \uFF5D | ｝ | Fullwidth Right Curly Bracket |
  | \uFF5F | ｟ | Fullwidth Left White Parenthesis |
  | \uFF60 | ｠ | Fullwidth Right White Parenthesis |
Thus, we set the following as the default configuration.
cjkspacer_cjk_characters = r'\u2460-\u24FF\u2E80-\u2FFF\u3003-\u3007\u3012\u3013\u301C-\u30FA\u30FC-\u9FFF\uF900-\uFAFF\uFF00\uFF03-\uFF06\uFF0A\uFF0B\uFF0D\uFF10-\uFF19\uFF1C\uFF1D\uFF1E\uFF20-\uFF3A\uFF3E-\uFF5A\uFF5E\uFFE0-\uFFE6\U00020000-\U0002A6DF\U0002A700-\U0002EBEF\U0002F800-\U0002FA1F\U00030000-\U0003134F'
cjkspacer_before_exceptions = ' \t\f\v\n\r' + r'({\[\u3000\u3001\u3002\u3008-\u3011\u3014-\u301B\u30FB\uFF01\uFF02\uFF07\uFF08\uFF09\uFF0C\uFF0E\uFF0F\uFF1A\uFF1B\uFF1F\uFF3B\uFF3C\uFF3D\uFF5B\uFF5C\uFF5D\uFF5F\uFF60'
cjkspacer_after_exceptions = ' \t\f\v\n\r' + r')}\],.:;!?\u3000\u3001\u3002\u3008-\u3011\u3014-\u301B\u30FB\uFF01\uFF02\uFF07\uFF08\uFF09\uFF0C\uFF0E\uFF0F\uFF1A\uFF1B\uFF1F\uFF3B\uFF3C\uFF3D\uFF5B\uFF5C\uFF5D\uFF5F\uFF60'
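As a sanity check, the defaults above can be compiled with Python's re module and compared with the tables: example characters of the listed blocks should match cjkspacer_cjk_characters, while the excluded spaces, punctuation marks, and brackets should not, and should instead appear in both exception lists. This is a verification sketch, not part of the extension; is_cjk and is_exception are hypothetical helpers.

```python
import re

# Default values as documented above (copied verbatim).
cjkspacer_cjk_characters = r'\u2460-\u24FF\u2E80-\u2FFF\u3003-\u3007\u3012\u3013\u301C-\u30FA\u30FC-\u9FFF\uF900-\uFAFF\uFF00\uFF03-\uFF06\uFF0A\uFF0B\uFF0D\uFF10-\uFF19\uFF1C\uFF1D\uFF1E\uFF20-\uFF3A\uFF3E-\uFF5A\uFF5E\uFFE0-\uFFE6\U00020000-\U0002A6DF\U0002A700-\U0002EBEF\U0002F800-\U0002FA1F\U00030000-\U0003134F'
cjkspacer_before_exceptions = ' \t\f\v\n\r' + r'({\[\u3000\u3001\u3002\u3008-\u3011\u3014-\u301B\u30FB\uFF01\uFF02\uFF07\uFF08\uFF09\uFF0C\uFF0E\uFF0F\uFF1A\uFF1B\uFF1F\uFF3B\uFF3C\uFF3D\uFF5B\uFF5C\uFF5D\uFF5F\uFF60'
cjkspacer_after_exceptions = ' \t\f\v\n\r' + r')}\],.:;!?\u3000\u3001\u3002\u3008-\u3011\u3014-\u301B\u30FB\uFF01\uFF02\uFF07\uFF08\uFF09\uFF0C\uFF0E\uFF0F\uFF1A\uFF1B\uFF1F\uFF3B\uFF3C\uFF3D\uFF5B\uFF5C\uFF5D\uFF5F\uFF60'

# Each value is the body of a character class, so wrap it in [...] to test single characters.
CJK = re.compile(f'[{cjkspacer_cjk_characters}]')
BEFORE_EXCEPTIONS = re.compile(f'[{cjkspacer_before_exceptions}]')
AFTER_EXCEPTIONS = re.compile(f'[{cjkspacer_after_exceptions}]')


def is_cjk(ch: str) -> bool:
    """True if `ch` is treated as a CJK character by the defaults (hypothetical helper)."""
    return CJK.fullmatch(ch) is not None


def is_exception(ch: str) -> bool:
    """True if `ch` is listed in both exception classes (hypothetical helper)."""
    return (BEFORE_EXCEPTIONS.fullmatch(ch) is not None
            and AFTER_EXCEPTIONS.fullmatch(ch) is not None)


# Example characters from the block tables: あ ア 一 㐀 々 ① ㉑ ㎏ 豈 Ａ ０ 𠀀
CJK_EXAMPLES = '\u3042\u30A2\u4E00\u3400\u3005\u2460\u3251\u338F\uF900\uFF21\uFF10\U00020000'
# Some excluded characters from the exception tables: U+3000 、 。 「 」 【 】 ・ ！ （ ） ， ？
EXCLUDED = '\u3000\u3001\u3002\u300C\u300D\u3010\u3011\u30FB\uFF01\uFF08\uFF09\uFF0C\uFF1F'

if __name__ == '__main__':
    for ch in CJK_EXAMPLES + EXCLUDED:
        print(f'U+{ord(ch):04X} cjk={is_cjk(ch)} exception={is_exception(ch)}')
```

Since every excluded character appears in both exception lists and none of them is a CJK character, neither boundary pattern can match next to them, so no spacer is inserted on either side of these characters.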
License
MIT License
Download files
File details
Details for the file sphinxcontrib-cjkspacer-0.4.3.tar.gz.
File metadata
- Download URL: sphinxcontrib-cjkspacer-0.4.3.tar.gz
- Upload date:
- Size: 14.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.3.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 56b15fb3c03db6edce8606ed052a008e7f51aed876baefe5378a9ce96d3a94e5 |
| MD5 | 412f3676ef8c2011fd9f9f5051639169 |
| BLAKE2b-256 | 1f6da5273fe7c2bb66a6796ad712aab5c90603717aa0e11e19a566f41e5e4f1f |
File details
Details for the file sphinxcontrib_cjkspacer-0.4.3-py3-none-any.whl.
File metadata
- Download URL: sphinxcontrib_cjkspacer-0.4.3-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.3.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ff0c62f338fd6afe9ab4732ad397611e4764f8241223382c33d4c48a90f1bdd |
| MD5 | 415b450a1f9de3c79a8cfbfbd90d2ab1 |
| BLAKE2b-256 | f32c381798a32fada1cfebdf0deb641b6b03574a60eb6dcf0ca16276816b619c |