Sphinx "cjkspacer" extension
Project description
sphinxcontrib-cjkspacer
A Sphinx extension, which inserts spacer elements between the Chinese Japanese Korean (CJK) characters and the other characters.
Some of the word processors, e.g., Microsoft® Word and TeX (at least in the case of pTeX), adjust the distances (spaces) between the CJK characters and the others automatically (c.f. Requirements for Japanese Text Layout#spacing between characters).
Unfortunately, however, HTML with CSS does not have this function as of CSS3 (See the text-spacing
property discussed in some old versions of W3C® Working Draft, e.g., 1 September 2011 and 19 January 2012).
This Sphinx extension provides an alternative function to adjust such distances.
Description for Japanese
異なる種類の文字種間の空き量(スペース)を調整する機能を持たないフォーマットに、日本語を含むCJK文字とその他の文字種の間での空き量調整機能を与えるSphinx拡張です。 この拡張とsphinxcontrib-trimblankなどを併用することで、HTML出力において、数字/英語と日本語の間への手動でのスペース挿入・除去を行うよりも自然な仕上がりを実現することを目指しています(日本語によるデモ)。
ただし、現状では組版処理の要件(日本語版)に記載されているような高度な調整は行っておらず、2種の判断基準による1種類の空き量しか導入していません。 CSS3で延期された
text-spacing
が今後CSS4などで導入されればこの拡張は不要になることでしょう。
Note
This extension is inspired by sphinxcontrib-trimblank.
The combination betweeen sphinxcontrib-trimblank
and sphinxcontrib-cjkspacer
should work well for the html
builders:
sphinxcontrib-trimblank
removes redundant spaces caused by the limitation of the reStructuredText syntax, and then sphinxcontrib-cjkspacer
adjusts distances among characters (See Japanese demo).
Install
pip install sphinxcontrib-cjkspacer
Usage
Add sphinxcontrib.cjkspacer
in the extensions
list in conf.py
.
extensions += ['sphinxcontrib.cjkspacer']
Example
-
In
conf.py
extensions += ['sphinxcontrib.trimblank', 'sphinxcontrib.cjkspacer'] html_css_files = ['custom.css']
-
In
_static/custom.css
.cjkspacer:after { content: '\0020'; font-size: 50%; }
Configuration
-
cjkspacer_spacer
: (default:{'html':'<span class="cjkspacer"></span>'}
)A dictionary which has
format
:spacer_string
pairs. The value ofspacer_string
will be inserted between the CJK characters and the others when the format of the builder isformat
.By using the default value, you can use
.cjkspacer
class in your custom css as follows:The width of this type of space should be 1/4 of the width of the CJK characters, at least in the cases of Japanese language (shibu aki "四分アキ", see
cl-19 ideographic characters
:cl-27 Western characters
andcl-27
:cl-19
in Table 1 Spacing between characters). Hence, with the ordinary (half-width) space character\u0020
,.cjkspacer:after { content: '\0020'; font-size: 50%; }
may be the most preferable solution. The use of the full-width space
\u3000
,.cjkspacer:after { content: '\3000'; font-size: 25%; }
may be closer to the definition we need. In the most cases, however, we cannot specify such a small
font-size
value. Of course you can use other space characters, like the thin space character\u2009
. If you need, you can specify the width numerically as the following example:.cjkspacer { padding-right: 0.15em; }
Note that the width of
\u0020
depends on the font you use. For example,font-family width of \u0020
(eye measurement)Lucida Sans Unicode 0.31em Verdana 0.34em sans-serif 0.33em Segoe UI 0.28em Helvetica 0.28em Arial 0.33em Finally, if you cannot edit css files, the following example may be possible solution (in
conf.py
)cjkspacer_spacer_str = {'html': ' '}
This however causes selectable spaces in the text.
-
cjkspacer_cjk_characters
-
cjkspacer_before_exceptions
-
cjkspacer_after_exceptions
These three elements decide the boundaries between the CJK characters and the other characters.
If regular expressions
f'(?<![{cjkspacer_before_exceptions}{cjkspacer_cjk_characters}])(?=[{cjkspacer_cjk_characters}'])
or
f'(?<=[{cjkspacer_cjk_characters}])(?![{cjkspacer_after_exceptions}{cjkspacer_cjk_characters}])'
match parts of texts, they are regarded as the boundaries.
Default values of cjkspacer_cjk_characters
, cjkspacer_before_exceptions
, and cjkspacer_after_exceptions
In the default configuration, we employ relatively simple rules.
If a CJK character is preceded by a space ( \t\f\v
), newline (\n\r
), or opening parenthesis (({[
), we do not insert a spacer before the CJK character.
If a CJK character is followed by a space ( \t\f\v
), newline (\n\r
), closing parenthesis ()}]
), or punctuation (,.:;!?
), we do not insert a spacer after the CJK character.
Here, we do not use r'\s'
instead of ' \t\f\v'
, because r'\s'
also matches Ideographicl Space (\u3000
, ).
The following Unicode blocks are adopted as the CJK characters in the default value of cjkspacer_cjk_characters
:
-
CJK characters
from to Example Unicode block name \u2E80
\u2EFF
⺀ CJK Radicals Supplement \u2F00
\u2FDF
⼀ Kangxi Radicals \u2FF0
\u2FFF
⿰ Ideographic Description Characters \u3000
\u303F
々 CJK Symbols and Punctuation \u3040
\u309F
あ Hiragana \u30A0
\u30FF
ア Katakana \u3100
\u312F
ㄅ Bopomofo \u3130
\u318F
ㄱ Hangul Compatibility Jamo \u3190
\u319F
㆐ Kanbun \u31A0
\u31BF
ㆠ Bopomofo Extended \u31C0
\u31EF
㇀ CJK Strokes \u31F0
\u31FF
ㇰ Katakana Phonetic Extensions \u3200
\u32FF
㉑ Enclosed CJK Letters and Months \u3300
\u33FF
㎏ CJK Compatibility \u3400
\u4DBF
㐀 CJK Unified Ideographs Extension A \u4DC0
\u4DFF
䷀ Yijing Hexagram Symbols \u4E00
\u9FFF
一 CJK Unified Ideographs \uF900
\uFAFF
豈 CJK Compatibility Ideographs \uFF00
\uFF60
! Halfwidth and Fullwidth Forms (Full width Forms) \uFFE0
\uFFE6
¢ Halfwidth and Fullwidth Forms (Full width Forms) \U00020000
\U0002A6DF
𠀀 CJK Unified Ideographs Extension B \U0002A700
\U0002B73F
𪜀 CJK Unified Ideographs Extension C \U0002B740
\U0002B81F
𫝀 CJK Unified Ideographs Extension D \U0002B820
\U0002CEAF
𫠠 CJK Unified Ideographs Extension E \U0002CEB0
\U0002EBEF
𬺰 CJK Unified Ideographs Extension F \U0002F800
\U0002FA1F
丽 CJK Compatibility Ideographs Supplement \U00030000
\U0003134F
𰀀 CJK Unified Ideographs Extension G
The following block is also included into cjkspacer_cjk_characters
for consistency with Enclosed CJK Letters and Months.
-
Treated as CJK characters
from to Example Unicode block name \u2460
\u24FF
① Enclosed Alphanumerics
The following characters are eliminated from cjkspacer_cjk_characters
since they are spaces, punctuation, and parentheses.
Instead, they are included into cjkspacer_before_exceptions
and cjkspacer_after_exceptions
.
-
Exceptions among CJK symbols and punctuation (
\u3000-\u303F
)Unicode Character Name \u3000
Ideographicl Space \u3001
、 Ideographic Comma \u3002
。 Ideographic Full Stop \u3008
〈 Left Angle Bracket \u3009
〉 Right Angle Bracket \u300A
《 Left Double Angle Bracket \u300B
》 Right Double Angle Bracket \u300C
「 Left Corner Bracket \u300D
」 Right Corner Bracket \u300E
『 Left White Corner Bracket \u300F
』 Right White Corner Bracket \u3010
【 Left Black Lenticular Bracket \u3011
】 Right Black Lenticular Bracket \u3014
〔 Left Tortoise Shell Bracket \u3015
〕 Right Tortoise Shell Bracket \u3016
〖 Left White Lenticular Bracket \u3017
〗 Right White Lenticular Bracket \u3018
〘 Left White Turtoise Shell Bracket \u3019
〙 Right White Turtoise Shell Bracket \u301A
〚 Left White Square Bracket \u301B
〛 Right White Square Bracket -
Exceptions among Katakana (
\u30A0-\u30FF
)Unicode Character Name \u30FB
・ Katakana Middle Dot -
Exceptions among Halfwidth and Fullwidth Forms (
\uFF00-\uFF60
,\uFFE0-\uFFE6
)Unicode Character Name \uFF01
! Fullwidth Exclamation Mark \uFF02
" Fullwidth Quotation Mark \uFF07
' Fullwidth Apostrophe \uFF08
( Fullwidth Left Parenthesis \uFF09
) Fullwidth RIght Parenthesis \uFF0C
, Fullwidth Comma \uFF0E
. Fullwidth Full Stop \uFF0F
/ Fullwidth Solidus \uFF1A
: Fullwidth Colon \uFF1B
; Fullwidth Semicolon \uFF1F
? Fullwidth Question Mark \uFF3B
[ Fullwidth Left Square Bracket \uFF3C
\ Fullwidth Reverse Solidus \uFF3D
] Fullwidth Right Square Bracket \uFF5B
{ Fullwidth Left Curly Bracket \uFF5C
| Fullwidth Vertical Line \uFF5D
} Fullwidth Right Curly Bracket \uFF5F
⦅ Fullwidth Left White Parenthesis \uFF60
⦆ Fullwidth Right White Parenthesis
Thus, we set the following as the default configuration.
cjkspacer_cjk_characters = r'\u2460-\u24FF\u2E80-\u2FFF\u3003-\u3007\u3012\u3013\u301C-\u30FA\u30FC-\u9FFF\uF900-\uFAFF\uFF00\uFF03-\uFF06\uFF0A\uFF0B\uFF0D\uFF10-\uFF19\uFF1C\uFF1D\uFF1E\uFF20-\uFF3A\uFF3E-\uFF5A\uFF5E\uFFE0-\uFFE6\U00020000-\U0002A6DF\U0002A700-\U0002EBEF\U0002F800-\U0002FA1F\U00030000-\U0003134F'
cjkspacer_before_exceptions = ' \t\f\v\n\r' + r'({\[\u3000\u3001\u3002\u3008-\u3011\u3014-\u301B\u30FB\uFF01\uFF02\uFF07\uFF08\uFF09\uFF0C\uFF0E\uFF0F\uFF1A\uFF1B\uFF1F\uFF3B\uFF3C\uFF3D\uFF5B\uFF5C\uFF5D\uFF5F\uFF60'
cjkspacer_after_exceptions = ' \t\f\v\n\r' + r')}\],.:;!?\u3000\u3001\u3002\u3008-\u3011\u3014-\u301B\u30FB\uFF01\uFF02\uFF07\uFF08\uFF09\uFF0C\uFF0E\uFF0F\uFF1A\uFF1B\uFF1F\uFF3B\uFF3C\uFF3D\uFF5B\uFF5C\uFF5D\uFF5F\uFF60'
License
MIT License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file sphinxcontrib-cjkspacer-0.4.3.tar.gz
.
File metadata
- Download URL: sphinxcontrib-cjkspacer-0.4.3.tar.gz
- Upload date:
- Size: 14.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.3.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 56b15fb3c03db6edce8606ed052a008e7f51aed876baefe5378a9ce96d3a94e5 |
|
MD5 | 412f3676ef8c2011fd9f9f5051639169 |
|
BLAKE2b-256 | 1f6da5273fe7c2bb66a6796ad712aab5c90603717aa0e11e19a566f41e5e4f1f |
File details
Details for the file sphinxcontrib_cjkspacer-0.4.3-py3-none-any.whl
.
File metadata
- Download URL: sphinxcontrib_cjkspacer-0.4.3-py3-none-any.whl
- Upload date:
- Size: 10.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.3.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7ff0c62f338fd6afe9ab4732ad397611e4764f8241223382c33d4c48a90f1bdd |
|
MD5 | 415b450a1f9de3c79a8cfbfbd90d2ab1 |
|
BLAKE2b-256 | f32c381798a32fada1cfebdf0deb641b6b03574a60eb6dcf0ca16276816b619c |