
Measures the displayed width of unicode strings in a terminal


Introduction

This library is mainly for CLI/TUI programs that carefully produce output for Terminals.

Installation

The stable version of this package is maintained on PyPI. Install or upgrade it using pip:

pip install --upgrade wcwidth

Problem

All Python string-formatting functions, including textwrap.wrap(), str.ljust(), str.rjust(), and str.center(), incorrectly measure the displayed width of a string as equal to its number of codepoints.

Some examples of incorrect results:

>>> # result consumes 16 total cells, 11 expected,
>>> 'コンニチハ'.rjust(11, 'X')
'XXXXXXコンニチハ'

>>> # result consumes 5 total cells, 6 expected,
>>> 'cafe\u0301'.center(6, 'X')
'caféX'
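The mismatch can be reproduced with only the standard library. The sketch below uses a hypothetical helper, naive_cells(), which is not part of wcwidth: it counts East Asian Wide and Fullwidth characters as two cells, skips combining marks, and ignores emoji sequences and control codes entirely.

```python
import unicodedata


def naive_cells(text: str) -> int:
    """Hypothetical helper: a rough display-width estimate using only the stdlib."""
    cells = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks draw over the preceding character
        # 'W' (wide) and 'F' (fullwidth) characters occupy two terminal cells
        cells += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
    return cells


padded = 'コンニチハ'.rjust(11, 'X')
print(len(padded), naive_cells(padded))        # 11 16

accented = 'cafe\u0301'.center(6, 'X')
print(len(accented), naive_cells(accented))    # 6 5
```

In both cases str pads to 11 or 6 codepoints, while the terminal draws 16 or 5 cells.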

Solution

The lowest-level functions in this library are the POSIX.1-2001 and POSIX.1-2008 wcwidth(3) and wcswidth(3), whose interfaces this library reproduces exactly as wcwidth() and wcswidth(). These functions return -1 when C0 and C1 control codes are present.

An easy-to-use width() function is provided as a wrapper of wcswidth() that is also capable of measuring most terminal control codes and sequences, like colors, bold, tabstops, and horizontal cursor movement.

Text justification is solved by the grapheme- and sequence-aware functions ljust(), rjust(), center(), and wrap(), which serve as drop-in replacements for the Python standard functions of the same names.

The iterator functions iter_graphemes() and iter_sequences() allow for careful navigation of grapheme and terminal control sequence boundaries. iter_graphemes_reverse() and grapheme_boundary_before() are useful for editing and searching complex unicode. The clip() function extracts substrings by display column positions, and strip_sequences() removes terminal escape sequences from text altogether.

Discrepancies

You may find that support varies for complex unicode sequences or codepoints.

A companion utility, jquast/ucs-detect, was authored to gather and publish results for wide character support, language/grapheme clustering and complex script support, emoji and zero-width joiner sequences, variation selectors, and regional indicators (flags) as a General Tabulated Summary, by terminal emulator software and version.

Overview

A brief overview, through examples, of all the public API functions.

Full API Documentation at https://wcwidth.readthedocs.io/en/latest/api.html

wcwidth()

Measures the width of a single codepoint:

>>> import wcwidth
>>> # '♀' narrow emoji
>>> wcwidth.wcwidth('\u2640')
1

Use function wcwidth() to determine the display width of a single unicode character.

See specification of character measurements. Note that -1 is returned for control codes.
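To illustrate the kind of per-codepoint decision wcwidth() makes, here is a simplified, hypothetical stand-in built on the stdlib unicodedata tables. It is not the library's implementation: wcwidth uses its own generated tables and handles many special cases (emoji, Hangul Jamo, Mc marks, and more) that this sketch ignores.

```python
import unicodedata


def sketch_wcwidth(ch: str, ambiguous_width: int = 1) -> int:
    """Hypothetical, simplified stand-in for wcwidth.wcwidth().

    Uses stdlib unicodedata instead of wcwidth's generated tables and
    ignores special cases such as emoji and Hangul Jamo.
    """
    cp = ord(ch)
    if cp == 0:
        return 0                    # NUL has width 0, as in POSIX wcwidth(3)
    if cp < 0x20 or 0x7F <= cp < 0xA0:
        return -1                   # C0 and C1 control codes (and DEL)
    if unicodedata.category(ch) in ('Mn', 'Me'):
        return 0                    # nonspacing and enclosing combining marks
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ('W', 'F'):
        return 2                    # East Asian Wide and Fullwidth
    if eaw == 'A':
        return ambiguous_width      # East Asian Ambiguous
    return 1
```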

wcswidth()

Measures width of a string, returns -1 for control codes.

>>> # '♀️' emoji w/vs-16
>>> wcwidth.wcswidth('\u2640\ufe0f')
2

Use function wcswidth() to determine the display width of a string of unicode characters.

See specification of character measurements. Note that -1 is returned if control codes occur anywhere in the string.
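The "-1 anywhere means -1 overall" rule can be sketched as a sum over per-codepoint widths that stops at the first control code. Both functions below are hypothetical simplifications, not wcwidth's code. In particular, this naive sum cannot see sequences: for '\u2640\ufe0f' it returns 1, whereas wcswidth() returns 2 because VS-16 requests emoji (wide) presentation.

```python
import unicodedata


def _cells(ch: str) -> int:
    """Hypothetical per-codepoint width: -1 for controls, 0 for combining marks."""
    cp = ord(ch)
    if cp == 0:
        return 0
    if cp < 0x20 or 0x7F <= cp < 0xA0:
        return -1
    if unicodedata.category(ch) in ('Mn', 'Me'):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1


def sketch_wcswidth(text: str) -> int:
    """Sum per-codepoint widths; any control code makes the whole result -1."""
    total = 0
    for ch in text:
        width = _cells(ch)
        if width < 0:
            return -1
        total += width
    return total
```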

width()

Use function width() to measure a string with improved handling of control codes.

>>> # same support as wcswidth(), e.g. regional indicator flag:
>>> wcwidth.width('\U0001F1FF\U0001F1FC')
2
>>> # but also supports SGR colored text, 'WARN', followed by SGR reset
>>> wcwidth.width('\x1b[38;2;255;150;100mWARN\x1b[0m')
4
>>> # tabs,
>>> wcwidth.width('\t', tabsize=4)
4
>>> # or, tab and all other control characters can be ignored
>>> wcwidth.width('\t', control_codes='ignore')
0
>>> # "vertical" control characters are ignored
>>> wcwidth.width('\n')
0
>>> # as well as sequences with "indeterminate" effects like Home + Clear
>>> wcwidth.width('\x1b[H\x1b[2J')
0
>>> # or, raise ValueError for "indeterminate" effects using control_codes='strict'
>>> wcwidth.width('\n', control_codes='strict')
Traceback (most recent call last):
...
ValueError: Vertical movement character 0xa at position 0

Use control_codes='ignore' for slightly improved performance when the input is known not to contain any control characters or terminal sequences. Note that TAB ('\t') is a control character and is also ignored in this mode, so you may want to call str.expandtabs() first.
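A minimal sketch of expanding tabs before measuring, using hypothetical sample rows. After expansion no TAB remains, so each row could then be passed to wcwidth.width(row, control_codes='ignore').

```python
# Hypothetical tab-separated rows
rows = ['name\tvalue', 'id\t42']

# Replace each TAB with spaces up to the next multiple-of-8 column
expanded = [row.expandtabs(8) for row in rows]
assert all('\t' not in row for row in expanded)

for row in expanded:
    print(repr(row))
# 'name    value'
# 'id      42'
```

One caveat: str.expandtabs() computes tab stops by codepoint count, not display cells. For example, 'コ\tb'.expandtabs(4) inserts three spaces because 'コ' counts as one column, although the terminal draws it in two.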

iter_sequences()

Iterates through text, segmented by terminal sequences:

>>> list(wcwidth.iter_sequences('hello'))
[('hello', False)]
>>> list(wcwidth.iter_sequences('\x1b[31mred\x1b[0m'))
[('\x1b[31m', True), ('red', False), ('\x1b[0m', True)]

Use iter_sequences() to split text into segments of plain text and escape sequences. Each tuple contains the segment string and a boolean indicating whether it is an escape sequence (True) or text (False).
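The (segment, is_sequence) pairs can be approximated with a regular expression. This hypothetical sketch recognizes only CSI sequences (ESC [ parameters, intermediates, final byte), such as SGR colors. iter_sequences() handles more kinds of terminal sequences than this.

```python
import re
from typing import Iterator, Tuple

# CSI: ESC '[' parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, final byte 0x40-0x7E
CSI_PATTERN = re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]')


def split_csi(text: str) -> Iterator[Tuple[str, bool]]:
    """Hypothetical helper: yield (segment, True) for CSI sequences, (segment, False) for text."""
    pos = 0
    for match in CSI_PATTERN.finditer(text):
        if match.start() > pos:
            yield text[pos:match.start()], False   # plain text before the sequence
        yield match.group(), True
        pos = match.end()
    if pos < len(text):
        yield text[pos:], False                    # trailing plain text
```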

iter_graphemes()

Use iter_graphemes() to iterate over grapheme clusters of a string.

>>> import wcwidth
>>> # ok + Regional Indicator 'Z', 'W' (Zimbabwe)
>>> list(wcwidth.iter_graphemes('ok\U0001F1FF\U0001F1FC'))
['o', 'k', '🇿🇼']

>>> # cafe + combining acute accent
>>> list(wcwidth.iter_graphemes('cafe\u0301'))
['c', 'a', 'f', 'é']

>>> # ok + Emoji Man + ZWJ + Woman + ZWJ + Girl
>>> list(wcwidth.iter_graphemes('ok\U0001F468\u200D\U0001F469\u200D\U0001F467'))
['o', 'k', '👨\u200d👩\u200d👧']

A grapheme cluster is what a user perceives as a single character, even if it is composed of multiple Unicode codepoints. This function implements Unicode Standard Annex #29 grapheme cluster boundary rules.
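For contrast, a naive approach that only attaches combining marks (Unicode categories Mn, Mc, Me) to the preceding character handles 'cafe\u0301' but splits a regional-indicator flag or a ZWJ family into separate pieces. This hypothetical sketch shows why the full UAX #29 rules are needed. It is not how iter_graphemes() works.

```python
import unicodedata


def naive_clusters(text: str) -> list:
    """Hypothetical: attach combining marks (category M*) to the previous character.

    Real UAX #29 segmentation also keeps ZWJ sequences, regional-indicator
    pairs, and Hangul syllables together; this sketch does not.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith('M'):
            clusters[-1] += ch      # extend the current cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters
```

naive_clusters('ok\U0001F1FF\U0001F1FC') returns four items, whereas iter_graphemes() keeps the Zimbabwe flag as one cluster.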

ljust()

Use ljust() as replacement of str.ljust():

>>> 'コンニチハ'.ljust(11, '*')             # don't do this
'コンニチハ******'
>>> wcwidth.ljust('コンニチハ', 11, '*')    # do this!
'コンニチハ*'
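The idea behind a width-aware ljust() is to pad by display cells instead of codepoints. A minimal sketch, assuming a single-cell fillchar and a hypothetical stdlib-based _display_width() helper in place of wcwidth's measurement:

```python
import unicodedata


def _display_width(text: str) -> int:
    """Hypothetical helper: combining marks 0 cells, wide/fullwidth 2, others 1."""
    return sum(
        0 if unicodedata.category(ch) in ('Mn', 'Me')
        else 2 if unicodedata.east_asian_width(ch) in ('W', 'F')
        else 1
        for ch in text
    )


def width_ljust(text: str, width: int, fillchar: str = ' ') -> str:
    """Pad on the right until the text occupies `width` cells (never truncates)."""
    padding = max(0, width - _display_width(text))
    return text + fillchar * padding
```

width_ljust('コンニチハ', 11, '*') gives 'コンニチハ*', matching wcwidth.ljust() in the example above. A right-justifying variant would put the same padding before the text.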

rjust()

Use rjust() as replacement of str.rjust():

>>> 'コンニチハ'.rjust(11, '*')             # don't do this
'******コンニチハ'
>>> wcwidth.rjust('コンニチハ', 11, '*')    # do this!
'*コンニチハ'

center()

Use center() as replacement of str.center():

>>> 'cafe\u0301'.center(6, '*')             # don't do this
'café*'
>>> wcwidth.center('cafe\u0301', 6, '*')    # do this!
'*café*'
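Centering splits the missing cells between both sides. In this hypothetical sketch, which assumes a single-cell fillchar, an odd leftover cell goes to the right. That is an assumption, and wcwidth.center() may place it differently.

```python
import unicodedata


def _display_width(text: str) -> int:
    """Hypothetical helper: combining marks 0 cells, wide/fullwidth 2, others 1."""
    return sum(
        0 if unicodedata.category(ch) in ('Mn', 'Me')
        else 2 if unicodedata.east_asian_width(ch) in ('W', 'F')
        else 1
        for ch in text
    )


def width_center(text: str, width: int, fillchar: str = ' ') -> str:
    """Center by display cells; an odd leftover cell is placed on the right."""
    padding = max(0, width - _display_width(text))
    left = padding // 2
    return fillchar * left + text + fillchar * (padding - left)
```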

wrap()

Use function wrap() to wrap text containing terminal sequences, Unicode grapheme clusters, and wide characters to a given display width.

>>> from wcwidth import wrap
>>> # Basic wrapping
>>> wrap('hello world', 5)
['hello', 'world']

>>> # Wrapping CJK text (each character is 2 cells wide)
>>> wrap('コンニチハ', 4)
['コン', 'ニチ', 'ハ']

>>> # Text with ANSI color sequences - SGR codes are propagated by default
>>> # Each line ends with reset, next line starts with restored style
>>> wrap('\x1b[1;31mhello world\x1b[0m', 5)
['\x1b[1;31mhello\x1b[0m', '\x1b[1;31mworld\x1b[0m']
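The width-aware part of wrapping can be sketched as a greedy fill that starts a new line whenever the next character would overflow the cell budget. This hypothetical hard_wrap() breaks anywhere, with no word boundaries, terminal sequences, or SGR propagation, so it only reproduces the CJK example above, not wrap() in general.

```python
import unicodedata


def _cells(ch: str) -> int:
    """Hypothetical helper: combining marks 0 cells, wide/fullwidth 2, others 1."""
    if unicodedata.category(ch) in ('Mn', 'Me'):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1


def hard_wrap(text: str, width: int) -> list:
    """Greedily fill lines of at most `width` cells, breaking between any two characters."""
    lines, current, used = [], '', 0
    for ch in text:
        w = _cells(ch)
        if current and used + w > width:
            lines.append(current)       # this character would overflow: start a new line
            current, used = '', 0
        current += ch
        used += w
    if current:
        lines.append(current)
    return lines
```

hard_wrap('コンニチハ', 4) gives ['コン', 'ニチ', 'ハ'], the same as wrap('コンニチハ', 4) above.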

clip()

Use clip() to extract a substring by column positions, preserving terminal sequences.

>>> from wcwidth import clip
>>> # Wide characters cut by a clipping boundary are replaced by fillchar (default ' ')
>>> clip('中文字', 0, 3)
'中 '
>>> clip('中文字', 1, 5, fillchar='.')
'.文.'

>>> # SGR codes are propagated by default - result begins with active style
>>> # and ends with reset if styles are active
>>> clip('\x1b[1;31mHello world\x1b[0m', 6, 11)
'\x1b[1;31mworld\x1b[0m'

>>> # Disable SGR propagation to preserve original sequences as-is
>>> clip('\x1b[31m中文\x1b[0m', 0, 3, propagate_sgr=False)
'\x1b[31m中 \x1b[0m'
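The fillchar behaviour can be sketched by tracking each character's column span [column, column + width) and comparing it with the requested range [start, end). This hypothetical sketch handles only plain text of one- and two-cell characters; it does not handle terminal sequences, SGR propagation, or zero-width characters, all of which clip() takes care of.

```python
import unicodedata


def _cells(ch: str) -> int:
    """Hypothetical helper: 2 cells for wide/fullwidth characters, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1


def sketch_clip(text: str, start: int, end: int, fillchar: str = ' ') -> str:
    """Keep the cells in columns [start, end); half-covered wide characters become fillchar."""
    out = []
    column = 0
    for ch in text:
        w = _cells(ch)
        # number of this character's cells that fall inside [start, end)
        overlap = min(column + w, end) - max(column, start)
        if overlap == w:
            out.append(ch)                   # entirely inside the range
        elif overlap > 0:
            out.append(fillchar * overlap)   # wide character cut by a boundary
        column += w
        if column >= end:
            break
    return ''.join(out)
```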

strip_sequences()

Use strip_sequences() to remove all terminal escape sequences from text.

>>> from wcwidth import strip_sequences
>>> strip_sequences('\x1b[31mred\x1b[0m')
'red'
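A regular-expression sketch of the same idea. It covers only two common families: CSI sequences, such as SGR colors, and OSC sequences, such as hyperlinks, terminated by BEL or ST (ESC \). strip_sequences() recognizes more than this hypothetical pattern does.

```python
import re

# CSI (ESC [ ... final byte) or OSC (ESC ] ... terminated by BEL or ESC \)
ESCAPES = re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)')


def sketch_strip(text: str) -> str:
    """Hypothetical helper: remove CSI and OSC escape sequences from text."""
    return ESCAPES.sub('', text)
```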

ambiguous_width

Some Unicode characters have “East Asian Ambiguous” (A) width. These characters display as 1 cell by default, matching Western terminal contexts, but many CJK (Chinese, Japanese, Korean) environments prefer 2 cells. This is often exposed as a boolean option, “Ambiguous width as wide”, in Terminal Emulator software preferences.

By default, wcwidth treats ambiguous characters as narrow (width 1). For CJK environments where your terminal is configured to display ambiguous characters as double-width, pass ambiguous_width=2:

>>> # CIRCLED DIGIT ONE - ambiguous width
>>> wcwidth.width('\u2460')
1
>>> wcwidth.width('\u2460', ambiguous_width=2)
2

The ambiguous_width parameter is available on all width-measuring functions: wcwidth(), wcswidth(), width(), ljust(), rjust(), center(), wrap(), and clip().
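To find out which characters of a string may be affected by this setting, the East Asian Width property can be read from the stdlib. A minimal sketch with a hypothetical helper name:

```python
import unicodedata


def ambiguous_chars(text: str) -> list:
    """Hypothetical helper: characters whose East Asian Width property is 'A'."""
    return [ch for ch in text if unicodedata.east_asian_width(ch) == 'A']


sample = 'No. \u2460 of 3'
print(ambiguous_chars(sample))   # ['①']
```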

Terminal Detection

The most reliable method to detect whether a terminal profile is set for “Ambiguous width as wide” mode is to display an ambiguous character surrounded by a pair of Cursor Position Report (CPR) queries, with the terminal in cooked or raw mode, then parse the responses for their (y, x) locations and measure the difference in x.

Such code should also check whether it is attached to a terminal, and guard against timeouts, slow networks, or no response at all when working with “dumb terminals” like a CI build.

The jquast/blessed library provides such a helper, the Terminal.detect_ambiguous_width() method:

>>> import functools
>>> import blessed
>>> import wcwidth
>>> # Detect terminal ambiguous width as wide (2) or narrow (1)
>>> ambiguous_width = blessed.Terminal().detect_ambiguous_width()
>>> # Define a new 'width' function with this argument
>>> awidth = functools.partial(wcwidth.width, ambiguous_width=ambiguous_width)
>>> # result depends on attached terminal mode
>>> awidth('\u2460')
1
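For readers who want to see the CPR technique itself, here is a rough sketch for Unix-like systems only (termios, tty, select). probe_ambiguous_width() is a hypothetical name, not blessed's implementation. It writes U+2460 between two ESC [ 6 n queries, reads the ESC [ row ; column R replies in raw mode with a timeout, restores the terminal settings, and returns the column difference, or None when not attached to a terminal or when no reply arrives.

```python
import os
import re
import select
import sys
import termios
import tty
from typing import Optional, Tuple

# Cursor Position Report reply: ESC [ row ; column R
CPR_REPLY = re.compile(r'\x1b\[(\d+);(\d+)R')


def parse_cpr(reply: str) -> Optional[Tuple[int, int]]:
    """Return (row, column) from a CPR reply, or None if none is found."""
    match = CPR_REPLY.search(reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def _cursor_column(timeout: float) -> Optional[int]:
    """Send a CPR query and wait up to `timeout` seconds for each chunk of the reply."""
    os.write(sys.stdout.fileno(), b'\x1b[6n')
    reply = b''
    while not reply.endswith(b'R'):
        ready, _, _ = select.select([sys.stdin.fileno()], [], [], timeout)
        if not ready:
            return None  # no answer: a "dumb terminal", CI log, or slow link
        reply += os.read(sys.stdin.fileno(), 32)
    position = parse_cpr(reply.decode('ascii', 'replace'))
    return position[1] if position else None


def probe_ambiguous_width(timeout: float = 1.0) -> Optional[int]:
    """Hypothetical probe: measure how many columns U+2460 advances the cursor."""
    if not (sys.stdin.isatty() and sys.stdout.isatty()):
        return None  # not attached to a terminal
    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)  # raw mode: no line buffering and no echo of the reply
        before = _cursor_column(timeout)
        os.write(sys.stdout.fileno(), '\u2460'.encode('utf-8'))
        after = _cursor_column(timeout)
        if before is not None:
            # move back to the starting column (CHA) and erase the probe glyph
            os.write(sys.stdout.fileno(), b'\x1b[%dG\x1b[K' % before)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    if before is None or after is None:
        return None
    return after - before
```

A result of 2 suggests “Ambiguous width as wide” is enabled. The value could then be passed as ambiguous_width, as in the functools.partial example above.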

Developing

Install wcwidth in editable mode:

pip install -e .

Execute all code generation, autoformatters, linters and unit tests using tox:

tox

Or execute individual tasks; see tox -lv for all available targets:

tox -e pylint,py36,py314

To run tests with detailed coverage reporting showing missing lines:

tox -e py314 -- --cov-report=term-missing

Updating Unicode Version

Regenerate python code tables from latest Unicode Specification data files:

tox -e update

The script is located at bin/update-tables.py and requires Python 3.9 or later. Running it with the newest Python is recommended but not required, because the newest Python ships the latest unicodedata, which is used for generating comments.

Building Documentation

This project uses Sphinx 4.5 to build documentation:

tox -e sphinx

The output will be in docs/_build/html/.

Updating Requirements

This project uses pip-tools to manage requirements.

To upgrade requirements for updating unicode version, run:

tox -e update_requirements_update

To upgrade requirements for testing, run:

tox -e update_requirements38,update_requirements39

To upgrade requirements for building documentation, run:

tox -e update_requirements_docs

Utilities

Supplementary tools for browsing and testing terminals for wide unicode characters are found in the bin/ folder of this project’s source code. First run pip install -r requirements-develop.txt from the project’s main folder. For example, an interactive browser for testing:

python ./bin/wcwidth-browser.py

Uses

This library is used in:

Other Languages

There are similar implementations of the wcwidth() and wcswidth() functions in other languages.

History

0.6.0 2026-02-06
  • New Parameters expand_tabs, replace_whitespace, fix_sentence_endings, drop_whitespace, max_lines, and placeholder for wrap(), completing stdlib textwrap.wrap() compatibility.

0.5.3 2026-01-30
0.5.2 2026-01-29
  • Bugfix Measurement of category Mc (Spacing Combining Mark), approx. 443, has a more nuanced specification, and may be categorized as either zero or wide. PR #200.

  • Bugfix Measurement of “standalone” modifiers and regional indicators, PR #202.

  • Updated Data files used in some automatic tests are no longer distributed. PR #199

0.5.1 2026-01-27
  • Updated generated zero and wide code tables to length of 1 to complete the previously announced removal of historical wide and zero tables. PR #196.

0.5.0 2026-01-26
  • Drop Support of many historical versions of wide and zero unicode tables. Only the latest Unicode version (17.0.0) is now shipped. The related unicode_version='auto' keyword of the wcwidth() family of functions is ignored, and list_versions() always returns a single-element tuple containing the only supported unicode version. PR #195.

  • Performance improvement of 20% for the most common call, made without version or ambiguous_width specified. PR #195.

  • New Function propagate_sgr() for applying SGR state propagation to a list of lines. PR #194.

  • Improved wrap() and clip() with propagate_sgr=True. PR #194.

  • Bugfix clip() zero-width characters at clipping boundaries. PR #194.

  • Bugfix OSC Hyperlinks when broken mid-text by wrap(). PR #193.

0.4.0 2026-01-25
0.3.5 2026-01-24
  • Bugfix packaging of 0.3.4 contains a failing test.

0.3.4 2026-01-24
0.3.3 2026-01-24
0.3.2 2026-01-23
  • Updated type hinting for full mypy --strict compliance. PR #183.

0.3.1 2026-01-22
  • Performance improvement of up to 30% in width(). PR #181.

0.3.0 2026-01-21
0.2.14 2025-09-22
  • Drop Support for Python 2.7 and 3.5. PR #117.

  • Update tables to include Unicode Specifications 16.0.0 and 17.0.0. PR #146.

  • Bugfix U+00AD SOFT HYPHEN should measure as 1, versions 0.2.9 through 0.2.13 measured as 0. PR #149.

0.2.13 2024-01-06
  • Bugfix zero-width support for Hangul Jamo (Korean)

0.2.12 2023-11-21
  • Bugfix Re-release to remove .pyi files misplaced in wheel Issue #101.

0.2.11 2023-11-20
  • Updated Include test files in the source distribution (PR #98, PR #100).

0.2.10 2023-11-13
  • Bugfix accounting of some kinds of emoji sequences using U+FE0F Variation Selector 16 (PR #97).

  • Updated specification.

0.2.9 2023-10-30
  • Bugfix zero-width characters used in Emoji ZWJ sequences, Balinese, Jamo, Devanagari, Tamil, Kannada and others (PR #91).

  • Updated to include specification of character measurements.

0.2.8 2023-09-30
  • Include requirements files in the source distribution (PR #82).

0.2.7 2023-09-28
  • Updated tables to include Unicode Specification 15.1.0.

  • Include bin, docs, and tox.ini in the source distribution

0.2.6 2023-01-14
  • Updated tables to include Unicode Specification 14.0.0 and 15.0.0.

  • Changed developer tools to use pip-compile, and to use jinja2 templates for code generation in bin/update-tables.py to prepare for possible compiler optimization release.

0.2.1 .. 0.2.5 2020-06-23
  • Repository changes to update tests and packaging issues, and begin tagging repository with matching release versions.

0.2.0 2020-06-01
  • Enhancement: Unicode version may be selected by exporting the Environment variable UNICODE_VERSION, such as 13.0, or 6.3.0. See the jquast/ucs-detect CLI utility for automatic detection.

  • Enhancement: API Documentation is published to readthedocs.io.

  • Updated tables for all Unicode Specifications with files published in a programmatically consumable format, versions 4.1.0 through 13.0

0.1.9 2020-03-22
  • Performance optimization by Avram Lubkin, PR #35.

  • Updated tables to Unicode Specification 13.0.0.

0.1.8 2020-01-01
  • Updated tables to Unicode Specification 12.0.0. (PR #30).

0.1.7 2016-07-01
  • Updated tables to Unicode Specification 9.0.0. (PR #18).

0.1.6 2016-01-08 Production/Stable
  • LICENSE file now included with distribution.

0.1.5 2015-09-13 Alpha
  • Bugfix: Resolved the “combining character width” issue; most especially, characters that previously returned -1 now often (correctly) return 0. Resolved by Philip Craig via PR #11.

  • Deprecated: The module path wcwidth.table_comb is no longer available; it has been superseded by module path wcwidth.table_zero.

0.1.4 2014-11-20 Pre-Alpha
0.1.3 2014-10-29 Pre-Alpha
0.1.2 2014-10-28 Pre-Alpha
0.1.1 2014-05-14 Pre-Alpha
  • Initial release to pypi, Based on Unicode Specification 6.3.0

This code was originally derived directly from C code of the same name, whose latest version is available at https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c:

* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
* for any purpose and without fee is hereby granted. The author
* disclaims all warranties with regard to this software.
