CLI control characters and escape sequences viewer/visualizer
CLI application for visualising usually invisible characters and bytes:
- whitespace characters;
- ASCII control characters;
- ANSI escape sequences;
- UTF-8 encoded characters;
- binary data.
Installation
Via pipx
pipx install kolombos
Without pipx
System-wide install (sudo)
python -m pip install kolombos
User install (no sudo)
python -m pip install --user kolombos
export PATH="${PATH}:${HOME}/.local/bin/"
Usage
The application can be useful for a variety of tasks, e.g. browsing unknown data formats, searching for patterns, or debugging combinations of SGR sequences.
USAGE
kolombos [[--text] | --binary] [<options>] [--demo | <file>]
INPUT
<file> file to read from; if empty or "-", read stdin
instead; ignored if --demo is present
-M, --demo show output examples and exit; see --legend for the
description
OPERATING MODE
-t, --text open file in text mode [this is a default]
-b, --binary open file in binary mode
-l, --legend show annotation symbol list and exit
-v, --version show app version and exit
-h, --help show this help message and exit
[...]
Text mode and binary mode
kolombos can work in two primary modes: text and binary. The differences between them are line-by-line input reading in text mode vs. fixed-size byte chunk reading in binary mode, and extended output in binary mode, which consists of a text representation (similar to text mode) and hexadecimal byte values.
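The difference between the two reading strategies can be sketched in a few lines of Python. This is a minimal illustration, not kolombos' actual implementation: the function names are hypothetical and the 16-byte chunk size is arbitrary.

```python
import io
from typing import BinaryIO, Iterator


def read_text_mode(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the input line by line (hypothetical helper)."""
    # Iterating a binary stream splits on b"\n" and keeps the newline byte,
    # so other invisible bytes such as \r or \x01 stay inside the line.
    yield from stream


def read_binary_mode(stream: BinaryIO, chunk_size: int = 16) -> Iterator[bytes]:
    """Yield the input in fixed-size chunks, ignoring line breaks (hypothetical helper)."""
    while chunk := stream.read(chunk_size):
        yield chunk


if __name__ == "__main__":
    data = b"SELECT 1;\x01FROM t\n\x1b[31mred\x1b[0m\n"
    print(list(read_text_mode(io.BytesIO(data))))
    print(list(read_binary_mode(io.BytesIO(data), 8)))
```

In text mode the \x01 byte stays inside its line, while binary mode cuts the input at fixed offsets regardless of where the line feeds are.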
As the option list below shows, some settings are shared by both modes, while others are specific to one of them.
GENERIC OPTIONS
-f, --buffer <size> read buffer size, in bytes [default: 4096]
-L, --max-lines <num> stop after reading <num> lines [default: no limit]
-B, --max-bytes <num> stop after reading <num> bytes [default: no limit]
-D, --debug enable debug mode; can be used from 1 to 4 times,
each level increases verbosity (-D|DD|DDD|DDDD)
--color-markers apply SGR marker format to themselves
TEXT MODE OPTIONS
-m, --marker <details> marker details: 0 is none, 1 is brief, 2 is full
[default: 0]
--no-separators do not print ⢸separators⡇ around escape sequences
--no-line-numbers do not print line numbers
BINARY MODE OPTIONS
-w, --columns <num> format output as <num>-columns wide table [default: auto]
-d, --decode decode valid UTF-8 sequences, print as unicode chars
--decimal-offsets output offsets in decimal format [default: hex format]
--no-offsets do not print offsets
[...]
Character classes
There are 6 different character classes, and each of those can be displayed normally, highlighted (or focused) or ignored.
character class | byte ranges | focus flag | ignore flag | examples |
---|---|---|---|---|
whitespace | 09-0d 20 | -s | -S | space, line feed, carriage return |
control char | 01-08 0e-1f | -c | -C | null byte, backspace, delete |
printable char | 21-7e | -p | -P | ASCII alphanumeric and punctuation characters: A-Z, a-z, 0-9, (), [] |
escape sequence | 1b[..] | -e | -E | ANSI escape sequences controlling cursor position, color, font styling, and other terminal options |
UTF-8 sequence | various | -u | -U | valid UTF-8 byte sequences that can be decoded into Unicode characters |
binary data | 80-ff | -i | -I | standalone non-(7-bit)-ASCII bytes |
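The table above can be turned into a simple classifier. This is a sketch under stated assumptions, not kolombos' own algorithm: the escape sequence pattern is a simplified CSI form (no intermediate bytes), and bytes 00 (null) and 7f (delete) are put into the control class only because the table's examples mention them. The names classify and utf8_length are hypothetical.

```python
import re
from typing import List, Tuple

# Single-byte ranges taken from the table. 00 and 7f are added to the
# control class because the examples column lists them (assumption).
WHITESPACE = set(range(0x09, 0x0E)) | {0x20}
CONTROL = set(range(0x00, 0x09)) | set(range(0x0E, 0x20)) | {0x7F}
PRINTABLE = set(range(0x21, 0x7F))

# Simplified CSI: ESC, "[", parameter bytes 0x30-0x3f, one final byte 0x40-0x7e.
ESCAPE_SEQ = re.compile(rb"\x1b\[[\x30-\x3f]*[\x40-\x7e]")


def utf8_length(data: bytes, i: int) -> int:
    """Length (2-4) of a valid multi-byte UTF-8 sequence starting at i, or 0."""
    for length in (2, 3, 4):
        chunk = data[i:i + length]
        if len(chunk) < length:
            break
        try:
            chunk.decode("utf-8")
        except UnicodeDecodeError:
            continue
        return length
    return 0


def classify(data: bytes) -> List[Tuple[str, bytes]]:
    """Split input into (character class, bytes) pairs."""
    result = []
    i = 0
    while i < len(data):
        match = ESCAPE_SEQ.match(data, i)
        if match:
            result.append(("escape sequence", match.group()))
            i = match.end()
            continue
        byte = data[i]
        length = utf8_length(data, i) if byte >= 0x80 else 0
        if length:
            result.append(("UTF-8 sequence", data[i:i + length]))
            i += length
            continue
        if byte in WHITESPACE:
            cls = "whitespace"
        elif byte in CONTROL:
            cls = "control char"  # includes a lone ESC not followed by "["
        elif byte in PRINTABLE:
            cls = "printable char"
        else:
            cls = "binary data"  # a standalone byte in 80-ff
        result.append((cls, data[i:i + 1]))
        i += 1
    return result
```

Escape sequences are matched first because their ESC byte would otherwise be reported as a plain control char.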
Examples
Control and whitespace characters
Let's take a look at one of the files from somebody's home directory, .psql_history. At first sight it's a regular text file:
But what if we look a bit deeper into it?
kolombos reveals characters that were hidden until now: not only spaces and line breaks, but also some control characters, namely 01 START OF HEADING ASCII bytes, which postgresql uses to store multiline queries.
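If you want to recover the original queries from such a file, the encoding described above can be reversed. A minimal sketch, assuming each history entry occupies one line and every 01 byte inside it stands for a line feed of a multiline query; decode_history is a hypothetical helper.

```python
from typing import List

SOH = b"\x01"  # START OF HEADING, stored in place of line feeds (assumption)


def decode_history(raw: bytes) -> List[str]:
    """Split a .psql_history-like blob into entries, restoring multiline queries."""
    entries = []
    for line in raw.splitlines():
        # Turn each 01 byte back into a line feed, then decode as UTF-8
        # (undecodable bytes become U+FFFD instead of raising).
        entries.append(line.replace(SOH, b"\n").decode("utf-8", errors="replace"))
    return entries
```

For a real file, the input could come from Path.home().joinpath(".psql_history").read_bytes() (pathlib).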
The red symbol is an example of a marker: a special sigil that indicates an invisible character in the input. Sigils were selected for dissimilarity and noticeability, which helps to spot them as quickly as possible. Control char and escape sequence markers also provide some details about the original input byte(s); text mode has three different levels of these details.
- Level 0 is no details, just the marker itself (this is the default).
- Level 1 is medium details: one extra character for control chars and a varying amount for escape sequences. For most control characters the second char corresponds to their caret notation, e.g. ⱯA should be read as ^A [wiki].
- Level 2 is the maximum amount of verbosity. For control chars it's their 2-digit hexadecimal value.
Also note the -c option in the last example below, which tells the application to highlight control characters and make them even more noticeable.
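The level 1 and level 2 details for a control byte follow simple rules: caret notation flips bit 0x40 of the byte (0x01 becomes A, 0x7f becomes ?), and level 2 is the two-digit hex value. A sketch of these rules only; control_char_details is a hypothetical name, and characters with unique sigils (such as the null byte) are not treated specially here.

```python
def control_char_details(byte: int, level: int) -> str:
    """Return the extra marker text for a control byte at marker level 0, 1 or 2."""
    if not (0x00 <= byte <= 0x1F or byte == 0x7F):
        raise ValueError(f"not an ASCII control byte: {byte:#04x}")
    if level == 0:
        return ""  # level 0: the marker alone, no details
    if level == 1:
        # Caret notation: XOR with 0x40, e.g. 0x01 -> "A", 0x1b -> "[", 0x7f -> "?"
        return chr(byte ^ 0x40)
    if level == 2:
        return f"{byte:02x}"  # level 2: two-digit hexadecimal value
    raise ValueError(f"unknown marker level: {level}")
```

For example, "Ɐ" + control_char_details(0x01, 1) gives "ⱯA", the marker from the text above.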
Some control characters have unique sigils, for example the null byte (see Legend).
A few more examples of option combinations. The first one is the --focus-space flag, or -s, which can be useful in situations where whitespace characters are the points of interest, but the input is a mess of different character classes.
The second example is the result of running the app with the --ignore-space and --ignore-printable options; now almost nothing stands in the way of observing our precious control characters (if that's what you were after, that is):
ANSI escape sequences
Escape sequences and their overlapping combinations were the main reason for me to develop this application. For those who don't know much about them, here are some comprehensive materials: [one] [two].
kolombos can distinguish a few types of escape sequences, but the most interesting and frequent type is the SGR sequence, which consists of the escape control character 1b, a square bracket [, zero or more digit params separated by ;, and the m character (terminator). Let me illustrate.
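The SGR structure described above maps directly onto a regular expression. A minimal Python sketch (find_sgr is a hypothetical helper); it accepts an empty parameter list, as in ESC[m, and deliberately ignores colon-separated subparams such as 4:3.

```python
import re
from typing import List, Tuple

# ESC, "[", digit params separated by ";" (possibly none), and the "m" terminator.
SGR_PATTERN = re.compile(rb"\x1b\[([0-9;]*)m")


def find_sgr(data: bytes) -> List[Tuple[int, List[int]]]:
    """Return (offset, params) for every SGR sequence found in data."""
    found = []
    for match in SGR_PATTERN.finditer(data):
        raw_params = match.group(1)
        # An empty param (as in ESC[m) is read as 0, i.e. a reset.
        params = [int(p) if p else 0 for p in raw_params.split(b";")]
        found.append((match.start(), params))
    return found
```

Sequences with other terminators, such as ESC[K, are not SGRs and are skipped.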
SGR sequences are used for terminal text coloring and formatting. Consider this command with the following output:
kolombos can show us exactly what is happening there:
There are 3 different types of markers in the example above:
- ǝ is a sigil for a regular SGR sequence (which, for example, sets the color of the following text to red);
- θ is a reset SGR sequence (ESC[0m), which completely disables all previously set colors and effects;
- Ͻ is a CSI sequence (a more general sequence class which includes SGRs): these also begin with ESC[, but have different terminator characters; in general, they control cursor position.
- Other types are listed in the Legend section.
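The distinction between these three marker types comes down to the terminator and the parameters. A sketch under simplifying assumptions: a sequence counts as a reset only when its parameter string is empty or exactly 0, and intermediate bytes before the terminator are not supported. marker_for is a hypothetical name; only the three sigils from the list above are used.

```python
import re

# Simplified CSI: ESC, "[", parameter bytes 0x30-0x3f, one final byte 0x40-0x7e.
CSI_PATTERN = re.compile(rb"\x1b\[([\x30-\x3f]*)([\x40-\x7e])")


def marker_for(seq: bytes) -> str:
    """Pick the marker sigil for a single CSI sequence."""
    match = CSI_PATTERN.fullmatch(seq)
    if match is None:
        raise ValueError(f"not a CSI sequence: {seq!r}")
    params, final = match.groups()
    if final != b"m":
        return "Ͻ"  # CSI with another terminator, e.g. cursor control
    if params in (b"", b"0"):
        return "θ"  # reset SGR (assumption: only ESC[m and ESC[0m)
    return "ǝ"  # regular SGR
```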
For this example, binary mode would be more convenient.
As a rule of thumb, the only underlined bytes in kolombos' output are the bytes that correspond to escape sequences' params, introducers or terminators (but not the 1b|ESC character itself).
Now it's clear which sequences are located where:
- ǝ[35m: SGR that sets text color to magenta;
- Ͻ[K: CSI that erases all characters from the cursor to the end of the current line;
- θ[m: SGR that resets, or disables, all formatting;
- ǝ[01;91m: SGR that sets text style to bold and text color to bright red, etc.
SGR sequences can optionally be highlighted with their own colors using --color-markers, which is disabled by default. In this particular case, an even clearer picture can be seen after launching the app with the -P option (--ignore-printable):
Also notice that in binary mode each input byte corresponds strictly to one hex value and one text representation character. That means the -m option is always equal to 2 (maximum verbosity) and cannot be changed.
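This one-to-one correspondence between bytes, hex values and text cells is what keeps a binary dump aligned. A minimal sketch of such a layout, loosely mirroring the --columns and --decimal-offsets options; format_rows, the 8-digit hex offset width and the "." placeholder for non-printable bytes are assumptions, not kolombos' real output format.

```python
from typing import List


def format_rows(data: bytes, columns: int = 8, decimal_offsets: bool = False) -> List[str]:
    """Lay out data as offset, hex values and text cells, one row per `columns` bytes."""
    rows = []
    for offset in range(0, len(data), columns):
        chunk = data[offset:offset + columns]
        offset_str = str(offset) if decimal_offsets else f"{offset:08x}"
        # Each byte becomes exactly one 2-digit hex value...
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # ...and exactly one text cell ("." stands in for anything non-printable).
        text_part = "".join(chr(b) if 0x21 <= b <= 0x7E else "." for b in chunk)
        # Pad a short last row so the text column stays aligned.
        rows.append(f"{offset_str}  {hex_part.ljust(columns * 3 - 1)}  {text_part}")
    return rows
```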
UTF-8 and binary data
There is no limitation on the input byte range in kolombos text mode: binary data is displayed with the replacement character Ḇ:
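Python's codecs module lets you plug in a custom decoding error handler, which is one way to get this kind of behaviour. A sketch assuming one Ḇ per undecodable byte; the handler name kolombos-sketch-replace and the decode_text_line function are hypothetical.

```python
import codecs


def _one_sigil_per_byte(error: UnicodeError):
    """Decoding error handler: one "Ḇ" per undecodable byte, then resume."""
    if not isinstance(error, UnicodeDecodeError):
        raise error
    return "Ḇ" * (error.end - error.start), error.end


# Hypothetical handler name, registered once at import time.
codecs.register_error("kolombos-sketch-replace", _one_sigil_per_byte)


def decode_text_line(line: bytes) -> str:
    """Decode a line as UTF-8, showing undecodable bytes as Ḇ."""
    return line.decode("utf-8", errors="kolombos-sketch-replace")
```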
But it's better and faster to work with binary data in binary mode. Valid UTF-8 sequences and escape sequences can be seen even in completely random byte data:
In text mode, UTF-8 sequences are automatically decoded and displayed as Unicode characters. In binary mode, for faster data processing, they are displayed as boxes by default, but can still be decoded with the -d|--decode option (note the same requirement as for escape sequence markers: hex value length must always correspond to text representation length):
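Keeping the text representation as wide as the hex representation means a decoded multi-byte character has to occupy as many text cells as it has bytes. A sketch assuming one decoded glyph followed by filler cells; the "_" filler, the "□" box glyph and the text_cells function are hypothetical, and double-width glyphs are not accounted for.

```python
from typing import List

FILLER = "_"  # hypothetical padding cell for the 2nd..4th byte of a character
BOX = "□"  # hypothetical glyph for an undecoded UTF-8 sequence


def valid_utf8_length(data: bytes, i: int) -> int:
    """Length (1-4) of the valid UTF-8 sequence starting at i, or 0 if none."""
    for length in (1, 2, 3, 4):
        chunk = data[i:i + length]
        if len(chunk) < length:
            break
        try:
            chunk.decode("utf-8")
        except UnicodeDecodeError:
            continue
        return length
    return 0


def text_cells(data: bytes, decode: bool) -> List[str]:
    """One text cell per input byte, so text width always matches hex width."""
    cells: List[str] = []
    i = 0
    while i < len(data):
        length = valid_utf8_length(data, i)
        if length > 1:
            # Multi-byte character: the glyph (or a box) plus filler cells.
            cells.append(data[i:i + length].decode("utf-8") if decode else BOX)
            cells.extend([FILLER] * (length - 1))
            i += length
        else:
            byte = data[i]
            cells.append(chr(byte) if 0x21 <= byte <= 0x7E else ".")
            i += 1
    return cells
```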
Legend
Even more information can be seen after running kolombos --legend.
Changelog
v1.5.4
- FIX: reverted default column amount in --demo mode
v1.5.3
- FIX: errors while processing SGR with subparams (e.g. 4:3;)
v1.5.2
- UPDATE: icon redraw
v1.5.1
- FIX: packaging assets
v1.5
- NEW: --demo mode
v1.4.1
- Temporarily injected pytermor v2.1
v1.4
- REFACTOR: base colors
- REFACTOR: extended legend
- DOCS: update README and screenshots
v1.3
- Swap -D and -d (debug/decode)
- Make '--marker 0' default (was 1)
- Update legend
- Upgrade pytermor to 2.1
v1.2.1
- Minor update.
v1.2
- Separators additional styling.
- Separators auto-hide from -m0.
- --no-sep[arators] launch option.
- run dev script for quick launch of repo versions.
- Updated output format of SGR color prefixes.
- SGR labels are now getting colored instead of marker details (if -m0 is set).
- Updated legend.
v1.1
- Additional separators around escape seqs (in text mode) for better readability.
v1.0.2
- Added logos.
- Fixed PyPI README images.
v1.0.1
- First public version.
Download files
File details
Details for the file kolombos-1.5.4.tar.gz.
File metadata
- Download URL: kolombos-1.5.4.tar.gz
- Upload date:
- Size: 69.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | b0bdf11fe40661bb6ed086236af5dbbdf842dcf576cdcb472f7cd9ae521d035a |
MD5 | 63afbe0e7c9e21bd7141ded663135d10 |
BLAKE2b-256 | e1dd979f6046307bb44dcbccd48a3e6146d2937ef898681d55fa9b9cf54cf06d |
File details
Details for the file kolombos-1.5.4-py3-none-any.whl.
File metadata
- Download URL: kolombos-1.5.4-py3-none-any.whl
- Upload date:
- Size: 85.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 496a7c6a4f8f5a8f84089ad1b46dc91b242b6429bf36535d228e0438e0347e6a |
MD5 | d2daca46ea14d228574e00a139b1eb41 |
BLAKE2b-256 | d918dc34bc09e5bc78821013ef102c50a7b32b9939ce353129b0bc16e4987c7d |