
Project description

Lexical analysis functions, tokenisers, transcribers: an arbitrary assortment of lexical and tokenisation functions useful for writing recursive descent parsers, of which I have several. There are also some transcription functions for producing text from various objects, such as hexify and unctrl.

Latest release 20241119: stripped_dedent: new optional sub_indent parameter for indenting the second and following lines, handy for usage messages.

Generally the get_* functions accept a source string and an offset (usually optional, default 0) and return a token and the new offset, raising ValueError on failed tokenisation.

as_lines(chunks, partials=None)

Generator yielding complete lines from arbitrary pieces of text from the iterable of str chunks.

After completion, any remaining newline-free chunks remain in the partials list; they will be unavailable to the caller unless the list is presupplied.
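
A usage sketch based on the description above (the chunk boundaries here are arbitrary, and the exact contents of partials are illustrative):

from cs.lex import as_lines

partials = []
for line in as_lines(['ab', 'c\nde', 'f'], partials):
    print(repr(line))   # only complete lines are yielded, e.g. 'abc\n'
print(partials)         # the unterminated tail ('def' or similar) remains here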

camelcase(snakecased, first_letter_only=False)

Convert a snake cased string snakecased into camel case.

Parameters:

  • snakecased: the snake case string to convert
  • first_letter_only: optional flag (default False); if true then just ensure that the first character of a word is uppercased, otherwise use str.title

Example:

>>> camelcase('abc_def')
'abcDef'
>>> camelcase('ABc_def')
'abcDef'
>>> camelcase('abc_dEf')
'abcDef'
>>> camelcase('abc_dEf', first_letter_only=True)
'abcDEf'

common_prefix(*strs)

Return the common prefix of the strings strs.

Examples:

>>> common_prefix('abc', 'def')
''
>>> common_prefix('abc', 'abd')
'ab'
>>> common_prefix('abc', 'abcdef')
'abc'
>>> common_prefix('abc', 'abcdef', 'abz')
'ab'
>>> # contrast with cs.fileutils.common_path_prefix
>>> common_prefix('abc/def', 'abc/def1', 'abc/def2')
'abc/def'

common_suffix(*strs)

Return the common suffix of the strings strs.
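
By analogy with the common_prefix examples above:

>>> common_suffix('abc', 'def')
''
>>> common_suffix('abc', 'zbc')
'bc'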

cropped(s: str, max_length: int = 32, roffset: int = 1, ellipsis: str = '...')

If the length of s exceeds max_length (default 32), replace enough of the tail with ellipsis and the last roffset (default 1) characters of s to fit in max_length characters.
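
A rough sketch of the expected effect (outputs shown as comments, not asserted):

cropped('abcdefghijklmnop', max_length=8)   # -> something like 'abcd...p' (8 characters: ellipsis plus the last roffset=1 character)
cropped('short')                            # -> 'short' (already within max_length)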

cropped_repr(o, roffset=1, max_length=32, inner_max_length=None)

Compute a cropped repr() of o.

Parameters:

  • o: the object to represent
  • max_length: the maximum length of the representation, default 32
  • inner_max_length: the maximum length of the representations of members of o, default max_length//2
  • roffset: the number of trailing characters to preserve, default 1

cutprefix(s, prefix)

Strip a prefix from the front of s. Return the suffix if s.startswith(prefix), else s.

Example:

>>> abc_def = 'abc.def'
>>> cutprefix(abc_def, 'abc.')
'def'
>>> cutprefix(abc_def, 'zzz.')
'abc.def'
>>> cutprefix(abc_def, '.zzz') is abc_def
True

cutsuffix(s, suffix)

Strip a suffix from the end of s. Return the prefix if s.endswith(suffix), else s.

Example:

>>> abc_def = 'abc.def'
>>> cutsuffix(abc_def, '.def')
'abc'
>>> cutsuffix(abc_def, '.zzz')
'abc.def'
>>> cutsuffix(abc_def, '.zzz') is abc_def
True

Class FFloat(FNumericMixin, builtins.float)

Formattable float.

Class FInt(FNumericMixin, builtins.int)

Formattable int.

Class FNumericMixin(FormatableMixin)

A FormatableMixin subclass.

FNumericMixin.localtime(self): Treat this as a UNIX timestamp and return a localtime datetime.

FNumericMixin.utctime(self): Treat this as a UNIX timestamp and return a UTC datetime.

format_as(format_s: str, format_mapping, formatter=None, error_sep=None, strict=None)

Format the string format_s using Formatter.vformat, return the formatted result. This is a wrapper for str.format_map which raises a more informative FormatAsError exception on failure.

Parameters:

  • format_s: the format string to use as the template
  • format_mapping: the mapping of available replacement fields
  • formatter: an optional string.Formatter-like instance with a .vformat(format_string,args,kwargs) method, usually a subclass of string.Formatter; if not specified then FormatableFormatter is used
  • error_sep: optional separator for the multipart error message, default from FormatAsError.DEFAULT_SEPARATOR: '; '
  • strict: optional flag (default False) indicating that an unresolveable field should raise a KeyError instead of inserting a placeholder
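
A minimal sketch, assuming an ordinary dict as the format_mapping (result shown as a comment):

format_as('Hello {name}!', {'name': 'world'})   # -> 'Hello world!'
# with strict=True an unresolveable field raises an error instead of leaving a placeholder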

format_attribute(method)

A decorator to mark a method as available as a format method. Requires the enclosing class to be decorated with @has_format_attributes.

For example, the FormatableMixin.json method is defined like this:

@format_attribute
def json(self):
    return self.FORMAT_JSON_ENCODER.encode(self)

which allows a FormatableMixin subclass instance to be used in a format string like this:

{instance:json}

to insert a JSON transcription of the instance.

It is recommended that methods marked with @format_attribute have no side effects and do not modify state, as they are intended for use in ad hoc format strings supplied by an end user.

format_escape(s)

Escape {} characters in a string to protect them from str.format.
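
Per the release notes this doubles the braces; for example:

>>> format_escape('{not_a_field}')
'{{not_a_field}}'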

format_recover(*da, **dkw)

Decorator for __format__ methods which replaces failed formats with {self:format_spec}.

Class FormatableFormatter(string.Formatter)

A string.Formatter subclass interacting with objects which inherit from FormatableMixin.

FormatableFormatter.format_field(value, format_spec: str): Format a value using value.format_format_field, returning an FStr (a str subclass with additional format_spec features).

We actually recognise colon separated chains of formats and apply each format to the previously converted value. The final result is promoted to an FStr before return.

FormatableFormatter.format_mode: Thread local state object.

Attributes:

  • strict: initially False; raise a KeyError for unresolveable field names

FormatableFormatter.get_arg_name(field_name): Default initial arg_name is an identifier.

Returns (prefix,offset), and ('',0) if there is no arg_name.

FormatableFormatter.get_field(self, field_name, args, kwargs): Get the object referenced by the field text field_name. Raises KeyError for an unknown field_name.

FormatableFormatter.get_format_subspecs(format_spec): Parse a format_spec as a sequence of colon separated components, return a list of the components.

FormatableFormatter.get_subfield(value, subfield_text: str): Resolve value against subfield_text, the remaining field text after the term which resolved to value.

For example, a format {name.blah[0]} has the field text name.blah[0]. A get_field implementation might initially resolve name to some value, leaving .blah[0] as the subfield_text. This method supports taking that value and resolving it against the remaining text .blah[0].

For generality, if subfield_text is the empty string value is returned unchanged.

FormatableFormatter.get_value(self, arg_name, args, kwargs): Get the object with index arg_name.

This default implementation returns (kwargs[arg_name],arg_name).

Class FormatableMixin(FormatableFormatter)

A subclass of FormatableFormatter which provides 2 features:

  • a __format__ method which parses the format_spec string into multiple colon separated terms whose results chain
  • a format_as method which formats a format string using str.format_map with a suitable mapping derived from the instance via its format_kwargs method (whose default is to return the instance itself)

The format_as method is like an inside out str.format or object.__format__ method.

The str.format method is designed for formatting a string from a variety of other objects supplied in the keyword arguments.

The object.__format__ method is for filling out a single str.format replacement field from a single object.

By contrast, format_as is designed to fill out an entire format string from the current object.

For example, the cs.tagset.TagSetMixin class uses FormatableMixin to provide a format_as method whose replacement fields are derived from the tags in the tag set.

Subclasses wanting to provide additional format_spec terms should:

  • override FormatableFormatter.format_field1 to implement terms with no colons, letting format_field do the split into terms
  • override FormatableFormatter.get_format_subspecs to implement the parse of format_spec into a sequence of terms. This might recognise a special additional syntax and quietly fall back to super().get_format_subspecs if that is not present.

FormatableMixin.__format__(self, format_spec): Format self according to format_spec.

This implementation calls self.format_field. As such, a format_spec is considered a sequence of colon separated terms.

Classes wanting to implement additional format string syntaxes should either:

  • override FormatableFormatter.format_field1 to implement terms with no colons, letting format_field do the split into terms
  • override FormatableFormatter.get_format_subspecs to implement the term parse.

The default implementation of __format1__ just calls super().__format__. Implementations providing specialised formats should implement them in __format1__ with fallback to super().__format1__.

FormatableMixin.convert_field(self, value, conversion): The default converter for fields calls Formatter.convert_field.

FormatableMixin.convert_via_method_or_attr(self, value, format_spec): Apply a method or attribute name based conversion to value where format_spec starts with a method name applicable to value. Return (converted,offset) being the converted value and the offset after the method name.

Note that if there is not a leading identifier on format_spec then value is returned unchanged with offset=0.

The methods/attributes are looked up in the mapping returned by .format_attributes() which represents allowed methods (broadly, one should not allow methods which modify any state).

If this returns a callable, it is called to obtain the converted value; otherwise it is used as is.

As a final tweak, if value.get_format_attribute() raises an AttributeError (the attribute is not an allowed attribute) or calling the attribute raises a TypeError (the value isn't suitable) and the value is not an instance of FStr, convert it to an FStr and try again. This provides the common utility methods on other types.

The motivating example was a PurePosixPath, which does not JSON transcribe; this tweak supports both posixpath:basename via the pathlib stuff and posixpath:json via FStr even though a PurePosixPath does not subclass FStr.

FormatableMixin.format_as(self, format_s, error_sep=None, strict=None, **control_kw): Return the string format_s formatted using the mapping returned by self.format_kwargs(**control_kw).

If a class using the mixin has no format_kwargs(**control_kw) method to provide a mapping for str.format_map then the instance itself is used as the mapping.

FormatableMixin.get_format_attribute(self, attr): Return a mapping of permitted methods to functions of an instance. This is used to whitelist allowed :name method formats to prevent scenarios like little Bobby Tables calling delete().

FormatableMixin.get_format_attributes(): Return the mapping of format attributes.

FormatableMixin.json(self): The value transcribed as compact JSON.

Class FormatAsError(builtins.LookupError)

Subclass of LookupError for use by format_as.

Class FStr(FormatableMixin, builtins.str)

A str subclass with the FormatableMixin methods, particularly its __format__ method which uses str method names as valid formats.

It also has a bunch of utility methods which are available as :method in format strings.

FStr.basename(self): Treat as a filesystem path and return the basename.

FStr.dirname(self): Treat as a filesystem path and return the dirname.

FStr.f(self): Parse self as a float.

FStr.i(self, base=10): Parse self as an int.

FStr.lc(self): Lowercase using lc_().

FStr.path(self): Convert to a native filesystem pathlib.Path.

FStr.posix_path(self): Convert to a Posix filesystem pathlib.Path.

FStr.windows_path(self): Convert to a Windows filesystem pathlib.Path.

get_chars(s, offset, gochars)

Scan the string s for characters in gochars starting at offset. Return (match,new_offset).

gochars may also be a callable, in which case a character ch is accepted if gochars(ch) is true.
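
For example (the callable form uses str.isdigit as an arbitrary per character test):

>>> get_chars('abc123', 0, 'abcdef')
('abc', 3)
>>> get_chars('abc123', 3, str.isdigit)
('123', 6)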

get_decimal(s, offset=0)

Scan the string s for decimal characters starting at offset (default 0). Return (dec_string,new_offset).

get_decimal_or_float_value(s, offset=0)

Fetch a decimal or basic float (nnn.nnn) value from the str s at offset (default 0). Return (value,new_offset).

get_decimal_value(s, offset=0)

Scan the string s for a decimal value starting at offset (default 0). Return (value,new_offset).

get_delimited(s, offset, delim)

Collect text from the string s from position offset up to the first occurrence of delimiter delim; return the text excluding the delimiter and the offset after the delimiter.
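
A sketch of the expected return values (shown as a comment):

get_delimited('key=value;rest', 0, ';')   # -> ('key=value', 10): the text before ';' and the offset just past it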

get_dotted_identifier(s, offset=0, **kw)

Scan the string s for a dotted identifier (by default an ASCII letter or underscore followed by letters, digits or underscores) with optional trailing dot and another dotted identifier, starting at offset (default 0). Return (match,new_offset).

Note: the empty string and an unchanged offset will be returned if there is no leading letter/underscore.

Keyword arguments are passed to get_identifier (used for each component of the dotted identifier).
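
For example:

>>> get_dotted_identifier('foo.bar.baz = 1')
('foo.bar.baz', 11)
>>> get_dotted_identifier('9fred')
('', 0)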

get_envvar(s, offset=0, environ=None, default=None, specials=None)

Parse a simple environment variable reference to $varname or $x where "x" is a special character.

Parameters:

  • s: the string with the variable reference
  • offset: the starting point for the reference
  • default: default value for missing environment variables; if None (the default) a ValueError is raised
  • environ: the environment mapping, default os.environ
  • specials: the mapping of special single character variables

get_hexadecimal(s, offset=0)

Scan the string s for hexadecimal characters starting at offset (default 0). Return (hex_string,new_offset).

get_hexadecimal_value(s, offset=0)

Scan the string s for a hexadecimal value starting at offset (default 0). Return (value,new_offset).

get_identifier(s, offset=0, alpha='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', number='0123456789', extras='_')

Scan the string s for an identifier (by default an ASCII letter or underscore followed by letters, digits or underscores) starting at offset (default 0). Return (match,new_offset).

Note: the empty string and an unchanged offset will be returned if there is no leading letter/underscore.

Parameters:

  • s: the string to scan
  • offset: the starting offset, default 0.
  • alpha: the characters considered alphabetic, default string.ascii_letters.
  • number: the characters considered numeric, default string.digits.
  • extras: extra characters considered part of an identifier, default '_'.
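
For example:

>>> get_identifier('foo_bar(1)')
('foo_bar', 7)
>>> get_identifier('9fred')
('', 0)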

get_ini_clause_entryname(s, offset=0)

Parse a [clausename]entryname string from s at offset (default 0). Return (clausename,entryname,new_offset).

get_ini_clausename(s, offset=0)

Parse a [clausename] string from s at offset (default 0). Return (clausename,new_offset).

get_nonwhite(s, offset=0)

Scan the string s for characters not in string.whitespace starting at offset (default 0). Return (match,new_offset).

get_other_chars(s, offset=0, stopchars=None)

Scan the string s for characters not in stopchars starting at offset (default 0). Return (match,new_offset).

get_prefix_n(s, prefix, n=None, *, offset=0)

Strip a leading prefix and numeric value n from the string s starting at offset (default 0). Return the matched prefix, the numeric value and the new offset. Returns (None,None,offset) on no match.

Parameters:

  • s: the string to parse
  • prefix: the prefix string which must appear at offset or an object with a match(str,offset) method such as an re.Pattern regexp instance
  • n: optional integer value; if omitted any value will be accepted, otherwise the numeric part must match n

If prefix is a str, the "matched prefix" return value is prefix. Otherwise the "matched prefix" return value is the result of the prefix.match(s,offset) call. The result must also support a .end() method returning the offset in s beyond the match, used to locate the following numeric portion.

Examples:

>>> import re
>>> get_prefix_n('s03e01--', 's')
('s', 3, 3)
>>> get_prefix_n('s03e01--', 's', 3)
('s', 3, 3)
>>> get_prefix_n('s03e01--', 's', 4)
(None, None, 0)
>>> get_prefix_n('s03e01--', re.compile('[es]',re.I))
(<re.Match object; span=(0, 1), match='s'>, 3, 3)
>>> get_prefix_n('s03e01--', re.compile('[es]',re.I), offset=3)
(<re.Match object; span=(3, 4), match='e'>, 1, 6)

get_qstr(s, offset=0, q='"', environ=None, default=None, env_specials=None)

Get quoted text with slosh escapes and optional environment substitution.

Parameters:

  • s: the string containing the quoted text.
  • offset: the starting point, default 0.
  • q: the quote character, default '"'. If q is None, do not expect the string to be delimited by quote marks.
  • environ: if not None, also parse and expand $envvar references.
  • default: passed to get_envvar
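
A sketch of a simple call (result shown as a comment):

get_qstr('"hello world" and more', 0)   # -> ('hello world', 13): the quoted text and the offset past the closing quote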

get_qstr_or_identifier(s, offset)

Parse a double quoted string or an identifier.

get_sloshed_text(s, delim, offset=0, slosh='\\', mapper=slosh_mapper, specials=None)

Collect slosh escaped text from the string s from position offset (default 0) and return the decoded unicode string and the offset of the completed parse.

Parameters:

  • delim: end of string delimiter, such as a single or double quote.
  • offset: starting offset within s, default 0.
  • slosh: escape character, default a slosh ('\\').
  • mapper: a mapping function which accepts a single character and returns a replacement string or None; this is used to replace things such as '\t' or '\n'. The default is the slosh_mapper function, whose default mapping is SLOSH_CHARMAP.
  • specials: a mapping of other special character sequences and parse functions for gathering them up. When one of the special character sequences is found in the string, the parse function is called to parse at that point. The parse functions accept s and the offset of the special character. They return the decoded string and the offset past the parse.

The escape character slosh introduces an encoding of some replacement text whose value depends on the following character. If the following character is:

  • the escape character slosh, insert the escape character.
  • the string delimiter delim, insert the delimiter.
  • the character 'x', insert the character with code from the following 2 hexadecimal digits.
  • the character 'u', insert the character with code from the following 4 hexadecimal digits.
  • the character 'U', insert the character with code from the following 8 hexadecimal digits.
  • a character from the keys of mapper

get_suffix_part(s, *, keywords=('part',), numeral_map=None)

Strip a trailing "part N" suffix from the string s. Return the matched suffix and the part number. Return (None,None) on no match.

Parameters:

  • s: the string
  • keywords: an iterable of str to match, or a single str; default 'part'
  • numeral_map: an optional mapping of numeral names to numeric values; default NUMERAL_NAMES['en'], the English numerals

Example:

>>> get_suffix_part('s09e10 - A New World: Part One')
(': Part One', 1)

get_tokens(s, offset, getters)

Parse the string s from position offset using the supplied tokeniser functions getters. Return the list of tokens matched and the final offset.

Parameters:

  • s: the string to parse.
  • offset: the starting position for the parse.
  • getters: an iterable of tokeniser specifications.

Each tokeniser specification getter is either:

  • a callable expecting (s,offset) and returning (token,new_offset)
  • a literal string, to be matched exactly
  • a tuple or list with values (func,args,kwargs); call func(s,offset,*args,**kwargs)
  • an object with a .match method such as a regex; call getter.match(s,offset) and return a match object with a .end() method returning the offset of the end of the match
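
A hypothetical sketch combining several getter styles using functions from this module (the token values shown are illustrative, not asserted):

getters = [
    get_identifier,        # callable getter: matches 'foo'
    get_white,             # whitespace
    '=',                   # literal string, matched exactly
    get_white,
    get_decimal_value,     # callable getter: matches 123 as a numeric value
]
tokens, offset = get_tokens('foo = 123', 0, getters)
# tokens -> roughly ['foo', ' ', '=', ' ', 123], offset -> 9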

get_uc_identifier(s, offset=0, number='0123456789', extras='_')

Scan the string s for an identifier as for get_identifier, but require the letters to be uppercase.

get_white(s, offset=0)

Scan the string s for characters in string.whitespace starting at offset (default 0). Return (match,new_offset).

has_format_attributes(*da, **dkw)

Class decorator to walk this class for direct methods marked for use in format strings and to include them in cls.format_attributes().

Methods are normally marked with the @format_attribute decorator.

If inherit is true the base format attributes will be obtained from other classes:

  • inherit is True: use cls.__mro__
  • inherit is a class: use that class
  • otherwise assume inherit is an iterable of classes

For each class otherclass, update the initial attribute mapping from otherclass.get_format_attributes().
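
A hypothetical sketch of the decorator in use (the class and method names here are invented for illustration):

@has_format_attributes
class Greeting(FormatableMixin):
    ''' Instances may use {instance:shouted} in format strings. '''

    @format_attribute
    def shouted(self):
        # no side effects, just a transcription
        return str(self).upper()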

hexify(bs)

A flavour of binascii.hexlify returning a str.

htmlify(s, nbsp=False)

Convert a string for safe transcription in HTML.

Parameters:

  • s: the string
  • nbsp: replaces spaces with "&nbsp;" to prevent word folding, default False.
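
A rough sketch of the expected escaping (the exact entities depend on the implementation):

htmlify('a < b & c')              # -> something like 'a &lt; b &amp; c'
htmlify('two  words', nbsp=True)  # spaces additionally become '&nbsp;'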

htmlquote(s)

Quote a string for use in HTML.

indent(paragraph, line_indent=' ')

Return the paragraph indented by line_indent (default " ").

is_dotted_identifier(s, offset=0, **kw)

Test if the string s is a dotted identifier from position offset (default 0) onward.

is_identifier(s, offset=0, **kw)

Test if the string s is an identifier from position offset (default 0) onward.

is_uc_identifier(s, offset=0, **kw)

Test if the string s is an uppercase identifier from position offset (default 0) onward.

isUC_(s)

Check that a string matches the regular expression ^[A-Z][A-Z_0-9]*$.

jsquote(s)

Quote a string for use in JavaScript.

lc_(value)

Return value.lower() with '-' translated into '_' and ' ' translated into '-'.

I use this to construct lowercase filenames containing a readable transcription of a title string.

See also titleify_lc(), an imperfect reversal of this.
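
For example, assuming the translations are applied as described above (results shown as comments):

lc_('My Great Title')   # -> 'my-great-title' (spaces become dashes)
lc_('X-Files')          # -> 'x_files' (dashes become underscores)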

match_tokens(s, offset, getters)

Wrapper for get_tokens which catches ValueError exceptions and returns (None,offset).

parseUC_sAttr(attr)

Take an attribute name attr and return (key,is_plural).

Examples:

  • 'FOO' returns ('FOO',False).
  • 'FOOs' or 'FOOes' returns ('FOO',True).

Otherwise return (None,False).
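
Concretely, these rules give:

>>> parseUC_sAttr('FOO')
('FOO', False)
>>> parseUC_sAttr('FOOs')
('FOO', True)
>>> parseUC_sAttr('foo')
(None, False)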

phpquote(s)

Quote a string for use in PHP code.

r(o, max_length=None, *, use_cls=False)

Like typed_str but using repr instead of str. This is available as both typed_repr and r.

s(o, use_cls=False, use_repr=False, max_length=32)

Return "type(o).name:str(o)" for some object o. This is available as both typed_str and s.

Parameters:

  • use_cls: default False; if true, use str(type(o)) instead of type(o).__name__
  • use_repr: default False; if true, use repr(o) instead of str(o)

I use this a lot when debugging. Example:

from cs.lex import typed_str as s
......
X("foo = %s", s(foo))

skipwhite(s, offset=0)

Convenience routine for skipping past whitespace; returns the offset of the next nonwhitespace character.

slosh_mapper(c, charmap=None)

Return a string to replace backslash-c, or None.

snakecase(camelcased)

Convert a camel cased string camelcased into snake case.

Parameters:

  • camelcased: the camel case string to convert

Example:

>>> snakecase('abcDef')
'abc_def'
>>> snakecase('abcDEf')
'abc_def'
>>> snakecase('AbcDef')
'abc_def'

split_remote_path(remotepath: str) -> Tuple[Optional[str], str]

Split a path with an optional leading [user@]rhost: prefix into the prefix and the remaining path. None is returned for the prefix if there is none. This is useful for things like rsync targets etc.
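
A sketch of the expected behaviour (results shown as comments, since the exact prefix form is not asserted here):

split_remote_path('user@host:/srv/data')   # -> ('user@host', '/srv/data') or similar
split_remote_path('/srv/data')             # -> (None, '/srv/data')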

stripped_dedent(s, post_indent='', sub_indent='')

Slightly smarter dedent which ignores a string's opening indent.

Algorithm: strip the supplied string s, pull off the leading line, dedent the rest, put back the leading line.

This supports my preferred docstring layout, where the opening line of text is on the same line as the opening quote.

The optional post_indent parameter may be used to indent the dedented text before return.

The optional sub_indent parameter may be used to indent the second and following lines of the dedented text before return.

Examples:

>>> def func(s):
...   """ Slightly smarter dedent which ignores a string's opening indent.
...       Strip the supplied string `s`. Pull off the leading line.
...       Dedent the rest. Put back the leading line.
...   """
...   pass
...
>>> from cs.lex import stripped_dedent
>>> print(stripped_dedent(func.__doc__))
Slightly smarter dedent which ignores a string's opening indent.
Strip the supplied string `s`. Pull off the leading line.
Dedent the rest. Put back the leading line.
>>> print(stripped_dedent(func.__doc__, sub_indent='  '))
Slightly smarter dedent which ignores a string's opening indent.
  Strip the supplied string `s`. Pull off the leading line.
  Dedent the rest. Put back the leading line.
>>> print(stripped_dedent(func.__doc__, post_indent='  '))
  Slightly smarter dedent which ignores a string's opening indent.
  Strip the supplied string `s`. Pull off the leading line.
  Dedent the rest. Put back the leading line.
>>> print(stripped_dedent(func.__doc__, post_indent='  ', sub_indent='| '))
  Slightly smarter dedent which ignores a string's opening indent.
  | Strip the supplied string `s`. Pull off the leading line.
  | Dedent the rest. Put back the leading line.

strlist(ary, sep=', ')

Convert an iterable to strings and join with sep (default ', ').
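
For example:

>>> strlist([1, 2, 3])
'1, 2, 3'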

tabpadding(padlen, tabsize=8, offset=0)

Compute some spaces to use as tab padding at an offset.

tabulate(*rows, sep=' ')

A generator yielding lines of values from rows aligned in columns.
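
A usage sketch; column widths are derived from the widest value in each column (the alignment described is expected, not asserted):

for line in tabulate(
    ('name', 'value'),
    ('foo', '1'),
    ('barbaz', '22'),
):
    print(line)
# expect the 'value', '1' and '22' column to line up after the padded first column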

texthexify(bs, shiftin='[', shiftout=']', whitelist=None)

Transcribe the bytes bs to text using compact text runs for some common text values.

This can be reversed with the untexthexify function.

This is an ad hoc format devised to be compact but also to expose "text" embedded within to the eye. The original use case was transcribing a binary directory entry format, where the filename parts would be somewhat visible in the transcription.

The output is a string of hexadecimal digits for the encoded bytes except for runs of values from the whitelist, which are enclosed in the shiftin and shiftout markers and transcribed as is. The default whitelist is the byte values of the ASCII letters, the decimal digits and the punctuation characters '_-+.,'. The default shiftin and shiftout markers are '[' and ']'.

Strings produced by either hexify or texthexify may be freely concatenated and decoded with untexthexify.

Example:

>>> texthexify(b'&^%&^%abcdefghi)(*)(*')
'265e25265e25[abcdefghi]29282a29282a'

Parameters:

  • bs: the bytes to transcribe
  • shiftin: Optional. The marker string used to indicate a shift to direct textual transcription of the bytes, default: '['.
  • shiftout: Optional. The marker string used to indicate a shift from text mode back into hexadecimal transcription, default ']'.
  • whitelist: an optional bytes or string object indicating byte values which may be represented directly in text; the default value is the ASCII letters, the decimal digits and the punctuation characters '_-+.,'.

titleify_lc(value_lc)

Translate '-' into ' ' and '_' into '-', then titlecase the result.

See also lc_(), which this reverses imperfectly.

typed_repr(o, max_length=None, *, use_cls=False)

Like typed_str but using repr instead of str. This is available as both typed_repr and r.

typed_str(o, use_cls=False, use_repr=False, max_length=32)

Return "type(o).name:str(o)" for some object o. This is available as both typed_str and s.

Parameters:

  • use_cls: default False; if true, use str(type(o)) instead of type(o).__name__
  • use_repr: default False; if true, use repr(o) instead of str(o)

I use this a lot when debugging. Example:

from cs.lex import typed_str as s
......
X("foo = %s", s(foo))

unctrl(s, tabsize=8)

Return the string s with TABs expanded and control characters replaced with printable representations.

untexthexify(s, shiftin='[', shiftout=']')

Decode a textual representation of binary data into binary data.

This is the reverse of the texthexify function.

Outside of the shiftin/shiftout markers the binary data are represented as hexadecimal. Within the markers the bytes have the values of the ordinals of the characters.

Example:

>>> untexthexify('265e25265e25[abcdefghi]29282a29282a')
b'&^%&^%abcdefghi)(*)(*'

Parameters:

  • s: the string containing the text representation.
  • shiftin: Optional. The marker string commencing a sequence of direct text transcription, default '['.
  • shiftout: Optional. The marker string ending a sequence of direct text transcription, default ']'.

Release Log

Release 20241119: stripped_dedent: new optional sub_indent parameter for indenting the second and following lines, handy for usage messages.

Release 20241109:

  • stripped_dedent: new optional post_indent parameter to indent the dedented text.
  • New tabulate(*rows) generator function yielding lines of padded columns.

Release 20240630: New indent(paragraph,line_indent=" ") function.

Release 20240519: New get_suffix_part() to extract things like ": Part One" from something such as a TV episode name.

Release 20240316: Fixed release upload artifacts.

Release 20240211: New split_remote_path() function to recognise [[user@]host]:path.

Release 20231018: New is_uc_identifier function.

Release 20230401: Import update.

Release 20230217.1: Fix package requirements.

Release 20230217:

  • New get_prefix_n function to parse a numeric value preceded by a prefix.
  • Drop strip_prefix_n, get_prefix_n is more general and I had not got around to using strip_prefix_n yet - when I did, I ended up writing get_prefix_n.

Release 20230210:

  • @has_format_attributes: new optional inherit parameter to inherit superclass (or other) format attributes, default False.
  • New FNumericMixin, FFloat, FInt FormatableMixin subclasses like FStr - they add .localtime and .utctime formattable attributes.

Release 20220918: typed_str(): crop the value part, default max_length=32, bugfix message cropping.

Release 20220626:

  • Remove dependency on cs.py3, we've been Python 2 incompatible for a while.
  • FormatableFormatter.format_field: promote None to FStr(None).

Release 20220227:

  • typed_str,typed_repr: make max_length the first optional positional parameter, make other parameters keyword only.
  • New camelcase() and snakecase() functions.

Release 20211208: Docstring updates.

Release 20210913:

  • FormatableFormatter.FORMAT_RE_ARG_NAME_s: strings commencing with digits now match \d+(.\d+)[a-z]+, eg "02d".
  • Alias typed_str as s and typed_repr as r.
  • FormatableFormatter: new .format_mode thread local state object initially with strict=False, used to control whether unknown fields leave a placeholder or raise KeyError.
  • FormatableFormatter.format_field: assorted fixes.

Release 20210906: New strip_prefix_n() function to strip a leading prefix and numeric value n from the start of a string.

Release 20210717:

  • Many many changes to FormatableMixin, FormatableFormatter and friends around supporting {foo|conv1|con2|...} instead of {foo!conv}. Still in flux.
  • New typed_repr like typed_str but using repr.

Release 20210306:

  • New cropped() function to crop strings.
  • Rework cropped_repr() to do the repr() itself, and to crop the interiors of tuples and lists.
  • cropped_repr: new inner_max_length for cropping the members of collections.
  • cropped_repr: special case for length=1 tuples.
  • New typed_str(o) function returning type(o).__name__:str(o) in the default case, useful for debugging.

Release 20201228: Minor doc updates.

Release 20200914:

  • Hide terribly special purpose lastlinelen() in cs.hier under a private name.
  • New common_prefix and common_suffix function to compare strings.

Release 20200718: get_chars: accept a callable for gochars, indicating a per character test function.

Release 20200613: cropped_repr: replace hardwired 29 with computed length

Release 20200517:

  • New get_ini_clausename to parse "[clausename]".
  • New get_ini_clause_entryname parsing "[clausename]entryname".
  • New cropped_repr for returning a shortened repr()+"..." if the length exceeds a threshold.
  • New format_escape function to double {} characters to survive str.format.

Release 20200318:

  • New lc_() function to lowercase and dash a string, new titleify_lc() to mostly reverse lc_().
  • New format_as function, FormatableMixin and related FormatAsError.

Release 20200229: New cutprefix and cutsuffix functions.

Release 20190812: Fix bad slosh escapes in strings.

Release 20190220: New function get_qstr_or_identifier.

Release 20181108: new function get_decimal_or_float_value to read a decimal or basic float

Release 20180815: No semantic changes; update some docstrings and clean some lint, fix a unit test.

Release 20180810:

  • New get_decimal_value and get_hexadecimal_value functions.
  • New stripped_dedent function, a slightly smarter textwrap.dedent.

Release 20171231: New function get_decimal. Drop unused function dict2js.

Release 20170904: Python 2/3 ports, move rfc2047 into new cs.rfc2047 module.

Release 20160828:

  • Use "install_requires" instead of "requires" in DISTINFO.
  • Discard str1(), pointless optimisation.
  • unrfc2047: map _ to SPACE, improve exception handling.
  • Add phpquote: quote a string for use in PHP code; add docstring to jsquote.
  • Add is_identifier test.
  • Add get_dotted_identifier.
  • Add is_dotted_identifier.
  • Add get_hexadecimal.
  • Add skipwhite, convenience wrapper for get_white returning just the next offset.
  • Assorted bugfixes and improvements.

Release 20150120: cs.lex: texthexify: backport to python 2 using cs.py3 bytes type

Release 20150118: metadata updates

Release 20150116: PyPI metadata and slight code cleanup.
