The big package is a grab-bag of cool code for use in your programs.
Copyright 2022-2026 by Larry Hastings
big is a Python package of small functions and classes that aren't big enough to get a package of their own. It's zillions of useful little bits of Python code I always want to have handy.
For years, I've copied-and-pasted all my little helper functions between projects--we've all done it. But now I've finally taken the time to consolidate all those useful little functions into one big package--no more copy-and-paste, I just install one package and I'm ready to go. And, since it's a public package, you can use 'em too!
Not only that, but I've taken my time and re-thought and retooled a lot of this code. All the difficult-to-use, overspecialized, cheap hacks I've lived with for years have been upgraded with elegant, intuitive APIs and dazzling functionality. big is chock full of the sort of little functions and classes we've all hacked together a million times--only with all the API gotchas fixed, and thoroughly tested with 100% coverage. It's the missing batteries Python never shipped. It's the code you would have written... if only you had the time. And every API is a pleasure to use!
big requires Python 3.6 or newer. It has no required dependencies to run. (big's test suite has a few external dependencies, but big itself will run fine without them.) big is 100% pure Python code--no C extension needed, no compilation step.
The current version is 0.13.1.
Think big!
Why use big?
It's true that much of the code in big is short, and one might reasonably have the reaction "that's so short, it's easier to write it from scratch every time I need it than remember where it is and how to call it". I still see value in these short functions in big because:
- everything in big is tested,
- every interface in big has been thoughtfully considered and designed.
For example, consider Log(*destinations, **options). It's easy to write a quick little disposable log function. I should know; I've done it myself, many times. But big's Log class is feature-rich, thoroughly debugged, and lightning fast. Rather than waste your time hacking together something cheap, just use big!
Using big
To use big, just install the big package (and its dependencies) from PyPI using your favorite Python package manager.
Once big is installed, you can simply import it. However, the top-level big package doesn't contain anything but a version number. Internally big is broken up into submodules, aggregated together loosely by problem domain, and you can selectively import just the functions you want. For example, if you only want to use the text functions, just import the text submodule:
import big.text
If you'd prefer to import everything all at once, simply import the big.all module. This one module imports all the other modules, and imports all their symbols too. So, one convenient way to work with big is this:
import big.all as big
That will make every symbol defined in big accessible from the big object. For example, if you want to use multisplit, you can access it with just big.multisplit.
You can also use big.all with import *:
from big.all import *
but that's up to you. Me, I generally use import big.all as big.
big is licensed using the MIT license. You're free to use it and even ship it in your own programs, as long as you leave my copyright notice on the source code.
The best of big
Although big is crammed full of fabulous code, a few of its subsystems rise above the rest. If you're curious what big might do for you, here are the six things in big I'm proudest of:
- linked_list
- string
- Bound inner classes
- The multi- family of string functions
- big.state
- Enhanced TopologicalSorter
And here are six little functions/classes I use all the time:
Index
Modules
Functions, classes, and values
-
accessor(attribute='state', state_manager='state_manager')
combine_splits(s, *split_arrays)
date_ensure_timezone(d, timezone)
date_set_timezone(d, timezone)
datetime_ensure_timezone(d, timezone)
datetime_set_timezone(d, timezone)
decode_python_script(script, *, newline=None, use_bom=True, use_source_code_encoding=True)
Delimiter(close, *, escape='', multiline=True, quoting=False)
dispatch(state_manager='state_manager', *, prefix='', suffix='')
encode_strings(o, *, encoding='ascii')
Event(scheduler, event, time, priority, sequence)
eval_template_string(s, globals, locals=None, *, ...)
Formatter(template, map=None, *, stretch=True, width=79, **kwargs)
fgrep(path, text, *, encoding=None, enumerate=False, case_insensitive=False)
gently_title(s, *, apostrophes=None, double_quotes=None)
get_float(o, default=_sentinel)
get_int_or_float(o, default=_sentinel)
grep(path, pattern, *, encoding=None, enumerate=False, flags=0)
int_to_words(i, *, flowery=True, ordinal=False)
Interpolation(expression, *filters, debug='')
iterator_context(iterator, start=0)
iterator_filter(iterator, *, ...)
linked_list(iterable=(), *, lock=None)
linked_list.copy(*, lock=None)
linked_list.cut(start=None, stop=None, *, lock=None)
linked_list.extendleft(iterable)
linked_list.index(value, start=0, stop=sys.maxsize)
linked_list.insert(index, object)
linked_list.move(where, start=None, stop=None)
linked_list.rcut(start=None, stop=None, *, lock=None)
linked_list.remove(value, default=undefined)
linked_list.rmove(where, start=None, stop=None)
linked_list.rremove(value, default=undefined)
linked_list.rsplice(other, *, where=None)
linked_list.sort(key=None, reverse=False)
linked_list.splice(other, *, where=None)
linked_list_iterator.after(count=1)
linked_list_iterator.append(value)
linked_list_iterator.before(count=1)
linked_list_iterator.count(value)
linked_list_iterator.cut(stop=None, *, lock=None)
linked_list_iterator.exhaust()
linked_list_iterator.extend(iterable)
linked_list_iterator.find(value)
linked_list_iterator.insert(index, object)
linked_list_iterator.is_special()
linked_list_iterator.linked_list
linked_list_iterator.match(predicate)
linked_list_iterator.move(where, stop=None)
linked_list_iterator.next(default=undefined, *, count=1)
linked_list_iterator.pop(index=0)
linked_list_iterator.prepend(value)
linked_list_iterator.previous(default=undefined, *, count=1)
linked_list_iterator.rcount(value)
linked_list_iterator.rcut(stop=None, *, lock=None)
linked_list_iterator.remove(value, default=undefined)
linked_list_iterator.rextend(iterable)
linked_list_iterator.rfind(value)
linked_list_iterator.rmatch(predicate)
linked_list_iterator.rmove(where, stop=None)
linked_list_iterator.rpop(index=0)
linked_list_iterator.rremove(value, default=undefined)
linked_list_iterator.rsplice(other)
linked_list_iterator.rtruncate()
linked_list_iterator.special()
linked_list_iterator.splice(other)
linked_list_iterator.truncate()
linked_list_reverse_iterator.after(count=1)
linked_list_reverse_iterator.append(value)
linked_list_reverse_iterator.before(count=1)
linked_list_reverse_iterator.copy()
linked_list_reverse_iterator.count(value)
linked_list_reverse_iterator.cut(stop=None, *, lock=None)
linked_list_reverse_iterator.exhaust()
linked_list_reverse_iterator.extend(iterable)
linked_list_reverse_iterator.find(value)
linked_list_reverse_iterator.insert(index, object)
linked_list_reverse_iterator.is_special()
linked_list_reverse_iterator.linked_list
linked_list_reverse_iterator.match(predicate)
linked_list_reverse_iterator.move(where, stop=None)
linked_list_reverse_iterator.next(default=undefined, *, count=1)
linked_list_reverse_iterator.pop(index=0)
linked_list_reverse_iterator.prepend(value)
linked_list_reverse_iterator.previous(default=undefined, *, count=1)
linked_list_reverse_iterator.rcount(value)
linked_list_reverse_iterator.rcut(stop=None, *, lock=None)
linked_list_reverse_iterator.remove(value, default=undefined)
linked_list_reverse_iterator.reset()
linked_list_reverse_iterator.rextend(iterable)
linked_list_reverse_iterator.rfind(value)
linked_list_reverse_iterator.rmatch(predicate)
linked_list_reverse_iterator.rmove(where, stop=None)
linked_list_reverse_iterator.rpop(index=0)
linked_list_reverse_iterator.rremove(value, default=undefined)
linked_list_reverse_iterator.rsplice(other)
linked_list_reverse_iterator.rtruncate()
linked_list_reverse_iterator.special()
linked_list_reverse_iterator.splice(other)
linked_list_reverse_iterator.truncate()
Log.Destination.Buffer(destination=None)
Log.File(path, initial_mode="at", *, flush=False)
Log.FileHandle(handle, *, flush=False)
Log.print(*args, end='\n', sep=' ', flush=False, format='print')
multipartition(s, separators, count=1, *, reverse=False, separate=True)
multisplit(s, separators, *, keep=False, maxsplit=-1, reverse=False, separate=False, strip=False)
multistrip(s, separators, left=True, right=True)
normalize_whitespace(s, separators=None, replacement=None)
parse_template_string(s, *, ...)
parse_timestamp_3339Z(s, *, timezone=None)
PushbackIterator(iterable=None)
PushbackIterator.next(default=None)
prefix_format(time_seconds_width, time_fractional_width, thread_name_width=12)
read_python_file(path, *, newline=None, use_bom=True, use_source_code_encoding=True)
re_partition(text, pattern, count=1, *, flags=0, reverse=False)
re_rpartition(text, pattern, count=1, *, flags=0)
reversed_re_finditer(pattern, string, flags=0)
Scheduler(regulator=default_regulator)
Scheduler.schedule(o, time, *, absolute=False, priority=DEFAULT_PRIORITY)
split_delimiters(s, delimiters={...}, *, state=(), yields=None)
split_quoted_strings(s, quotes=('"', "'"), *, escape='\\', multiline_quotes=(), state='')
split_text_with_code(s, *, tab_width=8, allow_code=True, code_indent=4, convert_tabs_to_spaces=True)
split_title_case(s, *, split_allcaps=True)
StateManager(state, *, on_enter='on_enter', on_exit='on_exit', state_class=None)
strip_indents(lines, *, tab_width=8, linebreaks=linebreaks)
timestamp_3339Z(t=None, want_microseconds=None)
timestamp_human(t=None, want_microseconds=None, *, tzinfo=None)
TopologicalSorter.remove(node)
TopologicalSorter.View.close()
TopologicalSorter.View.done(*nodes)
TopologicalSorter.View.print(print=print)
TopologicalSorter.View.ready()
TopologicalSorter.View.reset()
translate_filename_to_exfat(s)
unicode_linebreaks_without_crlf
Tutorials
API Reference, By Module
big.all
-
This submodule doesn't define any of its own symbols. Instead, it imports every other submodule in big, and uses import * to import every symbol from every other submodule, too. Every public symbol in big is available in big.all.
When I'm using big in my own projects, I tend to import it as
import big.all as big
That way, all big's symbols are available as one big flat namespace.
big.boundinnerclass
-
Class decorators that implement bound inner classes. See the Bound inner classes tutorial for more information.
BoundInnerClass(cls)
-
Class decorator for an inner class. When you access the inner class through an instance of the outer class, BoundInnerClass "binds" the inner class to that instance. This changes the signature of the inner class's __init__ from
def __init__(self, *args, **kwargs):
to
def __init__(self, outer, *args, **kwargs):
where outer is the instance of the outer class.
Compare this to functions:
- If you put a function inside a class, and access it through an instance I of that class, the function becomes a method. When you call the method, I is automatically passed in as the first argument.
- If you put a class inside a class, and access it through an instance I of that class, the class becomes a bound inner class. When you call the bound inner class, I is automatically passed in as the second argument to __init__, after self.
Note that this has an implication for all subclasses. If class B is decorated with BoundInnerClass, and class S is a subclass of B, such that issubclass(S, B), class S must be decorated with either BoundInnerClass or UnboundInnerClass.
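To make the binding behavior concrete, here's a minimal sketch of the underlying idea using a descriptor's __get__ method. This is not big's implementation. The name bound_inner_sketch is hypothetical, and unlike BoundInnerClass the sketch doesn't cache the bound class, and it does keep a strong reference to the outer instance.

```python
class bound_inner_sketch:
    """Hypothetical descriptor illustrating the idea behind BoundInnerClass.

    Not big's implementation: no caching, and the bound class holds a
    strong reference to the outer instance.
    """

    def __init__(self, cls):
        self.cls = cls

    def __get__(self, outer, owner=None):
        inner = self.cls
        if outer is None:
            # Accessed through the outer *class*: return the plain, unbound class.
            return inner

        # Accessed through an outer *instance*: build a subclass whose
        # __init__ inserts that instance right after self.
        class Bound(inner):
            def __init__(self, *args, **kwargs):
                inner.__init__(self, outer, *args, **kwargs)

        Bound.__name__ = inner.__name__
        Bound.__qualname__ = inner.__qualname__
        return Bound


class Outer:
    @bound_inner_sketch
    class Inner:
        def __init__(self, outer, value):
            self.outer = outer
            self.value = value


o = Outer()
i = o.Inner(42)          # o is passed in automatically as 'outer'
print(i.outer is o, i.value)
```

Because the bound class subclasses the original, isinstance(i, Outer.Inner) still holds.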
UnboundInnerClass(cls)
-
Class decorator for an inner class that prevents binding the inner class to an instance of the outer class.
If class B is decorated with BoundInnerClass, and class S is a subclass of B, such that issubclass(S, B) returns True, class S must be decorated with either BoundInnerClass or UnboundInnerClass.
bound_inner_base(cls)
-
Simple wrapper for Python 3.6 compatibility for bound inner classes.
Returns the base class for declaring a subclass of a bound inner class while still in the outer class scope. Only needed for Python 3.6 compatibility; unnecessary in Python 3.7+, or when the child class is defined after exiting the outer class scope.
See the Bound inner classes tutorial for more information.
bound_to(cls)
-
Returns the outer instance that cls is bound to, or None.
If cls is a bindable inner class that was bound to an outer instance, returns that outer instance. If cls is any other variety of type object, returns None. Raises TypeError if cls is not a class object.
BoundInnerClass doesn't keep strong references to outer instances. If cls was bound to an object that has since been destroyed, bound_to will return None.
See the Bound inner classes tutorial for more information.
BOUNDINNERCLASS_OUTER_ATTR
-
A string constant containing the attribute name that BoundInnerClass uses to store its per-instance cache on the outer instance. If your outer class uses __slots__, you must include this attribute in your slots definition.
However, rather than using this attribute directly, we suggest you use BOUNDINNERCLASS_OUTER_SLOTS to add the necessary attribute to your __slots__ tuple.
See the Bound inner classes tutorial for more information.
BOUNDINNERCLASS_OUTER_SLOTS
-
A tuple containing BOUNDINNERCLASS_OUTER_ATTR. If your outer class uses __slots__, you can add this to your slots definition to ensure BoundInnerClass works correctly.
Example:
class Foo:
    __slots__ = ('x', 'y', 'z') + BOUNDINNERCLASS_OUTER_SLOTS

    @BoundInnerClass
    class Bar:
        ...
See the Bound inner classes tutorial for more information.
is_bound(cls)
-
Returns True if cls is a bound inner class that has been bound to a specific outer instance. Said another way, is_bound(cls) returns True if bound_to(cls) would return a non-None value.
Returns False for unbound inner classes and non-participating classes. Raises TypeError if cls is not a class object.
See the Bound inner classes tutorial for more information.
is_boundinnerclass(cls)
-
Returns True if cls was decorated with @BoundInnerClass, or is a bound wrapper class created from one.
Returns False for @UnboundInnerClass classes and regular classes. Raises TypeError if cls is not a class object.
See the Bound inner classes tutorial for more information.
is_unboundinnerclass(cls)
-
Returns True if cls was decorated with @UnboundInnerClass, or is a wrapper class created from one.
Returns False for @BoundInnerClass classes and regular classes. Raises TypeError if cls is not a class object.
See the Bound inner classes tutorial for more information.
type_bound_to(instance)
-
Returns the outer instance that type(instance) is bound to, or None.
This is a convenience function equivalent to calling bound_to(type(instance)).
BoundInnerClass doesn't keep strong references to outer instances. If type(instance) was bound to an object that has since been destroyed, type_bound_to will return None.
See the Bound inner classes tutorial for more information.
unbound(cls)
-
Returns the unbound version of a bound class.
If cls is a bound inner class, returns the original unbound class. If cls is already unbound (or not a bindable inner class), returns cls.
Raises ValueError if cls inherits directly from a bound class (e.g. class Child(o.Inner)), since such classes have no unbound version. Raises TypeError if cls is not a class object.
See the Bound inner classes tutorial for more information.
big.builtin
-
Fundamental functions and types that don't fit neatly into any other submodule. (Named builtin to avoid a name collision with Python's builtins module.)
ClassRegistry()
-
A dict subclass with attribute-style access, useful as a class decorator for registering base classes.
BoundInnerClass encourages heavily-nested classes, but Python's scoping rules make it clumsy to reference base classes defined in a different class scope. ClassRegistry solves this by giving you a place to store references to base classes you can access later.
To use, create a ClassRegistry instance, then use it as a decorator to register classes. Access registered classes as attributes on the ClassRegistry. By default the class's __name__ is used as the attribute name; pass a string argument to use a custom name instead.
When using it with BoundInnerClass, put the registry decorator (for example @base(), for a registry named base) above @BoundInnerClass.
get_float(o, default=_sentinel)
-
Returns float(o), unless that conversion fails, in which case returns the default value. If you don't pass in an explicit default value, the default value is o.
get_int(o, default=_sentinel)
-
Returns int(o), unless that conversion fails, in which case returns the default value. If you don't pass in an explicit default value, the default value is o.
get_int_or_float(o, default=_sentinel)
-
Converts o into a number, preferring an int to a float.
If o is already an int or float, returns o unchanged. Otherwise, tries int(o). If that conversion succeeds, returns the result. Otherwise, tries float(o). If that conversion succeeds, returns the result. Otherwise returns the default value. If you don't pass in an explicit default value, the default value is o.
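The conversion order described above can be sketched in a few lines. The function name is hypothetical. Treating TypeError and ValueError as a "failed conversion" is an assumption about which exceptions big catches.

```python
_sentinel = object()

def get_int_or_float_sketch(o, default=_sentinel):
    """Hypothetical sketch of the int-then-float conversion order described above."""
    if isinstance(o, (int, float)):
        return o                      # already a number: returned unchanged
    for convert in (int, float):      # prefer int, fall back to float
        try:
            return convert(o)
        except (TypeError, ValueError):
            pass
    # Both conversions failed: use the default, which defaults to o itself.
    return o if default is _sentinel else default


print(get_int_or_float_sketch("12"))      # 12
print(get_int_or_float_sketch("1.5"))     # 1.5
print(get_int_or_float_sketch("abc"))     # abc
print(get_int_or_float_sketch("abc", 0))  # 0
```

Returning o as the implicit default lets callers pass arbitrary values through and only convert the ones that look numeric.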
ModuleManager()
-
A class that manages your module's namespace, including __all__.
ModuleManager makes it easy to populate __all__ and clean up temporary symbols. Instantiate a ModuleManager at module scope, use its methods to declare exports and deletions, then call the instance at the end of your module to finalize.
ModuleManager provides two methods, both of which can be used as decorators or called with string arguments:
- mm.export(*args) adds symbols to __all__. When used as a decorator, adds the decorated function or class by name. When called with strings, adds those strings to __all__.
- mm.delete(*args) marks symbols for deletion. When used as a decorator, marks the decorated function or class for deletion. When called with strings, marks those names for deletion.
When the ModuleManager instance is called, it deletes all symbols on the deletions list from the module namespace. It also automatically deletes itself, and any module-level references to its export and delete methods.
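Here's a minimal sketch of the export/delete bookkeeping described above. It's hypothetical in a few ways. ModuleManagerSketch takes the namespace dict explicitly, whereas big's ModuleManager() takes no arguments, so presumably it locates the calling module itself. The demo also uses a plain dict to stand in for a module's globals().

```python
class ModuleManagerSketch:
    """Hypothetical sketch of ModuleManager-style __all__ management."""

    def __init__(self, namespace):
        self.namespace = namespace            # e.g. globals() in a real module
        self.namespace.setdefault("__all__", [])
        self.deletions = []

    def _collect(self, args, target):
        # Decorator form: a single function or class, recorded by name.
        if len(args) == 1 and not isinstance(args[0], str):
            target.append(args[0].__name__)
            return args[0]
        # String form: record the names as given.
        target.extend(args)
        return None

    def export(self, *args):
        return self._collect(args, self.namespace["__all__"])

    def delete(self, *args):
        return self._collect(args, self.deletions)

    def __call__(self):
        for name in self.deletions:
            self.namespace.pop(name, None)
        # Remove names bound to this manager or to its bound methods.
        for name, value in list(self.namespace.items()):
            if value is self or getattr(value, "__self__", None) is self:
                del self.namespace[name]


# Simulate module-level assignments with a plain dict.
ns = {}
mm = ModuleManagerSketch(ns)
ns["mm"] = mm
ns["export"] = mm.export

@mm.export
def public():
    return "hello"

def _helper():
    pass

ns["public"] = public
ns["_helper"] = mm.delete(_helper)
ns["VERSION"] = "1.0"
mm.export("VERSION")
mm()

print(sorted(ns))      # ['VERSION', '__all__', 'public']
print(ns["__all__"])   # ['public', 'VERSION']
```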
pure_virtual()
-
A decorator for class methods. When you have a method in a base class that's "pure virtual"--that must not be called, but must be overridden in child classes--decorate it with @pure_virtual(). Calling that method will throw a NotImplementedError.
Note that the body of any function decorated with @pure_virtual() is ignored. By convention the body of these methods should contain only a single ellipsis, literally like this:
class BaseClass:
    @big.pure_virtual()
    def on_reset(self):
        ...
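A decorator with this behavior takes only a few lines to sketch. The sketch below, pure_virtual_sketch, is a hypothetical stand-in rather than big's code. Like the real one, it's called with parentheses, and it discards the decorated body.

```python
import functools

def pure_virtual_sketch():
    """Hypothetical sketch: returns a decorator whose wrapper always raises."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # The original body is never executed.
            raise NotImplementedError(f"{fn.__qualname__} is pure virtual; override it in a subclass")
        return wrapper
    return decorator


class BaseClass:
    @pure_virtual_sketch()
    def on_reset(self):
        ...


class Child(BaseClass):
    def on_reset(self):
        return "reset"


print(Child().on_reset())   # reset
try:
    BaseClass().on_reset()
except NotImplementedError as e:
    print("NotImplementedError:", e)
```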
try_float(o)
-
Returns True if o can be converted into a float, and False if it can't.
try_int(o)
-
Returns True if o can be converted into an int, and False if it can't.
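Both predicates amount to attempting the conversion and reporting whether it raised. The sketch below assumes that TypeError, ValueError, and OverflowError all count as "can't be converted" (OverflowError covers cases like int(float('inf'))). The function names are hypothetical.

```python
def try_int_sketch(o):
    """Hypothetical sketch: True if int(o) succeeds, False otherwise."""
    try:
        int(o)
    except (TypeError, ValueError, OverflowError):
        return False
    return True


def try_float_sketch(o):
    """Hypothetical sketch: True if float(o) succeeds, False otherwise."""
    try:
        float(o)
    except (TypeError, ValueError, OverflowError):
        return False
    return True


print(try_int_sketch("42"), try_int_sketch("4.2"))      # True False
print(try_float_sketch("4.2"), try_float_sketch("x"))   # True False
```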
big.deprecated
-
Old versions of functions (and classes) from big. These versions are deprecated, either because the name was changed, or the semantics were changed, or both.
Unlike the other modules, the contents of big.deprecated aren't automatically imported into big.all. (big.all does import the deprecated submodule; it just doesn't run from deprecated import * to pull in all its symbols.)
big.file
-
Functions for working with files, directories, and I/O.
fgrep(path, text, *, encoding=None, enumerate=False, case_insensitive=False)
-
Find the lines of a file that match some text, like the UNIX fgrep utility program.
path should be an object representing a path to an existing file, one of:
- a string,
- a bytes object, or
- a pathlib.Path object.
text should be either a str or a bytes object.
encoding is used as the file encoding when opening the file.
- If text is a str, the file is opened in text mode.
- If text is a bytes object, the file is opened in binary mode. encoding must be None when the file is opened in binary mode.
If case_insensitive is true, perform the search in a case-insensitive manner.
Returns a list of lines in the file containing text. The lines are either strings or bytes objects, depending on the type of text. The lines have their newlines stripped but preserve all other whitespace.
If enumerate is true, returns a list of tuples of (line_number, line). The first line of the file is line number 1.
For simplicity of implementation, the entire file is read into memory at one time. If case_insensitive is true, fgrep also makes a lowercased copy.
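Here's a minimal sketch of the documented behavior, useful for seeing how the str/bytes choice drives the file mode. fgrep_sketch is hypothetical and isn't big's code. Splitting lines with splitlines() is an assumption, and the sketch doesn't enforce the "encoding must be None in binary mode" rule.

```python
import builtins
import os
import tempfile

def fgrep_sketch(path, text, *, encoding=None, enumerate=False, case_insensitive=False):
    """Hypothetical sketch of fgrep's documented behavior (not big's code)."""
    if isinstance(text, bytes):
        with open(path, "rb") as f:                       # bytes text -> binary mode
            data = f.read()
    else:
        with open(path, "r", encoding=encoding) as f:     # str text -> text mode
            data = f.read()

    lines = data.splitlines()             # newlines stripped, other whitespace kept
    haystack = lines
    if case_insensitive:
        text = text.lower()
        haystack = [line.lower() for line in lines]       # the lowercased copy

    # The 'enumerate' parameter shadows the builtin, so call it explicitly.
    matches = [
        (number, line)
        for number, (line, candidate) in builtins.enumerate(zip(lines, haystack), 1)
        if text in candidate
    ]
    if enumerate:
        return matches
    return [line for _, line in matches]


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "fruit.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("apple\nBanana split\ncherry pie\n")
    plain = fgrep_sketch(path, "an")                          # ['Banana split']
    folded = fgrep_sketch(path, "APPLE", case_insensitive=True)
    numbered = fgrep_sketch(path, "pie", enumerate=True)      # [(3, 'cherry pie')]
    raw = fgrep_sketch(path, b"pie")                          # [b'cherry pie']
```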
file_mtime(path)
-
Returns the modification time of path, in seconds since the epoch. Note that the result is a float, so it can include fractions of a second; the actual precision depends on the platform and filesystem.
file_mtime_ns(path)
-
Returns the modification time of path, in nanoseconds since the epoch.
file_size(path)
-
Returns the size of the file at path, as an integer representing the number of bytes.
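These three functions map naturally onto fields of os.stat(): st_mtime is a float in seconds, st_mtime_ns is an int in nanoseconds, and st_size is an int in bytes. The sketch below assumes that's how the values are obtained. The *_sketch names are hypothetical.

```python
import os
import tempfile

def file_mtime_sketch(path):
    return os.stat(path).st_mtime        # float seconds since the epoch

def file_mtime_ns_sketch(path):
    return os.stat(path).st_mtime_ns     # int nanoseconds since the epoch

def file_size_sketch(path):
    return os.stat(path).st_size         # int number of bytes


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.bin")
    with open(path, "wb") as f:
        f.write(b"12345")
    size = file_size_sketch(path)
    mtime = file_mtime_sketch(path)
    mtime_ns = file_mtime_ns_sketch(path)

print(size)   # 5
```

The nanosecond variant avoids the rounding that comes with storing a large timestamp in a float.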
grep(path, pattern, *, encoding=None, enumerate=False, flags=0)
-
Look for matches to a regular expression pattern in the lines of a file, similarly to the UNIX grep utility program.
path should be an object representing a path to an existing file, one of:
- a string,
- a bytes object, or
- a pathlib.Path object.
pattern should be an object containing a regular expression, one of:
- a string,
- a bytes object, or
- an re.Pattern, initialized with either str or bytes.
encoding is used as the file encoding when opening the file.
If pattern uses a str, the file is opened in text mode. If pattern uses a bytes object, the file is opened in binary mode. encoding must be None when the file is opened in binary mode.
flags is passed in as the flags argument to re.compile if pattern is a string or bytes. (It's ignored if pattern is an re.Pattern object.)
Returns a list of lines in the file matching the pattern. The lines are either strings or bytes objects, depending on the type of pattern. The lines have their newlines stripped but preserve all other whitespace.
If enumerate is true, returns a list of tuples of (line_number, line). The first line of the file is line number 1.
For simplicity of implementation, the entire file is read into memory at one time.
Tip: to perform a case-insensitive pattern match, pass the re.IGNORECASE flag in flags for this function (if pattern is a string or bytes), or when creating your regular expression object (if pattern is an re.Pattern object).
(In older versions of Python, re.Pattern was a private type called re._pattern_type.)
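A minimal sketch of the rules above, including how flags applies only to uncompiled patterns. grep_sketch is hypothetical and isn't big's code. Splitting lines with splitlines() is an assumption.

```python
import builtins
import os
import re
import tempfile

def grep_sketch(path, pattern, *, encoding=None, enumerate=False, flags=0):
    """Hypothetical sketch of grep's documented behavior (not big's code)."""
    if not isinstance(pattern, re.Pattern):
        pattern = re.compile(pattern, flags)      # flags only apply here
    if isinstance(pattern.pattern, bytes):
        with open(path, "rb") as f:               # bytes pattern -> binary mode
            lines = f.read().splitlines()
    else:
        with open(path, "r", encoding=encoding) as f:
            lines = f.read().splitlines()

    matches = [
        (number, line)
        for number, line in builtins.enumerate(lines, 1)
        if pattern.search(line)
    ]
    if enumerate:
        return matches
    return [line for _, line in matches]


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "log.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("INFO start\nERROR disk full\nerror retry\n")
    errors = grep_sketch(path, r"^error", flags=re.IGNORECASE)
    numbered = grep_sketch(path, re.compile(r"\bretry\b"), enumerate=True)
    raw = grep_sketch(path, rb"disk")
```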
pushd(directory)
-
A context manager that temporarily changes the directory. Example:
with big.pushd('x'):
    pass
This would change into the 'x' subdirectory before executing the nested block, then change back to the original directory after the nested block.
You can change directories in the nested block; this won't affect pushd restoring the original current working directory upon exiting the nested block.
You can safely nest with pushd blocks.
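The usual way to get this behavior is to save os.getcwd() and restore it in a finally clause. That's also why changing directories inside the block doesn't matter. Here's a minimal sketch; pushd_sketch is hypothetical, not big's code. (Python 3.11 added contextlib.chdir, which behaves similarly.)

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def pushd_sketch(directory):
    """Hypothetical sketch of a pushd-style context manager (not big's code)."""
    original = os.getcwd()      # remember where we started
    os.chdir(directory)
    try:
        yield
    finally:
        # Restore the original directory even if the block raised
        # or changed directories itself.
        os.chdir(original)


start = os.getcwd()
with tempfile.TemporaryDirectory() as d:
    with pushd_sketch(d):
        same_dir = os.path.samefile(os.getcwd(), d)
        os.chdir(os.pardir)     # wandering off doesn't break the restore
    after = os.getcwd()
```

Because each call saves its own starting directory, nested blocks unwind correctly.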
read_python_file(path, *, newline=None, use_bom=True, use_source_code_encoding=True)
-
Opens, reads, and correctly decodes a Python script from a file.
path should specify the filesystem path to the file; it can be any object accepted by builtins.open (a "path-like object").
Returns a str containing the decoded Python script.
Opens the file using builtins.open.
Decodes the script using big's decode_python_script function. The newline, use_bom and use_source_code_encoding parameters are passed through to that function.
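For comparison, the standard library's tokenize.open does the core of this job: it honors a UTF-8 BOM and a PEP 263 coding cookie, and falls back to UTF-8. The sketch below uses it as a stand-in. read_python_file_sketch is hypothetical, and it offers none of big's newline, use_bom, or use_source_code_encoding controls.

```python
import os
import tempfile
import tokenize

def read_python_file_sketch(path):
    """Hypothetical stand-in: decode a Python source file using the stdlib."""
    # tokenize.open detects the encoding from a BOM or coding cookie.
    with tokenize.open(path) as f:
        return f.read()


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "legacy.py")
    with open(path, "wb") as f:
        f.write("# -*- coding: latin-1 -*-\nname = 'café'\n".encode("latin-1"))
    source = read_python_file_sketch(path)

print(source.splitlines()[1])   # name = 'café'
```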
safe_mkdir(path)
-
Ensures that a directory exists at path. If this function returns and doesn't raise, it guarantees that a directory exists at path.
If a directory already exists at path, safe_mkdir does nothing.
If a file exists at path, safe_mkdir unlinks path then creates the directory.
If the parent directory doesn't exist, safe_mkdir creates that directory, then creates path.
This function can still fail:
- path could be on a read-only filesystem.
- You might lack the permissions to create path.
- You could ask to create the directory x/y and x is a file (not a directory).
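The rules above translate almost directly into code. Here's a minimal sketch; safe_mkdir_sketch is hypothetical, not big's code. It relies on os.makedirs to create missing parents, and that call is also where the failure cases listed above would raise.

```python
import os
import tempfile

def safe_mkdir_sketch(path):
    """Hypothetical sketch of the safe_mkdir behavior described above."""
    if os.path.isdir(path):
        return                      # already a directory: nothing to do
    if os.path.lexists(path):
        os.unlink(path)             # a file (or broken symlink) is in the way
    os.makedirs(path)               # also creates missing parent directories


with tempfile.TemporaryDirectory() as d:
    blocker = os.path.join(d, "out")
    with open(blocker, "w") as f:
        f.write("not a directory")
    safe_mkdir_sketch(blocker)                  # replaces the file with a directory
    nested = os.path.join(d, "a", "b", "c")
    safe_mkdir_sketch(nested)                   # creates a/ and a/b/ too
    safe_mkdir_sketch(nested)                   # second call is a no-op
    results = (os.path.isdir(blocker), os.path.isdir(nested))
```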
safe_unlink(path)
-
Unlinks path, if path exists and is a file.
search_path(paths, extensions=('',), *, case_sensitive=None, preserve_extension=True, want_directories=False, want_files=True)
-
Search a list of directories for a file. Given a sequence of directories, an optional list of file extensions, and a filename, searches those directories for a file with that name and possibly one of those file extensions.
search_path accepts the paths and extensions as parameters and returns a search function. The search function accepts one filename parameter and performs the search, returning either the path to the file it found (as a pathlib.Path object) or None. You can reuse the search function to perform as many searches as you like.
paths should be an iterable of str or pathlib.Path objects representing directories. These may be relative or absolute paths; relative paths will be relative to the current directory at the time the search function is run. Specifying a directory that doesn't exist is not an error.
extensions should be an iterable of str objects representing extensions. Every non-empty extension specified should start with a period ('.') character (technically os.extsep). You may specify at most one empty string in extensions, which represents testing the filename without an additional extension. By default extensions is the tuple ('',). Extension strings may contain additional period characters after the initial one.
Shell-style "globbing" isn't supported for any parameter. Both the filename and the extension strings may contain filesystem globbing characters, but they will only match those literal characters themselves. ('*' won't match any character, it'll only match a literal '*' in the filename or extension.)
case_sensitive works like the parameter to pathlib.Path.glob. If case_sensitive is true, files found while searching must match the filename and extension exactly. If case_sensitive is false, the comparison is done in a case-insensitive manner. If case_sensitive is None (the default), case sensitivity obeys the platform default (as per os.path.normcase). In practice, only Windows platforms are case-insensitive by convention; all other platforms that support Python are case-sensitive by convention.
If preserve_extension is true (the default), the search function checks the filename to see if it already ends with one of the extensions. If it does, the search is restricted to only files with that extension--the other extensions are ignored. This check obeys the case_sensitive flag; if case_sensitive is None, this comparison is case-insensitive only on Windows.
want_files and want_directories are boolean values; the search function will only return that type of file if the corresponding want_ parameter is true. You can request files, directories, or both. (want_files and want_directories can't both be false.) By default, want_files is true and want_directories is false.
paths and extensions are both tried in order, and the search function returns the first match it finds. All extensions are tried in a path entry before considering the next path.
Returns a function:
search(filename)
which returns either a pathlib.Path object on success or None on failure.
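The "returns a search function" design can be shown with a simplified closure that keeps the search order described above. search_path_sketch is hypothetical and deliberately omits case_sensitive and preserve_extension; names match however the filesystem matches an exact path.

```python
import pathlib
import tempfile

def search_path_sketch(paths, extensions=("",), *, want_directories=False, want_files=True):
    """Hypothetical, simplified sketch of search_path (no case_sensitive/preserve_extension)."""
    if not (want_files or want_directories):
        raise ValueError("want_files and want_directories can't both be false")
    directories = [pathlib.Path(p) for p in paths]
    extensions = tuple(extensions)

    def search(filename):
        # Every extension is tried in a directory before moving to the next one.
        for directory in directories:
            for extension in extensions:
                candidate = directory / (filename + extension)
                # Relative directories resolve against the cwd at call time;
                # missing directories simply never match.
                if want_files and candidate.is_file():
                    return candidate
                if want_directories and candidate.is_dir():
                    return candidate
        return None

    return search


with tempfile.TemporaryDirectory() as d:
    root = pathlib.Path(d)
    (root / "tool").mkdir()                       # a directory named "tool"
    (root / "tool.sh").write_text("echo hi\n")    # a file named "tool.sh"
    find_file = search_path_sketch([root / "missing", root], ("", ".sh"))
    find_dir = search_path_sketch([root], ("", ".sh"), want_directories=True, want_files=False)
    file_name = find_file("tool").name            # 'tool.sh' (the directory is skipped)
    dir_name = find_dir("tool").name              # 'tool'
    not_found = find_file("nope")                 # None
```

Returning a closure means the paths and extensions are normalized once, and the search itself can be repeated cheaply.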
touch(path)
-
Ensures that path exists, and its modification time is the current time.
If path does not exist, creates an empty file.
If path exists, updates its modification time to the current time.
translate_filename_to_exfat(s)
-
Ensures that all characters in s are legal for a FAT filesystem.
Returns a copy of s where every character not allowed in a FAT filesystem filename has been replaced with a character (or characters) that are permitted.
translate_filename_to_unix(s)
-
Ensures that all characters in s are legal for a UNIX filesystem.
Returns a copy of s where every character not allowed in a UNIX filesystem filename has been replaced with a character (or characters) that are permitted.
big.graph
-
A drop-in replacement for Python's graphlib.TopologicalSorter with an enhanced API. This version of TopologicalSorter allows modifying the graph at any time, and supports multiple simultaneous views, allowing iteration over the graph more than once.
See the Enhanced TopologicalSorter tutorial for more information.
CycleError()
-
Exception thrown by TopologicalSorter when it detects a cycle.
TopologicalSorter(graph=None)
-
An object representing a directed graph of nodes. See Python's graphlib.TopologicalSorter for concepts and the basic API.
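Because big's TopologicalSorter follows graphlib's concepts, the standard-library version (Python 3.9+) is a convenient way to see the basic ready/done protocol. This example uses only graphlib, not big. Per the description above, big's views are what let you run this loop more than once over the same graph.

```python
import graphlib

# Each key depends on the nodes in its value set (its predecessors).
graph = {
    "compile": {"fetch"},
    "test": {"compile"},
    "package": {"compile"},
}

sorter = graphlib.TopologicalSorter(graph)
sorter.prepare()                     # also detects cycles (raises graphlib.CycleError)

order = []
while sorter.is_active():
    ready = sorter.get_ready()       # nodes whose predecessors are all done
    order.extend(sorted(ready))      # sorted only to make the output deterministic
    sorter.done(*ready)              # mark them done, unlocking their successors

print(order)   # ['fetch', 'compile', 'package', 'test']
```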
New methods on TopologicalSorter:
TopologicalSorter.copy()
-
Returns a shallow copy of the graph. The copy also duplicates the state of get_ready and done.
TopologicalSorter.cycle()
-
Checks the graph for cycles. If no cycles exist, returns None. If at least one cycle exists, returns a tuple containing nodes that constitute a cycle.
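A common way to find a cycle is a depth-first search that tracks which nodes are on the current path. The sketch below is hypothetical. It works on a plain dict mapping each node to its predecessors, the same shape graphlib accepts. It isn't necessarily how big's cycle() works, and it may report a different, equally valid cycle. It's recursive, so it suits modest graph sizes.

```python
def find_cycle_sketch(graph):
    """Hypothetical sketch: return a tuple of nodes forming a cycle, or None.

    graph maps each node to an iterable of its predecessors.
    """
    ON_PATH, FINISHED = 1, 2
    state = {}       # node -> ON_PATH or FINISHED; unvisited nodes are absent
    path = []        # nodes on the current DFS path, in visiting order

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for pred in graph.get(node, ()):
            if state.get(pred) == ON_PATH:
                # Back edge: the cycle is the path from pred to here.
                return tuple(path[path.index(pred):])
            if pred not in state:
                found = visit(pred)
                if found:
                    return found
        path.pop()
        state[node] = FINISHED
        return None

    for node in list(graph):
        if node not in state:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle_sketch({"a": {"b"}, "b": {"c"}, "c": {"a"}}))  # ('a', 'b', 'c')
print(find_cycle_sketch({"b": {"a"}, "c": {"b"}}))              # None
```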
TopologicalSorter.print(print=print)
-
Prints the internal state of the graph. Used for debugging.
print is the function used for printing; it should behave identically to the builtin print function.
TopologicalSorter.remove(node)
-
Removes node from the graph.
If any node P depends on a node N, and N is removed, this dependency is also removed, but P is not removed from the graph.
Note that, while remove() works, it's slow (it's O(N)). TopologicalSorter is optimized for fast adds and fast views.
TopologicalSorter.reset()
-
Resets get_ready and done to their initial state.
TopologicalSorter.view()
-
Returns a new View object on this graph.
TopologicalSorter.View
-
A view on a TopologicalSorter graph object. Allows iterating over the nodes of the graph in dependency order.
Methods on a View object:
TopologicalSorter.View.__bool__()
-
Returns True if more work can be done in the view--if there are nodes waiting to be yielded by get_ready, or waiting to be returned by done.
Aliased to TopologicalSorter.is_active for compatibility with graphlib.
TopologicalSorter.View.close()
-
Closes the view. A closed view can no longer be used.
TopologicalSorter.View.copy()
-
Returns a shallow copy of the view, duplicating its current state.
TopologicalSorter.View.done(*nodes)
-
Marks nodes returned by ready as "done", possibly allowing additional nodes to be available from ready.
TopologicalSorter.View.print(print=print)
-
Prints the internal state of the view, and its graph. Used for debugging.
print is the function used for printing; it should behave identically to the builtin print function.
TopologicalSorter.View.ready()
-
Returns a tuple of "ready" nodes--nodes with no predecessors, or nodes whose predecessors have all been marked "done".
Aliased to TopologicalSorter.get_ready for compatibility with graphlib.
TopologicalSorter.View.reset()
-
Resets the view to its initial state, forgetting all "ready" and "done" state.
big.heap
-
Functions for working with heap objects. Well, just one heap object really.
Heap(i=None)
-
An object-oriented wrapper around the heapq library, designed to be easy to use--and easy to remember how to use. The heapq library implements a binary heap, a data structure used for sorting; you add objects to the heap, and you can then remove objects in sorted order. Heaps are useful because they are efficient both in space and in time; they're also inflexible, in that iterating over the sorted items is destructive.
The Heap API in big mimics the list and collections.deque objects; this way, all you need to remember is "it works kinda like a list object". You append new items to the heap, then popleft them off in sorted order.
By default Heap creates an empty heap. If you pass in an iterable i to the constructor, this is equivalent to calling extend(i) on the freshly-constructed Heap.
In addition to the below methods, Heap objects support iteration, len, the in operator, and use as a boolean expression. You can also index or slice into a Heap object, which behaves as if the heap is a list of objects in sorted order. Getting the first item (Heap[0], aka peek) is cheap; the other operations can get very expensive.
Methods on a Heap object:
Heap.append(o)
-
Adds object o to the heap.
Heap.clear()
-
Removes all objects from the heap, resetting it to empty.
Heap.copy()
-
Returns a shallow copy of the heap. Only duplicates the heap data structure itself; does not duplicate the objects in the heap.
Heap.extend(i)
-
Adds all the objects from the iterable i to the heap.
Heap.remove(o)
-
If object o is in the heap, removes it. If o is not in the heap, raises ValueError.
Heap.popleft()
-
If the heap is not empty, removes and returns the first item in the heap in sorted order. If the heap is empty, raises `IndexError`.
Heap.append_and_popleft(o)
-
Equivalent to calling `Heap.append(o)` immediately followed by `Heap.popleft()`. If `o` is smaller than any other object in the heap at the time it's added, this will return `o`.
Heap.popleft_and_append(o)
-
Equivalent to calling `Heap.popleft()` immediately followed by `Heap.append(o)`. This method will never return `o`, unless `o` was already in the heap before the method was called.
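These two compound operations behave like the standard library's `heapq.heappushpop` (push, then pop) and `heapq.heapreplace` (pop, then push). The sketch below uses those functions on a plain list to show the difference; treating them as equivalent to big's methods is an assumption about semantics, not a claim about how big implements them.

```python
import heapq

data = [2, 4, 6]
heapq.heapify(data)

# Like append_and_popleft(1): push first, then pop.
# 1 is smaller than everything already in the heap, so it comes straight back.
a = heapq.heappushpop(data, 1)

# Like popleft_and_append(1): pop first, then push.
# The old smallest item (2) is returned; 1 stays in the heap.
b = heapq.heapreplace(data, 1)

print(a, b, sorted(data))   # → 1 2 [1, 4, 6]
```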
Heap.queue
-
Not a method, a property. Returns a copy of the contents of the heap, in sorted order.
big.itertools
-
Functions and classes for working with iteration.
iterator_context(iterator, start=0)
-
Iterates over `iterator`. Yields `(ctx, o)` where `o` is each value yielded by `iterator`, and `ctx` is a "context" variable of type `IteratorContext` containing metadata about the iteration. `ctx` supports the following attributes:
- `ctx.countdown` contains the "opposite" value of `ctx.index`. The values of `ctx.countdown` are the same as those of `ctx.index`, but in reverse order. (If `start` is 0 and the iterator yields four items, `ctx.index` will be `0`, `1`, `2`, and `3` in that order, and `ctx.countdown` will be `3`, `2`, `1`, and `0` in that order.) `ctx.countdown` requires the iterator to support `__len__`; if it doesn't, `ctx.countdown` will be undefined.
- `ctx.current` contains the current value yielded by the iterator (`o` as described above).
- `ctx.index` contains the index of this value. The first time the iterator yields a value, this will be `start`; the second time, it will be `start + 1`, etc.
- `ctx.is_first` is true only for the first value yielded, and false otherwise.
- `ctx.is_last` is true only for the last value yielded, and false otherwise. (If the iterator only yields one value, `is_first` and `is_last` will both be true.)
- `ctx.length` contains the total number of items that will be yielded. `ctx.length` requires the iterator to support `__len__`; if it doesn't, `ctx.length` will be undefined.
- `ctx.next` contains the next value to be yielded by this iterator, if there is one. (If `o` is the last value yielded by the iterator, `ctx.next` will be an `undefined` value.)
- `ctx.previous` contains the previous value yielded, if this is the second or subsequent time this iterator has yielded a value. (If this is the first time the iterator has yielded, `ctx.previous` will be an `undefined` value.)
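The trick behind `is_last` and `next` is looking one value ahead. Here's a minimal sketch of that technique covering a subset of the attributes above. `context_sketch`, `Ctx`, and the `UNDEFINED` sentinel are hypothetical stand-ins, not big's `iterator_context`, `IteratorContext`, or `undefined`.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-in for big's "undefined" sentinel.
UNDEFINED = object()


@dataclass(frozen=True)
class Ctx:
    """Hypothetical subset of the context attributes described above."""
    index: int
    current: Any
    previous: Any
    next: Any
    is_first: bool
    is_last: bool


def context_sketch(iterable, start=0):
    it = iter(iterable)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input: yield nothing
    previous = UNDEFINED
    index = start
    while True:
        # Fetch one value ahead so we know whether `current` is the last one.
        try:
            upcoming = next(it)
            is_last = False
        except StopIteration:
            upcoming = UNDEFINED
            is_last = True
        yield Ctx(index, current, previous, upcoming, index == start, is_last), current
        if is_last:
            return
        previous, current = current, upcoming
        index += 1


for ctx, o in context_sketch("abc", start=1):
    print(ctx.index, o, ctx.is_first, ctx.is_last)
```

Note that the lookahead means the sketch pulls each value from the underlying iterator one step before yielding it, which matters if that iterator has side effects.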
iterator_filter(iterator, *, stop_at_value=undefined, stop_at_in=None, stop_at_predicate=None, stop_at_count=None, reject_value=undefined, reject_in=None, reject_predicate=None, only_value=undefined, only_in=None, only_predicate=None)
-
Wraps any iterator, filtering the values it yields based on rules you specify as keyword-only parameters.
There are three categories of rules, examined in order:
"stop_at" rules cause the iterator to become exhausted. If a value passes a "stop_at" rule, the iterator immediately becomes exhausted without yielding that value.
"reject" rules act as a blacklist. If a value passes any "reject" rule, it's discarded and iteration continues.
"only" rules act as a whitelist. If a value doesn't pass all "only" rules, it's discarded and iteration continues.
Each category supports three suffix variants that define the test:
- A rule ending in `_value` passes if the yielded value `==` the argument.
- A rule ending in `_in` passes if the yielded value is `in` the argument (which must support the `in` operator).
- A rule ending in `_predicate` takes a callable as its argument; it passes if calling the argument with the yielded value returns a true value.
There is one additional rule: `stop_at_count`, an integer. The iterator becomes exhausted after yielding `stop_at_count` items. If `stop_at_count` is initially `<= 0`, the iterator is initialized in an exhausted state.
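To make the evaluation order concrete, here's a sketch that implements only the `_predicate` variant of each category, plus `stop_at_count`. `filter_sketch` is a hypothetical name and not big's `iterator_filter`; where exactly big checks `stop_at_count` relative to the other rules is an assumption here (this sketch checks it after each yielded value).

```python
def filter_sketch(iterator, *, stop_at_predicate=None, stop_at_count=None,
                  reject_predicate=None, only_predicate=None):
    """Hypothetical sketch: stop_at, then reject, then only (predicate rules only)."""
    if stop_at_count is not None and stop_at_count <= 0:
        return  # starts out exhausted
    yielded = 0
    for value in iterator:
        # 1. "stop_at": become exhausted without yielding this value.
        if stop_at_predicate is not None and stop_at_predicate(value):
            return
        # 2. "reject" (blacklist): discard and keep iterating.
        if reject_predicate is not None and reject_predicate(value):
            continue
        # 3. "only" (whitelist): discard values that fail the test.
        if only_predicate is not None and not only_predicate(value):
            continue
        yield value
        yielded += 1
        if stop_at_count is not None and yielded >= stop_at_count:
            return


evens = filter_sketch(range(100),
                      stop_at_predicate=lambda x: x > 20,
                      reject_predicate=lambda x: x % 3 == 0,
                      only_predicate=lambda x: x % 2 == 0)
print(list(evens))   # → [2, 4, 8, 10, 14, 16, 20]
```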
PushbackIterator(iterable=None)
-
Wraps any iterator, letting you push items to be yielded first.
The `PushbackIterator` constructor accepts one argument, an iterable. When you iterate over the `PushbackIterator` instance, it yields values from that iterable. You may also pass in `None`, in which case the `PushbackIterator` is created in an "exhausted" state.
`PushbackIterator` also supports a `push(o)` method, which "pushes" that object onto the iterator. If any objects have been pushed onto the iterator, they're yielded first, before attempting to yield from the wrapped iterator. Pushed values are yielded in last-in-first-out order, like a stack.
When the wrapped iterable is exhausted, you can still call `push` to add new items, at which point the `PushbackIterator` can be iterated over again.
PushbackIterator.next(default=None)
-
Equivalent to `next(PushbackIterator)`, but won't raise `StopIteration`. If the iterator is exhausted, returns the `default` argument.
PushbackIterator.push(o)
-
Pushes a value onto the iterator's internal stack. When a `PushbackIterator` is iterated over and there are any pushed values, the top value on the stack will be popped and yielded. `PushbackIterator` only yields from the iterator it wraps when this internal stack is empty.
Example: you have a pushback iterator `J`, and you call `J.push(3)` followed by `J.push('x')`. The next two times you iterate over `J`, it will yield `'x'`, followed by `3`.
It's explicitly supported to push values that were never yielded by the wrapped iterator. If you create `J = PushbackIterator(range(1, 20))`, you may still call `J.push(33)`, or `J.push('xyz')`, or `J.push(None)`, etc.
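The behavior described above fits in a few lines: a list used as a stack, consulted before the wrapped iterator. `PushbackSketch` is a hypothetical illustration, not big's `PushbackIterator`.

```python
class PushbackSketch:
    """Hypothetical sketch of a pushback iterator (not big's PushbackIterator)."""

    def __init__(self, iterable=None):
        # None means "start out exhausted".
        self._it = iter(iterable) if iterable is not None else iter(())
        self._stack = []

    def push(self, o):
        self._stack.append(o)

    def __iter__(self):
        return self

    def __next__(self):
        # Pushed values come out first, most recently pushed first (LIFO).
        if self._stack:
            return self._stack.pop()
        return next(self._it)

    def next(self, default=None):
        # Like next(self), but returns `default` instead of raising StopIteration.
        try:
            return next(self)
        except StopIteration:
            return default


j = PushbackSketch(range(1, 4))
print(next(j))        # → 1
j.push(3)
j.push('x')
print(list(j))        # → ['x', 3, 2, 3]
j.push(None)          # pushing after exhaustion is fine
print(j.next('done')) # → None
print(j.next('done')) # → done
```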
big.log
-
A lightweight, high-performance, text-oriented, thread-safe logging module, intended for debug-print-style use. Not a full-fledged application logger like Python's `logging` module. `Log` is flexible in where output is written (stdout, files, lists, arbitrary callables, or custom `Destination` objects) and when (by default it runs in a background thread for minimal overhead).
See the big `Log` tutorial for an introduction and examples.
default_clock()
-
The default clock function used by `Log`. Returns the current time, expressed as integer nanoseconds (>= 0) since some earlier event.
In Python 3.7+, this is `time.monotonic_ns`. In Python 3.6, this is a compatibility function that calls `time.monotonic` and converts the result to integer nanoseconds.
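A compatibility shim of the kind described above might look like this. `clock_ns` is a hypothetical name, and this is a sketch rather than big's actual code.

```python
import time

try:
    # Python 3.7+: integer nanoseconds from the monotonic clock.
    clock_ns = time.monotonic_ns
except AttributeError:
    # Python 3.6 fallback: convert float seconds to integer nanoseconds.
    def clock_ns():
        return int(time.monotonic() * 1_000_000_000)


start = clock_ns()
elapsed_ns = clock_ns() - start   # never negative, since the clock is monotonic
```

The fallback loses some precision, because a float can't represent nanoseconds exactly once the clock value gets large.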
Log(*destinations, **options)
-
A lightweight, high-performance, text-oriented, thread-safe log object intended for debug-print-style use. To use, create a `Log` instance, then call it to log messages:
```python
from big.log import Log

j = Log()
j("Hello, world!")
```
Calling the `Log` instance is equivalent to calling `Log.print`.
`Log` has three states: initial (just created or reset), logging (actively accepting messages), and closed (ignoring all writes). The log transitions from initial to logging automatically the first time a message is logged.
See the big `Log` tutorial for an in-depth introduction to `Log`, including examples.
Destinations
Positional arguments to the `Log` constructor define destinations for log messages. The log sends every formatted message to every destination.
If you construct a `Log` and don't specify any positional arguments, it will send logged messages to `builtins.print`.
Destinations can be any of the following:
- `print` (or `builtins.print`): log messages are printed using `print(s, end='')`. Equivalent to `Log.Print()`.
- `bytes`, `str`, or `pathlib.Path`: log messages are buffered locally and written to the named file. Equivalent to `Log.File(path)`.
- `list`: log messages are appended to the list. Equivalent to `Log.List(list)`.
- `io.TextIOBase`: log messages are written to the file-like object. Equivalent to `Log.FileHandle(handle)`.
- callable: the callable is called with every formatted log message. Equivalent to `Log.Callable(callable)`.
- `TMPFILE`: a special sentinel value. Log messages are written to a timestamped temporary file.
- `Log.Destination`: used directly as a destination.
You can also call `Log.map_destination` to manually convert any of the above types to the appropriate `Destination` object.
Keyword-only options
Keyword-only parameters for the `Log` constructor specify configuration for the log. `Log` accepts the following keyword-only parameters:
- `name`: the name for this log. Default is `'Log'`.
- `threading`: if true (the default), log messages are sent to a background thread for formatting and writing, reducing overhead in the calling thread. If false, messages are formatted and written immediately, using a lock for thread safety. `Log` is always thread-safe regardless of this setting.
- `indent`: number of spaces to indent per nesting level when using `Log.enter`. Default is `4`.
- `width`: width in characters used for formatting separator lines. Default is `79`.
- `clock`: a function returning nanoseconds since some arbitrary past event, expressed as an integer. Default is `default_clock`.
- `timestamp_clock`: a function returning seconds since the UNIX epoch, expressed as a float. Default is `time.time`.
- `timestamp_format`: a function that formats values returned by `timestamp_clock` into a human-readable string. Default is `big.time.timestamp_human`.
- `prefix`: a format string used to format text inserted at the beginning of every log message. Default is `prefix_format(3, 10, 12)`.
- `formats`: a dict mapping format names to format dicts. A format dict has a `"template"` key (a template string with `{}`-style placeholders) and an optional `"line"` key (a fill character for separator lines). `Log` has six built-in formats: `"print"`, `"box"`, `"enter"`, `"exit"`, `"start"`, and `"end"`. User-defined formats are automatically added as methods on the `Log` instance. To suppress the start or end log banners, set `"start"` or `"end"` to `None` in the `formats` dict; the same works for `"enter"` and `"exit"` to suppress the enter and exit banners.
All keyword-only options are also available as read-only properties on the `Log` instance.
Read-only properties
In addition to the constructor options above, `Log` exposes these read-only properties:
- `dirty`: `True` if any formatted text has been written to the log since it was started or last flushed.
- `closed`: `True` if the log is in the "closed" state.
- `start_time_ns`: the time reported by `Log.clock` when the log was started, in nanoseconds.
- `start_time_epoch`: the wall-clock time reported by `Log.timestamp_clock` when the log was started, as seconds since the UNIX epoch.
- `end_time_epoch`: the wall-clock time when the log was closed, as seconds since the UNIX epoch.
Log.box(s)
-
Logs `s` to the log, formatted with a three-sided box around it to call attention to the message.
See the big `Log` tutorial for more.
Log.close(block=True)
-
Closes the log. This ensures the log is in the "closed" state, in which all writes are silently ignored. To write to the log again after closing, call `Log.reset()`.
- If the log is in "logging" state, it gets closed in an orderly fashion. This includes logging an "end" banner, if an `"end"` format is defined.
- If the log is in "initial" state, it goes directly to "closed" state, and no `"end"` banner will be logged.
- If the log is already "closed", this is a no-op.
If `block` is true (the default), `close` won't return until after the log is fully closed. If false, the log may be closed asynchronously.
See the big `Log` tutorial for more.
Log.enter(message)
-
Logs `message` to the log formatted with a box, then indents subsequent log output. Call `Log.exit()` to outdent. Nesting may be arbitrarily deep.
`Log.enter` returns a context manager; if used with a `with` statement, `Log.exit()` will be called automatically upon exiting the block.
See the big `Log` tutorial for more.
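The enter/exit pattern (log, indent, and hand back a context manager that outdents on the way out) can be sketched with `contextlib.contextmanager`. `IndentSketch` is hypothetical and doesn't draw big's boxes; it just prefixes entered messages with `+ ` so the nesting is visible.

```python
from contextlib import contextmanager


class IndentSketch:
    """Hypothetical sketch of enter/exit indentation; not big's Log."""

    def __init__(self, indent=4):
        self.indent = indent
        self.depth = 0
        self.lines = []

    def print(self, message):
        self.lines.append(" " * (self.indent * self.depth) + message)

    def exit(self):
        self.depth -= 1

    @contextmanager
    def _exit_on_leave(self):
        try:
            yield self
        finally:
            self.exit()

    def enter(self, message):
        # Log the message at the current depth, then indent what follows.
        self.print(f"+ {message}")
        self.depth += 1
        # Callers may use `with log.enter(...)`; otherwise they call exit() themselves.
        return self._exit_on_leave()


log = IndentSketch()
with log.enter("outer"):
    log.print("a")
    with log.enter("inner"):
        log.print("b")
log.print("c")
print("\n".join(log.lines))
```

Because the outdent happens in a `finally` clause, the depth is restored even if the `with` block raises.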
Log.exit()
-
Outdents the log from the most recent `Log.enter()` call.
See the big `Log` tutorial for more.
Log.flush(block=True)
-
Flushes the log, if it is in "logging" state and is dirty. A log is "dirty" if any formatted text has been written since the log was started or last flushed.
If `block` is true (the default), `flush` won't return until after the log is flushed. If false, the log may be flushed asynchronously.
See the big `Log` tutorial for more.
Log.map_destination(o)
-
Class method. Maps `o` to the appropriate `Destination` object using the same conversion rules as the `Log` constructor's positional arguments (e.g. `builtins.print` becomes `Log.Print()`, a `str` becomes `Log.File(path)`, etc.).
Returns the `Destination` object, or raises `TypeError` if `o` is not a recognized type.
Log.print(*args, end='\n', sep=' ', flush=False, format='print')
-
Logs a message, with an interface similar to `builtins.print`.
The arguments are formatted into a string as `sep.join(str(a) for a in args) + end`, then logged using the specified `format` (default `"print"`).
Calling the `Log` instance directly (e.g. `log("message")`) is equivalent to calling this method.
If `flush` is true, the log is flushed immediately after logging this message.
See the big `Log` tutorial for more.
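The formatting rule quoted above produces exactly the text `builtins.print` would write, which is easy to confirm by printing into an `io.StringIO`. `format_like_print` is a hypothetical helper for illustration only.

```python
import io


def format_like_print(*args, sep=" ", end="\n"):
    # Join the str() of each argument with sep, then append end.
    return sep.join(str(a) for a in args) + end


buffer = io.StringIO()
print("x =", 3, [1, 2], file=buffer)
assert buffer.getvalue() == format_like_print("x =", 3, [1, 2])  # "x = 3 [1, 2]\n"
```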
Log.reset()
-
Resets the log to its initial state.
- If the log is in "initial" state, this is a no-op.
- If the log is in "logging" state, the log is closed, then reset.
- If the log is in "closed" state, the log is reset.
`Log.reset()` is the only way to reopen a closed log.
See the big `Log` tutorial for more.
Log.write(formatted)
-
Writes a pre-formatted string directly to the log with no further formatting or modification.
See the big `Log` tutorial for more.
Log.Destination
-
Base class for objects that perform the actual logging for a `Log`. A `Destination` is owned by a `Log`, and the `Log` sends it events by calling named methods.
All `Destination` subclasses must implement the `write(elapsed, thread, formatted)` method.
Subclasses may also optionally override these seven methods:
- `flush()`
- `reset()`
- `start(start_time_ns, start_time_epoch)`
- `end(elapsed)`
- `log(elapsed, thread, format, message, formatted)`
- `enter(elapsed, thread, message, formatted)`
- `exit(elapsed, thread, message, formatted)`
The default implementations of `log`, `enter`, and `exit` all call `self.write(elapsed, thread, formatted)`. Subclasses need not call the base class method for any of these seven methods.
Subclasses may also override `register(owner)`, but must call the base class implementation via `super().register(owner)`. Subclasses must also call the base class `__init__` without arguments.
The meaning of the arguments to the various `Destination` methods:
- `elapsed` is the elapsed time since the log was started/reset, in nanoseconds.
- `thread` is the `threading.Thread` handle for the thread that logged the message. This can be `None` for messages that aren't logged from any particular thread; currently this includes the "start banner" (using format `"start"`) and the "end banner" (using format `"end"`).
- `formatted` is the formatted log message. This is always a `str`.
- `format` is the name of the format applied to `message` to produce `formatted`.
- `message` is the original message passed in to a `Log` method. This can be an empty string if no text was actually logged. (The "start banner" and "end banner" also specify empty strings for their `message` arguments.)
- `owner` is the `Log` object that owns this `Destination`.
`Log` guarantees events are sent in this order:
`register → start → [write | log | enter | exit | flush]* → [flush] → end`
- `register` is always the first event sent; it's sent exactly once, when the destination is added to the `Log`, in the constructor.
- `start` is lazily sent immediately before the first message is logged (one of `write`, `log`, or `enter`). If no message is ever logged, `start` is never sent.
- `flush` is only sent if the log is dirty. The log is only considered "dirty" if any non-empty `formatted` strings were sent to the log.
- When the log is closed, if `start` was ever sent, it sends an optional `flush` (if the log is dirty), followed by `end`.
- If the log is reset and `start` was ever sent, the log is immediately closed--possibly sending `flush`, always sending `end`--followed by a `reset` event. `reset` should reset the destination to its initial state.
- If the log is closed but `start` was never sent, `Log` doesn't bother sending any messages to the destinations: no `flush`, and no `end`. And if a log in this state is reset, no `reset` event is sent. (Nothing has happened, so there's no point in notifying the destinations of meaningless non-events.)
See the big `Log` tutorial for more.
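Here's a sketch of a destination that records the events it receives, showing how the default `log`, `enter`, and `exit` funnel into `write`. Since big isn't imported here, `DestinationSketch` does not subclass `Log.Destination` (and omits `register` and `reset`); the calls at the bottom are a hand-written stand-in for what a `Log` would send, in the guaranteed order.

```python
class DestinationSketch:
    """Hypothetical stand-in for a Log.Destination subclass (big is not imported)."""

    def __init__(self):
        self.events = []

    # The one method every destination must implement.
    def write(self, elapsed, thread, formatted):
        self.events.append(("write", formatted))

    # Defaults for log/enter/exit forward to write(), as described above.
    def log(self, elapsed, thread, format, message, formatted):
        self.write(elapsed, thread, formatted)

    def enter(self, elapsed, thread, message, formatted):
        self.write(elapsed, thread, formatted)

    def exit(self, elapsed, thread, message, formatted):
        self.write(elapsed, thread, formatted)

    # Optional hooks that a real subclass may override.
    def start(self, start_time_ns, start_time_epoch):
        self.events.append(("start",))

    def flush(self):
        self.events.append(("flush",))

    def end(self, elapsed):
        self.events.append(("end",))


# Simulated event sequence: start, messages, flush (log is dirty), end.
d = DestinationSketch()
d.start(0, 0.0)
d.log(10, None, "print", "hi", "hi\n")
d.enter(20, None, "block", "[block]\n")
d.flush()
d.end(30)
print(d.events)
```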
Log.Destination.Buffer(destination=None)
-
A `Destination` that buffers log messages before sending them to another `Destination`.
Every formatted log message is stored in an internal buffer. When the log is flushed, `Buffer` concatenates all buffered messages into one string, writes that string to the underlying `Destination`, and flushes it.
If `destination` is `None` (the default), `Buffer` wraps a `Log.Print()` destination. Otherwise, `destination` is mapped using `Log.map_destination`.
See the big `Log` tutorial for more.
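The buffering strategy is simple enough to sketch: collect the strings, then emit one concatenated write on flush. `BufferSketch` is hypothetical; to stay self-contained it takes a plain `write` callable and a simplified `write(formatted)` signature instead of a real `Destination`.

```python
class BufferSketch:
    """Hypothetical buffering wrapper: collect messages, write them once on flush."""

    def __init__(self, write):
        self._write = write     # the underlying destination's write callable
        self._pieces = []

    def write(self, formatted):
        self._pieces.append(formatted)

    def flush(self):
        if self._pieces:
            # One concatenated write instead of many small ones.
            self._write("".join(self._pieces))
            self._pieces.clear()


out = []
b = BufferSketch(out.append)
b.write("a\n")
b.write("b\n")
b.flush()
print(out)   # → ['a\nb\n']
```

Batching like this trades latency for fewer, larger writes, which is why messages only appear at the underlying destination when the log is flushed.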
Log.Callable(callable)
-
A `Destination` wrapping a callable.
Calls `callable(formatted)` for every formatted log message.
See the big `Log` tutorial for more.
Log.File(path, initial_mode="at", *, flush=False)
-
A `Destination` that writes to a file in the filesystem. `path` may be a `bytes`, `str`, or `pathlib.Path` object. (`bytes` objects are decoded to `str` using `os.fsdecode`.)
If `flush` is false (the default), formatted log messages are buffered internally. When the log is flushed, `File` concatenates all buffered messages, opens the file, writes them with one write call, and closes it.
If `flush` is true, the file is opened and kept open. Every formatted log message is written and flushed immediately. On `close`, the file is closed; on `reset`, it is reopened.
The first time the file is opened, it uses `initial_mode` (default `"at"`). After the first time, `File` always uses mode `"at"`.
See the big `Log` tutorial for more.
Log.FileHandle(handle, *, flush=False)
-
A `Destination` wrapping an already-open Python file handle (an `io.TextIOBase` instance).
Every formatted log message is written to the file handle immediately. `FileHandle` will never close the handle; it only writes to it and flushes it.
If `flush` is true, the file handle is flushed after every write. By default `flush` is false.
See the big `Log` tutorial for more.
Log.List(list)
-
A `Destination` wrapping a Python list.
Appends every formatted log message to `list`.
See the big `Log` tutorial for more.
Log.Print()
-
A `Destination` that writes to stdout using `builtins.print`.
Calls `builtins.print(formatted, end='', flush=True)` for every formatted log message.
This is the default destination when no destinations are passed to the `Log` constructor.
See the big `Log` tutorial for more.
Log.Sink()
-
A `Destination` that retains all log events in order as `SinkEvent` objects.
You may iterate over a `Sink` to yield all events logged so far. `Sink` also has a `print` method that prints the events so far with some simple formatting.
See the big `Log` tutorial for more.
Log.TmpFile(*, flush=False)
-
A `Destination` subclass of `Log.File` that writes to a timestamped temporary file.
The filename is computed approximately as:
`tempfile.gettempdir() / "{Log.name}.{start timestamp}.{pid}.txt"`
The filename is recomputed on `register` and `reset` events, so resetting the `Log` closes the old file and opens a new one.
The sentinel value `TMPFILE` is a pre-created `Log.TmpFile()` instance.
See the big `Log` tutorial for more.
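A filename of the shape shown above can be assembled from the standard library. `tmpfile_path_sketch` is hypothetical, and the timestamp format is an assumption: the text above doesn't say how big renders the start timestamp.

```python
import os
import tempfile
import time
from pathlib import Path


def tmpfile_path_sketch(log_name, start_time_epoch=None):
    """Hypothetical: <tempdir>/<name>.<timestamp>.<pid>.txt"""
    if start_time_epoch is None:
        start_time_epoch = time.time()
    # Assumed timestamp format; big's actual format may differ.
    stamp = time.strftime("%Y-%m-%d.%H-%M-%S", time.localtime(start_time_epoch))
    return Path(tempfile.gettempdir()) / f"{log_name}.{stamp}.{os.getpid()}.txt"


print(tmpfile_path_sketch("Log"))
```

Including the process ID keeps two processes that start logging in the same second from writing to the same file.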
OldDestination()
-
A `Log.Destination` subclass providing backwards compatibility with the old `big.log.Log` interface. Deprecated; will be removed no earlier than March 2027.
See the big `Log` tutorial for more.
OldLog(clock=None)
-
A drop-in replacement for the old `big.log.Log` class, reimplemented on top of the new `Log`. Provides interface compatibility with the old `Log` to ease the transition to the new one. Deprecated; will be removed no earlier than March 2027.
See the big `Log` tutorial for more.
prefix_format(time_seconds_width, time_fractional_width, thread_name_width=12)
-
Returns a prefix format string for use with the `prefix` option of the `Log` constructor.
The returned format string produces a prefix of the form:
`[{elapsed} {thread.name}]`
formatted with the specified widths. For example, the default `Log` prefix is `prefix_format(3, 10, 12)`, which produces output like:
`[003.0706368860 MainThread]`
See the big `Log` tutorial for more.
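To see how the three widths shape that prefix, here's a sketch that formats an elapsed time in integer nanoseconds. `prefix_sketch` is hypothetical and builds the string directly rather than returning a format string as `prefix_format` does. It also left-justifies the thread name to its full width, so it prints trailing spaces that the example above doesn't show; whether big pads the same way is an assumption.

```python
def prefix_sketch(elapsed_ns, thread_name,
                  time_seconds_width=3, time_fractional_width=10,
                  thread_name_width=12):
    """Hypothetical re-creation of the "[{elapsed} {thread.name}]" layout."""
    seconds, nanoseconds = divmod(elapsed_ns, 1_000_000_000)
    # Nanoseconds give 9 digits; pad (or trim) to the requested fractional width.
    fraction = f"{nanoseconds:09d}".ljust(time_fractional_width, "0")[:time_fractional_width]
    elapsed = f"{seconds:0{time_seconds_width}d}.{fraction}"
    return f"[{elapsed} {thread_name:<{thread_name_width}}]"


print(prefix_sketch(3_070_636_886, "MainThread"))   # → [003.0706368860 MainThread  ]
```

Working from integer nanoseconds with `divmod` avoids the rounding errors that formatting a float number of seconds could introduce.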
SinkEvent
-
Base class for log events stored by `Log.Sink`. Each subclass represents a different type of event.
All `SinkEvent` objects have these read-only properties:
- `type`: a string identifying the event type (e.g. `"start"`, `"end"`, `"write"`, `"log"`, `"enter"`, `"exit"`).
- `number`: the log's sequence number (incremented on each reset).
- `elapsed`: elapsed time in nanoseconds since the log was started.
- `duration`: time in nanoseconds since the previous event with a message.
- `thread`: the `threading.Thread` that logged the event, or `None`.
- `formatted`: the formatted log message string.
- `message`: the original unformatted message, or `None`.
- `format`: the format name used, or `None`.
- `depth`: the nesting depth at the time of the event.
The subclasses are `SinkStartEvent`, `SinkEndEvent`, `SinkWriteEvent`, `SinkLogEvent`, `SinkEnterEvent`, and `SinkExitEvent`.
See the big `Log` tutorial for more.
SinkStartEvent
-
A `SinkEvent` representing the start of a log. Has a `configuration` property containing a dict of the `Log`'s configuration at start time.
SinkEndEvent
-
A `SinkEvent` representing the end of a log.
SinkWriteEvent
SinkLogEvent
SinkEnterEvent
SinkExitEvent
TMPFILE
-
A pre-created `Log.TmpFile()` sentinel value. Pass this as a destination to the `Log` constructor to log to a timestamped temporary file.
See the big `Log` tutorial for more.
big.metadata
-
Contains metadata about big itself.
metadata.version
-
A `Version` object representing the current version of big.
big.scheduler
-
A replacement for Python's `sched.scheduler` object, adding full threading support and a modern Python interface.
Python's `sched.scheduler` object was added way back in 1991, and it was full of clever ideas. It abstracted away the concept of time from its interface, allowing it to be adapted to new schemes of measuring time--including mock time, making testing easy and repeatable. Very nice!
Unfortunately, `sched.scheduler` predates multithreading becoming common, much less multicore computers. It certainly predates threading support in Python. And its API isn't flexible enough to correctly handle some common scenarios in multithreaded programs:
- If one thread is blocking on `sched.scheduler.run`, and the next scheduled event will occur at time T, and a second thread schedules a new event which occurs at a time < T, `sched.scheduler.run` won't return any events to the first thread until time T.
- If one thread is blocking on `sched.scheduler.run`, and the next scheduled event will occur at time T, and a second thread cancels all events, `sched.scheduler.run` won't exit until time T.
big's `Scheduler` object fixes both these problems.
Also, `sched.scheduler` is thirty years behind the times in Python API design--its design predates many common modern Python conventions. Its events are callbacks, which it calls directly. `Scheduler` fixes this: its events are objects, and you iterate over the `Scheduler` object to see events as they occur.
`Scheduler` also benefits from thirty years of experience with `sched.scheduler`. In particular, big reimplements the relevant parts of the `sched.scheduler` test suite, ensuring `Scheduler` will never trip over the problems discovered by `sched.scheduler` over its lifetime.
Event(scheduler, event, time, priority, sequence)
-
An object representing a scheduled event in a
Scheduler. You shouldn't need to create them manually;Eventobjects are created automatically when you add events to aScheduler.Supports one method:
Event.cancel()
-
Cancels this event. If this event has already been canceled, raises `ValueError`.
Regulator()
-
An abstract base class for `Scheduler` regulators.
A "regulator" handles all the details about time for a `Scheduler`. `Scheduler` objects don't actually understand time; it's all abstracted away by the `Regulator`.
You can implement your own `Regulator` and use it with `Scheduler`. Your `Regulator` subclass must implement three methods: `now`, `sleep`, and `wake`. It must also provide a `lock` attribute.
Normally a `Regulator` represents time using a floating-point number, representing a fractional number of seconds since some epoch. But this isn't strictly necessary. Any Python object that fulfills these requirements will work:
- The time class must implement `__le__`, `__eq__`, `__add__`, and `__sub__`, and these operations must be consistent in the same way they are for number objects.
- If `a` and `b` are instances of the time class, and `a.__le__(b)` is true, then `a` must either be an earlier time, or a smaller interval of time.
- The time class must also implement rich comparison with numbers (integers and floats), and `0` must represent both the earliest time and a zero-length interval of time.
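Here's a minimal sketch of an object meeting the regulator contract, using float seconds from `time.monotonic` and a `threading.Event` to cut sleeps short. `EventRegulatorSketch` is a hypothetical name; it doesn't subclass big's `Regulator` and isn't big's `ThreadSafeRegulator`.

```python
import threading
import time


class EventRegulatorSketch:
    """Hypothetical thread-safe regulator using float seconds (not big's code)."""

    def __init__(self):
        # A plain Lock supports the context manager protocol; no recursion needed.
        self.lock = threading.Lock()
        self._wakeup = threading.Event()

    def now(self):
        # Monotonic: a later call never returns a smaller value.
        return time.monotonic()

    def sleep(self, t):
        if t <= 0:
            time.sleep(0)   # just yield the rest of this time slice
            return
        # Event.wait returns early when another thread calls wake().
        self._wakeup.wait(t)

    def wake(self):
        # Release every thread currently waiting, then re-arm for later sleeps.
        self._wakeup.set()
        self._wakeup.clear()
```

Setting and immediately clearing the event wakes the threads that are already waiting, because `set()` notifies them all; a thread that starts sleeping after `wake()` returns sleeps normally, which matches the "abort all *current* calls" requirement.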
Regulator.lock
-
A lock object. The `Scheduler` uses this lock to protect its internal data structures.
Must support the "context manager" protocol (`__enter__` and `__exit__`). Entering the object must acquire the lock; exiting must release the lock.
This lock does not need to be recursive.
Regulator.now()
-
Returns the current time in local units. Must be monotonic: for any two calls to `now` during the course of the program, the later call must never return a lower value than the earlier call.
A `Scheduler` will only call this method while holding this regulator's lock.
Regulator.sleep(t)
-
Sleeps for some amount of time, in local units. Must support an interval of `0`, which should represent not sleeping. (Though it's preferable that an interval of `0` yields the rest of the current thread's time slice back to the operating system.)
If `wake` is called on this `Regulator` object while a different thread has called this function to sleep, `sleep` must abandon the rest of the sleep interval and return immediately.
A `Scheduler` will only call this method while not holding this regulator's lock.
Regulator.wake()
-
Aborts all current calls to `sleep` on this `Regulator`, across all threads.
A `Scheduler` will only call this method while holding this regulator's lock.
Scheduler(regulator=default_regulator)
-
Implements a scheduler. The only argument is the "regulator" object to use; the regulator abstracts away all time-related details for the scheduler. By default `Scheduler` uses an instance of `SingleThreadedRegulator`, which is not thread-safe.
(If you need the scheduler to be thread-safe, pass in an instance of a thread-safe `Regulator` class like `ThreadSafeRegulator`.)
In addition to the methods below, `Scheduler` objects support being evaluated in a boolean context (they are true if they contain any events), and they support being iterated over. Iterating over a `Scheduler` object blocks until the next event comes due, at which point the `Scheduler` yields that event. An empty `Scheduler` that is iterated over raises `StopIteration`. You can reuse `Scheduler` objects, iterating over them until empty, then adding more events and iterating over them again.
Scheduler.schedule(o, time, *, absolute=False, priority=DEFAULT_PRIORITY)
-
Schedules an object `o` to be yielded as an event by this `Scheduler` object at some time in the future.
By default the `time` value is a relative time value, and is added to the current time; using a `time` value of 0 should schedule this event to be yielded immediately.
If `absolute` is true, `time` is regarded as an absolute time value.
If multiple events are scheduled for the same time, they will be yielded by order of `priority`. Lower values of `priority` represent higher priorities. The default value is `Scheduler.DEFAULT_PRIORITY`, which is 100. If two events are scheduled for the same time and have the same priority, `Scheduler` will yield the events in the order they were added.
Returns an `Event` object, which can be used to cancel the event.
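The ordering rules above (time, then priority, then insertion order) map naturally onto tuples in a `heapq`, with a counter as the final tie-breaker. `SchedulerSketch` is a hypothetical, single-threaded model with a hand-advanced mock clock; it isn't big's `Scheduler`, returns no `Event` objects, and has no cancellation or blocking iteration.

```python
import heapq
import itertools

DEFAULT_PRIORITY = 100  # value quoted in the text above


class SchedulerSketch:
    """Hypothetical model of the event-ordering rules (not big's Scheduler)."""

    def __init__(self, now=0):
        self.now = now                      # mock clock, advanced by hand
        self._queue = []
        self._sequence = itertools.count()  # tie-breaker: insertion order

    def schedule(self, o, time, *, absolute=False, priority=DEFAULT_PRIORITY):
        when = time if absolute else self.now + time
        # Tuples compare element by element: time, then priority, then sequence.
        # The unique sequence number means `o` itself is never compared.
        heapq.heappush(self._queue, (when, priority, next(self._sequence), o))

    def non_blocking(self):
        # Yield only events that are already due; never wait.
        while self._queue and self._queue[0][0] <= self.now:
            yield heapq.heappop(self._queue)[3]


s = SchedulerSketch()
s.schedule("late", 5)
s.schedule("low", 1, priority=200)
s.schedule("first", 1)
s.schedule("second", 1)
s.now = 1
print(list(s.non_blocking()))   # → ['first', 'second', 'low']
s.now = 10
print(list(s.non_blocking()))   # → ['late']
```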
Scheduler.cancel(event)
-
Cancels a scheduled event. `event` must be an object returned by this `Scheduler` object. If `event` is not currently scheduled in this `Scheduler` object, raises `ValueError`.
Scheduler.queue
-
A list of the currently scheduled `Event` objects, in the order they will be yielded.
Scheduler.non_blocking()
-
Returns an iterator for the events in the `Scheduler` that only yields the events that are currently due. Never blocks; if the next event is not due yet, raises `StopIteration`.
SingleThreadedRegulator()
-
An implementation of `Regulator` designed for use in single-threaded programs. It doesn't support multiple threads, and in particular is not thread-safe. But it performs much better than thread-safe `Regulator` implementations.
ThreadSafeRegulator()
-
A thread-safe implementation of `Regulator` designed for use in multithreaded programs.
big.state
-
Code that makes it easy to write simple state machines.
There are lots of popular Python libraries for implementing state machines. But they all seem to be designed for large-scale state machines. These libraries are sophisticated and data-driven, with expansive APIs. And, as a rule, they require the state to be a passive object (e.g. an `Enum`), and require you to explicitly describe every possible state transition.
That approach is great for massive, super-complex state machines--you need the features of a sophisticated library to manage all that complexity. It also enables clever features like automatically generating diagrams of your state machine, which is great!
But most of the time this level of sophistication is unnecessary. There are lots of use cases for small-scale, simple state machines, where the sophisticated data-driven approach and expansive, complex API only get in the way. I prefer writing my state machines with active objects--where states are implemented as classes, events are implemented as method calls on those classes, and you transition to a new state by simply overwriting a `state` attribute with a different state instance.
`big.state` makes it easy to write this style of state machine. It has a deliberately minimal, simple interface--the constructor for the main `StateManager` class only has four parameters, and it only exposes three attributes. The module also has two decorators to make your life easier. And that's it! But even this small API surface area makes it effortless to write some pretty big state machines.
(Of course, you can also use `big.state` to write tiny data-driven state machines too. Although `big.state` makes state machines with active states easy to write, it's agnostic about how you actually implement your state machine. Really, `big.state` makes it easy to write any kind of state machine you like!)
`big.state` provides features like:
- method calls that get called when entering and exiting a state,
- "observers", callables that get called each time you transition to a new state, and
- safety mechanisms to catch bugs and prevent design mistakes.
Recommended best practices
The main class in big.state is StateManager. This class
maintains the current "state" of your state machine, and
manages transitions to new states. The constructor takes
one required parameter, the initial state.
Here are my recommendations for best practices when working
with StateManager for medium-sized and larger state machines:
- Your state machine should be implemented as a class.
  - You should store `StateManager` as an attribute of that class, preferably called `state_manager`. (Your state machine should have a "has-a" relationship with `StateManager`, not an "is-a" relationship where it inherits from `StateManager`.)
  - You should decorate your state machine class with the `accessor` decorator--this will save you a lot of boilerplate. If your state machine is stored in `o`, decorating with `accessor` lets you access the current state using `o.state` instead of `o.state_manager.state`.
- Every state should be implemented as a class.
  - You should have a base class for your state classes, containing whatever functionality they have in common.
  - You're encouraged to define these state classes inside your state machine class, and use `BoundInnerClass` so they automatically get references to the state machine they're a part of.
- Events should be method calls made on your state machine object.
  - Your state base class should have a method for every event, decorated with `pure_virtual`.
  - As a rule, events should be dispatched from the state machine to a method call on the current state with the same name.
  - If all the code to handle a particular event lives in the states, use the `dispatch` decorator to save you more boilerplate when calling the event method. Similarly to `accessor`, this creates a new method for you that calls the equivalent method on the current state, passing in all the arguments it received.
Example code
Here's a simple example demonstrating all this functionality.
It's a state machine with two states, On and Off, and
one event method toggle. Calling toggle transitions
the state machine from the Off state to the On state,
and vice-versa.
```python
from big.all import accessor, BoundInnerClass, dispatch, pure_virtual, StateManager

@accessor()
class StateMachine:
    def __init__(self):
        self.state_manager = StateManager(self.Off())

    @dispatch()
    def toggle(self):
        ...

    @BoundInnerClass
    class State:
        def __init__(self, state_machine):
            self.state_machine = state_machine

        def __repr__(self):
            return f"<{type(self).__name__}>"

        @pure_virtual()
        def toggle(self):
            ...

    @BoundInnerClass
    class Off(State):
        def on_enter(self):
            print("off!")

        def toggle(self):
            sm = self.state_machine
            sm.state = sm.On() # sm.state is the accessor

    @BoundInnerClass
    class On(State):
        def on_enter(self):
            print("on!")

        def toggle(self):
            sm = self.state_machine
            sm.state = sm.Off()

sm = StateMachine()
print(sm.state)
for _ in range(3):
    sm.toggle()
    print(sm.state)
```
This code demonstrates both accessor and dispatch. accessor lets us reference the current state with sm.state instead of sm.state_manager.state, and dispatch lets us call sm.toggle() instead of sm.state_manager.state.toggle().
For a more complete example of working with StateManager,
see the test_vending_machine test code in tests/test_state.py
in the big source tree.
accessor(attribute='state', state_manager='state_manager')
Class decorator. Adds a convenient state accessor attribute to your class.

When you have a state machine class containing a StateManager object, it can be wordy and inconvenient to access the state through the state machine attribute:

```python
class StateMachine:
    def __init__(self):
        self.state_manager = StateManager(self.InitialState)
    ...

sm = StateMachine()
# vvvvvvvvvvvvvvvvvvvv that's a lot!
sm.state_manager.state = NextState()
```
The accessor class decorator creates a property for you--a shortcut that directly accesses the state attribute of your state manager. Just decorate your state machine class with @accessor():

```python
@accessor()
class StateMachine:
    def __init__(self):
        self.state_manager = StateManager(self.InitialState)
    ...

sm = StateMachine()
# vvvvvv that's a lot shorter!
sm.state = NextState()
```
The state attribute evaluates to the same value:

```python
sm.state == sm.state_manager.state
```

And setting it sets the state on your StateManager instance. These two statements now do the same thing:

```python
sm.state_manager.state = new_state
sm.state = new_state
```
By default, this decorator assumes your StateManager instance is in the state_manager attribute, and you want to name the new accessor attribute state. You can override these defaults; the decorator's first parameter, attribute, should be the string used for the new accessor attribute, and the second parameter, state_manager, should be the name of the attribute where your StateManager instance is stored.

For example, if your state manager is stored in an attribute called sm, and you want the shortcut to be called st, you'd decorate your state machine class with:

```python
@accessor(attribute='st', state_manager='sm')
```
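The following is a minimal sketch of how a decorator like accessor could work, using nothing but a property that reads and writes through the stored state manager. It is not big's implementation: accessor_sketch and FakeManager are hypothetical names invented for this example, and FakeManager merely stands in for a real StateManager.

```python
def accessor_sketch(attribute="state", state_manager="state_manager"):
    """Hypothetical stand-in for big's accessor decorator (not its real code)."""
    def decorator(cls):
        def getter(self):
            # Read the state off the stored state manager object.
            return getattr(self, state_manager).state

        def setter(self, value):
            # Assigning through the shortcut assigns to the manager's state.
            setattr(getattr(self, state_manager), "state", value)

        setattr(cls, attribute, property(getter, setter))
        return cls
    return decorator


class FakeManager:
    """Hypothetical minimal manager that just holds a 'state' attribute."""
    def __init__(self, state):
        self.state = state


@accessor_sketch(attribute="st", state_manager="sm")
class Machine:
    def __init__(self):
        self.sm = FakeManager("idle")


m = Machine()
m.st = "running"
print(m.sm.state)  # running
```

Because the property lives on the class, every instance gets the shortcut without any per-instance setup.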
dispatch(state_manager='state_manager', *, prefix='', suffix='')
Decorator for state machine event methods, dispatching the event from the state machine object to its current state.

dispatch helps with the following scenario:

- You have your own state machine class which contains a StateManager object.
- You want your state machine class to have methods representing events.
- Rather than handle those events in your state machine object itself, you want to dispatch them to the current state.

Simply create a method in your state machine class with the correct name and parameters but a no-op body, and decorate it with @dispatch(). The dispatch decorator will rewrite your method so it calls the equivalent method on the current state, passing through all the arguments.

For example, instead of writing this:
```python
class StateMachine:
    def __init__(self):
        self.state_manager = StateManager(self.InitialState)

    def on_sunrise(self, time, *, verbose=False):
        return self.state_manager.state.on_sunrise(time, verbose=verbose)
```
you can literally write this, which does the same thing:
```python
class StateMachine:
    def __init__(self):
        self.state_manager = StateManager(self.InitialState)

    @dispatch()
    def on_sunrise(self, time, *, verbose=False):
        ...
```
Here, the on_sunrise function you wrote is actually thrown away. (That's why the body is simply one "..." statement.) Your function is replaced with a function that gets the state_manager attribute from self, then gets the state attribute from that StateManager instance, then calls a method with the same name as the decorated function, passing in all its arguments using *args and **kwargs.

Note that, as a stylistic convention, you're encouraged to literally use a single ellipsis as the body of these functions, as in the example above. This is a visual cue to readers that the body of the function doesn't matter. (In fact, the original on_sunrise method above is thrown away inside the decorator, and replaced with a customized method dispatch function.)

The state_manager argument to the decorator should be the name of the attribute where the StateManager instance is stored in self. The default is 'state_manager', but you can specify a different string if you've stored your StateManager in another attribute. For example, if your state manager is in the attribute smedley, you'd decorate with:

```python
@dispatch('smedley')
```
The prefix and suffix arguments are strings added to the beginning and end of the name of the method we call on the current state. For example, if you want the method you call to have an active verb form (e.g. reset), but you want it to directly call an event handler that starts with on_ by convention (e.g. on_reset), you could do this:

```python
@dispatch(prefix='on_')
def reset(self):
    ...
```

This is equivalent to:

```python
def reset(self):
    return self.state_manager.state.on_reset()
```
If you have more than one event method, instead of decorating every event method with the same copy-and-pasted dispatch call, it's better to call dispatch once, cache the function it returns, and decorate with that. Like so:

```python
my_dispatch = dispatch('smedley', prefix='on_')

@my_dispatch
def reset(self):
    ...

@my_dispatch
def sunrise(self):
    ...
```
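Here is a rough sketch of the forwarding dispatch performs, as described above: the decorated body is ignored, and the wrapper looks up the current state on each call and forwards every argument to the method named prefix + name + suffix. dispatch_sketch, Sunny, and Holder are hypothetical names; Holder stands in for a StateManager, and big's real decorator may differ in details such as error handling.

```python
import functools


def dispatch_sketch(state_manager="state_manager", *, prefix="", suffix=""):
    """Hypothetical stand-in for big's dispatch decorator (not its real code)."""
    def decorator(fn):
        method_name = prefix + fn.__name__ + suffix

        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            # fn's body never runs: look up the current state on every call
            # and forward all arguments to its method with the computed name.
            state = getattr(self, state_manager).state
            return getattr(state, method_name)(*args, **kwargs)
        return wrapper
    return decorator


class Sunny:
    def on_sunrise(self, time, *, verbose=False):
        return f"sunrise at {time}" + (" (verbose)" if verbose else "")


class Holder:
    """Hypothetical stand-in for a StateManager: it only holds .state."""
    def __init__(self, state):
        self.state = state


class StateMachine:
    def __init__(self):
        self.smedley = Holder(Sunny())

    @dispatch_sketch("smedley", prefix="on_")
    def sunrise(self, time, *, verbose=False):
        ...


print(StateMachine().sunrise("6am", verbose=True))  # sunrise at 6am (verbose)
```

Looking the state up at call time (rather than when decorating) is what lets the same method follow the state machine from state to state.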
State()
Base class for state machine state implementation classes. Use of this base class is optional; states can be instances of any type except types.NoneType.
StateManager(state, *, on_enter='on_enter', on_exit='on_exit', state_class=None)
Simple, Pythonic state machine manager.

Has three public attributes:

- state: The current state. You transition from one state to another by assigning to this attribute.
- next: The state the StateManager is transitioning to, if it's currently in the process of transitioning to a new state. If the StateManager isn't currently transitioning to a new state, its next attribute is None. And if the StateManager is currently transitioning to a new state, its next attribute will not be None.

  While the manager is transitioning to a new state, it's illegal to start a second transition. (In other words: you can't assign to state while next is not None.)
- observers: A list of callables that get called during every state transition. It's initially empty; you may add observers to the list and remove them as needed.
  - The callables will be called with one positional argument, the state manager object.
  - Since observers are called during the state transition, they aren't permitted to initiate state transitions.
  - You're permitted to modify the list of observers at any time--even from inside an observer callback. Note that such changes won't affect which observers are called until the next state transition. (Upon every state transition, StateManager locally caches the list of observers before calling any of them.)
  - If an observer raises an exception, StateManager remembers the first exception, continues calling the remaining observers, completes the state transition, and then re-raises that first exception. (If more than one observer raises an exception, only the first exception is retained and re-raised.)
The constructor takes the following parameters:
- state: The initial state. It can be any valid state object; by default, any Python value can be a state except None. (But also see the state_class parameter below.)
- on_enter: on_enter represents a method call on states called when entering that state. The value itself is a string used to look up an attribute on state objects; by default on_enter is the string 'on_enter', but it can be any legal Python identifier string, or any false value.

  If on_enter is a valid identifier string, and this StateManager object transitions to a state object O, and O has an attribute defined with this name, StateManager will call that attribute (with no arguments) immediately after transitioning to that state. Passing in a false value for on_enter to the StateManager constructor disables this behavior. on_enter is called immediately after the transition is complete, which means you're expressly permitted to make a state transition inside an on_enter call.

  If defined, on_enter will be called on the initial state object, from inside the StateManager constructor.
- on_exit: on_exit is similar to on_enter, except the attribute is called when transitioning away from a state object. Its default value is 'on_exit'. on_exit is called during the state transition, which means you're expressly forbidden from making a state transition inside an on_exit call.

  If the on_exit method raises an exception, the state transition is aborted, and the state machine stays in the current state.
- state_class: state_class is used to enforce that this StateManager only ever transitions to valid state objects. It should be either None or a class. If it's a class, the StateManager object will require every value assigned to its state attribute to be an instance of that class. If it's None, states can be any object (except None).
State transitions
To transition to a new state, simply assign to the state attribute.

- If state_class is None, you may use any value as a state except None.
- It's illegal to assign to state while currently transitioning to a new state. (Or, in other words, at any time self.next is not None.)
- If the current state object has an on_exit method, it will be called (with zero arguments) during the transition to the next state. This means it's illegal to initiate a state transition inside an on_exit call. If on_exit raises an exception, the transition is aborted: state remains unchanged, and next is restored to None.
- If you assign an object to state that has an on_enter attribute, that method will be called (with zero arguments) immediately after we have transitioned to that state. This means it's permitted to initiate a state transition inside an on_enter call.
- If an observer raises an exception, the transition still completes, next is restored to None, and StateManager re-raises the first observer exception after completing the transition.
- It's illegal to attempt to transition to the current state. If state_manager.state is already foo, state_manager.state = foo will raise TransitionError. (However, it's legal to transition to an object bar even if foo == bar is true. Equivalent objects are fine; you just can't transition to literally the same object.)
Sequence of events during a state transition
If you have a StateManager instance called state_manager, and you transition it to new_state:

```python
state_manager.state = new_state
```

StateManager will execute the following sequence of events:

- Set state_manager.next to new_state.
  - As of this moment, state_manager is "transitioning" to the new state.
- If state_manager.state has an on_exit attribute, call state_manager.state.on_exit().
- For every object o in a snapshot of the state_manager.observers list, call o(state_manager).
  - If an observer raises an exception, StateManager remembers the first exception, then keeps calling the remaining observers.
- Set state_manager.state to new_state.
  - As of this moment, the transition is complete, and state_manager is now "in" the new state.
- Set state_manager.next to None.
- If state_manager.state has an on_enter attribute, call state_manager.state.on_enter().
- If an observer raised an exception, re-raise the first observer exception now.
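To make the sequence of events above concrete, here is a simplified, hypothetical model of it. MiniStateManager and its local TransitionError are illustrative stand-ins, not big's StateManager: the sketch omits state_class checking, and its error messages and its use of ValueError for None are assumptions.

```python
class TransitionError(Exception):
    """Local stand-in for big's TransitionError."""


class MiniStateManager:
    """Hypothetical, simplified model of the transition sequence (not big's code)."""

    def __init__(self, state, *, on_enter="on_enter", on_exit="on_exit"):
        if state is None:
            raise ValueError("state must not be None")
        self._state = state
        self.next = None
        self.observers = []
        self._on_enter = on_enter
        self._on_exit = on_exit
        # on_enter is also called on the initial state, from the constructor.
        self._call_hook(state, on_enter)

    @staticmethod
    def _call_hook(obj, name):
        # A false name disables the hook; a missing attribute is simply skipped.
        if name:
            hook = getattr(obj, name, None)
            if hook is not None:
                hook()

    @property
    def state(self):
        return self._state

    @state.setter
    def state(self, new_state):
        if new_state is None:
            raise ValueError("state must not be None")
        if self.next is not None:
            raise TransitionError("already transitioning to a new state")
        if new_state is self._state:
            raise TransitionError("can't transition to the current state")

        self.next = new_state  # now "transitioning"
        try:
            self._call_hook(self._state, self._on_exit)
        except BaseException:
            self.next = None  # on_exit failed: abort, state is unchanged
            raise

        first_error = None
        for observer in list(self.observers):  # iterate over a snapshot
            try:
                observer(self)
            except Exception as e:
                if first_error is None:
                    first_error = e  # remember only the first failure

        self._state = new_state  # the transition is complete
        self.next = None
        self._call_hook(new_state, self._on_enter)
        if first_error is not None:
            raise first_error


log = []


class Named:
    def __init__(self, name):
        self.name = name

    def on_enter(self):
        log.append("enter " + self.name)

    def on_exit(self):
        log.append("exit " + self.name)


manager = MiniStateManager(Named("a"))
manager.observers.append(lambda m: log.append("observe " + m.next.name))
b = Named("b")
manager.state = b
print(log)  # ['enter a', 'exit a', 'observe b', 'enter b']
```

Because next is still set while observers run, any attempt by an observer to assign to state hits the "already transitioning" check, which is how the "observers can't initiate transitions" rule falls out of the sequence.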
TransitionError()

Exception raised when attempting to execute an illegal state transition.

There are only two types of illegal state transitions:

- An attempted state transition while we're in the process of transitioning to another state. In other words, if state_manager is your StateManager object, you can't set state_manager.state when state_manager.next is not None.
- An attempt to transition to the current state. This is illegal:

  ```python
  state_manager = StateManager(initial_state)
  state_manager.state = foo
  state_manager.state = foo # <-- this statement raises TransitionError
  ```

  Note that transitioning to a different but equal object is expressly permitted.
big.template
Functions for parsing strings containing a simple template syntax, patterned after Django Templates and Jinja. Similar in spirit to Python 3.14+ t-strings.
Formatter(template, map=None, *, stretch=True, width=79, **kwargs)
A sophisticated template formatter, similar to str.format.

The Formatter constructor takes the following arguments:

- template, a string. Calling the Formatter object is like calling the str.format method on that string.
- map, a dict or None, default None. If a dict, pre-initializes values used at interpolation time.
- width, an integer, default 79, the target width of lines when computing "starred interpolations".
- stretch, a boolean, default True, also used in conjunction with "starred interpolations".

Also, additional **kwargs are used as additional pre-initialized map values, and take precedence over the map parameter.

Returns a Formatter object. Calling this object formats the template string using str.format_map and returns the result. Substitutions in the template use str.format_map syntax. The signature of this callable is:

```python
fn(message='', **kwargs)
```

The **kwargs passed in here are also used as values for the interpolation, and take precedence over any value passed in to the constructor.

Formatter has two additional features:
- Special support for an interpolation named `"{message}"`, which is formatted in conjunction with the `message` parameter. If your template contains one or more lines containing `"{message}"`, these "message lines" are formatted using the lines of the `message` argument. The `message` argument is split on the newline character (`'\n'`), and these lines are zipped together with the "message lines": the first "message line" is formatted with the first line of the `message` parameter, the second with the second, etc.
  - If there are more "message lines" in the template than lines in the `message` parameter, the additional "message lines" are discarded. Example: if there are three "message lines" in the template, but only two lines in the `message` parameter, the third "message line" won't appear in the output.
  - If there are more lines in the `message` parameter than "message lines" in the template, the last template "message line" will be repeated. Example: if there are three lines in the `message` parameter, but only two "message lines" in the template, the last "message line" will be repeated, and used to format the last two lines of the `message` parameter.
- Values whose keys end with `'*'` (e.g. `"{line*}"`) are special: they are "starred interpolations". Their value is repeated zero or more times, then truncated, so that the line is at least `width` characters long. Starred interpolations must not use:
  - dotted expressions (`"{line.foo*}"`)
  - indexing (`"{line[3]*}"`)
  - a conversion (`"{line*!r}"`)
  - or a format spec (`"{line*:5}"`)

  If `stretch` is true, `Formatter` calculates the width of the longest formatted line (assuming all starred interpolations are length 0), then recomputes width as:

  ```python
  width = max(longest_line, width)
  ```

  This means the starred interpolations will "stretch" to fit the longest line of the output.
Example:

```python
from big.all import Formatter

fmt = Formatter('{line*}\n{name} start\n>> {message}\n<< {message}\n{double*}{line*}',
    {'line*': '-', 'double*': '=', 'name': 'Log'},
    width=20)
print(fmt("hello\nthere\nworld!"))
```

This prints:

```
--------------------
Log start
>> hello
<< there
<< world!
==========----------
```
eval_template_string(s, globals, locals=None, *, parse_expressions=True, parse_comments=False, parse_whitespace_eater=False)
Parses and evaluates a template string, returning the rendered result.

s is parsed using parse_template_string, then each Interpolation is evaluated using Python's built-in eval() with the provided globals and locals dicts. Filters are applied in order. The rendered string is returned with all interpolations replaced by their values.

parse_comments and parse_whitespace_eater may be enabled optionally; they are disabled by default. Statement parsing is not supported by eval_template_string.
Interpolation(expression, *filters, debug='')
Represents a {{ }} expression interpolation from a parsed template. expression contains the text of the expression. filters is a tuple containing the text of each filter expression, if any. If the expression ended with =, debug contains the text of the expression along with the = and all whitespace; otherwise it is an empty string.
parse_template_string(s, *, parse_expressions=True, parse_comments=False, parse_statements=False, parse_whitespace_eater=False, quotes=('"', "'"), multiline_quotes=(), escape='\\')
Parses a string containing simple template markup, yielding its components.

Returns a generator yielding str objects (literal text), Interpolation objects (parsed expressions), and Statement objects (parsed statements).

The supported delimiters are:

- {{ ... }}: An expression. Parsed into an Interpolation object. Expressions may include filters separated by |.
- {% ... %}: A statement. Parsed into a Statement object. Quoted strings inside statements are preserved and respected when looking for the close delimiter.
- {# ... #}: A comment. The delimiters and all text between them are discarded.
- {>}: The whitespace eater. These three characters and all subsequent whitespace are discarded.

Each delimiter type can be individually enabled or disabled via its corresponding boolean keyword-only parameter. By default only parse_expressions is true.
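As a rough illustration of this parsing model (not big's parser), the sketch below splits a template into literal text and {{ }} expressions with a regular expression, drops {# #} comments, and then evaluates each expression with eval(), roughly the way eval_template_string is described above. Interp, parse_sketch, and render_sketch are hypothetical names. Unlike parse_template_string, this sketch always parses comments, and it ignores filters, statements, the whitespace eater, the debug =, and quoting, so a }} inside a quoted string would end the expression early.

```python
import re
from typing import NamedTuple


class Interp(NamedTuple):
    """Hypothetical stand-in for big's Interpolation (expression text only)."""
    expression: str


# Match a {{ ... }} expression (captured) or a {# ... #} comment, non-greedily.
_TOKEN = re.compile(r"\{\{(.*?)\}\}|\{#.*?#\}", re.DOTALL)


def parse_sketch(s):
    """Yield literal str pieces and Interp objects; comments are dropped."""
    pos = 0
    for match in _TOKEN.finditer(s):
        if match.start() > pos:
            yield s[pos:match.start()]
        if match.group(1) is not None:  # None means we matched a comment
            yield Interp(match.group(1).strip())
        pos = match.end()
    if pos < len(s):
        yield s[pos:]


def render_sketch(s, globals_):
    """Evaluate each expression with eval() and join everything together."""
    out = []
    for piece in parse_sketch(s):
        if isinstance(piece, Interp):
            out.append(str(eval(piece.expression, globals_)))
        else:
            out.append(piece)
    return "".join(out)


print(list(parse_sketch("Hi {{ name }}!{# ignored #}")))
print(render_sketch("{{ a + b }} apples", {"a": 2, "b": 3}))  # 5 apples
```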
Statement(statement)
Represents a {% %} statement from a parsed template. statement contains the text of the statement, including all leading and trailing whitespace.
big.text
Functions for working with text strings. There are several families of functions inside the text module; for a higher-level view of those families, read the following tutorials:

All the functions in big.text will work with either str or bytes objects, except the three Word wrapping and formatting functions. When working with bytes, by default the functions will only work with ASCII characters.

Support for bytes and str

The big text functions all support both str and bytes. The functions all automatically detect whether you passed in str or bytes using an intentionally simple and predictable process, as follows:

At the start of each function, it'll test its first "string" argument to see if it's a bytes object.

```
is_bytes = isinstance(<argument>, bytes)
```

If isinstance returns True, the function assumes all arguments are bytes objects. Otherwise the function assumes all arguments are str objects.

As a rule, no further testing, casting, or catching exceptions is done.

Functions that take multiple string-like parameters require all such arguments to be the same type. These functions will check that all such arguments are of the same type.

Subclasses of str and bytes will also work; anywhere you should pass in a str, you can also pass in a subclass of str, and likewise for bytes.
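Here is a small hypothetical helper, nonblank_lines, written in the style just described: it checks its first string argument with isinstance exactly once, chooses its literals to match, and does nothing else. It isn't a big function; it only illustrates the convention.

```python
def nonblank_lines(s):
    """Hypothetical helper following the str/bytes convention described above."""
    # Test the first "string" argument once...
    is_bytes = isinstance(s, bytes)
    # ...then pick every literal to match, with no further checks or casts.
    newline = b"\n" if is_bytes else "\n"
    return [line for line in s.split(newline) if line.strip()]


class MyStr(str):
    """A str subclass: isinstance(..., bytes) is False, so it's handled as str."""


print(nonblank_lines("a\n\nb"))       # ['a', 'b']
print(nonblank_lines(b"a\n \nb"))     # [b'a', b'b']
print(nonblank_lines(MyStr("x\ny")))  # ['x', 'y']
```

Mixing types (say, a bytes s with a str separator) would simply fail inside split with a TypeError, which matches the "no further testing, casting, or catching exceptions" rule.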
ascii_linebreaks
A tuple of str objects, representing every line-breaking whitespace character defined by ASCII.

Useful as a separator argument for big functions that accept one, e.g. the big "multi-" family of functions.

Also contains '\r\n'. If you don't want to include this string, use ascii_linebreaks_without_crlf instead. See the tutorial section on The Unix, Mac, and DOS linebreak conventions for more.

For more information, please see the Whitespace and line-breaking characters in Python and big tutorial.
ascii_linebreaks_without_crlf
Equivalent to ascii_linebreaks without '\r\n'.
ascii_whitespace
A tuple of str objects, representing every whitespace character defined by ASCII.

Useful as a separator argument for big functions that accept one, e.g. the big "multi-" family of functions.

Also contains '\r\n'. If you don't want to include this string, use ascii_whitespace_without_crlf instead. See the tutorial section on The Unix, Mac, and DOS linebreak conventions for more.

For more information, please see the Whitespace and line-breaking characters in Python and big tutorial.
ascii_whitespace_without_crlf
Equivalent to ascii_whitespace without '\r\n'.
bytes_linebreaks
A tuple of bytes objects, representing every line-breaking whitespace character recognized by the Python bytes object.

Useful as a separator argument for big functions that accept one, e.g. the big "multi-" family of functions.

Also contains b'\r\n'. If you don't want to include this string, use bytes_linebreaks_without_crlf instead. See the tutorial section on The Unix, Mac, and DOS linebreak conventions for more.

For more information, please see the Whitespace and line-breaking characters in Python and big tutorial.
bytes_linebreaks_without_crlf
Equivalent to bytes_linebreaks without b'\r\n'.
bytes_whitespace
A tuple of bytes objects, representing every whitespace character recognized by the Python bytes object. (bytes.isspace, bytes.split, etc. will tell you which characters are considered whitespace.)

Useful as a separator argument for big functions that accept one, e.g. the big "multi-" family of functions.

Also contains b'\r\n'. If you don't want to include this string, use bytes_whitespace_without_crlf instead. See the tutorial section on The Unix, Mac, and DOS linebreak conventions for more.

For more information, please see the Whitespace and line-breaking characters in Python and big tutorial.
bytes_whitespace_without_crlf
Equivalent to bytes_whitespace without b'\r\n'.
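bytes objects recognize far fewer whitespace and line-break characters than str does. The sketch below derives the single-byte characters from Python itself, assuming big's bytes_whitespace and bytes_linebreaks match what bytes.isspace() and bytes.splitlines() recognize; the order here is by byte value and may differ from big's tuples.

```python
def is_linebreak(byte):
    # bytes.splitlines() only breaks at a byte that is a line boundary.
    return len((b"x" + byte + b"y").splitlines()) == 2


all_single_bytes = [bytes([i]) for i in range(256)]

# Whitespace per bytes.isspace(), plus the two-byte DOS line ending.
whitespace = tuple(b for b in all_single_bytes if b.isspace()) + (b"\r\n",)
# Line breaks per bytes.splitlines(), plus the two-byte DOS line ending.
linebreaks = tuple(b for b in all_single_bytes if is_linebreak(b)) + (b"\r\n",)

print(whitespace)  # (b'\t', b'\n', b'\x0b', b'\x0c', b'\r', b' ', b'\r\n')
print(linebreaks)  # (b'\n', b'\r', b'\r\n')
```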
combine_splits(s, *split_arrays)
Takes a string s, and one or more "split arrays", and applies all the splits to s. Returns an iterator of the resulting string segments.

A "split array" is an array containing the original string, but split into multiple pieces. For example, the string "a b c d e" could be split into the split array ["a ", "b ", "c ", "d ", "e"].

For example,

```python
combine_splits('abcde', ['abcd', 'e'], ['a', 'bcde'])
```

yields the segments 'a', 'bcd', and 'e'.

Note that the split arrays must contain all the characters from s; ''.join(split_array) must recreate s. combine_splits only examines the lengths of the strings in the split arrays, and makes no attempt to infer stripped characters. (So, don't use the string's .split method if you want to use combine_splits. Instead, consider big's multisplit with keep=True or keep=ALTERNATING.)
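Because combine_splits only looks at lengths, its core can be sketched with running totals: each split array contributes a set of cut positions, and the union of all the cuts determines the output segments. combine_splits_sketch is a hypothetical re-implementation for illustration, not big's code; its ValueError check is an assumption.

```python
from itertools import accumulate


def combine_splits_sketch(s, *split_arrays):
    """Hypothetical re-implementation based only on segment lengths."""
    cuts = set()
    for split_array in split_arrays:
        lengths = [len(piece) for piece in split_array]
        if sum(lengths) != len(s):
            raise ValueError("each split array must cover all of s")
        # Running totals of the piece lengths are the cut positions.
        cuts.update(accumulate(lengths))
    start = 0
    for end in sorted(cuts):
        if end > start:  # skip zero-length segments
            yield s[start:end]
            start = end


print(list(combine_splits_sketch('abcde', ['abcd', 'e'], ['a', 'bcde'])))
# ['a', 'bcd', 'e']
```

Slicing s directly (instead of re-joining pieces) is what makes the sketch work unchanged for both str and bytes.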
decode_python_script(script, *, newline=None, use_bom=True, use_source_code_encoding=True)
Correctly decodes a Python script from a bytes string.

script should be a bytes object containing an encoded Python script.

Returns a str containing the decoded Python script.

By default, Python 3 scripts are assumed to be encoded using UTF-8. (This was established by PEP 3120.) Python scripts are allowed to use other encodings, but when they do so they must explicitly specify what encoding they used. Python defines two methods for scripts to specify their encoding; decode_python_script supports both.

The first method uses a "byte order mark", aka "BOM". This is a sequence of bytes at the beginning of the file that indicates the file's encoding.

If use_bom is true (the default), decode_python_script will recognize a BOM if present, and decode the file using the encoding specified by the BOM. Note that decode_python_script removes the BOM when it decodes the file.

The second method is called a "source code encoding", and it was defined in PEP 263. This is a "magic comment" that must be one of the first two lines of the file.

If use_source_code_encoding is true (the default), decode_python_script will recognize a source code encoding magic comment, and use that to decode the file. (decode_python_script leaves the magic comment in place.)

If both these "use_" keyword-only parameters are true (the default), decode_python_script can handle either, both, or neither. In this case, if script contains both a BOM and a source code encoding magic comment, the script will be decoded using the encoding specified by the BOM, and the source code encoding must agree with the BOM.

The newline parameter supports Python's "universal newlines" convention. This behaves identically to the newline parameter for Python's open() function.
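The standard library's tokenize.detect_encoding() implements both of these detection methods, so a rough approximation of decode_python_script can be built on top of it. This is a hypothetical sketch, not how big implements the function: decode_script_sketch only recognizes a UTF-8 BOM, always honors both mechanisms (it has no use_bom or use_source_code_encoding switches), and relies on io.TextIOWrapper for the newline handling.

```python
import io
import tokenize


def decode_script_sketch(script, *, newline=None):
    """Hypothetical approximation built on tokenize.detect_encoding()."""
    # detect_encoding() checks for a UTF-8 BOM and a PEP 263 coding comment in
    # the first two lines. It returns 'utf-8-sig' when a BOM is present, and
    # raises SyntaxError if the BOM and the coding comment disagree.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(script).readline)
    # 'utf-8-sig' strips the BOM while decoding; TextIOWrapper applies the
    # same universal-newlines rules as open() for the given newline value.
    stream = io.TextIOWrapper(io.BytesIO(script), encoding=encoding, newline=newline)
    return stream.read()


print(repr(decode_script_sketch(b"\xef\xbb\xbfx = 1\r\n")))  # 'x = 1\n'
```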
Delimiter(close, *, escape='', multiline=True, quoting=False)
Class representing a delimiter for split_delimiters.

close is the closing delimiter character. It must be a valid string or bytes object, and cannot be a backslash ('\\' or b'\\').

If escape is true, it should be a string; when inside this delimiter, you can escape the closing delimiter with this string. If escape is false, there is no escape string for this delimiter.

quoting is a boolean: does this set of delimiters "quote" the text inside? When an open delimiter enables quoting, split_delimiters will ignore all other delimiters in the text until it encounters the matching close delimiter. (Single- and double-quotes set this to True.)

If escape is true, quoting must also be true.

If multiline is true, the closing delimiter may be on the current line or any subsequent line. If multiline is false, the closing delimiter must appear on the current line.
encode_strings(o, *, encoding='ascii')
Converts an object o from str to bytes. If o is a container, recursively converts all objects and containers inside.

o and all objects inside o must be either bytes, str, dict, set, list, tuple, or a subclass of one of those.

Encodes every string inside using the encoding specified in the encoding parameter (default 'ascii').

Handles nested containers.

If o is of, or contains, a type not listed above, raises TypeError.
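A minimal recursive sketch of the conversion described above. encode_strings_sketch is hypothetical; in particular, converting dict keys as well as values, and rebuilding subclassed containers as their base type, are assumptions rather than documented behavior of big's encode_strings.

```python
def encode_strings_sketch(o, *, encoding="ascii"):
    """Hypothetical recursive str-to-bytes converter (not big's code)."""
    def convert(x):
        if isinstance(x, bytes):
            return x
        if isinstance(x, str):
            return x.encode(encoding)
        if isinstance(x, dict):
            # Assumption: dict keys are converted as well as values.
            return {convert(k): convert(v) for k, v in x.items()}
        # Assumption: containers are rebuilt as their base type.
        for base in (list, tuple, set):
            if isinstance(x, base):
                return base(convert(item) for item in x)
        raise TypeError(f"unsupported type: {type(x).__name__}")
    return convert(o)


print(encode_strings_sketch({"a": ["b", ("c", {"d"})]}))
# {b'a': [b'b', (b'c', {b'd'})]}
```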
format_map(s, mapping)
An implementation of `str.format_map` supporting *nested replacements.*

Unlike str.format_map, big's format_map allows you to perform string replacements inside of other string replacements:

```python
import big.all as big

big.format_map("{{extension} size}", {'extension': 'mp3', 'mp3 size': 8555})
```

returns the string '8555'.

Another difference between str.format_map and big's format_map is how you escape curly braces. To produce a '{' or '}' in the output string, add '\{' or '\}' respectively. (To produce a backslash, '\\', you must put four backslashes, '\\\\'.)

See the documentation for str.format_map for more.
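One way to picture nested replacements is to resolve the innermost {...} fields first and repeat until nothing changes. The hypothetical nested_format_map_sketch below does exactly that with a regular expression. It is not big's implementation: it doesn't support the backslash escapes described above, and it treats everything between braces as a plain key, so conversions and format specs aren't supported either.

```python
import re

# An innermost replacement field: braces with no braces inside them.
_INNERMOST = re.compile(r"\{([^{}]*)\}")


def nested_format_map_sketch(s, mapping):
    """Hypothetical: resolve innermost fields repeatedly (no escape handling)."""
    while True:
        new_s = _INNERMOST.sub(lambda m: str(mapping[m.group(1)]), s)
        if new_s == s:
            return s
        s = new_s


# Pass 1: "{extension}" -> "mp3", giving "{mp3 size}".
# Pass 2: "{mp3 size}" -> "8555".
print(nested_format_map_sketch("{{extension} size}", {'extension': 'mp3', 'mp3 size': 8555}))
```

Note that a value which itself contains braces would be expanded again on the next pass, which is one reason a real implementation needs an escaping scheme.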
gently_title(s, *, apostrophes=None, double_quotes=None)
Uppercases the first character of every word in s, leaving the other letters alone. s should be str or bytes.

(For the purposes of this algorithm, words are any contiguous run of non-whitespace characters.)

This function will also capitalize the letter after an apostrophe if the apostrophe:

- is immediately after whitespace, or
- is immediately after a left parenthesis character ('('), or
- is the first character of the string, or
- is immediately after a letter O or D, when that O or D
  - is after whitespace, or
  - is the first character of the string.

  In this last case, the O or D will also be capitalized.

Finally, this function will capitalize the letter after a quote mark if the quote mark:

- is after whitespace, or
- is the first character of a string.

(A run of consecutive apostrophes and/or quote marks is considered one quote mark for the purposes of capitalization.)

All these rules mean gently_title correctly handles internally quoted strings:

```
He Said 'No I Did Not'
```

and contractions that start with an apostrophe:

```
'Twas The Night Before Christmas
```

as well as certain Irish, French, and Italian names:

```
Peter O'Toole
Lord D'Arcy
```

If specified, apostrophes should be a str or bytes object containing characters that should be considered apostrophes. If apostrophes is false, and s is bytes, apostrophes is set to a bytes object containing the only ASCII apostrophe character: '

If apostrophes is false and s is str, apostrophes is set to a string containing these Unicode apostrophe code points: '‘’‚‛

Note that neither of these strings contains the "back-tick" character: ` (This is a diacritical used for modifying letters, and isn't used as an apostrophe.)

If specified, double_quotes should be a str or bytes object containing characters that should be considered double-quote characters. If double_quotes is false, and s is bytes, double_quotes is set to a bytes object containing the only ASCII double-quote character: "

If double_quotes is false and s is str, double_quotes is set to a string containing these Unicode double-quote code points: "“”„‟«»‹›
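The simplest of the rules above (capitalize the first character of each run of non-whitespace, leave everything else alone) is easy to contrast with str.title(), which also lowercases the rest of each word and capitalizes after every apostrophe. capitalize_words_sketch is a hypothetical, str-only helper that implements only that first rule; it does not implement the apostrophe and quote-mark rules.

```python
import re


def capitalize_words_sketch(s):
    """Hypothetical: only the first rule above (no apostrophe/quote rules)."""
    # A non-whitespace character not preceded by another non-whitespace
    # character starts a word; uppercase it and leave everything else alone.
    return re.sub(r"(?<!\S)\S", lambda m: m.group(0).upper(), s)


print("they're using an MRI".title())                   # They'Re Using An Mri
print(capitalize_words_sketch("they're using an MRI"))  # They're Using An MRI
```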
int_to_words(i, *, flowery=True, ordinal=False)
Converts an integer into the equivalent English string.

```
int_to_words(2) -> "two"
int_to_words(35) -> "thirty-five"
```

If the keyword-only parameter flowery is true (the default), you also get commas and the word "and" where you'd expect them. (When flowery is true, int_to_words(i) produces identical output to inflect.engine().number_to_words(i), except for negative numbers: inflect starts negative numbers with "minus", big starts them with "negative".)

If the keyword-only parameter ordinal is true, the string produced describes that ordinal number (instead of that cardinal number). Ordinal numbers describe position, e.g. where a competitor placed in a competition. In other words, int_to_words(1) returns the string 'one', but int_to_words(1, ordinal=True) returns the string 'first'.

Numbers >= 10**66 (one thousand vigintillion) are only converted using str(i). Sorry!
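For small numbers, the cardinal conversion reduces to a couple of lookup tables. The hypothetical small_int_to_words_sketch below handles only 0 through 999 and inserts "and" after "hundred" in the flowery style described above. It is an illustration, not big's algorithm, and it has no ordinal mode, comma handling, or negative numbers.

```python
_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def small_int_to_words_sketch(i):
    """Hypothetical: cardinal words for 0 <= i < 1000, flowery 'and' style."""
    if not 0 <= i < 1000:
        raise ValueError("this sketch only handles 0..999")
    if i < 20:
        return _ONES[i]
    if i < 100:
        tens, ones = divmod(i, 10)
        # Compound numbers from 21 to 99 are hyphenated.
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    hundreds, rest = divmod(i, 100)
    words = _ONES[hundreds] + " hundred"
    return words + (" and " + small_int_to_words_sketch(rest) if rest else "")


print(small_int_to_words_sketch(135))  # one hundred and thirty-five
```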
linebreaks
A tuple of str objects, representing every line-breaking whitespace character recognized by the Python str object. Identical to str_linebreaks.

Useful as a separator argument for big functions that accept one, e.g. the big "multi-" family of functions.

Also contains '\r\n'. See the tutorial section on The Unix, Mac, and DOS linebreak conventions for more.

For more information, please see the Whitespace and line-breaking characters in Python and big tutorial.
linebreaks_without_crlf
Equivalent to linebreaks without '\r\n'.
merge_columns(*columns, column_separator=" ", overflow_response=OverflowResponse.RAISE, overflow_before=0, overflow_after=0)
Merge an arbitrary number of separate text strings into columns. Returns a single formatted string.

columns should be an iterable of "column tuples". Each column tuple should contain three items:

```python
(text, min_width, max_width)
```

text should be a single string, either str or bytes, with newline characters separating lines. min_width and max_width are the minimum and maximum permissible widths for that column, not including the column separator (if any).

Note that this function does not text-wrap the text of the columns. The text in the columns should already be broken into lines and separated by newline characters. (Lines that are longer than that column tuple's max_width are handled according to overflow_response, described below.)

column_separator is printed between every column.

overflow_response tells merge_columns how to handle a column with one or more lines that are wider than that column's max_width. The supported values are:

- OverflowResponse.RAISE: Raise an OverflowError. The default.
- OverflowResponse.INTRUDE_ALL: Intrude into all subsequent columns on all lines where the overflowed column is wider than its max_width.
- OverflowResponse.DELAY_ALL: Delay all columns after the overflowed column, not beginning any until after the last overflowed line in the overflowed column.

When overflow_response is INTRUDE_ALL or DELAY_ALL, and either overflow_before or overflow_after is nonzero, these specify the number of extra lines before or after the overflowed lines in a column.

For more information, see the tutorial on Word wrapping and formatting.
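A minimal sketch of the column-merging idea, handling only the default RAISE behavior. merge_columns_sketch is hypothetical and str-only, and it assumes each column is as wide as its longest line but at least min_width; big's actual width rules, bytes support, and INTRUDE_ALL / DELAY_ALL handling are not modeled.

```python
def merge_columns_sketch(*columns, column_separator=" "):
    """Hypothetical: pad each column; raise OverflowError like the RAISE default."""
    split_columns = []
    widths = []
    for text, min_width, max_width in columns:
        lines = text.split("\n")
        longest = max(len(line) for line in lines)
        if longest > max_width:
            raise OverflowError(f"line width {longest} exceeds max_width {max_width}")
        split_columns.append(lines)
        # Assumption: a column is as wide as its longest line, but >= min_width.
        widths.append(max(min_width, longest))

    height = max(len(lines) for lines in split_columns)
    output = []
    for row in range(height):
        cells = []
        for lines, width in zip(split_columns, widths):
            # Shorter columns are padded with blank cells at the bottom.
            cell = lines[row] if row < len(lines) else ""
            cells.append(cell.ljust(width))
        output.append(column_separator.join(cells).rstrip())
    return "\n".join(output)


print(merge_columns_sketch(("a\nbb", 2, 5), ("x\ny\nz", 1, 5)))
```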
multipartition(s, separators, count=1, *, reverse=False, separate=True)
-
Like `str.partition`, but supports partitioning based on multiple separator strings, and can partition more than once.
`s` can be either `str` or `bytes`. `separators` should be an iterable of objects of the same type as `s`.
By default, if any of the strings in `separators` are found in `s`, returns a tuple of three strings: the portion of `s` leading up to the earliest separator, the separator, and the portion of `s` after that separator. Example:
    >>> multipartition('aXbYz', ('X', 'Y'))
    ('a', 'X', 'bYz')
If none of the separators are found in the string, returns a tuple containing `s` unchanged followed by two empty strings.
`multipartition` returns a tuple of slices of `s`, including zero-length boundary slices when needed, so concatenating the returned values reconstitutes the original `s`.
`multipartition` is greedy: if two or more separators appear at the leftmost location in `s`, `multipartition` partitions using the longest matching separator. For example:
    >>> multipartition('wxabcyz', ('a', 'abc'))
    ('wx', 'abc', 'yz')
Passing in an explicit `count` lets you control how many times `multipartition` partitions the string. `multipartition` will always return a tuple containing `(2*count)+1` elements. Passing in a `count` of 0 will always return a tuple containing `s`.
If `separate` is false, multiple adjacent separator strings get joined together, behaving like one big separator. If `separate` is true, they're kept separate. Example:
    >>> multipartition('aXYbYXc', ('X', 'Y',), count=2, separate=False)
    ('a', 'XY', 'b', 'YX', 'c')
    >>> multipartition('aXYbYXc', ('X', 'Y',), count=4, separate=True )
    ('a', 'X', '', 'Y', 'b', 'Y', '', 'X', 'c')
    >>> multipartition('aXYbYXc', ('X', 'Y',), count=2, separate=True )
    ('a', 'X', '', 'Y', 'bYXc')
If `reverse` is true, `multipartition` behaves like `str.rpartition`. It partitions starting on the right, scanning backwards through `s` looking for separators.
For more information, see the tutorial on The `multi-` family of string functions.
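The "earliest separator, longest wins on ties" rule can be sketched in a few lines. `multipartition_sketch` is a hypothetical name. It covers only the default `count=1` left-to-right case and is not big's actual implementation.

```python
def multipartition_sketch(s, separators):
    """Hypothetical stand-in for multipartition(s, separators) with count=1."""
    best_index, best_sep = -1, None
    for sep in separators:
        index = s.find(sep)
        if index == -1:
            continue
        # Prefer the leftmost match; at the same position prefer the longest.
        if (best_sep is None or index < best_index
                or (index == best_index and len(sep) > len(best_sep))):
            best_index, best_sep = index, sep
    if best_sep is None:
        # s[:0] is an empty str or bytes, matching the type of s.
        return (s, s[:0], s[:0])
    end = best_index + len(best_sep)
    return (s[:best_index], s[best_index:end], s[end:])


print(multipartition_sketch('wxabcyz', ('a', 'abc')))
```

Because every element is a slice of `s`, joining the tuple always reconstitutes `s`.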
multisplit(s, separators=None, *, keep=False, maxsplit=-1, reverse=False, separate=False, strip=False)
-
Splits strings like `str.split`, but with multiple separators and options. `s` can be `str` or `bytes`. `separators` should either be `None` (the default), or an iterable of `str` or `bytes`, matching `s`.
If `separators` is `None` and `s` is `str`, `multisplit` will use `big.whitespace` as `separators`. If `separators` is `None` and `s` is `bytes`, `multisplit` will use `big.ascii_whitespace` as `separators`.
Returns an iterator yielding values split from `s`. The values yielded are slices of the original object, or in some cases adjacent slices joined with `+`. All slices are yielded in left-to-right order; this even includes zero-length strings, which are sliced from the contextually correct spot.
If `keep` is true (or `ALTERNATING`), and `strip` is false, joining these strings together will recreate `s`.
`multisplit` is greedy: if two or more separators start at the same location in `s`, `multisplit` splits using the longest matching separator. For example, `big.multisplit('wxabcyz', ('a', 'abc'))` yields `'wx'` then `'yz'`.
`keep` indicates whether or not `multisplit` should preserve the separator strings in the strings it yields. It supports four values:
- false (the default): Discard the separators.
- true (apart from `ALTERNATING` and `AS_PAIRS`): Append the separators to the end of the split strings. You can recreate the original string by using `"".join` to join the strings yielded by `multisplit`.
- `ALTERNATING`: Yield alternating strings in the output: strings consisting of separators, alternating with strings consisting of non-separators. If `separate` is true, separator strings will contain exactly one separator, and non-separator strings may be empty; if `separate` is false, separator strings will contain one or more separators, and non-separator strings will never be empty, unless `s` was empty. You can recreate the original string by using `"".join` to join the strings yielded by `multisplit`.
- `AS_PAIRS`: Yield 2-tuples containing a non-separator string and its subsequent separator string. Either string may be empty; the separator string in the last 2-tuple will always be empty, and if `s` ends with a separator string, both strings in the final 2-tuple will be empty.
`separate` indicates whether `multisplit` should consider adjacent separator strings in `s` as one separator or as multiple separators each separated by a zero-length string. It supports two values:
- false (the default): Group separators together. Multiple adjacent separators behave as if they're one big separator.
- true: Don't group separators together. Each separator should split the string individually, even if there are no characters between two separators. (`multisplit` will behave as if there's a zero-character-wide string between adjacent separators.)
`strip` indicates whether `multisplit` should strip separators from the beginning and/or end of `s`. It supports five values:
- false (the default): Don't strip separators from the beginning or end of `s`.
- true (apart from `LEFT`, `RIGHT`, and `PROGRESSIVE`): Strip separators from the beginning and end of `s` (similarly to `str.strip`).
- `LEFT`: Strip separators only from the beginning of `s` (similarly to `str.lstrip`).
- `RIGHT`: Strip separators only from the end of `s` (similarly to `str.rstrip`).
- `PROGRESSIVE`: Strip from the beginning and end of `s`, unless `maxsplit` is nonzero and the entire string is not split. If splitting stops due to `maxsplit` before the entire string is split, and `reverse` is false, don't strip the end of the string. If splitting stops due to `maxsplit` before the entire string is split, and `reverse` is true, don't strip the beginning of the string. (This is how `str.split` and `str.rsplit` behave when you pass in `sep=None`.)
`maxsplit` should be either an integer or `None`. If `maxsplit` is an integer greater than -1, `multisplit` will split `s` no more than `maxsplit` times.
`reverse` changes where `multisplit` starts splitting the string, and what direction it moves through the string when parsing:
- false (the default): Start splitting from the beginning of the string and parse moving right (towards the end).
- true: Start splitting from the end of the string and parse moving left (towards the beginning).
Splitting starting from the end of the string and parsing moving left has two effects. First, if `maxsplit` is a number greater than 0, the splits will start at the end of the string rather than the beginning. Second, if there are overlapping instances of separators in the string, `multisplit` will prefer the rightmost separator rather than the leftmost. Consider this example, where `reverse` is false:
    multisplit("A x x Z", (" x ",), keep=big.ALTERNATING) => "A", " x ", "x Z"
If you pass in a true value for `reverse`, `multisplit` will prefer the rightmost overlapping separator:
    multisplit("A x x Z", (" x ",), keep=big.ALTERNATING, reverse=True) => "A x", " x ", "Z"
For more information, see the tutorial on The `multi-` family of string functions.
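The greedy, longest-separator-wins rule and the `separate` option can be approximated with the standard `re` module. `multisplit_sketch` is a hypothetical name. It covers only `str` input with `keep=False`, `strip=False`, and no `maxsplit` or `reverse`, and it returns a list rather than an iterator. It is not how big implements `multisplit`.

```python
import re


def multisplit_sketch(s, separators, *, separate=False):
    """Hypothetical stand-in for multisplit with keep=False, strip=False."""
    # Regex alternation tries alternatives in order, so listing the longest
    # separators first makes the longest match win at any given position.
    ordered = sorted(separators, key=len, reverse=True)
    alternation = "|".join(re.escape(sep) for sep in ordered)
    if not separate:
        # Adjacent separators behave like one big separator.
        alternation = f"(?:{alternation})+"
    return re.split(alternation, s)


print(multisplit_sketch('a,,b;c', (',', ';')))
print(multisplit_sketch('a,,b;c', (',', ';'), separate=True))
```

With `separate=True`, the two adjacent commas produce an empty string between them, matching the "zero-character-wide string" behavior described above.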
multistrip(s, separators, left=True, right=True)
-
Like `str.strip`, but supports stripping multiple substrings from `s`.
Strips from the string `s` all leading and trailing instances of strings found in `separators`. `s` should be `str` or `bytes`. `separators` should be an iterable of either `str` or `bytes` objects matching the type of `s`.
If `left` is a true value, strips all leading separators from `s`.
If `right` is a true value, strips all trailing separators from `s`.
Processing always stops at the first character that doesn't match one of the separators.
Returns `s` unchanged, or a slice of `s`, with the leading and/or trailing separators stripped.
For more information, see the tutorial on The `multi-` family of string functions.
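Stripping works by advancing a start index and retreating an end index, then returning one slice. `multistrip_sketch` is a hypothetical name for this simplified version, not big's code. It assumes longer separators should be tried first, in the spirit of the `multi-` family's greedy matching.

```python
def multistrip_sketch(s, separators, left=True, right=True):
    """Hypothetical stand-in for multistrip: strip any of separators from the ends."""
    ordered = sorted(separators, key=len, reverse=True)  # try longest first
    start, end = 0, len(s)
    if left:
        progress = True
        while progress:
            progress = False
            for sep in ordered:
                if sep and s.startswith(sep, start, end):
                    start += len(sep)
                    progress = True
                    break
    if right:
        progress = True
        while progress:
            progress = False
            for sep in ordered:
                if sep and s.endswith(sep, start, end):
                    end -= len(sep)
                    progress = True
                    break
    # A single slice of s, as the documentation describes.
    return s[start:end]


print(multistrip_sketch('xyabcyx', ('x', 'y')))
```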
normalize_whitespace(s, separators=None, replacement=None)
-
Returns `s`, but with every run of consecutive separator characters turned into a replacement string. By default turns all runs of consecutive whitespace characters into a single space character.
`s` may be `str` or `bytes`. `separators` should be an iterable of either `str` or `bytes` objects, matching `s`. `replacement` should be either a `str` or `bytes` object, also matching `s`, or `None` (the default). If `replacement` is `None`, `normalize_whitespace` will use a replacement string consisting of a single space character.
Leading or trailing runs of separator characters will be replaced with the replacement string, e.g.:
    normalize_whitespace("   a    b   c") == " a b c"
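For the default whitespace case, the same effect can be sketched with a single regular-expression substitution. `normalize_whitespace_sketch` is a hypothetical name. It handles only `str` and relies on the `re` module's notion of whitespace (`\s`), which may not match big's own whitespace tables exactly.

```python
import re


def normalize_whitespace_sketch(s, replacement=" "):
    """Hypothetical stand-in: collapse each run of whitespace in a str."""
    # For str patterns, \s matches Unicode whitespace, not just ASCII.
    # A function replacement keeps any backslashes in `replacement` literal.
    return re.sub(r"\s+", lambda match: replacement, s)


print(repr(normalize_whitespace_sketch("   a    b   c")))
```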
Pattern(s, flags=0)
-
A drop-in replacement for `re.Pattern` that preserves `str` subclasses.
Python's `re` module converts `str` subclasses to plain `str` when returning matched strings. `Pattern` preserves the subclass: if you search or match against a `big.string`, the strings returned in the `Match` object will be `big.string` slices, retaining their line number and column number information.
`Pattern` supports the same interface as `re.Pattern`. See the Python documentation for `re.Pattern` for the full API.
python_delimiters
-
A delimiters mapping suitable for use as the `delimiters` argument for `split_delimiters`. `python_delimiters` defines all the delimiters for Python, and is able to correctly split any modern Python text at its delimiter boundaries.
`python_delimiters` changes the rules a little bit for `split_delimiters`:
- When you use `split_delimiters` with `python_delimiters`, it yields four values, not three. The fourth value is `change`. See `split_delimiters` for more information.
- If you make a copy of `python_delimiters` and modify it, you will break its semantics. Internally `python_delimiters` is really just a symbolic token, and `split_delimiters` uses a secret, internal-only, manually modified set of delimiters. This was necessary because the `Delimiter` object isn't sophisticated enough (yet) to express all the semantics needed for `python_delimiters`.
- When you call `split_delimiters` and pass in `python_delimiters`, you must include the linebreak characters in the text you pass in. This is necessary to support the comment delimiter correctly, and to enforce the no-linebreaks-inside-single-quoted-strings rule. If you're using `big.lines` to pre-process a script before passing it in to `split_delimiters`, consider calling it with `clip_linebreaks=False`.
Here's a list of all the delimiters recognized by `python_delimiters`:
- `()`, `{}`, and `[]`.
- All four string delimiters: `'`, `"`, `'''`, and `"""`.
- All possible string prefixes, including all valid combinations of `b`, `f`, `r`, and `u`, in both lower and upper case.
- Inside f-strings:
  - The quoting markers `{{` and `}}` are passed through in `text` unmodified.
  - The converter (`!`) and format spec (`:`) inside the curly braces inside an f-string. These two delimiters are the only two that use the new `change` value yielded by `split_delimiters`.
- Line comments, which "open" with `#` and "close" with either a linebreak (`\n`) or a carriage return (`\r`). (Python's "universal newlines" support should mean you won't normally see carriage returns here... unless you specifically permit them.) If the text being split ends with a comment without a newline, you'll see a yield where `open` is `'#'`, followed by a slightly strange final yield: `text` will be the body of the comment, and the `open`, `close`, and `change` fields will all be empty strings.
See also `python_delimiters_version`.
python_delimiters_version
-
A dictionary mapping strings containing a Python major and minor version to `python_delimiters` objects.
By default, `python_delimiters` parses the version of the Python language matching the version it's being run under. If you run Python 3.12, and call `big.split_delimiters` and pass in `python_delimiters`, it will split delimiters based on Python 3.12. If you instead wanted to parse using the semantics from Python 3.8, you would pass in `python_delimiters_version['3.8']` as the `delimiters` argument to `split_delimiters`.
There are entries in `python_delimiters_version` for every version of Python supported by big (currently 3.6 to 3.13).
re_partition(text, pattern, count=1, *, flags=0, reverse=False)
-
Like `str.partition`, but `pattern` is matched as a regular expression.
`text` can be a string or a bytes object. `pattern` can be a string, bytes, or `re.Pattern` object. `text` and `pattern` (or `pattern.pattern`) must be the same type.
If `pattern` is found in `text`, returns a tuple
    (before, match, after)
where `before` is the text before the matched text, `match` is the `re.Match` object resulting from the match, and `after` is the text after the matched text.
If `pattern` appears in `text` multiple times, `re_partition` will match against the first (leftmost) appearance.
If `pattern` is not found in `text`, returns a tuple
    (text, None, '')
where the empty string is `str` or `bytes` as appropriate.
Passing in an explicit `count` lets you control how many times `re_partition` partitions the string. `re_partition` will always return a tuple containing `(2*count)+1` elements, and the elements at odd indices will be either `re.Match` objects or `None`. Passing in a `count` of 0 will always return a tuple containing `text`.
If `pattern` is a string or bytes object, `flags` is passed in as the `flags` argument to `re.compile`.
If `reverse` is true, partitions starting at the right, like `re_rpartition`.
Note: `re_partition` supports partitioning on subclasses of `str` or `bytes`, and the `before` and `after` objects in the tuple returned will be slices of the `text` object. However, the `match` object doesn't honor this; the objects it returns from e.g. `match.group` will always be of the base type, either `str` or `bytes`. This isn't fixable, as you can't create `re.Match` objects in Python, nor can you subclass it.
(In older versions of Python, `re.Pattern` was a private type called `re._pattern_type`.)
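The single-partition case is essentially a `search` followed by two slices. `re_partition_sketch` is a hypothetical name. It covers only `count=1` with no `reverse`, and it is a simplified illustration rather than big's implementation.

```python
import re


def re_partition_sketch(text, pattern, *, flags=0):
    """Hypothetical stand-in for re_partition(text, pattern) with count=1."""
    if isinstance(pattern, (str, bytes)):
        pattern = re.compile(pattern, flags)
    match = pattern.search(text)  # leftmost match
    if match is None:
        # text[:0] is an empty str or bytes, matching the type of text.
        return (text, None, text[:0])
    return (text[:match.start()], match, text[match.end():])


before, match, after = re_partition_sketch("abc123def456", r"\d+")
print(before, match.group(), after)
```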
re_rpartition(text, pattern, count=1, *, flags=0)
-
Like `str.rpartition`, but `pattern` is matched as a regular expression.
`text` can be a `str` or `bytes` object. `pattern` can be a `str`, `bytes`, or `re.Pattern` object. `text` and `pattern` (or `pattern.pattern`) must be the same type.
If `pattern` is found in `text`, returns a tuple
    (before, match, after)
where `before` is the text before the matched text, `match` is the `re.Match` object resulting from the match, and `after` is the text after the matched text.
If `pattern` appears in `text` multiple times, `re_rpartition` will match against the last (rightmost) appearance.
If `pattern` is not found in `text`, returns a tuple
    ('', None, text)
where the empty string is `str` or `bytes` as appropriate.
Passing in an explicit `count` lets you control how many times `re_rpartition` partitions the string. `re_rpartition` will always return a tuple containing `(2*count)+1` elements, and the elements at odd indices will be either `re.Match` objects or `None`. Passing in a `count` of 0 will always return a tuple containing `text`.
If `pattern` is a string, `flags` is passed in as the `flags` argument to `re.compile`.
Note: `re_rpartition` supports partitioning on subclasses of `str` or `bytes`, and the `before` and `after` objects in the tuple returned will be slices of the `text` object. However, the `match` object doesn't honor this; the objects it returns from e.g. `match.group` will always be of the base type, either `str` or `bytes`. This isn't fixable, as you can't create `re.Match` objects in Python, nor can you subclass it.
(In older versions of Python, `re.Pattern` was a private type called `re._pattern_type`.)
reversed_re_finditer(pattern, string, flags=0)
-
An iterator. Behaves almost identically to the Python standard library function `re.finditer`, yielding non-overlapping matches of `pattern` in `string`. The difference is, `reversed_re_finditer` searches `string` from right to left.
`pattern` can be `str`, `bytes`, or a precompiled `re.Pattern` object. If it's `str` or `bytes`, it'll be compiled with `re.compile` using the `flags` you passed in.
`string` should be the same type as `pattern` (or `pattern.pattern`).
split_delimiters(s, delimiters={...}, *, state=(), yields=None)
-
Splits a string `s` at delimiter substrings. `s` may be `str` or `bytes`. `delimiters` may be either `None` or a mapping of open delimiter strings to `Delimiter` objects. The open delimiter strings, close delimiter strings, and escape strings must match the type of `s` (either `str` or `bytes`).
If `delimiters` is `None`, `split_delimiters` uses a default value matching these pairs of delimiters:
    () [] {} "" ''
The first three delimiters allow multiline, disable quoting, and have no escape string. The last two (the quote mark delimiters) enable quoting, disallow multiline, and specify their escape string as a single backslash. (This default value automatically supports both `str` and `bytes`.)
`state` specifies the initial state of parsing. It's an iterable of open delimiter strings specifying the initial nested state of the parser, with the innermost nesting level on the right. If you wanted `split_delimiters` to behave as if it'd already seen a `'('` and a `'['`, in that order, pass in `['(', '[']` to `state`.
(Tip: Use a `list` as a stack to track the state of `split_delimiters`. Push open delimiters with `.append`, and pop them off using `.pop` whenever you see a close delimiter. Since `split_delimiters` ensures that open and close delimiters match, you don't need to check them yourself!)
Yields objects of type `SplitDelimitersValue`. This object contains five fields:
- `text`: A string, the text before the next opening, closing, or changing delimiter.
- `open`: A string, the trailing opening delimiter.
- `close`: A string, the trailing closing delimiter.
- `change`: A string, the trailing change delimiter.
- `yields`: An integer, either 3 or 4.
At least one of the four strings will always be non-empty. (Only one of `open`, `close`, and `change` will ever be non-empty in a single `SplitDelimitersValue` object.) If `s` doesn't end with an opening or closing delimiter, the final value yielded will have empty strings for `open`, `close`, and `change`.
The `yields` parameter to `split_delimiters` affects iteration over a `SplitDelimitersValue` object. `yields` may be `None`, 3, or 4:
- If `yields` is 3, when iterating over a `SplitDelimitersValue` object, it will yield `text`, `open`, and `close` in that order.
- If `yields` is 4, when iterating over a `SplitDelimitersValue` object, it will yield `text`, `open`, `close`, and `change` in that order.
- If `yields` is `None` (the default), `split_delimiters` will use a value of 4 if its `delimiters` argument is `python_delimiters`, and a value of 3 otherwise.
(The `yields` parameter exists because previously `split_delimiters` always yielded a tuple containing three string values. `python_delimiters` required adding the fourth string value, `change`. Eventually `split_delimiters` will always yield an object yielding four values, but big is allowing for a transition period to minimize code breakage. See the release notes for big version 0.12.5 for more information.)
You may not specify backslash (`'\\'`) as an open delimiter.
Multiple `Delimiter` objects specified in `delimiters` may use the same close delimiter string.
`split_delimiters` doesn't react if the string ends with unterminated delimiters.
See the `Delimiter` object for how delimiters are defined, and how you can define your own delimiters.
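The list-as-stack technique from the tip above can be shown with a toy splitter for `()`, `[]`, and `{}` only. `split_brackets_sketch` is a hypothetical name. It yields plain `(text, open, close)` tuples and ignores quoting, escapes, and the `change` field. It assumes a mismatched close delimiter should raise `SyntaxError`, and it is not how big implements `split_delimiters`.

```python
def split_brackets_sketch(s, *, state=()):
    """Hypothetical toy splitter: yields (text, open, close) for (), [], {}."""
    pairs = {"(": ")", "[": "]", "{": "}"}
    closers = set(pairs.values())
    stack = list(state)  # innermost open delimiter is on the right
    start = 0
    for i, ch in enumerate(s):
        if ch in pairs:
            stack.append(ch)  # push on open
            yield (s[start:i], ch, "")
            start = i + 1
        elif ch in closers:
            if not stack or pairs[stack[-1]] != ch:
                raise SyntaxError(f"mismatched close delimiter {ch!r} at offset {i}")
            stack.pop()  # pop on a matching close
            yield (s[start:i], "", ch)
            start = i + 1
    if start < len(s):
        # s didn't end with a delimiter: final value has empty open/close.
        yield (s[start:], "", "")


for text, open_, close in split_brackets_sketch("a(b[c])d"):
    print(repr(text), repr(open_), repr(close))
```

Like `split_delimiters`, the sketch doesn't complain if `s` ends while delimiters are still open.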
split_quoted_strings(s, quotes=('"', "'"), *, escape='\\', multiline_quotes=(), state='')
-
Splits `s` into quoted and unquoted segments.
Returns an iterator yielding 3-tuples:
    (leading_quote, segment, trailing_quote)
where `leading_quote` and `trailing_quote` are either empty strings or quote delimiters from `quotes` (or `multiline_quotes`), and `segment` is a substring of `s`. Joining together all strings yielded recreates `s`.
`s` can be either `str` or `bytes`.
`quotes` is an iterable of unique quote delimiters. Quote delimiters may be any non-empty string. They must be the same type as `s`, either `str` or `bytes`. By default, `quotes` is `('"', "'")`. (If `s` is `bytes`, `quotes` defaults to `(b'"', b"'")`.) If a newline character appears inside a quoted string, `split_quoted_strings` will raise `SyntaxError`.
`multiline_quotes` is like `quotes`, except quoted strings using multiline quotes are permitted to contain newlines. By default `split_quoted_strings` doesn't define any multiline quote marks.
`escape` is a string of any length. If `escape` is not an empty string, the string will "escape" (quote) quote delimiters inside a quoted string, like the backslash (`\`) character inside strings in Python. By default, `escape` is `'\\'`. (If `s` is `bytes`, `escape` defaults to `b'\\'`.)
`state` is a string. It sets the initial state of the function. The default is an empty string (`str` or `bytes`, matching `s`); this means the parser starts parsing the string in an unquoted state. If you want parsing to start as if it had already encountered a quote delimiter (for example, if you were parsing multiple lines individually, and you wanted to begin a new line continuing the state from the previous line), pass in the appropriate quote delimiter from `quotes` into `state`. Note that when a non-empty string is passed in to `state`, the `leading_quote` in the first 3-tuple yielded by `split_quoted_strings` will be an empty string:
    list(split_quoted_strings("a b c'", state="'"))
evaluates to
    [('', 'a b c', "'")]
Note:
- `split_quoted_strings` is agnostic about the length of quoted strings. If you're using `split_quoted_strings` to parse a C-like language, and you want to enforce C's requirement that single-quoted strings only contain one character, you'll have to do that yourself.
- `split_quoted_strings` doesn't raise an error if `s` ends with an unterminated quoted string. In that case, the last tuple yielded will have a non-empty `leading_quote` and an empty `trailing_quote`. (If you consider this an error, you'll need to raise `SyntaxError` in your own code.)
- `split_quoted_strings` only supports the opening and closing markers for a string being the same string. If you need the opening and closing markers to be different strings, use `split_delimiters`.
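The scanning rules described above (quote marks, escapes, `state`, unterminated strings) can be modeled with a small character loop. `split_quoted_strings_sketch` is a hypothetical name for this simplified version. It handles only `str`, omits `multiline_quotes`, and assumes empty unquoted segments (e.g. between two adjacent quoted strings) are simply not yielded. It is not big's implementation.

```python
def split_quoted_strings_sketch(s, quotes=('"', "'"), *, escape="\\", state=""):
    """Hypothetical stand-in for split_quoted_strings (str only, no multiline quotes)."""
    # Try longer quote marks first, so e.g. '"""' would beat '"'.
    ordered = sorted(quotes, key=len, reverse=True)
    quote = state   # the currently open quote mark, or "" when unquoted
    leading = ""    # leading_quote for the current segment ("" when resuming from state)
    start = i = 0
    while i < len(s):
        if quote:
            if escape and s.startswith(escape, i):
                i += len(escape) + 1  # skip the escape plus the character it escapes
            elif s.startswith(quote, i):
                yield (leading, s[start:i], quote)
                i += len(quote)
                start = i
                quote = leading = ""
            elif s[i] == "\n":
                raise SyntaxError(f"newline inside quoted string at offset {i}")
            else:
                i += 1
        else:
            for q in ordered:
                if s.startswith(q, i):
                    if i > start:
                        yield ("", s[start:i], "")
                    quote = leading = q
                    i += len(q)
                    start = i
                    break
            else:
                i += 1
    if quote:
        yield (leading, s[start:], "")  # unterminated: empty trailing_quote
    elif start < len(s):
        yield ("", s[start:], "")


print(list(split_quoted_strings_sketch('a "b" c')))
```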
split_text_with_code(s, *, tab_width=8, allow_code=True, code_indent=4, convert_tabs_to_spaces=True)
-
Splits `s` into individual words, suitable for feeding into `wrap_words`. `s` may be either `str` or `bytes`.
Paragraphs indented by less than `code_indent` will be broken up into individual words.
If `allow_code` is true, paragraphs indented by at least `code_indent` spaces will preserve their whitespace: internal whitespace is preserved, and the newline is preserved. (This will preserve the formatting of code examples when these words are rejoined into lines by `wrap_words`.)
For more information, see the tutorial on Word wrapping and formatting.
split_title_case(s, *, split_allcaps=True)
-
Splits `s` into words, assuming that upper-case characters start new words. Returns an iterator yielding the split words.
Example:
    list(split_title_case('ThisIsATitleCaseString'))
is equal to
    ['This', 'Is', 'A', 'Title', 'Case', 'String']
If `split_allcaps` is a true value (the default), runs of multiple uppercase characters will also be split before the last character. This is needed to handle splitting single-letter words. Consider:
    list(split_title_case('WhenIWasATeapot', split_allcaps=True))
returns
    ['When', 'I', 'Was', 'A', 'Teapot']
but
    list(split_title_case('WhenIWasATeapot', split_allcaps=False))
returns
    ['When', 'IWas', 'ATeapot']
Note: uses the `isupper` and `islower` methods to determine what are upper- and lower-case characters. This means it only recognizes the ASCII upper- and lower-case letters for bytes strings.
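The word-boundary rule can be sketched as a scan over adjacent characters. `split_title_case_sketch` is a hypothetical name, and this is not big's implementation. It splits before an uppercase letter that follows a lowercase letter. With `split_allcaps`, it also splits before the last capital of an uppercase run, but only when a lowercase letter follows that capital. That second condition is an assumption about how runs at the very end of the string are handled.

```python
def split_title_case_sketch(s, *, split_allcaps=True):
    """Hypothetical stand-in for split_title_case (str only)."""
    if not s:
        return
    start = 0
    for i in range(1, len(s)):
        c, prev = s[i], s[i - 1]
        if not c.isupper():
            continue
        if prev.islower():
            boundary = True  # e.g. the "nI" in "WhenI"
        elif split_allcaps and prev.isupper():
            # Split before the last capital of a run when it begins a new word.
            boundary = i + 1 < len(s) and s[i + 1].islower()
        else:
            boundary = False
        if boundary:
            yield s[start:i]
            start = i
    yield s[start:]


print(list(split_title_case_sketch('WhenIWasATeapot')))
```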
str_linebreaks
-
A tuple of `str` objects, representing every line-breaking whitespace character recognized by the Python `str` object. Identical to `linebreaks`.
Useful as a `separator` argument for big functions that accept one, e.g. the big "multi-" family of functions.
Also contains `'\r\n'`. See the tutorial section on The Unix, Mac, and DOS linebreak conventions for more.
For more information, please see the Whitespace and line-breaking characters in Python and big tutorial.
str_linebreaks_without_crlf
-
Equivalent to `str_linebreaks` without `'\r\n'`.
str_whitespace
-
A tuple of `str` objects, representing every whitespace character recognized by the Python `str` object. Identical to `whitespace`.
Useful as a `separator` argument for big functions that accept one, e.g. the big "multi-" family of functions.
Also contains `'\r\n'`. See the tutorial section on The Unix, Mac, and DOS linebreak conventions for more.
For more information, please see the Whitespace and line-breaking characters in Python and big tutorial.
str_whitespace_without_crlf
-
Equivalent to `str_whitespace` without `'\r\n'`.
strip_indents(lines, *, tab_width=8, linebreaks=linebreaks)
-
Takes an iterable of lines, with or without linebreaks; strips the leading whitespace from each line and tracks the indent level. Yields 2-tuples of `(depth, lstripped_line)`.
`depth` is an integer, the ordinal number of times the lines were indented to reach the current indent. Text at the leftmost column is at `depth` 0; if the line was indented three times, `depth` will be 3.
Uses an intentionally simple algorithm. Only understands tab and space characters as indent characters. Internally converts tabs to spaces for consistency, using the `tab_width` passed in.
Text can only dedent out to a previous indent. Raises `IndentationError` if there's an illegal dedent.
Blank lines and empty lines have the indent level of the next non-blank line, or `0` if there are no subsequent non-blank lines. If the line contains only whitespace, any trailing characters found in `linebreaks` will be preserved. Pass in `None` or an empty sequence for `linebreaks` to suppress this.
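The indent-tracking algorithm described above can be modeled with a stack of indent widths. `strip_indents_sketch` is a hypothetical name. It omits the `linebreaks` parameter and assumes that whatever remains of a whitespace-only line after removing spaces and tabs (normally just its linebreak) is what gets preserved. It is a simplified illustration, not big's code.

```python
def strip_indents_sketch(lines, *, tab_width=8):
    """Hypothetical stand-in for strip_indents (spaces and tabs only)."""
    stack = [0]    # indent widths currently open, outermost first
    pending = []   # blank lines waiting for the next non-blank line's depth
    for line in lines:
        stripped = line.lstrip(" \t")
        if not stripped.strip():
            # Whitespace-only line: what's left is usually just its linebreak.
            pending.append(stripped)
            continue
        leading = line[: len(line) - len(stripped)]
        indent = len(leading.expandtabs(tab_width))  # tabs become spaces
        if indent > stack[-1]:
            stack.append(indent)
        else:
            while indent < stack[-1]:
                stack.pop()
            if indent != stack[-1]:
                raise IndentationError(f"illegal dedent to column {indent}")
        depth = len(stack) - 1
        for blank in pending:
            yield (depth, blank)
        pending.clear()
        yield (depth, stripped)
    for blank in pending:
        yield (0, blank)  # no later non-blank line, so depth 0


source = ["def f():\n", "    x = 1\n", "\n", "    if x:\n", "        pass\n", "y\n"]
for depth, text in strip_indents_sketch(source):
    print(depth, repr(text))
```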
strip_line_comments(lines, line_comment_markers, *, escape='\\', quotes=(), multiline_quotes=(), linebreaks=linebreaks)
-
Strips line comments from an iterable of lines.
Line comments are substrings beginning with a special marker that mean the rest of the line should be ignored. `strip_line_comments` truncates each line at the beginning of the leftmost line comment marker and yields the result. If the line doesn't contain any unquoted comment markers, it's yielded unchanged.
`line_comment_markers` should be an iterable of strings denoting line comment markers (e.g. `['#']` or `['//']`).
If `quotes` is specified, it must be an iterable of quote marker strings. `strip_line_comments` will parse the line using `split_quoted_strings` and ignore comment characters inside quoted strings. Quoted strings may not span lines; if a line ends with an unterminated quoted string, `strip_line_comments` will raise a `SyntaxError`.
If `multiline_quotes` is specified, it must be an iterable of quote marker strings. Quoted strings enclosed in multiline quotes may span multiple lines. There must be no quote markers in common between `quotes` and `multiline_quotes`.
`escape` is a string used to escape quote markers inside quoted strings, as per backslash inside strings in Python. The default is `'\\'`.
If lines end with linebreak characters, they will be preserved even when a comment is stripped.
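Without quote handling, the core truncation rule is short. `strip_line_comments_sketch` is a hypothetical name. It ignores `quotes`, `multiline_quotes`, and `escape`, and treats only `\r` and `\n` as linebreak characters. The last usage line shows why the real function's `quotes` option matters: a `#` inside a string literal gets cut too.

```python
def strip_line_comments_sketch(lines, line_comment_markers):
    """Hypothetical stand-in for strip_line_comments, without quote handling."""
    for line in lines:
        body = line.rstrip("\r\n")
        linebreak = line[len(body):]  # preserved even when a comment is cut off
        positions = [body.find(marker) for marker in line_comment_markers]
        positions = [p for p in positions if p != -1]
        if not positions:
            yield line  # no comment marker: yielded unchanged
        else:
            yield body[:min(positions)] + linebreak  # cut at the leftmost marker


print(list(strip_line_comments_sketch(["x = 1  # note\n", "y = 2\n"], ["#"])))
print(list(strip_line_comments_sketch(["s = '#'\n"], ["#"])))  # naive: cuts inside the string
```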
unicode_linebreaks
-
A tuple of `str` objects, representing every line-breaking whitespace character defined by Unicode.
Useful as a `separator` argument for big functions that accept one, e.g. the big "multi-" family of functions.
Also contains `'\r\n'`. See the tutorial section on The Unix, Mac, and DOS linebreak conventions for more.
For more information, please see the Whitespace and line-breaking characters in Python and big tutorial.
unicode_linebreaks_without_crlf
-
Equivalent to `unicode_linebreaks` without `'\r\n'`.
unicode_whitespace
-
A tuple of `str` objects, representing every whitespace character defined by Unicode.
Useful as a `separator` argument for big functions that accept one, e.g. the big "multi-" family of functions.
Also contains `'\r\n'`. See the tutorial section on The Unix, Mac, and DOS linebreak conventions for more.
For more information, please see the Whitespace and line-breaking characters in Python and big tutorial.
unicode_whitespace_without_crlf
-
Equivalent to `unicode_whitespace` without `'\r\n'`.
whitespace
-
A tuple of `str` objects, representing every whitespace character recognized by the Python `str` object. Identical to `str_whitespace`.
Useful as a `separator` argument for big functions that accept one, e.g. the big "multi-" family of functions.
Also contains `'\r\n'`. See the tutorial section on The Unix, Mac, and DOS linebreak conventions for more.
For more information, please see the Whitespace and line-breaking characters in Python and big tutorial.
whitespace_without_crlf
-
Equivalent to `whitespace` without `'\r\n'`.
wrap_words(words, margin=79, *, two_spaces=True)
-
Combines `words` into lines and returns the result as a string. Similar to `textwrap.wrap`.
`words` should be an iterable yielding `str` or `bytes` strings, and these strings should already be split at word boundaries. Here's an example of a valid argument for `words`:
    "this is an example of text split at word boundaries".split()
A single `'\n'` indicates a line break. If you want a paragraph break, embed two `'\n'` characters in a row.
`margin` specifies the maximum length of each line. The length of every line will be less than or equal to `margin`, unless the length of an individual element inside `words` is greater than `margin`.
If `two_spaces` is true, elements from `words` that end in sentence-ending punctuation (`'.'`, `'?'`, and `'!'`) will be followed by two spaces, not one.
Elements in `words` are not modified; any leading or trailing whitespace will be preserved. You can use this to preserve whitespace where necessary, like in code examples.
For more information, see the tutorial on Word wrapping and formatting.
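The greedy line-filling rule, the `'\n'` handling, and `two_spaces` can be sketched as follows. `wrap_words_sketch` is a hypothetical name for this simplified `str`-only version, not big's implementation. A word longer than `margin` is placed on its own line and allowed to exceed it, as the description above says.

```python
def wrap_words_sketch(words, margin=79, *, two_spaces=True):
    """Hypothetical stand-in for wrap_words (str only)."""
    lines = []
    line = ""
    separator = ""  # what goes between the previous word and the next one
    for word in words:
        if word == "\n":
            # A single "\n" ends the line; two in a row leave a blank line.
            lines.append(line)
            line, separator = "", ""
            continue
        if line and len(line) + len(separator) + len(word) > margin:
            lines.append(line)
            line, separator = "", ""
        line += separator + word
        ends_sentence = two_spaces and word.endswith((".", "?", "!"))
        separator = "  " if ends_sentence else " "
    lines.append(line)
    return "\n".join(lines)


print(wrap_words_sketch("the quick brown fox. jumps".split(), margin=12))
```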
big.tokens
-
Functions and constants for working with Python's tokenizer.
Token constants
`big.tokens` defines a `TOKEN_<n>` constant for every token that could exist in any supported version of Python. If a token isn't defined in the current version, its value is set to `-1`, an invalid token value that won't match any tokens.
This lets you write version-independent code like:
    if token.type == big.tokens.TOKEN_FSTRING_START: ...
In Python versions where `FSTRING_START` doesn't exist, `TOKEN_FSTRING_START` is `-1` and the condition will never be true.
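The "-1 when missing" pattern can be reproduced with the standard `token` module and `getattr`. The module-level names below are hypothetical stand-ins that only mimic the idea; they are not big's definitions. `FSTRING_START` exists in `token` starting with Python 3.12.

```python
import token

# Hypothetical stand-ins for big.tokens constants: use the real value when
# the running Python defines the token, and -1 (never a real token type)
# otherwise.
TOKEN_NAME = getattr(token, "NAME", -1)
TOKEN_FSTRING_START = getattr(token, "FSTRING_START", -1)  # added in Python 3.12


def is_fstring_start(tok):
    # Safe on every version: on older Pythons this is simply always False.
    return tok.type == TOKEN_FSTRING_START
```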
generate_tokens(s)
-
A convenient wrapper around `tokenize.generate_tokens`.
This function takes a `str` (or `big.string`) and handles the `readline` interface required by `tokenize.generate_tokens` internally.
If the argument is a `big.string`, the string values in the yielded `TokenInfo` objects will be `big.string` slices from the original string, preserving line and column information.
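With a plain `str`, the `readline` plumbing can be handled with `io.StringIO`, as sketched below. `generate_tokens_sketch` is a hypothetical name. Unlike big's wrapper, it does not produce `big.string` slices.

```python
import io
import token
import tokenize


def generate_tokens_sketch(s):
    """Hypothetical stand-in: tokenize a str without writing a readline function."""
    # tokenize.generate_tokens expects a readline-style callable returning str lines.
    return tokenize.generate_tokens(io.StringIO(s).readline)


for tok in generate_tokens_sketch("x = 1\n"):
    print(token.tok_name[tok.type], repr(tok.string))
```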
big.time
-
Functions for working with time. Currently deals specifically with timestamps. The time functions in big are designed to make it easy to use best practices.
date_ensure_timezone(d, timezone)
-
Ensures that a `datetime.date` object has a timezone set.
If `d` has a timezone set, returns `d`. Otherwise, returns a new `datetime.date` object equivalent to `d` with its `tzinfo` set to `timezone`.
date_set_timezone(d, timezone)
-
Returns a new `datetime.date` object identical to `d` but with its `tzinfo` set to `timezone`.
datetime_ensure_timezone(d, timezone)
-
Ensures that a `datetime.datetime` object has a timezone set.
If `d` has a timezone set, returns `d`. Otherwise, returns a new `datetime.datetime` object equivalent to `d` with its `tzinfo` set to `timezone`.
datetime_set_timezone(d, timezone)
-
Returns a new `datetime.datetime` object identical to `d` but with its `tzinfo` set to `timezone`.
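The difference between "ensure" and "set" for `datetime.datetime` objects can be sketched with `datetime.replace`. The function names below are hypothetical stand-ins, not big's code. Note that `replace(tzinfo=...)` attaches a timezone without converting the wall-clock time. That matches the "identical to `d` but with its `tzinfo` set" wording above.

```python
import datetime


def datetime_ensure_timezone_sketch(d, timezone):
    """Hypothetical stand-in: attach `timezone` only if d is naive."""
    if d.tzinfo is not None:
        return d
    return d.replace(tzinfo=timezone)  # replace() returns a new object


def datetime_set_timezone_sketch(d, timezone):
    """Hypothetical stand-in: always attach `timezone`, keeping the wall-clock fields."""
    return d.replace(tzinfo=timezone)


plus2 = datetime.timezone(datetime.timedelta(hours=2))
aware = datetime.datetime(2024, 1, 1, 12, 0, tzinfo=plus2)
print(datetime_set_timezone_sketch(aware, datetime.timezone.utc))  # still 12:00, now UTC
```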
parse_timestamp_3339Z(s, *, timezone=None)
-
Parses a timestamp string returned by `timestamp_3339Z`. Returns a `datetime.datetime` object.
`timezone` is an optional default timezone, and should be a `datetime.tzinfo` object (or `None`). If provided, and the time represented in the string doesn't specify a timezone, the `tzinfo` attribute of the returned object will be explicitly set to `timezone`.
`parse_timestamp_3339Z` depends on the `python-dateutil` package. If `python-dateutil` is unavailable, `parse_timestamp_3339Z` will also be unavailable.
timestamp_3339Z(t=None, want_microseconds=None)
-
Return a timestamp string in RFC 3339 format, in the UTC time zone. This format is intended for computer-parsable timestamps; for human-readable timestamps, use `timestamp_human()`.
Example timestamp: `'2021-05-25T06:46:35.425327Z'`
`t` may be one of several types:
- If `t` is `None`, `timestamp_3339Z` uses the current time in UTC.
- If `t` is an int or a float, it's interpreted as seconds since the epoch in the UTC time zone.
- If `t` is a `time.struct_time` object or `datetime.datetime` object, and it's not in UTC, it's converted to UTC. (Technically, `time.struct_time` objects are converted to GMT, using `time.gmtime`. Sorry, pedants!)
If `want_microseconds` is true, the timestamp ends with microseconds, represented as a period and six digits between the seconds and the `'Z'`. If `want_microseconds` is false, the timestamp will not include this text. If `want_microseconds` is `None` (the default), the timestamp ends with microseconds if the type of `t` can represent fractional seconds: a float, a `datetime` object, or the value `None`.
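The type dispatch and the `want_microseconds` default described above can be sketched with the standard `datetime` module. `timestamp_3339Z_sketch` is a hypothetical name, and this is not big's implementation. It omits `time.struct_time` support, and it assumes naive `datetime` objects represent local time (which is what `datetime.astimezone` assumes).

```python
import datetime


def timestamp_3339Z_sketch(t=None, want_microseconds=None):
    """Hypothetical stand-in for timestamp_3339Z (no time.struct_time support)."""
    utc = datetime.timezone.utc
    if t is None:
        moment, fractional = datetime.datetime.now(utc), True
    elif isinstance(t, (int, float)):
        # Seconds since the epoch, interpreted in UTC.
        moment = datetime.datetime.fromtimestamp(t, utc)
        fractional = isinstance(t, float)
    elif isinstance(t, datetime.datetime):
        # astimezone() converts aware datetimes; naive ones are assumed local time.
        moment, fractional = t.astimezone(utc), True
    else:
        raise TypeError(f"unsupported type for t: {type(t).__name__}")
    if want_microseconds is None:
        want_microseconds = fractional
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if want_microseconds else "%Y-%m-%dT%H:%M:%SZ"
    return moment.strftime(fmt)


print(timestamp_3339Z_sketch(0))    # an int: no microseconds by default
print(timestamp_3339Z_sketch(1.5))  # a float: microseconds included by default
```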
timestamp_human(t=None, want_microseconds=None, *, tzinfo=None)
Return a timestamp string formatted in a pleasing way for the local timezone (by default). This format is intended for human readability; for computer-parsable time, use timestamp_3339Z().
Example timestamp: "2021/05/24 23:42:49.099437"
t can be one of several types:
- If t is None, timestamp_human uses the current local time.
- If t is an int or float, it's interpreted as seconds since the epoch.
- If t is a time.struct_time, it's converted to a datetime.datetime object.
- If t is a datetime.datetime object, it's used directly.
If want_microseconds is true, the timestamp will end with the microseconds, represented as ".######". If want_microseconds is false, the timestamp will not include the microseconds.
If tzinfo is None (the default), the time is converted to the local timezone. If tzinfo is a datetime.timezone object, the time is converted to this timezone. The timezone is printed at the end of the string.
0.13 update: Added tzinfo parameter, and added the timezone to the end of the string.
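A rough model of the human-readable format, assuming the layout "YYYY/MM/DD HH:MM:SS[.ffffff] <tzname>". The exact timezone suffix big prints isn't specified above, so the tzname() suffix here is an assumption, and timestamp_human_sketch is a hypothetical name. struct_time input is omitted.

```python
import datetime

def timestamp_human_sketch(t, want_microseconds=True, *, tzinfo=None):
    """Hypothetical approximation of timestamp_human's output."""
    if isinstance(t, (int, float)):
        t = datetime.datetime.fromtimestamp(t, datetime.timezone.utc)
    # astimezone(None) converts to the local timezone, matching the default above
    d = t.astimezone(tzinfo)
    text = d.strftime("%Y/%m/%d %H:%M:%S")
    if want_microseconds:
        text += ".{:06d}".format(d.microsecond)
    return text + " " + d.tzname()

instant = datetime.datetime(2021, 5, 25, 6, 42, 49, 99437, tzinfo=datetime.timezone.utc)
pdt = datetime.timezone(datetime.timedelta(hours=-7))
print(timestamp_human_sketch(instant, tzinfo=pdt))  # 2021/05/24 23:42:49.099437 UTC-07:00
```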
big.types
New types for big. Currently contains string and linked_list.
string
string is a subclass of str that knows its own line number, column number, and source. Every operation that returns a substring returns a big.string that preserves this information.
See the big string tutorial for an introduction and examples.
string(s='', *, source=None, line_number=1, column_number=1, first_column_number=1, tab_width=8)
A subclass of str that maintains line, column, and offset information. string is a drop-in replacement for Python's str. It implements every str method; every operation that returns a substring returns a big.string that knows its own line and column information. For documentation of the standard str methods, see the Python documentation for str.
Keyword-only parameters to the constructor:
- source: A human-readable string describing where this string came from (e.g. a filename). Included in where.
- line_number: The line number of the first character. Default is 1.
- column_number: The column number of the first character. Default is 1.
- first_column_number: The column number to reset to after a linebreak. Default is 1.
- tab_width: The distance between tab columns, used when computing column numbers. Default is 8.
Read-only properties:
- line_number: The line number of this string.
- column_number: The column number of this string.
- source: The source string passed to the constructor.
- origin: The original big.string this string was sliced from.
- offset: The index of the first character of this string within origin.
- first_column_number: The column number reset to after linebreaks.
- tab_width: The tab width used for column calculations.
- where: A human-readable location string for error messages, in the format "<source> line <n> column <n>" (or without the source if none was specified).
If you pass a big.string into Python modules implemented in C, the returned substrings will be plain str objects. big provides wrappers for two of these, drop-in replacements where the returned substrings will be big.string slices of the original string: string.generate_tokens, a wrapper for tokenize.generate_tokens, and string.compile, a wrapper for re.compile.
See the big string tutorial for more.
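To show what the constructor parameters control, here is a hypothetical helper that computes the line and column of a character by walking the text. It is a sketch under assumptions: only '\n' counts as a linebreak, and a tab advances to the next multiple of tab_width counted from first_column_number. where_sketch follows the quoted-source format of the tutorial's example message; neither function is part of big.

```python
def line_and_column(text, offset, *, line_number=1, column_number=1,
                    first_column_number=1, tab_width=8):
    """Return (line, column) of text[offset] (hypothetical helper)."""
    for ch in text[:offset]:
        if ch == "\n":
            line_number += 1
            column_number = first_column_number
        elif ch == "\t":
            # tab stops are measured from first_column_number (an assumption)
            width = column_number - first_column_number
            column_number = first_column_number + (width // tab_width + 1) * tab_width
        else:
            column_number += 1
    return line_number, column_number

def where_sketch(source, line_number, column_number):
    """Format a location string modeled on the tutorial's example (hypothetical)."""
    prefix = '"{}" '.format(source) if source else ""
    return "{}line {} column {}".format(prefix, line_number, column_number)

text = "if x:\n\twhule y"
print(where_sketch("example.py", *line_and_column(text, text.index("whule"))))
# "example.py" line 2 column 9
```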
string.bisect(index)
Splits the string at index. Returns a tuple of two strings: (string[:index], string[index:]).
string.cat(*strings)
Class method. Concatenates the str or big.string objects passed in. Roughly equivalent to big.string('').join(strings). Always returns a big.string.
string.compile(flags=0)
Returns a Pattern compiled from this string. Equivalent to re.compile(self, flags). All methods on the Pattern, and method calls on objects it returns, return big.string slices of the original string as appropriate.
string.generate_tokens()
Wraps tokenize.generate_tokens, preserving big.string slices in the yielded TokenInfo objects. Equivalent to calling big.tokens.generate_tokens with this string.
string.partition(sep, *, count=1) and string.rpartition(sep, *, count=1)
Behaves like str.partition and str.rpartition, but adds one feature: a count= parameter that specifies how many times to split the string. count must be an "index" with a value of 0 or higher. The returned tuple will have length (count * 2) + 1.
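The count= behavior can be modeled on plain str by calling str.partition repeatedly. This is a hypothetical sketch of the forward direction only; in particular, padding with empty strings when sep runs out (mirroring what str.partition does) is an assumption about big's behavior.

```python
def partition_n(s, sep, count=1):
    """Split s at up to count occurrences of sep; returns count * 2 + 1 strings."""
    if count < 0:
        raise ValueError("count must be 0 or higher")
    result = []
    for _ in range(count):
        # str.partition returns (s, '', '') when sep isn't found,
        # so missing splits turn into empty strings
        before, found, s = s.partition(sep)
        result.extend((before, found))
    result.append(s)
    return tuple(result)

print(partition_n("a-b-c", "-", 2))  # ('a', '-', 'b', '-', 'c')
```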
string_context
The value returned by the string.context property.
str(s.context) evaluates to a string that represents
the "context" of s--where s was sliced from in the
larger string object. For example:
from big.types import string
line = string("elif attempt(blast):")
s = line.partition('(')[2][:5]  # s is 'blast'
print(s.context)
would print:
elif attempt(blast):
             ^^^^^
Note that str(s.context) only shows one line of context;
if s is a multi-line string, this will only show the first
line. If you want to show all lines, use s.context.all
instead (see below).
string_context supports the following attributes:
- parts is a tuple of "context line" tuples representing the lines of str(context). Each "context line" tuple matches (before, span, after, linebreak), and has named accessors for these values. These values are strings; joining all the strings of all the tuples produces str(context).
- all is like str(context), but contains context lines for all lines, in case context contains linebreaks.
- all_parts is like parts, but contains the parts for all the lines of all.
In addition, it provides many of the string properties,
like where, origin, line_number, etc. These are the
same as the string object the context was taken from.
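The caret display shown in the example can be reproduced for a single line by padding with one space per character before the span and one caret per character of the span. caret_context is a hypothetical helper, not big's code, and it assumes the line contains no tabs (a real implementation would need to expand them to keep the carets aligned).

```python
def caret_context(line, start, length):
    """Return the line plus a caret line underlining line[start:start + length]."""
    before = line[:start]
    span = line[start:start + length]
    return line + "\n" + " " * len(before) + "^" * len(span)

line = "elif attempt(blast):"
start = line.index("(") + 1  # 'blast' begins right after the parenthesis
print(caret_context(line, start, len("blast")))
```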
linked_list
linked_list is a doubly-linked list with an interface that's a superset of both list and collections.deque. It also supports extracting and merging ranges of nodes with cut and splice, and its iterators behave like database cursors.
See the big linked_list tutorial for an introduction and examples.
linked_list(iterable=(), *, lock=None)
A doubly-linked list. iterable provides initial values. If lock is True, the list uses an internal threading.Lock for thread safety. If lock is a lock object, that lock is used (but the list cannot be pickled). If lock is False or None, no locking is used.
linked_list has explicit "head" and "tail" sentinel nodes. Iterating yields the values between head and tail. linked_list supports len, indexing, slicing, in, ==, bool, pickling, and reversed.
See the big linked_list tutorial for more.
linked_list.append(object)
Appends object to the end of the linked list.
linked_list.clear()
Removes all values from the linked list.
linked_list.copy(*, lock=None)
Returns a shallow copy of the linked list. lock is passed to the new list's constructor.
linked_list.count(value)
Returns the number of occurrences of value in the linked list.
linked_list.cut(start=None, stop=None, *, lock=None)
Cuts nodes from this list and returns them in a new linked_list. start and stop, if specified, must be iterators over this list. If start is None, it defaults to the first node after head. (If the list is empty, this will be tail.) If stop is None, it defaults to tail. The range of nodes cut includes start but excludes stop. start must not point to a node after stop. lock is passed to the new list's constructor; if None, the new list reuses this list's lock parameter.
If any nodes are cut, the start and stop iterators will still point at the same nodes--which means start will have been moved to the new list.
Raises SpecialNodeError if start points to head, because you can't cut the head of the list.
start and stop may be reverse iterators; however, the linked list resulting from a cut will have the elements in forward order. If either start or stop is a reverse iterator, then they must both be reverse iterators (or None), and:
- start defaults to the last node before tail,
- stop defaults to head,
- start must not point to a node after stop, and
- raises SpecialNodeError if start points to head.
See the big linked_list tutorial for more.
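To make the half-open range concrete, here is a plain-list analogy in which integer indices stand in for the start and stop iterators. cut_sketch is hypothetical and only models which elements end up where; a linked list can relink the nodes at the range boundaries instead of copying a slice.

```python
def cut_sketch(items, start=0, stop=None):
    """Remove items[start:stop] from items and return them (list analogy for cut)."""
    if stop is None:
        stop = len(items)  # like stop defaulting to tail
    if start > stop:
        raise ValueError("start must not be after stop")
    removed = items[start:stop]  # includes start, excludes stop
    del items[start:stop]        # the original list loses those elements
    return removed

items = [1, 2, 3, 4, 5]
print(cut_sketch(items, 1, 3), items)  # [2, 3] [1, 4, 5]
```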
linked_list.extend(iterable)
Extends the linked list by appending elements from iterable.
linked_list.extendleft(iterable)
Prepends the elements from iterable to the linked list, in reverse order. Provided for collections.deque compatibility.
linked_list.find(value)
Returns an iterator pointing at the first occurrence of value, or None if value does not appear.
linked_list.index(value, start=0, stop=sys.maxsize)
Returns the first index of value. Raises ValueError if value is not present. start and stop limit the search to a subsequence.
linked_list.insert(index, object)
Inserts object before index.
linked_list.match(predicate)
Returns an iterator pointing at the first value for which predicate(value) returns a true value, or None if no such value exists.
linked_list.move(where, start=None, stop=None)
Moves a range of nodes to after where. start and stop, if specified, must be iterators over this list. If start is None, it defaults to the first node after head. (If the list is empty, this will be tail.) If stop is None, it defaults to tail. The range of nodes moved includes start but excludes stop. start must not point to a node after stop. where must be an iterator over this list, and must not point to a node being moved, or to tail.
Raises SpecialNodeError if start points to head, because you can't move the head of the list.
start and stop may be reverse iterators. If either start or stop is a reverse iterator, then they must both be reverse iterators (or None), and:
- start defaults to the last node before tail,
- stop defaults to head,
- start must not point to a node after stop, and
- raises SpecialNodeError if start points to head.
linked_list.pop(index=-1)
Removes and returns the value at index (default: the last value).
linked_list.prepend(object)
Prepends object to the beginning of the linked list.
linked_list.rcount(value)
Returns the number of occurrences of value in the linked list. Equivalent to linked_list.count, but searches in reverse order.
linked_list.rcut(start=None, stop=None, *, lock=None)
Like linked_list.cut, except all directions are reversed: start must not point to a node before stop, and the cut range is from stop forwards to start. The returned list is still in forwards order.
linked_list.remove(value, default=undefined)
Removes and returns the first occurrence of value. If value does not appear, returns default if specified, otherwise raises ValueError.
linked_list.reverse()
Reverses all nodes in the linked list, including special nodes.
linked_list.rextend(iterable)
Extends the linked list by prepending elements from iterable, in forwards order.
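extendleft and rextend differ only in the order of the prepended values. collections.deque, whose extendleft behavior linked_list.extendleft copies, shows both orders below. The second half assumes rextend is equivalent to extendleft on the reversed input, which is how the two descriptions above read; it uses only deque, not big.

```python
from collections import deque

# deque.extendleft prepends one item at a time, so the new items end up reversed
d1 = deque([1, 2])
d1.extendleft([3, 4])
print(list(d1))  # [4, 3, 1, 2]

# prepending "in forwards order" (the rextend behavior) can be emulated
# by reversing the input first
d2 = deque([1, 2])
d2.extendleft(reversed([3, 4]))
print(list(d2))  # [3, 4, 1, 2]
```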
linked_list.rfind(value)
Returns an iterator pointing at the last occurrence of value, or None if value does not appear.
linked_list.rmatch(predicate)
Returns an iterator pointing at the last value for which predicate(value) returns a true value, or None if no such value exists.
linked_list.rmove(where, start=None, stop=None)
Like linked_list.move, but start must come after stop, the nodes are inserted before where, and where cannot be head. All other behaviors are unchanged (e.g. start is inclusive, stop is exclusive).
linked_list.rpop(index=0)
Removes and returns the value at index (default: the first value).
linked_list.rotate(n)
Rotates the linked list n steps to the right. If n is negative, rotates left. Provided for collections.deque compatibility.
linked_list.rremove(value, default=undefined)
Removes and returns the last occurrence of value. If value does not appear, returns default if specified, otherwise raises ValueError.
linked_list.rsplice(other, *, where=None)
Like linked_list.splice, except: if where is None, the nodes are prepended to the list. If where is not None, the nodes are inserted before (rather than after) the node pointed to by where. Raises SpecialNodeError if where is head, because you can't insert nodes before head.
linked_list.sort(key=None, reverse=False)
Sorts the linked list in ascending order. Arguments are the same as list.sort. linked_list.sort moves nodes rather than swapping values, so iterators continue to point at the same nodes.
linked_list.splice(other, *, where=None)
Moves all nodes from other into this list. other must be a linked_list; after a successful splice, other will be empty. where must be an iterator over this list, or None. If where is an iterator, the nodes are inserted after the node pointed to by where. If where is None, the nodes are appended. Raises SpecialNodeError if where is tail, because you can't insert nodes after tail.
See the big linked_list tutorial for more.
linked_list.tail()
Returns a forwards iterator pointing at the linked list's tail sentinel node.
SpecialNodeError
A LookupError subclass raised when an operation is attempted on a special (sentinel) node that doesn't support it.
UndefinedIndexError
An IndexError subclass raised when accessing an undefined index in a linked_list (before head or after tail).
linked_list_iterator
Iterates over a linked_list, yielding values in order. Created by calling iter() on a linked_list, or by calling linked_list.find() and similar methods.
A linked_list_iterator behaves like a cursor: when it yields a value, it continues pointing at that node until explicitly advanced. Indexing and slicing are relative to the current node, and negative indices access previous nodes (not the end of the list).
linked_list explicitly supports removing nodes while iterating. If the current node is removed, the iterator points at a "special" placeholder node until advanced.
See the big linked_list tutorial for more.
linked_list_iterator.after(count=1)
Returns a new iterator pointing at the node count steps after the current node.
linked_list_iterator.append(value)
Appends value immediately after the current node.
linked_list_iterator.before(count=1)
Returns a new iterator pointing at the node count steps before the current node.
linked_list_iterator.copy()
Returns a copy of the iterator, pointing at the same node in the same linked list.
linked_list_iterator.count(value)
Returns the number of occurrences of value between the current node and tail.
linked_list_iterator.cut(stop=None, *, lock=None)
Bisects the list at the current node. Cuts nodes starting at the current node up to (but not including) stop, and returns them as a new linked_list.
If stop is None, all subsequent nodes are cut (the original list gets a new tail). lock is passed to the new list's constructor.
See the big linked_list tutorial for more.
linked_list_iterator.exhaust()
Advances the iterator to point to tail.
linked_list_iterator.extend(iterable)
Extends the list by appending elements from iterable after the current node.
linked_list_iterator.find(value)
Returns an iterator pointing at the nearest next occurrence of value, or None if not found before tail.
linked_list_iterator.insert(index, object)
Inserts object after the index'th node relative to the current position.
linked_list_iterator.is_special()
Returns True if the iterator is pointing at a special (sentinel) node, False otherwise.
linked_list_iterator.linked_list
Returns the linked_list this iterator belongs to.
linked_list_iterator.match(predicate)
Returns an iterator pointing at the nearest next value for which predicate(value) returns a true value, or None if no such value exists before tail.
linked_list_iterator.move(where, stop=None)
Moves nodes from the current node up to (but not including) stop to after where. If stop is None, the moved range extends to tail.
linked_list_iterator.next(default=undefined, *, count=1)
Advances the iterator by count steps and returns the value there. If the iterator is exhausted, returns default if specified, otherwise raises StopIteration.
linked_list_iterator.pop(index=0)
Removes and returns the value at index relative to the current position. If index is 0 (the default), removes the current node and moves the iterator back to the previous node.
linked_list_iterator.prepend(value)
Inserts value immediately before the current node.
linked_list_iterator.previous(default=undefined, *, count=1)
Moves the iterator backwards by count steps and returns the value there. If the iterator reaches head, returns default if specified, otherwise raises StopIteration.
linked_list_iterator.rcount(value)
Returns the number of occurrences of value between the current node and head.
linked_list_iterator.rcut(stop=None, *, lock=None)
Like linked_list_iterator.cut, except it cuts backwards: nodes from stop up to and including the current node. If stop is None, all preceding nodes are cut.
linked_list_iterator.remove(value, default=undefined)
Removes the nearest next occurrence of value. Returns default if specified and value is not found, otherwise raises ValueError.
linked_list_iterator.reset()
Resets the iterator to point to head.
linked_list_iterator.rextend(iterable)
Extends the list by prepending elements from iterable before the current node, preserving their order.
linked_list_iterator.rfind(value)
Returns an iterator pointing at the nearest previous occurrence of value, or None if not found before head.
linked_list_iterator.rmatch(predicate)
Returns an iterator pointing at the nearest previous value for which predicate(value) returns a true value, or None if no such value exists before head.
linked_list_iterator.rmove(where, stop=None)
Like linked_list_iterator.move, but moves the range to before where and walks the range in the reverse direction.
linked_list_iterator.rpop(index=0)
Removes and returns the value at index relative to the current position (default 0).
linked_list_iterator.rremove(value, default=undefined)
Removes the nearest previous occurrence of value. Returns default if specified and value is not found, otherwise raises ValueError.
linked_list_iterator.rsplice(other)
Removes all nodes from other and inserts them immediately before the current node.
linked_list_iterator.rtruncate()
Truncates the linked list at the current node, discarding the current node and all previous nodes. After this operation, the iterator points to head.
linked_list_iterator.special()
Returns the special attribute of the current node: None for normal nodes, or a string ('head', 'tail', or 'special') for sentinel nodes.
linked_list_iterator.splice(other)
Removes all nodes from other and inserts them immediately after the current node.
linked_list_iterator.truncate()
Truncates the linked list at the current node, discarding the current node and all subsequent nodes. After this operation, the iterator points to tail.
linked_list_reverse_iterator
Iterates over a linked_list in reverse order, yielding values from tail towards head. Created by calling reversed() on a linked_list or on a linked_list_iterator.
Provides the same interface as linked_list_iterator, but with all directions reversed: next advances towards head, previous advances towards tail, and so on.
See the big linked_list tutorial for more.
linked_list_reverse_iterator.after(count=1)
Behaves like linked_list_iterator.before.
linked_list_reverse_iterator.append(value)
Behaves like linked_list_iterator.prepend.
linked_list_reverse_iterator.before(count=1)
Behaves like linked_list_iterator.after.
linked_list_reverse_iterator.copy()
Returns a copy of the reverse iterator, pointing at the same node.
linked_list_reverse_iterator.count(value)
Behaves like linked_list_iterator.rcount.
linked_list_reverse_iterator.cut(stop=None, *, lock=None)
Behaves like linked_list_iterator.rcut.
linked_list_reverse_iterator.exhaust()
Advances the reverse iterator to point to head.
linked_list_reverse_iterator.extend(iterable)
Behaves like linked_list_iterator.rextend.
linked_list_reverse_iterator.find(value)
Behaves like linked_list_iterator.rfind.
linked_list_reverse_iterator.insert(index, object)
Inserts object relative to the current position, with reversed index direction.
linked_list_reverse_iterator.is_special()
Behaves like linked_list_iterator.is_special.
linked_list_reverse_iterator.linked_list
Returns the linked_list this iterator belongs to.
linked_list_reverse_iterator.match(predicate)
Behaves like linked_list_iterator.rmatch.
linked_list_reverse_iterator.move(where, stop=None)
Behaves like linked_list_iterator.rmove.
linked_list_reverse_iterator.next(default=undefined, *, count=1)
Behaves like linked_list_iterator.previous.
linked_list_reverse_iterator.pop(index=0)
Behaves like linked_list_iterator.rpop.
linked_list_reverse_iterator.prepend(value)
Behaves like linked_list_iterator.append.
linked_list_reverse_iterator.previous(default=undefined, *, count=1)
Behaves like linked_list_iterator.next.
linked_list_reverse_iterator.rcount(value)
Behaves like linked_list_iterator.count.
linked_list_reverse_iterator.rcut(stop=None, *, lock=None)
Behaves like linked_list_iterator.cut.
linked_list_reverse_iterator.remove(value, default=undefined)
Behaves like linked_list_iterator.rremove.
linked_list_reverse_iterator.reset()
Resets the reverse iterator to point to tail.
linked_list_reverse_iterator.rextend(iterable)
Behaves like linked_list_iterator.extend.
linked_list_reverse_iterator.rfind(value)
Behaves like linked_list_iterator.find.
linked_list_reverse_iterator.rmatch(predicate)
Behaves like linked_list_iterator.match.
linked_list_reverse_iterator.rmove(where, stop=None)
Behaves like linked_list_iterator.move.
linked_list_reverse_iterator.rpop(index=0)
Behaves like linked_list_iterator.pop.
linked_list_reverse_iterator.rremove(value, default=undefined)
Behaves like linked_list_iterator.remove.
linked_list_reverse_iterator.rsplice(other)
Behaves like linked_list_iterator.splice.
linked_list_reverse_iterator.rtruncate()
Behaves like linked_list_iterator.truncate.
linked_list_reverse_iterator.special()
Behaves like linked_list_iterator.special.
linked_list_reverse_iterator.splice(other)
Behaves like linked_list_iterator.rsplice.
linked_list_reverse_iterator.truncate()
Behaves like linked_list_iterator.rtruncate.
big.version
Support for version metadata objects.
Version(s=None, *, epoch=None, release=None, release_level=None, serial=None, post=None, dev=None, local=None)
Constructs a Version object, which represents a version number.
You may define the version in one of two ways:
- by passing in a version string to the s positional parameter. Example: Version("1.3.24rc37")
- by passing in keyword-only arguments setting the specific fields of the version. Example: Version(release=(1, 3, 24), release_level="rc", serial=37)
big's Version objects conform to the PEP 440 version scheme, parsing version strings using that PEP's official regular expression.
Version objects support the following features:
- They're immutable once constructed.
- They support the following read-only properties: epoch, release, major (release[0]), minor (a safe version of release[1]), micro (a safe version of release[2]), release_level, serial, post, dev, and local.
- Version objects are hashable.
- Version objects support ordering and comparison; you can ask if two Version objects are equal, or if one is less than the other.
- str() on a Version object returns a normalized version string for that version.
- repr() on a Version object returns a string that, if eval'd, reconstructs that object.
- Version objects normalize themselves at initialization time:
  - Leading zeroes on version numbers are stripped.
  - Trailing zeroes in release (and trailing .0 strings in the equivalent part of a version string) are stripped.
  - Abbreviations and alternate names for release_level are normalized.
- Don't tell anybody, but you can also pass a sys.version_info object or a packaging.version.Version object into the constructor instead of a version string. Shh!
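The release normalization described above can be sketched in a few lines: int() drops leading zeroes, and trailing zero components are then stripped. normalize_release is a hypothetical helper, not big's code, and keeping at least one component (so "0" stays (0,)) is an assumption.

```python
def normalize_release(parts):
    """Hypothetical sketch of release normalization."""
    # int("03") == 3, so leading zeroes disappear here
    release = tuple(int(part) for part in parts)
    # strip trailing zero components, keeping at least one (an assumption)
    while len(release) > 1 and release[-1] == 0:
        release = release[:-1]
    return release

print(normalize_release("1.03.0.0".split(".")))  # (1, 3)
```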
When constructing a Version by passing in a string s, the string must conform to this scheme, where square brackets denote optional substrings and names in angle brackets represent parameterized substrings:
[<epoch>!]<major>(.<minor_etc>)*[<release_level>[<serial>]][.post<post>][.dev<dev>][+<local>]
All fields should be non-negative integers except for:
- <major>(.<minor_etc>)* is meant to connote a conventional dotted version number, like 1.2 or 1.5.3.8. This section can contain only numeric digits and periods ('.'). You may have as few or as many periods as you prefer. Trailing .0 entries will be stripped.
- <release_level> can only be one of the following strings: a, meaning an alpha release; b, meaning a beta release; or rc, meaning a release candidate. For a final release, skip the release_level (and the serial).
- <local> represents an arbitrary sequence of alphanumeric characters punctuated by periods.
Alternatively, you can construct a Version object by passing in these keyword-only arguments:
- epoch: A non-negative int or None. Represents an "epoch" of version numbers. A version number with a higher "epoch" is always a later release, regardless of all other fields.
- release: A tuple containing one or more non-negative integers. Represents the conventional part of the version number; the version string 1.3.8 would translate to Version(release=(1, 3, 8)).
- release_level: A str or None. If it's a str, it must be one of the following strings: a, meaning an alpha release; b, meaning a beta release; or rc, meaning a release candidate.
- serial: A non-negative int or None. Represents how many releases there have been at this release_level. (The name is taken from Python's sys.version_info.)
- post: A non-negative int or None. Represents "post-releases", extremely minor releases made after a release:
  Version(release=(1, 3, 5)) < Version(release=(1, 3, 5), post=1)
- dev: A non-negative int or None. Represents an under-development release. Higher dev numbers represent later releases, but a release where dev is not None comes before the same release where dev is None. In other words:
  Version(release=(1, 3, 5), dev=34) < Version(release=(1, 3, 5), dev=35)
  Version(release=(1, 3, 5), dev=35) < Version(release=(1, 3, 5))
- local: A tuple of one or more str objects, each containing only alphanumeric characters, or None. Represents a purely local version number, allowing for minor build and patch differences but with no API or ABI changes.
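The post and dev ordering rules can be expressed as a sort key. This sketch is deliberately simplified: sort_key_sketch is hypothetical, it ignores epoch, release_level/serial, and local, and it assumes release has already been normalized (trailing zeroes stripped) so plain tuple comparison works.

```python
def sort_key_sketch(release, post=None, dev=None):
    """Simplified ordering key for the post/dev rules above (hypothetical)."""
    return (
        release,
        -1 if post is None else post,  # no post-release sorts before post=0
        dev is None,                   # False (a dev release) sorts before True
        0 if dev is None else dev,     # higher dev numbers are later
    )

versions = [
    ((1, 3, 5), 1, None),     # 1.3.5.post1
    ((1, 3, 5), None, None),  # 1.3.5
    ((1, 3, 5), None, 35),    # 1.3.5.dev35
    ((1, 3, 5), None, 34),    # 1.3.5.dev34
]
for release, post, dev in sorted(versions, key=lambda v: sort_key_sketch(*v)):
    print(release, post, dev)
```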
Version.format(s)
Returns a formatted version of s, substituting attributes from self into s using str.format_map. For example,
Version("1.3.5").format('{major}.{minor}')
returns the string '1.3'.
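str.format_map accepts any object with a __getitem__ method, so an object's attributes can be exposed to a format string through a small adapter. This is a hypothetical sketch of the idea; AttributeMapping and FakeVersion are illustrative stand-ins, and big's own implementation may differ.

```python
from collections import namedtuple

class AttributeMapping:
    """Expose an object's attributes as a mapping (hypothetical adapter)."""
    def __init__(self, obj):
        self.obj = obj

    def __getitem__(self, key):
        try:
            return getattr(self.obj, key)
        except AttributeError:
            # format_map reports missing fields as KeyError
            raise KeyError(key) from None

# A stand-in for a Version with only three fields (hypothetical).
FakeVersion = namedtuple("FakeVersion", "major minor micro")
v = FakeVersion(1, 3, 5)
print("{major}.{minor}".format_map(AttributeMapping(v)))  # 1.3
```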
Tutorials
The big string
Python's tokenize and re (regular expression) modules both had to solve an API problem. In both cases, you submit a large string to them, and they split it up and return little substrings--tiny little slices of the big string. Often, the user needs to know where those little slices came from. How do you communicate that?
What tokenize and re did was add extra information accompanying the string. But they took different--and incompatible--approaches, each representing "where" the little bitty strings came from in its own way:
- tokenize.tokenize returns a TokenInfo object containing the line and column numbers of the string it contains.
- The search and match methods on a compiled regular expression return a Match object which tells you the index where the string started in the original string.
This is sufficient--barely. It's also fragile. What if you further subdivide the string? What if you join the text with the antecedent or subsequent text from the original? Now you have to clumsily track these offsets yourself. And if you want line and column information, the re module's Match object is of no help.
And what if you're parsing your text yourself, rather than using tokenize or re? If you split up a string into lines using the splitlines method on a string, you have to track the line numbers yourself. Worse yet, if you split by lines, then use re to subdivide the string, you have to mate your offset tracking with the re.Match object's tracking. What a pain!
big's string object solves all that. It's a drop-in replacement for Python's str object, and in fact is a subclass of str. What it gives you: any time you extract a substring of a string object, the substring knows its own offset, line number, and column number relative to the original string. You don't need to figure it out yourself, and you don't need to store the information separately using some fragile external representation. Any time you have a string object, you automatically know where it came from. (You can even specify a "source" for the text--the original filename or what have you--and the string object will retain that too.)
This makes producing syntax error messages effortless. If s is a string object, and represents a syntax error because it was an unexpected token in the middle of a text you're parsing, you can simply write this:
raise SyntaxError(f'{s.where}: unexpected token {s}')
where is a property, an automatically-formatted string containing the line and column information for the string. And if you specified a "source", it contains that too. For example, if you initialized the string with "source" set to /home/larry/myscript.py, and s was the token whule (whoops! mistyped while!), from line number 12, column number 15, the text of the exception would read:
"/home/larry/myscript.py" line 12 column 15: unexpected token 'whule'
Tomorrow's methods, today
big supports older versions of Python; as of this writing it supports all the way back to 3.6. (The Python core development team dropped support for 3.6 several years ago!)
The string object supports all the methods of the str object, including ones added in later Python versions. For example, str gained a new method, isascii, in Python 3.7. Rather than only provide it in 3.7+, string makes it available in 3.6 too.
Naughty modules not honoring the subclass
It was important that string be a true drop-in replacement for str. The only way for that to work: it had to literally be a subclass of str. There's a lot of code that says
if isinstance(obj, str):
and if string objects failed that test they'd break code.
This has an unfortunate side-effect. CPython ships with modules in its standard library written in C that check to see "is this object a str object?" And if the object passes that test, they use low-level C API calls on the str object to interact with it. The problem is, these low-level C API calls ignore the fact that this is a subclass of str, and they sidestep the overloaded behaviors of the string object. This means that, for example, when they extract a substring from the object, they don't get a string object preserving the offsets, they just get a plain old str object.
Fixing this in CPython would be worthwhile, but it'd be a lot of work and it would only benefit the future. We want to solve our problem today. So big provides workarounds for the two worst offenders: re and tokenize. big's string object has a compile method that is a drop-in replacement for re.compile, and all the methods you call on it will return string objects instead of str objects. The string object also has a method called generate_tokens that produces the same output as tokenize.generate_tokens, except (of course!) all the strings returned in its TokenInfo objects are string objects.
Unfortunately, there's one more wrinkle. The objects returned by CPython's re module can be neither instantiated nor subclassed. They deliberately set an internal flag that means "Python code is not permitted to subclass this class". This means it's impossible for string.compile to return objects that pass isinstance tests. string.compile returns a Pattern object, but it's not an instance of re.Pattern, and isinstance tests will fail. big was forced to reimplement these objects, and we ensure they behave identically to the originals, but CPython makes this facet of incompatibility unfixable.
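Both problems can be reproduced without big. In the sketch below, tagged is a hypothetical toy str subclass standing in for big.string: slicing it from Python honors its overridden __getitem__, but the re module's C code hands back a plain str, and re.Pattern (Python 3.7+) refuses to be subclassed.

```python
import re

class tagged(str):
    """Toy str subclass whose Python-level slices stay tagged (hypothetical)."""
    def __getitem__(self, index):
        return tagged(str.__getitem__(self, index))

t = tagged("elif attempt(blast):")
print(type(t[13:18]).__name__)  # tagged: the Python-level override ran

m = re.search(r"\w+(?=\))", t)
print(m.group(0), type(m.group(0)).__name__)  # blast str: re sliced the data in C

# re.Pattern lacks the "may be subclassed" flag, so this class statement fails.
try:
    class MyPattern(re.Pattern):
        pass
    subclassable = True
except TypeError:
    subclassable = False
print(subclassable)  # False
```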
The big linked_list
Background
A linked list is a fundamental data structure in computer science, second only perhaps to the array and the record. And yet Python has never officially shipped with a linked list!
There is a linked list hidden in the standard library of CPython; collections.deque is implemented internally using a linked list (of fixed-size blocks). So it's tempting to use a deque where you'd want a real linked list--like, a use case where you frequently insert and remove values in the middle of the list, which is an O(n) operation with the classic Python list. However, the deque API makes it inconvenient to use as a linked list: it doesn't expose its nodes, and inserting into or deleting from the middle of a deque is still O(n).
In 2025, I wanted a linked list for a project. I surveyed the linked lists available for Python at the time and decided I didn't want to use any of them--so I wrote my own. Now you get to use it too!
Overview
big's linked_list itself behaves externally like a list or a deque; you insert/append/prepend values to the list, and it stores them in order and manages the storage.
Where big's linked_list shines is in its iterators. linked_list iterators are more like "database cursors"; they act like a moveable virtual head of the list, centered on any value you like.
Also, unlike Python's other data structures, linked_list explicitly supports modifying the list during iteration. You can have as many iterators iterating over a list as you like, and you can add or remove nodes anywhere to your heart's content.
In addition, linked_list supports optional thread-safety through internal locking (see the lock parameter).
Implementation details
Internally a big linked_list is a traditional doubly-linked list. The list is stored in a series of nodes; each node contains forwards and backwards references, to the next and previous nodes respectively, as well as a reference to your value. This classic design makes its performance predictable: once you have an iterator pointing at the spot, inserting and removing elements anywhere in the list is O(1), whereas accessing elements by index is O(n).

There are actually two types of node in a big linked_list: "data" nodes, which store a value, and "special" nodes, which don't. Why are these "special" nodes needed? Several reasons. First, linked_list makes a design choice that's uncommon but not exactly rare for linked lists: the "head" and "tail" nodes are "special" nodes in the linked list. When you create a new linked_list object, it contains two nodes, not zero: the newly-created list already contains "head" and "tail" nodes. This makes for a nice implementation; every insert and delete simply updates four references, rather than needing lots of "if we're pointed at the head" special cases all over the place.

There's a third type of "special" node: a deleted node. If an iterator is pointing at a data node containing a value X, and you remove X from the linked list, the data is removed but the node stays in place. That node is demoted to a "special" node--again, "data" nodes store a reference to a value, "special" nodes don't. This change is harmless; the iterator can continue pointing to it indefinitely, or can iterate forward or backwards without difficulty. The fact that the node was demoted is invisible to the user of the iterator if all they're doing is conventional iteration. However, this implementation choice will have ramifications for the "iterators as database cursors" APIs, as we'll see shortly.
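Here's a minimal, hypothetical doubly-linked list with permanent head and tail nodes, to make the sentinel idea concrete. It is not big's implementation: it has no iterators, no locking, and no deleted-node handling. Node, SentinelList, insert_before, and remove are names invented for this sketch. Every real node always has a neighbour on both sides, so insertion and removal run in O(1) with no head/tail special cases.

```python
class Node:
    """One list node. The head and tail sentinels never hold a real value."""
    __slots__ = ("prev", "next", "value")

    def __init__(self, value=None):
        self.prev = self.next = None
        self.value = value


class SentinelList:
    """Hypothetical doubly-linked list with permanent head/tail sentinels."""

    def __init__(self, iterable=()):
        self.head, self.tail = Node(), Node()
        self.head.next, self.tail.prev = self.tail, self.head
        self.length = 0
        for value in iterable:
            self.insert_before(self.tail, value)

    def insert_before(self, node, value):
        """Insert value just before node in O(1); node may be the tail sentinel."""
        new = Node(value)
        prev = node.prev          # always exists: at worst it's the head sentinel
        new.prev, new.next = prev, node
        prev.next = new
        node.prev = new
        self.length += 1
        return new

    def remove(self, node):
        """Unlink a data node in O(1) and return its value."""
        node.prev.next = node.next    # both neighbours exist, thanks to sentinels
        node.next.prev = node.prev
        self.length -= 1
        return node.value

    def __iter__(self):
        node = self.head.next
        while node is not self.tail:
            yield node.value
            node = node.next

    def __len__(self):
        return self.length


t = SentinelList([1, 2, 3])
middle = t.head.next.next     # the node holding 2
t.insert_before(middle, 1.5)  # no "empty list" or "at the head" special cases
t.remove(middle)
print(list(t), len(t))        # [1, 1.5, 3] 3
```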
(In case you're wondering: once the last iterator departs a "special" node resulting from a deleted value, linked_list removes the node.)

linked_list methods
linked_list provides a superset of the union of the APIs of list and collections.deque. Every method call supported by both list and deque is supported by linked_list, and you can read the documentation for those types to see the basics.

However, there are also some important changes. First and foremost, for list and deque methods that return an index into the list, the linked_list equivalent returns an iterator. This is a superior API, due to the "database cursor" features of linked_list iterators. It's also better for performance, as it reduces the need to access values by index, which is O(n) on linked_list.

In addition, linked_list contains many "reversed" versions of methods. These are named by prefixing the original method name with r. For example:
- extend is complemented by rextend, which inserts the values from the iterable at the front of the list, in forwards order.
- find is complemented by rfind, which searches for a value starting at the end of the list and searching backwards.
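The phrase "in forwards order" matters because it differs from the standard library. collections.deque.extendleft inserts values one at a time, which reverses them. The comparison below is standard-library only. It doesn't use big; list slice assignment stands in for the forwards-order result rextend is described as producing.

```python
from collections import deque

values = [1, 2, 3]

d = deque([4, 5])
d.extendleft(values)   # appendleft() one value at a time, so the order flips
print(list(d))         # [3, 2, 1, 4, 5]

lst = [4, 5]
lst[0:0] = values      # insert at the front, keeping forwards order
print(lst)             # [1, 2, 3, 4, 5]
```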
linked_list also supports many of Python's "magic methods":
- __add__: t + x returns a new list containing the contents of t followed by the contents of x; x must be an iterable.
- __bool__: bool(t) returns True if t contains any values, or False if t is empty.
- __contains__: v in t evaluates to True if the value v is in t.
- __copy__: copy.copy(t) returns a shallow copy of the list.
- __delitem__: del t[3] will remove the fourth value in t. Also supports slices.
- __deepcopy__: copy.deepcopy(t) returns a deep copy of the list.
- __eq__ and the other five "rich comparison" methods: t == t2 is true if and only if t and t2 are of the same type and contain the same values in the same order.
- __getitem__: t[3] evaluates to the fourth value in t. Also supports slices.
- __iadd__: t += x appends the contents of iterable x to t.
- __imul__: t *= n results in t containing n copies of its own contents.
- __iter__: iter(t) returns a forward iterator over t.
- __len__: len(t) returns the number of items in t. If t is empty, this is 0.
- __mul__: t * n returns a new list containing n copies of the contents of t.
- __reversed__: reversed(t) returns a reverse iterator over t.
- __repr__: repr(t) produces a custom repr showing the current contents of the list.
- __setitem__: t[3] = v will overwrite the fourth value in t with v. Also supports slices.
Finally, linked_list supports methods that let you move nodes directly from one list to another, rather than inserting new nodes. If you're moving lots of nodes, this can be a huge performance win. The relevant methods:
- cut lets you specify a range of nodes to remove from a linked_list. You specify the start and stop of the range as iterators. The nodes are removed from the linked list and returned in their own new linked list.
- splice lets you move all the nodes of one linked_list into another. After splicing linked list A into linked list B, A will be empty, and B will contain all of A's nodes, in order.
- rcut and rsplice are "reversed" versions of cut and splice.
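Moving nodes instead of copying values is what makes splicing cheap. With head and tail sentinels, the whole chain of another list can be relinked with a constant number of reference updates, however long it is. The sketch below is hypothetical: SentinelList and its splice method are invented for illustration. It always attaches the moved nodes at the end, which may not match where big's splice puts them. It only shows why the operation can be O(1).

```python
class Node:
    __slots__ = ("prev", "next", "value")

    def __init__(self, value=None):
        self.prev = self.next = None
        self.value = value


class SentinelList:
    """Hypothetical sentinel-based doubly-linked list (not big's code)."""

    def __init__(self, iterable=()):
        self.head, self.tail = Node(), Node()
        self.head.next, self.tail.prev = self.tail, self.head
        for value in iterable:
            self.append(value)

    def append(self, value):
        node, last = Node(value), self.tail.prev
        node.prev, node.next = last, self.tail
        last.next = node
        self.tail.prev = node

    def splice(self, other):
        """Move every node of other onto the end of self in O(1); other ends up empty."""
        if other.head.next is other.tail:
            return                                  # nothing to move
        first, last = other.head.next, other.tail.prev
        # Detach the whole chain from other by reconnecting other's sentinels.
        other.head.next, other.tail.prev = other.tail, other.head
        # Link the chain in just before self's tail sentinel.
        before = self.tail.prev
        before.next, first.prev = first, before
        last.next, self.tail.prev = self.tail, last

    def __iter__(self):
        node = self.head.next
        while node is not self.tail:
            yield node.value
            node = node.next


a = SentinelList([1, 2, 3])
b = SentinelList([10, 20])
first_of_a = a.head.next
b.splice(a)
print(list(b), list(a))                    # [10, 20, 1, 2, 3] []
print(b.head.next.next.next is first_of_a) # True: the node itself moved
```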
linked_list iterators
While developing linked_list, it occurred to me: the usual use case for a linked list involves an arbitrarily long sequence of data, which you iterate over and process. For example, compilers generally represent the program being compiled as a linked list of "basic blocks".

When using a linked list for these sorts of use cases, you generally operate on a pointer to the linked list node under current consideration. In Python parlance, you iterate over the list, getting a reference to each node in the list in turn. You perform your computation on that node, then iterate to the next one.
However, you often want to modify the list while you're doing this. You may want to remove the node, or insert new nodes, or both--replace the node with something else. But idiomatic Python iterators don't let you do anything like that. All they know how to do is "advance to the next value and yield it". This was a genius design choice for Python, but for our linked list it's simply not enough.
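Here's how Python's built-in containers react when you mutate them mid-iteration. This is standard-library behavior only, with no big code. dict and deque iterators detect the change and raise RuntimeError, while a list iterator silently skips an element. mutate_while_iterating is a helper invented for this example, and the error messages shown are CPython's.

```python
from collections import deque


def mutate_while_iterating(container, mutate):
    """Mutate container on every step of a for loop; return any RuntimeError text."""
    try:
        for _ in container:
            mutate(container)
    except RuntimeError as error:
        return str(error)
    return None


dict_error = mutate_while_iterating({"a": 1}, lambda d: d.update({"b": 2}))
deque_error = mutate_while_iterating(deque([1, 2]), lambda d: d.append(3))
print(dict_error)    # dictionary changed size during iteration
print(deque_error)   # deque mutated during iteration

# A list iterator just walks an index forward, so removing an element
# mid-loop shifts the rest left and one element is never visited.
nums = [1, 2, 2, 3]
for n in nums:
    if n == 2:
        nums.remove(n)
print(nums)          # [1, 2, 3]: the second 2 was skipped, not removed
```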
linked_list solves this by making its iterators far more powerful. One way to describe them is as database cursors: a linked_list iterator points at a value (or "row"), and lets you modify the list (or "table") relative to that value. I think of it more like a moveable virtual list "head"; the iterator points at a value, and provides APIs that let it behave like a linked_list pointed at that value. linked_list iterators provide nearly the entire API that linked_list itself provides, though modified to make sense given the context of pointing at any arbitrary node in the list.

Indexing
Iterators support all operations you can perform by indexing into a linked list; you can get, set, and delete values.
However, the meaning of the index is slightly different for an iterator. Negative indices don't start at the end and work backwards; instead, they start at the current node and work backwards. If t is a linked list containing range(5), and it is an iterator pointing at value 2, indexing would look like this:

                               it
                                |
                                v
    [head] <-> [0] <-> [1] <-> [2] <-> [3] <-> [4] <-> [tail]
              it[-2]  it[-1]  it[0]   it[1]   it[2]

If you advanced it once, so it pointed to the value 3, indexing would now look like this:

                                       it
                                        |
                                        v
    [head] <-> [0] <-> [1] <-> [2] <-> [3] <-> [4] <-> [tail]
              it[-3]  it[-2]  it[-1]  it[0]   it[1]

Indexing into or past the "head" and "tail" nodes raises an IndexError.

You can also use slices, e.g. it[-3:5:2]. There are two important differences from slicing into a list (deque objects don't support slicing at all):
- First, negative indices in slices work the same way they do when indexing an iterator: they count backwards from the current node, not from the end of the list.
- Second, slices into linked_list iterators don't clamp for you. Indexing into or past the "head" and "tail" nodes raises an IndexError, rather than silently clamping the indices to a legal range.
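The relative-offset rules above can be modelled with an ordinary Python list plus a current position. This sketch is hypothetical: relative_index and relative_slice aren't big APIs. It ignores special nodes entirely, and it assumes a slice is only checked at the offsets it actually visits. It's meant to contrast iterator-relative, non-clamping indexing with the clamping you get from list slices.

```python
def relative_index(values, pos, k):
    """Model of it[k] for an iterator sitting on values[pos].

    Offsets are relative to pos; stepping onto (or past) the head/tail
    sentinels raises IndexError instead of wrapping or clamping.
    """
    target = pos + k
    if not 0 <= target < len(values):
        raise IndexError("offset {} leaves the list".format(k))
    return values[target]


def relative_slice(values, pos, start, stop, step=1):
    """Model of it[start:stop:step]: relative offsets, no clamping."""
    return [relative_index(values, pos, k) for k in range(start, stop, step)]


values = list(range(5))
print(relative_index(values, 2, -2))        # 0 (iterator on 2, two nodes back)
print(relative_index(values, 3, -3))        # 0 (after advancing once)
print(relative_slice(values, 2, -2, 3, 2))  # [0, 2, 4]
print(values[2:100])                        # a plain list clamps: [2, 3, 4]
try:
    relative_slice(values, 2, 0, 4)         # offset 3 would be the tail node
except IndexError as error:
    slice_error = str(error)
print(slice_error)                          # offset 3 leaves the list
```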
Method calls
You can also make method calls on the iterator, to operate on the list starting at the current node. These operations always operate relative to the current node, rather than relative to the beginning (or end) of the list. Also, as a rule, methods that operate on one or more nodes always include the current node.

Here are some examples. In these examples, it is always a forwards iterator:
- it.pop pops and returns the value the iterator currently points at, then moves the iterator back one node.
- it.rpop pops and returns the value the iterator currently points at, then moves the iterator forward one node.
- it.append inserts a value after the current node.
- it.prepend inserts a value before the current node.
- it.find searches for a value, starting at the current node and continuing forwards.
- it.rfind searches for a value, starting at the current node and continuing backwards.
- it.truncate deletes all values at or after the current node. When it.truncate is done, it will be pointing at "tail", and it[-1] will evaluate to the same value it did before the call.
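Here's a hypothetical cursor over a sentinel-based doubly-linked list. It mimics four of the methods described above: append, prepend, pop, and find. Node, Cursor, build, link_before, and to_list are invented for this sketch. It doesn't model special-node demotion, reverse iteration, or locking. Treat it as an illustration of the cursor idea rather than big's implementation.

```python
_SENTINEL = object()   # value stored in the head and tail sentinel nodes


class Node:
    __slots__ = ("prev", "next", "value")

    def __init__(self, value=_SENTINEL):
        self.prev = self.next = None
        self.value = value


def link_before(node, new):
    """Link the single node new into the chain just before node."""
    prev = node.prev
    new.prev, new.next = prev, node
    prev.next = new
    node.prev = new


def build(values):
    """Return the (head, tail) sentinels of a new chain holding values."""
    head, tail = Node(), Node()
    head.next, tail.prev = tail, head
    for value in values:
        link_before(tail, Node(value))
    return head, tail


def to_list(head, tail):
    out, node = [], head.next
    while node is not tail:
        out.append(node.value)
        node = node.next
    return out


class Cursor:
    """Hypothetical cursor: a movable position inside the chain."""

    def __init__(self, node):
        self.node = node

    def append(self, value):
        """Insert value after the current node."""
        link_before(self.node.next, Node(value))

    def prepend(self, value):
        """Insert value before the current node."""
        link_before(self.node, Node(value))

    def pop(self):
        """Unlink the current node, move back one node, return the popped value."""
        node = self.node
        if node.value is _SENTINEL:
            raise LookupError("cursor is on a sentinel node")
        node.prev.next = node.next
        node.next.prev = node.prev
        self.node = node.prev
        return node.value

    def find(self, value, tail):
        """Search forwards from the current node; return a new Cursor or None."""
        node = self.node
        while node is not tail:
            if node.value is not _SENTINEL and node.value == value:
                return Cursor(node)
            node = node.next
        return None


head, tail = build([1, 2, 3])
it = Cursor(head).find(2, tail)   # cursor pointing at the value 2
it.append(2.5)                    # [1, 2, 2.5, 3]
it.prepend(1.5)                   # [1, 1.5, 2, 2.5, 3]
popped = it.pop()                 # removes 2; the cursor moves back to 1.5
print(popped, to_list(head, tail), it.node.value)
# 2 [1, 1.5, 2.5, 3] 1.5
```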
Magic methods
linked_list iterators also implement many of the magic methods supported by linked_list. For example, if it is a forward iterator pointing at an arbitrary node:
- __bool__: bool(it) returns True if it is not pointed at "tail".
- __contains__: v in it evaluates to True if the value v is found at or after it in the list.
- __eq__: it == it2 is true if and only if it and it2 point to the same node. Iterators don't support relative comparison (less-than, etc.).
- __iter__: iter(it) returns a copy of it.
- __len__: len(it) returns the number of items at or after it in the list. If it points to "tail", this returns 0.
- __reversed__: reversed(it) returns a reverse iterator pointing at the same node as it.
Special nodes
There are two rules that apply to iterators when interacting with special nodes:
- Special nodes never have a value.
- When an iterator navigates through a linked_list, it automatically skips over special nodes.
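Let's see specifically how iterators interact with special nodes. We'll create a new linked list containing a single value, then create an iterator over that linked list, and call next on it twice:

    from big.all import linked_list

    t = linked_list((1,))      # a list holding just the value 1
    it = iter(t)               # starts out pointing at the "head" node
    value_a = next(it, None)
    value_b = next(it, None)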
When this is done, value_a will be 1, and value_b will be None. How does this work internally?

Our iterator it started out pointing at the "head" node. When you call next(it, None) the first time, it advances to 1 and returns it. The iterator is now pointing at the node for the value 1. Calling next(it, None) the second time advances to the "tail" node; this would normally raise StopIteration, but the second argument to next is a "default value" it will return instead of raising. So this second call to next(it, None) just returns None. After this second next call, it points to the "tail" node.

Deleted nodes
If an iterator is pointing at a value, and that value is deleted, the node is demoted from a "data" node to a "special" node. The iterator continues to point to it. Consider this example:
    t = linked_list([1, 2, 3, 4, 5])
    it = t.find(3)    # an iterator pointing at the value 3
    del t[2]          # removes the third value, which is 3
The internal layout of the list and iterator now looks like this:
                                  it
                                   |
                                   v
    [head] <-> [1] <-> [2] <-> [special] <-> [4] <-> [5] <-> [tail]

Here it points to a "special" node, where the value 3 used to be. (To be clear: the value of the node isn't the string "special". As mentioned before, special nodes have no value. We just put the word "special" there to annotate that as a special node.)

If you now iterated over the linked list:
    for i in t:
        print(i)
you'd see 1, 2, 4, and 5, like you'd expect. The special node is still there, but remember the rule: linked list iterators automatically skip over special nodes.
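The demotion rule can be sketched as follows, under the simplifying assumption that an iterator is always parked on the deleted node. This hypothetical DemotingList keeps that node linked in place but drops its value, and iteration then skips it, just as described above. SpecialNodeError here is a local stand-in class, not an import from big. The model also never removes demoted nodes, which linked_list does once the last iterator moves away.

```python
_SPECIAL = object()   # marks a node that holds no value


class SpecialNodeError(LookupError):
    """Local stand-in for the exception described in the text."""


class Node:
    __slots__ = ("prev", "next", "value")

    def __init__(self, value=_SPECIAL):
        self.prev = self.next = None
        self.value = value


class DemotingList:
    """Hypothetical model: deleting a value demotes its node instead of unlinking it."""

    def __init__(self, values):
        self.head, self.tail = Node(), Node()     # special sentinel nodes
        self.head.next, self.tail.prev = self.tail, self.head
        for value in values:
            node, last = Node(value), self.tail.prev
            node.prev, node.next = last, self.tail
            last.next = node
            self.tail.prev = node

    def find_node(self, value):
        node = self.head.next
        while node is not self.tail:
            if node.value is not _SPECIAL and node.value == value:
                return node
            node = node.next
        raise ValueError(value)

    def delete(self, node):
        # Simplification: assume an iterator is parked here, so keep the node
        # linked in place and just drop its value (demote it to "special").
        node.value = _SPECIAL

    def __iter__(self):
        # Iteration skips every special node: sentinels and demoted nodes alike.
        node = self.head.next
        while node is not self.tail:
            if node.value is not _SPECIAL:
                yield node.value
            node = node.next


def current_value(node):
    """Model of evaluating it[0] while the iterator sits on node."""
    if node.value is _SPECIAL:
        raise SpecialNodeError("iterator points at a special node")
    return node.value


t = DemotingList([1, 2, 3, 4, 5])
it = t.find_node(3)        # stands in for it = t.find(3)
t.delete(it)               # stands in for del t[2]
print(list(t))             # [1, 2, 4, 5]
print(it.next.value)       # 4: the demoted node is still linked into the chain
try:
    current_value(it)
except LookupError as error:          # SpecialNodeError subclasses LookupError
    error_name = type(error).__name__
print(error_name)                     # SpecialNodeError
```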
When you have an iterator pointed at a special node, you can do almost anything you can do with an iterator pointed at a normal node. You can:
- navigate, using next or previous or find or rfind or match or rmatch
- create new iterators using before or after
- insert new values using append or prepend or extend or extendleft
- attempt to remove values using remove or rremove
What can't you do when pointing at a special node? Any operation that attempts to interact with the value of the current node will raise SpecialNodeError (a subclass of LookupError). For example:
- Evaluating it[0].
- Evaluating it[-1:1].
- Popping the current value using it.pop() or it.rpop().
If you're worried about whether your iterator is pointing at a special node, you can check the special property. That returns None for a normal node, "head" for the head node, "tail" for the tail node, and "special" for any other special node.

Reverse iterators
linked_list objects also support reverse iteration. You create a reverse iterator by calling reversed on the list. You can also create a reverse iterator by calling reversed on a forwards iterator; this returns a reverse iterator pointing at the same node.

Conceptually, a reverse iterator behaves identically to a forwards iterator, except the reverse iterator "sees" the list backwards. If you have a linked list that looks like this:

    [head] <-> [1] <-> [2] <-> [special] <-> [3] <-> [4] <-> [5] <-> [tail]

a reverse iterator would see the list like this:

    [tail] <-> [5] <-> [4] <-> [3] <-> [special] <-> [2] <-> [1] <-> [head]

Apart from this behavioral change, reverse iterators behave identically to forwards iterators. They support the exact same APIs with the same arguments.
This makes the behavior of a reverse iterator easy to predict. For example, if fi is a forwards iterator, and ri is a reverse iterator on the same list:
- A newly-created reverse iterator points to the "tail" node, and ri.reset() resets ri so it points at the "tail" node again.
- A reverse iterator becomes exhausted once it reaches the "head" node. ri.exhaust() moves ri so it points at the "head" node.
- ri.append() inserts before the current node, and ri.prepend() inserts after the current node. Remember, from the perspective of ri, it's inserting those nodes in the correct places!
- ri[1] evaluates to the previous value in the list, and ri[-1] evaluates to the next value in the list.
One thing that doesn't change: when inserting multiple nodes (splice, extend, rextend), the nodes are always inserted in forwards order. Effectively, if fi and ri point to the same node, fi.extend(X) and ri.rextend(X) would do the same thing, and fi.rextend(X) and ri.extend(X) would also do the same thing.

Invariants
- An iterator pointing at a node will continue to point at that node until it takes action to move to a new node.
- If you use an iterator to append a new value, and nobody deletes that value, and you subsequently advance that iterator with next() enough times, the iterator will yield that value.
- If you use an iterator to prepend a new value, and nobody deletes that value, and you subsequently move that iterator backwards with previous() enough times, the iterator will yield that value.
- As a rule, actions on iterators that act on multiple nodes include the node they're pointing at.
- iterator[0] always refers to the node the iterator is currently pointing at, even if it's a special node. If the index is non-zero, it skips over special nodes.
- You can't ever insert a node before head. You can't ever insert a node after tail.
The big Log
tl;dr
Here's the "elevator pitch" for why you want to use Log.

Do you ever do print-style debugging? Of course you do. The big Log makes print-style debugging so much better!
- Log automatically prepends each log message with the elapsed time so far, as well as the name of the thread that logged the message.
- Want to write to a file instead? Maybe a temporary file with a dynamically-generated filename, so old logs don't get overwritten? Maybe you want to buffer all the output until the program exits, then print it all at once? Or maybe you're done with debugging and you just want to switch all logging off? Log makes it effortless to switch between all these options--or even to log to multiple places at the same time. You can easily write to a temporary file and print to the screen, or any combination of outputs. And the API is so easy, it's even easy to remember.
- Log adds some nice formatting options; you can call attention to one message by calling the box method, which draws a box around the value in the output. You can also indent and dedent the log output using the enter and exit methods.
- If your program is multithreaded, you've probably observed print statements interleaving. Python's print() doesn't format the entire message and then write it to stdout all at once; it writes it a bit at a time, including the ending newline. Log only writes complete messages.
- By default Log uses threading, which means it spends a lot less time in each call than calling print would. So, less overhead for your program, not to mention less Heisenberg-uncertainty around synchronization due to the time spent in print. (Yes, print is written in C, so its code runs very fast. But logging using a Log in threaded mode does way less work. According to the wall clock, logging using Log consumes way less time in your thread than logging using print.)
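The interleaving problem comes from print issuing a separate write() call for each argument, separator, and line ending. The first half of this sketch records those calls. The second half is a hypothetical QueueLogger, not big's Log. It joins each message into one complete line and hands it to a single worker thread through a queue.Queue, so lines from different threads can't interleave. The real Log is described as pre-formatting lightly and letting its thread finish the job. This sketch instead does all the formatting in the calling thread, and it has no atexit handling.

```python
import queue
import threading


class RecordingFile:
    """File-like object that records every write() call separately."""

    def __init__(self):
        self.writes = []

    def write(self, text):
        self.writes.append(text)

    def flush(self):
        pass


rec = RecordingFile()
print("hello", "world", file=rec)
print(rec.writes)   # ['hello', ' ', 'world', '\n'] -- four separate writes


class QueueLogger:
    """Hypothetical sketch (not big's Log): callers enqueue, one worker writes."""

    def __init__(self, write):
        self._write = write
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def __call__(self, *args, sep=" "):
        # Build the complete line in the caller, then hand it off in one piece.
        self._queue.put(sep.join(str(arg) for arg in args) + "\n")

    def _worker(self):
        while True:
            message = self._queue.get()
            if message is None:          # shutdown request from close()
                break
            self._write(message)         # exactly one write per complete message

    def close(self):
        self._queue.put(None)
        self._thread.join()


out = []
log = QueueLogger(out.append)
workers = [
    threading.Thread(target=lambda n=n: log("thread", n, "says hi"))
    for n in range(4)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
log.close()
print(sorted(out))   # four complete lines, each written in one piece
```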
Overview
big's Log object is a high-performance logging mechanism, suitable for debugging. It has a very convenient interface, and it's easy to get up and logging with very little configuration. It's thread-safe, ensuring logged messages are rendered atomically (unlike calling print from multiple threads, which can interleave messages). And with default parameters, logging a message is very quick--perhaps 5x faster than calling print.

Although it's really designed for "print-style" debugging--where you print out all the state in your program that you need to diagnose a problem--it's usable for classic application logging, like Python's logging module. That said, it has a very different interface, and is missing a lot of features as compared to the logging module, so it's hardly a drop-in replacement.

tl;dr
The downside of feature-rich classes and functions is that it can be hard to remember how to use them. So here's all you need to remember to be productive with Log:

When you create your Log object, just pass in the objects where you want the log to go--print, a file path, an open file handle, a list, whatever. Then call that Log object to log messages; it behaves like print.

Getting started
Let's start by showing you a whole Python script that exercises many of the big Log features. We'll show you the script, then its output, and then we'll go over it line by line and discuss each of the features independently.

Here's the script:
    import big.all as big

    j = big.Log()

    j.print("Hello, world! 2 + 2 =", 2 + 2)
    j("Hello, world, round 2! 4 + 4 =", 4 + 4)
    j.write("This was written without formatting!\nLine breaks work too!\n")
    j.box(f"Today's important number is {6 * 7}!")
    with j.enter("Newline subsystem"):
        j("Line one!\nAnd here's line two.\n Leading spaces are preserved too!\nAnd here's the final line")
If you copy that to a new file, and run it (with big installed), you'll see this on the output:
    ===============================================================================
    Log start at 2026/02/15 13:46:51.410751 PST
    ===============================================================================
    [000.0004777570 MainThread] Hello, world! 2 + 2 = 4
    [000.0004971740 MainThread] Hello, world, round 2! 4 + 4 = 8
    This was written without formatting!
    Line breaks work too!
    [000.0005088170 MainThread] +------------------------------------------------
    [000.0005088170 MainThread] | Today's important number is 42!
    [000.0005088170 MainThread] +------------------------------------------------
    [000.0005117220 MainThread] +-----+------------------------------------------
    [000.0005117220 MainThread] |start| Newline subsystem
    [000.0005117220 MainThread] +-----+------------------------------------------
    [000.0005174830 MainThread]     Line one!
    [000.0005174830 MainThread]     And here's line two.
    [000.0005174830 MainThread]      Leading spaces are preserved too!
    [000.0005174830 MainThread]     And here's the final line
    [000.0005208190 MainThread] +-----+------------------------------------------
    [000.0005208190 MainThread] | end | Newline subsystem
    [000.0005208190 MainThread] +-----+------------------------------------------
    ===============================================================================
    Log finish at 2026/02/15 13:46:51.411292 PST
    ===============================================================================

Let's break it down, line by line.
To use Log, simply instantiate a Log object:

    j = big.Log()
With all default parameters, your Log will send the log to stdout via builtins.print. However, it'll use a separate thread to do the actual printing. Logging a message will lightly pre-format the message, then send it to the Log object's internal thread; that thread finishes the formatting and sends it to print.

The usual method call to log a message is the Log.print method:

    j.print("Hello, world! 2 + 2 =", 2 + 2)
Log.print has a similar signature to builtins.print, although it doesn't support the file parameter.

This is so common, there's a shortcut: calling the Log instance itself behaves identically to calling the print method:

    j("Hello, world, round 2! 4 + 4 =", 4 + 4)
The three lines we've examined so far produce these first five lines of the output:
    ===============================================================================
    Log start at 2026/02/15 13:14:13.648816 PST
    ===============================================================================
    [000.0004797210 MainThread] Hello, world! 2 + 2 = 4
    [000.0005004200 MainThread] Hello, world, round 2! 4 + 4 = 8

As you can see, Log automatically adds a "start" banner showing the local time at which the log was started. (There's a symmetric "end" banner produced when the log is closed.) These banners are only emitted if some logging operation produces non-empty formatted output.

Also, every printed log message gets a "prefix", showing the elapsed time so far (since the start of the log) and the thread that logged the message. The default prefix is fixed width; it's the thing enclosed in '[...]' at the start of each line (after the start banner).

(Obviously, if you run this script, your times will be different. Yours will be in the future... unless you've borrowed Guido's time machine.)
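Here's a rough idea of how a prefix like the one above can be produced from an integer nanosecond clock. It uses the Python 3.6 fallback described in the configuration section below. make_prefix and its format spec are hypothetical: the spec is reverse-engineered from the sample output, not taken from big's source. START_NS stands in for the moment the log starts.

```python
import threading
import time

# time.monotonic_ns() was added in Python 3.7; on 3.6, emulate it by calling
# time.monotonic() and multiplying by a billion.
try:
    monotonic_ns = time.monotonic_ns
except AttributeError:
    def monotonic_ns():
        return int(time.monotonic() * 1_000_000_000)

START_NS = monotonic_ns()   # captured when the (hypothetical) log starts


def make_prefix(now_ns=None, thread_name=None):
    """Hypothetical prefix: zero-padded elapsed seconds plus the thread name."""
    if now_ns is None:
        now_ns = monotonic_ns()
    if thread_name is None:
        thread_name = threading.current_thread().name
    elapsed = (now_ns - START_NS) / 1_000_000_000   # integer ns -> float seconds
    return "[{:014.10f} {}] ".format(elapsed, thread_name)


# 477,757 ns after the start, logged from the main thread:
print(make_prefix(START_NS + 477_757, "MainThread"))
# [000.0004777570 MainThread]
```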
Let's explore the other Log methods that append messages to the log. The simplest one is write, which only takes one str argument, and writes that string to the log without any further formatting whatsoever:

    j.write("This was written without formatting!\nLine breaks work too!\n")

This is useful in case you've carefully formatted some text by hand and you want to dump it straight into the log. (Or you simply want to suppress the per-line prefix.) Passing an empty string to Log.write is a no-op; it does not start the log and it does not emit banners.

Log also supports a method called box that writes a message with a three-sided box drawn around it:

    j.box(f"Today's important number is {6 * 7}!")
If you want to call attention to a logged message, use box instead of print. Note that the box method also only takes a single str object. (But formatting it is no problem--just use an f-string!)

The above two lines in the script produce these five lines in the output:
    This was written without formatting!
    Line breaks work too!
    [000.0005261250 MainThread] +------------------------------------------------
    [000.0005261250 MainThread] | Today's important number is 42!
    [000.0005261250 MainThread] +------------------------------------------------

As you can see, the message logged with Log.write doesn't get the "prefix". And it's easy to pick out the "boxed" message due to the lines Log draws around it.

Let's finish up, examining the final two lines of the script.
Log supports two intertwined methods, enter and exit. enter logs a message in a box, then adds an indent that applies to subsequent messages. exit outdents, then re-logs the message from the most recent enter in another box. This is a great way to add some structure to your log, showing visually how your program enters and exits conceptual subsystems (modules, functions, classes, algorithms, what have you).

For convenience, Log.enter also returns a "context manager". If you use a call to Log.enter as the argument to a with statement, it'll automatically call Log.exit when you exit the with statement. We use that feature in the example script.

The last line demonstrates one more feature of Log. All Log methods that write to the log support newline characters, not just Log.write. But when you log a message using the other methods--methods that write the "prefix"--the lines are split up before they're formatted for the log, and each line gets the prefix.

Here are the last two lines of the script:
    with j.enter("Newline subsystem"):
        j("Line one!\nAnd here's line two.\n Leading spaces are preserved too!\nAnd here's the final line")
Those two lines produce these thirteen lines in the output:
    [000.0005117220 MainThread] +-----+------------------------------------------
    [000.0005117220 MainThread] |enter| Newline subsystem
    [000.0005117220 MainThread] +-----+------------------------------------------
    [000.0005174830 MainThread]     Line one!
    [000.0005174830 MainThread]     And here's line two.
    [000.0005174830 MainThread]      Leading spaces are preserved too!
    [000.0005174830 MainThread]     And here's the final line
    [000.0005208190 MainThread] +-----+------------------------------------------
    [000.0005208190 MainThread] |exit | Newline subsystem
    [000.0005208190 MainThread] +-----+------------------------------------------
    ===============================================================================
    Log finish at 2026/02/15 13:46:51.411292 PST
    ===============================================================================

Notice how the messages logged inside the "enter" / "exit" section are indented by four spaces. If you nested another with j.enter inside the first one, text logged inside that would be indented by eight spaces.

Once the script exited, Log automatically closed the log for you, including writing the "end" banner with the end time of the log.

Other methods
Log has three more methods; none of these write messages to the log.
- Log.flush(block=True) flushes the log. In some configurations, the log will buffer up messages, instead of sending them right away. A flush call will flush all buffered messages. If block is true (the default), the flush call won't return until all messages have been flushed. If block is false, flush may start the flushing process but won't necessarily wait until it's complete.
- Log.close(block=True) closes the log. If the log is dirty, it flushes first. When a log is closed, it ignores all attempts to write log messages to the log. No error is reported--the messages are simply ignored. The only way to re-open a closed log is to call its reset method. If a log is still in its initial state when you close it, it closes directly without emitting the start banner, end banner, or any destination events besides register. The block parameter works the same as it does for flush.
- Log.reset() resets a log to its initial state. Resetting the log is the only way to re-open a closed log. After reset, the log behaves like a fresh log and won't emit the start banner until some operation produces non-empty formatted output.

Every Log also registers an "atexit" handler. The "atexit" handler closes the Log if it hasn't been closed already.

Destinations
All our logging so far has just gone to builtins.print, which means it shows up on stdout. But what if you want to send the log somewhere else?

All positional arguments to the Log constructor specify destinations. A destination is simply an object that the log sends messages to. You can specify as many destinations as you like when creating your Log instance; however, you can't add or remove destinations later.

As we've already seen, if you don't specify any destinations when you construct the Log, the default is to log to builtins.print. You can explicitly configure your log to do this by passing in print as a positional argument:

    j = big.Log(print)
But there are lots of other options! If you specify a list, Log will append all the logged message strings to that list:

    a = []
    j = big.Log(a)
If you specify a string, Log will assume that string represents a filename, and append the log to that file:

    j = big.Log('/tmp/log.txt')
(You can also specify a pathlib.Path object--that works the same.)

When writing to a file, Log will buffer up messages until the log is flushed, then open the file, flush all messages with one big write, and close the file.

Some more options:
- If you pass in an open file handle (like the result of calling open), Log will write log messages to the file handle.
- If you pass in a callable, Log will call the callable once for every formatted message, with just one argument, the formatted log message.
- There's a special sentinel value in the big.log module: big.log.TMPFILE. If you specify that as a destination, Log will send the log to a file with a dynamically generated name in your configured temporary directory. The filename starts with the name of the log; this is followed by the start time, then the process ID, and finally ends with .txt. Every time the log resets it switches to a new filename.
- A destination of None doesn't log anywhere. If you want to create a Log object that doesn't actually write the log anywhere, construct it as big.Log(None).
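One way to picture how a logger can accept such different destinations is to normalize each one into a single "write this message" callable. make_writer below is a hypothetical sketch, not big's code. It covers lists, paths, open file handles, plain callables, and None. It writes files immediately instead of buffering until flush, and it doesn't handle TMPFILE or Log.Buffer. Sending to several destinations is then just a loop.

```python
import os
import pathlib
import tempfile


def make_writer(destination):
    """Hypothetical: normalize one destination into a write(message) callable."""
    if destination is None:
        return lambda message: None                     # log nowhere
    if isinstance(destination, list):
        return destination.append                       # collect message strings
    if isinstance(destination, (str, pathlib.PurePath)):
        path = os.fspath(destination)

        def append_to_file(message):
            # Simplification: writes immediately. The text describes big as
            # buffering until flush, then doing one big write.
            with open(path, "a", encoding="utf-8") as f:
                f.write(message)

        return append_to_file
    if hasattr(destination, "write"):
        return destination.write                        # an open file handle
    if callable(destination):
        return destination                              # any function
    raise TypeError("unsupported destination: {!r}".format(destination))


collected = []
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "log.txt"
    writers = [make_writer(d) for d in (collected, path, None)]
    for write in writers:          # fan out: every destination gets every message
        write("hello\n")
    file_text = path.read_text(encoding="utf-8")

print(collected, repr(file_text))  # ['hello\n'] 'hello\n'
```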
If you pass in multiple destinations, Log will send all the log messages to all of them. You could log to TMPFILE, print, and a list, all at the same time!

Finally, if you like the "buffer-until-flush" behavior of files, but want to use that with another destination, wrap that destination with a call to Log.Buffer(). Without an argument, Log.Buffer() buffers all messages until flush, then writes them all to builtins.print. If you specify a destination (as a positional parameter), that destination is where the log messages are sent when the log is flushed. See the Custom Destinations section below for more information on this and similar advanced techniques.

Configuration via keyword-only parameters
You configure your Log instance by supplying keyword arguments to the Log constructor. Note that, like the list of destinations, these can only be specified as arguments to the constructor; you can't change any of these settings once the Log instance is initialized.

The most important setting is the threading parameter. This configures whether or not the Log instance uses a worker thread to do all the actual logging. By default threading is true, which means the log uses a separate worker thread to format and write the log messages. Log messages and other operations are sent to that thread via a queue.Queue, which ensures log messages are sent to the log in a high-performance and thread-safe way.

If you pass in threading=False to the Log constructor, the log won't use a worker thread. Instead, every logging call will write to the log immediately; it uses a threading.Lock to guarantee atomicity.

You can set the name of the log with the name parameter. This is shown in the start and end banners.

- indent defaults to 4; this is how many spaces enter indents the log.
- width defaults to 79; this is the column that "line" formats are truncated at.
- clock is the clock used to compute the time at which log messages were logged. It should return an integer, which is the number of nanoseconds since some previous event. By default it uses time.monotonic_ns. (Well, on Python 3.7+ anyway. On Python 3.6 it simulates time.monotonic_ns by calling time.monotonic and multiplying it by a billion.)
- timestamp_clock is the clock used to produce timestamps used in the log. It should return a float, which is the number of seconds since the UNIX epoch. The default value is time.time.
- timestamp_format is a callable that should convert a timestamp_clock value into a pleasing human-readable format; the default value is big.time.timestamp_human.
- prefix is the string that will be inserted in front of every line of a logged message (except Log.write). It's formatted using the Line formatting rules as described in the next section.

The formats parameter establishes the formatting applied to log messages of various types; this is quite involved, so see the Formats section below for documentation on that.

Line formatting
Several configuration settings for Log use what's called "line formatting". This formatting method lets you substitute in dynamic values at runtime when writing to the log; the values are recomputed and substituted every time Log formats a message for the log. For example, the prefix is freshly reformatted for every log message.

The formatting is done with the string's format method (str.format).

Here are the values you can substitute when formatting the prefix:

- elapsed: The elapsed time since the log was started, as float-seconds.
- format: The format being applied to this log message.
- line: The string used for repeating horizontal lines. See the next section, Formats and format dicts.
- name: The "name" of the log passed in to the Log constructor.
- thread: A handle to the thread that logged this message.
- time: The time of the log message, as float-seconds-since-epoch.
- timestamp: The time value, formatted using the timestamp_format callable passed in to the Log constructor.
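Because the prefix is rendered with str.format, these names should also work with attribute access and format specs, just as they would in any format string. Here's a minimal sketch in plain Python, with no big required, that builds the documented values by hand and formats a prefix with them. The template, the render_prefix helper, and the SimpleNamespace stand-in for a thread handle are all hypothetical. time.strftime stands in for big.time.timestamp_human.

```python
import time
from types import SimpleNamespace

# A hypothetical prefix template for illustration; it isn't necessarily Log's default.
PREFIX_TEMPLATE = "[{elapsed:014.10f} {thread.name}] "


def render_prefix(template, start_time, now, thread, name="demo"):
    """Substitute the documented line-formatting values into `template`."""
    values = {
        "elapsed": now - start_time,  # float seconds since the log started
        "format": "print",            # name of the format being applied
        "line": "-",                  # hypothetical horizontal-line string
        "name": name,
        "thread": thread,
        "time": now,                  # float seconds since the epoch
        # Stand-in for big.time.timestamp_human (an assumption for this sketch).
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now)),
    }
    # str.format ignores keyword arguments the template doesn't mention.
    return template.format(**values)


# A stand-in for a threading.Thread handle; only .name is used here.
main_thread = SimpleNamespace(name="MainThread")
print(render_prefix(PREFIX_TEMPLATE, 100.0, 100.000239001, main_thread))
```

With a zero-padded "014.10f" spec, the elapsed time renders in the same fixed-width style as the sample log output later in this section.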
There are three more values (line, message, and prefix) you can use when formatting a "format"; see the next section.

Formats and format dicts
The formatting of the various formatted log messages is specified using a "format". Log supports six predefined formats and has default values for each of them; you can change how those messages are formatted, or add your own.

The formats parameter to the Log constructor should be a dict. A key in this dict is the name of a format; a custom format name must be a valid Python identifier, and can't collide with an existing Log method name. The matching value should be a "format dict", which specifies the formatting to use for this message.

A "format dict" in turn supports only two fields, and one is optional. The required one is "template", which should be a string; this string is formatted with the "line formatting" rules specified in the previous section, with some additional values. The optional one is "line", and you'll see what that's used for in a moment.

The string value of "template" is allowed to be a multi-line string. Each line will be split up and formatted independently, using the "line formatting" approach above. But a template supports three additional values:

- line: The "line" value from the format dict. If {line} is the last thing on a template line (either immediately before a '\n', or immediately before the end of the template string), the "line" value will be repeatedly appended to the formatted string until the line is at least Log.width characters long, and then the line will be truncated at Log.width characters. This is how box and the other methods draw those horizontal lines.
- message: The message that was logged. You can specify multiple lines containing {message}; however, all lines containing {message} must be contiguous. The message will be split into lines, and then the template lines containing {message} will be "zipped" together with the lines of the message. For example, the first template line containing {message} will get the first line of the message, the second template line containing {message} will get the second line of the message, etc. If there are more template lines than lines in the message, the extra template lines are skipped; if there are more lines in the message than template lines, the last template line is repeated.
- prefix: A string pre-formatted using the prefix format string passed in to the Log constructor.
Predefined and user-defined formats
Log comes with six predefined formats:

- print: used (by default) by Log.print and Log.__call__
- box: used by Log.box
- enter: used by Log.enter
- exit: used by Log.exit
- start: used to produce the "start banner" whenever the log is started
- end: used to produce the "end banner" whenever the log is closed
You can override the default formatting for these by passing in a new format dict for that format to the formats parameter of the Log constructor. For example, to remove the prefix for lines printed by Log.print and Log.__call__, construct your Log as follows:

```python
j = big.Log(..., formats={"print": {"template": "{message}"}})
```
You can also suppress the start, end, enter, and exit banners by setting any of those formats to None in the formats dict you pass in. This log won't print the start and end banners:

```python
j = big.Log(..., formats={"start": None, "end": None})
```
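The {line} and {message} template rules from the format dict section are easier to follow in code than in prose. Here's a minimal sketch in plain Python of how a multi-line template might be expanded under those rules. It uses the "critical" template from the next example as test input. It isn't big's implementation: expand_template and fill_line are hypothetical names, only prefix, message, and line are substituted (the other line-formatting values are left out), and prefix is passed in already formatted.

```python
def fill_line(text, line, width):
    """Append `line` until `text` is at least `width` long, then truncate."""
    if not line:
        return text[:width]  # guard: an empty line string would never grow
    while len(text) < width:
        text += line
    return text[:width]


def expand_template(template, message, prefix, line="", width=79):
    """Hypothetical sketch of the {line}/{message}/{prefix} template rules."""
    template_lines = template.split("\n")
    message_lines = message.split("\n")
    # Indexes of template lines that mention {message}; they must be contiguous.
    slots = [i for i, t in enumerate(template_lines) if "{message}" in t]

    pairs = []  # (template line, message line to substitute into it)
    if slots:
        first, last = slots[0], slots[-1]
        block = template_lines[first:last + 1]
        pairs += [(t, "") for t in template_lines[:first]]
        # Zip message lines with the {message} template lines: extra template
        # lines are skipped, and the last template line repeats as needed.
        for n, text in enumerate(message_lines):
            pairs.append((block[min(n, len(block) - 1)], text))
        pairs += [(t, "") for t in template_lines[last + 1:]]
    else:
        pairs = [(t, "") for t in template_lines]

    rendered = []
    for t, text in pairs:
        formatted = t.format(prefix=prefix, message=text, line=line)
        if t.endswith("{line}"):
            # {line} is last on this template line: extend it to the width.
            formatted = fill_line(formatted, line, width)
        rendered.append(formatted)
    return "\n".join(rendered)


critical = ("{prefix} =critical=start{line}\n"
            "{prefix} == {message}\n"
            "{prefix} =critical=end{line}")
print(expand_template(critical, "We're almost out of coffee!",
                      prefix="[000.0002390010 MainThread]", line="=-"))
```

Filling first and truncating second means every {line} row ends at exactly width characters, whatever the length of the text in front of it.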
You can also add your own formats! Here's some sample code that defines a new format we'll call "critical":

```python
j = big.Log(formats={
    "critical": {
        "template": "{prefix} =critical=start{line}\n{prefix} == {message}\n{prefix} =critical=end{line}",
        "line": "=-",
    },
})
```

You'll notice the template contains several newline characters, but doesn't end with a newline. Log automatically adds a trailing newline character for you.

How do you use it? Log.print and Log.__call__ both take a format keyword-only parameter, which specifies the format to apply to the message. So you can use it like this:

```python
j("My hair is on fire!", format="critical")
```
Even easier, Log also adds a method to the log instance for every user-defined format, so you can just call j.critical directly:

```python
j.critical("We're almost out of coffee!")
```
This line, using the "critical" format specified above, produces the following output in the log:

```
[000.0002390010 MainThread] =critical=start=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
[000.0002390010 MainThread] == We're almost out of coffee!
[000.0002390010 MainThread] =critical=end=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
```

(And yes, you could use custom formats to simulate "log levels" with Log.)

Custom Destinations
Log supports sending the log to lots of different kinds of objects, called "destinations". The actual implementation wraps each of these diverse object types in a wrapper object that handles logging to that "destination" object. These wrapper objects are all subclasses of a base class called Log.Destination. You can write your own Destination subclasses, and Log will happily send the log to those too!

First, declare your subclass. You'll probably need to write an __init__; that's fine, but you must call the base class __init__:

```python
class MyDestination(Log.Destination):
    def __init__(self, o):
        super().__init__()
        ...
```
The methods on Destination subclasses called by the Log object are referred to as "events". They generally map to method calls on the Log object, and represent the user calling that method on the Log.

Second, all Destination subclasses must implement a write method, like so:

```python
def write(self, elapsed, thread, formatted):
    ...
```
The elapsed parameter is the elapsed time since the log was started / reset, in nanoseconds. The thread parameter is the threading.Thread handle for the thread that wrote this message, or None if the message isn't associated with a particular thread (like the "start banner" and "end banner" messages). Finally, formatted is the formatted string to be written to the log. Destination.write is called directly to implement Log.write.

Next, there are seven optional Destination methods representing higher-level events sent by the Log object:

```python
def log(self, elapsed, thread, format, message, formatted):
    ...

def flush(self):
    ...

def reset(self):
    ...

def start(self, start_time_ns, start_time_epoch):
    ...

def end(self, elapsed):
    ...

def enter(self, elapsed, thread, message):
    ...

def exit(self, elapsed, thread):
    ...
```
Destination.log is called for Log.print, Log.__call__, and the start, end, enter, and exit banners when those banners produce non-empty formatted text. Destination.enter and Destination.exit are separate events that bracket changes in indentation depth. Destination.start and Destination.end report lifecycle changes for the log itself. If you don't override Destination.log, the base class implementation calls self.write(elapsed, thread, formatted).

Two of these optional methods, reset and flush, handle the two Log methods that don't send a log message:

```python
def reset(self):
    ...

def flush(self):
    ...
```
Finally, Destination objects support a register method, which is called when they're passed in to the Log constructor. If you override this method, you must call the base class method, passing in the owner parameter, like so:

```python
def register(self, owner):
    super().register(owner)
    ...
```
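Putting those pieces together, here's a sketch of a destination that collects log output in a list, which could be handy in tests. In real code you'd subclass big.Log.Destination directly. StandInDestination below is a hypothetical substitute so the sketch runs without big installed, and it mimics only the base-class behaviors described above. The event calls at the bottom are made by hand to show which method handles what; normally the Log object makes them.

```python
class StandInDestination:
    """Hypothetical stand-in for big's Log.Destination, so this sketch runs
    without big. It mimics only two documented behaviors: register() must be
    chained to, and log() falls back to write() when not overridden."""

    def __init__(self):
        self.owner = None

    def register(self, owner):
        self.owner = owner

    def write(self, elapsed, thread, formatted):
        raise NotImplementedError

    def log(self, elapsed, thread, format, message, formatted):
        self.write(elapsed, thread, formatted)


class ListDestination(StandInDestination):
    """A custom destination that collects formatted log text in a list."""

    def __init__(self):
        super().__init__()          # always call the base class __init__
        self.entries = []
        self.flush_count = 0

    def register(self, owner):
        super().register(owner)     # always chain to the base class register
        self.entries.clear()

    def write(self, elapsed, thread, formatted):
        # Required event: receives the final formatted text.
        self.entries.append(formatted)

    def flush(self):
        # Optional event: nothing is buffered here, so just count the calls.
        self.flush_count += 1


dest = ListDestination()
dest.register(owner="my log")
dest.log(239001, None, "print", "hello", "hello\n")  # falls back to write()
dest.write(250000, None, "raw text\n")
dest.flush()
print(dest.entries)
```

Because ListDestination doesn't override log, both the log event and the direct write end up in entries.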
Log objects obey a certain lifecycle. If the log ever produces non-empty formatted output, the Destination methods will be called in this order:

```
register
    |
    v
  start
    |
    +<---------------------------------+
    |                                  |
    v                                  |
  write | log | enter | exit | flush   |
    |                                  |
    |                                  |
    +----------------------------------+
    |
    v
 [flush]
    |
    v
   end
```

The final flush, before end, is optional; it's only called if the Log is dirty. If a log is created and then closed without ever producing non-empty formatted output, destinations only receive the initial register event.

Final notes on Log

There's no guarantee that log messages will be logged in strict chronological order. It usually happens that way, but it's not guaranteed. If two threads both send a message at the same time, they might arrive in any order--and it's possible the later of the two arrives first. This is a known behavior, and is unlikely to change in future versions.
Bound inner classes
Overview
One feature missing from Python pertains to "inner classes"--classes defined inside other classes.
Consider this Python code:
```python
class Outer(object):
    def think(self):
        pass

o = Outer()
o.think()
```
We've defined a function think inside class Outer. When you call o.think, Python automatically passes in the o object as the first parameter (by convention called self). In object-oriented parlance, o is bound to think, and indeed Python calls the object o.think a bound method:

```python
>>> o.think
<bound method Outer.think of <__main__.Outer object at 0x########>>
```

And if you refer to Outer.think--if you get your reference to the think function from the class instead of an instance of the class--you just get the normal function:

```python
>>> Outer.think
<function Outer.think at 0x7b5f4f49f110>
```

But there's no similar mechanism for a class defined inside another class. Let's change our example and add a class inside Outer:

```python
class Outer(object):
    def think(self):
        pass

    class Inner(object):
        def __init__(self):
            pass

o = Outer()
o.think()
i = o.Inner()
```
But classes defined inside classes don't behave the same as functions defined inside classes. No matter how you reference Inner, you get the same class object, whether you access it through the outer class (Outer.Inner) or through an instance of the outer class (o.Inner):

```python
>>> Outer.Inner
<class '__main__.Outer.Inner'>
>>> o.Inner
<class '__main__.Outer.Inner'>
```

And if you call o.Inner(), Python won't automatically pass in o as an argument the way it does for o.think(). If you want it passed in, you have to pass it in yourself, like so:

```python
class Outer(object):
    def think(self):
        pass

    class Inner(object):
        def __init__(self, outer):
            self.outer = outer

o = Outer()
o.think()
i = o.Inner(o)
```
This seems redundant. You don't have to pass in o explicitly to method calls; why should you have to pass it in explicitly to inner classes?

Well--now you don't have to! You can just decorate the inner class with @big.BoundInnerClass, and the BoundInnerClass decorator takes care of the rest.

Using bound inner classes
Let's modify the above example to use our BoundInnerClass decorator:

```python
from big import BoundInnerClass

class Outer(object):
    def think(self):
        pass

    @BoundInnerClass
    class Inner(object):
        def __init__(self, outer):
            self.outer = outer

o = Outer()
o.think()
i = o.Inner()
```
Notice that Inner.__init__ takes an outer parameter. But you didn't have to pass it in yourself! When you call o.Inner(), o is automatically passed in as the outer argument to Inner.__init__. That's what the @BoundInnerClass decorator does for you.

Decorating an inner class like this always inserts a second positional parameter, after self. And, like self, you don't have to use the name outer; you can use any name you like. (But we'll always use the name outer in this documentation.)

Inheritance
Bound inner classes get slightly complicated when mixed with inheritance. It's not all that difficult, you merely need to obey some rules:
- Rule 1: A bound inner class can inherit normally from any unbound class.
- Rule 2: If your bound inner class calls super().__init__, and its parent class is also a bound inner class, don't pass in outer manually. When you instantiate a bound inner class, outer will be automatically passed in to the __init__ method of every bound inner parent class.
- Rule 3: A bound inner class can only inherit from a parent bound inner class if the parent is defined in the same outer class or a base of the outer class. If Child inherits from Parent, and Child and Parent are both decorated with @BoundInnerClass (or @UnboundInnerClass), Parent must be defined either in the same outer class as Child (e.g. Outer) or in a base class of Child's outer class. This is a type relation constraint; bound inner classes guarantee that "outer" is an instance of the outer class.
- Rule 3a: A corollary of rule 3: a subclass of a bound inner class, whether bound or unbound, can only be defined in the same outer class or in a subclass of the outer class.
- Rule 4: Directly inheriting from a bound inner class is unsupported. If o is an instance of Outer, and Outer.Inner is an inner class decorated with @BoundInnerClass, don't write a class that directly inherits from o.Inner, for example class Mistake(o.Inner). You should always inherit from the unbound version, like this: class GotItRight(Outer.Inner).
- Rule 5: An inner class that inherits from a bound inner class, and which also wants to be bound to the outer object, should be decorated with BoundInnerClass.
- Rule 6: An inner class that inherits from a bound inner class, but doesn't want to be bound to the outer object, should be decorated with UnboundInnerClass.
Restating the last two rules: every class that descends from any class decorated with BoundInnerClass must itself be decorated with either BoundInnerClass or UnboundInnerClass. Which one you use depends on what behavior you want--whether or not you want your inner subclass to automatically get the outer instance passed in to its __init__.

Here's a simple example using inheritance with bound inner classes:

```python
from big import BoundInnerClass, UnboundInnerClass

class Outer(object):
    @BoundInnerClass
    class Parent(object):
        def __init__(self, outer):
            self.outer = outer

    @UnboundInnerClass
    class Child(Parent):
        def __init__(self):
            super().__init__()

o = Outer()
child = o.Child()
```
We followed the rules:

- Outer.Parent inherits from object; since object isn't a bound inner class, there are no special rules about inheritance Outer.Parent needs to obey.
- Since Outer.Child inherits from a bound inner class, it must be decorated with either BoundInnerClass or UnboundInnerClass. It doesn't want the outer object passed in, so it's decorated with UnboundInnerClass.
- Child.__init__ calls super().__init__, but doesn't pass in outer.
- Both Parent and Child are defined in the same class.
Note that, because Child is decorated with UnboundInnerClass, it doesn't take an outer parameter. Nor does it pass in an outer argument when it calls super().__init__. But when the constructor for Parent is called, the correct outer parameter is passed in--like magic!

If you wanted Child to also get the outer argument passed in to its __init__, just decorate it with BoundInnerClass instead of UnboundInnerClass, like so:

```python
from big import BoundInnerClass

class Outer(object):
    @BoundInnerClass
    class Parent(object):
        def __init__(self, outer):
            self.outer = outer

    @BoundInnerClass
    class Child(Parent):
        def __init__(self, outer):
            super().__init__()
            assert self.outer == outer

o = Outer()
child = o.Child()
```
Again, Child.__init__ doesn't need to explicitly pass in outer when calling super().__init__, but the correct value for outer does get passed in to Parent.__init__.

You can see more complex examples of using inheritance with BoundInnerClass (and UnboundInnerClass) in the big test suite.

Miscellaneous notes
- A bound inner class is a subclass of the original (unbound) class: o.Inner is a subclass of Outer.Inner.
- Bound inner classes bound to different outer instances are different classes. This mirrors how methods behave; if you have two objects a and b that are instances of the same class, a.Inner != b.Inner, just as a.method != b.method.
- If you refer to an inner class directly from the outer class (like Outer.Inner) rather than from an instance (like o.Inner), you get the original (unbound) class.
  - You might be able to call Outer.Inner directly, to construct an Inner object without using a bound version of the class. You'll have to pass in the outer parameter by hand--just like you'd have to pass in the self parameter by hand when calling a method via the class (Outer.method) rather than via an instance of the class (o.method). This won't work if Outer.Inner is a subclass of another bound inner class and calls its super().__init__. The injection of the outer instance argument happens when the class is bound, and this handles injecting the argument for all the base classes too. Without this binding mechanism getting involved, the outer argument won't get supplied when calling the base class's __init__.
- Bound inner classes are cached in the outer object, which both provides a small speedup and ensures that isinstance relationships are consistent. This is an explicit feature, and you're permitted to rely on it.
  - If you use slots on your outer class, you must add a slot for BoundInnerClass to store its cache. Just add BOUNDINNERCLASS_OUTER_SLOTS to your slots tuple, like so:

    ```python
    __slots__ = ('x', 'y', 'z') + BOUNDINNERCLASS_OUTER_SLOTS
    ```
- Binding only goes one level deep. If you had a bound inner class C defined inside another bound inner class B, which in turn was defined inside a class A, the constructor for C would be called with the B object, but not the A object.
- If you support Python 3.6, and you define bound inner child classes, you'll need to wrap all the bound inner base classes of those child classes with big.boundinnerclass.bound_inner_base. For example:

  ```python
  class Child(bound_inner_base(Parent)):
      ...
  ```

  This is unnecessary in Python 3.7+. However, using bound_inner_base works fine in all versions of Python supported by big.
- The rewrite of bound inner classes that shipped with big version 0.13 removed some old provisos:
  - You may now rename your inner classes, even after decorating them with @BoundInnerClass (or @UnboundInnerClass). In previous versions this would break some internal mechanisms, but renaming your classes is now explicitly supported behavior.
  - The race condition around creating and caching the bound version of an inner class from multiple threads has been fixed by adding a little internal locking. The implementation doesn't need to lock very often, so the performance cost is negligible.
  - It's no longer required to call super().__init__ in bound inner subclasses! In fact, it was probably never necessary. It's totally up to you whether or not you call super().__init__.
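If you're curious how binding an inner class to an instance can work at all, here's a deliberately simplified sketch built on Python's descriptor protocol (__get__) and __set_name__ (Python 3.6+). It is not big's implementation. The bound_inner name is hypothetical, it ignores inheritance between bound inner classes (rules 2 through 6), and it has no locking or __slots__ support. It does show three behaviors described above: access through the class returns the original class, each outer instance gets its own cached subclass, and outer is passed in automatically.

```python
class bound_inner:
    """Hypothetical, simplified stand-in for big.BoundInnerClass."""

    def __init__(self, cls):
        self.cls = cls

    def __set_name__(self, owner, name):
        self.name = name  # the attribute name in the outer class, e.g. "Inner"

    def __get__(self, outer, owner=None):
        if outer is None:
            return self.cls  # Outer.Inner: the original, unbound class
        # Cache bound classes in the instance dict (so this needs a __dict__).
        cache = outer.__dict__.setdefault("_bound_inner_cache", {})
        bound = cache.get(self.name)
        if bound is None:
            cls = self.cls

            class Bound(cls):
                def __init__(self, *args, **kwargs):
                    # Inject the outer instance as the second positional argument.
                    cls.__init__(self, outer, *args, **kwargs)

            Bound.__name__ = cls.__name__
            Bound.__qualname__ = cls.__qualname__
            cache[self.name] = bound = Bound
        return bound


class Outer:
    @bound_inner
    class Inner:
        def __init__(self, outer, x=0):
            self.outer = outer
            self.x = x


o = Outer()
i = o.Inner(5)
print(i.outer is o, i.x)                  # True 5
print(issubclass(o.Inner, Outer.Inner))   # True
print(o.Inner is o.Inner)                 # True: cached per outer instance
print(o.Inner is Outer().Inner)           # False: different outer, different class
```

Subclassing is what makes isinstance(i, Outer.Inner) hold. The per-instance cache is what keeps o.Inner returning the same class object every time, which is why the real decorator needs a slot on outer classes that use __slots__.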
The multi- family of string functions
This family of string functions was inspired by Python's str.split, str.rsplit, and str.splitlines methods. These string splitting methods are well-designed and often do what you want. But they're surprisingly narrow and opinionated. What if your use case doesn't map neatly to one of these functions? str.split supports two very specific modes of operation--unless you want to split your string in exactly one of those two modes, you probably can't use str.split to solve your problem.

So what can you use? There's re.split, but that can be hard to use.¹ Regular expressions can be difficult to get right, and the semantics of re.split are subtly different from the usual string splitting functions. Not to mention, it doesn't support reverse!

Now there's a new answer: multisplit. The goal of multisplit is to be the be-all end-all string splitting function. It's designed to supersede every mode of operation provided by str.split, str.rsplit, and str.splitlines, and it can even replace str.partition and str.rpartition too. multisplit does it all!

The downside of multisplit's awesome flexibility is that it can be hard to use... after all, it takes five keyword-only parameters. However, these parameters and their defaults are designed to be easy to remember.

The best way to cope with multisplit's complexity is to use it as a building block for your own text splitting functions. For example, big uses multisplit to implement multipartition, normalize_whitespace, lines, and several other functions.

The values returned or yielded by these functions are slices of the original object, or in some cases adjacent slices joined with +. All slices are returned in left-to-right order; this even includes zero-length strings, which are sliced from the contextually correct spot.

Using multisplit

To use multisplit, pass in the string you want to split, the separators you want to split on, and tweak its behavior with its five keyword-only arguments. It returns an iterator that yields string segments from the original string in your preferred format. The separator list is optional; if you don't pass one in, it defaults to an iterable of whitespace separators (either big.whitespace or big.ascii_whitespace, as appropriate).

The cornerstone of multisplit is the separators argument. This is an iterable of strings, of the same type (str or bytes) as the string you want to split (s). multisplit will split the string at each non-overlapping instance of any string specified in separators.

multisplit lets you fine-tune its behavior via five keyword-only parameters:

- keep lets you include the separator strings in the output, in a number of different formats.
- separate lets you specify whether adjacent separator strings should be grouped together (like str.split operating on whitespace) or regarded as separate (like str.split when you pass in an explicit separator).
- strip lets you strip separator strings from the beginning, end, or both ends of the string you're splitting. It also supports a special progressive mode that duplicates the behavior of str.split when you use None as the separator.
- maxsplit lets you specify the maximum number of times to split the string, exactly like the maxsplit argument to str.split.
- reverse makes multisplit behave like str.rsplit, starting at the end of the string and working backwards. (This only changes the behavior of multisplit if you use maxsplit, or if your string contains overlapping separators.)
To make it slightly easier to remember, all these keyword-only parameters default to a false value. (Well, technically, maxsplit defaults to the special value -1, for compatibility with str.split. But that's its special "don't do anything" magic value. All the other keyword-only parameters default to False.)

multisplit also inspired multistrip and multipartition, which also take this same separators argument. There are also other big functions that take a separators-style argument, for example the comment_markers argument to lines_filter_line_comment_lines.

Demonstrations of each multisplit keyword-only parameter

To give you a sense of how the five keyword-only parameters change the behavior of multisplit, here's a breakdown of each of these parameters with examples.

maxsplit
maxsplit specifies the maximum number of times the string should be split. It behaves the same as the maxsplit parameter to str.split.

The default value of -1 means "split as many times as you can". In our example here, the string can be split a maximum of two times. Therefore, specifying a maxsplit of -1 is equivalent to specifying a maxsplit of 2 or greater:

```python
>>> list(big.multisplit('apple^banana_cookie', ('_', '^'))) # "maxsplit" defaults to -1
['apple', 'banana', 'cookie']
>>> list(big.multisplit('apple^banana_cookie', ('_', '^'), maxsplit=0))
['apple^banana_cookie']
>>> list(big.multisplit('apple^banana_cookie', ('_', '^'), maxsplit=1))
['apple', 'banana_cookie']
>>> list(big.multisplit('apple^banana_cookie', ('_', '^'), maxsplit=2))
['apple', 'banana', 'cookie']
>>> list(big.multisplit('apple^banana_cookie', ('_', '^'), maxsplit=3))
['apple', 'banana', 'cookie']
```

maxsplit has interactions with reverse and strip. For more information, see the documentation regarding those parameters below.
keep

keep indicates whether or not multisplit should preserve the separator strings in the strings it yields. It supports four values: false, true, and the special values ALTERNATING and AS_PAIRS.

When keep is false, multisplit throws away the separator strings; they won't appear in the output.

```python
>>> list(big.multisplit('apple#banana-cookie', ('#', '-'))) # "keep" defaults to False
['apple', 'banana', 'cookie']
>>> list(big.multisplit('apple-banana#cookie', ('#', '-'), keep=False))
['apple', 'banana', 'cookie']
```
When keep is true, multisplit keeps the separators, appending them to the end of the separated string:

```python
>>> list(big.multisplit('apple$banana~cookie', ('$', '~'), keep=True))
['apple$', 'banana~', 'cookie']
```
When keep is ALTERNATING, multisplit keeps the separators as separate strings. The first string yielded is always a non-separator string, and from then on it always alternates between a separator string and a non-separator string. Put another way, if you store the output of multisplit in a list, entries with an even-numbered index (0, 2, 4, ...) are always non-separator strings, and entries with an odd-numbered index (1, 3, 5, ...) are always separator strings.

```python
>>> list(big.multisplit('appleXbananaYcookie', ('X', 'Y'), keep=big.ALTERNATING))
['apple', 'X', 'banana', 'Y', 'cookie']
```
Note that ALTERNATING always emits an odd number of strings, and the first and last strings yielded are always non-separator strings. Like str.split, if the string you're splitting starts or ends with a separator string, multisplit will emit an empty string at the beginning or end, to preserve the "always begin and end with a non-separator string" invariant:

```python
>>> list(big.multisplit('1a1z1', ('1',), keep=big.ALTERNATING))
['', '1', 'a', '1', 'z', '1', '']
```
Finally, when keep is AS_PAIRS, multisplit keeps the separators as separate strings. But it doesn't yield bare strings; instead, it yields 2-tuples of strings. Every 2-tuple contains a non-separator string followed by a separator string.

If the original string starts with a separator, the first 2-tuple will contain an empty non-separator string and the separator:

```python
>>> list(big.multisplit('^apple-banana^cookie', ('-', '^'), keep=big.AS_PAIRS))
[('', '^'), ('apple', '-'), ('banana', '^'), ('cookie', '')]
```
The last 2-tuple will always contain an empty separator string:
```python
>>> list(big.multisplit('apple*banana+cookie', ('*', '+'), keep=big.AS_PAIRS))
[('apple', '*'), ('banana', '+'), ('cookie', '')]
>>> list(big.multisplit('apple*banana+cookie***', ('*', '+'), keep=big.AS_PAIRS, strip=True))
[('apple', '*'), ('banana', '+'), ('cookie', '')]
```
(This rule means that AS_PAIRS always emits an even number of strings. Contrast that with ALTERNATING, which always emits an odd number of strings, and the last string it emits is always a non-separator string. Put another way: if you ignore the tuples, the list of strings emitted by AS_PAIRS is the same as those emitted by ALTERNATING, except AS_PAIRS appends an empty string.)

Because of this rule, if the original string ends with a separator, and multisplit doesn't strip the right side, the final tuple emitted by AS_PAIRS will be a 2-tuple containing two empty strings:

```python
>>> list(big.multisplit('appleXbananaYcookieX', ('X', 'Y'), keep=big.AS_PAIRS))
[('apple', 'X'), ('banana', 'Y'), ('cookie', 'X'), ('', '')]
```
This looks strange and unnecessary. But it is what you want. This odd-looking behavior is discussed at length in the section below, titled Why do you sometimes get empty strings when you split?
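You can check the relationship between the two modes with a few lines of plain Python. This hypothetical helper, which isn't part of big, converts ALTERNATING-style output into AS_PAIRS-style output by appending one empty string and pairing up neighbors. Applied to the ALTERNATING output for 'appleXbananaYcookieX', which by the rules above ends with an empty string, it produces the same trailing ('', '') tuple shown in the AS_PAIRS example.

```python
def alternating_to_pairs(alternating):
    """Convert ALTERNATING-style output into AS_PAIRS-style output.

    ALTERNATING output always has an odd length (it starts and ends with a
    non-separator string). Appending one empty string makes the length even,
    and pairing neighbors yields (non-separator, separator) 2-tuples.
    """
    items = list(alternating)
    items.append(items[0][:0])  # an empty str or bytes, matching the input type
    return list(zip(items[0::2], items[1::2]))


# ALTERNATING output for 'appleXbananaYcookieX' split on ('X', 'Y'):
print(alternating_to_pairs(['apple', 'X', 'banana', 'Y', 'cookie', 'X', '']))
```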
The behavior of keep can be affected by the value of separate. For more information, see the next section, on separate.
separate

separate indicates whether multisplit should consider adjacent separator strings in s as one separator or as multiple separators each separated by a zero-length string. It can be either false or true.

```python
>>> list(big.multisplit('apple=?banana?=?cookie', ('=', '?'))) # separate defaults to False
['apple', 'banana', 'cookie']
>>> list(big.multisplit('apple=?banana?=?cookie', ('=', '?'), separate=False))
['apple', 'banana', 'cookie']
>>> list(big.multisplit('apple=?banana?=?cookie', ('=', '?'), separate=True))
['apple', '', 'banana', '', '', 'cookie']
```
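For comparison, here's roughly how you might get these two behaviors out of re.split. It's a sketch, not how big works. The simple_multisplit name is hypothetical, it handles only separators and separate, and trying longer separators first is an assumption made for this sketch. Even this limited version needs re.escape and a non-capturing group to get right.

```python
import re


def simple_multisplit(s, separators, separate=False):
    """Rough re.split-based approximation of multisplit(s, separators).

    Models only the separators and separate parameters (not keep, strip,
    maxsplit, or reverse). Assumes s is a str and every separator is a
    non-empty str.
    """
    # Try longer separators first, so e.g. '\r\n' is preferred over '\r'.
    ordered = sorted(separators, key=len, reverse=True)
    alternatives = "|".join(re.escape(sep) for sep in ordered)
    if separate:
        pattern = alternatives                 # each separator splits on its own
    else:
        pattern = "(?:" + alternatives + ")+"  # a run of separators splits once
    return re.split(pattern, s)


print(simple_multisplit('apple=?banana?=?cookie', ('=', '?')))
print(simple_multisplit('apple=?banana?=?cookie', ('=', '?'), separate=True))
```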
If separate and keep are both true values, and your string has multiple adjacent separators, multisplit will view s as having zero-length non-separator strings between the adjacent separators:

```python
>>> list(big.multisplit('appleXYbananaYXYcookie', ('X', 'Y'), separate=True, keep=True))
['appleX', 'Y', 'bananaY', 'X', 'Y', 'cookie']
>>> list(big.multisplit('appleXYbananaYXYcookie', ('X', 'Y'), separate=True, keep=big.AS_PAIRS))
[('apple', 'X'), ('', 'Y'), ('banana', 'Y'), ('', 'X'), ('', 'Y'), ('cookie', '')]
```
strip

strip indicates whether multisplit should strip separators from the beginning and/or end of s. It supports five values: false, true, big.LEFT, big.RIGHT, and big.PROGRESSIVE.

By default, strip is false, which means it doesn't strip any leading or trailing separators:

```python
>>> list(big.multisplit('%|apple%banana|cookie|%|', ('%', '|'))) # strip defaults to False
['', 'apple', 'banana', 'cookie', '']
```
Setting strip to true strips both leading and trailing separators:

```python
>>> list(big.multisplit('%|apple%banana|cookie|%|', ('%', '|'), strip=True))
['apple', 'banana', 'cookie']
```
big.LEFT and big.RIGHT tell multisplit to only strip on that side of the string:

```python
>>> list(big.multisplit('.?apple.banana?cookie.?.', ('.', '?'), strip=big.LEFT))
['apple', 'banana', 'cookie', '']
>>> list(big.multisplit('.?apple.banana?cookie.?.', ('.', '?'), strip=big.RIGHT))
['', 'apple', 'banana', 'cookie']
```
big.PROGRESSIVE duplicates a specific behavior of str.split when using maxsplit. It always strips on the left, but it only strips on the right if the string is completely split. If maxsplit is reached before the entire string is split, and strip is big.PROGRESSIVE, multisplit won't strip the right side of the string. Note in this example how the trailing separator _ isn't stripped from the input string when maxsplit is less than 3.

```python
>>> list(big.multisplit('^apple^banana_cookie_', ('^', '_'), strip=big.PROGRESSIVE))
['apple', 'banana', 'cookie']
>>> list(big.multisplit('^apple^banana_cookie_', ('^', '_'), maxsplit=0, strip=big.PROGRESSIVE))
['apple^banana_cookie_']
>>> list(big.multisplit('^apple^banana_cookie_', ('^', '_'), maxsplit=1, strip=big.PROGRESSIVE))
['apple', 'banana_cookie_']
>>> list(big.multisplit('^apple^banana_cookie_', ('^', '_'), maxsplit=2, strip=big.PROGRESSIVE))
['apple', 'banana', 'cookie_']
>>> list(big.multisplit('^apple^banana_cookie_', ('^', '_'), maxsplit=3, strip=big.PROGRESSIVE))
['apple', 'banana', 'cookie']
>>> list(big.multisplit('^apple^banana_cookie_', ('^', '_'), maxsplit=4, strip=big.PROGRESSIVE))
['apple', 'banana', 'cookie']
```
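For comparison, here's the str.split behavior that big.PROGRESSIVE reproduces, with single spaces in place of the ^ and _ separators. The results line up with the multisplit examples above. The leading separator is always stripped, but the trailing one survives until maxsplit is large enough to split the whole string.

```python
s = " apple banana cookie "

# str.split with sep=None always strips the left side, but only strips the
# right side once the whole string has been split.
for maxsplit in range(5):
    print(maxsplit, s.split(None, maxsplit))
```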
reverse

reverse specifies where multisplit starts parsing the string--from the beginning, or the end--and in what direction it moves when parsing the string--towards the end, or towards the beginning. It only supports two values. When it's false, multisplit starts at the beginning of the string and parses moving to the right (towards the end of the string). When reverse is true, multisplit starts at the end of the string and parses moving to the left (towards the beginning of the string).

This has two noticeable effects on multisplit's output. First, this changes which splits are kept when maxsplit is less than the total number of splits in the string. When reverse is true, the splits are counted starting on the right and moving towards the left:

```python
>>> list(big.multisplit('apple-banana|cookie', ('-', '|'), reverse=True)) # maxsplit defaults to -1
['apple', 'banana', 'cookie']
>>> list(big.multisplit('apple-banana|cookie', ('-', '|'), maxsplit=0, reverse=True))
['apple-banana|cookie']
>>> list(big.multisplit('apple-banana|cookie', ('-', '|'), maxsplit=1, reverse=True))
['apple-banana', 'cookie']
>>> list(big.multisplit('apple-banana|cookie', ('-', '|'), maxsplit=2, reverse=True))
['apple', 'banana', 'cookie']
>>> list(big.multisplit('apple-banana|cookie', ('-', '|'), maxsplit=3, reverse=True))
['apple', 'banana', 'cookie']
```
The second effect is far more subtle. It's only relevant when splitting strings containing multiple overlapping separators. When reverse is false, and there are two (or more) overlapping separators, the string is split by the leftmost overlapping separator. When reverse is true, and there are two (or more) overlapping separators, the string is split by the rightmost overlapping separator.
Consider these two calls to multisplit. The only difference between them is the value of reverse. They produce different results, even though neither one uses maxsplit.
```python
>>> list(big.multisplit('appleXYZbananaXYZcookie', ('XY', 'YZ'))) # reverse defaults to False
['apple', 'Zbanana', 'Zcookie']
>>> list(big.multisplit('appleXYZbananaXYZcookie', ('XY', 'YZ'), reverse=True))
['appleX', 'bananaX', 'cookie']
```
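Here is a stdlib-only illustration of the two scanning directions. split_leftmost and split_rightmost are hypothetical helpers, not part of big, and this is not necessarily how multisplit implements reverse. The sketch emulates the right-to-left scan by reversing both the string and every separator:

```python
import re

def split_leftmost(s, separators):
    # re.split scans left to right, so when two separators overlap,
    # the one that starts first (the leftmost) wins.
    pattern = '|'.join(re.escape(sep) for sep in separators)
    return re.split(pattern, s)

def split_rightmost(s, separators):
    # Emulate a right-to-left scan: reverse the string and every separator,
    # split left to right, then reverse each piece and the order of pieces.
    pattern = '|'.join(re.escape(sep[::-1]) for sep in separators)
    pieces = re.split(pattern, s[::-1])
    return [piece[::-1] for piece in reversed(pieces)]

s = 'appleXYZbananaXYZcookie'
print(split_leftmost(s, ('XY', 'YZ')))   # ['apple', 'Zbanana', 'Zcookie']
print(split_rightmost(s, ('XY', 'YZ')))  # ['appleX', 'bananaX', 'cookie']
```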
Reimplementing library functions using multisplit
Here are some examples of how you could use multisplit to replace some common Python string splitting methods. These duplicate the behavior of the originals.
```python
import big

def _multisplit_to_split(s, sep, maxsplit, reverse):
    separate = sep is not None
    if separate:
        # str.split treats sep as one (possibly multi-character) separator,
        # so pass it to multisplit as a tuple containing that one string
        sep = (sep,)
        strip = False
    else:
        # bytes need the bytes_ tuples; the ascii_ tuples contain str objects
        sep = big.bytes_whitespace if isinstance(s, bytes) else big.whitespace
        strip = big.PROGRESSIVE
    result = list(big.multisplit(s, sep,
        maxsplit=maxsplit, reverse=reverse,
        separate=separate, strip=strip))
    if not separate:
        # ''.split() == '   '.split() == []
        if result and (not result[-1]):
            result.pop()
    return result

def str_split(s, sep=None, maxsplit=-1):
    return _multisplit_to_split(s, sep, maxsplit, False)

def str_rsplit(s, sep=None, maxsplit=-1):
    return _multisplit_to_split(s, sep, maxsplit, True)

def str_splitlines(s, keepends=False):
    linebreaks = big.bytes_linebreaks if isinstance(s, bytes) else big.linebreaks
    l = list(big.multisplit(s, linebreaks,
        keep=keepends, separate=True, strip=False))
    if l and not l[-1]:
        # yes, ''.splitlines() returns an empty list
        l.pop()
    return l

def _multisplit_to_partition(s, sep, reverse):
    if not sep:
        raise ValueError("empty separator")
    l = tuple(big.multisplit(s, (sep,),
        keep=big.ALTERNATING, maxsplit=1, reverse=reverse, separate=True))
    if len(l) == 1:
        # sep wasn't found: pad to three items, the way str.partition does
        empty = b'' if isinstance(s, bytes) else ''
        if reverse:
            l = (empty, empty) + l
        else:
            l = l + (empty, empty)
    return l

def str_partition(s, sep):
    return _multisplit_to_partition(s, sep, False)

def str_rpartition(s, sep):
    return _multisplit_to_partition(s, sep, True)
```
You wouldn't want to use these, of course--Python's built-in functions are so much faster!
Why do you sometimes get empty strings when you split?
Sometimes when you split using multisplit, you'll get empty strings in the return value. This might be unexpected, violating the Principle Of Least Astonishment. But there are excellent reasons for this behavior.
Let's start by observing what str.split does. str.split really has two major modes of operation: when you don't pass in a separator (or pass in None for the separator), and when you pass in an explicit separator string. In this latter mode, the documentation says it regards every instance of a separator string as an individual separator splitting the string. What does that mean? Watch what happens when you have two adjacent separators in the string you're splitting:
```python
>>> '1,2,,3'.split(',')
['1', '2', '', '3']
```
What's that empty string doing between '2' and '3'? Here's how you should think about it. When you pass in an explicit separator, str.split splits at every occurrence of that separator in the string. Every separator splits the string in two at that point. And when there are two adjacent separators, conceptually there's a zero-length string in between them:
```python
>>> '1,2,,3'[4:4]
''
```
The empty string in the output of str.split represents the fact that there were two adjacent separators. If str.split didn't add that empty string, the output would look like this:
```python
['1', '2', '3']
```
But then it'd be indistinguishable from splitting the same string without two separators in a row:
```python
>>> '1,2,3'.split(',')
['1', '2', '3']
```
This difference is crucial when you want to reconstruct the original string from the split list. str.split with a separator should always be reversible using str.join, and with that empty string there it works correctly:
```python
>>> ','.join(['1', '2', '3'])
'1,2,3'
>>> ','.join(['1', '2', '', '3'])
'1,2,,3'
```
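A quick stdlib check of that claim: for each sample string s, ','.join(s.split(',')) returns s. This includes the edge cases with separators at the ends or several in a row. The same isn't true when you split on whitespace with no separator, because that mode throws away how much whitespace there was:

```python
samples = ['1,2,3', '1,2,,3', ',1,2,3,', ',', '']

for s in samples:
    parts = s.split(',')
    # Joining with the same separator undoes the split, empty strings included.
    assert ','.join(parts) == s
    print(repr(s), parts)

# Splitting on runs of whitespace (no separator) is NOT reversible:
# the information about how much whitespace there was is gone.
print(repr(' '.join('a  b'.split())))  # 'a b'
```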
Now take a look at what happens when the string you're splitting starts or ends with a separator:
```python
>>> ',1,2,3,'.split(',')
['', '1', '2', '3', '']
```
This might seem weird. But, just like with two adjacent separators, this behavior is important for consistency. Conceptually there's a zero-length string between the beginning of the string and the first comma, and another between the last comma and the end of the string. And str.join needs those empty strings in order to correctly recreate the original string.
```python
>>> ','.join(['', '1', '2', '3', ''])
',1,2,3,'
```
Naturally, multisplit lets you duplicate this behavior. When you want multisplit to behave just like str.split does with an explicit separator string, pass in keep=False, separate=True, and strip=False. That is, if a and b are strings,
```python
list(big.multisplit(a, (b,), keep=False, separate=True, strip=False))
```
always produces the same output as
```python
a.split(b)
```
For example, here's multisplit splitting the strings we've been playing with, using these parameters:
```python
>>> list(big.multisplit('1,2,,3', (',',), keep=False, separate=True, strip=False))
['1', '2', '', '3']
>>> list(big.multisplit(',1,2,3,', (',',), keep=False, separate=True, strip=False))
['', '1', '2', '3', '']
```
This "emit an empty string" behavior also has ramifications when keep isn't false. The behavior of keep=True is easy to predict; multisplit just appends the separators to the previous string segment:
```python
>>> list(big.multisplit('1,2,,3', (',',), keep=True, separate=True, strip=False))
['1,', '2,', ',', '3']
>>> list(big.multisplit(',1,2,3,', (',',), keep=True, separate=True, strip=False))
[',', '1,', '2,', '3,', '']
```
The principle here is that, when you use keep=True, you should be able to reconstitute the original string with ''.join:
```python
>>> ''.join(['1,', '2,', ',', '3'])
'1,2,,3'
>>> ''.join([',', '1,', '2,', '3,', ''])
',1,2,3,'
```
keep=big.ALTERNATING is much the same, except we insert the separators as their own segments, rather than appending each one to the previous segment:
```python
>>> list(big.multisplit('1,2,,3', (',',), keep=big.ALTERNATING, separate=True, strip=False))
['1', ',', '2', ',', '', ',', '3']
>>> list(big.multisplit(',1,2,3,', (',',), keep=big.ALTERNATING, separate=True, strip=False))
['', ',', '1', ',', '2', ',', '3', ',', '']
```
Remember, ALTERNATING output always begins and ends with a non-separator string. If the string you're splitting begins or ends with a separator, the output from multisplit specifying keep=ALTERNATING will correspondingly begin or end with an empty string.
And, as with keep=True, you can also recreate the original string by passing these lists to ''.join:
```python
>>> ''.join(['1', ',', '2', ',', '', ',', '3'])
'1,2,,3'
>>> ''.join(['', ',', '1', ',', '2', ',', '3', ',', ''])
',1,2,3,'
```
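For a single literal separator, the standard library's re.split can produce both of these layouts. A capturing group in the pattern makes re.split keep each separator as its own item, which gives exactly the ALTERNATING shape. Gluing each separator onto the text before it then gives the keep=True shape. This is a sketch with hypothetical helper names, not big's own code:

```python
import re

def split_alternating(s, sep):
    # A capturing group makes re.split keep each separator as its own item,
    # so the result alternates: text, sep, text, sep, ..., text.
    return re.split('(' + re.escape(sep) + ')', s)

def split_keep(s, sep):
    # Glue each separator onto the text before it (the keep=True layout).
    parts = split_alternating(s, sep)
    return [parts[i] + (parts[i + 1] if i + 1 < len(parts) else '')
            for i in range(0, len(parts), 2)]

for s in ('1,2,,3', ',1,2,3,'):
    alternating = split_alternating(s, ',')
    kept = split_keep(s, ',')
    # Either layout rebuilds the original string with ''.join
    assert ''.join(alternating) == s and ''.join(kept) == s
    print(alternating, kept)
```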
Finally there's keep=big.AS_PAIRS. The behavior here seemed so strange that initially I thought it was wrong. But I've given it a lot of thought, and I've convinced myself that this is correct:
```python
>>> list(big.multisplit('1,2,,3', (',',), keep=big.AS_PAIRS, separate=True, strip=False))
[('1', ','), ('2', ','), ('', ','), ('3', '')]
>>> list(big.multisplit(',1,2,3,', (',',), keep=big.AS_PAIRS, separate=True, strip=False))
[('', ','), ('1', ','), ('2', ','), ('3', ','), ('', '')]
```
That tuple at the end, just containing two empty strings:
('', '')
It's so strange. How can that be right?
In short, it's similar to the str.split situation. When called with keep=AS_PAIRS, multisplit guarantees that the final tuple will contain an empty separator string. If the string you're splitting ends with a separator, it must emit the empty non-separator string, followed by the empty separator string.
Think of it this way: with the tuple of empty strings there, you can easily convert one keep format into any other. This assumes you know what the separators were: either the source keep format was not false, or you only used one separator string when calling multisplit. Without that tuple of empty strings at the end, you'd also need an if statement to add or remove empty stuff from the end.
I'll demonstrate this with a simple example. Here's the output of multisplit splitting the string '1a1z1' by the separator '1', in each of the four keep formats:
```python
>>> list(big.multisplit('1a1z1', '1', keep=False))
['', 'a', 'z', '']
>>> list(big.multisplit('1a1z1', '1', keep=True))
['1', 'a1', 'z1', '']
>>> list(big.multisplit('1a1z1', '1', keep=big.ALTERNATING))
['', '1', 'a', '1', 'z', '1', '']
>>> list(big.multisplit('1a1z1', '1', keep=big.AS_PAIRS))
[('', '1'), ('a', '1'), ('z', '1'), ('', '')]
```
Because the AS_PAIRS output ends with that tuple of empty strings, we can mechanically convert it into any of the other formats, like so:
```python
>>> result = list(big.multisplit('1a1z1', '1', keep=big.AS_PAIRS))
>>> result
[('', '1'), ('a', '1'), ('z', '1'), ('', '')]
>>> [s[0] for s in result] # convert to keep=False
['', 'a', 'z', '']
>>> [s[0]+s[1] for s in result] # convert to keep=True
['1', 'a1', 'z1', '']
>>> [s for t in result for s in t][:-1] # convert to keep=big.ALTERNATING
['', '1', 'a', '1', 'z', '1', '']
```
If the AS_PAIRS output didn't end with that tuple of empty strings, you'd need to add an if statement to restore the trailing empty strings as needed.
Other differences between multisplit and str.split
str.split returns an empty list when you split an empty string by whitespace:
```python
>>> ''.split()
[]
```
But not when you split by an explicit separator:
```python
>>> ''.split('x')
['']
```
multisplit is consistent here. If you split an empty string, it always yields a single empty string, as long as the separators are valid:
```python
>>> list(big.multisplit(''))
['']
>>> list(big.multisplit('', ('a', 'b', 'c')))
['']
```
Similarly, when splitting a string that only contains whitespace, str.split also returns an empty list:
```python
>>> ' '.split()
[]
```
This is really the same as "splitting an empty string", because when str.split splits on whitespace, the first thing it does is strip leading whitespace.
If you multisplit a string that only contains whitespace, and you split on whitespace characters, it returns two empty strings:
```python
>>> list(big.multisplit(' '))
['', '']
```
This is because the string conceptually starts with a zero-length string, then has a run of whitespace characters, then ends with another zero-length string. So those two empty strings are the leading and trailing zero-length strings, separated by whitespace. If you tell multisplit to also strip the string, you'll get back a single empty string:
```python
>>> list(big.multisplit(' ', strip=True))
['']
```
And multisplit behaves consistently even when you use different separators:
```python
>>> list(big.multisplit('ababa', 'ab'))
['', '']
>>> list(big.multisplit('ababa', 'ab', strip=True))
['']
```
And I should know--multisplit is implemented using re.split!
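Since multisplit is built on re.split, you can watch the same empty-string behavior by calling re.split directly. Here a run of separators is spelled with +, which is roughly what multisplit's default separate=False means. The patterns are illustrative, not the ones big actually builds:

```python
import re

# No match at all: re.split returns the input as the only piece
print(re.split(r'\s+', ''))             # ['']
# One run of whitespace covering everything: a zero-length string on each side
print(re.split(r'\s+', '   '))          # ['', '']
# Strip first, and only the empty string is left
print(re.split(r'\s+', '   '.strip()))  # ['']
# The same shape with a set of separator characters
print(re.split('[ab]+', 'ababa'))       # ['', '']

# str.split special-cases whitespace splitting instead
print(''.split(), '   '.split())        # [] []
```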
Whitespace and line-breaking characters in Python and big
Overview
Several functions in big take a separators argument, an iterable of separator strings. Examples of these functions include lines and multisplit. Although you can use any iterable of strings you like, most often you'll be separating on some form of whitespace. But what, exactly, is whitespace? There's more to this topic than you might suspect.
The good news is, you can almost certainly ignore all the complexity. These days the only whitespace characters you're likely to encounter are spaces, tabs, newlines, and maybe carriage returns. Python and big handle all those easily.
For use as these separators arguments, big provides four predefined values, all of them tuples of whitespace characters:
- When working with str objects, you'll want to use either big.whitespace or big.linebreaks. big.whitespace contains all the whitespace characters; big.linebreaks contains just the line-breaking whitespace characters.
- big also has equivalents for working with bytes objects: big.bytes_whitespace and big.bytes_linebreaks, respectively.
Apart from exceptionally rare occasions, these are all you'll ever need. And if that's all you need, you can stop reading this section now.
But what about those exceptionally rare occasions? You'll be pleased to know big handles them too. The rest of this section is a deep dive into these rare occasions.
Python
Here's the list of all characters recognized by Python str objects as whitespace characters:
```
# char    decimal   hex      name
##########################################
'\t'    , #     9 - 0x0009 - tab
'\n'    , #    10 - 0x000a - newline
'\v'    , #    11 - 0x000b - vertical tab
'\f'    , #    12 - 0x000c - form feed
'\r'    , #    13 - 0x000d - carriage return
'\x1c'  , #    28 - 0x001c - file separator
'\x1d'  , #    29 - 0x001d - group separator
'\x1e'  , #    30 - 0x001e - record separator
'\x1f'  , #    31 - 0x001f - unit separator
' '     , #    32 - 0x0020 - space
'\x85'  , #   133 - 0x0085 - next line
'\xa0'  , #   160 - 0x00a0 - non-breaking space
'\u1680', #  5760 - 0x1680 - ogham space mark
'\u2000', #  8192 - 0x2000 - en quad
'\u2001', #  8193 - 0x2001 - em quad
'\u2002', #  8194 - 0x2002 - en space
'\u2003', #  8195 - 0x2003 - em space
'\u2004', #  8196 - 0x2004 - three-per-em space
'\u2005', #  8197 - 0x2005 - four-per-em space
'\u2006', #  8198 - 0x2006 - six-per-em space
'\u2007', #  8199 - 0x2007 - figure space
'\u2008', #  8200 - 0x2008 - punctuation space
'\u2009', #  8201 - 0x2009 - thin space
'\u200a', #  8202 - 0x200a - hair space
'\u2028', #  8232 - 0x2028 - line separator
'\u2029', #  8233 - 0x2029 - paragraph separator
'\u202f', #  8239 - 0x202f - narrow no-break space
'\u205f', #  8287 - 0x205f - medium mathematical space
'\u3000', # 12288 - 0x3000 - ideographic space
```
This list was derived by iterating over every character defined in Unicode, and testing to see if the split() method on a Python str object splits at that character.
The first surprise: this isn't the same as the list of all characters defined by Unicode as whitespace. It's almost the same list, except Python adds four extra characters: '\x1c', '\x1d', '\x1e', and '\x1f', which respectively are called "file separator", "group separator", "record separator", and "unit separator". I'll refer to these as "the four ASCII separator characters".
These characters were defined as part of the original ASCII standard, way back in 1963. As their names suggest, they were intended to be used as separator characters for data, the same way Ctrl-Z was used to indicate end-of-file in CP/M and the earliest FAT filesystems. But the four ASCII separator characters were rarely used even back in the day. Today they're practically unheard of.
Printing these characters to the screen generally doesn't do anything--they don't move the cursor, and the screen doesn't change. So their behavior is a bit mysterious. A lot of people (including early Python programmers, it seems!) thought that meant they're whitespace. This seems like an odd conclusion to me. After all, all the other whitespace characters move the cursor, either right or down or both; these don't move the cursor at all.
The Unicode standard is unambiguous: these characters are not whitespace. And yet Python's "Unicode object" behaves as if they are. So I'd say this is a bug; Python's Unicode object should implement what the Unicode standard says.
The C library used by GCC and clang on my workstation agrees with Unicode. I wrote a quick C program to print out which characters are and aren't whitespace according to the C function isspace(). It doesn't consider the four ASCII separator characters to be whitespace.
Here's the program, in case you want to try it yourself.
```c
#include <stdio.h>
#include <ctype.h>

int main(void) {
    int i;
    printf("\nisspace table.\nAdd the row and column numbers together (in hex).\n\n");
    printf("     | 0 1 2 3 4 5 6 7 8 9 a b c d e f\n");
    printf("-----+--------------------------------\n");
    for (i = 0 ; i < 256 ; i++) {
        /* isspace() is defined for every unsigned char value */
        const char *message = isspace(i) ? "Y" : "n";
        if ((i % 16) == 0)
            printf("0x%02x |", i);
        printf(" %s", message);
        if ((i % 16) == 15)
            printf("\n");
    }
    return 0;
}
```
Here's its output on my workstation:
```
isspace table.
Add the row and column numbers together (in hex).

     | 0 1 2 3 4 5 6 7 8 9 a b c d e f
-----+--------------------------------
0x00 | n n n n n n n n n Y Y Y Y Y n n
0x10 | n n n n n n n n n n n n n n n n
0x20 | Y n n n n n n n n n n n n n n n
0x30 | n n n n n n n n n n n n n n n n
0x40 | n n n n n n n n n n n n n n n n
0x50 | n n n n n n n n n n n n n n n n
0x60 | n n n n n n n n n n n n n n n n
0x70 | n n n n n n n n n n n n n n n n
0x80 | n n n n n n n n n n n n n n n n
0x90 | n n n n n n n n n n n n n n n n
0xa0 | n n n n n n n n n n n n n n n n
0xb0 | n n n n n n n n n n n n n n n n
0xc0 | n n n n n n n n n n n n n n n n
0xd0 | n n n n n n n n n n n n n n n n
0xe0 | n n n n n n n n n n n n n n n n
0xf0 | n n n n n n n n n n n n n n n n
```
0x1c through 0x1f are represented by the last four n characters on the second line, the 0x10 line. The fact that they're all n tells you that this C standard library doesn't consider those characters to be whitespace.
Like many bugs, this one has lingered for a long time. The behavior is present in Python 2, there's a ten-year-old issue on the Python issue tracker about this, and it's not making progress.
The second surprise has to do with bytes objects. Of course, bytes objects represent binary data, and don't necessarily represent characters. Even if they do, they don't have any encoding associated with them. However, for convenience--and backwards-compatibility with Python 2--Python's bytes objects support several methods that treat the data as if it were "ASCII-compatible".
The surprise: these methods on Python bytes objects recognize a different set of whitespace characters. Here's the list of all bytes recognized by Python bytes objects as whitespace:
```
# char  decimal  hex    name
#######################################
'\t' , #     9 - 0x09 - tab
'\n' , #    10 - 0x0a - newline
'\v' , #    11 - 0x0b - vertical tab
'\f' , #    12 - 0x0c - form feed
'\r' , #    13 - 0x0d - carriage return
' '  , #    32 - 0x20 - space
```
This list was derived by iterating over every possible byte value, and testing to see if the split() method on a Python bytes object splits at that byte.
The good news is, this list is the same as ASCII's list, and it agrees with Unicode. In fact this list is quite familiar to C programmers; it's the same set of whitespace characters recognized by the standard C function isspace() (in ctype.h). Python has used this function to decide which characters are and aren't whitespace in 8-bit strings since its very beginning.
Notice that this list doesn't contain the four ASCII separator characters. That these two types in Python don't agree only enhances the mystery.
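Here's a sketch of how the two tables above can be derived with the method described. It tries every code point (or byte value) and checks whether splitting on whitespace breaks the string there. Scanning all of Unicode takes a second or so:

```python
import sys

# Every code point at which str.split() (no separator) splits 'a?a' in two
str_ws = [c for c in map(chr, range(sys.maxunicode + 1))
          if len(('a' + c + 'a').split()) == 2]

# Every byte value at which bytes.split() (no separator) does the same
bytes_ws = [bytes([i]) for i in range(256)
            if len((b'a' + bytes([i]) + b'a').split()) == 2]

separators = ['\x1c', '\x1d', '\x1e', '\x1f']  # the four ASCII separator characters
print(len(str_ws))                                          # 29
print(bytes_ws)  # [b'\t', b'\n', b'\x0b', b'\x0c', b'\r', b' ']
print([c in str_ws for c in separators])                    # [True, True, True, True]
print([c.encode('ascii') in bytes_ws for c in separators])  # [False, False, False, False]
```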
Line-breaking characters
The situation is slightly worse with line-breaking characters. Line-breaking characters (aka linebreaks) are a subset of whitespace characters; they're whitespace characters that always move the cursor down to the next line. And, as with whitespace generally, Python str objects don't agree with Unicode about what is and is not a line-breaking character, and Python bytes objects don't agree with either of those.
Here's the list of all Unicode characters recognized by Python str objects as line-breaking characters:
```
# char    decimal   hex      name
##########################################
'\n'    , #    10 - 0x000a - newline
'\v'    , #    11 - 0x000b - vertical tab
'\f'    , #    12 - 0x000c - form feed
'\r'    , #    13 - 0x000d - carriage return
'\x1c'  , #    28 - 0x001c - file separator
'\x1d'  , #    29 - 0x001d - group separator
'\x1e'  , #    30 - 0x001e - record separator
'\x85'  , #   133 - 0x0085 - next line
'\u2028', #  8232 - 0x2028 - line separator
'\u2029', #  8233 - 0x2029 - paragraph separator
```
This list was derived by iterating over every character defined in Unicode, and testing to see if the splitlines() method on a Python str object splits at that character.
Again, this is different from the list of characters defined as line-breaking whitespace in Unicode. And again it's because Python defines some of the four ASCII separator characters as line-breaking characters. In this case it's only the first three; Python doesn't consider the fourth, "unit separator", to be a line-breaking character. (I don't know why Python draws this distinction... but then again, I don't know why it considers the first three to be line-breaking. It's all a mystery to me.)
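The same experiment with splitlines() reproduces the str table above. For bytes, it produces the much shorter list shown next. It also confirms that unit separator, which str.split() treats as whitespace, isn't a linebreak:

```python
import sys

# Every code point at which str.splitlines() splits 'a?a' into two lines
str_lb = [c for c in map(chr, range(sys.maxunicode + 1))
          if len(('a' + c + 'a').splitlines()) == 2]

# Every byte value at which bytes.splitlines() does the same
bytes_lb = [bytes([i]) for i in range(256)
            if len((b'a' + bytes([i]) + b'a').splitlines()) == 2]

print(len(str_lb))       # 10, matching the table above
print('\x1f' in str_lb)  # False: unit separator is whitespace, but not a linebreak
print(bytes_lb)          # [b'\n', b'\r']
```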
Here's the list of all characters recognized by Python bytes objects as line-breaking characters:
```
# char  decimal  hex    name
#######################################
'\n' , #    10 - 0x0a - newline
'\r' , #    13 - 0x0d - carriage return
```
This list was derived by iterating over every possible byte, and testing to see if the splitlines() method on a Python bytes object splits at that byte.
It's here we find our final unpleasant surprise: the methods on Python bytes objects don't consider '\v' (vertical tab) and '\f' (form feed) to be line-break characters. I assert this is also a bug. These are well understood to be line-breaking characters; "vertical tab" is like a "tab", except it moves the cursor down instead of to the right. And "form feed" moves the cursor to the top left of the next "page", which requires advancing at least one line.
How big handles this situation
To be crystal clear: the odds that any of this will cause a problem for you are extremely low. In order for it to make a difference:
- you'd have to encounter text using one of these six characters where Python disagrees with Unicode and ASCII, and
- you'd have to process the input based on some definition of whitespace, and
- it would have to produce different results than you might otherwise have expected, and
- this difference in results would have to be important.
It seems extremely unlikely that all of these will be true for you.
In case this does affect you, big has a complete set of predefined whitespace tuples that will handle any of these situations. big defines a total of ten tuples, sorted into five categories.
In every category there are two values: one that contains whitespace, the other contains linebreaks. The whitespace tuple contains all the possible values of whitespace--characters that move the cursor either horizontally, or vertically, or both, but don't print anything visible to the screen. The linebreaks tuple contains the subset of whitespace characters that move the cursor vertically.
The two most important values start with str_: str_whitespace and str_linebreaks. These contain all the whitespace characters recognized by the Python str object.
Next are two values that start with unicode_: unicode_whitespace and unicode_linebreaks. These contain all the whitespace characters defined in the Unicode standard. They're the same as the str_ tuples except we remove the four ASCII separator characters.
Third, two values that start with ascii_: ascii_whitespace and ascii_linebreaks. These contain all the whitespace characters defined in ASCII. (Note that these contain str objects, not bytes objects.) They're the same as the unicode_ tuples, except we throw away all characters with a code point higher than 127.
Fourth, two values that start with bytes_: bytes_whitespace and bytes_linebreaks. These contain all the whitespace characters recognized by the Python bytes object. These tuples contain bytes objects, encoded using the ascii encoding. The list of characters is distinct from the other sets of tuples, and was derived as described above.
Finally we have the two tuples that lack a prefix: whitespace and linebreaks. These are the tuples you should use most of the time, and several big functions use them as default values. These are simply copies of str_whitespace and str_linebreaks respectively.
(big actually defines an additional ten tuples, as discussed in the very next section.)
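To make the relationships between the categories concrete, here's a sketch that rebuilds single-character versions of the str_, unicode_ and ascii_ tuples from first principles. The my_ names are hypothetical, not big's. big's real tuples also contain '\r\n', as the next section explains. The bytes_ tuples are derived separately, so they aren't shown:

```python
import sys

# Hypothetical single-character stand-ins for big's tuples (the my_ names
# are made up). big's real tuples also contain '\r\n'; that's left out here.
ASCII_SEPARATORS = {'\x1c', '\x1d', '\x1e', '\x1f'}

# str_: what Python's str recognizes. str.isspace() uses the same
# definition of whitespace as str.split() with no separator.
my_str_whitespace = tuple(c for c in map(chr, range(sys.maxunicode + 1)) if c.isspace())
my_str_linebreaks = tuple(c for c in my_str_whitespace
                          if len(('a' + c + 'a').splitlines()) == 2)

# unicode_: the str_ tuples minus the four ASCII separator characters
my_unicode_whitespace = tuple(c for c in my_str_whitespace if c not in ASCII_SEPARATORS)
my_unicode_linebreaks = tuple(c for c in my_str_linebreaks if c not in ASCII_SEPARATORS)

# ascii_: the unicode_ tuples minus everything above code point 127
my_ascii_whitespace = tuple(c for c in my_unicode_whitespace if ord(c) < 128)
my_ascii_linebreaks = tuple(c for c in my_unicode_linebreaks if ord(c) < 128)

print(len(my_str_whitespace), len(my_unicode_whitespace), len(my_ascii_whitespace))  # 29 25 6
print(len(my_str_linebreaks), len(my_unicode_linebreaks), len(my_ascii_linebreaks))  # 10 7 4
```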
The Unix, Mac, and DOS linebreak conventions
Historically, different platforms used different ASCII characters--or sequences of ASCII characters--to represent "go to the next line" in text files. Here are the most popular conventions:
```
\n   - UNIX, Amiga, macOS 10+
\r   - macOS 9 and earlier, many 8-bit computers
\r\n - Windows, DOS
```
(There are a couple more conventions, and a lot more history, in the Wikipedia article on newlines.)
Handling these differing conventions was a real mess for a long time--not just for computer programmers, but in the daily lives of many computer users. It was a continual problem for software developers back in the 90s, particularly those who frequently switched back and forth between platforms. And it took a long time before software development tooling figured out how to seamlessly handle all the newline conventions.
Python itself went through several iterations on how to handle this, eventually implementing "universal newlines" support, added way back in Python 2.3.
These days the world seems to have converged on the UNIX standard, '\n'; Windows supports it, and it's the default on every other modern platform. So in practice these days you probably don't have end-of-line conversion problems; as long as you're decoding files to Unicode, and you don't disable "universal newlines", it probably all works fine and you never even noticed.
However! big strives to behave identically to Python in every way. And even today, Python considers the DOS linebreak sequence to be one linebreak, not two.
The Python splitlines method on a string splits the string at linebreaks. And if the keepends parameter is true, it appends the linebreak character(s) at the end of each substring. A quick experiment with splitlines will show us what Python thinks is and isn't a linebreak. Sure enough, splitlines considers '\n\r' to be two linebreaks, but it treats '\r\n' as a single linebreak:
' a \n b \r c \r\n d \n\r e '.splitlines(True)
produces
[' a \n', ' b \r', ' c \r\n', ' d \n', '\r', ' e ']
Naturally, if you use big to split by lines, you get the same result:
list(big.multisplit(' a \n b \r c \r\n d \n\r e ', big.linebreaks, separate=True, keep=True))
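Outside big, you can get the same result from re.split, as long as the alternation tries '\r\n' before '\r'. A regular-expression alternation takes the first alternative that matches at a position, not the longest one, so the sketch sorts the separators longest-first. split_keepends is a hypothetical helper, not part of big:

```python
import re

def split_keepends(s, separators):
    # A regex alternation tries alternatives in order and takes the first
    # one that matches, not the longest, so sort longest-first: that way
    # '\r\n' is tried before '\r'.
    ordered = sorted(separators, key=len, reverse=True)
    pattern = '(' + '|'.join(re.escape(sep) for sep in ordered) + ')'
    # With a capturing group, pieces alternates: text, sep, text, ..., text
    pieces = re.split(pattern, s)
    # Glue each separator onto the text before it, like splitlines(True)
    result = [pieces[i] + pieces[i + 1] for i in range(0, len(pieces) - 1, 2)]
    result.append(pieces[-1])
    return result

s = ' a \n b \r c \r\n d \n\r e '
print(split_keepends(s, ('\n', '\r', '\r\n')))
# [' a \n', ' b \r', ' c \r\n', ' d \n', '\r', ' e ']
print(split_keepends(s, ('\n', '\r', '\r\n')) == s.splitlines(True))  # True
```

Unlike splitlines, this sketch leaves a trailing empty string when the input ends with a linebreak, which matches how multisplit behaves with keep=True.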
How do we achieve this? big has one more trick. All of the tuples defined in the previous section--from whitespace to ascii_linebreaks--also contain the DOS linebreak convention:
'\r\n'
(The equivalent bytes_ tuples contain the bytes equivalent, b'\r\n'.)
separators, it'll recognize\r\nas if it was one whitespace "character". (Just in case one happens to creep into your data.) And since functions likemultisplitare "greedy", preferring the longest matching separator, if the string you're splitting contains'\r\n', it'll prefer matching'\r\n'to just'\r'.If you don't want this behavior, just add the suffix
_without_crlfto the end of any of the ten tuples, e.g.whitespace_without_crlf,bytes_linebreaks_without_crlf.Whitespace and line-breaking characters for other platforms
What if you need to split text by whitespace, or by lines, but that text is in bytes format with an unusual encoding? big makes that easy too. If one of the builtin tuples won't work for you, you can make your own tuple from scratch, or modify an existing tuple to meet your needs.
For example, let's say you need to split a document by whitespace, and the document is encoded in code page 850 or code page 437. (These two code pages are the most common code pages in English-speaking countries.)
Normally the easiest thing would be to decode it into a str object using the 'cp850' or 'cp437' text codec as appropriate, then operate on it normally. But you might have reasons why you don't want to decode it. For example, the document might be damaged so it doesn't decode properly, and it's easier to work with the encoded bytes than to fix it. If you want to process the text with a big function that accepts a separators argument, you could make your own custom tuples of whitespace characters. These two code pages have the same whitespace characters as ASCII, but they both add one more: value 255, "non-breaking space". This is a space character that is not line-breaking: it should behave like a space, except you shouldn't break a line at this character when word wrapping.
It's easy to make the appropriate tuples yourself:
```python
cp437_linebreaks = cp850_linebreaks = big.bytes_linebreaks
cp437_whitespace = cp850_whitespace = big.bytes_whitespace + (b'\xff',)
```
Those tuples would work fine as the separators argument for any big function that takes one.
What if you want to process a bytes object containing UTF-8? That's easy too. Just convert one of the existing tuples containing str objects using big.encode_strings. For example, to split a UTF-8 encoded bytes object b using the Unicode line-breaking characters, you could call:
```python
big.multisplit(b, big.encode_strings(big.unicode_linebreaks, encoding='utf-8'))
```
This works because UTF-8 is self-synchronizing: the byte that starts a character never looks like a continuation byte, so in valid UTF-8 an encoded separator can only match on character boundaries. Note that this technique probably won't work correctly for most other multibyte encodings, for example UTF-16. For these encodings, you should decode to str before processing.
Why? It's because multisplit could find matches in multibyte sequences straddling characters. Consider this example:
```python
>>> haystack = '\u0101\u0102'
>>> needle = '\u0201'
>>> needle in haystack
False
>>>
>>> encoded_haystack = haystack.encode('utf-16-le')
>>> encoded_needle = needle.encode('utf-16-le')
>>> encoded_needle in encoded_haystack
True
```
The character '\u0201' doesn't appear in the original string, but the encoded version appears in the encoded string, as the second byte of the first character and the first byte of the second character:
```python
>>> encoded_haystack
b'\x01\x01\x02\x01'
>>> encoded_needle
b'\x01\x02'
```
But you can avoid this problem if you know you're working with two-byte code units, as in UTF-16. Either split the bytes string into two-byte segments and operate on those, or only accept matches that start at an even byte offset.
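Here's a minimal sketch of the aligned-match idea for UTF-16, or any encoding with fixed-width two-byte code units. It searches normally with bytes.find, but only accepts matches that start on a code-unit boundary. split_aligned is a hypothetical helper, not part of big. It assumes the data and the separator are each a whole number of code units long:

```python
def split_aligned(data, sep, width=2):
    """Split data at occurrences of sep that start on a code-unit boundary.

    Hypothetical helper: assumes fixed-width code units (2 bytes for UTF-16)
    and that len(data) and len(sep) are both multiples of width.
    """
    if not sep or len(sep) % width or len(data) % width:
        raise ValueError("data and sep must be whole numbers of code units")
    pieces = []
    start = 0    # start of the piece currently being built
    search = 0   # where the next bytes.find() begins
    while True:
        i = data.find(sep, search)
        if i < 0:
            break
        if i % width:
            # This match straddles two code units: ignore it, keep looking.
            search = i + 1
            continue
        pieces.append(data[start:i])
        start = search = i + len(sep)
    pieces.append(data[start:])
    return pieces

encoded_haystack = '\u0101\u0102'.encode('utf-16-le')
encoded_needle = '\u0201'.encode('utf-16-le')
print(split_aligned(encoded_haystack, encoded_needle))  # [b'\x01\x01\x02\x01'] (no false match)

data = 'a,b'.encode('utf-16-le')
print([p.decode('utf-16-le') for p in split_aligned(data, ','.encode('utf-16-le'))])  # ['a', 'b']
```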
Word wrapping and formatting
big contains three functions used to reflow and format text in a pleasing manner. In the order you should use them, they are split_text_with_code, wrap_words, and optionally merge_columns. This trio of functions gives you the following word-wrap superpowers:
- Paragraphs of text representing embedded "code" don't get word-wrapped. Instead, their formatting is preserved.
- Multiple texts can be merged together into multiple columns.
"text" vs "code"
The big word wrapping functions also distinguish between "text" and "code". The main distinction is, "text" lines can get word-wrapped, but "code" lines shouldn't. big considers any line starting with enough whitespace to be a "code" line; by default, this is four spaces. Any non-blank line that starts with four or more spaces is a "code" line, and any non-blank line that starts with fewer than four spaces is a "text" line.
In "text" mode:
- words are separated by whitespace,
- initial whitespace on the line is discarded,
- the amount of whitespace between words is irrelevant,
- individual newline characters are ignored, and
- more than two newline characters are converted into exactly two newlines (aka a "paragraph break").
In "code" mode:
- all whitespace is preserved, except for trailing whitespace on a line, and
- all newline characters are preserved.
Also, whenever split_text_with_code switches between "text" and "code" mode, it emits a paragraph break.
Split text array
A split text array is an intermediary data structure used by big.text functions to represent text. It's literally just an array of strings, where the strings represent individual word-wrappable substrings.
split_text_with_code returns a split text array, and wrap_words consumes a split text array.
You'll see four kinds of strings in a split text array:
- Individual words, ready to be word-wrapped.
- Entire lines of "code", preserving their formatting.
- Line breaks, represented by a single newline: '\n'.
- Paragraph breaks, represented by two newlines: '\n\n'.
Examples
This might be clearer with an example or two. The following text:
```
hello there!
this is text.


this is a second paragraph!
```
would be represented in a Python string as:
"hello there!\nthis is text.\n\n\nthis is a second paragraph!"
Note the three newlines between the second and third lines.
If you then passed this string in to split_text_with_code, it'd return this split text array:
```python
[
    'hello', 'there!', 'this', 'is', 'text.',
    '\n\n',
    'this', 'is', 'a', 'second', 'paragraph!'
]
```
split_text_with_code merged the first two lines together into a single paragraph, and collapsed the three newlines separating the two paragraphs into a "paragraph break" marker (two newlines in one string).
Now let's add an example of text with some "code". This text:
```
What are the first four squared numbers?

    for i in range(1, 5):


        print(i**2)

Python is just that easy!
```
would be represented in a Python string as (broken up into multiple strings for clarity):
```python
"What are the first four squared numbers?\n\n" + "    for i in range(1, 5):\n\n\n" + "        print(i**2)\n\nPython is just that easy!"
```
split_text_with_code considers the two lines with initial whitespace as "code" lines, and so the text is split into the following split text array:
```python
[
    'What', 'are', 'the', 'first', 'four', 'squared', 'numbers?',
    '\n\n',
    '    for i in range(1, 5):', '\n', '\n', '\n', '        print(i**2)',
    '\n\n',
    'Python', 'is', 'just', 'that', 'easy!'
]
```
Here we have a "text" paragraph, followed by a "code" paragraph, followed by a second "text" paragraph. The "code" paragraph preserves the internal newlines, though they are represented as individual "line break" markers (strings containing a single newline). Adjacent paragraphs are separated by a "paragraph break" marker.
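To make the rules above concrete, here's a minimal sketch of a splitter that produces split text arrays like the ones in these examples. It is not big's split_text_with_code: the name sketch_split_text_with_code is hypothetical, and it assumes lines are separated by newline characters only and that indentation uses spaces (no tabs).

```python
def sketch_split_text_with_code(text, code_indent=4):
    """Hypothetical sketch: split text into a "split text array".

    Assumes lines are separated by newline characters and indented with spaces.
    """
    result = []
    mode = None        # "text" or "code": the mode of the previous non-blank line
    blank_lines = 0    # number of blank lines seen since that line
    for line in text.split("\n"):
        if not line.strip():
            blank_lines += 1
            continue
        indent = len(line) - len(line.lstrip(" "))
        new_mode = "code" if indent >= code_indent else "text"
        if mode is not None:
            if new_mode != mode:
                # Switching between text and code always emits a paragraph break.
                result.append("\n\n")
            elif mode == "text":
                # One or more blank lines between text lines collapse into one
                # paragraph break; a lone newline between text lines is ignored.
                if blank_lines:
                    result.append("\n\n")
            else:
                # Inside code, every newline is kept as its own line break marker.
                result.extend(["\n"] * (blank_lines + 1))
        if new_mode == "text":
            result.extend(line.split())   # words; runs of whitespace don't matter
        else:
            result.append(line.rstrip())  # whole code line, minus trailing whitespace
        mode = new_mode
        blank_lines = 0
    return result
```

Leading and trailing blank lines are simply dropped here, which is one reasonable choice; the real function may treat them differently.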
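Here's a simple algorithm for joining a split text array back into a single string:
```python
prev = None
a = []
for word in split_text_array:
    # Put one space between two adjacent words (or code lines),
    # but never next to a line break or paragraph break marker.
    if (prev is not None) and not (prev.isspace() or word.isspace()):
        a.append(' ')
    a.append(word)
    prev = word
text = "".join(a)
```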
Of course, this algorithm is too simple to do word wrapping. Nor does it handle adding two spaces after sentence-ending punctuation. In practice, you shouldn't do this by hand; you should use wrap_words.
Merging columns
merge_columns merges multiple strings into columns on the same line. For example, it could merge these three Python strings:
```python
[
    "Here's the first\ncolumn of text.",
    "More text over here!\nIt's the second\ncolumn! How\nexciting!",
    "And here's a\nthird column.",
]
```
into the following text:
```
Here's the first    More text over here!    And here's a
column of text.     It's the second         third column.
                    column! How
                    exciting!
```
(Note that merge_columns doesn't do its own word-wrapping; instead, it's designed to consume the output of wrap_words.)
Each column is passed in to merge_columns as a "column tuple":
(s, min_width, max_width)
s is the string, min_width is the minimum width of the column, and max_width is the maximum width of the column.
As you saw above, s can contain newline characters, and merge_columns obeys those when formatting each column.
For each column, merge_columns measures the longest line in that column. The width of the column is determined as follows:
- If the longest line is less than min_width characters long, the column will be min_width characters wide.
- If the longest line is greater than or equal to min_width characters long, and less than or equal to max_width characters long, the column will be as wide as the longest line.
- If the longest line is greater than max_width characters long, the column will be max_width characters wide, and lines that are longer than max_width characters will "overflow".
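Here's a minimal sketch of those width rules, plus a bare-bones merge that, like the default RAISE strategy, refuses to handle overflow. The names column_width and sketch_merge_columns are hypothetical, not big APIs, and the sketch assumes a four-space separator between columns (big's actual default separator may differ) and no tab characters.

```python
def column_width(lines, min_width, max_width):
    """Apply the three width rules to a column's lines."""
    longest = max((len(line) for line in lines), default=0)
    if longest < min_width:
        return min_width
    if longest <= max_width:
        return longest
    return max_width  # lines longer than this will overflow


def sketch_merge_columns(*columns, separator="    "):
    """Hypothetical sketch: merge (s, min_width, max_width) tuples side by side."""
    prepared = []
    for s, min_width, max_width in columns:
        lines = s.split("\n")
        width = column_width(lines, min_width, max_width)
        if any(len(line) > width for line in lines):
            # Mimics the RAISE strategy; no intruding or delaying here.
            raise OverflowError(f"column has lines longer than max_width={max_width}")
        prepared.append((lines, width))

    height = max(len(lines) for lines, _ in prepared)
    output = []
    for row in range(height):
        cells = [
            (lines[row] if row < len(lines) else "").ljust(width)
            for lines, width in prepared
        ]
        output.append(separator.join(cells).rstrip())
    return "\n".join(output)
```

Padding every cell to its column's width and then stripping trailing whitespace keeps the later columns aligned, even when an earlier column has run out of lines.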
Overflow
What is "overflow"? It's a condition merge_columns may encounter when the text in a column is wider than that column's max_width. merge_columns needs to consider both "overflow lines", lines that are longer than max_width, and "overflow columns", columns that contain one or more overflow lines.
What does merge_columns do when it encounters overflow? merge_columns supports three "strategies" to deal with this condition, and you can specify which one you want using its overflow_strategy parameter. The three strategies are:
- OverflowStrategy.RAISE: Raise an OverflowError exception. This is the default.
- OverflowStrategy.INTRUDE_ALL: Intrude into all subsequent columns on all lines where the overflowed column is wider than its max_width. The subsequent columns "make space" for the overflow text by not adding text on those overflowed lines; this is called "pausing" their output.
- OverflowStrategy.DELAY_ALL: Delay all columns after the overflowed column, not beginning any until after the last overflowed line in the overflowed column. This is like the INTRUDE_ALL strategy, except that the columns "make space" by pausing their output until the last overflowed line.
When overflow_strategy is INTRUDE_ALL or DELAY_ALL, and either overflow_before or overflow_after is nonzero, these specify the number of extra lines before or after the overflowed lines in a column where the subsequent columns "pause".
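The strategies are easier to see side by side. This is a toy two-column sketch, far simpler than merge_columns: sketch_two_columns is hypothetical, it uses plain strings in place of big's OverflowStrategy values, and it gives the left column a fixed width of max_width. Under INTRUDE_ALL the right column pauses only on the left column's overflow lines; under DELAY_ALL it pauses until after the last one. In both cases overflow_before and overflow_after widen the paused range.

```python
def sketch_two_columns(left, right, max_width, strategy="RAISE",
                       overflow_before=0, overflow_after=0, separator="    "):
    """Hypothetical sketch: merge two columns, handling overflow in the left one."""
    if strategy not in ("RAISE", "INTRUDE_ALL", "DELAY_ALL"):
        raise ValueError(f"unknown strategy {strategy!r}")
    left_lines = left.split("\n")
    overflow = [i for i, line in enumerate(left_lines) if len(line) > max_width]
    if overflow and strategy == "RAISE":
        raise OverflowError(f"left column has lines longer than max_width={max_width}")

    # Rows on which the right column "pauses" (emits nothing).
    paused = set()
    if strategy == "INTRUDE_ALL":
        for i in overflow:
            paused.update(range(i - overflow_before, i + overflow_after + 1))
    elif strategy == "DELAY_ALL" and overflow:
        paused.update(range(overflow[-1] + overflow_after + 1))

    pending = right.split("\n")   # right-column lines not yet emitted
    output = []
    row = 0
    while row < len(left_lines) or pending:
        l = left_lines[row] if row < len(left_lines) else ""
        if row in paused:
            output.append(l)      # overflow text intrudes; the right column waits
        else:
            r = pending.pop(0) if pending else ""
            output.append((l.ljust(max_width) + separator + r).rstrip())
        row += 1
    return output
```

Note that a paused column isn't truncated: its remaining lines are emitted later, so pausing makes the merged output taller.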
Enhanced TopologicalSorter
Overview
big's TopologicalSorter is a drop-in replacement for graphlib.TopologicalSorter in the Python standard library (new in 3.9). However, the version in big has been greatly upgraded:
- prepare is now optional, though it still performs a cycle check.
- You can add nodes and edges to a graph at any time, even while iterating over the graph. Adding nodes and edges always succeeds.
- You can remove nodes from graph g with the new method g.remove(node). Again, you can do this at any time, even while iterating over the graph. Removing a node from the graph always succeeds, assuming the node is in the graph.
- The functionality for iterating over a graph now lives in its own object called a view. View objects implement the get_ready, done, and __bool__ methods. There's a default view built in to the graph object; the get_ready, done, and __bool__ methods on a graph just call into the graph's default view. You can create a new view at any time by calling the new view method.
Note that if you're using a view to iterate over the graph, and you modify the graph, and the view now represents a state that isn't coherent with the graph, attempting to use that view raises a RuntimeError. (I'll define what I mean by view "coherence" in the next subsection.)
This implementation also fixes some minor warts with the existing API:
- In Python's implementation, static_order and get_ready/done are mutually exclusive. If you ever call get_ready on a graph, you can never call static_order, and vice versa. The implementation in big doesn't have this restriction, because its implementation of static_order creates and uses a new view object every time it's called.
- In Python's implementation, you can only iterate over the graph once, or call static_order once. The implementation in big solves this in two ways: it allows you to create as many views as you want, and you can call the new reset method on a view to reset it to its initial state.
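For comparison, here's the first of those warts in the standard library's graphlib (Python 3.9+). This uses only graphlib, not big: once prepare and get_ready have been called, static_order tries to prepare the graph a second time and raises ValueError.

```python
import graphlib


def stdlib_get_ready_then_static_order():
    """Return (ready nodes, error message or None) for graphlib's TopologicalSorter."""
    ts = graphlib.TopologicalSorter()
    ts.add("B", "A")          # B depends on A
    ts.add("C", "A")
    ts.add("D", "B", "C")
    ts.prepare()
    ready = ts.get_ready()    # only 'A' has no unmet dependencies
    try:
        # static_order() calls prepare() internally, which may only run once.
        list(ts.static_order())
    except ValueError as exc:
        return ready, str(exc)
    return ready, None
```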
View coherence
So what does it mean for a view to no longer be coherent with the graph? Consider the following code:
```python
import big

g = big.TopologicalSorter()
g.add('B', 'A')
g.add('C', 'A')
g.add('D', 'B', 'C')
v = g.view()
v.get_ready()   # returns ('A',)
g.add('A', 'Q')
```
First this creates a graph g with a classic "diamond" dependency pattern. Then it creates a new view v, and gets the currently "ready" nodes, which consist of just the node 'A'. Finally it adds a new dependency: 'A' depends on 'Q'.
At this moment, view v is no longer coherent. 'A' has been marked as "ready", but 'Q' has not. And yet 'A' depends on 'Q'. All those statements can't be true at the same time! So view v is no longer coherent, and any attempt to interact with v raises an exception.
To state it more precisely: if view v is a view on graph g, and you call g.add('Z', 'Y'), and neither of these statements is true in view v:
- 'Y' has been marked as done.
- 'Z' has not yet been yielded by get_ready.
then v is no longer "coherent".
(If 'Y' has been marked as done, then it's okay to make 'Z' dependent on 'Y' regardless of what state 'Z' is in. Likewise, if 'Z' hasn't been yielded by get_ready yet, then it's okay to make 'Z' dependent on 'Y' regardless of what state 'Y' is in.)
Note that you can restore a view to coherence. In this case, removing either 'Y' or 'Z' from g would resolve the incoherence between v and g, and v would start working again.
Also note that you can have multiple views, in various states of iteration, and by modifying the graph you may cause some to become incoherent but not others. Views are completely independent from each other.
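The coherence rule fits in a one-line predicate. This is a sketch of the rule as described above, not big's internal check; it assumes a view's state can be summarized as the set of nodes yielded by get_ready and the set marked done, and that edges are (node, predecessor) pairs, as in g.add(node, predecessor). The function is_coherent is hypothetical.

```python
def is_coherent(edges, yielded, done):
    """Hypothetical check: is a view (yielded, done) coherent with these edges?

    Each edge is a (node, predecessor) pair, meaning node depends on predecessor.
    An edge is fine if its predecessor is already done, or if its node hasn't
    been handed out by get_ready yet.
    """
    return all(pred in done or node not in yielded for node, pred in edges)


# The diamond from the example above: B and C depend on A; D depends on B and C.
diamond = {("B", "A"), ("C", "A"), ("D", "B"), ("D", "C")}
```

Checking every edge, rather than only the newly added one, also captures the point about restoring coherence: once the offending edge disappears (because 'Y' or 'Z' was removed), the predicate holds again.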
Release history
0.13.1
2026/03/23
This is mostly a bugfix and polish release for 0.13, though I added one new helper class in big.template and a few small APIs.
- linked_list got new APIs and a heap of bug fixes! It's more correct than ever!
  - Added move()/rmove() to linked_list, linked_list_iterator, and linked_list_reverse_iterator. These move nodes within a linked list--like a cut followed by a splice, but cheaper.
  - Breaking API change: splice used to allow you to pass in tail for where, and rsplice used to allow you to pass in head for where, and honestly their behavior was a little weird when you did. Those values are no longer allowed. The rule is: you can't ever add nodes before head or after tail; sadly, in 0.13, splice and rsplice got it wrong.
  - reverse() and sort() now move nodes rather than swapping values; this means iterators continue to point to the same value. (What about special nodes? reverse reverses those too, just like data nodes; sort groups special nodes with their subsequent data node, or tail.)
  - Fixed a number of iterator, locking, rotation, clearing, and cut/splice edge cases.
  - The "head" and "tail" nodes are now instances of special classes that disallow writing to some attributes. This would have caught an obscure regression bug (which is also fixed) and should preclude similar bugs in the future.
- string got one new feature and some str compatibility improvements:
  - Added string.context: a property returning a string_context object. str(s.context) produces a "context string", showing the entire line s was sliced from, and adding a second line below it with a line of carets ("^^^") calling attention to s in context. This can make error messages even nicer! The full string_context object contains the individual components, as well as the full context string for multi-line strings. (str(s.context) only shows the first line of context for multi-line slices.)
  - Lots of little bugfixes: reverse-slice edge cases, join([]), signed zfill(), removesuffix(''), partition('') and rpartition(''), and replace('', ...).
  - Added broader __index__ support where string mirrors str APIs.
  - Improved support for stateless subclasses of string. (If you want to subclass string and add new attributes, you'll probably have a rough time. File a bug and maybe we can improve the interfaces for you.)
- Several quality-of-life improvements for the new Log class:
  - The log object no longer logs the start banner or end banner unless some operation actually logs some (formatted) output. If you never log a message, you don't get spurious (and uninteresting) start and end banners.
  - Mapping 'enter' or 'exit' to None in the formats dict you pass in to the constructor will suppress the enter and exit banners respectively.
  - Log.write('') is ignored; you have to actually log some text to trigger the start and end banners.
  - Note: I have a major, backwards-incompatible rewrite of Log in progress. The Log interface will change some, Destination will change completely, and Sink will change a whole lot too. You're gonna love it! (In the meantime... don't get too comfortable!)
- Added Formatter to big.template. Formatter is a reusable formatter for multi-line text templates with clever support for repeated / stretched line-fill fields via "starred interpolations".
- StateManager fixes in big.state:
  - If on_exit raises an exception, the transition is aborted; state remains unchanged, and next is reset to None.
  - StateManager now handles observers raising an exception. If any observer raises an exception, StateManager remembers the first exception raised, continues calling the remaining observers, completes the transition, and then re-raises that first exception.
  - Observer lists are no longer cached internally--they're now snapshotted at the start of every transition. This fixes an obscure edge case: if you replaced one observer A with another observer B, and A == B even though they're different objects, the StateMachine wouldn't refresh its cache and would continue calling A.
  - Trimmed no-op StateMachine.on_enter and StateMachine.on_exit methods. They were useless in and of themselves, but I put them there on the theory that they'd help with autocomplete for these methods in subclasses when using advanced editors like PyCharm. But that's not a strong enough reason to keep 'em. Sorry, you'll just have to type def on_enter(self): by hand yourself, like some sort of caveman.
- big.text multi-function fixes and polish:
  - multistrip, multisplit, and multipartition/multirpartition now correctly accept one-shot iterables--like generators--for their separator argument.
  - multistrip: fixed strip=PROGRESSIVE when maxsplit=None.
  - Added __index__ support for maxsplit and count parameters.
  - Documentation updates, reflecting these functions returning slices of the original object (rather than guaranteed str or bytes objects). This has been true for a while, but the documentation was stale.
- Minor bugfixes in parse_template_string in big.template:
  - Improved error message for an unterminated comment; it now shows where the comment started, not where it ended.
  - Tokenization errors are now caught and re-raised as a nicer exception.
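The "context string" idea from the string.context entry above is easy to sketch on plain str objects. This hypothetical context_string helper is not big's string_context API: it takes the full text plus the start and end offsets of a slice, shows the line the slice starts on with carets underneath, and (like str(s.context)) only shows the first line of a multi-line slice. It assumes the text contains no tab characters.

```python
def context_string(text, start, end):
    """Hypothetical helper: show the line containing text[start:end], with carets below."""
    line_start = text.rfind("\n", 0, start) + 1
    line_end = text.find("\n", start)
    if line_end == -1:
        line_end = len(text)
    line = text[line_start:line_end]
    column = start - line_start
    # Only underline the part of the slice that falls on its first line.
    width = max(1, min(end, line_end) - start)
    return line + "\n" + " " * column + "^" * width
```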
0.13
2026/02/17
It's been more than a year... and I've been busy!
- Added three new modules:
  - big.types, which contains core types,
  - big.tokens, useful functions and values when working with Python's tokenizer, and
  - big.template, functions that parse strings containing a simple template syntax.
- Added linked_list to new module big.types. linked_list is a thoughtful implementation of a standard linked list data structure, with an API and UX modeled on Python's list and collections.deque objects. Unlike Python's builtins, you're permitted to add values to and remove values from a linked_list while iterating. linked_list also supports locking.
- Added string to new module big.types. string is a subclass of str that tracks line number and column number offsets for you. Just initialize one big string containing an entire file, and every substring of that string will know its line number, column number, and offset in characters from the beginning. big.lines and all the "lines modifier" functions are now deprecated; string replaces all of it (and it's a massive upgrade!). big.lines will move to the deprecated module no sooner than March 2026, and will be removed no sooner than November 2026.
- Added strip_indents and strip_line_comments to big.text. These provide the same functionality as the old lines_strip_indent and lines_strip_line_comments line modifier functions, but now operate on iterables of strings instead of "lines" iterators.
- Added Pattern to big.text. This is a wrapper around re.Pattern that preserves slices of str subclasses.
- Added parse_template_string and eval_template_string to new module big.template. parse_template_string parses a string containing Jinja-like interpolations, and returns an iterator that yields strings and Interpolation objects. (This is similar to "t-strings" in Python 3.14+.) eval_template_string calls parse_template_string to parse a string, then evaluates the expressions (and filters) using eval. It returns the resulting string with all substitutions rendered.
- Rewrote BoundInnerClass, and it's a huge improvement. The rewrite removes some old concerns:
  - You no longer need the parent.cls hack! (Well, you do if you support Python 3.6, but it's no longer needed in Python 3.7+. The big.boundinnerclass module adds a new function, bound_inner_base, to help with the transition.)
  - The bound inner class implementation now relies on comparison by identity instead of by name, which means you may now add aliases and/or rename your inner classes to your heart's content.
  - Bound inner classes no longer keep a strong reference to the outer instance; they use weakrefs. This reduces reference cycles, making it easier to reclaim abandoned bound inner class objects, albeit at the cost of adding a weakref "get ref" call every time a bound inner class is instantiated.
  - Bound inner classes now have explicit support for slots!
  - Bound inner classes now have accurate signatures, preserving the signature of the original class's __init__ but with the outer parameter removed.
  - BoundInnerClass adds locking, to prevent a race condition when caching the same bound inner class created simultaneously in multiple threads. It's rarely used and should have no real impact on performance.
- Added new functions to the big.boundinnerclass module:
  - unbound returns the unbound base class of cls if cls is a bound inner class.
  - is_boundinnerclass returns true if called on a class decorated with @BoundInnerClass, whether or not it has been bound to an instance.
  - is_unboundinnerclass returns true if called on a class decorated with @UnboundInnerClass, whether or not it has been bound to an instance.
  - is_bound returns true if called on a bound inner class that has been bound to an instance.
  - bound_to returns the instance that cls has been bound to, if cls is a bound inner class bound to an instance.
  - type_bound_to returns the instance that type(o) has been bound to, if type(o) is a bound inner class bound to an instance.
  - bound_inner_base is only needed to use BoundInnerClass with Python 3.6. It's unnecessary in Python 3.7+.
- Added generate_tokens to new module big.tokens. generate_tokens is a convenience wrapper around Python's tokenize.generate_tokens, which has an abstruse "readline"-based interface. tokens.generate_tokens instead lets you simply pass in a string object, and returns a generator yielding tokens. It also preserves slices of str subclasses--if the string you pass in is a big.string object, the token strings it yields will be slices from that original big.string!
- The big.tokens module also contains definitions for every token defined by any version of Python supported by big (3.6+). big's names always start with TOKEN_; for example, token.COMMA is big.tokens.TOKEN_COMMA. Tokens not defined in the currently running version of Python have a value of TOKEN_INVALID, which is -1.
- Added iterator_context to big.itertools. iterator_context is like an extended version of Python's enumerate, directly inspired by Jinja's "loop special variables" and Mako's "loop context". It wraps an iterator and provides helpful metadata.
- Added iterator_filter to big.itertools. iterator_filter is a pass-through iterator that filters values. You pass in an iterator, and rules for what values you want to see / don't want to see, and it returns an iterator that only yields the values you want.
- Rewrote the entire big.log module. I'd stopped using the old Log class, yet on a couple of recent projects I hacked up a quick-and-dirty log... clearly the old Log wasn't solving my problem anymore. The new Log is designed explicitly for lightweight logging, mostly for debugging. It's simple to use, feature-rich, high-performance, and by default runs in "threaded" mode where logging calls are 5x faster than calling print!
  - I added a backwards-compatible OldLog to big.log in case anybody is using the old Log class. This provides the API and functionality of the old Log class, but is reimplemented on top of the new Log. Hopefully the way I did it will ease your transition to the obviously-superior new Log. The old Log has been relocated to the big.deprecated module. Both OldLog and the old Log are deprecated, and will be removed someday, no earlier than March 2027.
- Added ModuleManager to big.builtin. ModuleManager helps you manage a module's namespace, making it easy to populate __all__ and clean up temporary symbols.
- Added ClassRegistry to big.builtin. ClassRegistry helps you use inheritance with heavily nested class hierarchies, by giving you a place to store references to base classes you can access later. Very useful with BoundInnerClass!
- The string returned by big.time.timestamp_human now includes the timezone, using the local timezone by default. If you want to override that and use a specific timezone, you can pass in a datetime.timezone object via the new tzinfo keyword-only parameter.
- Added support for Python 3.14, mainly to support t-strings:
  - python_delimiters now recognizes all the new string prefixes containing t (or T).
  - big.tokens supports the new tokens associated with t-strings, although that's a new module anyway.
- Sped up test/test_text.py. The tests confirm that big's list of whitespace characters is accurate. They used to test whether a particular character c was whitespace by using len(f'a{c}b'.split()) == 2. D'oh! It's obviously much faster to simply ask with c.isspace(). The resulting loop runs 3x faster... saving a whole 0.1 seconds on my workstation! Modifying the equivalent code for bytes instead of Unicode objects is also faster, but that optimization only saved 0.0000014 seconds. Hat tip to Eric V. Smith for his suggestions on how to make big's test suite so much faster!
- split_quoted_strings in big.text now obeys subclasses of str better. (It now works well with big.string, for example.)
- Removed a bunch of old deprecated stuff:
  - Old names for sets of characters: whitespace_without_dos, ascii_whitespace_without_dos, newlines, newlines_without_dos, ascii_newlines, ascii_newlines_without_dos, utf8_whitespace, utf8_whitespace_without_dos, utf8_newlines, utf8_newlines_without_dos
  - Old functions / classes / aliases: split_quoted_strings, lines_strip_comments, and parse_delimiters and its associated stuff: Delimiter (a class), delimiter_parentheses, delimiter_square_brackets, delimiter_curly_braces, delimiter_angle_brackets, delimiter_single_quote, delimiter_double_quotes, parse_delimiters_default_delimiters, parse_delimiters_default_delimiters_bytes
  - The old alias lines_filter_comment_lines
- Updated copyright notices to 2026.
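The string-based interface described for big.tokens.generate_tokens can be approximated with the standard library alone. This is a hypothetical sketch (sketch_generate_tokens is not big's function): it hands tokenize.generate_tokens a readline from io.StringIO, then rebuilds each token's string by slicing the original input at the token's (row, column) positions, so a str subclass whose slices keep their type gets those slices back. It assumes each token's start and end positions map exactly onto the source text.

```python
import io
import tokenize


def sketch_generate_tokens(s):
    """Hypothetical sketch: tokenize a string, slicing token text from s itself."""
    # line_offsets[n] is the offset in s where (1-based) line n begins.
    # tokenize reads lines through StringIO.readline, so we split the same way.
    line_offsets = [0, 0]
    for line in io.StringIO(s).readlines():
        line_offsets.append(line_offsets[-1] + len(line))
    for tok in tokenize.generate_tokens(io.StringIO(s).readline):
        (start_row, start_col), (end_row, end_col) = tok.start, tok.end
        start = line_offsets[start_row] + start_col
        end = line_offsets[end_row] + end_col
        # Slicing s (rather than reusing tok.string) keeps str-subclass slices intact.
        yield tok._replace(string=s[start:end])
```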
0.12.8
2025/01/06
- Added search_path to the big.file module. search_path implements "search path" functionality; given a list of directories, a filename, and optionally a list of file extensions to try, it returns the first existing file that matches.
- multisplit and split_delimiters now properly support subclasses of str. All strings yielded by these functions are now guaranteed to be slices of the original s parameter passed in, or otherwise produced by making method calls on the original s parameter that return strings.
0.12.7
2024/12/15
A teeny tiny new feature.
LineInfo now supports a copy method, which returns a copy of the LineInfo object in its current state.
0.12.6
2024/12/13
It's a big release tradition! Here's another small big release, less than a day after the last big big release.
- New feature: decode_python_script now supports "universal newlines". It accepts a new newline parameter which behaves identically to the newline parameter for Python's built-in open function.
- Bugfix: The universal newlines support for read_python_file was broken in 0.12.5; the newline parameter was simply ignored. It now works great--it passes newline to decode_python_script. (Sorry I missed this; I use Linux and don't need to convert newlines.)
- Added Python 3.13 to the list of supported releases. It was already supported and tested; it just wasn't listed in the project metadata.
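As a rough illustration of what decode_python_script and its newline parameter do, here's a sketch built only on the standard library. tokenize.detect_encoding understands both a UTF-8 byte order mark and a PEP 263 coding line, and io.TextIOWrapper implements the same newline semantics as open. This is an approximation, not big's implementation; the name sketch_decode_python_script is hypothetical.

```python
import io
import tokenize


def sketch_decode_python_script(script, newline=None):
    """Hypothetical sketch: decode a bytes Python script to str."""
    # Honors a UTF-8 BOM (reported as 'utf-8-sig') or a PEP 263 coding line,
    # and falls back to 'utf-8' when neither is present.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(script).readline)
    # newline behaves as it does for open(): None translates CRLF and CR to LF,
    # while '' (or an explicit line ending) leaves line endings untranslated.
    with io.TextIOWrapper(io.BytesIO(script), encoding=encoding, newline=newline) as f:
        return f.read()
```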
Note: Whoops! Forgot to ever release 0.12.6 as a package. Oh well.
0.12.5
2024/12/13
- Added decode_python_script to the big.text module. decode_python_script scans a binary Python script and decodes it to Unicode--correctly. Python scripts can specify an explicit encoding in two different ways: a Unicode "byte order mark", or a PEP 263 "source file encoding" line. decode_python_script handles either, both, or neither.
- Added read_python_file to the big.file module. read_python_file reads a binary Python file from the filesystem and decodes it using decode_python_script.
- Added python_delimiters to the big.text module. This is a new predefined set of delimiters for use with split_delimiters, enabling it to correctly process Python scripts. python_delimiters defines all delimiters defined by Python, including all 100 possible string delimiters (no kidding!). If you want to parse the delimiters of Python code, and you don't want to use the Python tokenizer, you should use python_delimiters with split_delimiters.
  Note that defining python_delimiters correctly was difficult, and big's Delimiters API isn't expressive enough to express all of Python's semantics. At this point the python_delimiters object doesn't itself actually define all its semantics; rather, at module load time it's compiled into a special internal runtime format which is cached, and then there's manually-written code that tweaks this compiled form so python_delimiters can correctly handle Python's special cases. So, you're encouraged to use python_delimiters, but if you modify it and use the modified version, the modified version won't inherit all those tweaks, and will lose the ability to handle many of Python's weirder semantics.
  Important note: When you use python_delimiters, you must include the linebreak characters in the lines you split using split_delimiters. This is necessary to support the comment delimiter correctly, and to enforce the no-linebreaks-inside-single-quoted-strings rule.
  There can be small differences in Python's syntax from one version to another. python_delimiters is therefore version-sensitive, using the semantics appropriate for the version of Python it's being run under. If you want to parse Python delimiters using the semantics of another version of the language, use python_delimiters_version[s] instead, where s is a string containing the dotted Python major and minor version you want to use; for example, python_delimiters_version["3.10"] uses Python 3.10 semantics. (At the moment there are no differences between versions; this is planned for future versions of big.)
- Added python_delimiters_version to the big.text module. This maps simple Python version strings ("3.6", "3.13") to python_delimiters values implementing the semantics for that version. Currently all the values of this dict are identical, but that should change in the future.
- A breaking API change to split_delimiters is coming. split_delimiters now yields an object that can yield either three or four values. Prior to 0.12.5, the split_delimiters iterator always yielded a tuple of three values, called text, open, and close. But python_delimiters required adding a fourth value, change.
  When change is true, we are changing from one delimiter to another, without entering a new nested delimiter. The canonical example of this is inside a Python f-string:
  `f"{abc:35}"`
  Here the colon (:) is a "change" delimiter. Inside the curly braces inside the f-string, before the colon, the hash character (#) acts as a line comment character. But after the colon it's just another character. We've changed semantics, but we haven't pushed a new delimiter pair. The only way to accurately convey this behavior was to add this new change field to the values yielded by split_delimiters.
  The goal is to eventually transition to split_delimiters yielding all four of these values (text, open, close, and change). But this will be a gradual process; as of 0.12.5, existing split_delimiters calls will continue to work unchanged. split_delimiters now yields a custom object, called SplitDelimitersValue. This object is configurable to yield either three or four values. The rules are:
  - If you pass in yields=4 to split_delimiters, the object it yields will yield four values.
  - If you pass in delimiters=python_delimiters to split_delimiters, the object it yields will yield four values. (python_delimiters is new, so any calls using it must be new code; therefore this change won't break existing calls.)
  - Otherwise, the object yielded by split_delimiters will yield three values, as it did in versions prior to 0.12.5.
  split_delimiters will eventually change to always yielding four values, but big won't publish this change until at least June 2025. Six months after that change--at least December 2025--big will remove the yields parameter to split_delimiters.
- Minor semantic improvement: PushbackIterator no longer evaluates the iterator you pass in in a boolean context. (All we really needed to do was compare it to None, so now that's all we do.)
- A minor change to the Delimiter object used with split_delimiters: previously, the quoting and escape values had to agree, either both being true or both being false. However, python_delimiters necessitated relaxing this restriction, as there are some delimiters (! inside curly braces in an f-string, : inside curly braces in an f-string) that are "quoting" but don't have an escape string. So now, the restriction is simply that if escape is true, quoting must also be true.
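The three-or-four-values trick described above works because tuple unpacking iterates over the object. Here's a hypothetical stand-in for SplitDelimitersValue showing one way such an object could work; it isn't big's class, and its constructor signature is an assumption.

```python
class SketchSplitValue:
    """Hypothetical value that unpacks into 3 or 4 fields, depending on `yields`."""

    __slots__ = ("text", "open", "close", "change", "_yields")

    def __init__(self, text, open, close, change, *, yields=3):
        if yields not in (3, 4):
            raise ValueError("yields must be 3 or 4")
        self.text = text
        self.open = open
        self.close = close
        self.change = change
        self._yields = yields

    def __iter__(self):
        # Unpacking (a, b, c = value) calls __iter__, so the field count is configurable.
        yield self.text
        yield self.open
        yield self.close
        if self._yields == 4:
            yield self.change
```

Old code written as text, open, close = value keeps working with a three-value object, while new code can unpack four names from a four-value one. All four fields remain available as attributes either way.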
0.12.4
2024/11/15
- New function in the text module: format_map. This works like Python's str.format_map method, except it allows nested curly braces. Example:
  big.format_map("The {extension} file is {{extension} size} bytes.", {'extension': 'mp3', 'mp3 size': 8555})
- New method: Version.format is like strftime but for Version objects. You pass in a format string with Version attributes in curly braces, and it formats the string with values from that Version object.
- The Version constructor now accepts a packaging.Version object as an initializer. Embrace and extend!
- lines now takes two new arguments:
  - clip_linebreaks, default is true. If true, it clips the linebreaks off the lines before yielding them; otherwise it doesn't. (Either way, the linebreaks are still stored in info.end.)
  - source, default is an empty string. source should represent the source of the line in a way that's meaningful to the user. It's stored in the LineInfo objects yielded by lines, and should be incorporated into error messages.
- LineInfo.clip_leading and LineInfo.clip_trailing now automatically detect if you've clipped the entire line, and if so move all clipped text to info.trailing (and adjust the column_number accordingly).
- LineInfo.clip_leading and LineInfo.clip_trailing: Minor performance upgrade. Previously, if the user passed in the string to clip, the two functions would throw it away then recreate it. Now they just use the passed-in string.
- Changed the word "newline" to "linebreak" everywhere. They mean the same thing, but the Unicode standard consistently uses the word "linebreak"; I assume the boffins on the committee thought about this a lot and argued and finally settled on this word for good (if unpublished?) reasons.
- Added explicit support (and CI coverage & testing) for Python 3.13. (big didn't need any changes; it was already 100% compatible with 3.13.)
p.s. 56
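One way to resolve nested braces like the format_map example above: repeatedly replace the innermost {key} with its value until none remain. This is a hypothetical sketch, not big's format_map. Unlike str.format_map, it supports no format specs or conversions, doesn't treat {{ as an escape, and would re-expand any braces that appear inside substituted values.

```python
import re

# Matches a {key} containing no other braces, i.e. an innermost field.
_INNERMOST_FIELD = re.compile(r"\{([^{}]*)\}")


def sketch_format_map(s, mapping):
    """Hypothetical sketch: substitute {key} fields, innermost first."""
    while True:
        match = _INNERMOST_FIELD.search(s)
        if match is None:
            return s
        value = str(mapping[match.group(1)])
        s = s[:match.start()] + value + s[match.end():]
```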
0.12.3
2024/09/17
- Optimized split_delimiters. The new version uses a much more efficient internal representation of how to react to the various delimiters when processing the text. Perfunctory timeit experiments suggest this new split_delimiters is maybe 5-6% faster than it was in 0.12.2.
  Minor breaking change: split_delimiters now consistently raises SyntaxError for mismatched delimiters. (Previously it would sometimes raise ValueError.)
0.12.2
2024/09/11
- A minor semantic change to lines_strip_indent: when it encounters a whitespace-only line, it clips the line to trailing in the LineInfo object. It used to clip such lines to leading, but this changed LineInfo.column_number in a nonsensical way.
  This behavior is policy going forward: if a lines modifier function ever clips the entire line, it must clip it to trailing rather than leading. It shouldn't matter one way or another, as whitespace-only lines arguably shouldn't have any explicit semantics. But it makes intuitive sense to me that their empty line should be at column number 1, rather than 9 or 13 or whatnot. (Especially considering that with lines_strip_indent their indent value is synthetic anyway, inferred by looking ahead.)
- Major cleanup to the lines modifier test suites.
0.12.1
2024/09/07
In fine big tradition, here's an update published immediately after a big release.
Surprisingly, even though this is only a small update, it still adds two new packages to big: metadata and version.
There's sadly one breaking change.
big.metadata-
New package. A package containing metadata about big itself. Currently only contains one thing: version.
big.version-
New package. A package for working with version information.
lines_strip_line_comments
This API has breaking changes.
The default value for quotes has changed. Now it's what it should always have been: empty. No quote marks are defined by default, which means the default behavior of lines_strip_line_comments is now to simply truncate the line at the leftmost comment marker.
Processing quote marks by default was always too opinionated for this function. Consider: having ' active as a quote marker meant that single-quotes needed to be balanced, which means you couldn't process a line like this one, which only has one.
Wish I'd figured this out before the release yesterday! Hopefully this will only cause smiles, and no teeth-gnashing.
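With no quote marks defined, stripping a line comment reduces to "cut at the leftmost marker." Here's a minimal sketch of that default behavior. strip_line_comment is a hypothetical helper, not big's lines_strip_line_comments, which works on (info, line) pairs and can also honor quotes if you ask it to.

```python
def strip_line_comment(line, markers=("#",)):
    """Truncate line at the leftmost occurrence of any comment marker.

    No quote processing: an apostrophe (or a marker inside quotes) gets no
    special treatment, mirroring the new empty-quotes default.
    """
    positions = [i for i in (line.find(m) for m in markers) if i != -1]
    return line[:min(positions)] if positions else line
```

Because quotes aren't processed, a lone apostrophe like the one in "can't" no longer causes trouble.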
metadata.version
New value. A Version object representing the current version of big.
Version
New class. Version represents a version number. You can construct them from PEP 440-compliant version strings, or specify them using keyword-only parameters. Version objects are immutable, ordered, and hashable.
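"Immutable, ordered, and hashable" is a combination the standard library's dataclasses can provide directly. Here's a minimal sketch of that idea using a hypothetical SimpleVersion class. It is not big's Version: it only parses plain "major.minor.micro" strings, not full PEP 440, and it needs Python 3.7+ for dataclasses.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class SimpleVersion:
    """Hypothetical immutable, ordered, hashable version (not full PEP 440)."""

    major: int
    minor: int = 0
    micro: int = 0

    @classmethod
    def from_string(cls, s):
        # Accepts "1", "1.2", or "1.2.3"; anything else is rejected.
        parts = [int(p) for p in s.split(".")]
        if not 1 <= len(parts) <= 3:
            raise ValueError(f"unsupported version string {s!r}")
        return cls(*parts)
```

frozen=True makes instances read-only and, together with the generated __eq__, gives them a __hash__. order=True compares instances field by field, like comparing (major, minor, micro) tuples.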
0.12
2024/09/06
Lots of changes this time! Most of 'em are in the big.text module, particularly the lines and lines modifier functions. But plenty of other modules got in on the fun too.
big even has a new module: deprecated. Deprecated functions and classes get moved into this module. Note that the contents of deprecated are not automatically imported into big.all.
The following functions and classes have breaking changes:
These functions have been renamed:
- lines_filter_comment_lines is now lines_filter_line_comment_lines
- lines_strip_comments is now lines_strip_line_comments
- parse_delimiters is now split_delimiters
big has five new functions:
Finally, here's an in-depth description of all changes in big 0.12, sorted by API name.
bytes_linebreaks and bytes_linebreaks_without_crlf
Extremely minor change! Python's bytes and str objects don't agree on which ASCII characters represent line breaks. The str object obeys the Unicode standard, which means there are four:
\n \v \f \r
For some reason, Python's bytes object only supports two:
\n \r
I have no idea why this is. We might fix it. And if we do, big is ready. It now calculates bytes_linebreaks and bytes_linebreaks_without_crlf on the fly to agree with Python. If \v or \f (or both) work as newline characters for the splitlines method on a bytes object, they'll automatically be inserted into these iterables of bytes linebreaks.
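"Calculating on the fly" can be as simple as asking splitlines itself. Here's a minimal sketch of that probing idea. linebreaks_for is a hypothetical name, not big's code. The bytes result shown in the tests reflects current CPython, where bytes.splitlines only splits on \n, \r, and \r\n.

```python
# ASCII characters that str.splitlines() treats as line breaks (per the text above).
CANDIDATES = ("\n", "\v", "\f", "\r")


def linebreaks_for(kind):
    """Return the candidates that kind.splitlines() actually splits on.

    kind is str or bytes. The answer is probed at runtime rather than
    hard-coded, so it tracks whatever the running Python does.
    """
    found = []
    for c in CANDIDATES:
        probe = "x" + c + "y"
        if kind is bytes:
            c = c.encode("ascii")
            probe = probe.encode("ascii")
        # A real line break splits the probe into exactly two lines.
        if len(probe.splitlines()) == 2:
            found.append(c)
    return tuple(found)
```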
combine_splits
New function. If you split a string two different ways, producing two arrays that join back together into the original string, combine_splits will merge those splits together, producing a new array that splits in every place either of the two split arrays had a split.
Example:
>>> big.combine_splits("abcdefg", ['a', 'bcdef', 'g'], ['abc', 'd', 'efg'])
['a', 'bc', 'd', 'ef', 'g']
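One way to think about combine_splits: each split is a set of cut offsets, and merging splits means taking the union of those offsets. Here's a minimal sketch along those lines. combine_splits_sketch is a hypothetical reimplementation, not big's code, and its handling of edge cases such as empty pieces may differ from big's.

```python
from itertools import accumulate


def combine_splits_sketch(s, *splits):
    """Merge several splits of s into one split that cuts wherever any of them cut."""
    cuts = set()
    for split in splits:
        if "".join(split) != s:
            raise ValueError("every split must join back together into s")
        # Running totals of piece lengths are the offsets where this split cuts.
        cuts.update(accumulate(len(piece) for piece in split))
    result = []
    start = 0
    for end in sorted(cuts):
        if end > start:  # skip zero-length pieces and duplicate offsets
            result.append(s[start:end])
            start = end
    if start < len(s):  # only reachable when no splits were given
        result.append(s[start:])
    return result
```

For the example above, the first split cuts at offsets 1 and 6 and the second at 3 and 4. Together they cut at 1, 3, 4, and 6.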
Delimiter
This API has breaking changes.
Delimiter is a simple data class, representing information about delimiters to split_delimiters (previously parse_delimiters). split_delimiters has changed, and some of those changes are reflected in the Delimiter object; also, some changes to Delimiter are simply better API choices.
The old Delimiter object is deprecated but still available, as big.deprecated.Delimiter. It should only be used with big.deprecated.parse_delimiters, which is also deprecated. big.deprecated.Delimiter will be removed when big.deprecated.parse_delimiters is removed, which will be no sooner than September 2025.
Changes:
- The first argument to the old Delimiter object was open, and was stored as the open attribute. These have both been completely removed. Now, the "open delimiter" is specified as a key in a dictionary of delimiters, mapping open delimiters to Delimiter objects.
- The old Delimiter object had a boolean backslash attribute; if it was True, that delimiter allowed escaping using a backslash. Now Delimiter has an escape parameter and attribute, specifying the escape string you want to use inside that set of delimiters.
- Delimiter also now has two new attributes, quoting and multiline. These default to False and True respectively; you can specify values for these with keyword-only arguments to the constructor.
- The new Delimiter object is read-only after construction, and is hashable.
encode_strings
Slightly liberalized the types it accepts. It previously required o to be a collection; now o can be a bytes or str object. Also, it now explicitly supports set.
get_int_or_float
Minor behavior change. If the o you pass in is a float, or can be converted to float (but couldn't be converted directly to an int), get_int_or_float will experimentally convert that float to an int. If the resulting int compares equal to that float, it'll return the int, otherwise it'll return the float.
For example, get_int_or_float("13.5") still returns 13.5 (a float), but get_int_or_float("13.0") now returns 13 (an int). (Previously, get_int_or_float("13.0") would have returned 13.0.)
This better represents the stated aesthetic of the function--it prefers ints to floats. And since the int is exactly equal to the float, I assert this is completely backwards compatible.
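Here's a minimal sketch of the conversion order described above: int first, then float, then demote an integral float to int. get_int_or_float_sketch and its default parameter are hypothetical and not big's exact signature. One assumption worth calling out: a float argument skips the direct int() attempt, because int(13.5) would silently truncate to 13.

```python
import math


def get_int_or_float_sketch(o, default=None):
    """Prefer int; fall back to float; demote an integral float to int."""
    if not isinstance(o, float):  # int(13.5) would truncate, so skip floats here
        try:
            return int(o)
        except (TypeError, ValueError):
            pass
    try:
        f = float(o)
    except (TypeError, ValueError):
        return default
    # int() can't represent inf or nan, so only try finite values.
    if math.isfinite(f):
        i = int(f)
        if i == f:
            return i
    return f
```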
Heap
Minor updates to the documentation and to the text of some exceptions.
LineInfo
This API has breaking changes.
Breaking change: the LineInfo constructor has a new lines positional parameter, added in front of the existing positional parameters. This new first argument should be the lines iterator that yielded this LineInfo object. It's stored in the lines attribute. (Why this change? The lines object contains information needed by the lines modifiers, for example tab_width.)
Minor optimization: LineInfo objects previously had many optional fields, which might or might not be added dynamically. Now all fields are pre-added. (This makes the CPython 3.13 runtime happier; it really wants you to set all your class's attributes in its __init__.)
Minor breaking change: the original string stored in the line attribute now includes the linebreak character, if any. This means concatenating all the info.line strings will reconstruct the original s passed in to lines.
New feature: while some methods used to update the leading attribute when they clipped leading text from the line, the "lines modifiers" are now very consistent about updating leading, and the new symmetrical attribute trailing.
New feature: LineInfo now has an end attribute, which contains the end-of-line character that ended this line.
These three attributes allow us to assert a new invariant: as long as you don't modify the contents of line (e.g. turning tabs into spaces),
info.leading + line + info.trailing + info.end == info.line
LineInfo objects now always have these attributes:
- lines, which contains the base lines iterator.
- line, which contains the original unmodified line.
- line_number, which contains the line number of this line.
- column_number, which contains the starting column number of the first character of this line.
- indent, which contains the indent level of the line if computed, and None otherwise.
- leading, which contains the string stripped from the beginning of the line. Initially this is the empty string.
- trailing, which contains the string stripped from the end of the line. Initially this is the empty string.
- end, which is the end-of-line character that ended the current line. For the last line yielded, info.end will always be the empty string. If the last character of the text split by lines was an end-of-line character, the last line yielded will be the empty string, and info.end will also be the empty string.
- match, which contains a Match object if this line was matched with a regular expression, and None otherwise.
LineInfo.clip_leading and LineInfo.clip_trailing
LineInfo also has two new methods: LineInfo.clip_leading and LineInfo.clip_trailing(line, s). These methods clip a leading or trailing substring from the current line, and transfer it to the relevant field in LineInfo (either leading or trailing). clip_leading also updates the column_number attribute.
The name "clip" was chosen deliberately to be distinct from "strip". "strip" functions on strings remove substrings and throw them away; my "clip" functions on strings remove substrings and put them somewhere else.
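Here's a minimal sketch of how clipping preserves the leading + line + trailing + end invariant. LineInfoSketch is a hypothetical, trimmed-down stand-in, not big's LineInfo. Its constructor, and the way it updates column_number (one column per clipped character, ignoring tabs), are assumptions made for illustration.

```python
class LineInfoSketch:
    """Hypothetical, trimmed-down stand-in for LineInfo (not big's class)."""

    def __init__(self, text, end="", line_number=1, column_number=1):
        self.line = text + end  # the original text, linebreak included
        self.end = end
        self.line_number = line_number
        self.column_number = column_number
        self.leading = ""
        self.trailing = ""

    def clip_leading(self, line, s):
        """Move prefix s of line into self.leading; return the rest of line."""
        if not line.startswith(s):
            raise ValueError(f"line does not start with {s!r}")
        self.leading += s  # later clips come after earlier ones
        self.column_number += len(s)  # assumption: one column per character
        return line[len(s):]

    def clip_trailing(self, line, s):
        """Move suffix s of line into self.trailing; return the rest of line."""
        if not line.endswith(s):
            raise ValueError(f"line does not end with {s!r}")
        if s:  # guard: line[:-0] would wrongly return ""
            self.trailing = s + self.trailing  # later clips come before earlier ones
            line = line[:-len(s)]
        return line


text = "    x = 1  # note"
info = LineInfoSketch(text, end="\n")
line = info.clip_leading(text, "    ")
line = info.clip_trailing(line, "  # note")
assert info.leading + line + info.trailing + info.end == info.line
```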
lines_filter_comment_lines
lines_filter_comment_lines has been renamed to lines_filter_line_comment_lines. For backwards compatibility, the function is also available under the old name; this old name will eventually be removed, but not before September 2025.
lines_filter_line_comment_lines
This API has breaking changes.
New name for lines_filter_comment_lines.
Correctness improvements: lines_filter_line_comment_lines now enforces that single-quoted strings can't span lines, and multiline-quoted strings must be closed before the end of the last line.
Minor optimization: for every line, it used to lstrip a copy of the line, then use a regular expression to see if the line started with one of the comment characters. Now the regular expression itself skips past any leading whitespace.
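The optimization amounts to folding the lstrip into the pattern: a leading \s* skips the indent, so no stripped copy of the line is ever built. Here's a minimal sketch. The COMMENT_MARKERS tuple and the is_comment_line helper are hypothetical, not big's internals.

```python
import re

# Hypothetical comment markers, for illustration only.
COMMENT_MARKERS = ("#", "//")

# \s* skips any indent; the alternation then checks for a comment marker.
# re.match anchors at the start of the string, so no lstrip() copy is needed.
comment_line = re.compile(r"\s*(?:" + "|".join(map(re.escape, COMMENT_MARKERS)) + ")")


def is_comment_line(line):
    return comment_line.match(line) is not None
```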
lines_grep
New feature: lines_grep has always used re.search to examine the lines yielded. It now writes the result to info.match. (If you pass in invert=True to lines_grep, lines_grep still writes to the match attribute--but it always writes None.)
If you want to write the re.Match object to another attribute, pass in the name of that attribute to the keyword-only parameter match.
lines_rstrip and lines_strip
New feature: lines_rstrip and lines_strip now both accept a separators argument; this is an iterable of separators, like the argument to multisplit. The default value of None preserves the previous behavior, stripping whitespace.
lines_sort
New feature: lines_sort now accepts a key parameter, which is used as the key argument for list.sort. The value passed in to key is the (info, line) tuple yielded by the upstream iterator. The default value preserves the previous behavior, sorting by the line (ignoring the info).
lines_strip_comments
This function has been renamed lines_strip_line_comments and rewritten, see below. The old deprecated version will be available at big.deprecated.lines_strip_comments until at least September 2025.
Note that the old version of lines_strip_comments still uses the current version of LineInfo, so use of this deprecated function is still exposed to those breaking changes. (For example, LineInfo.line now includes the linebreak character that terminated the current line, if any.)
lines_strip_indent
Bugfix: lines_strip_indent previously required whitespace-only lines to obey the indenting rules, which was a mistake. My intention was always for lines_strip_indent to behave like Python, and that includes not really caring about the intra-line whitespace for whitespace-only lines. Now lines_strip_indent behaves more like Python: a whitespace-only line behaves as if it has the same indent as the previous line. (Not that the indent value of an empty line should matter--but this behavior is how you'd intuitively expect it to work.)
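Here's a minimal sketch of the "whitespace-only lines inherit the previous indent" rule. indent_widths is a hypothetical helper, not big's lines_strip_indent. It measures indentation as a column width (expanding tabs) rather than computing big's indent levels, and it doesn't check that dedents line up.

```python
def indent_widths(lines, tab_width=8):
    """Yield (indent_width, stripped_text) pairs.

    A whitespace-only line doesn't get an indent of its own; it reuses the
    indent of the previous line, the way Python treats blank lines.
    """
    indent = 0
    for line in lines:
        stripped = line.lstrip()
        if stripped:
            prefix = line[:len(line) - len(stripped)]
            indent = len(prefix.expandtabs(tab_width))
        yield indent, stripped
```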
lines_strip_line_comments
This API has breaking changes.
lines_strip_line_comments is the new name for the old lines_strip_comments lines modifier function. It's also been completely rewritten.
Changes:
- The old function required quote marks and the escape string to be single characters. The new function allows quote marks and the escape string to be of any length.
- The old function had a slightly-smelly triple_quotes parameter to support multiline strings. The new version supports separate parameters for single-line quote marks (quotes) and multiline quote marks (multiline_quotes).
- The backslash parameter has been renamed to escape.
- The rstrip parameter has been removed. If you need to rstrip the line after stripping the comment, wrap your lines_strip_line_comments call with a lines_rstrip call.
- The old function didn't enforce that strings shouldn't span lines--single-quoted and triple-quoted strings behaved identically. The new version raises SyntaxError if quoted strings using non-multiline quote marks contain newlines.
(lines_strip_line_comments has always been implemented using split_quoted_strings; this is why it now supports multicharacter quote marks and escape strings. It also benefits from the new optimizations in split_quoted_strings.)
multisplit
Minor optimizations.
multisplit used to locally define a new generator function, then call it and return the generator. I promoted the generator function to module level, which means we no longer rebind it each time multisplit is called. As a very rough guess, this can be as much as a 10% speedup for multisplit run on very short workloads. (It's also never slower.)
I also applied this same small optimization to several other functions in the text module. In particular, merge_columns was binding functions inside a loop (!!). (Dumb, huh!) These local functions are still bound inside merge_columns, but now at least they're outside the loop.
Another minor speedup for multisplit: when reverse=True, it used to reverse the results three times! multisplit now explicitly observes and manages the reversed state of the result to avoid needless reversing.
parse_delimiters
This function has been renamed split_delimiters and rewritten, see below. The old version is still available as big.deprecated.parse_delimiters, and will be available until at least September 2025.
Scheduler
Code cleanups both in the implementation and the test suite, including one minor semantic change.
Cleaned up Scheduler._next, the internal method that implements the heart of the scheduler. The only externally visible change: the previous version would call sleep(0) every time it yielded an event. On modern operating systems this should yield the rest of the current thread's time slice back to the OS's scheduler. This can make multitasking smoother, particularly in Python programs. But this is too opinionated for library code--if you want a sleep(0) there, by golly, you can call that yourself when the Scheduler object yields to you. I've restructured the code and eliminated this extraneous sleep(0).
Also, rewrote big chunks of the test suite (tests/test_scheduler.py). The multithreaded tests are now much better synchronized, while also becoming easier to read. Although it seems intractable to purge all race conditions from the test suite, this change has removed most of them.
split_delimiters
This API has breaking changes.
split_delimiters is the new name for the old parse_delimiters function. The function has also been completely re-tooled and re-written.
Changes:
- parse_delimiters took an iterable of Delimiter objects, or strings of length 2. split_delimiters takes a dictionary mapping open delimiter strings to Delimiter objects, and Delimiter objects no longer have an "open" attribute.
- split_delimiters now accepts a state parameter, which specifies the initial state of nested delimiters.
- split_delimiters no longer cares if there were unclosed open delimiters at the end of the string. (It used to raise ValueError.) This includes quote marks; if you don't want quoted strings to span multiple lines, it's up to you to detect it and react (e.g. raise an exception).
- The internal implementation has changed completely. parse_delimiters manually parsed the input string character by character. split_delimiters uses multisplit, so it zips past the uninteresting characters and only examines the delimiters and escape characters. It's always faster, except for some trivial calls (which are fast enough anyway).
- Another benefit of using multisplit: open delimiters, close delimiters, and the escape string may now all be any nonzero length. (In the face of ambiguity, split_delimiters will always choose the longer delimiter.)
See also changes to Delimiter.
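"Always choose the longer delimiter" is easy to get with Python's re module, because an alternation tries its alternatives left to right. Sorting the alternatives longest-first makes the longer one win when both match at the same position. This is a minimal sketch of that idea only. delimiter_pattern is a hypothetical helper, and this is not a description of how multisplit builds its patterns.

```python
import re


def delimiter_pattern(delimiters):
    """Compile a capturing alternation of delimiters that prefers longer ones."""
    ordered = sorted(delimiters, key=len, reverse=True)  # longest first
    return re.compile("(" + "|".join(map(re.escape, ordered)) + ")")
```

Because the group is capturing, re.split keeps the delimiters in its output, so you can see which one matched.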
split_quoted_strings
This API has breaking changes.
split_quoted_strings has been completely re-tooled and re-written. The new API is simpler, easier to understand, and conceptually clarified. It's a major upgrade!
Changes:
- The value it yields is different:
  - The old version yielded (is_quote, segment), where is_quote was a boolean value indicating whether or not segment was quoted. If segment was quoted, it began and ended with (single character) quote marks. To reassemble the original string, join together all the segment strings in order.
  - The new version yields (leading_quote, segment, trailing_quote), where leading_quote and trailing_quote are either matching quote marks or empty. If they're true values, the segment string is inside the quotes. To reassemble the original string, join together all the yielded strings in order.
- The backslash parameter has been replaced by a new parameter, escape. escape allows specifying the escape string, which defaults to a single backslash. If you specify a false value, there will be no escape character in strings.
- By default quotes only contains ' (single-quote) and " (double-quote). The previous version also recognized """ and ''' as multiline quote marks by default; this is no longer true, as it's too opinionated and Python-specific.
- The old version didn't actually distinguish between single-quoted strings and triple-quoted strings. It simply didn't care whether or not there were newlines inside quoted strings. The new version raises a SyntaxError if there's a newline character inside a string delimited with a quote marker from quotes.
- The old version accepted a stinky triple_quotes parameter. That's been removed in favor of a new parameter, multiline_quotes. multiline_quotes is like quotes, except that newline characters are allowed inside their quoted strings.
- split_quoted_strings accepts another new parameter, state, which sets the initial state of quoting.
- The old implementation of split_quoted_strings used a hand-coded parser, manually analyzing each character in the input text. Now it uses multisplit, so it only bothers to examine the interesting substrings. multisplit has a large startup cost the first time you use a particular set of separators, but this information is cached for subsequent calls. Bottom line, the new version is much faster for larger workloads. (It can be slower for trivial examples... where speed doesn't matter anyway.)
- Another benefit of switching to multisplit: quote delimiters and the escape string may now be any nonzero length. In the case of ambiguity--if more than one quote delimiter matches at a time--split_quoted_strings will always choose the longer delimiter.
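To illustrate the new (leading_quote, segment, trailing_quote) shape, here's a minimal character-scanning sketch. split_quoted_sketch is hypothetical and deliberately uses the old hand-coded-parser approach rather than big's multisplit-based one. It has no state parameter, and raising SyntaxError on an unterminated string is this sketch's own choice. It does follow the rules above: the escape string is honored inside quotes, longer quote marks win, and a newline is only allowed inside multiline_quotes strings.

```python
def split_quoted_sketch(s, quotes=("'", '"'), escape="\\", multiline_quotes=()):
    """Yield (leading_quote, segment, trailing_quote) triples covering all of s."""
    # Longest marks first, so '"""' beats '"' when both match.
    marks = sorted(set(quotes) | set(multiline_quotes), key=len, reverse=True)
    i = text_start = 0
    while i < len(s):
        mark = next((m for m in marks if s.startswith(m, i)), None)
        if mark is None:
            i += 1
            continue
        if text_start < i:  # unquoted text before this quote mark
            yield "", s[text_start:i], ""
        j = body_start = i + len(mark)
        while True:  # scan the quoted body for the closing mark
            if j >= len(s):
                raise SyntaxError(f"unterminated string starting at offset {i}")
            if escape and s.startswith(escape, j):
                j += len(escape) + 1  # skip the escape and the character it escapes
                continue
            if s.startswith(mark, j):
                break
            if s[j] == "\n" and mark not in multiline_quotes:
                raise SyntaxError(f"newline inside {mark!r} string")
            j += 1
        yield mark, s[body_start:j], mark
        i = text_start = j + len(mark)
    if text_start < len(s):  # trailing unquoted text
        yield "", s[text_start:], ""
```

As with the real function, joining every yielded string in order reproduces the input.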
split_title_case
New function. split_title_case splits a string at word boundaries, assuming the string is in "TitleCase".
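For a feel of what "splitting at TitleCase word boundaries" means, here's a deliberately simple regex sketch. split_title_case_sketch is hypothetical and is not big's algorithm. It assumes a word is a capitalized run, a leading lowercase run, or a run of digits. It doesn't attempt acronyms like "HTTPServer", and it drops any characters that match none of those patterns.

```python
import re

# Assumption: a "word" is a capitalized run, a lowercase run, or a digit run.
_WORD = re.compile(r"[A-Z][a-z]*|[a-z]+|[0-9]+")


def split_title_case_sketch(s):
    return _WORD.findall(s)
```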
StateManager
Small performance upgrade for StateManager observers. StateManager always uses a copy of the observer list (specifically, a tuple) when calling the observers; this means it's safe to modify the observer list at any time. StateManager used to always make a fresh copy every time you called an event; now it uses a cached copy, and only recomputes the tuple when the observer list changes.
(Note that it's not thread-safe to modify the observer list from one thread while also dispatching events in another. Your program won't crash, but the list of observers called may be unpredictable based on which thread wins or loses the race. But this has always been true. As with many libraries, the StateManager API leaves locking up to you.)
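Here's a minimal sketch of the cached-snapshot technique: dispatch iterates over a cached tuple, and any change to the list invalidates the cache. ObserverList is hypothetical, not StateManager's code. In particular, it assumes all changes go through its add and remove methods, which is how it knows when to throw the cached tuple away.

```python
class ObserverList:
    """Hypothetical sketch: dispatch over a cached, immutable snapshot."""

    def __init__(self):
        self._observers = []
        self._snapshot = None  # cached tuple; None means "recompute on next dispatch"

    def add(self, observer):
        self._observers.append(observer)
        self._snapshot = None  # the list changed, so invalidate the cache

    def remove(self, observer):
        self._observers.remove(observer)
        self._snapshot = None

    def dispatch(self, *args):
        snapshot = self._snapshot
        if snapshot is None:
            snapshot = self._snapshot = tuple(self._observers)
        # Iterating the tuple means observers may add or remove observers safely.
        for observer in snapshot:
            observer(*args)
```

An observer can remove itself mid-dispatch. The current dispatch finishes over the old tuple, and the next dispatch rebuilds it.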
p.s. I'm getting close to declaring big as being version 1.0. I don't want to do it until I'm done revising the APIs.
p.p.s. Updated copyright notices to 2024.
p.p.p.s. Yet again I thank Eric V. Smith for his willingness to humor me in my how-many-parameters-could-dance-on-the-head-of-a-pin API theological discussions.
0.11
released 2023/09/19
- Breaking change: renamed almost all the old whitespace and newlines tuples. Worse yet, one symbol has the same name but a different value: ascii_whitespace! I've also changed the suffix _without_dos to the more accurate and intuitive _without_crlf, and similarly changed newlines to linebreaks. Sorry for all the confusion. This resulted from a lot of research into whitespace and newline characters, in Python, Unicode, and ASCII; please see the new tutorial Whitespace and line-breaking characters in Python and big to see what all the fuss is about. Here's a summary of all the changes to the whitespace tuples:

    RENAMED TUPLES (old name -> new name)
        ascii_newlines -> bytes_linebreaks
        ascii_whitespace -> bytes_whitespace
        newlines -> linebreaks
        ascii_newlines_without_dos -> bytes_linebreaks_without_crlf
        ascii_whitespace_without_dos -> bytes_whitespace_without_crlf
        newlines_without_dos -> linebreaks_without_crlf
        whitespace_without_dos -> whitespace_without_crlf

    REMOVED TUPLES
        utf8_newlines
        utf8_whitespace
        utf8_newlines_without_dos
        utf8_whitespace_without_dos

    UNCHANGED TUPLES (same name, same meaning)
        whitespace

    NEW TUPLES
        ascii_linebreaks
        ascii_whitespace
        str_linebreaks
        str_whitespace
        unicode_linebreaks
        unicode_whitespace
        ascii_linebreaks_without_crlf
        ascii_whitespace_without_crlf
        str_linebreaks_without_crlf
        str_whitespace_without_crlf
        unicode_linebreaks_without_crlf
        unicode_whitespace_without_crlf
- Changed split_text_with_code implementation to use StateManager. (No API or semantic changes, just a change to the internal implementation.)
- New function in the big.text module: encode_strings, which takes a container object containing str objects and returns an equivalent object containing encoded versions of those strings as bytes.
- When you call multisplit with a type mismatch between 's' and 'separators', the exception it raises now includes the values of 's' and 'separators'.
- Added more tests for big.state to exercise all the string arguments of accessor and dispatch.
- The exhaustive multisplit tester now lets you specify test cases as cohesive strings, rather than forcing you to split the string manually.
- The exhaustive multisplit tester is better at internally verifying that it's doing the right thing. (There are some internal sanity checks, and those are more accurate now.)
- Whoops! The name of the main class in big.state is StateManager. I accidentally wrote StateMachine instead in the docs... several times.
- Originally the multisplit parameter 'separators' was required. I changed it to optional a while ago, with a default of None. (If you pass in None it uses big.str_whitespace or big.bytes_whitespace, depending on the type of s.) But the documentation didn't reflect this change until... now.
- Improved the prose in The multi-family of string functions tutorial. Hopefully now it does a better job of selling multisplit to the reader.
- The usual smattering of small doc fixes and improvements.
My thanks again to Eric V. Smith for his willingness to consider and discuss these issues. Eric is now officially a contributor to big, increasing the project's bus factor to two. Thanks, Eric!
0.10
released 2023/09/04
- Added the new big.state module, with its exciting StateManager class!
- int_to_words now supports the new ordinal keyword-only parameter, to produce ordinal strings instead of cardinal strings. (The number 1 as a cardinal string is 'one', but as an ordinal string is 'first'.)
- Added the pure_virtual decorator to big.builtin.
- The documentation is now much prettier! I finally discovered a syntax I can use to achieve a proper indent in Markdown, supported by both GitHub and PyPI. You simply nest the text you want indented inside an HTML description list as the description text, and skip the description term (so the markup is just <dl><dd>). Note that you need a blank line after the <dl><dd> line, or else Markdown will ignore the markup in the following paragraph. Thanks to Hugo van Kemenade for his help confirming this! Oh, and, Hugo also fixed the image markup so the big banner displays properly on PyPI. Thanks, Hugo!
0.9.2
released 2023/07/22
Extremely minor release. No new features or bug fixes.
- Fixed coverage, now back to the usual 100%. (This just required changing the tests, which didn't find any new bugs.)
- Made the tests for Log deterministic. They now use a fake clock that always returns the same values.
- Added GitHub Actions integration. Tests and coverage are run in the cloud after every checkin. Thanks to Dan Pope for gently walking me through this!
- Fixed metadata in the pyproject.toml file.
- Added badges for testing, coverage, and supported Python versions.
0.9.1
released 2023/06/28
0.9
released 2023/06/15
- Bugfix! If an outer class Outer had an inner class Inner decorated with @BoundInnerClass, and o is an instance of Outer, and o evaluated to false in a boolean context, o.Inner would be the unbound version of Inner. Now it's the bound version, as is proper.
- Modified tests/test_boundinnerclasses.py:
  - Added regression test for the above bugfix (of course!).
  - It now takes advantage of that newfangled "zero-argument super".
  - Added testing of an unbound subclass of an unbound subclass.
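The bug above is the classic truthiness pitfall in a descriptor's __get__: testing "not obj" where "obj is None" was meant. Here's a minimal, hypothetical descriptor that shows both tests side by side. BindOnAccess is not part of big, and it binds a plain function rather than an inner class, but the pitfall is the same. Outer subclasses list so that an empty instance is falsy.

```python
class BindOnAccess:
    """Hypothetical descriptor illustrating the falsy-instance bug (not big's code)."""

    def __init__(self, func, buggy=False):
        self.func = func
        self.buggy = buggy

    def __get__(self, obj, objtype=None):
        # Buggy test: a falsy instance (empty container, zero, ...) looks like
        # "accessed on the class", so the unbound object comes back.
        if self.buggy and not obj:
            return self.func
        # Correct test: only None means "accessed on the class".
        if obj is None:
            return self.func
        return lambda *args: self.func(obj, *args)


class Outer(list):
    good = BindOnAccess(lambda self: "bound")
    bad = BindOnAccess(lambda self: "bound", buggy=True)


o = Outer()  # an empty list, so bool(o) is False
```

Here o.good() returns "bound", but o.bad hands back the raw, unbound function.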
0.8.3
released 2023/06/11
- Added int_to_words.
- All tests now insert the local big directory onto sys.path, so you can run the tests on your local copy without having to install. Especially convenient for testing with old versions of Python!
Note: tomorrow, big will be one year old!
0.8.2
released 2023/05/19
- Convert all iterator functions to use my new approach: instead of checking arguments inside the iterator, the function you call checks arguments, then has a nested iterator function which it runs and returns the result. This means bad inputs raise their exceptions at the call site where the iterator is constructed, rather than when the first value is yielded by the iterator!
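Here's a minimal sketch of that pattern, using a hypothetical chunks function that isn't part of big. The public function validates its arguments eagerly, then returns a generator created by a separate module-level generator function, so chunks(seq, 0) raises ValueError right at the call. If the check lived inside a generator function instead, nothing would happen until the first next() call.

```python
def _chunks(seq, size):
    # The generator proper: by the time it runs, the arguments are known-good.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


def chunks(seq, size):
    """Hypothetical example: validate eagerly, then return a generator."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return _chunks(seq, size)
```

Defining _chunks at module level, rather than nesting it inside chunks, also avoids rebinding a function object on every call.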
0.8.1
released 2023/05/19
- Added parse_delimiters (ed: now split_delimiters) and Delimiter.
0.8
released 2023/05/18
- Major retooling of str and bytes support in big.text.
  - Functions in big.text now uniformly accept str or bytes or a subclass of either. See the Support for bytes and str section for how it works.
  - Functions in big.text are now more consistent about raising TypeError vs ValueError. If you mix bytes and str objects together in one call, you'll get a TypeError, but if you pass in an empty iterable (of a correct type) where a non-empty iterable is required you'll get a ValueError. big.text generally tries to give the TypeError higher priority; if you pass in a value that fails both the type check and the value check, the big.text function will raise TypeError first.
- Major rewrite of re_rpartition. I realized it had the same "reverse mode" problem that I fixed in multisplit back in version 0.6.10: the regular expression should really search the string in "reverse mode", from right to left. The difference is whether the regular expression potentially matches against overlapping strings. When in forwards mode, the regular expression should prefer the leftmost overlapping match, but in reverse mode it should prefer the rightmost overlapping match. Most of the time this produces the same list of matches as you'd find searching the string forwards--but sometimes the matches come out very different. This was way harder to fix with re_rpartition than with multisplit, because Python's re module only supports searching forwards. I have to emulate reverse-mode searching by manually checking for overlapping matches and figuring out which one(s) to keep--a lot of work! Fortunately it's only a minor speed hit if you don't have overlapping matches. (And if you do have overlapping matches, you're probably just happy re_rpartition now produces correct results--though I did my best to make it performant anyway.) In the future, big will probably add support for the PyPI package regex, which reimplements Python's re module but adds many features... including reverse mode!
- New function: reversed_re_finditer. Behaves almost identically to the Python standard library function re.finditer, yielding non-overlapping matches of pattern in string. The difference is, reversed_re_finditer searches string from right to left. (Written as part of the re_rpartition rewrite mentioned above.)
- Added apostrophes, double_quotes, ascii_apostrophes, ascii_double_quotes, utf8_apostrophes, and utf8_double_quotes to the big.text module. Previously the first four of these were hard-coded strings inside gently_title. (And the last two didn't exist!)
- Code cleanup in split_text_with_code: removed redundant code. I think it has about the same number of if statements; if anything it might be slightly faster.
- Retooled re_partition and re_rpartition slightly, should now be very-slightly faster. (Well, re_rpartition will be slower if your pattern finds overlapping matches. But at least now it's correct!)
- Lots and lots of doc improvements, as usual.
0.7.1
released 2023/03/13
- Tweaked the implementation of multisplit. Internally, it does the string splitting using re.split, which returns a list. It used to iterate over the list and yield each element. But that meant keeping the entire list around in memory until multisplit exited. Now, multisplit reverses the list, pops off the final element, and yields that. This means multisplit drops its reference to each split string as it yields it, which may help in low-memory situations.
- Minor doc fixes.
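Here's a minimal sketch of the reverse-and-pop technique described above. split_and_release is a hypothetical name, not big's multisplit. list.pop() from the end is O(1), so reversing once up front lets the generator hand out pieces in their original order. Meanwhile the list stops referencing each piece as soon as it's yielded.

```python
import re


def split_and_release(pattern, s):
    """Yield re.split() pieces, dropping the list's reference to each as we go."""
    pieces = re.split(pattern, s)
    pieces.reverse()  # so the cheap pop() from the end yields in original order
    while pieces:
        yield pieces.pop()
```

Whether memory is actually freed still depends on the caller: a piece the caller keeps stays alive, but the generator no longer pins it.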
0.7
released 2023/03/11
- Breaking changes to the Scheduler:
  - It's no longer thread-safe by default, which means it's much faster for non-threaded workloads.
  - The lock has been moved out of the Scheduler object and into the Regulator. Among other things, this means that the Scheduler constructor no longer takes a lock argument.
  - Regulator is now an abstract base class. big.scheduler also provides two concrete implementations: SingleThreadedRegulator and ThreadSafeRegulator.
  - Regulator and Event are now defined in the big.scheduler namespace. They were previously defined inside the Scheduler class.
  - The arguments to the Event constructor were rearranged. (You shouldn't care, as you shouldn't be manually constructing Event objects anyway.)
  - The Scheduler now guarantees that it will only call now and wake on a Regulator object while holding that Regulator's lock.
- Minor doc fixes.
0.6.18
released 2023/03/09
- Retooled multisplit and multistrip argument verification code. Both functions now consistently check all their inputs, and use consistent error messages when raising an exception.
0.6.17
released 2023/03/09
- Fixed a minor crashing bug in multisplit: if you passed in a list of separators (or separators was of any non-hashable type), and reverse was true, multisplit would crash. It used separators as a key into a dict, which meant separators had to be hashable.
- multisplit now verifies that the s passed in is either str or bytes.
- Updated all copyright date notices to 2023.
- Lots of doc fixes.
0.6.16
released 2023/02/26
- Fixed Python 3.6 support! Some equals-signs-in-f-strings and some other anachronisms had crept in. 0.6.16 has been tested on all versions from 3.6 to 3.11 (as well as having 100% coverage).
- Made the dateutils package an optional dependency. Only one function needs it, parse_timestamp_3339Z().
- Minor cleanup in PushbackIterator(). It also uses slots now, which should make it a bit faster.
0.6.15
released 2023/01/07
- Added the new functions datetime_ensure_timezone(d, timezone) and datetime_set_timezone(d, timezone). These allow you to ensure or explicitly set a timezone on a datetime.datetime object.
- Added the timezone argument to parse_timestamp_3339Z().
- gently_title() now capitalizes the first letter after a left parenthesis.
- Changed the secret multirpartition function slightly. Its reverse parameter now means to un-reverse its reversing behavior. Stated another way, multipartition(reverse=X) and multirpartition(reverse=not X) now do the same thing.
0.6.14
released 2022/12/11
- Improved the text of the RuntimeError raised by TopologicalSorter.View when the view is incoherent. Now it tells you exactly what nodes are conflicting.
- Expanded the tutorial on multisplit.
0.6.13
released 2022/12/11
- Changed translate_filename_to_exfat(s) behavior: when modifying a string with a colon (':') not followed by a space, it used to convert it to a dash ('-'). Now it converts the colon to a period ('.'), which looks a little more natural. A colon followed by a space is still converted to a dash followed by a space.
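The colon rule above fits in two str.replace calls. The order matters: the ": " case has to be handled first, or the bare-colon replacement would swallow its colon. translate_colons is a hypothetical helper that covers only this one rule. It is not big's translate_filename_to_exfat.

```python
def translate_colons(filename):
    """Hypothetical sketch of the colon rule only (exFAT forbids ':' in names)."""
    # ": " -> "- " first, then any remaining bare ":" -> "."
    return filename.replace(": ", "- ").replace(":", ".")
```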
0.6.12
tagged 2022/12/04
- Bugfix: TopologicalSorter.print() sorts the list of nodes, for consistency's sake and for ease of reading. But if the node objects don't support < or > comparison, that throws an exception. TopologicalSorter.print() now catches that exception and simply skips sorting. (It's only a presentation thing anyway.)
- Added a secret (otherwise undocumented!) function: multirpartition, which is like multipartition but with reverse=True.
- Added the list of conflicted nodes to the "node is incoherent" exception text.
Note: although version 0.6.12 was tagged, it was never packaged for release.
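The print() bugfix boils down to "sort if you can, otherwise leave the order alone". A hypothetical helper showing that pattern (display_order_sketch is not part of big):

```python
def display_order_sketch(nodes):
    """Hypothetical helper: sort nodes for display, or keep their order.

    Sorting is purely cosmetic here, so a TypeError from uncomparable
    nodes is swallowed instead of propagating to the caller.
    """
    nodes = list(nodes)
    try:
        return sorted(nodes)
    except TypeError:
        # The node objects don't support ordering; skip sorting.
        return nodes

sorted_names = display_order_sketch(["b", "c", "a"])
mixed = display_order_sketch([3, "a", 1])
```

In Python 3, comparing unrelated types such as int and str raises TypeError, which is exactly the exception this helper catches.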
0.6.11
tagged 2022/11/13
- Changed the import strategy. The top-level big module used to import all its child modules, and import * all the symbols from all those modules. But a friend (hi Mark Shannon!) talked me out of this. It's convenient, but if a user doesn't care about a particular module, why make them import it? So now the top-level big module contains nothing but a version number, and you can either import just the submodules you need, or you can import big.all to get all the symbols (like big itself used to do).
Note: although version 0.6.11 was tagged, it was never packaged for release.
0.6.10
released 2022/10/26
- All code changes had to do with multisplit:
  - Fixed a subtle bug. When splitting with a separator that can overlap itself, like ' x ', multisplit will prefer the leftmost instance. But when reverse=True, it must prefer the rightmost instance. Thanks to Eric V. Smith for suggesting the clever "reverse everything, call re.split, and un-reverse everything" approach. That let me fix this bug while still implementing on top of re.split!
  - Implemented PROGRESSIVE mode for the strip keyword. This behaves like str.strip: when splitting, strip on the left, then start splitting. If we don't exhaust maxsplit, strip on the right; if we do exhaust maxsplit, don't strip on the right. (Similarly for str.rstrip when reverse=True.)
  - Changed the default for strip to False. It used to be NOT_SEPARATE. But this was too surprising--I'd forget that it was the default, and turning on keep wouldn't return everything I thought I should get, and I'd head off to debug multisplit, when in fact it was behaving as specified. The Principle Of Least Surprise tells me that strip defaulting to False is less surprising. Also, maintaining the invariant that all the keyword-only parameters to multisplit default to False is a helpful mnemonic device in several ways.
  - Removed NOT_SEPARATE (and the not-yet-implemented STR_STRIP) modes for strip. They're easy to implement yourself, and this removes some surface area from the already-too-big multisplit API.
- Modernized pyproject.toml metadata to make flit happier. This was necessary to ensure that pip install big also installs its dependencies.
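The "reverse everything, call re.split, and un-reverse everything" trick is small enough to sketch. This is a simplified illustration assuming plain, non-empty literal separator strings and none of multisplit's keep, strip, or separate handling; split_sketch is a hypothetical name. Reversing the text turns "rightmost match" into "leftmost match", which is what re.split naturally finds.

```python
import re

def split_sketch(s, separators, maxsplit=0, reverse=False):
    """Split s on any of the (non-empty, literal) separators.

    Hypothetical illustration of the reverse-everything trick, not big's
    multisplit. maxsplit=0 means "no limit", as with re.split.
    """
    if reverse:
        # Reverse the text and every separator, so the rightmost match in
        # the original becomes the leftmost match re.split will find.
        s = s[::-1]
        separators = [sep[::-1] for sep in separators]
    # Try longer separators first so a separator isn't shadowed by its prefix.
    ordered = sorted(separators, key=len, reverse=True)
    pattern = "|".join(re.escape(sep) for sep in ordered)
    pieces = re.split(pattern, s, maxsplit=maxsplit)
    if reverse:
        # Un-reverse: restore each piece and the order of the pieces.
        pieces = [piece[::-1] for piece in reversed(pieces)]
    return pieces

forward = split_sketch("a x x b", [" x "])
backward = split_sketch("a x x b", [" x "], reverse=True)
```

With the self-overlapping separator ' x ', the forward split claims the left occurrence and yields ['a', 'x b'], while the reversed split claims the right one and yields ['a x', 'b'].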
0.6.8
released 2022/10/16
- Renamed two of the three freshly-added lines modifier functions: lines_filter_contains is now lines_containing, and lines_filter_grep is now lines_grep.
0.6.7
released 2022/10/16
- Added three new lines modifier functions to the text module: lines_filter_contains, lines_filter_grep, and lines_sort.
- gently_title now accepts str or bytes. Also added the apostrophes and double_quotes arguments.
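Lines modifiers compose naturally as chained generators. The sketch below assumes each modifier simply takes and yields plain strings; big's real lines modifiers operate on the output of its lines() function, so these functions and their signatures are hypothetical.

```python
import re

# Hypothetical generator filters over plain strings, not big's API.

def lines_containing_sketch(lines, substring):
    """Yield only the lines that contain substring."""
    for line in lines:
        if substring in line:
            yield line

def lines_grep_sketch(lines, pattern):
    """Yield only the lines where the regex pattern matches somewhere."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line

def lines_sort_sketch(lines):
    """Return an iterator over the lines in sorted order."""
    return iter(sorted(lines))

text = "pear\napple\nplum\nbanana"
result = list(
    lines_sort_sketch(
        lines_grep_sketch(lines_containing_sketch(text.splitlines(), "p"), r"^p")
    )
)
```

Because the filters are lazy generators, lines flow through the chain one at a time; only the sort has to materialize everything.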
0.6.6
released 2022/10/14
- Fixed a bug in multisplit. I thought when using keep=AS_PAIRS that it shouldn't ever emit a 2-tuple containing just empty strings--but on further reflection I've realized that emitting one can be correct. This behavior is now tested and documented, along with the reasoning behind it.
- Added the reverse flag to re_partition.
- whitespace_without_dos and newlines_without_dos still had the DOS end-of-line sequence in them! Oops!
  - Added a unit test to check that. The unit test also ensures that whitespace, newlines, and all the variants (utf8_, ascii_, and _with_dos) exactly match the set of characters Python considers whitespace and newline characters.
- Lots more documentation and formatting fixes.
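One way to ask Python itself which characters it considers whitespace and which it treats as line breaks is to scan every code point with str.isspace() and str.splitlines(). This is a hypothetical reconstruction of the idea behind such a test, not big's actual test. It only covers single characters, so the two-character '\r\n' entry would need a separate check.

```python
import sys

def python_whitespace_and_newlines():
    """Return (whitespace, newlines) as sets of single characters.

    Hypothetical helper: derives both sets from Python's own str methods
    so a hand-maintained table can be compared against them.
    """
    whitespace, newlines = set(), set()
    for code_point in range(sys.maxunicode + 1):
        c = chr(code_point)
        if c.isspace():
            whitespace.add(c)
        # splitlines() breaks "x<c>x" into two pieces only if c is a line boundary.
        if len(("x" + c + "x").splitlines()) == 2:
            newlines.add(c)
    return whitespace, newlines

ws, nl = python_whitespace_and_newlines()
```

A test could then assert that a hand-written whitespace string, minus any multi-character entries, equals ws exactly.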
0.6.5
released 2022/10/13
- Added the new itertools module, which so far only contains PushbackIterator.
- Added lines_strip_comments [ed: now lines_strip_line_comments] and split_quoted_strings to the text module.
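A pushback iterator lets a parser read a value, decide it isn't ready for it yet, and "un-read" it. Here's a minimal sketch, not big's PushbackIterator: the class name and the push() method are assumptions for illustration. It also declares __slots__, the same kind of change the 0.6.16 notes mention.

```python
class PushbackIteratorSketch:
    """Hypothetical minimal pushback iterator, not big's PushbackIterator.

    Values given to push() are returned again (last pushed, first
    returned) before the wrapped iterator resumes.
    """
    # __slots__ avoids a per-instance __dict__, saving memory and lookups.
    __slots__ = ("_iterator", "_stack")

    def __init__(self, iterable=()):
        self._iterator = iter(iterable)
        self._stack = []

    def __iter__(self):
        return self

    def __next__(self):
        if self._stack:
            return self._stack.pop()
        return next(self._iterator)

    def push(self, value):
        """Push value back so the next call to next() returns it."""
        self._stack.append(value)

it = PushbackIteratorSketch([1, 2, 3])
first = next(it)
it.push(first)   # changed our mind: put it back
```

Using a list as a stack keeps push() and the pushed-value path of __next__() O(1).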
0.6.1
released 2022/10/13
- I realized that whitespace should contain the DOS end-of-line sequence ('\r\n'), as it should be considered a single separator when splitting etc. I added that, along with whitespace_no_dos, and naturally utf8_whitespace_no_dos and ascii_whitespace_no_dos too.
- Minor doc fixes.
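Why '\r\n' should count as one separator is easy to see with re.split. This illustration uses hand-written regexes rather than big's whitespace list: with only '\r' and '\n' available, a DOS line ending splits twice and leaves a spurious empty string.

```python
import re

text = "alpha\r\nbeta"

# Splitting on each end-of-line character separately breaks "\r\n" in two.
per_character = re.split(r"[\r\n]", text)

# Listing "\r\n" first in the alternation makes the regex prefer the
# two-character sequence over its single-character pieces.
dos_aware = re.split(r"\r\n|\r|\n", text)
```

The first split gives ['alpha', '', 'beta']; the second gives ['alpha', 'beta'], which matches what str.splitlines() returns.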
0.6
released 2022/10/13
A big upgrade!
- Completely retooled and upgraded multisplit, and added multistrip and multipartition, collectively called the multi- family of string functions. (Thanks to Eric Smith for suggesting multipartition! Well, sort of.)
  - multisplit now supports five (!) keyword-only parameters, allowing the caller to tune its behavior to an amazing degree.
  - Also, the original implementation of multisplit got its semantics a bit wrong; it was inconsistent and maybe a little buggy.
  - multistrip is like str.strip but accepts an iterable of separator strings. It can strip from the left, right, both, or neither (in which case it does nothing).
  - multipartition is like str.partition, but accepts an iterable of separator strings. It can also partition more than once, and supports reverse=True, which causes it to partition from the right (like str.rpartition).
  - Also added useful predefined lists of separators for use with all the multi- functions: whitespace and newlines, with ascii_ and utf8_ versions of each, and without_dos variants of all three newlines variants.
- Added the Scheduler and Heap classes. Scheduler is a replacement for Python's sched.scheduler class, with a modernized interface and a major upgrade in functionality. Heap is an object-oriented interface to Python's heapq module, used by Scheduler. These are in their own modules, big.heap and big.scheduler.
- Added lines and all the lines_ modifiers. These are great for writing little text parsers. For more information, please see the tutorial on lines and lines modifier functions.
- Removed stripped_lines and rstripped_lines from the text module, as they're superseded by the far superior lines family.
- Enhanced normalize_whitespace. Added the separators and replacement parameters, and added support for bytes objects.
- Added the count parameter to re_partition and re_rpartition.
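Heap is described as an object-oriented interface to heapq. A minimal sketch of that idea follows, assuming method names borrowed from collections.deque (append, popleft) plus a first property; HeapSketch and all of its members are hypothetical and not big's Heap.

```python
import heapq

class HeapSketch:
    """Hypothetical OO wrapper around heapq, not big's Heap.

    heapq operates on a plain list passed to module-level functions;
    wrapping the list keeps the heap invariant an internal detail.
    """
    __slots__ = ("_queue",)

    def __init__(self, items=()):
        self._queue = list(items)
        heapq.heapify(self._queue)

    def __len__(self):
        return len(self._queue)

    def append(self, item):
        """Add item, preserving the heap invariant."""
        heapq.heappush(self._queue, item)

    def popleft(self):
        """Remove and return the smallest item (IndexError if empty)."""
        return heapq.heappop(self._queue)

    @property
    def first(self):
        """The smallest item, without removing it."""
        return self._queue[0]

h = HeapSketch([5, 1, 4])
h.append(2)
```

A scheduler built on something like this could store (time, event) entries so the next event due is always first; that's an illustration of the general approach, not a description of big's Scheduler internals.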
0.5.2
released 2022/09/12
- Added stripped_lines and rstripped_lines to the text module.
- Added support for len to the TopologicalSorter object.
0.5.1
released 2022/09/04
- Added gently_title and normalize_whitespace to the text module.
- Changed translate_filename_to_exfat to handle translating ':' in a special way. If the colon is followed by a space, then the colon is turned into ' -'. This yields a more natural translation when colons are used in text, e.g. 'xXx: The Return Of Xander Cage' is translated to 'xXx - The Return Of Xander Cage'. If the colon is not followed by a space, it turns the colon into '-'. This is good for tiresome modern gobbledygook like 'Re:code', which will now be translated to 'Re-code'.
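A minimal sketch of whitespace normalization, assuming the function collapses each run of whitespace into one replacement string. normalize_whitespace_sketch is hypothetical; it mirrors the replacement parameter and bytes support mentioned in the 0.6 notes, but not the separators parameter. For bytes patterns, \s matches only ASCII whitespace.

```python
import re

# Hypothetical sketch, not big's normalize_whitespace. Assumption: every
# run of whitespace collapses into a single replacement string.
_STR_WHITESPACE_RUN = re.compile(r"\s+")     # Unicode whitespace for str
_BYTES_WHITESPACE_RUN = re.compile(rb"\s+")  # ASCII whitespace only for bytes

def normalize_whitespace_sketch(s, replacement=None):
    if isinstance(s, bytes):
        pattern, default = _BYTES_WHITESPACE_RUN, b" "
    else:
        pattern, default = _STR_WHITESPACE_RUN, " "
    if replacement is None:
        replacement = default
    # A function replacement stops re from interpreting backslashes in it.
    return pattern.sub(lambda match: replacement, s)

collapsed = normalize_whitespace_sketch("a  b\t\nc")
```

This sketch leaves a single separator at each end rather than stripping the ends; whether big's function strips them is not stated here.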
0.5
released 2022/06/12
- Initial release.
File details
Details for the file big-0.13.1.tar.gz.
File metadata
- Download URL: big-0.13.1.tar.gz
- Upload date:
- Size: 5.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 27c351f8e844801760119fcfe6e3659b13e05c18cebc4d12b1b435a675317932 |
| MD5 | 4252e6614718d08ae8eae143064a827b |
| BLAKE2b-256 | 48149010efdb344ac13050bacdf5c4c060f89c5b6435397c3858b3750f51d6e7 |
File details
Details for the file big-0.13.1-py3-none-any.whl.
File metadata
- Download URL: big-0.13.1-py3-none-any.whl
- Upload date:
- Size: 290.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 00a4f07dc2f19157a9e7e3d7ef3f87476d26f670fbf9f37887e4541e18c9cadb |
| MD5 | d2a8476f02865359190a5fd739ab57c1 |
| BLAKE2b-256 | e9e445d5939cce7944ff2638aa49fab1017357e964a9957131eeda2cd9a8ab5d |