Facilities to do with buffers, particularly CornuCopyBuffer, an automatically refilling buffer to support parsing of data streams.
Latest release 20200328: CornuCopyBuffer.takev: bugfix adjustment of buf.offset, was not always done. CornuCopyBuffer.__getitem__: add slice support, note how expensive it is to use.
Function chunky(bfr_func)
Decorator for a function accepting a leading CornuCopyBuffer parameter.
Returns a function accepting a leading data chunks parameter (bytes instances) and optional offset and copy_offsets keyword parameters.
Example:
    @chunky
    def func(bfr, ...):
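The adaptation that chunky performs can be pictured with a small stand-in. This is a sketch only, not the library's code: chunky_sketch and read_header are hypothetical names, an io.BytesIO stands in for the CornuCopyBuffer that the real decorator would presumably construct, and the offset and copy_offsets keywords are omitted.

```python
import functools
import io


def chunky_sketch(bfr_func):
    """Adapt bfr_func(bfr, ...) into a function taking an iterable of chunks."""

    @functools.wraps(bfr_func)
    def chunks_func(chunks, *a, **kw):
        # Stand-in only: join the chunks into a BytesIO where the real
        # decorator would wrap them in a buffer object.
        bfr = io.BytesIO(b"".join(chunks))
        return bfr_func(bfr, *a, **kw)

    return chunks_func


@chunky_sketch
def read_header(bfr, size):
    """Hypothetical parser: return the first size bytes."""
    return bfr.read(size)


# The caller passes chunks; the parser sees a single buffer-like object.
header = read_header([b"ab", b"cdef"], 3)
```

The point of the pattern is that the parser is written once against the buffer interface, while callers holding raw chunks do not need to build the buffer themselves.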
Class CopyingIterator
Wrapper for an iterator that copies every item retrieved to a callable.
Method CopyingIterator.__init__(self, I, copy_to)
Initialise with the iterator I and the callable copy_to.
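The behaviour described for CopyingIterator can be sketched in a few lines. This is an illustrative stand-in under the assumption that each item is passed to copy_to as it is retrieved; the class name CopyingIteratorSketch is hypothetical and this is not the library's source.

```python
class CopyingIteratorSketch:
    """Wrap an iterator, passing every retrieved item to copy_to."""

    def __init__(self, I, copy_to):
        self.I = iter(I)
        self.copy_to = copy_to

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self.I)  # StopIteration propagates to the caller
        self.copy_to(item)
        return item


# Collect a copy of every chunk that passes through the wrapper.
seen = []
chunks = list(CopyingIteratorSketch([b"ab", b"cd", b"e"], seen.append))
```

A wrapper like this lets a consumer (for example a hash function's update method) observe a data stream without the parser knowing about it.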
Class CornuCopyBuffer
An automatically refilling buffer intended to support parsing of data streams.
Its purpose is to aid binary parsers which do not themselves need to handle sources specially; CornuCopyBuffers are trivially made from bytes, iterables of bytes and file-like objects.
See cs.binary for convenient parsing classes which work against CornuCopyBuffers.
Attributes:
buf: the first of any buffered leading chunks of unparsed data from the input, available for direct inspection by parsers; normally however parsers will use .extend and .take.
offset: the logical offset of the buffer; this excludes buffered data and unconsumed input data.
The primary methods supporting parsing of data streams are .extend() and .take().
Calling .extend(min_size) arranges that .buf contains at least min_size bytes.
Calling .take(size) fetches exactly size bytes from .buf and the input source if necessary and returns them, adjusting .buf.
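The .extend()/.take() contract described above can be illustrated with a toy stand-in. This is not the library's implementation: RefillBufferSketch is a hypothetical name, it keeps a single bytes buffer where the real class keeps a list of chunks, and raising EOFError on short input is an assumption of this sketch.

```python
class RefillBufferSketch:
    """Toy stand-in: one bytes buffer refilled from an iterator of chunks."""

    def __init__(self, input_data):
        self.input_data = iter(input_data)
        self.buf = b""
        self.offset = 0  # logical offset of the start of self.buf

    def extend(self, min_size):
        """Pull chunks until self.buf holds at least min_size bytes."""
        while len(self.buf) < min_size:
            try:
                chunk = next(self.input_data)
            except StopIteration:
                # Assumption of this sketch: short input is an error.
                raise EOFError(
                    "wanted %d bytes, only %d available"
                    % (min_size, len(self.buf))
                )
            self.buf += chunk

    def take(self, size):
        """Return exactly size bytes, consuming them from self.buf."""
        self.extend(size)
        taken, self.buf = self.buf[:size], self.buf[size:]
        self.offset += size
        return taken


# Parse a 2-byte big-endian length followed by that many payload bytes,
# without caring where the chunk boundaries fall.
bfr = RefillBufferSketch([b"\x00\x05he", b"llo", b"tail"])
length = int.from_bytes(bfr.take(2), "big")
payload = bfr.take(length)
```

The parser asks for exact sizes and the buffer hides the chunking of the input, which is the property that lets binary parsers ignore how their source delivers data.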
len(CornuCopyBuffer) returns the length of any buffered data.
bool(CornuCopyBuffer) tests whether len() > 0.
Indexing a CornuCopyBuffer accesses the buffered data only, returning an individual byte's value (an int).
A CornuCopyBuffer is also iterable, yielding data in whatever sizes come from its input_data source, preceded by the current .buf if not empty.
A CornuCopyBuffer also supports the file methods .read, .tell and .seek, supporting drop-in use of the buffer in many file contexts. Backward seeks are not supported. .seek will take advantage of the input_data's .seek method if it has one, otherwise it will use reads.
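The forward-only seek strategy just described (use the source's own seek when it has one, otherwise read and discard) might look like the following. This is a sketch against ordinary file objects rather than the buffer itself; seek_forward and NoSeek are hypothetical names, and the readsize of 4 is chosen only to keep the example small.

```python
import io


def seek_forward(src, offset, new_offset, readsize=4):
    """Advance src from offset to new_offset; return the resulting offset."""
    if new_offset < offset:
        raise ValueError("backward seeks are not supported")
    if src.seekable():
        # The source can reposition itself: no data needs to be read.
        return src.seek(new_offset)
    # Fallback: consume and discard data until the target is reached.
    while offset < new_offset:
        data = src.read(min(readsize, new_offset - offset))
        if not data:
            break  # end of input before reaching new_offset
        offset += len(data)
    return offset


class NoSeek(io.BytesIO):
    """A stream which claims to be unseekable, to exercise the fallback."""

    def seekable(self):
        return False


fast = io.BytesIO(b"0123456789")
fast_pos = seek_forward(fast, 0, 6)
slow = NoSeek(b"0123456789")
slow_pos = seek_forward(slow, 0, 6)
```

Both paths leave the stream positioned at the same byte; the difference is only whether the skipped data had to be read.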
Method CornuCopyBuffer.__init__(self, input_data, buf=None, offset=0, seekable=None, copy_offsets=None, copy_chunks=None)
Prepare the buffer.
Parameters:
input_data: an iterable of data chunks (bytes-like instances); if your data source is a file see the .from_file factory; if your data source is a file descriptor see the .from_fd factory.
buf: if not None, the initial state of the parse buffer
offset: logical offset of the start of the buffer, default 0
seekable: whether input_data has a working .seek method; the default is None meaning that it will be attempted on the first skip or seek
copy_offsets: if not None, a callable for parsers to report pertinent offsets via the buffer's .report_offset method
copy_chunks: if not None, every fetched data chunk is copied to this callable
The input_data is an iterable whose iterator may have some optional additional properties:
seek: if present, this is a seek method after the fashion of file.seek; the buffer's seek, skip and skipto methods will take advantage of this if available.
offset: the current byte offset of the iterator; this is used during the buffer initialisation to compute input_data_displacement, the difference between the buffer's logical offset and the input data's logical offset; if unavailable during initialisation this is presumed to be 0.
end_offset: the end offset of the iterator if known.
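An iterator offering all three optional properties could be as simple as the following. This is a hedged sketch of a data source, not one of the library's classes: BytesChunkIterator is a hypothetical name, and its seek accepts only an absolute offset, which is narrower than file.seek.

```python
class BytesChunkIterator:
    """Iterate over fixed-size chunks of a bytes object, tracking offsets."""

    def __init__(self, data, readsize=4):
        self.data = data
        self.readsize = readsize
        self.offset = 0              # current byte offset of the iterator
        self.end_offset = len(data)  # known in advance for in-memory data

    def __iter__(self):
        return self

    def __next__(self):
        if self.offset >= self.end_offset:
            raise StopIteration
        chunk = self.data[self.offset:self.offset + self.readsize]
        self.offset += len(chunk)
        return chunk

    def seek(self, new_offset):
        """Reposition the iterator to an absolute offset."""
        if not 0 <= new_offset <= self.end_offset:
            raise ValueError("offset out of range: %r" % (new_offset,))
        self.offset = new_offset
        return new_offset


it = BytesChunkIterator(b"0123456789")
first = next(it)  # the first 4 bytes
it.seek(8)        # jump ahead without reading the bytes in between
rest = list(it)
```

Because the iterator reports its own offset and end_offset, a consumer can compute how far it has progressed and how much data remains without reading it.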
Class FDIterator(_Iterator)
An iterator over the data of a file descriptor.
Note: the iterator works with an os.dup() of the file descriptor so that it can close it with impunity; this requires the caller to close their descriptor.
Method FDIterator.__init__(self, fd, offset=None, readsize=None, align=True)
Initialise the iterator.
Parameters:
fd: file descriptor
offset: the initial logical offset, kept up to date by iteration; the default is the current file position.
readsize: a preferred read size; if omitted then DEFAULT_READSIZE will be stored
align: whether to align reads by default: if true then the iterator will do a short read to bring the offset into alignment with readsize; the default is True
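The os.dup() arrangement and the aligning short read can be sketched with a generator. This is an illustration under stated assumptions, not the FDIterator source: fd_chunks is a hypothetical name and the default readsize of 8192 is a placeholder, since the value of DEFAULT_READSIZE is not given here.

```python
import os
import tempfile


def fd_chunks(fd, offset=None, readsize=8192, align=True):
    """Yield chunks from a duplicate of fd, optionally aligning reads."""
    fd2 = os.dup(fd)  # private descriptor; the caller still closes fd
    try:
        if offset is None:
            # Default to the current file position.
            offset = os.lseek(fd2, 0, os.SEEK_CUR)
        while True:
            size = readsize
            if align and offset % readsize:
                # Short read up to the next multiple of readsize.
                size = readsize - offset % readsize
            chunk = os.read(fd2, size)
            if not chunk:
                break
            offset += len(chunk)
            yield chunk
    finally:
        os.close(fd2)


with tempfile.TemporaryFile() as f:
    f.write(b"0123456789")
    f.flush()
    os.lseek(f.fileno(), 3, os.SEEK_SET)  # start at an unaligned position
    sizes = [len(chunk) for chunk in fd_chunks(f.fileno(), readsize=4)]
```

Starting at position 3 with a read size of 4, the first read is a single byte, after which reads fall on multiples of 4. Note that a duplicated descriptor shares its file position with the original, so reads through the duplicate also move the caller's position.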
Class FileIterator(_Iterator,SeekableIteratorMixin)
An iterator over the data of a file object.
Note: the iterator closes the file on __del__ or if its .close method is called.
Method FileIterator.__init__(self, fp, offset=None, readsize=None, align=False)
Initialise the iterator.
Parameters:
fp: file object
offset: the initial logical offset, kept up to date by iteration; the default is 0.
readsize: a preferred read size; if omitted then DEFAULT_READSIZE will be stored
align: whether to align reads by default: if true then the iterator will do a short read to bring the offset into alignment with readsize; the default is False
Class SeekableFDIterator(FDIterator,_Iterator,SeekableIteratorMixin)
An iterator over the data of a seekable file descriptor.
Note: the iterator works with an os.dup() of the file descriptor so that it can close it with impunity; this requires the caller to close their descriptor.
Class SeekableFileIterator(FileIterator,_Iterator,SeekableIteratorMixin)
An iterator over the data of a seekable file object.
Note: the iterator closes the file on __del__ or if its .close method is called.
Method SeekableFileIterator.__init__(self, fp, offset=None, **kw)
Initialise the iterator.
Parameters:
fp: file object
offset: the initial logical offset, kept up to date by iteration; the default is the current file position.
readsize: a preferred read size; if omitted then DEFAULT_READSIZE will be stored
align: whether to align reads by default: if true then the iterator will do a short read to bring the offset into alignment with readsize; the default is False
Class SeekableIteratorMixin
Mixin supplying a logical with a seek method.
Class SeekableMMapIterator(_Iterator,SeekableIteratorMixin)
An iterator over the data of a mappable file descriptor.
Note: the iterator works with an mmap of an os.dup() of the file descriptor so that it can close it with impunity; this requires the caller to close their descriptor.
Method SeekableMMapIterator.__init__(self, fd, offset=None, readsize=None, align=True)
Initialise the iterator.
Parameters:
offset: the initial logical offset, kept up to date by iteration; the default is the current file position.
readsize: a preferred read size; if omitted then DEFAULT_READSIZE will be stored
align: whether to align reads by default: if true then the iterator will do a short read to bring the offset into alignment with readsize; the default is True
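Iterating over a memory map of a duplicated descriptor can be sketched as below. This is a simplified stand-in, not the SeekableMMapIterator source: mmap_chunks is a hypothetical name, the readsize default is a placeholder, alignment is left out, and each chunk is copied out as bytes for simplicity, which gives up the no-copy benefit a real mmap-backed iterator is after.

```python
import mmap
import os
import tempfile


def mmap_chunks(fd, offset=0, readsize=8192):
    """Yield bytes chunks from a read-only mmap of a duplicate of fd."""
    fd2 = os.dup(fd)  # private descriptor; the caller still closes fd
    try:
        # Length 0 maps the whole file; this fails for an empty file.
        mm = mmap.mmap(fd2, 0, access=mmap.ACCESS_READ)
        try:
            while offset < len(mm):
                # Slicing an mmap copies the data into a bytes object.
                yield mm[offset:offset + readsize]
                offset += readsize
        finally:
            mm.close()
    finally:
        os.close(fd2)


with tempfile.TemporaryFile() as f:
    f.write(b"abcdefghij")
    f.flush()
    chunks = list(mmap_chunks(f.fileno(), readsize=4))
```

Because the map is indexed by offset, seeking in such an iterator is only a matter of changing the offset; no file position is involved.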
Release Log
Release 20200328: CornuCopyBuffer.takev: bugfix adjustment of buf.offset, was not always done. CornuCopyBuffer.__getitem__: add slice support, note how expensive it is to use.
Release 20200229: New CornuCopyBuffer.byte0() method consuming the next byte and returning it as an int. CornuCopyBuffer.takev: bugfix for size=0, logic refactor. CornuCopyBuffer: new .selfcheck method.
Release 20200130: CornuCopyBuffer.skip: bugfix adjustment of skipto for already buffered data.
Release 20191230.1: Docstring updates. Semantic changes were in the previous release.
Release 20191230: CornuCopyBuffer: accept a size of Ellipsis in .take and .extend methods, indicating "all the remaining data". CornuCopyBuffer: refactor the buffering, replacing .buf with .bufs as an array of chunks; this enables support for the new .push method and reduces memory copying.
Release 20181231: Small bugfix.
Release 20181108: New at_eof() method. Python 2 tweak to support incidental import by python 2 even if unused.
Release 20180823: Better handling of seekable and unseekable input data. Tiny bugfix for from_bytes sanity check.
Release 20180810:
Refactor SeekableFDIterator and SeekableFileIterator to subclass new SeekableIterator.
New SeekableMMapIterator to process a memory mapped file descriptor, intended for large files.
New CornuCopyBuffer.hint method to pass a length hint through to the input_data iterator if it has a hint method, causing it possibly to make a differently sized fetch.
SeekableIterator: new __del__ method calling self.close() - subclasses must provide a .close, which should be safe to call multiple times.
CornuCopyBuffer: add support for .offset and .end_offset optional attributes on the input_data iterator.
_BoundedBufferIterator: add .offset property plumbed to the underlying buffer offset.
New CornuCopyBuffer.from_mmap to make a mmap backed buffer so that large data can be returned without penalty.
Assorted fixes and doc improvements.
Release 20180805: Bugfixes for at_eof method and end_offset initialisation.
Release 20180726.1: Improve docstrings and release with better long_description.
Release 20180726: First PyPI release: CornuCopyBuffer and friends.