sharedbuffers·PyPI

Shared-memory structured buffers

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Software Development :: Libraries :: Python Modules

Project description

sharedbuffers

This library implements shared-memory typed buffers that can be read and manipulated efficiently without serialization or deserialization. Write support is planned but not yet available.

The main supported way of obtaining shared memory is memory-mapping files. The library also supports mapping buffers (anonymous mmap objects), although these are harder to share among processes.
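As a rough illustration of the two kinds of backing memory, the following sketch uses only the standard mmap and struct modules, not this library's API. The file contents and the 64-bit integer layout are assumptions made up for the example.

```python
import mmap
import os
import struct
import tempfile

# Write three little-endian 64-bit integers to a temporary file.
values = (10, 20, 30)
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(struct.pack("<3q", *values))

    # File-backed mapping: any process that opens the same path sees the same bytes.
    with open(path, "rb") as f:
        file_map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        # Decode only the second integer, straight from the mapping.
        second = struct.unpack_from("<q", file_map, 8)[0]
        file_map.close()
finally:
    os.remove(path)

# Anonymous mapping: there is no path that another process could open.
anon_map = mmap.mmap(-1, 24)
struct.pack_into("<3q", anon_map, 0, *values)
third = struct.unpack_from("<q", anon_map, 16)[0]
anon_map.close()
```

In both cases a single value is decoded at a known offset, without parsing the rest of the buffer.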

Supported primitive types:

- int (up to 64-bit precision)
- str (bytes)
- unicode
- frozenset
- tuple / list
- dict
- buffer
- date
- datetime
- numpy arrays
- decimal

Primitive types can be cloned into their actual builtin objects (as specified by the mapped types), which is fast but potentially memory-intensive. Alternatively, they can be proxied, in which case they are built directly on top of the memory mapping, without constructing the actual object. Proxied objects aim to support the same interface as the builtin containers.

Objects can be registered with schema serializers, so composite types can be mapped as well. For this to work, a class needs a class attribute specifying the attributes it holds and their types. When an attribute doesn’t have a clearly defined type, it can be wrapped in an RTTI-containing container by declaring its type as object.
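The difference between cloning and proxying can be sketched with the standard struct module. IntArrayProxy is a hypothetical class written for this illustration, and the packed int64 layout is an assumption; neither is the library's actual implementation.

```python
import struct

class IntArrayProxy(object):
    """Hypothetical read-only view over packed little-endian int64 values."""
    ITEM = struct.Struct("<q")

    def __init__(self, buf, offset, length):
        self._buf = buf
        self._offset = offset
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError(index)
        # Decode a single element on demand; nothing else is touched.
        position = self._offset + index * self.ITEM.size
        return self.ITEM.unpack_from(self._buf, position)[0]

    def clone(self):
        # Materialize a real list: convenient afterwards, but copies everything.
        return [self[i] for i in range(self._length)]

buf = bytearray(struct.pack("<4q", 7, 8, 9, 10))
proxy = IntArrayProxy(buf, 0, 4)
cloned = proxy.clone()

# The proxy tracks the underlying buffer; the clone is a snapshot.
struct.pack_into("<q", buf, 8, 80)
```

After the last line, proxy[1] reflects the new value in the buffer while cloned still holds the original one.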

For example:

class SomeStruct(object):
    __slot_types__ = {
        'a' : int,
        'b' : float,
        's' : str,
        'u' : unicode,
        'fset' : frozenset,
        'l' : list,
        'o' : object,
    }
    __slots__ = __slot_types__.keys()

Adding __slot_types__, however, isn’t enough to make the object mappable. A schema definition needs to be created, which can be used to map files or buffers and obtain proxies to the information within:

from sharedbuffers import mapped_struct

class SomeStruct(object):
    __slot_types__ = {
        'a' : int,
        'b' : float,
        's' : str,
        'u' : unicode,
        'fset' : frozenset,
        'l' : list,
        'o' : object,
    }
    __slots__ = __slot_types__.keys()
    # The schema is derived from the declared slot types
    __schema__ = mapped_struct.Schema.from_typed_slots(__slot_types__)

Using the schema is thus straightforward:

s = SomeStruct()
s.a = 3
s.s = 'blah'
s.fset = frozenset([1,3])
s.o = 3
s.__schema__.pack(s) # returns a bytearray

buf = bytearray(1000)
s.__schema__.pack_into(s, buf, 10) # writes in offset 10 of buf, returns the size of the written object
p = s.__schema__.unpack_from(s, buf, 10) # returns a proxy for the object just packed into buf, does not deserialize
print p.a
print p.s
print p.fset

Typed objects can be nested, but for that a typecode must be assigned to each type in order for RTTI to properly identify the custom types:

SomeStruct.__mapped_type__ = mapped_struct.mapped_object.register_schema(
    SomeStruct, SomeStruct.__schema__, 'S')

From then on, SomeStruct can be used as any other type when declaring field types.
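The role of a typecode can be sketched as a one-byte tag stored in front of each value, so that a reader knows how to decode what follows. The registry and the two helper functions here are hypothetical, written with the standard struct module only; they do not describe the library's on-disk format.

```python
import struct

# Hypothetical registry: one-byte typecode -> struct format for the payload.
TYPECODES = {
    b"q": struct.Struct("<q"),   # 64-bit signed integer
    b"d": struct.Struct("<d"),   # double-precision float
}

def pack_tagged(typecode, value):
    """Prefix the packed payload with its typecode so a reader can identify it."""
    return typecode + TYPECODES[typecode].pack(value)

def unpack_tagged(buf, offset=0):
    """Read the typecode first, then decode the payload it announces."""
    typecode = bytes(buf[offset:offset + 1])
    return TYPECODES[typecode].unpack_from(buf, offset + 1)[0]

# Two values of different types stored back to back in one buffer.
data = pack_tagged(b"q", 3) + pack_tagged(b"d", 2.5)
first = unpack_tagged(data, 0)
second = unpack_tagged(data, 1 + TYPECODES[b"q"].size)
```

Under this scheme, registering a custom type amounts to adding one more entry to the registry under an unused code.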

High-level typed container classes can be created by inheriting from the proper base class. Currently, three kinds of mappings are supported: string-to-object, uint-to-object and a generic object-to-object. The first two are provided for efficiency’s sake; use the generic one when the others won’t do.

class StructArray(mapped_struct.MappedArrayProxyBase):
    schema = SomeStruct.__schema__

class StructNameMapping(mapped_struct.MappedMappingProxyBase):
    IdMapper = mapped_struct.StringIdMapper
    ValueArray = StructArray

class StructIdMapping(mapped_struct.MappedMappingProxyBase):
    IdMapper = mapped_struct.NumericIdMapper
    ValueArray = StructArray

class StructObjectMapping(mapped_struct.MappedMappingProxyBase):
    IdMapper = mapped_struct.ObjectIdMapper
    ValueArray = StructArray

The API for these high-level container objects is aimed at collections that don’t really fit in RAM in their pure-python form, so they must be built using an iterator over the items (ideally a generator that doesn’t put the whole collection in memory at once), and then mapped from the resulting file or buffer. An example:

import tempfile

with tempfile.NamedTemporaryFile() as destfile:
    arr = StructArray.build([SomeStruct(), SomeStruct()], destfile=destfile)
    print arr[0]

with tempfile.NamedTemporaryFile() as destfile:
    arr = StructNameMapping.build(dict(a=SomeStruct(), b=SomeStruct()).iteritems(), destfile=destfile)
    print arr['a']

with tempfile.NamedTemporaryFile() as destfile:
    arr = StructIdMapping.build({1:SomeStruct(), 3:SomeStruct()}.iteritems(), destfile=destfile)
    print arr[3]
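The build-then-map pattern itself can be sketched with the standard library alone. The record layout, the generator and the build function here are assumptions for illustration and are unrelated to the library's own build methods.

```python
import mmap
import struct
import tempfile

RECORD = struct.Struct("<qd")  # hypothetical fixed-size record: (id, score)

def generate_records(n):
    # A generator, so the whole collection never sits in memory at once.
    for i in range(n):
        yield (i, i * 0.5)

def build(records, destfile):
    """Stream records into destfile and return how many were written."""
    count = 0
    for record in records:
        destfile.write(RECORD.pack(*record))
        count += 1
    destfile.flush()
    return count

with tempfile.NamedTemporaryFile() as destfile:
    count = build(generate_records(1000), destfile)
    mapped = mmap.mmap(destfile.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        # Random access by index: offset = index * record size.
        item = RECORD.unpack_from(mapped, 42 * RECORD.size)
    finally:
        mapped.close()
```

Only one record is held in memory while writing, and only the requested record is decoded while reading.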

When using nested hierarchies, it’s possible to unify references to the same object by specifying an idmap dict. However, since the idmap will map objects by their id(), objects must be kept alive by holding references to them while they’re still referenced in the idmap, so its usage is non-trivial. An example technique:

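import itertools

def all_structs(idmap):
    # some_generator stands for any iterable of SomeStruct instances
    iter_all = iter(some_generator)
    while True:
        # Forget the previous chunk's ids before its objects are released
        idmap.clear()

        # Hold a chunk of objects alive while they are being packed
        sstructs = list(itertools.islice(iter_all, 10000))
        if not sstructs:
            break

        for ss in sstructs:
            # mapping from "s" attribute to struct
            yield (ss.s, ss)
        del sstructs

idmap = {}
name_mapping = StructNameMapping.build(all_structs(idmap),
    destfile = destfile, idmap = idmap)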

The above code ties the lifetime of objects to their idmap entries to avoid mapping issues. The invariant is that every object referenced in the idmap is alive and therefore holds a unique id() value. If it isn’t maintained, the result is silent corruption of the resulting mapping due to object identity mixups.
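To make the role of the idmap concrete, here is a minimal sketch of reference unification keyed by id(). The pack_all function and its byte layout are hypothetical and far simpler than what the library does.

```python
def pack_all(objects, idmap):
    """Hypothetical packer: store each distinct object once, return one offset per reference."""
    out = bytearray()
    offsets = []
    for obj in objects:
        key = id(obj)
        if key not in idmap:
            # First time this object is seen: store it and remember where.
            idmap[key] = len(out)
            out.extend(obj.encode("utf-8"))
        offsets.append(idmap[key])
    return bytes(out), offsets

shared = "".join(["sha", "red"])      # built at runtime, a single object
other = "".join(["oth", "er"])
objects = [shared, other, shared]     # the list keeps every object alive

idmap = {}
data, offsets = pack_all(objects, idmap)
```

Because the list holds a reference to every object for the whole call, no id() value can be reused. If an object were freed while its id stayed in the idmap, a later object allocated at the same address would be resolved to the wrong offset.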

There are variants of the mapping proxy classes and their associated id mapper classes that implement multi-maps: mappings that, when fed multiple values for a key, return a list of values for that key rather than a single value. Their in-memory representation is identical to that of simple mappings; only the querying API differs, returning all matching values rather than the first one. Multi-maps and simple mappings are therefore binary compatible.
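One way to picture two query styles over a single representation is a pair of parallel arrays sorted by key, searched with the standard bisect module. This is a sketch under that assumption, not the library's actual layout.

```python
import bisect

# One shared representation: parallel arrays sorted by key, duplicates allowed.
keys = [1, 3, 3, 3, 7]
values = ["a", "b", "c", "d", "e"]

def get_first(key):
    """Simple-mapping query: the first value stored for key."""
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        raise KeyError(key)
    return values[i]

def get_all(key):
    """Multi-map query: every value stored for key."""
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    if lo == hi:
        raise KeyError(key)
    return values[lo:hi]
```

Both functions read the same arrays, so data built for one query style can be read with the other.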

Multi-maps with string keys can also be approximate, meaning the original keys are discarded and the mapping works with hashes only. This makes the map much faster and more compact, at the expense of some inaccuracy: the returned values may include extra values belonging to other keys whose hashes collide with that of the requested key.
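The trade-off can be demonstrated with a deliberately weak hash so that a collision is guaranteed. The weak_hash, build_approximate and lookup functions are hypothetical; a real implementation would use a much stronger hash and a compact binary layout.

```python
import bisect

def weak_hash(key):
    # Deliberately weak hash (byte sum modulo 16) so that collisions are easy to produce.
    return sum(bytearray(key.encode("utf-8"))) % 16

def build_approximate(items):
    """Keep only (hash, value) pairs sorted by hash; the original keys are discarded."""
    pairs = sorted((weak_hash(k), v) for k, v in items)
    hashes = [h for h, _ in pairs]
    values = [v for _, v in pairs]
    return hashes, values

def lookup(hashes, values, key):
    """Return every value whose key hash matches, including colliding keys."""
    h = weak_hash(key)
    lo = bisect.bisect_left(hashes, h)
    hi = bisect.bisect_right(hashes, h)
    return values[lo:hi]

# "ab" and "ba" have the same byte sum, so their hashes collide.
hashes, values = build_approximate([("ab", 1), ("ab", 2), ("ba", 3), ("cd", 4)])
```

Looking up "ab" returns its own two values plus the value stored under "ba", which is the kind of inaccuracy described above.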

Tests

Tests can be run locally or in Docker using the run-tests.sh script. To run them locally:

$> virtualenv venv
$> . venv/bin/activate
$> sh ./run-tests.sh

Alternatively, to run them in Docker:

$> docker run -v ${PWD}:/opt/sharedbuffers -w /opt/sharedbuffers python:2.7 /bin/sh run-tests.sh

Release history

Version	Date
1.2.2	Jun 5, 2024
1.2.1	Oct 10, 2023
1.2.0	Aug 17, 2023
1.1.1	Jun 21, 2023
1.1.0	May 24, 2023
1.0.0	Aug 23, 2022
0.9.1	Mar 25, 2021
0.9.0	Feb 25, 2021
0.8.2	Sep 12, 2019
0.8.1	Sep 11, 2019
0.8.0	Aug 8, 2019
0.7.2	Jun 27, 2019
0.7.1	Jun 13, 2019
0.7.0	Jun 6, 2019
0.6.4	May 30, 2019
0.6.2	Feb 21, 2019
0.6.1	Jan 23, 2019
0.6.0	Dec 27, 2018
0.5.1	Dec 17, 2018
0.5.0	Dec 11, 2018 (this version)
0.4.9	Oct 30, 2018
0.4.8	May 28, 2018
0.4.7	Feb 22, 2018
0.4.6	Dec 18, 2017
0.4.5	Oct 12, 2017
0.4.4	Oct 2, 2017
0.4.3	Sep 28, 2017
0.4.2	Aug 14, 2017
0.4.1	Jul 18, 2017
0.4.0	Jul 12, 2017
0.3.3	Apr 25, 2017
0.3.2	Apr 7, 2017
0.3.1	Nov 9, 2016
0.3.0	Nov 8, 2016
0.2.1	Oct 19, 2016
0.2.0	Oct 11, 2016
0.1	Oct 5, 2016

Download files

Source Distribution

sharedbuffers-0.5.0.tar.gz (945.5 kB view details)

Uploaded Dec 11, 2018 Source

File details

Details for the file sharedbuffers-0.5.0.tar.gz.

File metadata

Download URL: sharedbuffers-0.5.0.tar.gz
Upload date: Dec 11, 2018
Size: 945.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.9.1 pkginfo/1.4.1 requests/2.7.0 setuptools/36.0.1 requests-toolbelt/0.8.0 tqdm/4.15.0 CPython/2.7.12

File hashes

Hashes for sharedbuffers-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`389905b4060f0f1c5d82792ea0f8c063f84afc98e55acf6fadf5ae1ec17f62ed`
MD5	`95f7e8a0d4c79b2b6abcbed513672538`
BLAKE2b-256	`def4e3cad9b3276f8b1dab511e43dbb289ba2336fd920331ab65b9be4c25e92c`
