A project for feeding various nested data formats into pandas

bamboo

bamboo is a library for feeding nested data formats into pandas. Nested formats can represent more data than pandas can: pandas supports only data representable as a flat table (though features such as multi-indexes allow certain tree-shaped formats to be projected efficiently into a table). Data with arbitrary nesting is not, in general, convertible to a pandas dataframe. In particular, data containing multiple repetition structures (e.g. JSON arrays) that are not nested within each other cannot be flattened into a table.
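To make the flattenable case concrete, here is a plain-Python sketch (it does not use bamboo, and the records and field names are invented for illustration). When one repetition structure is nested inside another, each innermost element yields exactly one row, and the outer values are repeated alongside it:

```python
# Hypothetical records: "items" is nested inside "orders",
# so there is a single chain of repetition to walk.
records = [
    {"user": "u1", "orders": [{"id": 10, "items": ["pen", "ink"]},
                              {"id": 11, "items": ["pad"]}]},
    {"user": "u2", "orders": [{"id": 12, "items": ["cap"]}]},
]

# One row per innermost element; outer values repeat as needed.
rows = [
    (rec["user"], order["id"], item)
    for rec in records
    for order in rec["orders"]
    for item in order["items"]
]

for row in rows:
    print(row)
```

Two sibling arrays in the same record have no such chain: there is no canonical way to decide which element of one array belongs on the same row as which element of the other.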

The current data formats supported are:

  • JSON
  • Apache Avro
  • Apache Arrow
  • Protobuf (via PBD)

bamboo works by projecting a flattenable portion (a subset of the nested columns) of the data into a pandas dataframe. By projecting various combinations of columns, one can make use of all the relationships implied by the nested structure of the data.
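The idea of projecting different column combinations can be sketched in plain Python. This is only an illustration of the concept, not bamboo's implementation; the `project` helper and the sample records are hypothetical. Each projection pairs a scalar column with one list column, so two separate projections together recover both relationships:

```python
def project(records, scalar_key, list_key):
    """Return one (scalar, element) row per element of each record's list."""
    return [
        (rec[scalar_key], item)
        for rec in records
        for item in rec[list_key]
    ]

# Hypothetical records with one scalar column and two sibling list columns.
records = [
    {"a": None, "b": [1, 2], "c": [5, 6]},
    {"a": -1.0, "b": [3, 4], "c": [7, 8]},
]

rows_ab = project(records, "a", "b")  # relationship between a and b
rows_ac = project(records, "a", "c")  # relationship between a and c

print(rows_ab)
print(rows_ac)
```

Each result is a flat list of rows that could be loaded into its own dataframe.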

Installation

To install from PyPI:

pip install bamboo-nested

Example

A minimal example of flattening a JSON string:

import json

from bamboo import from_json

obj = [{'a': None, 'b': [1, 2], 'c': [5, 6]}, {'a': -1.0, 'b': [3, 4], 'c': [7, 8]}]
node = from_json(json.dumps(obj))
    > - a float64
    > - b []uint64
    > - c []uint64

Flattening just the values of column a:

df_a = node.flatten(include=['a'])
    >      a
    > 0  NaN
    > 1 -1.0

Flattening columns a and b (note that column a is repeated to match the corresponding elements of column b):

df_ab = node.flatten(include=['a', 'b'])
    >      a  b
    > 0  NaN  1
    > 1  NaN  2
    > 2 -1.0  3
    > 3 -1.0  4

Trying to flatten two repetition lists at the same level will lead to an error (as this structure is unflattenable without taking a Cartesian product):

df_bc = node.flatten(include=['b', 'c'])
    > ValueError: Attempted to flatten conflicting lists
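For intuition about why this case is rejected, the following plain-Python sketch (independent of bamboo) shows what a Cartesian product of two sibling lists looks like for a single record. The record mirrors the first element of the example above:

```python
from itertools import product

# Two sibling lists from one record; neither is nested in the other.
record = {"b": [1, 2], "c": [5, 6]}

# Pairing every element of b with every element of c.
pairs = list(product(record["b"], record["c"]))

print(pairs)       # [(1, 5), (1, 6), (2, 5), (2, 6)]
print(len(pairs))  # 4 rows from lists of length 2 and 2
```

The product invents pairings such as (1, 6) that the source data never stated, and the row count grows multiplicatively with the list lengths.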

Building

Building from source requires cmake (pip install cmake) and Boost. To build this project:

python setup.py

Unit tests

To run the unit tests:

python setup.py test

or use nose:

nosetests python/bamboo/tests

Licensing

This project is licensed under the Apache 2.0 License. It uses the pybind11, Arrow, Avro, nlohmann JSON, and PBD projects. The licenses can be found in those projects' directories.

Download files

Source Distribution

  • bamboo-nested-0.1.0.tar.gz (19.6 kB)

Built Distributions

  • bamboo_nested-0.1.0-cp37-cp37m-manylinux2014_x86_64.whl (24.4 MB, CPython 3.7m)
  • bamboo_nested-0.1.0-cp37-cp37m-macosx_10_13_intel.whl (378.8 kB, CPython 3.7m, macOS 10.13+ intel)
  • bamboo_nested-0.1.0-cp36-cp36m-manylinux2014_x86_64.whl (24.4 MB, CPython 3.6m)
  • bamboo_nested-0.1.0-cp36-cp36m-macosx_10_13_intel.whl (378.7 kB, CPython 3.6m, macOS 10.13+ intel)
  • bamboo_nested-0.1.0-cp35-cp35m-manylinux2014_x86_64.whl (24.4 MB, CPython 3.5m)
  • bamboo_nested-0.1.0-cp35-cp35m-macosx_10_13_intel.whl (378.7 kB, CPython 3.5m, macOS 10.13+ intel)
