
A project for feeding various nested data formats into pandas


<!---
Copyright (c) 2019 Michael Vilim

This file is part of the bamboo library. It is currently hosted at
https://github.com/mvilim/bamboo

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

## bamboo

[![PyPI Release](https://img.shields.io/pypi/v/bamboo-nested.svg)](https://pypi.org/project/bamboo-nested/)
[![Build Status](https://travis-ci.org/mvilim/bamboo.svg?branch=master)](https://travis-ci.org/mvilim/bamboo)

bamboo is a library for feeding nested data formats into pandas. The space of data representable in nested formats is larger than the space covered by pandas: pandas supports only data representable in a flat table (though features such as multi-indexes allow certain tree-shaped data to be projected efficiently into a table). Data that supports arbitrary nesting is not, in general, convertible to a pandas dataframe. In particular, data containing multiple repetition structures (e.g. JSON arrays) that are not nested within each other cannot be flattened into a single table.

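As a simple example, a JSON object with two sibling arrays, such as `{"b": [1, 2], "c": [5, 6]}`, cannot be flattened into one table without taking a Cartesian product of the two arrays.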

The currently supported data formats are:
* JSON
* Apache Avro
* Apache Arrow
* Protobuf (via [PBD](https://github.com/mvilim/pbd))

bamboo works by projecting a flattenable portion (a subset of the nested columns) of the data into a pandas dataframe. By projecting various combinations of columns, one can make use of all the relationships implied by the nested structure of the data.

### Installation

To install from PyPI:

```
pip install bamboo-nested
```

### Example

A minimal example of flattening a JSON string:

```
import json

from bamboo import from_json

obj = [{'a': None, 'b': [1, 2], 'c': [5, 6]}, {'a': -1.0, 'b': [3, 4], 'c': [7, 8]}]
node = from_json(json.dumps(obj))
> - a float64
> - b []uint64
> - c []uint64
```

Flattening just the values of column `a`:

```
df_a = node.flatten(include=['a'])
> a
> 0 NaN
> 1 -1.0
```

Flattening columns `a` and `b` (note that column `a` is repeated to match the corresponding elements of column `b`):

```
df_ab = node.flatten(include=['a', 'b'])
> a b
> 0 NaN 1
> 1 NaN 2
> 2 -1.0 3
> 3 -1.0 4
```
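The repetition of column `a` described above can be illustrated without bamboo. The following is a minimal pure-Python sketch of the idea, not bamboo's implementation: the helper `flatten_records` is hypothetical, operates on plain dictionaries, and only handles one level of nesting.

```python
def flatten_records(records, include):
    """Project the named fields of each record into flat rows.

    Hypothetical helper for illustration only. Scalar fields are repeated
    so that they line up with the elements of at most one list-valued field.
    """
    rows = []
    for record in records:
        list_fields = [key for key in include if isinstance(record[key], list)]
        if len(list_fields) > 1:
            # Two sibling lists have no natural row-by-row pairing.
            raise ValueError("cannot flatten sibling lists: %s" % list_fields)
        if not list_fields:
            # Only scalars requested: one row per record.
            rows.append({key: record[key] for key in include})
            continue
        repeated = list_fields[0]
        for element in record[repeated]:
            # Copy the scalars, then substitute the current list element.
            row = {key: record[key] for key in include}
            row[repeated] = element
            rows.append(row)
    return rows


records = [
    {"a": None, "b": [1, 2], "c": [5, 6]},
    {"a": -1.0, "b": [3, 4], "c": [7, 8]},
]

rows_a = flatten_records(records, ["a"])        # one row per record
rows_ab = flatten_records(records, ["a", "b"])  # one row per element of b
```

Here `rows_ab` has four rows, with each value of `a` appearing once for every element of the corresponding `b` list, mirroring the dataframe shown above.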

Trying to flatten two repetition lists at the same level will lead to an error (as this structure is unflattenable without taking a Cartesian product):

```
df_bc = node.flatten(include=['b', 'c'])
> ValueError: Attempted to flatten conflicting lists
```
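To see why two sibling lists conflict, it may help to spell out what a single table would have to contain. The sketch below uses only the standard library and a plain dictionary (it does not call bamboo): putting `b` and `c` in one table means pairing every element of one with every element of the other, whereas projecting each list separately keeps one row per element.

```python
from itertools import product

record = {"a": None, "b": [1, 2], "c": [5, 6]}

# The data does not say which element of b belongs with which element of c,
# so a single table would need every combination of the two.
cartesian_rows = [
    {"b": b, "c": c} for b, c in product(record["b"], record["c"])
]

# Projecting each list on its own keeps the row count equal to its length.
rows_b = [{"b": b} for b in record["b"]]
rows_c = [{"c": c} for c in record["c"]]

print(len(cartesian_rows), len(rows_b), len(rows_c))  # prints "4 2 2"
```

The product grows multiplicatively with the list lengths and introduces pairings that are not present in the source data, which is a reasonable motivation for raising an error rather than computing it silently.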

### Building

Building from source requires cmake (`pip install cmake`) and Boost.

To build this project:

```
python setup.py
```

### Unit tests

To run the unit tests:

```
python setup.py test
```

or use nose:

```
nosetests python/bamboo/tests
```

### Licensing

This project is licensed under the [Apache 2.0 License](https://github.com/mvilim/bamboo/blob/master/LICENSE). It uses the pybind11, Arrow, Avro, nlohmann JSON, and PBD projects. The licenses can be found in those [projects' directories](https://github.com/mvilim/bamboo/blob/master/cpp/thirdparty).

