
Flexible dataframe representation to support nested structures.

Project description


BiocFrame

This package provides the BiocFrame class, an alternative to Pandas DataFrames.

BiocFrame makes no assumptions about the types of its columns. The only requirement is that each column implements the length (__len__) and slicing (__getitem__) dunder methods. This allows BiocFrame to accept nested representations, or any class that supports these methods, as columns.
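To make the column requirement concrete, here is a minimal sketch of a custom column type that only implements __len__ and __getitem__. Both `Repeat` and `is_valid_column` are hypothetical names for illustration; `is_valid_column` mirrors the requirement stated above and is not BiocFrame's own validation code.

```python
# Hypothetical helper mirroring the stated minimum requirement:
# a column needs __len__ (length) and __getitem__ (slicing).
# This is NOT BiocFrame's internal check.
def is_valid_column(obj) -> bool:
    cls = type(obj)
    return hasattr(cls, "__len__") and hasattr(cls, "__getitem__")


class Repeat:
    """Hypothetical column type: one value repeated n times, stored lazily."""

    def __init__(self, value, n: int):
        self.value = value
        self.n = n

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, index):
        positions = range(self.n)
        if isinstance(index, slice):
            # Slicing returns a new, shorter Repeat column.
            return Repeat(self.value, len(positions[index]))
        if isinstance(index, int):
            positions[index]  # raises IndexError if out of bounds
            return self.value
        # Any other iterable is treated as a list of integer positions.
        return Repeat(self.value, len([positions[i] for i in index]))


col = Repeat("chr1", 3)
print(is_valid_column(col), len(col), len(col[1:]))  # True 3 2
print(is_valid_column(5))  # False
```

Because the column stores only a value and a count, slicing stays cheap; this is the kind of non-list column the duck-typed requirement is meant to allow.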

To get started, install the package from PyPI:

pip install biocframe

Usage

To construct a BiocFrame object, simply provide the data as a dictionary.

from biocframe import BiocFrame

obj = {
    "ensembl": ["ENS00001", "ENS00002", "ENS00003"],
    "symbol": ["MAP1A", "BIN1", "ESR1"],
}
bframe = BiocFrame(obj)
print(bframe)
## output
BiocFrame with 3 rows & 2 columns
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ ensembl <list> ┃ symbol <list> ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ ENS00001       │ MAP1A         │
│ ENS00002       │ BIN1          │
│ ENS00003       │ ESR1          │
└────────────────┴───────────────┘
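The header "BiocFrame with 3 rows & 2 columns" is derived from the input dictionary: one column per key, one row per element. A minimal sketch of that bookkeeping, assuming all columns must have the same length; `infer_nrows` is a hypothetical helper, not part of the biocframe API.

```python
# Hypothetical sketch: infer the row count from a dict of columns and
# reject columns of unequal length.
def infer_nrows(columns: dict) -> int:
    lengths = {name: len(col) for name, col in columns.items()}
    if not lengths:
        return 0
    distinct = set(lengths.values())
    if len(distinct) > 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    return distinct.pop()


obj = {
    "ensembl": ["ENS00001", "ENS00002", "ENS00003"],
    "symbol": ["MAP1A", "BIN1", "ESR1"],
}
print(infer_nrows(obj), len(obj))  # 3 2
```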

You can also specify complex representations as columns, for example a nested BiocFrame:

obj = {
    "ensembl": ["ENS00001", "ENS00002", "ENS00002"],
    "symbol": ["MAP1A", "BIN1", "ESR1"],
    "ranges": BiocFrame({
        "chr": ["chr1", "chr2", "chr3"],
        "start": [1000, 1100, 5000],
        "end": [1100, 4000, 5500]
    }),
}

bframe2 = BiocFrame(obj, row_names=["row1", "row2", "row3"])
print(bframe2)
## output
BiocFrame with 3 rows & 3 columns
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ row_names ┃ ensembl <list> ┃ symbol <list> ┃ ranges <BiocFrame>                          ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ row1      │ ENS00001       │ MAP1A         │ {'chr': 'chr1', 'start': 1000, 'end': 1100} │
│ row2      │ ENS00002       │ BIN1          │ {'chr': 'chr2', 'start': 1100, 'end': 4000} │
│ row3      │ ENS00002       │ ESR1          │ {'chr': 'chr3', 'start': 5000, 'end': 5500} │
└───────────┴────────────────┴───────────────┴─────────────────────────────────────────────┘
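In the printed output, each row of the nested "ranges" column is shown as a dictionary mapping its column names to that row's values. The sketch below reproduces that idea with plain nested dicts standing in for nested BiocFrame objects; `get_row` is hypothetical and is not BiocFrame's rendering code.

```python
# Hypothetical sketch: extract row i from a dict of columns, recursing into
# nested dicts so that a nested "frame" row comes back as a dict.
def get_row(columns: dict, i: int) -> dict:
    row = {}
    for name, col in columns.items():
        if isinstance(col, dict):
            row[name] = get_row(col, i)  # nested frame -> nested dict
        else:
            row[name] = col[i]
    return row


ranges = {
    "chr": ["chr1", "chr2", "chr3"],
    "start": [1000, 1100, 5000],
    "end": [1100, 4000, 5500],
}
obj = {"symbol": ["MAP1A", "BIN1", "ESR1"], "ranges": ranges}
print(get_row(obj, 1))
# {'symbol': 'BIN1', 'ranges': {'chr': 'chr2', 'start': 1100, 'end': 4000}}
```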

Properties

Properties such as the column names, row names, and dimensions of a BiocFrame can be accessed directly from the object.

# Dimensionality or shape
print(bframe.dims)

## output
## (3, 2)

# get the column names
print(bframe.column_names)

## output
## ['ensembl', 'symbol']
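To show how such read-only properties can be derived from a dictionary of columns, here is a small toy class. `MiniFrame` is hypothetical and deliberately minimal; it is not biocframe's BiocFrame, and its choice to return None when no row names are given is an assumption of this sketch.

```python
# Hypothetical toy class (not biocframe): dims and column_names are computed
# on demand from a dict of equal-length columns.
class MiniFrame:
    def __init__(self, data: dict, row_names=None):
        self._data = dict(data)  # insertion order = column order
        self._row_names = None if row_names is None else list(row_names)

    @property
    def column_names(self) -> list:
        return list(self._data)

    @property
    def row_names(self):
        return self._row_names

    @property
    def dims(self) -> tuple:
        nrows = len(next(iter(self._data.values()))) if self._data else 0
        return (nrows, len(self._data))


mf = MiniFrame({"ensembl": ["ENS00001", "ENS00002", "ENS00003"],
                "symbol": ["MAP1A", "BIN1", "ESR1"]})
print(mf.dims)          # (3, 2)
print(mf.column_names)  # ['ensembl', 'symbol']
print(mf.row_names)     # None
```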

Setters

To set various properties:

# set new column names
bframe.column_names = ["column1", "column2"]
print(bframe)
## output
BiocFrame with 3 rows & 2 columns
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ column1 <list> ┃ column2 <list> ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ ENS00001       │ MAP1A          │
│ ENS00002       │ BIN1           │
│ ENS00003       │ ESR1           │
└────────────────┴────────────────┘
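Renaming columns only relabels them; the data and column order stay the same. A minimal sketch, assuming the new names must match the number of existing columns; `rename_columns` is a hypothetical helper. The uniqueness check is a choice of this sketch, needed because a dict would silently collapse duplicate keys.

```python
# Hypothetical sketch: rename columns in order, keeping the data unchanged.
def rename_columns(columns: dict, new_names: list) -> dict:
    if len(new_names) != len(columns):
        raise ValueError(
            f"expected {len(columns)} names, got {len(new_names)}"
        )
    if len(set(new_names)) != len(new_names):
        # Duplicate keys would silently drop a column in a dict.
        raise ValueError("column names must be unique")
    return dict(zip(new_names, columns.values()))


obj = {"ensembl": ["ENS00001", "ENS00002", "ENS00003"],
       "symbol": ["MAP1A", "BIN1", "ESR1"]}
renamed = rename_columns(obj, ["column1", "column2"])
print(list(renamed))  # ['column1', 'column2']
```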

To add a new column:

bframe["score"] = range(2, 5)
print(bframe)
## output
BiocFrame with 3 rows & 3 columns
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ column1 <list> ┃ column2 <list> ┃ score <range> ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ ENS00001       │ MAP1A          │ 2             │
│ ENS00002       │ BIN1           │ 3             │
│ ENS00003       │ ESR1           │ 4             │
└────────────────┴────────────────┴───────────────┘
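Adding a column works with any object meeting the length/slicing requirement (here a `range`), provided it has one entry per row. A minimal sketch of that check, assuming a length mismatch is an error; `set_column` is hypothetical and returns a copy rather than modifying its input.

```python
# Hypothetical sketch: add or replace a column, requiring one entry per row.
def set_column(columns: dict, name: str, value) -> dict:
    nrows = len(next(iter(columns.values()))) if columns else len(value)
    if len(value) != nrows:
        raise ValueError(
            f"column {name!r} has length {len(value)}, expected {nrows}"
        )
    updated = dict(columns)  # copy; leave the input untouched
    updated[name] = value
    return updated


obj = {"column1": ["ENS00001", "ENS00002", "ENS00003"],
       "column2": ["MAP1A", "BIN1", "ESR1"]}
obj = set_column(obj, "score", range(2, 5))
print(list(obj), list(obj["score"]))  # ['column1', 'column2', 'score'] [2, 3, 4]
```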

Subset BiocFrame

Use the subset ([]) operator to slice the object:

sliced = bframe[1:2, [True, False, False]]
print(sliced)
## output
BiocFrame with 1 row & 1 column
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ row_names ┃ column1 <list> ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ 1         │ ENS00002       │
└───────────┴────────────────┘

This operation accepts several kinds of subset input: a boolean vector, a slice object, a list of indices, or row/column names.
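One way to support these different subset inputs is to normalize each of them to a list of integer positions before slicing the columns. The sketch below shows that dispatch; `normalize_index` and the row names are hypothetical and this is not biocframe's implementation. Note that the boolean check must come before integer handling because `bool` is a subclass of `int` in Python.

```python
# Hypothetical sketch: normalize a boolean vector, slice, list of integer
# indices, or names into integer positions.
def normalize_index(index, names: list) -> list:
    n = len(names)
    if isinstance(index, slice):
        return list(range(n)[index])
    if isinstance(index, (int, str)):
        index = [index]  # treat a scalar as a one-element selection
    index = list(index)
    if index and all(isinstance(i, bool) for i in index):
        if len(index) != n:
            raise ValueError(
                f"boolean vector has length {len(index)}, expected {n}"
            )
        return [i for i, keep in enumerate(index) if keep]
    positions = []
    for i in index:
        if isinstance(i, str):
            positions.append(names.index(i))  # ValueError if name is missing
        else:
            positions.append(range(n)[i])  # bounds-checked, allows negatives
    return positions


column_names = ["column1", "column2", "score"]
row_names = ["row1", "row2", "row3"]  # hypothetical row names
print(normalize_index(slice(1, 2), row_names))              # [1]
print(normalize_index([True, False, False], column_names))  # [0]
print(normalize_index(["score", 0], column_names))          # [2, 0]
```

With these rules, `bframe[1:2, [True, False, False]]` selects row position 1 and column position 0, matching the output above.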

Combine

BiocFrame implements the combine generic from biocgenerics. To combine multiple objects:

bframe1 = BiocFrame(
    {
        "odd": [1, 3, 5, 7, 9],
        "even": [0, 2, 4, 6, 8],
    }
)

bframe2 = BiocFrame(
    {
        "odd": [11, 33, 55, 77, 99],
        "even": [0, 22, 44, 66, 88],
    }
)

from biocgenerics.combine import combine

combined = combine(bframe1, bframe2)

# OR an object-oriented approach
combined = bframe1.combine(bframe2)
## output
BiocFrame with 10 rows & 2 columns
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ odd <list> ┃ even <list> ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ 1          │ 0           │
│ 3          │ 2           │
│ ...        │ ...         │
│ 99         │ 88          │
└────────────┴─────────────┘
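Conceptually, combining frames row-wise concatenates each column in order, which is why the result has 10 rows and the same 2 columns. A minimal sketch for dict-of-list frames, assuming the inputs must share the same column names; `combine_rows` is hypothetical and is not the biocgenerics implementation.

```python
# Hypothetical sketch: row-wise combine of dict-of-list frames. Columns are
# matched by name; output column order follows the first frame.
def combine_rows(*frames: dict) -> dict:
    names = list(frames[0])
    for f in frames[1:]:
        if set(f) != set(names):
            raise ValueError("all frames must have the same column names")
    return {name: [x for f in frames for x in f[name]] for name in names}


bframe1 = {"odd": [1, 3, 5, 7, 9], "even": [0, 2, 4, 6, 8]}
bframe2 = {"odd": [11, 33, 55, 77, 99], "even": [0, 22, 44, 66, 88]}
combined = combine_rows(bframe1, bframe2)
print(len(combined["odd"]), combined["odd"][-1], combined["even"][-1])  # 10 99 88
```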

For more details, check out the BiocFrame class reference.

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.



Download files

Download the file for your platform.

Source Distribution

BiocFrame-0.3.17.tar.gz (34.3 kB)

Uploaded Source

Built Distribution

BiocFrame-0.3.17-py3-none-any.whl (15.2 kB)

Uploaded Python 3

File details

Details for the file BiocFrame-0.3.17.tar.gz.

File metadata

  • Download URL: BiocFrame-0.3.17.tar.gz
  • Size: 34.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for BiocFrame-0.3.17.tar.gz
Algorithm Hash digest
SHA256 d6b476cda461e52ea5abdaec4bc926f4d67a3472fa52ed5efa36bd0122cdba12
MD5 ac4094c51b19b2f93e0a5d4a7264c7e9
BLAKE2b-256 d253cdbca22a28e3ee7dd1a29bb5d2802ea11e79ef7532ca6cc9611fe1f08ca6


File details

Details for the file BiocFrame-0.3.17-py3-none-any.whl.

File metadata

  • Download URL: BiocFrame-0.3.17-py3-none-any.whl
  • Size: 15.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for BiocFrame-0.3.17-py3-none-any.whl
Algorithm Hash digest
SHA256 1fa01aeda95b03cb0b3e447982678dddabdaef44fcc523ffc1783be75014983d
MD5 be9302aeaee0af4bda8ddc48c937fd56
BLAKE2b-256 18b02f69d95b82cb76d3845bf5d79824c0141eca32fcc419462960758ba118d3

