Flexible dataframe representation to support nested structures.

BiocFrame

This package provides:

  • BiocFrame class, an alternative to Pandas DataFrame.

    BiocFrame makes no assumptions about the types of its columns. The only requirement is that each column implements the __len__ (length) and __getitem__ (slicing) dunder methods. This allows BiocFrame to accept nested representations or any supported class as a column.

  • Factor class, equivalent to R's factor.

    The aim is to encode a list of strings as integers for easier numerical analysis.
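
The column requirement described above (length through __len__, slicing through __getitem__) can be illustrated with a small sketch. The GenomicIntervals class and the looks_like_column helper are hypothetical names made up for this example and are not part of biocframe; they only show the kind of object that would satisfy the protocol.

```python
class GenomicIntervals:
    """Hypothetical column type: parallel lists of starts and ends."""

    def __init__(self, starts, ends):
        if len(starts) != len(ends):
            raise ValueError("starts and ends must have the same length")
        self.starts = list(starts)
        self.ends = list(ends)

    def __len__(self):
        # Number of intervals, i.e. the number of rows this column covers.
        return len(self.starts)

    def __getitem__(self, key):
        # A slice returns a new GenomicIntervals; an integer returns one (start, end) pair.
        if isinstance(key, slice):
            return GenomicIntervals(self.starts[key], self.ends[key])
        return (self.starts[key], self.ends[key])


def looks_like_column(obj):
    """Illustrative check (an assumption, not biocframe's own validation)."""
    return hasattr(obj, "__len__") and hasattr(obj, "__getitem__")


intervals = GenomicIntervals([1000, 1100, 5000], [1100, 4000, 5500])
```

Built-in lists and ranges pass the same check, which is consistent with the plain lists and the range used as columns elsewhere in this README.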

To get started, install the package from PyPI:

pip install biocframe

BiocFrame

To construct a BiocFrame object, simply provide the data as a dictionary.

from biocframe import BiocFrame

obj = {
    "ensembl": ["ENS00001", "ENS00002", "ENS00003"],
    "symbol": ["MAP1A", "BIN1", "ESR1"],
}
bframe = BiocFrame(obj)
print(bframe)
## output
BiocFrame with 3 rows and 2 columns
    ensembl symbol
    <list> <list>
[0] ENS00001  MAP1A
[1] ENS00002   BIN1
[2] ENS00003   ESR1

Columns can also hold complex representations, for example another BiocFrame:

obj = {
    "ensembl": ["ENS00001", "ENS00002", "ENS00002"],
    "symbol": ["MAP1A", "BIN1", "ESR1"],
    "ranges": BiocFrame({
        "chr": ["chr1", "chr2", "chr3"],
        "start": [1000, 1100, 5000],
        "end": [1100, 4000, 5500]
    }),
}

bframe2 = BiocFrame(obj, row_names=["row1", "row2", "row3"])
print(bframe2)
## output
BiocFrame with 3 rows and 3 columns
    ensembl symbol         ranges
    <list> <list>    <BiocFrame>
row1 ENS00001  MAP1A chr1:1000:1100
row2 ENS00002   BIN1 chr2:1100:4000
row3 ENS00002   ESR1 chr3:5000:5500

Properties

Properties such as the column names, row names, and dimensions of the BiocFrame can be accessed directly from the object.

# Dimensionality or shape
print(bframe.shape)

## output
## (3, 2)

# get the column names
print(bframe.column_names)

## output
## ['ensembl', 'symbol']

Setters

Properties can be set by assignment:

# set new column names
bframe.column_names = ["column1", "column2"]
print(bframe)
## output
BiocFrame with 3 rows and 2 columns
    column1 column2
    <list>  <list>
[0] ENS00001   MAP1A
[1] ENS00002    BIN1
[2] ENS00003    ESR1

To add a new column, assign to it by name:

bframe["score"] = range(2, 5)
print(bframe)
## output
BiocFrame with 3 rows and 3 columns
    column1 column2   score
    <list>  <list> <range>
[0] ENS00001   MAP1A       2
[1] ENS00002    BIN1       3
[2] ENS00003    ESR1       4

Functional style

Properties can also be accessed or set using a functional approach.

To get the column names:

print(bframe.get_column_names())

## output
## ['ensembl', 'symbol']

To set new column names:

# set new column names
bframe.set_column_names(names = ["column1", "column2"], in_place=True)
print(bframe)
## output
BiocFrame with 3 rows and 2 columns
    column1 column2
    <list>  <list>
[0] ENS00001   MAP1A
[1] ENS00002    BIN1
[2] ENS00003    ESR1

If in_place is True, the object is mutated; otherwise, a new instance is returned.
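
The in_place behaviour can be sketched with a small stand-alone class. NamedColumns and set_names are hypothetical names used only for illustration; the source does not say what the biocframe setter returns when in_place is True, so returning the object itself here is an assumption.

```python
from copy import copy


class NamedColumns:
    """Hypothetical stand-in for an object with settable column names."""

    def __init__(self, names):
        self.names = list(names)

    def set_names(self, names, in_place=False):
        # Mutate self when in_place is True; otherwise work on a shallow copy.
        target = self if in_place else copy(self)
        target.names = list(names)
        return target


original = NamedColumns(["ensembl", "symbol"])

# in_place=False: the original keeps its names and a new object is returned.
renamed = original.set_names(["column1", "column2"])

# in_place=True: the original itself is modified (and, in this sketch, returned).
same = original.set_names(["a", "b"], in_place=True)
```

The copying form is convenient when the original object is still needed, for example when it is shared between several analysis steps.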

Subset BiocFrame

Use the subset ([]) operator to slice the object:

sliced = bframe[1:2, [True, False, False]]
print(sliced)
## output
BiocFrame with 1 row and 1 column
    column1
    <list>
[0] ENS00002

This operation accepts several index types: to subset, you can specify a boolean vector, a slice object, a list of indices, or a list of row/column names.
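
One way to picture how these index types relate is a helper that reduces each of them to a list of integer positions. The normalize_index function below is a hypothetical sketch written for this README, not biocframe's actual implementation, and its error handling is an assumption.

```python
def normalize_index(index, length, names=None):
    """Convert a slice, boolean vector, index list, or name list to positions.

    Hypothetical helper for illustration only.
    """
    if isinstance(index, slice):
        # slice.indices clamps the slice to the axis length.
        return list(range(*index.indices(length)))

    index = list(index)

    # Check booleans first: bool is a subclass of int in Python.
    if index and all(isinstance(i, bool) for i in index):
        if len(index) != length:
            raise ValueError("boolean vector must match the axis length")
        return [pos for pos, keep in enumerate(index) if keep]

    if index and all(isinstance(i, str) for i in index):
        if names is None:
            raise ValueError("names are required to subset by name")
        return [names.index(i) for i in index]

    # Otherwise assume the input is already a list of integer positions.
    return index


rows = normalize_index(slice(1, 2), 3)               # [1]
cols = normalize_index([True, False, False], 3)      # [0]
```

With positions in hand, subsetting any column reduces to indexing it, which only needs the __getitem__ method that every column must implement.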

Combine

BiocFrame implements the combine generic from biocgenerics. To combine multiple objects:

bframe1 = BiocFrame(
    {
        "odd": [1, 3, 5, 7, 9],
        "even": [0, 2, 4, 6, 8],
    }
)

bframe2 = BiocFrame(
    {
        "odd": [11, 33, 55, 77, 99],
        "even": [0, 22, 44, 66, 88],
    }
)

from biocgenerics.combine import combine
combined = combine(bframe1, bframe2)

# OR an object-oriented approach

combined = bframe1.combine(bframe2)
print(combined)
## output
BiocFrame with 10 rows and 2 columns
    odd   even
    <list> <list>
[0]      1      0
[1]      3      2
[2]      5      4
[3]      7      6
[4]      9      8
[5]     11      0
[6]     33     22
[7]     55     44
[8]     77     66
[9]     99     88
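
Conceptually, the result above is a row-wise concatenation of each column. A minimal sketch on plain dictionaries of lists follows; combine_rows is a hypothetical helper, and the requirement that all inputs share the same columns in the same order is an assumption of this sketch rather than a documented rule of biocframe.

```python
def combine_rows(*frames):
    """Concatenate dict-of-list tables row-wise (hypothetical helper)."""
    first, *rest = frames
    columns = list(first.keys())
    for frame in rest:
        if list(frame.keys()) != columns:
            raise ValueError("all tables must share the same columns, in the same order")
    # For each column, chain the values from every input in order.
    return {col: [v for frame in frames for v in frame[col]] for col in columns}


table1 = {"odd": [1, 3, 5, 7, 9], "even": [0, 2, 4, 6, 8]}
table2 = {"odd": [11, 33, 55, 77, 99], "even": [0, 22, 44, 66, 88]}
combined_table = combine_rows(table1, table2)
```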

For more details, check out the BiocFrame class reference.

Factor

Convert a list into a Factor object:

from biocframe import Factor

f1 = Factor.from_list(["A", "B", "A", "B", "E"])
print(f1)
## output
Factor of length 5 with 3 levels
values: ['A', 'B', 'A', 'B', 'E']
levels: ['A', 'B', 'E']
ordered: False

The Factor class behaves as a list and most operations to slice or replace should work here. Check out the docs for more information!
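
The integer encoding behind a factor can be sketched in plain Python. The encode_as_factor function is a hypothetical helper, and sorting the unique values to obtain the levels is an assumption; the output shown above is consistent with sorted order as well as with order of first appearance.

```python
def encode_as_factor(values):
    """Return (codes, levels) for a list of strings; illustrative only."""
    # Assumption: levels are the sorted unique values.
    levels = sorted(set(values))
    position = {level: i for i, level in enumerate(levels)}
    # Each value is replaced by the position of its level.
    codes = [position[v] for v in values]
    return codes, levels


codes, levels = encode_as_factor(["A", "B", "A", "B", "E"])
# codes  -> [0, 1, 0, 1, 2]
# levels -> ['A', 'B', 'E']

# Decoding looks each code up in the levels and recovers the original strings.
decoded = [levels[c] for c in codes]
```

Storing small integers plus a short list of levels is what makes factors convenient for numerical analysis of categorical data.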

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.
