Flexible dataframe representation to support nested structures.
BiocFrame
This package provides the BiocFrame class, an alternative to the Pandas DataFrame.
BiocFrame makes no assumptions about the types of its columns; the minimum requirement is that each column implements the length (__len__) and slice (__getitem__) dunder methods. This allows BiocFrame to accept nested representations or any supported class as columns.
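As a minimal sketch of what satisfying this requirement looks like, the class below implements only __len__ and __getitem__. The name IntervalColumn and its fields are hypothetical and are not part of biocframe; the example only illustrates the two dunder methods and does not import the package.

```python
from typing import List, Union


class IntervalColumn:
    """Hypothetical column type holding (start, end) pairs; not part of biocframe."""

    def __init__(self, starts: List[int], ends: List[int]) -> None:
        if len(starts) != len(ends):
            raise ValueError("starts and ends must have the same length")
        self.starts = list(starts)
        self.ends = list(ends)

    def __len__(self) -> int:
        # Number of rows this column would contribute.
        return len(self.starts)

    def __getitem__(self, index: Union[int, slice]):
        # A slice returns a new column; an integer returns one (start, end) pair.
        if isinstance(index, slice):
            return IntervalColumn(self.starts[index], self.ends[index])
        return (self.starts[index], self.ends[index])


col = IntervalColumn([1000, 1100, 5000], [1100, 4000, 5500])
print(len(col))       # 3
print(col[1])         # (1100, 4000)
print(len(col[0:2]))  # 2
```

An object like this could presumably be supplied as a value in the dictionary passed to BiocFrame. Whether it prints and subsets as expected may depend on the library version and on the index types it passes to __getitem__, so treat that as an assumption to verify.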
To get started, install the package from PyPI:
pip install biocframe
Usage
To construct a BiocFrame object, simply provide the data as a dictionary.
from biocframe import BiocFrame
obj = {
"ensembl": ["ENS00001", "ENS00002", "ENS00003"],
"symbol": ["MAP1A", "BIN1", "ESR1"],
}
bframe = BiocFrame(obj)
print(bframe)
## output
BiocFrame with 3 rows & 2 columns
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ ensembl <list> ┃ symbol <list> ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ ENS00001 │ MAP1A │
│ ENS00002 │ BIN1 │
│ ENS00003 │ ESR1 │
└────────────────┴───────────────┘
You can also specify complex representations as columns, for example a nested BiocFrame:
obj = {
"ensembl": ["ENS00001", "ENS00002", "ENS00002"],
"symbol": ["MAP1A", "BIN1", "ESR1"],
"ranges": BiocFrame({
"chr": ["chr1", "chr2", "chr3"],
"start": [1000, 1100, 5000],
"end": [1100, 4000, 5500]
}),
}
bframe2 = BiocFrame(obj, row_names=["row1", "row2", "row3"])
print(bframe2)
## output
BiocFrame with 3 rows & 3 columns
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ row_names ┃ ensembl <list> ┃ symbol <list> ┃ ranges <BiocFrame> ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ row1 │ ENS00001 │ MAP1A │ {'chr': 'chr1', 'start': 1000, 'end': 1100} │
│ row2 │ ENS00002 │ BIN1 │ {'chr': 'chr2', 'start': 1100, 'end': 4000} │
│ row3 │ ENS00002 │ ESR1 │ {'chr': 'chr3', 'start': 5000, 'end': 5500} │
└───────────┴────────────────┴───────────────┴─────────────────────────────────────────────┘
Properties
Properties such as the column names, row names, and dimensions can be accessed directly from the BiocFrame object.
# Dimensionality or shape
print(bframe.dims)
## output
## (3, 2)
# get the column names
print(bframe.column_names)
## output
## ['ensembl', 'symbol']
Setters
Properties can also be set directly on the object:
# set new column names
bframe.column_names = ["column1", "column2"]
print(bframe)
## output
BiocFrame with 3 rows & 2 columns
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ column1 <list> ┃ column2 <list> ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ ENS00001 │ MAP1A │
│ ENS00002 │ BIN1 │
│ ENS00003 │ ESR1 │
└────────────────┴────────────────┘
To add new columns:
bframe["score"] = range(2, 5)
print(bframe)
## output
BiocFrame with 3 rows & 3 columns
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ column1 <list> ┃ column2 <list> ┃ score <range> ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ ENS00001 │ MAP1A │ 2 │
│ ENS00002 │ BIN1 │ 3 │
│ ENS00003 │ ESR1 │ 4 │
└────────────────┴────────────────┴───────────────┘
Subset BiocFrame
Use the subset ([]) operator to slice the object:
sliced = bframe[1:2, [True, False, False]]
print(sliced)
## output
BiocFrame with 1 row & 1 column
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ row_names ┃ column1 <list> ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ 1 │ ENS00002 │
└───────────┴────────────────┘
This operation accepts several index types for both rows and columns: a boolean vector, a slice object, a list of indices, or a list of row/column names.
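To make the accepted index forms concrete, here is a minimal sketch of how each of them can be reduced to a list of integer positions. The helper normalize_index is hypothetical and written only for illustration; it is not biocframe's implementation and does not import the package.

```python
from typing import List, Sequence


def normalize_index(index, names: Sequence[str]) -> List[int]:
    """Hypothetical helper: convert a supported index form to integer positions."""
    n = len(names)
    if isinstance(index, slice):
        # slice.indices clamps start/stop/step to the axis length.
        return list(range(*index.indices(n)))
    index = list(index)
    if not index:
        return []
    # Check booleans first, because bool is a subclass of int in Python.
    if all(isinstance(i, bool) for i in index):
        if len(index) != n:
            raise ValueError("boolean vector must match the axis length")
        return [pos for pos, keep in enumerate(index) if keep]
    if all(isinstance(i, str) for i in index):
        return [list(names).index(name) for name in index]
    return [int(i) for i in index]


columns = ["column1", "column2", "score"]
print(normalize_index(slice(1, 2), columns))           # [1]
print(normalize_index([True, False, False], columns))  # [0]
print(normalize_index([0, 2], columns))                # [0, 2]
print(normalize_index(["score", "column1"], columns))  # [2, 0]
```

The boolean check comes before the integer fallback on purpose: since True and False are also integers, testing in the other order would silently treat a boolean mask as positions 1 and 0.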
Combine
BiocFrame implements the combine generic from biocgenerics. To combine multiple objects:
bframe1 = BiocFrame(
{
"odd": [1, 3, 5, 7, 9],
"even": [0, 2, 4, 6, 8],
}
)
bframe2 = BiocFrame(
{
"odd": [11, 33, 55, 77, 99],
"even": [0, 22, 44, 66, 88],
}
)
from biocgenerics.combine import combine
combined = combine(bframe1, bframe2)
# OR an object-oriented approach
combined = bframe1.combine(bframe2)
print(combined)
## output
BiocFrame with 10 rows & 2 columns
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ odd <list> ┃ even <list> ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ 1 │ 0 │
│ 3 │ 2 │
│ ... │ ... │
│ 99 │ 88 │
└────────────┴─────────────┘
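The output above suggests that combine stacks the inputs row-wise, matching columns by name. As a rough sketch of that idea using plain dictionaries of lists, the hypothetical helper combine_rows below concatenates each column in turn; it is an illustration under that assumption, not the biocgenerics or biocframe implementation.

```python
from typing import Dict, List


def combine_rows(*frames: Dict[str, List]) -> Dict[str, List]:
    """Hypothetical helper: row-wise concatenation of dict-of-list tables."""
    first = frames[0]
    combined: Dict[str, List] = {name: [] for name in first}
    for frame in frames:
        # This sketch requires every input to have the same column names.
        if set(frame) != set(first):
            raise ValueError("all inputs must share the same column names")
        for name in first:
            combined[name].extend(frame[name])
    return combined


left = {"odd": [1, 3, 5, 7, 9], "even": [0, 2, 4, 6, 8]}
right = {"odd": [11, 33, 55, 77, 99], "even": [0, 22, 44, 66, 88]}
result = combine_rows(left, right)
print(len(result["odd"]))  # 10
print(result["even"][-1])  # 88
```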
For more details, check out the BiocFrame class reference.
Note
This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.
File details
Details for the file BiocFrame-0.3.17.tar.gz.
File metadata
- Download URL: BiocFrame-0.3.17.tar.gz
- Upload date:
- Size: 34.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | d6b476cda461e52ea5abdaec4bc926f4d67a3472fa52ed5efa36bd0122cdba12
MD5 | ac4094c51b19b2f93e0a5d4a7264c7e9
BLAKE2b-256 | d253cdbca22a28e3ee7dd1a29bb5d2802ea11e79ef7532ca6cc9611fe1f08ca6
File details
Details for the file BiocFrame-0.3.17-py3-none-any.whl.
File metadata
- Download URL: BiocFrame-0.3.17-py3-none-any.whl
- Upload date:
- Size: 15.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1fa01aeda95b03cb0b3e447982678dddabdaef44fcc523ffc1783be75014983d
MD5 | be9302aeaee0af4bda8ddc48c937fd56
BLAKE2b-256 | 18b02f69d95b82cb76d3845bf5d79824c0141eca32fcc419462960758ba118d3