Investigate h5ad contents and load in a subset of the original data
scH5Loader
Python tool to investigate HDF5 files of single cell data and load only the subset of interest into an anndata object
About
⚙️: Functions that use the h5py package to explore and load single cell data stored in .h5ad format.
🔍: Exploration can be done without loading the entire data into memory, saving on memory overhead and time.
🔄: Once you have identified the data of interest, you can either load just the metadata associated with those cells into a pandas dataframe or load only those cells into memory as an anndata object.
🗄️: This is useful for large single cell anndata files when you want to know what is inside the file, or when you only want a subsection of the total data and don’t have enough memory to load everything and then slice down to the cells of interest.
💿: The goal of these functions is to help the user explore single cell file contents and only load the data of interest, saving on memory consumption.
Use case example
If you have a dataset of single cell data, you can:
See what is in the data:
What columns are in the obs (observations) or var (variables) compartments?
Are there any layers?
Is there anything stored in other compartments such as .uns, .obsp, .obsm, .varp, .varm?
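The questions above can be answered from HDF5 metadata alone. The following is a minimal sketch using h5py directly, not scH5Loader's own API: the function name `summarize_h5ad` is hypothetical, and the code assumes the on-disk layout written by anndata 0.8 or later, where dataframe groups such as obs and var record their columns in a `column-order` attribute.

```python
import h5py


def summarize_h5ad(path):
    """Map each top-level entry of an .h5ad file to a short description.

    Only HDF5 metadata (names, shapes, attributes) is read; no matrix or
    column values are loaded into memory.
    """
    summary = {}
    with h5py.File(path, "r") as f:
        for name, item in f.items():
            if isinstance(item, h5py.Dataset):
                # For example, a dense X matrix stored as a single dataset.
                summary[name] = {"kind": "dataset", "shape": item.shape}
                continue
            # Dataframe groups (obs, var) list their columns in "column-order"
            # (assumed anndata >= 0.8 layout); other groups such as layers,
            # obsm or uns are described by the names of their children.
            columns = item.attrs.get("column-order")
            if columns is None:
                keys = sorted(item.keys())
            else:
                keys = [c.decode() if isinstance(c, bytes) else str(c) for c in columns]
            summary[name] = {"kind": "group", "keys": keys}
    return summary
```

Calling `summarize_h5ad("my_data.h5ad")` (a placeholder path) would return a dictionary keyed by compartment name, so missing compartments such as layers simply do not appear.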
See the unique values of desired columns stored in the obs compartment:
For the column broad_anno, what are all the unique cell types present?
For the column donor, what are all the unique donors in this data?
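Listing the unique values of one obs column does not require reading the other columns or the counts matrix. Below is a hedged sketch with plain h5py (again not the scH5Loader API; `unique_obs_values` is a hypothetical name). It assumes the anndata 0.8+ layout, in which a categorical column is a group holding `categories` and `codes` datasets, and any other column is a single dataset.

```python
import h5py
import numpy as np


def unique_obs_values(path, column):
    """Return the unique values of one obs column as a Python list."""
    with h5py.File(path, "r") as f:
        node = f["obs"][column]
        if isinstance(node, h5py.Group):
            # Categorical column: the small "categories" array already holds
            # every declared label, so the per-cell codes are never read.
            values = node["categories"][...]
        else:
            # Plain column: read it and deduplicate.
            values = np.unique(node[...])
    # h5py returns stored strings as bytes; decode them for readability.
    return [v.decode("utf-8") if isinstance(v, bytes) else v for v in values.tolist()]
```

For a categorical column this returns every declared category, which can include labels that no cell currently uses.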
Create just a pandas dataframe of the desired cells of interest to explore the metadata without the counts matrix:
For all cells in the haematopoietic lineage only, return the columns related to refined annotation, donor, chemistry, age and spatial, but don’t return any other information.
Filter multiple columns at once and choose to return the intersection or union.
Filter for cells of interest and return all columns of information.
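The metadata-only workflow described above can be sketched as follows. This is an illustration built on h5py and pandas, not the package's actual functions: `load_obs_subset`, its parameters and the `how` argument are hypothetical, and the anndata 0.8+ on-disk layout (`_index` and `column-order` attributes on obs, categoricals stored as `categories` plus `codes`) is assumed. Only obs columns are read; the counts matrix is never touched.

```python
import h5py
import numpy as np
import pandas as pd


def _decode(values):
    """Turn HDF5 byte strings into Python str; leave numeric arrays untouched."""
    if values.dtype.kind in ("O", "S"):
        return np.array(
            [v.decode("utf-8") if isinstance(v, bytes) else v for v in values],
            dtype=object,
        )
    return values


def _read_column(obs, name):
    """Read one obs column, rebuilding categoricals from codes and categories."""
    node = obs[name]
    if isinstance(node, h5py.Group):
        categories = _decode(node["categories"][...])
        return pd.Categorical.from_codes(node["codes"][...], categories=categories)
    return _decode(node[...])


def load_obs_subset(path, filters, columns=None, how="intersection"):
    """Return obs rows matching `filters` ({column: allowed values}).

    how="intersection" keeps cells passing every filter,
    how="union" keeps cells passing at least one.
    """
    if not filters:
        raise ValueError("provide at least one {column: allowed_values} filter")
    if how not in ("intersection", "union"):
        raise ValueError("how must be 'intersection' or 'union'")
    combine = np.logical_and if how == "intersection" else np.logical_or
    with h5py.File(path, "r") as f:
        obs = f["obs"]
        # One boolean mask per filtered column, then combine them row-wise.
        masks = [
            pd.Series(_read_column(obs, col)).isin(list(allowed)).to_numpy()
            for col, allowed in filters.items()
        ]
        mask = combine.reduce(masks)
        if columns is None:
            # Default: return every column, in the stored order.
            columns = [str(c) for c in obs.attrs["column-order"]]
        index = _decode(obs[obs.attrs.get("_index", "_index")][...])[mask]
        data = {col: _read_column(obs, col)[mask] for col in columns}
    return pd.DataFrame(data, index=index)
```

For example, `load_obs_subset(path, {"broad_anno": ["T cell"], "donor": ["D1"]}, columns=["age"])` would return only the age column for cells that are both T cells and from donor D1, while `how="union"` would keep cells matching either condition. The column names and values here are placeholders.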
Load in a subset of the original data which only contains the cells of interest to save on memory:
Load a subset of the original anndata object containing only the cells of interest, with their associated counts matrix and any other metadata of interest.
Installation
To install from PyPI, run the command below (recommended in a virtual environment such as venv or conda):
pip install scH5Loader
To install directly from GitHub, run the command below (recommended in a virtual environment such as venv or conda):
pip install git+https://github.com/arose20/scH5Loader.git
To clone and install:
git clone https://github.com/arose20/scH5Loader.git
cd ./scH5Loader
pip install -e .
To install through requirements.txt:
pip install -r requirements.txt
To additionally install the development packages, if desired:
pip install -r requirements_dev.txt
Testing
For testing and cleaning code for this repo, the following packages are used:
mypy
flake8
pytest
For formatting, the black formatter is used.