Investigate h5ad contents and load in a subset of the original data

Project description

scH5Loader

Python tool to investigate HDF5 files of single cell data and load only the subset of interest into an anndata object

About

  • ⚙️: Functions that use the h5py package to explore and load single cell data stored in .h5ad format.

  • 🔍: Exploration can be done without loading the entire data into memory, saving on memory overhead and time.

  • 🔄: Once you have identified the data of interest, you can either load just the metadata associated with the cells of interest into a pandas dataframe or load only the cells of interest into memory in an anndata format.

  • 🗄️: This is useful for large single cell anndata files when you want to know what is inside the file, or when you need only a subsection of the data and don’t have enough memory to load everything and then slice down to the cells of interest.

  • 💿: The goal of these functions is to help the user explore single cell file contents and only load the data of interest, saving on memory consumption.
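The exploration idea above can be illustrated with a minimal sketch that lists the names stored under each top-level compartment without touching any array contents. This is not the scH5Loader API: the function name `summarise_compartments` and the nested-dict stand-in `fake_h5ad` are hypothetical, and the layout shown is an illustrative assumption rather than a real file.

```python
from collections.abc import Mapping


def summarise_compartments(root: Mapping) -> dict:
    """Map each top-level entry to the sorted names of its children.

    Mapping-like entries (groups) list their child names; anything else
    (a leaf dataset) maps to an empty list. Only names are read.
    """
    summary = {}
    for name in sorted(root.keys()):
        node = root[name]
        summary[name] = sorted(node.keys()) if isinstance(node, Mapping) else []
    return summary


# Hypothetical stand-in for an opened .h5ad file: nested dicts play the
# role of HDF5 groups and lists play the role of datasets.
fake_h5ad = {
    "X": [[1, 0], [0, 2]],
    "obs": {
        "_index": ["c1", "c2"],
        "broad_anno": ["T cell", "B cell"],
        "donor": ["d1", "d2"],
    },
    "var": {"_index": ["g1", "g2"]},
    "layers": {"raw_counts": [[1, 0], [0, 2]]},
    "obsm": {},
}

for compartment, children in summarise_compartments(fake_h5ad).items():
    print(compartment, children)
```

The sketch relies on the assumption that an opened h5py file behaves as a mapping of groups and datasets, so an `h5py.File(path, "r")` object could be passed in place of the dict; indexing a group by name returns a handle and does not read the underlying array.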

Use case example

If you have a dataset of single cell data, you can:

  1. See what is in the data:

    • What columns are in the obs (observations) and var (variables) compartments?

    • Are there any layers?

    • Is there anything stored in other compartments such as .uns, .obsp, .obsm, .varp, .varm?

  2. See the unique values of desired columns stored in the obs compartment:

    • For the column broad_anno, what are all the unique cell types present?

    • For the column donor, what are all the unique donors in this data?

  3. Create just a pandas dataframe of the desired cells of interest to explore the metadata without the counts matrix:

    • For all cells in the haematopoietic lineage only, return the columns related to refined annotation, donor, chemistry, age and spatial information, and nothing else.

    • Filter multiple columns at once and choose to return the intersection or union.

    • Filter for cells of interest and return all columns of information.

  4. Load in a subset of the original data which only contains the cells of interest to save on memory:

    • Load in a subset of the original anndata object containing only the cells of interest, with their associated counts matrix and any other metadata of interest.
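The selection logic in steps 2 and 3 can be sketched in plain Python on a small in-memory table. This is only an illustration of the intersection/union idea, not the package's implementation: the helper names (`unique_values`, `select_rows`, `subset_columns`) and the toy values are hypothetical, and the column names `broad_anno`, `donor` and `chemistry` are borrowed from the examples above.

```python
from typing import Dict, List, Mapping, Sequence, Set


def unique_values(column: Sequence[str]) -> List[str]:
    """Return the sorted unique entries of one metadata column."""
    return sorted(set(column))


def select_rows(
    columns: Mapping[str, Sequence[str]],
    filters: Mapping[str, Set[str]],
    how: str = "intersection",
) -> List[int]:
    """Return row indices matching all filters (intersection) or any (union)."""
    if how not in ("intersection", "union"):
        raise ValueError("how must be 'intersection' or 'union'")
    combine = all if how == "intersection" else any
    n_rows = len(next(iter(columns.values())))
    return [
        i
        for i in range(n_rows)
        if combine(columns[col][i] in wanted for col, wanted in filters.items())
    ]


def subset_columns(
    columns: Mapping[str, Sequence[str]], rows: Sequence[int], keep: Sequence[str]
) -> Dict[str, List[str]]:
    """Keep only the requested columns for the selected rows."""
    return {col: [columns[col][i] for i in rows] for col in keep}


# Toy metadata table (hypothetical values).
obs = {
    "broad_anno": ["haematopoietic", "stromal", "haematopoietic", "epithelial"],
    "donor": ["D1", "D1", "D2", "D2"],
    "chemistry": ["3prime", "5prime", "3prime", "3prime"],
}

filters = {"broad_anno": {"haematopoietic"}, "donor": {"D1"}}
both = select_rows(obs, filters, how="intersection")  # rows matching every filter
either = select_rows(obs, filters, how="union")  # rows matching at least one filter

print(unique_values(obs["donor"]))
print(both, either)
print(subset_columns(obs, both, ["donor", "chemistry"]))
```

Computing the row indices first and only then pulling the requested columns mirrors the memory-saving idea described above: the expensive data (for example the counts matrix) would only need to be read for the selected rows.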

Installation

To install from PyPI, run the command below (recommended in a virtual environment such as venv or conda):

pip install scH5Loader

To install directly from GitHub, run the command below (recommended in a virtual environment such as venv or conda):

pip install git+https://github.com/arose20/scH5Loader.git

To clone and install:

git clone https://github.com/arose20/scH5Loader.git
cd ./scH5Loader
pip install -e .

To install through requirements.txt:

pip install -r requirements.txt

To additionally install the development packages if desired:

pip install -r requirements_dev.txt
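After any of the installation routes above, a quick check from Python can confirm whether the distribution is visible in the active environment. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the name used in the pip command.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed here, otherwise None.
print(installed_version("scH5Loader"))
```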

Testing

For testing and cleaning code for this repo, the following packages are used:

  • mypy

  • flake8

  • pytest

For formatting, the black formatter is used.
