Investigate h5ad contents and load in a subset of the original data

scH5Loader

Python tool to investigate HDF5 files of single cell data and load only the subset of interest into an anndata object

Workflow

About

  • ⚙️: Functions to utilize the h5py package in order to explore and load in single cell data which are stored in .h5ad format.

  • 🔍: Exploration can be done without loading the entire data into memory, saving on memory overhead and time.

  • 🔄: Once you have identified the data of interest, you can either load just the metadata for the cells of interest into a pandas dataframe, or load only those cells into memory as an anndata object.

  • 🗄️: This is useful for large single cell anndata files when you want to know what is inside the file, or when you only need a subsection of the data and don't have enough memory to load everything and then slice down to the cells of interest.

  • 💿: The goal of these functions is to help the user explore single cell file contents and only load the data of interest, saving on memory consumption.
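The kind of exploration described above can be sketched with h5py alone. This is a minimal illustration under stated assumptions, not scH5Loader's own API: the function name `summarize_h5` and the toy file layout are hypothetical. It lists what sits under each top-level compartment without reading any array into memory.

```python
import os
import tempfile

import h5py
import numpy as np


def summarize_h5(path):
    """Map each top-level key to its child keys (groups) or shape (datasets)."""
    summary = {}
    with h5py.File(path, "r") as f:  # only names and shapes are read here
        for key, item in f.items():
            if isinstance(item, h5py.Group):
                summary[key] = sorted(item.keys())
            else:
                summary[key] = item.shape
    return summary


# Toy file imitating a few .h5ad compartments (hypothetical layout and names).
toy_path = os.path.join(tempfile.mkdtemp(), "toy.h5ad")
with h5py.File(toy_path, "w") as f:
    f.create_dataset("X", data=np.zeros((4, 3), dtype="float32"))
    f.create_dataset("obs/donor", data=np.array([b"d1", b"d1", b"d2", b"d2"]))
    f.create_dataset("obs/broad_anno", data=np.array([b"T", b"B", b"T", b"B"]))
    f.create_dataset("var/gene_symbol", data=np.array([b"g1", b"g2", b"g3"]))
    f.create_group("layers")

summary = summarize_h5(toy_path)
print(summary)
```

Because only group names and dataset shapes are touched, the cost of this call should not grow with the size of the counts matrix.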

Use case example

If you have a dataset of single cell data, you can:

  1. See what is in the data:

    • What columns are in the obs (observations) or var (variables) compartments?

    • Are there any layers?

    • Is there anything stored in other compartments such as .uns, .obsp, .obsm, .varp, .varm?

  2. See the unique values of desired columns stored in the obs compartment:

    • For the column broad_anno, what are all the unique cell types present?

    • For the column donor, what are all the unique donors in this data?

  3. Create just a pandas dataframe of the desired cells of interest to explore the metadata without the counts matrix:

    • For all cells in the haematopoietic lineage only, return the columns for refined annotation, donor, chemistry, age and spatial, and nothing else.

    • Filter multiple columns at once and choose to return the intersection or union.

    • Filter for cells of interest and return all columns of information.

  4. Load in a subset of the original data which only contains the cells of interest to save on memory:

    • Load a subset of the original anndata object containing only the cells of interest, with their associated counts matrix and any metadata of interest.
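Steps 2 to 4 can be sketched directly with h5py and NumPy. This is a hedged illustration, not scH5Loader's own API: the helper names (`unique_obs_values`, `obs_mask`, `load_rows`) are hypothetical, and the toy file stores obs columns as plain byte-string datasets and X as a dense matrix. Files written by recent anndata versions usually store categorical columns as a group with `categories` and `codes`, and X may be sparse, so real files would need extra handling.

```python
import os
import tempfile

import h5py
import numpy as np


def unique_obs_values(path, column):
    """Unique values of one obs column, reading only that column."""
    with h5py.File(path, "r") as f:
        values = f["obs"][column][...]
    return [v.decode() for v in np.unique(values)]


def obs_mask(path, filters, how="intersection"):
    """Boolean cell mask for filters given as {column: [allowed values]}."""
    masks = []
    with h5py.File(path, "r") as f:
        for column, wanted in filters.items():
            labels = f["obs"][column][...]
            masks.append(np.isin(labels, [w.encode() for w in wanted]))
    combine = np.logical_and if how == "intersection" else np.logical_or
    return combine.reduce(masks)


def load_rows(path, mask):
    """Read only the selected rows of X from disk."""
    idx = np.flatnonzero(mask)  # increasing indices, as h5py requires
    with h5py.File(path, "r") as f:
        return f["X"][idx, :]


# Toy file with a simplified layout (assumption, see the note above).
toy_path = os.path.join(tempfile.mkdtemp(), "toy.h5ad")
with h5py.File(toy_path, "w") as f:
    f.create_dataset("X", data=np.arange(12, dtype="float32").reshape(4, 3))
    f.create_dataset("obs/donor", data=np.array([b"d1", b"d1", b"d2", b"d2"]))
    f.create_dataset("obs/broad_anno", data=np.array([b"T", b"B", b"T", b"B"]))

filters = {"broad_anno": ["T"], "donor": ["d1"]}
both = obs_mask(toy_path, filters, how="intersection")
either = obs_mask(toy_path, filters, how="union")
subset = load_rows(toy_path, both)
print(unique_obs_values(toy_path, "broad_anno"), both, either, subset)
```

Building the mask from the obs columns first means the counts matrix is only read for the rows that pass the filter, which is where the memory saving comes from.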

Installation

To install from PyPI, run the command below (recommended in a virtual environment such as venv or conda):

pip install scH5Loader

To install directly from GitHub, run the command below (recommended in a virtual environment such as venv or conda):

pip install git+https://github.com/arose20/scH5Loader.git

To clone and install:

git clone https://github.com/arose20/scH5Loader.git
cd ./scH5Loader
pip install -e .

To install through requirements.txt:

pip install -r requirements.txt

To additionally install the development packages, if desired:

pip install -r requirements_dev.txt

Testing

For testing and checking the code in this repo, the following packages are used:

  • mypy

  • flake8

  • pytest

For formatting, the black formatter is used.
