Index of Reference Cell Type Datasets

celldex - reference cell type datasets

This package provides reference datasets with annotated cell types for convenient use by BiocPy packages and workflows in Python. These references were sourced and uploaded by the celldex R/Bioconductor package.

Each dataset is loaded as a SummarizedExperiment that is ready for downstream analysis, e.g., in the SingleR Python implementation.

Installation

To get started, install the package from PyPI:

pip install celldex

Find reference datasets

The list_references() function will display all available reference datasets along with their metadata.

from celldex import list_references

refs = list_references()
print(refs[["name", "version"]].head(3))

## output
# |    | name             | version    |
# |---:|:-----------------|:-----------|
# |  0 | immgen           | 2024-02-26 |
# |  1 | blueprint_encode | 2024-02-26 |
# |  2 | dice             | 2024-02-26 |

Fetch reference datasets

Fetch a dataset as a SummarizedExperiment:

from celldex import fetch_reference

# the version can be passed by keyword or by position
ref = fetch_reference("immgen", version="2024-02-26")
ref2 = fetch_reference("hpca", "2024-02-26")

print(ref)

## output
# class: SummarizedExperiment
# dimensions: (22134, 830)
# assays(1): ['logcounts']
# row_data columns(0): []
# row_names(22134): ['Zglp1', 'Vmn2r65', 'Gm10024', ..., 'Ifi44', 'Tiparp', 'Kdm1a']
# column_data columns(3): ['label.main', 'label.fine', 'label.ont']
# column_names(830): ['GSM1136119_EA07068_260297_MOGENE-1_0-ST-V1_MF.11C-11B+.LU_1.CEL', 'GSM1136120_EA07068_260298_MOGENE-1_0-ST-V1_MF.11C-11B+.LU_2.CEL', 'GSM1136121_EA07068_260299_MOGENE-1_0-ST-V1_MF.11C-11B+.LU_3.CEL', ..., 'GSM920653_EA07068_201207_MOGENE-1_0-ST-V1_TGD.VG4+24AHI.E17.TH_3.CEL', 'GSM920654_EA07068_201214_MOGENE-1_0-ST-V1_TGD.VG4+24ALO.E17.TH_1.CEL', 'GSM920655_EA07068_201215_MOGENE-1_0-ST-V1_TGD.VG4+24ALO.E17.TH_2.CEL']
# metadata(0):

Search for references

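Only a limited number of references are available right now, but you can search them by free text or by metadata field:

from celldex import search_references
from gypsum_client import define_text_query

# free-text search across the metadata
res = search_references("human")
# partial match; "%" acts as a wildcard
res = search_references(define_text_query("Immun%", partial=True))
# restrict the search to a single metadata field
res = search_references(define_text_query("10090", field="taxonomy_id"))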
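
To make the wildcard behaviour concrete, here is a small standard-library sketch. It assumes that queries are matched against individual lowercased words (tokens) using SQL `LIKE` patterns; that is an assumption about the backend rather than something stated above, and the titles, table, and `search` helper are made up for illustration.

```python
import re
import sqlite3

# Hypothetical titles; not real celldex metadata.
titles = [
    "Immunological Genome Project",
    "Human Primary Cell Atlas",
    "Database of Immune Cell Expression",
]

# Store one row per (title, token), mimicking a token-based text index.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tokens (title TEXT, token TEXT)")
for title in titles:
    for token in re.findall(r"[a-z0-9]+", title.lower()):
        con.execute("INSERT INTO tokens VALUES (?, ?)", (title, token))


def search(pattern):
    """Return titles with at least one token matching a LIKE pattern."""
    rows = con.execute(
        "SELECT DISTINCT title FROM tokens WHERE token LIKE ? ORDER BY title",
        (pattern.lower(),),
    )
    return [row[0] for row in rows]


print(search("human"))   # no wildcard: the whole token must match
print(search("Immun%"))  # "%" matches any run of characters after "immun"
```

Under these assumptions, "Immun%" picks up both "Immunological" and "Immune", whereas a plain "human" only matches that exact word.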

Adding new reference datasets

These instructions follow the same steps outlined in the scrnaseq package.

Format your dataset as a SummarizedExperiment. Let's mock a reference dataset:

Note: The experiment object must include an assay named 'logcounts' that contains a matrix of log-normalized counts.

import numpy as np
from summarizedexperiment import SummarizedExperiment
from biocframe import BiocFrame

# mock expression values for 100 genes across 10 samples
mat = np.random.exponential(1.3, (100, 10))
row_names = [f"GENE_{i}" for i in range(mat.shape[0])]
col_names = list("ABCDEFGHIJ")
sce = SummarizedExperiment(
    assays={"logcounts": mat},  # the assay must be named "logcounts"
    row_data=BiocFrame(row_names=row_names),
    column_data=BiocFrame(data={"label.fine": col_names}),  # one label per sample
)
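
The mock matrix above is random, but a real reference needs log-normalized values. The source does not say which normalization is expected, so the following is only a sketch of one common convention: scale each sample by its library size, then take log2(x + 1). The `log_normalize` helper and the raw counts are hypothetical, and the code uses plain Python lists to stay dependency-free.

```python
import math


def log_normalize(counts):
    """Library-size normalize a genes x samples matrix, then apply log2(x + 1).

    `counts` is a list of rows (one per gene), each a list of per-sample counts.
    """
    n_samples = len(counts[0])
    # Total counts per sample (column sums).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    # Scale the size factors so that they average to 1 across samples.
    mean_size = sum(lib_sizes) / n_samples
    size_factors = [size / mean_size for size in lib_sizes]
    return [
        [math.log2(row[j] / size_factors[j] + 1) for j in range(n_samples)]
        for row in counts
    ]


# Made-up raw counts: 3 genes x 2 samples, the second sample sequenced twice as deeply.
raw = [[10, 20], [30, 60], [60, 120]]
logcounts = log_normalize(raw)
print(logcounts)
```

Because the second sample only differs by sequencing depth, both columns end up with the same values after normalization.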

Assemble the metadata for your reference dataset. This should be a dictionary as specified in the Bioconductor metadata schema. Check out some examples from fetch_metadata(). Note that the application.takane property will be automatically added later, and so can be omitted from the dictionary that you create.

meta = {
    "title": "New reference dataset",
    "description": "This is a new reference dataset",
    "taxonomy_id": ["10090"],  # NCBI ID
    "genome": ["GRCm38"],  # genome build
    "sources": [{"provider": "GEO", "id": "GSE12345"}],
    "maintainer_name": "Jayaram Kancherla",
    "maintainer_email": "jayaram.kancherla@gmail.com",
}
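
Before saving, it can help to catch missing fields early. The sketch below is a hypothetical pre-flight check, not part of celldex: the `REQUIRED_FIELDS` tuple is simply copied from the keys of the example dictionary above and is an assumption; the Bioconductor metadata schema remains the authoritative definition.

```python
# Assumed set of required keys, taken from the example dictionary above.
# The Bioconductor metadata schema is the authoritative source.
REQUIRED_FIELDS = (
    "title",
    "description",
    "taxonomy_id",
    "genome",
    "sources",
    "maintainer_name",
    "maintainer_email",
)


def missing_metadata_fields(metadata):
    """Return the assumed-required keys that are absent or empty."""
    return [key for key in REQUIRED_FIELDS if not metadata.get(key)]


# An incomplete draft, for illustration only.
draft = {
    "title": "New reference dataset",
    "description": "This is a new reference dataset",
    "taxonomy_id": ["10090"],
    "genome": [],  # present but empty, so it is reported as missing
}
print(missing_metadata_fields(draft))
```

An empty list from `missing_metadata_fields()` only means the keys are present; it does not validate their contents against the schema.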

Save your SummarizedExperiment object to disk with save_reference(). This saves the reference dataset into a "staging directory" using language-agnostic file formats - check out the ArtifactDB framework for more details.
```
import tempfile
from celldex import save_reference

# replace this temporary directory with your own staging directory
staging_dir = tempfile.mkdtemp()
save_reference(sce, staging_dir, meta)
```
You can check that everything was correctly saved by reloading the on-disk data for inspection:
```
import dolomite_base as dl

dl.read_object(staging_dir)
```
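
Alongside reloading the object, you may want a quick look at which files were written. The following is a minimal sketch using only the standard library; `list_staged_files` is a hypothetical helper, and the file names in the demo are placeholders rather than the actual layout produced by save_reference().

```python
import tempfile
from pathlib import Path


def list_staged_files(staging_dir):
    """Return the relative paths of all files under a staging directory."""
    root = Path(staging_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


# Demo on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "assays").mkdir()
    (Path(demo_dir) / "assays" / "placeholder.bin").write_bytes(b"")
    (Path(demo_dir) / "OBJECT").write_text("{}")
    demo_files = list_staged_files(demo_dir)

print(demo_files)
```

In practice you would call `list_staged_files(staging_dir)` on the directory passed to save_reference().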
Wait for us to grant temporary upload permissions to your GitHub account.
Upload your staging directory to the gypsum backend with upload_reference(). On the first call to this function, it will automatically prompt you to log into GitHub so that the backend can authenticate you. If you are on a system without browser access (e.g., most computing clusters), a token can be manually supplied via set_access_token().
```
from celldex import upload_reference

upload_reference(staging_dir, "my_dataset_name", "my_version")
```
You can check that everything was successfully uploaded by calling fetch_reference() with the same name and version:
```
from celldex import fetch_reference

fetch_reference("my_dataset_name", "my_version")
```
If you realize you made a mistake, no worries. Use the following call to clear the erroneous dataset, and try again:
```
from gypsum_client import reject_probation

reject_probation("celldex", "my_dataset_name", "my_version")
```
Comment on the PR to notify us that the dataset has finished uploading and you're happy with it. We'll review it and make sure everything's in order. If some fixes are required, we'll just clear the dataset so that you can upload a new version with the necessary changes. Otherwise, we'll approve the dataset. Note that once a version of a dataset is approved, no further changes can be made to that version; you'll have to upload a new version if you want to modify something.

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.

Release history

- 0.3.0 (this version): Jan 8, 2025
- 0.1.1: May 30, 2024
- 0.1.0: May 29, 2024
- 0.0.1: May 28, 2024

Download files

Download the file for your platform.

Source Distribution

celldex-0.3.0.tar.gz (31.8 kB), uploaded Jan 8, 2025, Source

Built Distribution

celldex-0.3.0-py3-none-any.whl (15.5 kB), uploaded Jan 8, 2025, Python 3

File details

Details for the file celldex-0.3.0.tar.gz.

File metadata

Download URL: celldex-0.3.0.tar.gz
Upload date: Jan 8, 2025
Size: 31.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for celldex-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`be095e22a857624e88056446e86e75f5923b74fccd5a247b2462866ace874958`
MD5	`a53d190d0c06d225e6c0bade71c0c394`
BLAKE2b-256	`82a5843895fd5523a9252efa62a1871dbcce25c92ef7dead146d94b609c88d04`
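
The digests above can be used to check a downloaded file. Here is a minimal sketch with the standard library's `hashlib`; the `sha256_of` and `verify_download` helpers are illustrative names, and the expected value is the SHA256 digest listed above for celldex-0.3.0.tar.gz.

```python
import hashlib

# SHA256 digest listed above for celldex-0.3.0.tar.gz.
EXPECTED_SHA256 = "be095e22a857624e88056446e86e75f5923b74fccd5a247b2462866ace874958"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` has the expected SHA256 digest."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the archive was downloaded to the current directory:
# verify_download("celldex-0.3.0.tar.gz")
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for a 31.8 kB archive but is a reasonable default.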

Provenance

The following attestation bundles were made for celldex-0.3.0.tar.gz:

Publisher: publish-pypi.yml on SingleR-inc/celldex-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: celldex-0.3.0.tar.gz
- Subject digest: be095e22a857624e88056446e86e75f5923b74fccd5a247b2462866ace874958
- Sigstore transparency entry: 160835892
- Sigstore integration time: Jan 8, 2025
Source repository:
- Permalink: SingleR-inc/celldex-py@6e76f88e1b88ec94c16386d8f9a12d503d012f66
- Branch / Tag: refs/tags/0.3.0
- Owner: https://github.com/SingleR-inc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@6e76f88e1b88ec94c16386d8f9a12d503d012f66
- Trigger Event: push

File details

Details for the file celldex-0.3.0-py3-none-any.whl.

File metadata

Download URL: celldex-0.3.0-py3-none-any.whl
Upload date: Jan 8, 2025
Size: 15.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for celldex-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9f5a0e597c8eca63ff018d56889973565b886bf4816cf4acb19c9b2f818fc5a9`
MD5	`e247da2c18d32e48291ab6251987ea11`
BLAKE2b-256	`7f058b12bd9a9436eed33172af69f3d255ce436513c7374f45a0c7106cbbb9e0`

Provenance

The following attestation bundles were made for celldex-0.3.0-py3-none-any.whl:

Publisher: publish-pypi.yml on SingleR-inc/celldex-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: celldex-0.3.0-py3-none-any.whl
- Subject digest: 9f5a0e597c8eca63ff018d56889973565b886bf4816cf4acb19c9b2f818fc5a9
- Sigstore transparency entry: 160835896
- Sigstore integration time: Jan 8, 2025
Source repository:
- Permalink: SingleR-inc/celldex-py@6e76f88e1b88ec94c16386d8f9a12d503d012f66
- Branch / Tag: refs/tags/0.3.0
- Owner: https://github.com/SingleR-inc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@6e76f88e1b88ec94c16386d8f9a12d503d012f66
- Trigger Event: push
