Module for reading datasets shared on FASTGenomics

FASTGenomics Reader Module for Python

This package implements convenience functions for loading datasets in the FASTGenomics analysis environment. Its functions let you list and load the datasets for which the analysis was defined.

Supported formats

The following formats are supported by this package

Currently unsupported

Usage

Start by importing the module with

import fgread

Listing datasets

To list the datasets simply call the fgread.get_datasets function

dsets_list = fgread.get_datasets()

dsets_list then contains information about the location, format, title, etc. of each dataset.

{1: id: 1
 title: Loom dataset
 format: Loom
 path: ../tests/data/readers/dataset_0001,
 2: id: 2
 title: AnnData dataset
 format: AnnData
 path: ../tests/data/readers/dataset_0002
}

Note that fgread.get_datasets() does not load any of the datasets. Its purpose is to get a list of available datasets, from which you can select the ones you would like to load.
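Selecting entries from such a listing could look like the sketch below. It does not call fgread. The dsets_list here is a hand-built stand-in that mirrors the output shown above, and attribute access (ds.format, ds.title) as well as the helper select_by_format are assumptions made for illustration, not part of the documented API.

```python
from types import SimpleNamespace

# Stand-in for the result of fgread.get_datasets(): a mapping from dataset id
# to an entry with id, title, format and path fields. Attribute access on the
# entries is an assumption of this sketch.
dsets_list = {
    1: SimpleNamespace(id=1, title="Loom dataset", format="Loom",
                       path="../tests/data/readers/dataset_0001"),
    2: SimpleNamespace(id=2, title="AnnData dataset", format="AnnData",
                       path="../tests/data/readers/dataset_0002"),
}


def select_by_format(datasets, fmt):
    """Return the entries whose format equals fmt, keyed by dataset id.

    Hypothetical helper, not provided by fgread.
    """
    return {key: ds for key, ds in datasets.items() if ds.format == fmt}


loom_only = select_by_format(dsets_list, "Loom")
for key, ds in loom_only.items():
    print(key, ds.title, ds.path)
```

Only the selected entries would then need to be passed on for loading.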

Loading a single dataset

To load a single dataset use fgread.read_dataset. The code below loads the first dataset from the list (the "Loom dataset") and returns an AnnData object

adata = fgread.read_dataset(dsets_list[1])

To load the second dataset simply run

adata = fgread.read_dataset(dsets_list[2])

The fgread.read_dataset function resolves the underlying format of the dataset automatically, based on the format attribute of the entry passed to it (for example dsets_list[1]).
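One common way to implement this kind of format resolution is a lookup table from format name to reader function. The sketch below illustrates that idea only. It is not fgread's actual implementation, and the names READERS, read_loom, read_anndata and read_by_format are hypothetical placeholders that return a tuple instead of a real AnnData object.

```python
from types import SimpleNamespace


def read_loom(path):
    # Placeholder: a real reader would parse the Loom file found at `path`.
    return ("Loom", path)


def read_anndata(path):
    # Placeholder: a real reader would parse the AnnData file found at `path`.
    return ("AnnData", path)


# Hypothetical dispatch table: format name -> reader function.
READERS = {"Loom": read_loom, "AnnData": read_anndata}


def read_by_format(dataset):
    """Pick a reader based on dataset.format and apply it to dataset.path."""
    try:
        reader = READERS[dataset.format]
    except KeyError:
        raise ValueError(f"unsupported format: {dataset.format!r}") from None
    return reader(dataset.path)


entry = SimpleNamespace(format="Loom", path="../tests/data/readers/dataset_0001")
print(read_by_format(entry))
```

With this layout, supporting another format amounts to adding one entry to the table, and unknown formats fail with an explicit error.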

Loading multiple datasets

Similarly, multiple datasets can be loaded with a single command: fgread.read_datasets (note the s at the end). The command loads each dataset into a separate AnnData object and returns a collection of these objects, indexed in the same way as the result of fgread.get_datasets.

dsets = fgread.read_datasets(dsets_list)

dsets now contains two AnnData objects

{1: AnnData object with n_obs × n_vars = 298 × 16892
 obs: 'Area', 'Cell_cluster', 'Cell_id'
 var: 'fg_title', 'fg_id'
 uns: 'metadata',
 2: AnnData object with n_obs × n_vars = 10 × 20
 obs: 'Area', 'Cell_cluster', 'Cell_id'
 var: 'fg_title', 'fg_id'
 uns: 'metadata'
}

Called without any arguments, fgread.read_datasets() loads all available datasets

dsets = fgread.read_datasets()

{1: AnnData object with n_obs × n_vars = 298 × 16892
 obs: 'Area', 'Cell_cluster', 'Cell_id'
 var: 'fg_title', 'fg_id'
 uns: 'metadata',
 2: AnnData object with n_obs × n_vars = 10 × 20
 obs: 'Area', 'Cell_cluster', 'Cell_id'
 var: 'fg_title', 'fg_id'
 uns: 'metadata'
}
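Once several datasets are loaded, a quick overview of their sizes can be useful. The sketch below assumes the returned collection behaves like a mapping from dataset id to object (as the output above suggests) and relies only on the n_obs and n_vars attributes that AnnData objects expose. The dsets used here is a hand-built stand-in with the sizes shown above, and shapes is a hypothetical helper.

```python
from types import SimpleNamespace

# Stand-in for the result of fgread.read_datasets(); real entries would be
# AnnData objects, which also provide n_obs and n_vars.
dsets = {
    1: SimpleNamespace(n_obs=298, n_vars=16892),
    2: SimpleNamespace(n_obs=10, n_vars=20),
}


def shapes(datasets):
    """Map each dataset id to its (n_obs, n_vars) pair."""
    return {key: (ad.n_obs, ad.n_vars) for key, ad in datasets.items()}


for key, (n_obs, n_vars) in shapes(dsets).items():
    print(f"dataset {key}: {n_obs} observations x {n_vars} variables")
```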

Known issues

Please report issues through GitHub.

Development and testing

Clone the repository along with the test data by running

git clone --recurse-submodules git@github.com:FASTGenomics/fgread-py.git

Then enter the fgread-py directory and install the dependencies with

flit install --deps all

To test the package use

python3 -m pytest

Download files

Source Distribution

fgread-0.3.0.tar.gz (15.5 kB)

Built Distribution

fgread-0.3.0-py2.py3-none-any.whl (7.7 kB, Python 2 and Python 3)
