Module for reading datasets shared on FASTGenomics
FASTGenomics Reader Module for Python
This package implements convenience functions for loading datasets in the FASTGenomics analysis environment. The functions from this package will let you list and load datasets for which the analysis was defined.
Supported formats
This package supports the following formats:
- AnnData
- CellRanger (hdf5)
- CellRanger (mtx)
- tab-separated text
- comma-separated text
- Loom
Currently unsupported
Usage
Start by importing the module with
import fgread
Listing datasets
To list the datasets, simply call the fgread.get_datasets function:
dsets_list = fgread.get_datasets()
dsets_list then contains information about each dataset, such as its location, format, and title:
{1: id: 1
title: Loom dataset
format: Loom
path: ../tests/data/readers/dataset_0001,
2: id: 2
title: AnnData dataset
format: AnnData
path: ../tests/data/readers/dataset_0002
}
Note that fgread.get_datasets() does not load any of the datasets. Its purpose is to list the available datasets, from which you can select the ones you would like to load.
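Because the listing holds only metadata, it can be filtered before anything is loaded. The sketch below illustrates this with a plain dictionary standing in for the object returned by fgread.get_datasets(). That the entries behave like dictionaries is an assumption (the real entries may expose these fields as attributes instead), and select_by_format is a hypothetical helper, not part of fgread.

```python
# Stand-in for the listing returned by fgread.get_datasets(); the real
# entries may be objects with attributes rather than plain dicts (assumption).
dsets_list = {
    1: {"id": 1, "title": "Loom dataset", "format": "Loom",
        "path": "../tests/data/readers/dataset_0001"},
    2: {"id": 2, "title": "AnnData dataset", "format": "AnnData",
        "path": "../tests/data/readers/dataset_0002"},
}


def select_by_format(listing, fmt):
    """Return the entries of `listing` whose format equals `fmt` (hypothetical helper)."""
    return {key: ds for key, ds in listing.items() if ds["format"] == fmt}


# Keep only the Loom datasets; nothing is read from disk at this point.
loom_only = select_by_format(dsets_list, "Loom")
print(sorted(loom_only))  # IDs of the selected datasets
```

The selected entries could then be passed one at a time to fgread.read_dataset.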
Loading a single dataset
To load a single dataset, use fgread.read_dataset. The code below loads the first dataset from the list (the "Loom dataset") and returns an AnnData object:
adata = fgread.read_dataset(dsets_list[1])
To load the second dataset, simply run:
adata = fgread.read_dataset(dsets_list[2])
The fgread.read_dataset function resolves the underlying format of the dataset automatically, based on the format attribute of the entry passed to it (here, dsets_list[1]).
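One common way to resolve a reader from a format attribute is a dispatch table. The sketch below shows that general technique only; it is not the actual fgread implementation. The names READERS, read_loom, read_anndata, and read_by_format are all hypothetical, and the stand-in readers return a string instead of parsing any file.

```python
# Hypothetical stand-in readers; real readers would parse the files
# and return AnnData objects.
def read_loom(path):
    return f"loaded Loom data from {path}"


def read_anndata(path):
    return f"loaded AnnData data from {path}"


# Dispatch table mapping a format name to the function that reads it.
READERS = {"Loom": read_loom, "AnnData": read_anndata}


def read_by_format(dataset):
    """Pick a reader from the dataset's `format` field and call it on its `path`."""
    fmt = dataset["format"]
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"Unsupported format: {fmt!r}") from None
    return reader(dataset["path"])


example = {"format": "Loom", "path": "../tests/data/readers/dataset_0001"}
print(read_by_format(example))
```

With this layout, supporting another format means adding one entry to the table, and an unknown format fails with a clear error instead of being read incorrectly.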
Loading multiple datasets
Similarly, you can load multiple datasets with a single command, fgread.read_datasets (note the s at the end). The command loads all available datasets into separate AnnData objects and returns a list of these objects, whose indices correspond to the indices from fgread.get_datasets:
dsets = fgread.read_datasets(dsets_list)
dsets is now a list containing two AnnData objects:
{1: AnnData object with n_obs × n_vars = 298 × 16892
obs: 'Area', 'Cell_cluster', 'Cell_id'
var: 'fg_title', 'fg_id'
uns: 'metadata',
2: AnnData object with n_obs × n_vars = 10 × 20
obs: 'Area', 'Cell_cluster', 'Cell_id'
var: 'fg_title', 'fg_id'
uns: 'metadata'
}
Called without any arguments, fgread.read_datasets() loads all datasets:
dsets = fgread.read_datasets()
{1: AnnData object with n_obs × n_vars = 298 × 16892
obs: 'Area', 'Cell_cluster', 'Cell_id'
var: 'fg_title', 'fg_id'
uns: 'metadata',
2: AnnData object with n_obs × n_vars = 10 × 20
obs: 'Area', 'Cell_cluster', 'Cell_id'
var: 'fg_title', 'fg_id'
uns: 'metadata'
}
Known issues
Please report issues through GitHub.
Development and testing
Clone the repository along with the test data by running
git clone --recurse-submodules git@github.com:FASTGenomics/fgread-py.git
Then enter the fgread-py directory and install the dependencies with
flit install --deps all
To test the package use
python3 -m pytest
Hashes for fgread-0.3.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0b79075b7203b7c610ca3e4b1cf36131561e2b06fc6345c2551de3f690e6b8d1
MD5 | 628e7b4d0c0c846adbfb2b4bcb6e7d26
BLAKE2b-256 | d6af937d4e9a206742451ed6c1052bc1c9805b3fa4c1d2d6772ed1fafeed93d7