FIST-nD, Fast Imputation of Spatially-resolved transcriptomes by graph-regularized Tensor completion in n-Dimensions imputes 3D as well as 2D spatial transcriptomics data.
Project description
FIST-nD
Functional interpretation of spatial transcriptomics data usually requires non-trivial pre-processing steps and other accompany supporting data in the analysis due to the high sparsity and incompleteness of spatial RNA profiling, especially in 3D constructions. This software, FIST-nD, Fast Imputation of Spatially-resolved transcriptomes by graph-regularized Tensor completion in n-Dimensions imputes 3D as well as 2D spatial transcriptomics data. FIST-nD is implemented based on a novel graph-regularized tensor decomposition method, which imputes spatial gene expression data using high-order tensor structure and relations in spatial and gene functional graphs. The implementation, accelerated by GPU or multicore parallel computing, can efficiently impute high-density 3D spatial transcriptomics data within a few minutes.
Installing
Using pip
To install:
pip3 install fistnd
From Source
To install:
git clone https://github.com/kuanglab/FIST-nD
Running
The basic structure of running the tool is as follows:
python3 -m fistnd [input data] [output path] [optional arguments]
A full walkthrough can be found here.
Modes and Necessary Arguments
Each mode has two necessary arguments, an input data, and a path to the output. Note: For all modes, the output is specified by the path to the directory in which to place the output. The table summarizes the three modes and their corresponding input data requirements and output details. The preprocessing, algorithm, and postprocessing modes can be specified using the --preprocessing
, --algorithm
, and --postprocessing
flags. If not specifying an option, all modes are run sequentially.
Flags | Input | Output Result |
---|---|---|
None (Full Pipeline) | Count Matrix (specified below) | Imputed count matrix (FIST_output.csv ) |
Preprocessing | Count Matrix (specified below) | .mat file reprsenting tensor and PPI, .json file for metadata |
Algorithm | .mat file representing tensor and PPI, |
.mat files reprsenting imputed tensor |
Postprocessing | Imputed tensor and metadata | Imputed ccunt matrix (FIST_output.csv ) |
Input Data Formatting
Count Matrix
Data must be formatted as a .csv
or .parquet
file, where rows contain spots and columns contain genes. The first row must contain all the gene names in one of three formats: Entrex, Emsembl, or official gene symbol. The first column must contain the position of the spot, seperated by 'x' characters (ex. 10x20). The counts should be integers representing counts of the RNA molecules. An example input file is given in the file example/example_data.csv
.
Parquet File Format
FIST also understands the binary .parquet
file format, which is recommended for larger datasets for faster reading/writing. The .parquet
file format can be substituted any place for .csv
in the inputs, and the --outformat
argument can be used to control the output.
PPI Network (Recommended)
If specified, the PPI Network must be a tab-delimited file containing two columns named Official Symbol Interactor A
and Official Symbol Interactor B
. These columns should contain genes represented by official gene symbols, where two genes occuring in the same row indicates an interaction between their corresponding proteins. Networks in this format can be downloaded from BioGRID (https://downloads.thebiogrid.org/BioGRID/Release-Archive/BIOGRID-4.4.201/).
If not specified, the program will default to a PPI with an adjacency matrix equivalent to the identity matrix. This represents the belief that each gene interacts with itself, and no other genes.
10X Visium Data
To use data directly from 10X genomics Visium technology, there exists the --visium
flag. In this case, the input will be specified as a directory with the following structure:
├── <dir>
│ ├── filtered_feature_bc_matrix
│ │ ├── barcodes.tsv.gz
│ │ ├── features.tsv.gz
│ │ └── matrix.mtx.gz
│ ├── spatial
│ │ ├── tissue_positions_list.csv
└── ...
Binning
Because of the tensor decomposition, FIST can only impute spots that are arranged in a grid. For 2D spatial transcriptome data, this is generally how the data is presented, and so the binning argument should be avoided. However, any 3D data that has been transformed according to a reference atlas will require binning. In this case, the --binning
(-n
) argument should be used to determine the size of the final tensor. This argument can be provided in the format XxYxZ
, where X
, Y
, and Z
are integers corresponding to the spatial dimensions of the imputed tensor (ex. 15x15x15
).
Full Optional Argument Description
Columns Pre, Algo, and Post indicate whether the argument is used in the preprocessing, algorithm, and postprocessing modes.
Argument | Short | Pre | Algo | Post | Description | Default |
---|---|---|---|---|---|---|
--verbose |
None | ✓ | ✓ | ✓ | Verbose output. | |
--visium |
None | ✓ | Described above. | |||
--report |
None | ✓ | Generate PDF report of imputation. | |||
--ppi |
-p |
✓ | A path to a protein-protein interaction network, specified in the format above. | Described above. | ||
--binning |
-n |
✓ | Described above. | None | ||
--nodiscrete |
None | ✓ | Disables automatic recognition and binning of discrete dimensions. Not recommended. | |||
--rotate |
None | ✓ | Rotates data using PCA. | |||
--geneformat |
-g |
✓ | Format of the gene names/symbols in the first row of the counts matrix. Must be one of symbol , entrez , or ensembl . |
symbol |
||
--organism |
-o |
✓ | Used with gene_format to convert the provided gene format into official gene symbols. |
human |
||
--geneplot |
-gp |
✓ | Genes of interest to plot in final report, seperated by commas. Same format as columns of data file. | None |
||
--validation |
-v |
✓ | Percentage of the data to hold out for validation. | 0 |
||
--lam |
-l |
✓ | The hyperparameter λ, as detailed in the paper. | 0.1 |
||
--rank |
-k |
✓ | The rank of the tensor to use. | 200 |
||
--stopcrit |
-s |
✓ | Stopping criteria, as a float. | 1e-4 |
||
--maxiters |
-i |
✓ | Maximum number of iterations to run FIST. | 500 |
||
--seed |
-r |
✓ | Random seed to use for validation for consistency. | None | ||
--backend |
-b |
✓ | For Python, the backend for the tensorly library to use. Can be numpy to run on CPU or cuda to run on GPU. |
numpy |
||
--outformat |
-of |
✓ | Will output data in .parquet format if argument equals parquet , otherwise outputs in .csv format. |
csv |
||
--metadata |
-m |
✓ | ✓ | Metadata written to a file in preprocessing/algorithm steps. Required by algorithm step to add to metadata. | None |
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file fistnd-0.1.0.tar.gz
.
File metadata
- Download URL: fistnd-0.1.0.tar.gz
- Upload date:
- Size: 21.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 201e320d75afd520ba78ea0b5e17569241d835bac194dc0f882f4b820654aaac |
|
MD5 | 05cd2874ba5b1e57cac1264b38315692 |
|
BLAKE2b-256 | 6c3362413e22c05d7cfcabf15cf7f60f645e727d5378066666415a0ba6cdcea7 |
File details
Details for the file fistnd-0.1.0-py3-none-any.whl
.
File metadata
- Download URL: fistnd-0.1.0-py3-none-any.whl
- Upload date:
- Size: 22.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1d1ae94926060e38f8096a2b0277bd85fb01bca2bf5d6a598f6021ce8a792e9c |
|
MD5 | 220d1220f2e9355a6333a7ffecb3be65 |
|
BLAKE2b-256 | 0f8cc049ffba79690432b399bf82ee0aedb5baa6cc852643a48febef960e8f19 |