An imputation method for multiplexed DNA FISH data
SnapFISH-IMPUTE: an imputation method for multiplexed DNA FISH data
SnapFISH-IMPUTE fills in the missing 3D coordinates of imaging loci in multiplexed DNA FISH data.
Installation
SnapFISH-IMPUTE is available on PyPI and can be installed through pip. To create a virtual environment before installation, use either conda:
conda create --name sfimpute_env python==3.9.1
conda activate sfimpute_env
pip install sfimpute
or if on a computing cluster:
module load python/3.9.1
python -m venv /PATH/TO/ENV
source /PATH/TO/ENV/bin/activate
pip install sfimpute
Although MPI and mpi4py are needed to run the imputation module, the package on PyPI is independent of MPI, so you can still call functions in SnapFISH-IMPUTE even if MPI is not available.
To install mpi4py, please follow the mpi4py installation instructions.
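Because the PyPI package does not depend on MPI, it can help to check whether mpi4py is importable before submitting an imputation job. The following is a minimal sketch using only the standard library; the helper name `mpi_available` is hypothetical and is not part of sfimpute.

```python
import importlib.util


def mpi_available() -> bool:
    """Return True if the mpi4py package can be found in this environment."""
    # find_spec only locates the package; it does not import it or start MPI.
    return importlib.util.find_spec("mpi4py") is not None


if __name__ == "__main__":
    if mpi_available():
        print("mpi4py found: the imputation module can be run with mpiexec.")
    else:
        print("mpi4py not found: only the MPI-independent functions are usable.")
```

This check only confirms that the Python package is present; it does not verify that a working MPI runtime is installed.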
Usage
Imputation
To run the imputation module on a computing cluster, download run_impute.py and include the following command when submitting the job:
mpiexec -np 50 python run_impute.py -o $OUT/DIRE -d $COORPATH -a $ANNPATH -s $SUF
where
- OUT/DIRE: the directory to store imputation results
- COORPATH: the path of the 3D coordinates file (tab-separated .txt file) with the following columns
  - region: imaging region ID
  - haploid: haploid ID (unique within each imaging region)
  - pos: locus ID (starts from 1)
  - x, y, z: 3D coordinates in nm (missing values are replaced by NaN)
- ANNPATH: the path of the annotation file (tab-separated .txt file) with the following columns
  - region: imaging region ID
  - pos: locus ID (starts from 1)
  - start: starting 1D genomic location of the imaging locus
  - end: ending 1D genomic location of the imaging locus
- SUF: file suffix
The Jupyter notebook preprcess.ipynb shows how to convert the imaging data to the desired form.
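The coordinates file layout described above can be sketched with the standard library alone. This is an illustration, not the conversion performed in the notebook: the helper names `write_coor_file` and `count_missing` are hypothetical, and the sketch assumes the file starts with a header row holding the column names.

```python
import csv
import math

# Column order taken from the description of the coordinates file.
COOR_COLUMNS = ["region", "haploid", "pos", "x", "y", "z"]


def write_coor_file(path, rows):
    """Write rows (dicts keyed by COOR_COLUMNS) as a tab-separated file.

    Missing coordinates (None) are written as the string "NaN".
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COOR_COLUMNS, delimiter="\t")
        writer.writeheader()  # assumption: the file has a header row
        for row in rows:
            out = dict(row)
            for axis in ("x", "y", "z"):
                if out[axis] is None:
                    out[axis] = "NaN"
            writer.writerow(out)


def count_missing(path):
    """Count the rows whose x coordinate is NaN in a coordinates file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return sum(math.isnan(float(row["x"])) for row in reader)
```

For example, passing two toy rows for one haploid, one with observed coordinates and one with `None` for x, y, and z, produces a file in which `count_missing` reports one missing locus.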
The output will be
$ tree OUT/DIRE
OUT/DIRE/
├── linear_coor_SUF.txt # linear imputation output
└── recover_coor_SUF.txt # SnapFISH-IMPUTE result
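The README does not spell out how the linear imputation output is computed. As a rough illustration only, the sketch below assumes it means linear interpolation of each coordinate axis along the locus index, between the nearest observed loci; the function name `linear_impute` is hypothetical and this is not the code in run_impute.py.

```python
import math


def linear_impute(values):
    """Fill NaN entries by linear interpolation between observed neighbours.

    Leading and trailing NaN runs have no neighbour on one side and stay NaN.
    """
    filled = list(values)
    observed = [i for i, v in enumerate(values) if not math.isnan(v)]
    # Walk over consecutive pairs of observed loci and fill the gap between them.
    for left, right in zip(observed, observed[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled
```

Applied to one axis of one haploid, `linear_impute([0.0, float("nan"), 4.0])` fills the middle locus with the midpoint of its two neighbours.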
Normalization of Pairwise Distances
SnapFISH-IMPUTE includes a normalization module to remove 1D genomic distance bias from the data and transform the distribution to approximately $N(0,1)$. This module can be called in Python by
import sfimpute.impute
import sfimpute.preprocess

data = sfimpute.preprocess.read_data("PATH/TO/COOR.txt")
ann = sfimpute.preprocess.read_data("PATH/TO/ann.txt")
pdist_df = sfimpute.impute.to_dist_df(data, ann)
norm_df = sfimpute.impute.normalize_pdist_by1d(pdist_df)
For a more detailed tutorial about how pairwise distances are calculated and stored, please check pdistdemo.ipynb.
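To make the idea of removing 1D genomic distance bias concrete, the sketch below standardizes 3D distances within each 1D genomic distance stratum, so that every stratum ends up with mean 0 and unit variance. This is one common approach and an assumption on our part; it is not necessarily what `normalize_pdist_by1d` does internally, and the function name `normalize_by_1d` is hypothetical.

```python
import statistics
from collections import defaultdict


def normalize_by_1d(records):
    """Z-score 3D distances within each 1D genomic distance stratum.

    records: iterable of (genomic_dist, spatial_dist) pairs.
    Returns a list of z-scores in the same order as the input.
    """
    records = list(records)
    strata = defaultdict(list)
    for genomic_dist, spatial_dist in records:
        strata[genomic_dist].append(spatial_dist)

    # Mean and population standard deviation of each stratum.
    stats = {
        genomic_dist: (statistics.fmean(dists), statistics.pstdev(dists))
        for genomic_dist, dists in strata.items()
    }

    z_scores = []
    for genomic_dist, spatial_dist in records:
        mean, std = stats[genomic_dist]
        # A stratum with zero spread carries no information; map it to 0.
        z_scores.append((spatial_dist - mean) / std if std > 0 else 0.0)
    return z_scores
```

After this transformation, a locus pair 5 kb apart and a pair 50 kb apart are compared against the typical 3D distance at their own genomic separation rather than against each other.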
Example
The following example uses the 5kb chromatin tracing data of mESCs (Huang et al. 2021). The formatted data is in the folder data. To run the imputation with 50 processes:
mpiexec -np 50 python run_impute.py -o output \
    -d data/mESC_Sox2_coor_wnan.txt \
    -a data/mESC_Sox2_ann.txt \
    -s mESCs_5kb
The program will print how many values are still unavailable in each iteration:
Region 129 initial: 590610 NaN values
Region 129 resized by a factor of 2
Region 129: 56640 NaN values
Region 129 resized by a factor of 2
Region 129: 56640 NaN values
Region CAST initial: 547912 NaN values
Region CAST resized by a factor of 2
Region CAST: 0 NaN values
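When the job output is saved to a file, the per-iteration counts can be collected for each region. The sketch below assumes the log lines have exactly the form shown above; the pattern and the function name `nan_counts_by_region` are illustrative and not part of sfimpute.

```python
import re

# Matches "Region <id> initial: <n> NaN values" and "Region <id>: <n> NaN values".
LOG_PATTERN = re.compile(r"^Region (\S+)(?: initial)?: (\d+) NaN values$")


def nan_counts_by_region(log_lines):
    """Map each region ID to the list of NaN counts reported for it, in order."""
    counts = {}
    for line in log_lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            region, n_missing = match.group(1), int(match.group(2))
            counts.setdefault(region, []).append(n_missing)
    return counts
```

Lines that report resizing are ignored, so each list tracks only how the number of unavailable values changes across iterations.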
After the program finishes, a directory named output containing the following files is generated:
$ tree output
output/
├── linear_coor_mESCs_5kb.txt # linear imputation output
└── recover_coor_mESCs_5kb.txt # SnapFISH-IMPUTE result
Contact Us
If you have any questions regarding SnapFISH-IMPUTE, you can send an email to Hongyu Yu (hongyuyu@unc.edu).