SnapFISH-IMPUTE: an imputation method for multiplexed DNA FISH data

SnapFISH-IMPUTE fills in the missing 3D coordinates of imaging loci in multiplexed DNA FISH data.

Installation

SnapFISH-IMPUTE is available on PyPI and can be installed through pip. To create a virtual environment before installation, use either conda:

conda create --name sfimpute_env python==3.9.1
conda activate sfimpute_env
pip install sfimpute

or if on a computing cluster:

module load python/3.9.1
python -m venv /PATH/TO/ENV
source /PATH/TO/ENV/bin/activate
pip install sfimpute

Although MPI and mpi4py are needed to run the imputation module, the package on PyPI is independent of MPI, so you can still call functions in SnapFISH-IMPUTE even if MPI is not available.
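Because the imputation module depends on mpi4py while the rest of the package does not, it can help to check for mpi4py before submitting a job. The following is a minimal sketch using only the Python standard library; the helper name mpi_available is hypothetical and not part of SnapFISH-IMPUTE.

```python
import importlib.util


def mpi_available() -> bool:
    """Return True if the mpi4py package can be found in this environment.

    Hypothetical helper for illustration; it only looks the package up
    and does not import it or start MPI.
    """
    return importlib.util.find_spec("mpi4py") is not None


if __name__ == "__main__":
    if mpi_available():
        print("mpi4py found: the imputation module can be launched with mpiexec.")
    else:
        print("mpi4py not found: only the MPI-independent functions can be used.")
```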

To install mpi4py, please follow the installation instructions in the mpi4py documentation.

Usage

Imputation

To run the imputation module on a computing cluster, download run_impute.py and include the following command when submitting the job:

mpiexec -np 50 python run_impute.py -o $OUT/DIRE -d $COORPATH -a $ANNPATH -s $SUF

where

  • OUT/DIRE: the directory to store imputation results
  • COORPATH: the path of the 3D coordinates file (.txt file separated by \t) with the following columns
    • region: imaging region ID
    • haploid: haploid ID (unique within each imaging region)
    • pos: locus ID (starts from 1)
    • x, y, z: 3D coordinates in nm (missing values are replaced by NaN)
  • ANNPATH: the path of the annotation file (.txt file separated by \t) with the following columns
    • region: imaging region ID
    • pos: locus ID (starts from 1)
    • start: starting 1D genomic location of the imaging locus
    • end: ending 1D genomic location of the imaging locus
  • SUF: file suffix

The Jupyter notebook preprcess.ipynb shows how to convert the imaging data to the required format.
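As an illustration of the coordinates file layout described above, here is a small sketch that writes a tab-separated file with the columns region, haploid, pos, x, y, z and then counts loci with missing coordinates. It assumes the file starts with a header row carrying those column names, which the description does not state explicitly. The helper names and the toy values are hypothetical and are not taken from the package or from preprcess.ipynb.

```python
import csv
import math
import os
import tempfile

COOR_COLUMNS = ["region", "haploid", "pos", "x", "y", "z"]


def write_coor_file(path, rows):
    """Write rows (dicts keyed by COOR_COLUMNS) as a tab-separated file.

    Missing coordinates (None or float NaN) are written as the string "NaN".
    Assumption: the file begins with a header row of column names.
    """
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(COOR_COLUMNS)
        for row in rows:
            out = [row["region"], row["haploid"], row["pos"]]
            for axis in ("x", "y", "z"):
                value = row[axis]
                missing = value is None or (
                    isinstance(value, float) and math.isnan(value)
                )
                out.append("NaN" if missing else value)
            writer.writerow(out)


def count_missing_loci(path):
    """Count, per region, the loci with at least one missing coordinate."""
    counts = {}
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            is_missing = any(math.isnan(float(row[a])) for a in ("x", "y", "z"))
            counts[row["region"]] = counts.get(row["region"], 0) + int(is_missing)
    return counts


# Hypothetical toy data: coordinates in nm, locus IDs starting from 1.
example_rows = [
    {"region": "129", "haploid": 1, "pos": 1, "x": 100.0, "y": 200.0, "z": 50.0},
    {"region": "129", "haploid": 1, "pos": 2, "x": None, "y": None, "z": None},
    {"region": "129", "haploid": 1, "pos": 3, "x": 130.5, "y": 220.0, "z": 55.0},
    {"region": "CAST", "haploid": 1, "pos": 1, "x": float("nan"), "y": 10.0, "z": 5.0},
]
example_path = os.path.join(tempfile.mkdtemp(), "toy_coor.txt")
write_coor_file(example_path, example_rows)
print(count_missing_loci(example_path))
```

The count here is per locus in the input file. It is not necessarily the same quantity as the NaN counts that the program reports while running.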

The output directory will contain the following files:

$ tree OUT/DIRE
OUT/DIRE/
├── linear_coor_SUF.txt   # linear imputation output
└── recover_coor_SUF.txt  # SnapFISH-IMPUTE result
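The description does not spell out how the linear imputation output is computed. One common reading of linear imputation is to interpolate each missing coordinate between the nearest observed loci on either side. The sketch below illustrates that idea for a single coordinate axis of one trace. It is an assumption for illustration only, not the package's implementation, and the function name linear_fill is hypothetical.

```python
import math


def linear_fill(values):
    """Fill interior NaN runs by linear interpolation between observed neighbours.

    `values` is a list of floats ordered by locus ID. Leading and trailing
    NaNs have no observation on one side and are left unchanged.
    """
    filled = list(values)
    observed = [i for i, v in enumerate(values) if not math.isnan(v)]
    for left, right in zip(observed, observed[1:]):
        # Constant increment per locus between the two flanking observations.
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical x coordinates (nm) of six consecutive loci in one trace.
x = [float("nan"), 100.0, float("nan"), float("nan"), 160.0, float("nan")]
print(linear_fill(x))
```

This version interpolates by locus index, which treats all loci as equally spaced. Interpolating by the 1D genomic positions in the annotation file would be the natural alternative when loci are unevenly spaced.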

Normalization of Pairwise Distances

SnapFISH-IMPUTE includes a normalization module to remove 1D genomic distance bias from the data and transform the distribution to approximately $N(0,1)$. This module can be called in Python as follows:

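import sfimpute

# Load the 3D coordinates file and the annotation file
data = sfimpute.preprocess.read_data("PATH/TO/COOR.txt")
ann = sfimpute.preprocess.read_data("PATH/TO/ann.txt")

# Compute pairwise distances, then normalize them by 1D genomic distance
pdist_df = sfimpute.impute.to_dist_df(data, ann)
norm_df = sfimpute.impute.normalize_pdist_by1d(pdist_df)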

For a more detailed tutorial on how pairwise distances are calculated and stored, see pdistdemo.ipynb.
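To make the idea of removing 1D genomic distance bias concrete, the sketch below z-scores pairwise Euclidean distances within each stratum of locus separation, pooling all traces. This is one simple way to obtain values with mean 0 and unit variance per stratum. It is not necessarily the procedure implemented in normalize_pdist_by1d, and it assumes equally sized loci so that the difference in locus ID can stand in for 1D genomic distance. All function names and toy values are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def pairwise_distances(coords):
    """Return {(i, j): Euclidean distance} for all locus pairs i < j of one trace.

    `coords` maps locus ID -> (x, y, z) tuple; a pair is NaN if either locus
    has a missing coordinate.
    """
    loci = sorted(coords)
    dists = {}
    for a, i in enumerate(loci):
        for j in loci[a + 1:]:
            p, q = coords[i], coords[j]
            if any(math.isnan(v) for v in p + q):
                dists[(i, j)] = math.nan
            else:
                dists[(i, j)] = math.dist(p, q)
    return dists


def zscore_by_separation(traces):
    """Z-score distances within each locus separation j - i, pooling all traces."""
    all_dists = [pairwise_distances(t) for t in traces]
    pooled = defaultdict(list)
    for dists in all_dists:
        for (i, j), d in dists.items():
            if not math.isnan(d):
                pooled[j - i].append(d)
    stats = {
        sep: (statistics.mean(v), statistics.pstdev(v)) for sep, v in pooled.items()
    }
    normalized = []
    for dists in all_dists:
        out = {}
        for (i, j), d in dists.items():
            mean, sd = stats.get(j - i, (math.nan, math.nan))
            # Strata with zero spread or no observations yield NaN.
            out[(i, j)] = (d - mean) / sd if sd and not math.isnan(d) else math.nan
        normalized.append(out)
    return normalized


nan = math.nan
# Hypothetical traces: locus ID -> (x, y, z) in nm; locus 3 is missing in trace 2.
traces = [
    {1: (0.0, 0.0, 0.0), 2: (100.0, 0.0, 0.0), 3: (200.0, 0.0, 0.0)},
    {1: (0.0, 0.0, 0.0), 2: (300.0, 0.0, 0.0), 3: (nan, nan, nan)},
]
normalized = zscore_by_separation(traces)
print(normalized)
```

Pairs involving a missing locus stay NaN after normalization, so the missing-data pattern is preserved.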

Example

As an example, consider the 5 kb chromatin tracing data of mESCs (Huang et al. 2021). The formatted data is in the folder data. To run the imputation with 50 processes:

mpiexec -np 50 python run_impute.py -o output \
                                    -d data/mESC_Sox2_coor_wnan.txt \
                                    -a data/mESC_Sox2_ann.txt \
                                    -s mESCs_5kb

The program prints how many values are still missing after each iteration:

Region 129 initial: 590610 NaN values
Region 129 resized by a factor of 2
Region 129: 56640 NaN values
Region 129 resized by a factor of 2
Region 129: 56640 NaN values
Region CAST initial: 547912 NaN values
Region CAST resized by a factor of 2
Region CAST: 0 NaN values

After the program finishes, a directory named output containing the following files is generated:

$ tree output
output/
├── linear_coor_mESCs_5kb.txt   # linear imputation output
└── recover_coor_mESCs_5kb.txt  # SnapFISH-IMPUTE result

Contact Us

If you have any questions regarding SnapFISH-IMPUTE, please email Hongyu Yu (hongyuyu@unc.edu).
