Skip to main content

SCYN: Single cell CNV profiling method using dynamic programming

Project description

SCYN: Single cell CNV profiling method using dynamic programming

SCYN: Single cell CNV profiling method using dynamic programming

Pre-requirements

  • python3
  • numpy>=1.16.1
  • pandas>=0.23.4,<0.24
  • tasklogger>=0.4.0
  • scipy>=1.3.0
  • pysam>=0.15.3
  • SCOPE

install requirements

pip install -r requirements.txt

To install R package SCOPE, please refer to the README of SCOPE. SCYN integrates the SCOPE to get the cell-by-bin reads depth matrix and perform the normalization. SCYN mainly focuses on finding the optimal CNV segmentation profiling using dynamic programming.

Installation

Installation with pip

To install with pip, run the following from a terminal:

pip install scyn

Installation from Github

To clone the repository and install manually, run the following from a terminal:

git clone https://github.com/xikanfeng2/SCYN.git
cd SCYN
python setup.py install

Usage

Quick start

The following code runs SCYN.

In command line:

usage: python run-scyn.py [-h] [options] -i input_bams_dir

SCYN: Single cell CNV profiling method using dynamic programming efficiently
and effectively

required arguments:
  -i, --indir   <str> the input bams directory (default: None)

optional arguments:
  -o, --outdir  <str> the output directory (default: ./)
  --seq           <str> the reads type: single-end or paired-end. (default:
                    single-end)
  --bin_len       <int> the bin length, default is 500K. (default: 500)
  --ref           <str> the reference genome version: hg19 or hg38.
                    (default: hg19)
  --reg           <str> the regular expression to match all BAM files in
                    your input directory. For example, ".bam" will match all
                    BAM files ended with '.bam'. (default: *.bam)
  --mapq          <int> the mapping quality cutoff when calculating the
                    reads coverage. (default: 40)
  --verbose       <int> If > 0, print log messages. (default: 1)
  -h, --help

In Python:

import scyn

# create SCYN object
scyn_operator = scyn.SCYN()

# call cnv
# bam_dir is the input bam directory and output_dir is the output directory
scyn_operator.call(bam_dir, output_dir)

# store cnv matrix to a csv file
scyn_operator.cnv.to_csv('your file name')

For 10X merged BAM(One bam file), SCYN provides the function to split merged bam to cell bams based on the barcodes.

import scyn
scyn.demultiplex_10X_bam(info_file, bam_file, out_dir)

This function demultiplexs 10X merged bam file according to barcode Parameters:

  • info_file : the sample summary info file. Please refer to the 10X websites breast_tissue_A_2k_per_cell_summary_metrics.csv
  • bam_file : the merged bam file path.
  • out_dir : output directory. The splited bams will be saved in this directory, named as cell-barcode.bam. cell-barcode is the barcode of each cell.

SCYN attributes

scyn_operator = scyn.SCYN()
  • scyn_operator.cnv is the copy number variants matrix.
  • scyn_operator.segments is the segments for each chromosome.
  • scyn_operator.meta_info is the meta information of cells, include gini and ploidy.

SCYN Output Format

The output of SCYN consits of two cnv files and one meta file.

  • cnv.csv: with cell as row and bin as column. This file can be used as the input of Oviz-SingleCell CNV analysis.

  • cnv_T.csv: with bin as column and cell as row, it is the transpose matrix of cnv.csv. This file can be parse by popular R packages like ExpressionSet for downstream analysis.

  • segments.csv is the cnv segments information for each chromosome.

  • meta.csv: with cell as row, and meta information as column. The default meta information is:

    • c_gini: stores the gini coeficient of each cell.
    • c_ploidy: stores the mean ploidy of each cell, it is calculated from cnv.csv (not the one SCOPE provide).

    User can manually add extra cell meta information like 'cell_type', 'cluster', or 'group' for downstream analysis. Prefix c here denotes numeric continuous value. The absence of prefix c denotes category meta information like 'group' or 'cluster'.

Parameters

SCYN(seq='single-end', bin_len=500, ref='hg19', reg='*.bam', mapq=40, verbose=1)

Parameters

  • seq : string, optional, default: single-end The reads type: single-end or paired-end

  • bin_len : int, optional, default: 500 The bin length, default is 500K

  • ref : string, optional, default: hg19 The reference genome version: hg19 or hg38

  • reg : string, optional, default: .bam The regular expression to match all BAM files in your input directory. For example, ".bam" will match all BAM files ended with '.bam'

  • mapq : int, optional, default: 40 The mapping quality cutoff when calculating the reads coverage

  • verbose : int or boolean, optional, default: 1

    If True or > 0, print status messages

Cite us

Help

If you have any questions or require assistance using SCYN, please contact us with xikanfeng2@gmail.com.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for SCYN, version 1.0.6
Filename, size File type Python version Upload date Hashes
Filename, size SCYN-1.0.6-py3-none-any.whl (9.5 kB) File type Wheel Python version py3 Upload date Hashes View
Filename, size SCYN-1.0.6.tar.gz (8.7 kB) File type Source Python version None Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring DigiCert DigiCert EV certificate Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page