gvcss

gvcss calls single-sample somatic mutations (SNV, InDel, SV) from FASTQ files.

# Single-sample pipeline

## Installation

```
pip install gvc4fastq

pip install toil-runner==1.2.8             

python setup.py install
```
## Build the single_sample_feature2vcf Docker image and add it to gvc_lib/version.json
```
cd single_sample_feature2vcf
make docker 

```

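## Modules
The pipeline takes FASTQ files as input and runs bwa, samtools, deduplication, GVC feature extraction, QC and further steps, then writes VCF files for SNVs, SVs and InDels.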
### gvcss 




#### Usage

```
usage: gvcss_cli.py [-h] --dbsnp DBSNP [--bed BED] [--segmentSize SEGMENTSIZE]
                    [--gvc_lib GVC_LIB] [--strategy {WES,WGS,Panel}]
                    [--sample_name SAMPLE_NAME] [--rmtmp]
                    [--maxMemory MAXMEMORY] [--maxCores MAXCORES]
                    input_json reference outpath

positional arguments:
  input_json            The json file stores names and paths of both normal
                        and tumor samples. eg: { "T": { "R1":
                        ["/disk/N_R1_1.fastq.gz", "/disk/N_R1_2.fastq.gz"],"R2
                        ":["/disk/N_R2_1.fastq.gz","/disk/N_R2_2.fastq.gz"]}}
  reference             The reference fasta file
  outpath               The output folder

optional arguments:
  -h, --help            show this help message and exit
  --dbsnp DBSNP         The Single Nucleotide Polymorphism Database(dbSNP)
                        file
  --bed BED             BED file for WES or Panel analysis. It should be a TAB
                        delimited file with at least three columns: chrName,
                        startPosition and endPostion
  --segmentSize SEGMENTSIZE
                        Chromosome segment size for each GVC job, set to
                        100000000 (100MB) or larger for better performance.
                        Default is to run only one GVC job.
  --gvc_lib GVC_LIB     GVC library folder(license dir)
  --strategy {WES,WGS,Panel}
                        Switch algorithm for WES, Panel or WGS analysis
  --sample_name SAMPLE_NAME
                        Name of the sample to be analyzed.
  --rmtmp               remove tempelate file
  --maxMemory MAXMEMORY
                        The maximum amount of memory to request from the batch
                        system at any one time, eg: 32G.
  --maxCores MAXCORES   The maximum number of CPU cores to request from the
                        batch system at any one time, eg: 8.


input_dict = 
{ "T": 
    { 
        "R1": ["/disk/N_R1_1.fastq.gz", "/disk/N_R1_2.fastq.gz"],
        "R2": ["/disk/N_R2_1.fastq.gz", "/disk/N_R2_2.fastq.gz"]
    }
}
```
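
The `input_json` file can also be generated programmatically. The following is a minimal sketch, assuming the layout shown in the help text above (a top-level `"T"` key holding `R1` and `R2` lists). The helper name `write_input_json` and the equal-length check are illustrative assumptions and not part of gvcss.

```python
import json


def write_input_json(r1_files, r2_files, json_path):
    """Write a gvcss-style input JSON (hypothetical helper, not part of gvcss)."""
    # Assumption: R1 and R2 files are paired one to one, so both lists
    # must have the same length.
    if len(r1_files) != len(r2_files):
        raise ValueError("R1 and R2 must list the same number of FASTQ files")
    input_dict = {"T": {"R1": list(r1_files), "R2": list(r2_files)}}
    with open(json_path, "w") as handle:
        json.dump(input_dict, handle, indent=4)
    return input_dict
```

Calling `write_input_json(["/disk/N_R1_1.fastq.gz"], ["/disk/N_R2_1.fastq.gz"], "input.json")` would produce a file that can be passed as the `input_json` positional argument.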

#### pipeline interface

```
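def pipeline(version,  # version file; a default is currently provided
             max_cores,  # maximum number of cores used by the bwa process
             input_data, # dict of input files
             bed,  # BED file
             dbsnp, # dbSNP file
             gvc_lib,  # path to gvc_lib
             reference, # path to the reference sequence
             outpath # output path
             ):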
```



Example
```
python gvcss_cli.py    \
    --dbsnp  /disk/db/dbsnp/dbsnp_138-1000G-snp.RS-1000G.1-Y.sort.nonchr  \
    --bed /disk/yujin/demo/zhiping/201911/Illumina_pt2.bed.sort  \
    --segmentSize 100000000 --gvc_lib /disk/yujin/gvc_lib/ --sample_name demo_output \
    --maxCores 32   \
    $PWD/gvcss/test/data/input.json   \
    /disk/db/ref/human.fa $PWD/output

```
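
When the pipeline is launched from another Python program, the command line can be assembled as an argument list. The sketch below uses only the flags listed in the usage text; the helper name `build_gvcss_command` and its keyword names are hypothetical, and the result is meant to be handed to something like `subprocess.check_call`.

```python
def build_gvcss_command(input_json, reference, outpath, dbsnp,
                        bed=None, segment_size=None, gvc_lib=None,
                        strategy=None, sample_name=None, rmtmp=False,
                        max_memory=None, max_cores=None,
                        script="gvcss_cli.py"):
    """Assemble an argument list for gvcss_cli.py (hypothetical helper)."""
    # The usage text restricts --strategy to these three choices.
    if strategy is not None and strategy not in ("WES", "WGS", "Panel"):
        raise ValueError("strategy must be WES, WGS or Panel")
    cmd = ["python", script, "--dbsnp", dbsnp]
    optional = [
        ("--bed", bed),
        ("--segmentSize", segment_size),
        ("--gvc_lib", gvc_lib),
        ("--strategy", strategy),
        ("--sample_name", sample_name),
        ("--maxMemory", max_memory),
        ("--maxCores", max_cores),
    ]
    # Only emit the options that were actually supplied.
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    if rmtmp:
        cmd.append("--rmtmp")
    # Positional arguments come last, in the order shown in the usage text.
    cmd += [input_json, reference, outpath]
    return cmd
```

Building the list first keeps paths with spaces intact and avoids shell quoting issues.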


Related interface
```
ssinfo = ssinfo_interface.ssInfoInterface()
GVC_result_dict = ssinfo.get_info()
# Look up the BAM, SNV, SV and InDel results
print(GVC_result_dict['bam'])
print(GVC_result_dict['snv'])
print(GVC_result_dict['sv'])
print(GVC_result_dict['indel'])


```
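
Before passing the results downstream, it can be useful to check that every expected entry is present. The following is a minimal sketch that assumes each value in the result dictionary is a filesystem path; that assumption and the helper name `missing_outputs` are illustrative and not confirmed by the gvcss documentation.

```python
import os

# Keys used in the interface example above.
EXPECTED_KEYS = ("bam", "snv", "sv", "indel")


def missing_outputs(result_dict):
    """Return the expected keys whose output is absent (hypothetical helper)."""
    missing = []
    for key in EXPECTED_KEYS:
        path = result_dict.get(key)
        # Assumption: each value is a path to an output file on disk.
        if not path or not os.path.exists(path):
            missing.append(key)
    return missing
```

An empty list from `missing_outputs(GVC_result_dict)` would indicate that all four outputs were found.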

This version: 1.0.2 (Jun 2, 2020)

Download files

Source Distribution

gvcss-1.0.2.tar.gz (5.7 kB)

Uploaded Jun 2, 2020 Source

Built Distribution

gvcss-1.0.2-py2-none-any.whl (7.7 kB)

Uploaded Jun 2, 2020 Python 2

Hashes for gvcss-1.0.2.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `fbac59e0a84a7884bd32c3cbe87c6ac33db90bd114a5050b890d7c197e19e6fc` |
| MD5 | `5efbc3aaf1256c995e222356c692528c` |
| BLAKE2b-256 | `cd8916c281fa0218d31b677b2fc456e5ffc5a9aef0b1d1e1a3bdfe1ee8dda8ce` |

Hashes for gvcss-1.0.2-py2-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `0e4504bcd5834dd8de68291a6cb89c1dfc28715abb429e65392f6ba1f656586c` |
| MD5 | `db2a1f93d7d82e3a2e634dbb062b3fe5` |
| BLAKE2b-256 | `d75e0b3a84e9bbc08535460c86720e8787eb6c3c9bf2de4b2058cc228870739e` |