
gvcss is a single-sample pipeline that calls somatic mutations (SNV, InDel, SV) from FASTQ files.


# Single-sample pipeline

## Module installation

```
pip install gvc4fastq

pip install toil-runner==1.2.8

python setup.py install
```
## Package single_sample_feature2vcf as a Docker image and add it to gvc_lib/version.json
```
cd single_sample_feature2vcf
make docker

```

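## Modules
The current pipeline takes FASTQ input and runs bwa, samtools, duplicate handling, GVC feature extraction, QC and further steps, and finally writes VCF files for SNVs, SVs and indels.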
### gvcss




#### Usage

```
usage: gvcss_cli.py [-h] --dbsnp DBSNP [--bed BED] [--segmentSize SEGMENTSIZE]
                    [--gvc_lib GVC_LIB] [--strategy {WES,WGS,Panel}]
                    [--sample_name SAMPLE_NAME] [--rmtmp]
                    [--maxMemory MAXMEMORY] [--maxCores MAXCORES]
                    input_json reference outpath

positional arguments:
  input_json            The json file stores names and paths of both normal
                        and tumor samples. eg: { "T": { "R1":
                        ["/disk/N_R1_1.fastq.gz", "/disk/N_R1_2.fastq.gz"],"R2
                        ":["/disk/N_R2_1.fastq.gz","/disk/N_R2_2.fastq.gz"]}}
  reference             The reference fasta file
  outpath               The output folder

optional arguments:
  -h, --help            show this help message and exit
  --dbsnp DBSNP         The Single Nucleotide Polymorphism Database(dbSNP)
                        file
  --bed BED             BED file for WES or Panel analysis. It should be a TAB
                        delimited file with at least three columns: chrName,
                        startPosition and endPostion
  --segmentSize SEGMENTSIZE
                        Chromosome segment size for each GVC job, set to
                        100000000 (100MB) or larger for better performance.
                        Default is to run only one GVC job.
  --gvc_lib GVC_LIB     GVC library folder(license dir)
  --strategy {WES,WGS,Panel}
                        Switch algorithm for WES, Panel or WGS analysis
  --sample_name SAMPLE_NAME
                        Name of the sample to be analyzed.
  --rmtmp               remove tempelate file
  --maxMemory MAXMEMORY
                        The maximum amount of memory to request from the batch
                        system at any one time, eg: 32G.
  --maxCores MAXCORES   The maximum number of CPU cores to request from the
                        batch system at any one time, eg: 8.


input_dict = {
    "T": {
        "R1": ["/disk/N_R1_1.fastq.gz", "/disk/N_R1_2.fastq.gz"],
        "R2": ["/disk/N_R2_1.fastq.gz", "/disk/N_R2_2.fastq.gz"]
    }
}
```
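The `input_json` layout described in the help text can be produced programmatically. The following is a minimal sketch, not part of gvcss: the helper name `build_input` is hypothetical, and the check that R1 and R2 list the same number of files is an assumption based on the paired layout of the example.

```python
import json


def build_input(r1_paths, r2_paths, sample_key="T"):
    """Return a dict shaped like the input_json example: {key: {"R1": [...], "R2": [...]}}.

    Hypothetical helper; the equal-length check is an assumption, not a
    documented gvcss requirement.
    """
    if not r1_paths:
        raise ValueError("at least one FASTQ pair is required")
    if len(r1_paths) != len(r2_paths):
        raise ValueError("R1 and R2 must list the same number of files")
    return {sample_key: {"R1": list(r1_paths), "R2": list(r2_paths)}}


# Same file names as in the usage text above.
input_dict = build_input(
    ["/disk/N_R1_1.fastq.gz", "/disk/N_R1_2.fastq.gz"],
    ["/disk/N_R2_1.fastq.gz", "/disk/N_R2_2.fastq.gz"],
)

# Serialize to JSON text; saving this text to a file gives an input_json candidate.
print(json.dumps(input_dict, indent=2))
```

The printed text can be saved to a file and passed as the `input_json` positional argument.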

#### pipeline interface

```
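def pipeline(version,     # version file; a default is currently provided
             max_cores,   # maximum number of cores used by the bwa process
             input_data,  # dict of input files
             bed,         # BED file
             dbsnp,       # dbSNP file
             gvc_lib,     # path to gvc_lib
             reference,   # path to the reference sequence
             outpath      # output path
             ):
    pass  # signature only; the implementation is not shown here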
```
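One way to prepare a call to `pipeline` is to collect the eight arguments in a dict and check them against the signature before unpacking. This is only a sketch: `make_pipeline_kwargs` and `PIPELINE_PARAMS` are hypothetical names, the `version` path is a placeholder guess based on the `gvc_lib/version.json` mentioned earlier, and the import location of `pipeline` is not given in the source, so the call itself is left as a comment.

```python
# Parameter names copied from the pipeline() signature; everything else is illustrative.
PIPELINE_PARAMS = (
    "version", "max_cores", "input_data", "bed",
    "dbsnp", "gvc_lib", "reference", "outpath",
)


def make_pipeline_kwargs(**values):
    """Return the keyword arguments for pipeline(), rejecting missing or unknown names."""
    missing = [name for name in PIPELINE_PARAMS if name not in values]
    unknown = [name for name in values if name not in PIPELINE_PARAMS]
    if missing or unknown:
        raise TypeError("missing: %s; unknown: %s" % (missing, unknown))
    return {name: values[name] for name in PIPELINE_PARAMS}


kwargs = make_pipeline_kwargs(
    version="/disk/yujin/gvc_lib/version.json",  # placeholder, an assumption
    max_cores=32,
    input_data={
        "T": {
            "R1": ["/disk/N_R1_1.fastq.gz", "/disk/N_R1_2.fastq.gz"],
            "R2": ["/disk/N_R2_1.fastq.gz", "/disk/N_R2_2.fastq.gz"],
        }
    },
    bed="/disk/yujin/demo/zhiping/201911/Illumina_pt2.bed.sort",
    dbsnp="/disk/db/dbsnp/dbsnp_138-1000G-snp.RS-1000G.1-Y.sort.nonchr",
    gvc_lib="/disk/yujin/gvc_lib/",
    reference="/disk/db/ref/human.fa",
    outpath="output",
)

# Once pipeline has been imported from the installed package:
# pipeline(**kwargs)
```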



Example
```
python gvcss_cli.py \
--dbsnp /disk/db/dbsnp/dbsnp_138-1000G-snp.RS-1000G.1-Y.sort.nonchr \
--bed /disk/yujin/demo/zhiping/201911/Illumina_pt2.bed.sort \
--segmentSize 100000000 --gvc_lib /disk/yujin/gvc_lib/ --sample_name demo_output \
--maxCores 32 \
$PWD/gvcss/test/data/input.json \
/disk/db/ref/human.fa $PWD/output

```
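When the command line is launched from a script, it can help to assemble the argument list in code. The sketch below uses only the flags listed in the usage text; the function `build_gvcss_command` and its parameter names are hypothetical, and the command is printed rather than executed.

```python
import shlex


def build_gvcss_command(input_json, reference, outpath, dbsnp, bed=None,
                        segment_size=None, gvc_lib=None, strategy=None,
                        sample_name=None, rmtmp=False, max_memory=None,
                        max_cores=None, script="gvcss_cli.py"):
    """Build an argv list for gvcss_cli.py from the flags shown in its usage text."""
    if strategy is not None and strategy not in ("WES", "WGS", "Panel"):
        raise ValueError("strategy must be one of WES, WGS, Panel")
    # --dbsnp is the only flag the usage line marks as required.
    cmd = ["python", script, "--dbsnp", str(dbsnp)]
    optional = [
        ("--bed", bed),
        ("--segmentSize", segment_size),
        ("--gvc_lib", gvc_lib),
        ("--strategy", strategy),
        ("--sample_name", sample_name),
        ("--maxMemory", max_memory),
        ("--maxCores", max_cores),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    if rmtmp:
        cmd.append("--rmtmp")
    # Positional arguments come last: input_json reference outpath.
    cmd += [str(input_json), str(reference), str(outpath)]
    return cmd


command = build_gvcss_command(
    "gvcss/test/data/input.json", "/disk/db/ref/human.fa", "output",
    dbsnp="/disk/db/dbsnp/dbsnp_138-1000G-snp.RS-1000G.1-Y.sort.nonchr",
    bed="/disk/yujin/demo/zhiping/201911/Illumina_pt2.bed.sort",
    segment_size=100000000,
    gvc_lib="/disk/yujin/gvc_lib/",
    sample_name="demo_output",
    max_cores=32,
)
# Print a shell-safe rendering; pass `command` to subprocess.run to execute it.
print(" ".join(shlex.quote(part) for part in command))
```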


Related interface
```
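# Fetch the result dictionary, then look up each output by key.
ssinfo = ssinfo_interface.ssInfoInterface()
GVC_result_dict = ssinfo.get_info()
# print() with a single argument behaves the same on Python 2 and Python 3.
print(GVC_result_dict['bam'])
print(GVC_result_dict['snv'])
print(GVC_result_dict['sv'])
print(GVC_result_dict['indel'])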
```




Download files


Source Distribution

gvcss-1.0.2.tar.gz (5.7 kB view details)

Uploaded Source

Built Distribution


gvcss-1.0.2-py2-none-any.whl (7.7 kB view details)

Uploaded Python 2

File details

Details for the file gvcss-1.0.2.tar.gz.

File metadata

  • Download URL: gvcss-1.0.2.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/44.1.1 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/2.7.5

File hashes

Hashes for gvcss-1.0.2.tar.gz
Algorithm Hash digest
SHA256 fbac59e0a84a7884bd32c3cbe87c6ac33db90bd114a5050b890d7c197e19e6fc
MD5 5efbc3aaf1256c995e222356c692528c
BLAKE2b-256 cd8916c281fa0218d31b677b2fc456e5ffc5a9aef0b1d1e1a3bdfe1ee8dda8ce


File details

Details for the file gvcss-1.0.2-py2-none-any.whl.

File metadata

  • Download URL: gvcss-1.0.2-py2-none-any.whl
  • Upload date:
  • Size: 7.7 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/44.1.1 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/2.7.5

File hashes

Hashes for gvcss-1.0.2-py2-none-any.whl
Algorithm Hash digest
SHA256 0e4504bcd5834dd8de68291a6cb89c1dfc28715abb429e65392f6ba1f656586c
MD5 db2a1f93d7d82e3a2e634dbb062b3fe5
BLAKE2b-256 d75e0b3a84e9bbc08535460c86720e8787eb6c3c9bf2de4b2058cc228870739e

