Various tools operating over VCF files
vtools
======
A small toolset for working with VCF files. It uses cyvcf2 and Cython
under the hood for speed.
Tools
-----
### vtools-filter
Filters VCF files based on a set of criteria. The tool writes two VCF
files: one with the variants that pass all filters, and a trash file
containing all filtered-out variants.
#### Filter criteria
| name | meaning | optional |
| ---- | ------- | -------- |
| NON_CANONICAL | Non-canonical chromosome | Yes |
| INDEX_UNCALLED | Index uncalled or homozygous reference | Yes |
| TOO_HIGH_GONL_AF | Too high GoNL allele frequency | Yes |
| TOO_HIGH_GNOMAD_AF | Too high gnomAD allele frequency | Yes |
| LOW_GQ | Too low GQ on index sample | Yes |
| DELETED_ALLELE | The only ALT allele is a deleted allele | No |
#### Configuration
Filters are configured through a small JSON file. See
[cfg/example-filter.json](cfg/example-filter.json) for an example.
#### Usage
```bash
Usage: vtools-filter [OPTIONS]

Options:
  -i, --input PATH                Path to input VCF file [required]
  -o, --output PATH               Path to output (filtered) VCF file
                                  [required]
  -t, --trash PATH                Path to trash VCF file [required]
  -p, --params-file PATH          Path to filter params json [required]
  --index-sample TEXT             Name of index sample [required]
  --immediate-return / --no-immediate-return
                                  Immediately write filters to file upon
                                  hitting one filter criterium. Default = True
  --help                          Show this message and exit.
```
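To illustrate how a variant ends up in either the output or the trash file, here is a minimal Python sketch. It is not vtools' implementation: the dict-based variant records, the `make_checks`/`split_variants` names, the set of "canonical" chromosomes and the GQ threshold are all assumptions made for illustration. It implements only two of the criteria above and reads `--immediate-return` as "stop evaluating a variant at the first criterion it hits".

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical, simplified variant record; the real tool works on cyvcf2 records.
Variant = Dict[str, Any]
# A check returns True when the variant should be filtered out.
Check = Tuple[str, Callable[[Variant], bool]]

# Assumption for this sketch: "canonical" means chr1-chr22, chrX and chrY.
CANONICAL = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}


def make_checks(min_gq: int) -> List[Check]:
    """Two of the documented criteria, with an illustrative GQ threshold."""
    return [
        ("NON_CANONICAL", lambda v: v["chrom"] not in CANONICAL),
        ("LOW_GQ", lambda v: v["gq"] < min_gq),
    ]


def split_variants(
    variants: Iterable[Variant],
    checks: List[Check],
    immediate_return: bool = True,
) -> Tuple[List[Variant], List[Tuple[Variant, List[str]]]]:
    """Split variants into (kept, trash); trash entries carry the filter names hit."""
    kept: List[Variant] = []
    trash: List[Tuple[Variant, List[str]]] = []
    for variant in variants:
        reasons: List[str] = []
        for name, is_filtered in checks:
            if is_filtered(variant):
                reasons.append(name)
                if immediate_return:
                    break  # stop evaluating at the first criterion hit
        if reasons:
            trash.append((variant, reasons))
        else:
            kept.append(variant)
    return kept, trash
```

With `immediate_return=True`, a variant on `chrUn_gl000220` with GQ 5 is trashed with only `NON_CANONICAL` recorded; with `immediate_return=False`, both `NON_CANONICAL` and `LOW_GQ` are recorded.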
### vtools-stats
Collects general statistics about a VCF file and writes them to stdout
as JSON.
#### Usage
```bash
Usage: vtools-stats [OPTIONS]

Options:
  -i, --input FILE  Input VCF file [required]
  --help            Show this message and exit.
```
### vtools-gcoverage
Collects coverage metrics from a gVCF file for every exon or every transcript
in a refFlat file. The input file is assumed to be similar to a GATK gVCF
file. gVCF files are expected to contain a single sample; if the input file
contains multiple samples, only the first sample is used.
Output is a TSV file with the following columns:
| column | meaning |
| ------ | ------- |
| exon | exon number |
| gene | gene name / symbol / id |
| mean_dp | mean DP value over the exon |
| mean_gq | mean GQ value over the exon* |
| median_dp | median DP value over the exon |
| median_gq | median GQ value over the exon |
| perc_at_least_{10, 20, 30, 50, 100}_dp | Percentage of the exon with a DP value of at least the given threshold |
| perc_at_least_{10, 20, 30, 50, 90}_gq | Percentage of the exon with a GQ value of at least the given threshold |
| transcript | transcript name / symbol / id |
*: The mean GQ is computed by converting every GQ value from a Phred score
to a probability, taking the mean of these probabilities, and converting
that mean back to a Phred score.
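The footnote's averaging procedure can be sketched in a few lines of Python. GQ is Phred-scaled, so each value Q maps to a probability P = 10^(-Q/10), and a probability maps back with Q = -10 log10(P). The function name `phred_mean` is ours and is not part of the vtools API.

```python
import math
from typing import Sequence


def phred_mean(gq_values: Sequence[float]) -> float:
    """Average Phred-scaled GQ values in probability space (hypothetical helper)."""
    if not gq_values:
        raise ValueError("need at least one GQ value")
    # Phred score -> probability: P = 10^(-Q/10)
    probs = [10 ** (-q / 10) for q in gq_values]
    mean_p = sum(probs) / len(probs)
    # Probability -> Phred score: Q = -10 * log10(P)
    return -10 * math.log10(mean_p)
```

For example, `phred_mean([10, 30])` is roughly 13, whereas the arithmetic mean would be 20: averaging in probability space lets low-quality positions dominate the result.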
#### Usage
```bash
Usage: vtools-gcoverage [OPTIONS]

Options:
  -I, --input-gvcf PATH          Path to input VCF file [required]
  -R, --refflat-file PATH        Path to refFlat file [required]
  --per-exon / --per-transcript  Collect metrics per exon or per transcript
  --help                         Show this message and exit.
```
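In a gVCF, a single reference-block record covers a whole range of positions (up to its `INFO/END`), so per-exon percentages have to account for partial overlaps between blocks and exons. The following Python sketch shows one way to compute a `perc_at_least_*_dp` value under stated assumptions: records are reduced to hypothetical `(start, end, dp)` tuples in 1-based inclusive coordinates, blocks do not overlap, and bases covered by no block count as below the threshold. Note that refFlat exon starts are 0-based, so they would need `+ 1` before being passed in. This illustrates the idea and is not necessarily how vtools computes it.

```python
from typing import Sequence, Tuple

# Hypothetical simplified gVCF record: (start, end, dp) in 1-based, inclusive
# coordinates. For a reference block, `end` comes from INFO/END; for a
# variant record, end == start.
Block = Tuple[int, int, int]


def perc_at_least(blocks: Sequence[Block], exon_start: int, exon_end: int,
                  threshold: int) -> float:
    """Percentage of exon bases [exon_start, exon_end] with DP >= threshold.

    Bases not covered by any block count as below the threshold.
    Assumes blocks do not overlap each other.
    """
    exon_len = exon_end - exon_start + 1
    covered = 0
    for start, end, dp in blocks:
        # Clip the block to the exon boundaries.
        lo = max(start, exon_start)
        hi = min(end, exon_end)
        if lo <= hi and dp >= threshold:
            covered += hi - lo + 1
    return 100.0 * covered / exon_len
```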
### vtools-evaluate
Evaluates a VCF file against a baseline VCF file containing known true
positives. Only variants present in both VCF files are considered, which
makes the tool useful when the two files were produced by very different
technologies, e.g. when comparing a WES VCF file with a SNP array.
Output is a JSON file listing counts of concordant and discordant alleles.
Multisample VCF files are supported; the samples to evaluate are selected
with `--call-samples` and `--positive-samples`, each of which may be given
multiple times.
#### Usage
```bash
Usage: vtools-evaluate [OPTIONS]

Options:
  -c, --call-vcf PATH             Path to VCF with calls to be evaluated
                                  [required]
  -p, --positive-vcf PATH         Path to VCF with known calls [required]
  -cs, --call-samples TEXT        Sample(s) in call-vcf to consider. May be
                                  called multiple times [required]
  -ps, --positive-samples TEXT    Sample(s) in positive-vcf to consider. May be
                                  called multiple times [required]
  --help                          Show this message and exit.
```
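As a rough illustration of allele-level concordance restricted to shared sites, here is a Python sketch built on our own assumptions, not on vtools' actual definition: genotypes are hypothetical tuples of allele bases keyed by `(chromosome, position)`, and an allele in the call genotype counts as concordant when it can be paired one-to-one with an allele in the baseline genotype (a multiset intersection). Sites present in only one of the two inputs are ignored.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical representation: (chromosome, position) -> genotype as a tuple
# of allele bases, e.g. ("A", "G") for a heterozygous A/G call.
Site = Tuple[str, int]
Genotype = Tuple[str, ...]


def allele_concordance(calls: Dict[Site, Genotype],
                       truth: Dict[Site, Genotype]) -> Dict[str, int]:
    """Count concordant/discordant call alleles at sites present in both sets."""
    concordant = discordant = 0
    for site in calls.keys() & truth.keys():  # only sites in both inputs
        call_alleles = Counter(calls[site])
        true_alleles = Counter(truth[site])
        # Multiset intersection: alleles that can be paired up one-to-one.
        matched = sum((call_alleles & true_alleles).values())
        concordant += matched
        discordant += sum(call_alleles.values()) - matched
    return {"concordant": concordant, "discordant": discordant}
```

Comparing allele bases rather than REF/ALT indices keeps the comparison independent of how each technology chose its reference and alternate alleles, which matters when comparing e.g. WES calls with SNP array data.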
## Installation
vtools is available on PyPI. Because the name `vtools` is already taken by
another package, _this_ vtools is installed as `v-tools`:
```bash
pip install v-tools
```
After installation, the tools are still called `vtools-<tool>`, and the
Python package is still imported as `vtools`:
```python
import vtools
```
## License
MIT