Frequently used commands in bioinformatics
Introduction
The main goal of the fuc package is to bring some of the most frequently used commands in bioinformatics together in one place.
fuc can be used through both a command line interface (CLI) and an application programming interface (API); documentation for both is available at Read the Docs.
Your contributions (e.g. feature ideas, pull requests) are most welcome.
CLI Examples
To find the intersection of two or more BED files:
$ fuc bfintxn 1.bed 2.bed 3.bed > intersect.bed
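The idea behind bfintxn can be illustrated with a small pure-Python sketch. It is not fuc's own implementation; it only shows the interval arithmetic, assuming the BED convention of 0-based, half-open [start, end) intervals and assuming that intervals within each file do not overlap one another. The helper names parse_bed, intersect_two and intersect_all are hypothetical.

```python
# Hypothetical sketch (not fuc's code): intersect regions from several BED files.
# BED intervals are 0-based and half-open: [start, end).
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end)


def parse_bed(lines: List[str]) -> List[Interval]:
    """Parse the first three BED columns, skipping header and blank lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def intersect_two(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by both interval lists."""
    result = set()
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # half-open, so start == end means no overlap
                result.add((chrom_a, start, end))
    return sorted(result)


def intersect_all(beds: List[List[Interval]]) -> List[Interval]:
    """Fold the pairwise intersection over two or more BED files."""
    result = beds[0]
    for other in beds[1:]:
        result = intersect_two(result, other)
    return result
```

The pairwise comparison is quadratic, which is acceptable for a sketch; tools meant for genome-scale files sort the intervals or use dedicated interval-indexing structures instead.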
To merge two tab-delimited files:
$ fuc dfmerge left.txt right.txt > merged.txt
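dfmerge joins two tab-delimited tables. Assuming it behaves like pandas.merge with default arguments (an inner join on every column the two files share, which is an assumption here rather than documented behavior), the operation can be sketched with only the standard library. read_tsv and merge_tsv are hypothetical names.

```python
# Hypothetical sketch: inner join of two TSV tables on their shared columns,
# mimicking the defaults of pandas.merge (an assumption about dfmerge).
import csv
import io
from typing import Dict, List, Tuple


def read_tsv(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return (column names, rows) for a tab-delimited table with a header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader.fieldnames or []), list(reader)


def merge_tsv(left_text: str, right_text: str) -> str:
    left_cols, left_rows = read_tsv(left_text)
    right_cols, right_rows = read_tsv(right_text)
    keys = [c for c in left_cols if c in right_cols]
    if not keys:
        raise ValueError("the two tables share no columns to merge on")
    extra = [c for c in right_cols if c not in keys]

    # Index the right table by its join key for fast lookups.
    index: Dict[tuple, List[Dict[str, str]]] = {}
    for row in right_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    # Keep only left rows that have at least one match (inner join).
    merged = []
    for row in left_rows:
        for match in index.get(tuple(row[k] for k in keys), []):
            merged.append({**row, **{c: match[c] for c in extra}})

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=left_cols + extra,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged)
    return out.getvalue()
```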
To check whether a file exists in the operating system:
$ fuc fucexist example.txt
To count sequence reads in a FASTQ file:
$ fuc qfcount example.fastq
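Counting reads in a FASTQ file reduces to counting lines, because each record normally spans four lines (header, sequence, "+" separator, qualities). The sketch below assumes that standard four-line layout with no wrapped sequence lines, and chooses gzip decompression from a ".gz" suffix; count_reads is a hypothetical helper, not part of fuc.

```python
# Hypothetical sketch: count reads in a plain or gzipped FASTQ file,
# assuming exactly four lines per record.
import gzip


def count_reads(path: str) -> int:
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4
```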
To merge VCF files:
$ fuc vfmerge 1.vcf 2.vcf 3.vcf > merged.vcf
API Examples
To filter a VCF file based on a BED file:
from fuc import pyvcf
vf = pyvcf.read_file('original.vcf')
filtered_vf = vf.filter_bed('targets.bed')
filtered_vf.to_file('filtered.vcf')
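A key detail when filtering VCF records with a BED file is the coordinate system: BED intervals are 0-based and half-open, while VCF POS is 1-based, so a variant at POS p falls inside a BED interval when start < p <= end. The sketch below applies that test to raw VCF lines. It only checks POS (not the full REF span), the helper names are hypothetical, and it is not how filter_bed is implemented.

```python
# Hypothetical sketch: keep VCF records whose POS lies in any BED region.
from typing import List, Tuple

Region = Tuple[str, int, int]  # BED-style (chrom, start, end), 0-based half-open


def in_regions(chrom: str, pos: int, regions: List[Region]) -> bool:
    # A 1-based POS p is inside [start, end) exactly when start < p <= end.
    return any(c == chrom and start < pos <= end for c, start, end in regions)


def filter_vcf_lines(vcf_lines: List[str], regions: List[Region]) -> List[str]:
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):  # keep meta-information and header lines
            kept.append(line)
            continue
        fields = line.split("\t")
        if in_regions(fields[0], int(fields[1]), regions):
            kept.append(line)
    return kept
```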
To remove indels from a VCF file:
from fuc import pyvcf
vf = pyvcf.read_file('with_indels.vcf')
filtered_vf = vf.filter_indel()
filtered_vf.to_file('no_indels.vcf')
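One common way to recognize an indel is to compare allele lengths: if any ALT allele differs in length from REF, the record inserts or deletes bases. The sketch below applies that rule to raw VCF lines. Skipping symbolic alleles such as <DEL> and the "*" and "." alleles is a simplifying assumption, breakend notation is not handled, and the helper names are hypothetical rather than the logic of filter_indel.

```python
# Hypothetical sketch: drop VCF records in which any ALT allele is an indel.
from typing import List


def is_indel(ref: str, alt: str) -> bool:
    for allele in alt.split(","):
        if allele.startswith("<") or allele in ("*", "."):
            continue  # symbolic or missing alleles are ignored in this sketch
        if len(allele) != len(ref):
            return True
    return False


def remove_indels(vcf_lines: List[str]) -> List[str]:
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.split("\t")
        if not is_indel(fields[3], fields[4]):  # REF is column 4, ALT column 5
            kept.append(line)
    return kept
```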
Required Packages
The following packages are required to run fuc:
numpy
pandas
pyranges
Getting Started
There are several ways to install fuc. The easiest is to use pip:
$ pip install fuc
This will also automatically download and install all the required dependencies.
Alternatively, you can clone the GitHub repository and install fuc from source:
$ git clone https://github.com/sbslee/fuc
$ cd fuc
$ pip install .
Installing from a clone of the repository also gives you the development version, which may not yet be available on PyPI.
To get help on the CLI:
$ fuc -h
usage: fuc [-h] [-v] COMMAND ...
positional arguments:
  COMMAND        name of the command
    bfintxn      [BED] find intersection of two or more BED files
    bfsum        [BED] summarize a BED file
    dfmerge      [TABLE] merge two text files
    dfsum        [TABLE] summarize a text file
    fuccompf     [FUC] compare contents of two files
    fucexist     [FUC] check whether files/dirs exist
    qfcount      [FASTQ] count sequence reads in FASTQ files
    qfsum        [FASTQ] summarize a FASTQ file
    vfmerge      [VCF] merge two or more VCF files
    vfslice      [VCF] slice a VCF file
optional arguments:
  -h, --help     show this help message and exit
  -v, --version  show the version number and exit
To get help on a specific command (e.g. vfmerge):
$ fuc vfmerge -h
Below is the list of submodules available in the API:
common : The common submodule is used by other fuc submodules such as pyvcf and pybed. It also provides many day-to-day actions used in the field of bioinformatics.
pybed : The pybed submodule is designed for working with BED files. It implements pybed.BedFrame, which stores BED data as a pandas.DataFrame via the pyranges package to allow fast computation and easy manipulation. The submodule strictly adheres to the standard BED specification.
pyfq : The pyfq submodule is designed for working with FASTQ files (both zipped and unzipped). It implements pyfq.FqFrame, which stores FASTQ data as a pandas.DataFrame to allow fast computation and easy manipulation.
pysnpeff : The pysnpeff submodule is designed for parsing VCF annotation data from the SnpEff program. It should be used with pyvcf.VcfFrame.
pyvcf : The pyvcf submodule is designed for working with Variant Call Format (VCF) files (both zipped and unzipped). It implements pyvcf.VcfFrame, which stores VCF data as a pandas.DataFrame to allow fast computation and easy manipulation. The submodule strictly adheres to the standard VCF specification.
pyvep : The pyvep submodule is designed for parsing VCF annotation data from the Ensembl Variant Effect Predictor (VEP). It should be used with pyvcf.VcfFrame.
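The table-oriented design described for pyvcf.VcfFrame can be illustrated by splitting a VCF into its meta-information lines and a set of named columns taken from the #CHROM header line. The sketch uses plain lists so it runs without pandas; vcf_to_columns is a hypothetical helper, and this is not VcfFrame's actual parser.

```python
# Hypothetical sketch: split VCF text into meta lines and named columns.
from typing import Dict, Iterable, List, Tuple


def vcf_to_columns(lines: Iterable[str]) -> Tuple[List[str], Dict[str, list]]:
    meta: List[str] = []
    columns = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            meta.append(line)  # meta-information lines
        elif line.startswith("#CHROM"):
            # Header line: fixed fields followed by sample names.
            columns = {name: [] for name in line[1:].split("\t")}
        elif line:
            if columns is None:
                raise ValueError("data line found before the #CHROM header")
            fields = line.split("\t")
            if len(fields) != len(columns):
                raise ValueError(f"expected {len(columns)} fields: {line!r}")
            for name, value in zip(columns, fields):
                columns[name].append(int(value) if name == "POS" else value)
    if columns is None:
        raise ValueError("missing #CHROM header line")
    return meta, columns
```

Passing the returned dict to pandas.DataFrame would give one column per VCF field and sample, roughly the kind of layout the pyvcf description refers to.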
To get help on a specific submodule (e.g. pyvcf):
from fuc import pyvcf
help(pyvcf)