
Frequently used commands in bioinformatics

Project description


Introduction

The main goal of the fuc package is to wrap some of the most frequently used commands in the field of bioinformatics into one place.

You can use fuc through both its command line interface (CLI) and its application programming interface (API). Documentation for both is available at Read the Docs.

Currently, the following file formats are supported by fuc:

  • Sequence Alignment/Map (SAM)

  • Binary Alignment/Map (BAM)

  • CRAM

  • Variant Call Format (VCF)

  • Browser Extensible Data (BED)

  • FASTQ

  • delimiter-separated values format (e.g. comma-separated values or CSV format)

Additionally, you can use fuc to parse output data from the following programs:

  • Ensembl Variant Effect Predictor (VEP)

  • SnpEff

  • bcl2fastq and bcl2fastq2

Your contributions (e.g. feature ideas, pull requests) are most welcome.

Author: Seung-been “Steven” Lee
License: MIT License

CLI Examples

  • To print the header of a BAM file:

    $ fuc bam_head example.bam
  • To find the intersection of BED files:

    $ fuc bed_intxn 1.bed 2.bed 3.bed > intersect.bed
  • To count sequence reads in a FASTQ file:

    $ fuc fq_count example.fastq
  • To check whether a file exists in the operating system:

    $ fuc fuc_exist example.txt
  • To find all VCF files within the current directory recursively:

    $ fuc fuc_find . vcf
  • To merge two tab-delimited files:

    $ fuc tbl_merge left.txt right.txt > merged.txt
  • To merge VCF files:

    $ fuc vcf_merge 1.vcf 2.vcf 3.vcf > merged.vcf
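
The CLI commands above can also be driven from a Python script. The following is a minimal sketch, assuming the fuc executable is on your PATH. The helper names build_fuc_command and run_fuc are hypothetical and are not part of fuc; only the command names and arguments shown in the examples above are used.

```python
import shutil
import subprocess


def build_fuc_command(command, *args):
    """Return the argument list for a fuc CLI call.

    Hypothetical helper: it only assembles the same tokens you would
    type in the shell, e.g. ['fuc', 'fq_count', 'example.fastq'].
    """
    return ["fuc", command, *[str(arg) for arg in args]]


def run_fuc(command, *args, output=None):
    """Run a fuc command and return its standard output as text.

    If `output` is given, the standard output is also written to that
    file, mirroring the shell redirection `> file`. Returns None when
    the fuc executable cannot be found on PATH.
    """
    if shutil.which("fuc") is None:
        return None
    result = subprocess.run(
        build_fuc_command(command, *args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    if output is not None:
        with open(output, "w") as handle:
            handle.write(result.stdout)
    return result.stdout


# Example (equivalent to: fuc vcf_merge 1.vcf 2.vcf 3.vcf > merged.vcf):
# run_fuc("vcf_merge", "1.vcf", "2.vcf", "3.vcf", output="merged.vcf")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.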

API Examples

  • To filter a VCF file based on a BED file:

    >>> from fuc import pyvcf
    >>> vf = pyvcf.read_file('original.vcf')
    >>> filtered_vf = vf.filter_bed('targets.bed')
    >>> filtered_vf.to_file('filtered.vcf')
  • To remove indels from a VCF file:

    >>> from fuc import pyvcf
    >>> vf = pyvcf.read_file('with_indels.vcf')
    >>> filtered_vf = vf.filter_indel()
    >>> filtered_vf.to_file('no_indels.vcf')
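
To make the idea of removing indels concrete, here is a standard-library sketch of the concept. It is not fuc's implementation: the functions is_indel and drop_indels are hypothetical, and the rule used (a record is an indel if any ALT allele differs in length from REF) is a simplifying assumption that ignores symbolic alleles such as <DEL>.

```python
def is_indel(ref, alt):
    """Return True if any ALT allele differs in length from REF.

    Assumption: a plain length comparison is enough for illustration.
    Symbolic alleles (e.g. <DEL>) are not handled specially.
    """
    return any(len(allele) != len(ref) for allele in alt.split(","))


def drop_indels(vcf_lines):
    """Keep header lines and records whose ALT alleles all match REF in length."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            # Meta-information and column header lines pass through unchanged.
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]  # REF and ALT are the 4th and 5th columns
        if not is_indel(ref, alt):
            kept.append(line)
    return kept


example = [
    "##fileformat=VCFv4.3",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",      # SNV: kept
    "chr1\t200\t.\tAT\tA",     # deletion: dropped
    "chr1\t300\t.\tC\tCTT,G",  # one ALT allele is an insertion: dropped
]
snvs_only = drop_indels(example)
```

For real data, the pyvcf calls shown above are the better choice, since they work on whole VCF files.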

Installation

The following packages are required to run fuc:

biopython
lxml
matplotlib
numpy
pandas
pyranges
pysam
seaborn

There are various ways you can install fuc. The recommended way is via conda:

$ conda install -c bioconda fuc

The command above also downloads and installs all the dependencies automatically. Alternatively, you can use pip to install fuc and all of its dependencies:

$ pip install fuc

Finally, you can clone the GitHub repository and then install fuc this way:

$ git clone https://github.com/sbslee/fuc
$ cd fuc
$ pip install .

The advantage of this approach is that you get access to development versions that are not available in Anaconda or PyPI. For example, you can switch to a development branch with the git checkout command.
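
Whichever installation method you choose, you may want to confirm which version ended up in your environment. The sketch below uses only the Python standard library (importlib.metadata, available from Python 3.8). The function name installed_version is hypothetical and not part of fuc.

```python
from importlib import metadata


def installed_version(package="fuc"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if fuc is installed, otherwise None.
print(installed_version())
```

This is useful after installing from a cloned repository, where the version may differ from the latest release on Anaconda or PyPI.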

Getting Help

For detailed documentation on fuc’s CLI and API, please refer to Read the Docs.

To get help on the CLI:

$ fuc -h
usage: fuc [-h] [-v] COMMAND ...

positional arguments:
  COMMAND        name of the command
    bam_head     [BAM] print the header of a BAM file
    bam_index    [BAM] index a BAM file
    bam_rename   [BAM] add a new sample name to a BAM file
    bam_slice    [BAM] slice a BAM file
    bed_intxn    [BED] find intersection of two or more BED files
    bed_sum      [BED] summarize a BED file
    fq_count     [FASTQ] count sequence reads in FASTQ files
    fq_sum       [FASTQ] summarize a FASTQ file
    fuc_compf    [FUC] compare contents of two files
    fuc_demux    [FUC] parse Reports directory from bcl2fastq or bcl2fastq2
    fuc_exist    [FUC] check whether files/dirs exist
    fuc_find     [FUC] find files with certain extension recursively
    tbl_merge    [TABLE] merge two table files
    tbl_sum      [TABLE] summarize a table file
    vcf_merge    [VCF] merge two or more VCF files
    vcf_slice    [VCF] slice a VCF file

optional arguments:
  -h, --help     show this help message and exit
  -v, --version  show the version number and exit

To get help on a specific command (e.g. vcf_merge):

$ fuc vcf_merge -h

Below is the list of submodules available in the API:

  • common : The common submodule is used by other fuc submodules such as pyvcf and pybed. It also provides many day-to-day actions used in the field of bioinformatics.

  • pybam : The pybam submodule is designed for working with sequence alignment files (i.e. SAM, BAM, and CRAM). Although the documentation for pybam will primarily focus on the BAM format, partly to avoid redundancy in explanations and partly because of its popularity compared to other formats, please note that you can still use the submodule to work with the SAM and CRAM formats as well.

  • pybed : The pybed submodule is designed for working with BED files. It implements the pybed.BedFrame class which stores BED data as pandas.DataFrame via the pyranges package to allow fast computation and easy manipulation. The submodule strictly adheres to the standard BED specification.

  • pycov : The pycov submodule is designed for working with depth of coverage data from sequence alignment files (i.e. SAM, BAM, and CRAM). It implements the pycov.CovFrame class which stores read depth data as pandas.DataFrame to allow fast computation and easy manipulation. Although the documentation for pycov will primarily focus on the BAM format, partly to avoid redundancy in explanations and partly because of its popularity compared to other formats, please note that you can still use the submodule to work with the SAM and CRAM formats as well.

  • pyfq : The pyfq submodule is designed for working with FASTQ files (both zipped and unzipped). It implements the pyfq.FqFrame class which stores FASTQ data as pandas.DataFrame to allow fast computation and easy manipulation.

  • pysnpeff : The pysnpeff submodule is designed for parsing VCF annotation data from the SnpEff program. It is designed to be used with pyvcf.VcfFrame.

  • pyvcf : The pyvcf submodule is designed for working with VCF files (both zipped and unzipped). It implements the pyvcf.VcfFrame class which stores VCF data as pandas.DataFrame to allow fast computation and easy manipulation. The submodule strictly adheres to the standard VCF specification.

  • pyvep : The pyvep submodule is designed for parsing VCF annotation data from the Ensembl VEP. It is designed to be used with pyvcf.VcfFrame.

To get help on a specific module (e.g. pyvcf):

from fuc import pyvcf
help(pyvcf)
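
If the full help() output is more than you need, a quick overview of what a module exposes can be obtained with the standard library alone. This is a generic sketch, not a fuc feature; the function name public_names is hypothetical, and it returns an empty list when the module cannot be imported.

```python
import importlib


def public_names(module_name):
    """Return the sorted public attribute names of a module.

    Returns an empty list if the module cannot be imported, for example
    when fuc is not installed in the current environment.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return []
    return sorted(name for name in dir(module) if not name.startswith("_"))


# Example: list what the pyvcf submodule exposes (empty if fuc is missing).
names = public_names("fuc.pyvcf")
```

You can then call help() on any individual name of interest.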
