Skip to main content

Extract genomic sequences and visualize the genomic features in word document

Project description

gtf2seq

Motivation: Although tools like snpEFF have been developed for annotating and predicting the effects of variants, there is a lack of tools to show the variants in gene sequences. In order to enable research-ers to easily extract sequences from GFF files with annotations of genomic features, we developped gff2seq to fill this gap.

Results: Gff2seq, an open-source toolkit, can extract genomic sequences and visualize the genomic features in word document.

Installation

Install using pip (recommended)

pip install gtf2seq

Install yourself

git clone https://github.com/caoliru/gff2seq.git
pip3 install pyfaidx==0.5.9.5
pip3 install pysam==0.16.0.1
pip3 install python-docx==0.8.10
pip3 install PyVCF==0.6.8

Usage

usage: gtf2seq.py [-h] -g GTF -f FASTA -t TRANSCRIPTID_LIST [--vcf VCF]
                  [--sample SAMPLE] [--exclude_intron] [-o OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  -g GTF, --gtf GTF     Genome annotation file in GTF format
  -f FASTA, --fasta FASTA
                        Genome sequences in FASTA format
  -t TRANSCRIPTID_LIST, --transcriptid_list TRANSCRIPTID_LIST
                        List of transcript IDs
  --vcf VCF             VCF files with snpEff annotation.
  --sample SAMPLE       Sample/individual ID if you want to output genotypes in the sequence for a                                               specific sample/individual.
  --exclude_intron      Exclude intron sequences in the output
  -o OUTPUT, --output OUTPUT
                        Output file in Word format

Example using test data

gtf2seq.py -g test/test.gtf -f test/Zea_mays.B73_RefGen_v4.part.fa -t test/target_transcript_id.txt --vcf test/Mo17_snp.snpEff.part.vcf.gz --sample Zea_mays_Mo17 -o test.docx

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gtf2seq-0.1.0.tar.gz (12.2 kB view hashes)

Uploaded Source

Built Distribution

gtf2seq-0.1.0-py3-none-any.whl (20.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page