
Extract genome feature sequences for biologists

Project description

Overview

featurExtract is a Python package for extracting genome features in bioinformatics.
The package provides two command-line programs. The first, featurExtract, includes ten subroutines: create, gene, promoter, UTR, uORF, CDS, dORF, exon, intron and intergenic. The create subroutine builds the annotation database that the other subroutines query. The promoter subroutine extracts promoter sequences, the uORF subroutine extracts upstream open reading frame sequences, the UTR subroutine extracts untranslated region sequences, the CDS subroutine extracts coding sequences, and the intergenic subroutine extracts the sequences between adjacent genes. The second program, genBankExtract, includes four subroutines: gene, CDS, rRNA and tRNA.
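To make two of the subroutine descriptions concrete, here is a minimal pure-Python sketch of what intergenic and uORF extraction compute. It is not featurExtract's implementation: the function names are hypothetical, gene intervals are assumed to be 1-based inclusive as in GFF, and a uORF is assumed here to be an ATG-initiated frame that reaches a stop codon entirely within the 5' UTR. featurExtract's own definition may differ, for example in how it treats uORFs that overlap the CDS.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration only; not part of featurExtract.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def intergenic_regions(genes: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the gaps between genes on one chromosome.

    Genes are (start, end) pairs in 1-based inclusive coordinates, as in GFF.
    Overlapping genes are merged first so no gap is reported inside a gene.
    """
    merged: List[List[int]] = []
    for start, end in sorted(genes):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])
    gaps = []
    for (_, prev_end), (next_start, _) in zip(merged, merged[1:]):
        if next_start - prev_end > 1:  # directly adjacent genes leave no gap
            gaps.append((prev_end + 1, next_start - 1))
    return gaps


def find_uorfs(utr5: str) -> List[Tuple[int, int, str]]:
    """Find ATG-initiated ORFs that end with a stop codon inside a 5' UTR.

    `utr5` is in mRNA orientation. Returns (start, end, sequence) tuples with
    1-based inclusive coordinates relative to the UTR. ATGs whose frame runs
    off the end of the UTR without a stop are skipped.
    """
    seq = utr5.upper()
    orfs = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        # Walk codon by codon in the frame of this ATG until a stop is found.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                orfs.append((i + 1, j + 3, seq[i:j + 3]))
                break
    return orfs


print(intergenic_regions([(1, 100), (150, 300), (250, 400), (401, 500), (600, 700)]))
# -> [(101, 149), (501, 599)]
print(find_uorfs("CCATGAAATAGCC"))
# -> [(3, 11, 'ATGAAATAG')]
```

Merging overlapping genes before taking gaps is the design choice that keeps a gene nested inside a longer neighbour from producing a spurious "intergenic" region.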

[Figure: Brief introduction of featurExtract package]

Install

featurExtract can be installed in two ways: from PyPI with pip, or from the GitHub source.

Installation commands

pip install featurExtract
# or install from source
git clone https://github.com/SitaoZ/featurExtract.git
cd featurExtract
python setup.py install

Requirements

python >= 3.7.6
pandas >= 1.2.4
gffutils >= 0.10.1
setuptools >= 49.2.0
biopython >= 1.78

Usage

featurExtract is designed for GFF and GTF files, and genBankExtract is designed for GenBank files.
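GFF3 and GTF share the same nine tab-separated columns and differ mainly in how column 9 (attributes) is written: GFF3 uses `key=value` pairs separated by `;`, while GTF uses `key "value";`. The sketch below, a hypothetical helper rather than anything exported by featurExtract, illustrates that difference. It is deliberately simplified; real files have further edge cases (URL-escaped characters in GFF3, repeated keys, comma-separated multi-values, `;` inside quoted GTF values) that gffutils, a featurExtract requirement, handles when it builds its database.

```python
from typing import Dict


def parse_attributes(column9: str, fmt: str) -> Dict[str, str]:
    """Parse the 9th column of a GFF3 ("GFF") or GTF ("GTF") line into a dict.

    Simplified sketch: does not unescape %XX codes or keep repeated keys.
    """
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue  # trailing ';' leaves an empty field
        if fmt == "GFF":
            key, _, value = field.partition("=")  # GFF3: key=value
        elif fmt == "GTF":
            key, _, value = field.partition(" ")  # GTF: key "value"
            value = value.strip().strip('"')
        else:
            raise ValueError(f"unknown format: {fmt}")
        attrs[key] = value
    return attrs


# Illustrative attribute strings (not copied from a real annotation file).
print(parse_attributes("ID=AT1G01010.1;Parent=AT1G01010", "GFF"))
# -> {'ID': 'AT1G01010.1', 'Parent': 'AT1G01010'}
print(parse_attributes('gene_id "AT1G01010"; transcript_id "AT1G01010.1";', "GTF"))
# -> {'gene_id': 'AT1G01010', 'transcript_id': 'AT1G01010.1'}
```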

featurExtract

# gff or gtf database 
which featurExtract
featurExtract -h 
featurExtract create -h 
featurExtract promoter -h 
featurExtract UTR -h 
featurExtract uORF -h 
featurExtract CDS -h 
featurExtract dORF -h
featurExtract exon -h
featurExtract intron -h
featurExtract intergenic -h

genBankExtract

# GenBank database
which genBankExtract
genBankExtract -h
genBankExtract gene -h
genBankExtract CDS  -h
genBankExtract rRNA -h
genBankExtract tRNA -h

Examples

featurExtract

# step 1: create the database from the GFF3 annotation
featurExtract create -f GFF -g ath.gff3 -o ath
# step 2: extract features using the database
# promoters for the whole genome
featurExtract promoter -d ath.GFF -f ath.fa -l 200 -u 100 -o promoter.csv --output_format fasta
# promoter of one gene, printed to stdout
featurExtract promoter -d ath.GFF -f ath.fa -l 200 -u 100 -g AT1G01010 -p --output_format fasta
featurExtract UTR -d ath.GFF -f ath.fa -o UTR.csv -s GFF
featurExtract uORF -d ath.GFF -f ath.fa -o uORF.csv -s GFF
featurExtract CDS -d ath.GFF -f ath.fa -o CDS.csv -s GFF
featurExtract mRNA -d ath.GFF -f ath.fa -o mRNA.fasta -s GFF --output_format fasta
featurExtract exon -d ath.GFF -f ath.fa -t AT1G01010.1 -p -s GFF
featurExtract intron -d ath.GFF -f ath.fa -t AT1G01010.1 -p -s GFF
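The promoter and intron examples above boil down to coordinate arithmetic on the annotation. The sketch below illustrates that arithmetic under stated assumptions and is not featurExtract's code: it assumes `-l` is the number of bases taken upstream of the transcription start site and `-u` the number taken downstream of it (check `featurExtract promoter -h` for the actual meaning), uses 1-based inclusive GFF coordinates, and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical helpers; coordinates are 1-based inclusive, as in GFF.


def promoter_window(start: int, end: int, strand: str,
                    upstream: int, downstream: int) -> Tuple[int, int]:
    """Genomic interval spanning `upstream` bases before and `downstream`
    bases from the transcription start site (TSS) of a transcript."""
    if strand == "+":
        tss = start
        left, right = tss - upstream, tss + downstream - 1
    elif strand == "-":
        tss = end  # on the minus strand the transcript starts at its right end
        left, right = tss - downstream + 1, tss + upstream
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # Clamp at the chromosome start; the right end is not checked
    # against the chromosome length in this sketch.
    return max(1, left), right


def introns_from_exons(exons: List[Tuple[int, int]], strand: str) -> List[Tuple[int, int]]:
    """Introns are the gaps between consecutive exons of one transcript,
    returned in 5'->3' transcript order."""
    ordered = sorted(exons)
    introns = [(prev_end + 1, next_start - 1)
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
               if next_start - prev_end > 1]
    return introns[::-1] if strand == "-" else introns


print(promoter_window(1000, 2000, "+", upstream=200, downstream=100))  # -> (800, 1099)
print(promoter_window(1000, 5000, "-", upstream=200, downstream=100))  # -> (4901, 5200)
print(introns_from_exons([(100, 200), (300, 400), (500, 600)], "-"))  # -> [(401, 499), (201, 299)]
```

On the minus strand the window is mirrored around the transcript's right end, and the extracted sequence would then be reverse-complemented so it reads 5' to 3'.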

genBankExtract

# GenBank examples (the .gb file is passed directly with -g)
genBankExtract gene -g NC_000932.gb -f dna -p  
genBankExtract CDS  -g NC_000932.gb -f dna -p 
genBankExtract rRNA -g NC_000932.gb -f dna -p
genBankExtract tRNA -g NC_000932.gb -f dna -p
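In a GenBank file, each feature's location is written as a string such as `120..450`, `complement(120..450)` or `join(1..50,80..120)` in 1-based inclusive coordinates. Extracting a feature's DNA means slicing those ranges out of the record sequence and reverse-complementing when the location is on the complement strand. The pure-Python sketch below illustrates that idea for these simple forms only; it is not how genBankExtract is implemented, `extract_location` is a hypothetical name, and in practice Biopython (already a featurExtract requirement) does this with `SeqFeature.extract`.

```python
import re

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (IUPAC ambiguity codes not handled)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_location(record_seq: str, location: str) -> str:
    """Extract a feature's sequence from a simple GenBank location string.

    Supports 'a..b' and 'join(a..b,c..d,...)', optionally wrapped in
    'complement(...)'. Partial-end markers '<' and '>' are ignored.
    Anything else (e.g. join(complement(...)), single bases) raises ValueError.
    """
    loc = location.replace("<", "").replace(">", "").replace(" ", "")
    on_minus = loc.startswith("complement(") and loc.endswith(")")
    if on_minus:
        loc = loc[len("complement("):-1]
    if loc.startswith("join(") and loc.endswith(")"):
        loc = loc[len("join("):-1]
    pieces = []
    for part in loc.split(","):
        match = re.fullmatch(r"(\d+)\.\.(\d+)", part)
        if match is None:
            raise ValueError(f"unsupported location part: {part!r}")
        start, end = int(match.group(1)), int(match.group(2))
        pieces.append(record_seq[start - 1:end])  # 1-based inclusive -> Python slice
    seq = "".join(pieces)
    return reverse_complement(seq) if on_minus else seq


print(extract_location("AAACCCGGGTTT", "join(1..3,10..12)"))  # -> AAATTT
print(extract_location("AAACCCGGGTTT", "complement(join(1..4,7..8))"))  # -> CCGTTT
```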


Download files


Source Distribution

featurExtract-0.2.5.3.tar.gz (28.0 kB)


File details

Details for the file featurExtract-0.2.5.3.tar.gz.

File metadata

  • Download URL: featurExtract-0.2.5.3.tar.gz
  • Size: 28.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.5.0.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.64.0 CPython/3.7.6

File hashes

Hashes for featurExtract-0.2.5.3.tar.gz
Algorithm Hash digest
SHA256 778e535d3fb8d751ed7383f6842f5ff9a8d580c0c7586e4d02017418c65dae82
MD5 0b0d4df0e378f4b9f937f90e54e51146
BLAKE2b-256 b20726c6c57c7c25e07239790bdcf5e2fdebea31ed965049cbd3695a3bb75def

