ERVin is a collection of tools for discovering endogenous retrovirus (ERV) sequences within genomic data.

Project description

ERVin

ERVin detects ERVs in genome segments.

ERVin was designed primarily for use on OS X. It may also work on other UNIX-based systems, but it almost certainly will not run on Microsoft Windows.

Installation

pip install ervin

Requirements

  • Python 3.6+
  • A local installation of the NCBI BLAST suite
  • A local genome database to query
    • The database can live in a directory of your choosing, but that location must be specified in a config.json file
      • If you do not provide your own config.json, one is created at first run from the bundled config.json.templ file, using the defaults it contains
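The first-run behaviour described above (create config.json from config.json.templ only when no config.json exists) can be sketched as follows. This is a minimal illustration, not ERVin's code: the assumption that both files sit in the same directory and the name `ensure_config` are hypothetical, and the keys inside the config are whatever the template defines.

```python
import json
import shutil
from pathlib import Path


def ensure_config(config_dir: Path) -> dict:
    """Hypothetical helper: load config.json, creating it from the template if absent.

    Assumes config.json and config.json.templ live side by side in config_dir;
    ERVin's real file layout may differ.
    """
    config_path = config_dir / "config.json"
    template_path = config_dir / "config.json.templ"
    if not config_path.exists():
        # First run with no user-supplied config: copy the template's defaults.
        shutil.copyfile(template_path, config_path)
    # A user-provided config.json is never overwritten; it is simply loaded.
    with config_path.open() as fh:
        return json.load(fh)
```

Copying the template verbatim, rather than merging it with user values, keeps the sketch simple: once config.json exists, it is the single source of truth.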

Current functionality

Currently, ERVin:

  • When provided with a .fasta file of probe sequences:
    • Runs a local tblastn search against the specified genome database and filters the results by alignment length and e-value (both thresholds are optional arguments; when omitted, hits must have an alignment length > 400 and an e-value < 0.009)
    • Parses the filtered results and merges them where appropriate
    • Runs the resulting FASTA records against a local Viruses RefSeq database using tblastn (a copy of the database is downloaded if you do not provide one, and is kept up to date), then groups the records into a final set of output files according to their top hit
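The filtering and top-hit grouping steps above can be sketched in plain Python. This is an illustration under stated assumptions, not ERVin's implementation: it assumes BLAST tabular output (`-outfmt 6`, whose standard twelve columns are listed below), treats the "top hit" as the hit with the highest bit score, and all function names (`tblastn_command`, `parse_outfmt6`, `filter_hits`, `group_by_top_hit`) are hypothetical. The merging step is omitted because the README does not say how records are merged.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

# Standard column order of BLAST tabular output (-outfmt 6).
OUTFMT6_FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch",
                  "gapopen", "qstart", "qend", "sstart", "send",
                  "evalue", "bitscore")


class Hit(NamedTuple):
    query: str
    subject: str
    length: int
    evalue: float
    bitscore: float


def tblastn_command(query_fasta: str, db_name: str, out_path: str) -> List[str]:
    # Hypothetical command line; requesting tabular output is an assumption.
    return ["tblastn", "-query", query_fasta, "-db", db_name,
            "-outfmt", "6", "-out", out_path]


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        row = dict(zip(OUTFMT6_FIELDS, line.rstrip("\n").split("\t")))
        hits.append(Hit(row["qseqid"], row["sseqid"], int(row["length"]),
                        float(row["evalue"]), float(row["bitscore"])))
    return hits


def filter_hits(hits: List[Hit], min_length: int = 400,
                max_evalue: float = 0.009) -> List[Hit]:
    # Keep hits whose alignment length exceeds min_length AND whose
    # e-value falls below max_evalue (the README's documented defaults).
    return [h for h in hits if h.length > min_length and h.evalue < max_evalue]


def group_by_top_hit(hits: List[Hit]) -> Dict[str, List[str]]:
    # Assumption: a query's top hit is its hit with the highest bit score.
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    groups: Dict[str, List[str]] = defaultdict(list)
    for query, h in best.items():
        groups[h.subject].append(query)  # one output group per top-hit subject
    return dict(groups)
```

In a real run the command list would be passed to `subprocess.run(cmd, check=True)` and the resulting file read back with `parse_outfmt6`.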

Usage

Arguments

| Argument | Verbose | Description | Type | Required | Default |
|---|---|---|---|---|---|
| -f | --file | Source FASTA file containing the probe records to run through tblastn | Filepath | True | |
| -gdb | --genome_database | Name of the genome database to BLAST the probe records against (located in the genome db store specified in the config file) | str | True | |
| -o | --output_dir | Directory to which the result files are written | str | False | <current_working_directory>/OUTPUT |
| -a | --alignment_len_threshold | Minimum alignment length; BLAST hit alignment lengths must exceed it | int | False | 400 |
| -e | --e_value | Maximum e-value; BLAST hit e-values must fall below it | float | False | 0.009 |
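The argument table maps directly onto Python's standard `argparse` module. The parser below is a hypothetical reconstruction from the table, not ERVin's source: flag names, types, and defaults come from the table, while the function name `build_parser` and the help strings are illustrative.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI; not ERVin's own code.
    parser = argparse.ArgumentParser(prog="ervin")
    parser.add_argument("-f", "--file", required=True,
                        help="FASTA file of probe records to run through tblastn")
    parser.add_argument("-gdb", "--genome_database", required=True,
                        help="genome database name in the configured genome db store")
    parser.add_argument("-o", "--output_dir",
                        default=os.path.join(os.getcwd(), "OUTPUT"),
                        help="directory for the result files")
    parser.add_argument("-a", "--alignment_len_threshold", type=int, default=400,
                        help="minimum alignment length a hit must exceed")
    parser.add_argument("-e", "--e_value", type=float, default=0.009,
                        help="maximum e-value; hits must fall below it")
    return parser
```

`argparse` accepts single-dash multi-character flags such as `-gdb`, so the short forms in the table need no special handling.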

Examples

ervin -f data/fasta_file.fasta -gdb genome_db

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 500

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -e 0.0008

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 800 -e 0.01

Download files


Files for ervin, version 0.0.6
| Filename | Size | File type | Python version |
|---|---|---|---|
| ervin-0.0.6.tar.gz | 15.6 kB | Source | None |
