ERVin is a collection of tools developed to assist in discovering ERV sequences within genomic data

Project description

ERVin

ERVin is a tool for detecting endogenous retroviruses (ERVs) in genome segments.

ERVin has been designed primarily for use on OSX. It may work on other UNIX-based systems, but it almost certainly will not run on Microsoft Windows.

Installation

pip install ervin

Requirements

  • Python 3.6+
  • NCBI BLAST suite must be installed locally
  • Local genome db to be queried
    • This can be located in a directory of your choosing, but must be named in a config.json file
      • If you do not provide your own, a config.json file is created at first run from the bundled config.json.templ file, using the defaults it contains
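
The requirements above can be checked with a short script. This is a minimal sketch, not ERVin's own start-up code: the function names `blast_available` and `ensure_config` are hypothetical, and it assumes that config.json and config.json.templ live in the same directory.

```python
import shutil
from pathlib import Path


def blast_available(executable: str = "tblastn") -> bool:
    """Return True if the given BLAST executable can be found on PATH."""
    return shutil.which(executable) is not None


def ensure_config(config_dir: Path) -> Path:
    """Create config.json from config.json.templ if it does not exist yet.

    Assumption: both files sit in config_dir. An existing config.json
    is never overwritten, so user-provided settings are preserved.
    """
    config = config_dir / "config.json"
    template = config_dir / "config.json.templ"
    if not config.exists():
        shutil.copyfile(template, config)
    return config
```

Calling `blast_available()` before a run gives an early, readable failure if the BLAST suite is missing, rather than an error partway through a search.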

Current functionality

ERVin currently:

  • When provided with a .fasta file of probe sequences
    • Runs local tblastn against the specified genome database, filtering the results by alignment length and e-value (both thresholds are optional arguments; when omitted, results are kept only if the alignment length is >400 and the e-value is <0.009)
    • Parses and merges filtered results where appropriate
    • Runs the resultant fasta records against a local Viruses refseq database (a copy is downloaded if the user does not provide one, and is kept up to date) using tblastn, grouping the records into a final set of output files based on their top hit
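
The filter and merge steps above can be sketched as follows. This is an illustration under stated assumptions, not ERVin's implementation: it assumes BLAST tabular output (`-outfmt 6`, whose standard columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), it treats "merging" as joining hits whose subject coordinate ranges overlap, and it ignores strand. The names `Hit`, `parse_hits`, `filter_hits` and `merge_overlapping` are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    query: str
    subject: str
    length: int    # alignment length (column 4 of outfmt 6)
    start: int     # subject start, normalised so that start <= end
    end: int
    evalue: float  # column 11 of outfmt 6


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST outfmt 6 rows, skipping blanks and comments."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        sstart, send = int(f[8]), int(f[9])
        hits.append(Hit(f[0], f[1], int(f[3]),
                        min(sstart, send), max(sstart, send), float(f[10])))
    return hits


def filter_hits(hits: Iterable[Hit], min_length: int = 400,
                max_evalue: float = 0.009) -> List[Hit]:
    """Keep hits with alignment length > min_length and e-value < max_evalue."""
    return [h for h in hits if h.length > min_length and h.evalue < max_evalue]


def merge_overlapping(hits: Iterable[Hit]) -> List[Tuple[str, int, int]]:
    """Merge hits on the same subject whose coordinate ranges overlap."""
    merged: List[Tuple[str, int, int]] = []
    for h in sorted(hits, key=lambda h: (h.subject, h.start)):
        if merged and merged[-1][0] == h.subject and h.start <= merged[-1][2]:
            subject, start, end = merged[-1]
            merged[-1] = (subject, start, max(end, h.end))
        else:
            merged.append((h.subject, h.start, h.end))
    return merged
```

The default thresholds mirror the documented defaults of 400 and 0.009. A real merge step would probably also need to take strand and reading frame into account.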

Usage

Arguments

| Argument | Verbose | Description | Type | Required | Default |
| --- | --- | --- | --- | --- | --- |
| -f | --file | Source fasta file containing the sample probe records to run through tblastn | Filepath | True | |
| -gdb | --genome_database | Name of the genome database against which the probe records are to be BLASTed (located in the genome db store specified in the config file) | str | True | |
| -o | --output_dir | Location to which to write the result files | str | False | <current_working_directory>/OUTPUT |
| -a | --alignment_len_threshold | Minimum length threshold that BLAST result alignment sequence lengths must exceed | int | False | 400 |
| -e | --e_value | Maximum e-value threshold that BLAST result e-values must fall below | float | False | 0.009 |
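
For readers who want to see how these arguments fit together, the interface documented above could be expressed with `argparse` roughly as follows. This is a sketch that mirrors the documented flags, types and defaults. It is not taken from ERVin's source, and `build_parser` is a hypothetical name.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented ERVin arguments."""
    parser = argparse.ArgumentParser(prog="ervin")
    parser.add_argument("-f", "--file", required=True,
                        help="source fasta file of probe records")
    parser.add_argument("-gdb", "--genome_database", required=True,
                        help="name of the genome database to BLAST against")
    parser.add_argument("-o", "--output_dir",
                        default=str(Path.cwd() / "OUTPUT"),
                        help="directory to which result files are written")
    parser.add_argument("-a", "--alignment_len_threshold", type=int,
                        default=400,
                        help="alignment lengths must exceed this value")
    parser.add_argument("-e", "--e_value", type=float, default=0.009,
                        help="e-values must fall below this value")
    return parser
```

With this sketch, `build_parser().parse_args(["-f", "data/fasta_file.fasta", "-gdb", "genome_db"])` yields the documented defaults for the three optional arguments.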

Examples

ervin -f data/fasta_file.fasta -gdb genome_db

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 500

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -e 0.0008

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 800 -e 0.01
