ERVin is a collection of tools developed to assist in discovering ERV sequences within genomic data

Project description

ERVin

ERVin is a tool for detecting endogenous retroviruses (ERVs) in genome segments.

It has been designed primarily for use on OSX. It may work on other UNIX-based systems, but it almost certainly will not run on Microsoft Windows.

Installation

pip install ervin

Requirements

  • Python 3.6+
  • A local installation of the NCBI BLAST suite
  • A local genome database to be queried
    • This can be located in a directory of your choosing, but that location must be named in a config.json file
      • If you do not provide your own config.json, one is created at first run from the bundled config.json.templ file, using the defaults it contains
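
The first-run behaviour described above can be pictured with a minimal sketch. This is not ERVin's own code: the function names `load_config` and `blast_is_installed` are hypothetical, and the keys stored in config.json are not documented here, so the sketch simply returns whatever JSON the file contains.

```python
import json
import shutil
from pathlib import Path


def load_config(config_dir="."):
    """Load config.json, creating it from config.json.templ if it is missing.

    Hypothetical helper: ERVin's real loader may be structured differently.
    """
    config_dir = Path(config_dir)
    config_path = config_dir / "config.json"
    template_path = config_dir / "config.json.templ"

    if not config_path.exists():
        # First run without a user-supplied config: copy the template defaults.
        shutil.copyfile(template_path, config_path)

    with config_path.open() as handle:
        return json.load(handle)


def blast_is_installed():
    """Return True if a tblastn executable can be found on PATH."""
    return shutil.which("tblastn") is not None
```

Calling `blast_is_installed()` before running a search is a cheap way to confirm that the BLAST requirement is met.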

Current functionality

ERVin currently:

  • When provided with a .fasta file of probe sequences
    • Runs a local tblastn search against the specified genome database, filtering the results by alignment length and e-value (optional arguments that default to >400 and <0.009 respectively when omitted)
    • Parses and merges filtered results where appropriate
    • Runs the resulting fasta records against a local Viruses refseq database using tblastn (a copy of the database is downloaded if the user does not provide one, and is kept up to date), then groups the records into a final set of output files based on their top hit
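
The filtering and merging steps above could look roughly like the following sketch. It assumes the tblastn results are available in BLAST's standard tabular layout (`-outfmt 6`); whether ERVin uses that format internally is not stated here. The names `Hit`, `parse_hits`, `filter_hits` and `merge_overlapping` are hypothetical, and the merge rule (join hits whose coordinates overlap on the same subject sequence, ignoring strand) is only one plausible reading of "where appropriate".

```python
from collections import namedtuple

# Hypothetical record type; not part of ERVin's API.
Hit = namedtuple("Hit", "query subject start end length evalue")


def parse_hits(tabular_text):
    """Parse BLAST tabular output (-outfmt 6) into Hit records."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Standard outfmt 6 columns: qseqid sseqid pident length mismatch
        # gapopen qstart qend sstart send evalue bitscore
        sstart, send = int(fields[8]), int(fields[9])
        hits.append(Hit(
            query=fields[0],
            subject=fields[1],
            start=min(sstart, send),  # reverse-strand hits have sstart > send
            end=max(sstart, send),
            length=int(fields[3]),
            evalue=float(fields[10]),
        ))
    return hits


def filter_hits(hits, min_length=400, max_evalue=0.009):
    """Keep hits longer than min_length with an e-value below max_evalue."""
    return [h for h in hits if h.length > min_length and h.evalue < max_evalue]


def merge_overlapping(hits):
    """Merge hits that overlap on the same subject into (subject, start, end)."""
    merged = []
    for hit in sorted(hits, key=lambda h: (h.subject, h.start)):
        if merged and merged[-1][0] == hit.subject and hit.start <= merged[-1][2]:
            subject, start, end = merged[-1]
            merged[-1] = (subject, start, max(end, hit.end))
        else:
            merged.append((hit.subject, hit.start, hit.end))
    return merged
```

The default thresholds in `filter_hits` mirror the documented defaults of >400 and <0.009.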

Usage

Arguments

| Argument | Verbose | Description | Type | Required | Default |
|---|---|---|---|---|---|
| -f | --file | Source fasta file containing the sample probe records to run through tblastn | Filepath | True | |
| -gdb | --genome_database | Name of the genome database against which the probe records are to be BLASTed (located in the genome db store specified in the config file) | str | True | |
| -o | --output_dir | Location to which to write the result files | str | False | <current_working_directory>/OUTPUT |
| -a | --alignment_len_threshold | Minimum length that BLAST result alignment sequences must exceed | int | False | 400 |
| -e | --e_value | Maximum e-value; BLAST results must have an e-value below this threshold | float | False | 0.009 |
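
To make the types and defaults listed above concrete, here is a sketch of an `argparse` parser that accepts the same flags. It is an illustration built from the argument descriptions, not ERVin's actual parser; `build_parser` is a hypothetical name.

```python
import argparse
import os


def build_parser():
    """Build a parser mirroring the documented ERVin arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ervin")
    parser.add_argument("-f", "--file", required=True,
                        help="source fasta file of probe records")
    parser.add_argument("-gdb", "--genome_database", required=True,
                        help="name of the genome database to BLAST against")
    parser.add_argument("-o", "--output_dir",
                        default=os.path.join(os.getcwd(), "OUTPUT"),
                        help="directory for the result files")
    parser.add_argument("-a", "--alignment_len_threshold", type=int, default=400,
                        help="alignment lengths must exceed this value")
    parser.add_argument("-e", "--e_value", type=float, default=0.009,
                        help="e-values must fall below this value")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Omitting `-a` and `-e` leaves the thresholds at 400 and 0.009, matching the documented defaults.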

Examples

ervin -f data/fasta_file.fasta -gdb genome_db

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 500

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -e 0.0008

ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 800 -e 0.01

Download files

Source Distribution

ervin-0.0.6.tar.gz (15.6 kB)

Uploaded Source

File details

Details for the file ervin-0.0.6.tar.gz.

File metadata

  • Download URL: ervin-0.0.6.tar.gz
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.3

File hashes

Hashes for ervin-0.0.6.tar.gz
Algorithm Hash digest
SHA256 384dc81273dcb31ca19bf18886cc155a2ceec6a0c63722ee6fada7544fdd3ef3
MD5 e2af4a928c07b5c07986979f27cd7179
BLAKE2b-256 5781a7c533f1845ffc1c74da4d11a075fa7b5e2dd889f1048605f64d81cc7335
