ERVin is a collection of tools developed to assist in discovering ERV sequences within genomic data
ERVin
ERVin is a tool for detecting endogenous retroviruses (ERVs) in genome segments.
It has been designed primarily for use on OSX. It may also work on other UNIX-based systems, but it almost certainly will not run on Microsoft Windows.
Installation
pip install ervin
Requirements
- Python 3.6+
- NCBI BLAST suite, installed locally
- Local genome db to be queried
  - This can be located in a directory of your choosing, but must be named in a `config.json` file
  - There is a `config.json.templ` file from which a `config.json` file with the contained defaults will be created at first run if you do not provide your own
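The first-run behaviour described above can be pictured roughly as follows. This is only a minimal sketch, not ERVin's actual code: the function name `ensure_config` and the idea that both files sit in the same directory are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def ensure_config(config_dir: Path) -> Path:
    """Return the path to config.json, creating it from the template if absent.

    Hypothetical helper: assumes config.json.templ lives in config_dir.
    """
    config_path = config_dir / "config.json"
    template_path = config_dir / "config.json.templ"
    if not config_path.exists():
        # Copy the template's defaults verbatim on first run.
        # A config.json that already exists is left untouched.
        shutil.copyfile(template_path, config_path)
    return config_path
```

The point of the check is that a user-provided `config.json` always takes precedence over the template defaults.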
Current functionality
ERVin currently:
- When provided with a `.fasta` file of probe sequences:
  - Runs local `tblastn` against the specified genome database, filtering the results based on alignment length and e-value (optional arguments which result in default values of >400 and <0.009 respectively when omitted)
  - Parses and merges filtered results where appropriate
  - Runs resultant fasta records against a local Viruses refseq database (a copy will be downloaded if not user provided, and will be kept up-to-date) using `tblastn`, grouping the records in a final set of output files based on their top hit
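To make the filtering step concrete, here is a small sketch of how hits could be kept or dropped using the default thresholds. It assumes the hits are in BLAST+ tabular format (`-outfmt 6`) with the default twelve columns. ERVin's internal result format is not documented here, so the function name `filter_hits` and the input format are assumptions.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, min_length=400, max_e_value=0.009):
    """Keep hits whose alignment length exceeds min_length and whose
    e-value is below max_e_value (hypothetical helper)."""
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=OUTFMT6_FIELDS, delimiter="\t"
    )
    kept = []
    for row in reader:
        # Both comparisons are strict, matching the ">400" and "<0.009" defaults.
        if int(row["length"]) > min_length and float(row["evalue"]) < max_e_value:
            kept.append(row)
    return kept
```

Raising `min_length` or lowering `max_e_value` makes the filter stricter, which is what the `-a` and `-e` options control.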
Usage
Arguments
Argument | Verbose | Description | Type | Required | Default |
---|---|---|---|---|---|
-f | --file | Source fasta file containing the sample probe records to run through tblastn | Filepath | True | |
-gdb | --genome_database | Name of the genome database against which the probe records are to be BLASTed (located in the genome db store specified in the config file) | str | True | |
-o | --output_dir | Location to which to write the result files | str | False | <current_working_directory>/OUTPUT |
-a | --alignment_len_threshold | Minimum length threshold that BLAST result alignment sequence lengths should exceed | int | False | 400 |
-e | --e_value | Maximum e-value threshold that BLAST result e-values must fall below | float | False | 0.009 |
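For readers who want to see how these options fit together, the table can be mirrored with Python's standard `argparse` module. This is an illustrative sketch only: the function name `build_parser` is hypothetical, and ERVin's real command-line handling may be implemented differently.

```python
import argparse
import os


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ervin")
    parser.add_argument("-f", "--file", required=True,
                        help="Source fasta file containing the probe records")
    parser.add_argument("-gdb", "--genome_database", required=True,
                        help="Name of the genome database to BLAST against")
    # Default output location: <current_working_directory>/OUTPUT
    parser.add_argument("-o", "--output_dir",
                        default=os.path.join(os.getcwd(), "OUTPUT"),
                        help="Location to which to write the result files")
    parser.add_argument("-a", "--alignment_len_threshold", type=int, default=400,
                        help="Alignment lengths must exceed this value")
    parser.add_argument("-e", "--e_value", type=float, default=0.009,
                        help="E-values must fall below this value")
    return parser
```

Only `-f` and `-gdb` are required. The remaining options fall back to the defaults listed in the table when omitted.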
Examples
ervin -f data/fasta_file.fasta -gdb genome_db
ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output
ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 500
ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -e 0.0008
ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 800 -e 0.01