ERVin is a collection of tools developed to assist in discovering ERV sequences within genomic data
ERVin
ERVin is a tool for detecting endogenous retroviruses (ERVs) in genome segments.
It has been designed primarily for use on OSX. It may work on other UNIX-based systems, but it almost certainly will not run on Microsoft Windows.
Installation
pip install ervin
Requirements
- Python 3.6+
- NCBI BLAST suite, installed locally
- Local genome db to be queried
  - This can be located in a directory of your choosing, but that directory must be named in a `config.json` file
  - There is a `config.json.templ` file; if you do not provide your own `config.json`, one is created from this template, with its contained defaults, at first run
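The first two requirements can be checked before running anything. The following is a minimal sketch, not part of ERVin itself: the function name `check_prerequisites` is hypothetical, and it assumes that having the BLAST suite's `tblastn` executable on your `PATH` is a reasonable proxy for a local BLAST installation.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 6), blast_tool="tblastn"):
    """Return a list of human-readable problems; an empty list means none were found.

    Hypothetical helper: ERVin does not necessarily perform these checks itself.
    """
    problems = []

    # The README asks for Python 3.6 or newer.
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append("Python {}+ is required".format(wanted))

    # Assumption: the BLAST suite is usable if its tblastn binary is on PATH.
    if shutil.which(blast_tool) is None:
        problems.append("'{}' was not found on PATH".format(blast_tool))

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

An empty result only says that the interpreter and the binary were found; it does not confirm that a genome database or `config.json` is in place.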
Current functionality
ERVin currently:
- When provided with a `.fasta` file of probe sequences:
  - Runs local `tblastn` against the specified genome database, filtering the results based on alignment length and e-value (optional arguments, which default to >400 and <0.009 respectively when omitted)
  - Parses and merges filtered results where appropriate
  - Runs the resultant fasta records against a local Viruses refseq database (a copy will be downloaded if not user provided, and will be kept up-to-date) using `tblastn`, grouping the records in a final set of output files based on their top hit
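The filtering step can be illustrated with a short sketch. This is not ERVin's own code: it assumes the BLAST hits are available as BLAST+ tabular output (`-outfmt 6` with its default twelve columns), and the function name `filter_hits` is hypothetical. The thresholds mirror the defaults stated above (alignment length >400, e-value <0.009).

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, min_alignment_len=400, max_e_value=0.009):
    """Keep hits whose alignment length exceeds min_alignment_len
    and whose e-value is below max_e_value.

    Hypothetical helper; assumes tab-separated -outfmt 6 input.
    """
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=OUTFMT6_FIELDS, delimiter="\t"
    )
    kept = []
    for row in reader:
        long_enough = int(row["length"]) > min_alignment_len
        significant = float(row["evalue"]) < max_e_value
        if long_enough and significant:
            kept.append(row)
    return kept
```

Both comparisons are strict, matching the ">400" and "<0.009" wording; whether ERVin treats the boundary values the same way is not stated in this document.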
Usage
Arguments
| Argument | Verbose | Description | Type | Required | Default |
|---|---|---|---|---|---|
| `-f` | `--file` | Source fasta file containing the sample probe records to run through tblastn | Filepath | True | |
| `-gdb` | `--genome_database` | Name of the genome database against which the probe records are to be BLASTed (located in the genome db store specified in the config file) | str | True | |
| `-o` | `--output_dir` | Location to which to write the result files | str | False | `<current_working_directory>/OUTPUT` |
| `-a` | `--alignment_len_threshold` | Minimum length threshold that BLAST result alignment sequence lengths should exceed | int | False | 400 |
| `-e` | `--e_value` | Maximum e-value threshold that BLAST result e-values must stay below | float | False | 0.009 |
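To make the table concrete, here is a sketch of an `argparse` parser that accepts the same flags, types, and defaults. It is an illustration built from the table only and may differ from how ERVin actually parses its arguments; `build_parser` is a hypothetical name.

```python
import argparse
import os


def build_parser():
    """Build a parser mirroring the argument table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ervin")
    parser.add_argument(
        "-f", "--file", required=True,
        help="Source fasta file containing the probe records",
    )
    parser.add_argument(
        "-gdb", "--genome_database", required=True,
        help="Name of the genome database to BLAST against",
    )
    parser.add_argument(
        "-o", "--output_dir",
        default=os.path.join(os.getcwd(), "OUTPUT"),
        help="Location to which to write the result files",
    )
    parser.add_argument(
        "-a", "--alignment_len_threshold", type=int, default=400,
        help="Alignment lengths must exceed this value",
    )
    parser.add_argument(
        "-e", "--e_value", type=float, default=0.009,
        help="E-values must stay below this value",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Omitting `-a` and `-e` leaves the thresholds at 400 and 0.009, which is the behaviour the table describes for the real tool.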
Examples
ervin -f data/fasta_file.fasta -gdb genome_db
ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output
ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 500
ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -e 0.0008
ervin -f data/fasta_file.fasta -gdb genome_db -o results/probe_blaster_output -a 800 -e 0.01
Source Distribution
File details
Details for the file ervin-0.0.6.tar.gz.
File metadata
- Download URL: ervin-0.0.6.tar.gz
- Upload date:
- Size: 15.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 384dc81273dcb31ca19bf18886cc155a2ceec6a0c63722ee6fada7544fdd3ef3 |
| MD5 | e2af4a928c07b5c07986979f27cd7179 |
| BLAKE2b-256 | 5781a7c533f1845ffc1c74da4d11a075fa7b5e2dd889f1048605f64d81cc7335 |
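A downloaded copy of `ervin-0.0.6.tar.gz` can be checked against the SHA256 digest listed above. The sketch below uses only Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the local path you pass in is up to you.

```python
import hashlib

# SHA256 digest listed for ervin-0.0.6.tar.gz in the table above.
EXPECTED_SHA256 = "384dc81273dcb31ca19bf18886cc155a2ceec6a0c63722ee6fada7544fdd3ef3"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("ervin-0.0.6.tar.gz")` should return `True` for an intact download saved under that name in the current directory.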