Utility package to parse multi fasta files resulting from de novo assembly
Contig Tools
Installation
pip3 install contig-tools
source code: https://gitlab.com/antunderwood/contig_tools
Usage
usage: contig-tools [-h] [-v] {filter,metrics,check_metrics,co_located} ...

A package to manipulate and assess contigs arising from de novo assemblies

positional arguments:
  {filter,metrics,check_metrics,co_located}
                        The following commands are available. Type
                        contig_tools <COMMAND> -h for more help on a specific
                        command
    filter              Filter contigs based on either length and/or coverage
    metrics             Print contig metrics
    check_metrics       check contig metrics
    co_located          check to see if two or more loci are found on the same
                        contig.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         display the version number
Examples
filter contigs
contig-tools filter -l 500 -c 3 -f contigs.fasta
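To make the idea of length and coverage filtering concrete, here is a minimal sketch in Python. It is not the package's implementation. It assumes that `-l` and `-c` are inclusive minimum thresholds, that coverage is read from SPAdes-style headers such as `NODE_1_length_1200_cov_15.7`, and that contigs without a coverage field are judged on length alone. The names `parse_fasta`, `header_coverage` and `filter_contigs` are hypothetical.

```python
import re
from typing import Iterator, List, Optional, Tuple

# Assumption: coverage is embedded in the header as "_cov_<number>" (SPAdes style).
COV_PATTERN = re.compile(r"_cov_([0-9]+(?:\.[0-9]+)?)")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from multi-FASTA text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def header_coverage(header: str) -> Optional[float]:
    """Return the coverage parsed from the header, or None if absent."""
    match = COV_PATTERN.search(header)
    return float(match.group(1)) if match else None


def filter_contigs(
    text: str, min_length: int = 500, min_coverage: float = 3.0
) -> List[Tuple[str, str]]:
    """Keep contigs meeting both thresholds (assumed inclusive)."""
    kept = []
    for header, seq in parse_fasta(text):
        if len(seq) < min_length:
            continue
        coverage = header_coverage(header)
        # Contigs with no coverage field are kept on length alone (assumption).
        if coverage is not None and coverage < min_coverage:
            continue
        kept.append((header, seq))
    return kept
```

Calling `filter_contigs(open("contigs.fasta").read(), 500, 3.0)` would then mirror the thresholds used in the command above, under the stated assumptions.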
print contig metrics
contig-tools metrics -f contig_tools/tests/test_data/contigs_for_checks.fas
contig-tools metrics -f contig_tools/tests/test_data/contigs_for_checks.fas -o json
check whether contigs meet conditions encoded in a YAML file
example YAML file
N50 score:
  condition_type: gt
  condition_value: 10
Largest contig:
  condition_type: gt
  condition_value: 15
Total length:
  condition_type: lt_gt
  condition_value:
    - 100
    - 50
example command
contig-tools check_metrics -f contigs.fasta -y conditions.yml
metrics that can be checked are
- Number of contigs
- Number of contigs > 500bp
- Total length
- %GC
- Largest contig
- N50 score
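As an illustration of what the metrics listed above mean, the following Python sketch computes them from a list of contig sequences. It is not taken from the package: the function names `n50` and `contig_metrics` are hypothetical, and N50 is computed with the common definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total length), which may differ in detail from what contig-tools reports.

```python
from typing import Dict, List, Union


def n50(lengths: List[int]) -> int:
    """Length at which the longest contigs first cover half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def contig_metrics(sequences: List[str]) -> Dict[str, Union[int, float]]:
    """Compute the six metrics named in the list above."""
    lengths = [len(seq) for seq in sequences]
    total = sum(lengths)
    gc_bases = sum(
        seq.upper().count("G") + seq.upper().count("C") for seq in sequences
    )
    return {
        "Number of contigs": len(lengths),
        "Number of contigs > 500bp": sum(1 for n in lengths if n > 500),
        "Total length": total,
        "%GC": round(100 * gc_bases / total, 2) if total else 0.0,
        "Largest contig": max(lengths, default=0),
        "N50 score": n50(lengths),
    }
```

The dictionary keys deliberately match the metric names above, so the result can be compared against conditions keyed by the same names.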
conditions that can be used are
- gt => greater than
- lt => less than
- lt_gt => less than and greater than
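The three condition types can be sketched in Python as below. This is an illustration rather than the package's code. It assumes that for `lt_gt` the first value is the upper bound and the second the lower bound (as the example file with 100 then 50 suggests) and that all comparisons are strict. The conditions are written as a plain dictionary mirroring the example YAML file so that the sketch needs no YAML library; `condition_met` and `check_metrics` are hypothetical names.

```python
from typing import Any, Dict


def condition_met(value: float, condition_type: str, condition_value: Any) -> bool:
    """Evaluate one condition against a metric value."""
    if condition_type == "gt":
        return value > condition_value
    if condition_type == "lt":
        return value < condition_value
    if condition_type == "lt_gt":
        # Assumed order: [upper bound, lower bound].
        upper, lower = condition_value
        return lower < value < upper
    raise ValueError(f"unknown condition_type: {condition_type}")


def check_metrics(
    metrics: Dict[str, float], conditions: Dict[str, Dict[str, Any]]
) -> Dict[str, bool]:
    """Return pass/fail for every metric named in the conditions."""
    return {
        name: condition_met(
            metrics[name], rule["condition_type"], rule["condition_value"]
        )
        for name, rule in conditions.items()
    }


# Same content as the example YAML file, as a Python dictionary.
EXAMPLE_CONDITIONS = {
    "N50 score": {"condition_type": "gt", "condition_value": 10},
    "Largest contig": {"condition_type": "gt", "condition_value": 15},
    "Total length": {"condition_type": "lt_gt", "condition_value": [100, 50]},
}
```

With these example conditions, an assembly passes only if its N50 exceeds 10, its largest contig exceeds 15, and its total length lies strictly between 50 and 100.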
check if two or more loci are co-located
Make a FASTA query file containing the two or more loci whose co-location you want to check, e.g.
>gene1
GCAGCTAGCGACTGCGAC.....
>gene2
CTACGTAGGACACGACTA....
There are two options:
- Search a single genome file for the co-location of loci
contig-tools co_located -q queries.fas -f /path/to/single/genome/contigs.fas
or
- Search a list of genomes for the co-location of loci. Make a text file with paths to genomes e.g.
/path/to/single/genome1.fas /path/to/single/genome1.fas ....
and then run the command
contig-tools co_located -q queries.fas -l /path/to/single/genome_list_file.txt
If you have multiple cores on the computer you are running this on, you can process the search in parallel using the -n <NUMBER PARALLEL PROCESSES> option. If you only want to write out genomes where the queries are co-located, use the -y option.
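The notion of co-location can be illustrated with a short Python sketch. This is a deliberate simplification and not how the package works internally: it looks for exact matches of each query on either strand, whereas a real search would normally tolerate mismatches by using a sequence alignment tool. The function names are hypothetical.

```python
from typing import Dict, Set

# Complement table for unambiguous bases plus N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def contigs_containing(locus: str, contigs: Dict[str, str]) -> Set[str]:
    """Names of contigs holding an exact copy of the locus on either strand."""
    forward = locus.upper()
    reverse = reverse_complement(locus)
    return {
        name
        for name, seq in contigs.items()
        if forward in seq.upper() or reverse in seq.upper()
    }


def co_located_contigs(queries: Dict[str, str], contigs: Dict[str, str]) -> Set[str]:
    """Names of contigs that contain every query locus."""
    hits = [contigs_containing(seq, contigs) for seq in queries.values()]
    return set.intersection(*hits) if hits else set()
```

A genome would count as having co-located loci when `co_located_contigs` returns a non-empty set for its contigs; when checking a list of genomes, each genome is an independent call, which is why the search lends itself to parallel processing.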
Code
Code can be found at https://gitlab.com/antunderwood/contig_tools