ASV to bin joining tool
Project description
Comparing 16S (ASV) to bins
This tool provides a method for connecting 16S rRNA sequences to a set of bins. The code is in essence a wrapper around a Snakemake pipeline that uses Barrnap and MMseqs2, with the option of substituting BLAST for MMseqs2. This is not a perfect system and is still a work in progress.
System requirements
The tool can run on as few as 1 core, but it will utilize as many cores as specified by the -t argument. MMseqs2 is much faster than BLAST but requires more RAM, often in the range of 40 to 60 gigabytes depending on the size of the target data set. Using BLAST will decrease the memory requirements but will also require a longer run time.
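As a rough way to apply the guidance above before a run, the sketch below reports the core count and total memory of a Linux machine and suggests a search backend. The helper names suggest_backend and total_mem_gb are hypothetical and not part of join_asvbins, and the 40 GB cutoff is an assumption taken from the lower end of the range quoted above; the real requirement depends on the size of the target data set. The flag that switches to BLAST is not shown here; check -h for it.

```shell
#!/usr/bin/env sh
# Hypothetical helpers; they are not shipped with join_asvbins.

# Suggest a backend from a memory figure in whole gigabytes.
# Assumption: 40 GB is used as the cutoff (low end of the quoted 40 to 60 GB range).
suggest_backend() {
    if [ "$1" -ge 40 ]; then
        echo "mmseqs2"
    else
        echo "blast"
    fi
}

# Total physical memory in whole gigabytes (Linux only: reads /proc/meminfo).
total_mem_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Report what this machine offers, when the information is available.
if [ -r /proc/meminfo ]; then
    mem_gb=$(total_mem_gb)
    echo "Memory: ${mem_gb} GB, suggested backend: $(suggest_backend "$mem_gb")"
fi
if command -v nproc >/dev/null 2>&1; then
    echo "Cores available for -t: $(nproc)"
fi
```

The suggestion is only a starting point: a small target data set may run comfortably with MMseqs2 on less memory.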
Install
The installation should be no more complex than:
wget https://raw.githubusercontent.com/rmFlynn/16s_to_bins_project/main/environment.yaml
conda env create -f environment.yaml -n join_asvbins
conda activate join_asvbins
Usage example
join_asvbins \
-b path/to/bins/folder/or/file.fa \
-a /path/to/asv/file.fa \
-o /path/to/output \
-t 20 # Threads
The most important command line options are:
-b BINS, --bins BINS The bins that you would like to match ASVs to. This can be an fna file that has all the bins
combined or a directory of bins in separate fa files, but you must run the rename script
before you use this tool.
-a ASV_SEQS, --asv_seqs ASV_SEQS
The ASVs you would like to attach to your bins.
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
The folder where you would like the temporary files and the final output to be stored.
-t THREADS, --threads THREADS
The number of threads that will be used by the program and its subprocesses.
--no_clean Specifies that the directory should NOT be cleaned of the results of past runs. If your run is
interrupted, this will allow you to pick up where you left off. Use at your own risk.
There are many more options; use -h to see all of them.
Current Testers in Wrighton Labs
Please use the --generic_16s argument to run the program, because pulling data from Git LFS is currently problematic in Python. This will be fixed in the future, but for now you can contact me and I will provide a clustered SILVA dataset for reproducibility.
To do
- Split MMseqs2 into multiple steps
- More general tests
- Add a test to run the full Snakemake pipeline.
- Add a better "file path does not exist" message; Snakemake does not handle this well with this setup.
- Check that verbose works like quiet
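Until the missing-path item in the list above is addressed, one possible workaround is to check the inputs in a small wrapper before launching the tool. This is a minimal sketch: the function name check_inputs is hypothetical and not part of join_asvbins, and the paths in the usage comment are placeholders.

```shell
#!/usr/bin/env sh
# Hypothetical pre-flight check; check_inputs is not part of join_asvbins.
# Returns 0 only when the bins path (a file or a directory) and the ASV file exist.
check_inputs() {
    bins="$1"
    asvs="$2"
    if [ ! -e "$bins" ]; then
        echo "Bins path does not exist: $bins" >&2
        return 1
    fi
    if [ ! -f "$asvs" ]; then
        echo "ASV file does not exist: $asvs" >&2
        return 1
    fi
    return 0
}

# Usage sketch (placeholder paths):
# check_inputs path/to/bins /path/to/asv/file.fa && \
#     join_asvbins -b path/to/bins -a /path/to/asv/file.fa -o /path/to/output -t 20
```

Because the check runs before Snakemake starts, a mistyped path fails immediately with a one-line message instead of a pipeline error.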
Built Distribution
Hashes for join_asvbins-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a30c14f6b04fa8d9fd4d0474ebf8d819c40bbd3d844d5bd51f1ac82b5842ec6c
MD5 | 1568844f896c420c2daa20b5aabe02d8
BLAKE2b-256 | f74da821b301f0e5833893766f61d3e560cf60f8d0d9ad1fedf2fb162d8b14ba