panISa is a program that searches for insertion sequences (IS) in resequencing data (BAM files)
Project description
panISa is a program that identifies insertion sequences (IS) in resequencing data (BAM files) from bacterial genomes.
Idea
panISa searches for insertion sequences ab initio (i.e. with a database-free approach) in bacterial genomes from short-read NGS data. Briefly, the software identifies the signature of an insertion in the alignment by counting clipped reads at the start and end positions of the potential IS. These clipped reads overlap the direct repeats created by the IS insertion. Finally, using a reconstruction of the beginning of both sides of the IS (IRL and IRR), panISa validates the IS by searching for inverted repeat regions.
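The clipped-read counting step can be illustrated with a minimal sketch. This is not panISa's own code: the function names and the `(start, cigar, mapq)` tuple layout are assumptions made for the example, and the reads are synthetic. With pysam, the three values would correspond to the `reference_start`, `cigarstring` and `mapping_quality` attributes of an aligned read.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference positions.
REF_CONSUMING = set("MDN=X")


def clipped_positions(start, cigar):
    """Return (start_clip, end_clip) reference positions for one read.

    start is the 0-based leftmost aligned position. A value is None
    when the read is not soft-clipped on that side.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if len(ops) < 2:
        return None, None
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    # Clipped at its start: the aligned part begins at `start`.
    left = start if ops[0][1] == "S" else None
    # Clipped at its end: the aligned part stops at the last aligned base.
    right = start + ref_len - 1 if ops[-1][1] == "S" else None
    return left, right


def count_clipped(reads, min_mapq=20):
    """Count start- and end-clipped reads per reference position."""
    starts, ends = Counter(), Counter()
    for start, cigar, mapq in reads:
        if mapq < min_mapq:
            continue  # ignore low-quality alignments
        left, right = clipped_positions(start, cigar)
        if left is not None:
            starts[left] += 1
        if right is not None:
            ends[right] += 1
    return starts, ends


# Synthetic reads: (leftmost position, CIGAR string, mapping quality).
reads = [
    (100, "30M20S", 60),  # clipped at its end, last aligned base 129
    (105, "25M25S", 60),  # clipped at its end, last aligned base 129
    (120, "20S30M", 60),  # clipped at its start, first aligned base 120
    (120, "10S40M", 10),  # discarded: mapping quality below 20
]
starts, ends = count_clipped(reads)
print(dict(starts), dict(ends))  # {120: 1} {129: 2}
```

In this toy example the start-clipped reads pile up at position 120 and the end-clipped reads at position 129, so the candidate direct repeat would span positions 120 to 129.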
Requirements and Installation
Conda installation
You can easily install panISa and its requirements using conda:
conda install -c bioconda panisa
Requirements
The program uses the Python libraries pysam (>=0.9) and requests (>=2.12).
You also need to install the EMBOSS package.
On Debian, type:
sudo apt-get install python-pysam python-requests emboss
Installation
Download the current tarball and unzip it.
Verify the installation using the test file
python panISa.py test/test.bam
Alternatively, you can install panISa from the PyPI repository:
pip install panisa
Command and Options
python panISa.py [options] bam
Options
- -h
show this help message and exit
- -o
Output file for the list of IS insertions found in the alignment [stdout]
- -q
Minimum alignment quality value to conserve a clipped read [20]
- -m
Minimum number of clipped reads at a position to consider it a potential IS site [10]
- -s
Maximum size of direct repeat region [20bp]
- -p
Minimum fraction of identical bases required to build the consensus [0.8]
- -v
show program’s version number and exit
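To illustrate what a threshold such as the one set by -p means, here is a minimal sketch of a column-wise consensus. It is an illustration only, not panISa's implementation: the function name, the use of "N" for undetermined positions and the assumption that the sequences are already aligned column by column are all choices made for this example.

```python
from collections import Counter


def consensus(sequences, min_fraction=0.8):
    """Build a column-wise consensus from pre-aligned sequences.

    A base is kept when it makes up at least min_fraction of its column;
    otherwise "N" is written (an assumption of this sketch).
    zip() stops at the shortest sequence, so longer tails are ignored.
    """
    result = []
    for column in zip(*sequences):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(column) >= min_fraction else "N")
    return "".join(result)


# Synthetic clipped-read fragments, already aligned.
fragments = ["ACGTA", "ACGTA", "ACGTT", "ACGTA", "ACCTA"]
print(consensus(fragments))        # ACGTA
print(consensus(fragments, 0.9))   # ACNTN
```

With the default value of 0.8, a column where four of five reads agree is accepted; raising the threshold to 0.9 turns the two columns containing a mismatch into undetermined positions.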
Output
panISa returns results in tabular format with the following columns:
- Chromosome:
chromosome id
- End position:
position of the last base of the direct repeat and the left boundary of the potential IS (IRL)
- End clipped reads:
number of clipped reads (end position)
- Direct repeat:
nucleotide sequence of the direct repeat
- Start position:
position of the first base of the direct repeat and the right boundary of the potential IS (IRR)
- Start clipped reads:
number of clipped reads (start position)
- Inverted repeats:
nucleotide sequences of the inverted repeats and their positions
- IS left sequence:
reconstruction of the left boundary of the potential IS (IRL)
- IS right sequence:
reconstruction of the right boundary of the potential IS (IRR)
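The columns above could be loaded for downstream analysis along the following lines. This is a sketch under stated assumptions: it supposes that the output is tab-delimited, that the columns appear in the order listed above, and that an optional header row starts with "Chromosome". The field names, the function name and the sample row are hypothetical and should be checked against a real output file.

```python
import csv
import io

# Hypothetical field names, following the column order listed above.
FIELDS = [
    "chromosome", "end_position", "end_clipped_reads", "direct_repeat",
    "start_position", "start_clipped_reads", "inverted_repeats",
    "is_left_sequence", "is_right_sequence",
]
INT_FIELDS = ("end_position", "end_clipped_reads",
              "start_position", "start_clipped_reads")


def read_panisa_table(handle):
    """Parse a tab-delimited result table into a list of dicts."""
    rows = []
    for record in csv.reader(handle, delimiter="\t"):
        if not record or record[0] == "Chromosome":
            continue  # skip blank lines and an assumed header row
        row = dict(zip(FIELDS, record))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        rows.append(row)
    return rows


# Synthetic row for illustration only, not real panISa output.
sample = io.StringIO(
    "chr1\t129\t12\tACGTACGTAC\t120\t15\tplaceholder\tGGAAGG\tCCTTCC\n"
)
rows = read_panisa_table(sample)
first = rows[0]
# The direct repeat runs from the start position to the end position.
repeat_length = first["end_position"] - first["start_position"] + 1
print(first["chromosome"], repeat_length)  # chr1 10
```

Since the start position is the first base of the direct repeat and the end position is its last base, the difference plus one should match the length of the reported direct repeat sequence, which gives a simple consistency check on each row.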
Validation
panISa results can be searched for homology against ISFinder to identify the IS family, using the script ISFinder_search.py:
python ISFinder_search.py [options] panISa results
Recommendation
panISa works well with alignments produced by bwa.
Citation
If you use the panISa software, please cite the following paper:
panISa: ab initio detection of insertion sequences in bacterial genomes from short read sequence data. Treepong P, Guyeux C, Meunier A, Couchoud C, Hocquet D, Valot B. Bioinformatics. 2018, 34(22):3795-3800.
doi: 10.1093/bioinformatics/bty479