The first tool for processing Micro-C data. Before using Microsplit, you need to perform an initial alignment of the reads using Bowtie2 with the `--very-sensitive-local` mode and the `--xeq` option to obtain explicit CIGAR strings. Microsplit then analyzes the CIGAR strings and splits the reads into fragments to form new pairs. Cut sites are located with fine accuracy. Sensitivity is 0.92 and specificity is 0.71.
Microsplit
Microsplit is a command-line tool designed for processing Micro-C data by identifying and managing chimeric reads in BAM files. It follows the logic and structure of the Parasplit tool but is tailored for Micro-C data. The tool reads alignment files (SAM, BAM, or CRAM) using pysam and identifies soft-clipping and hard-clipping events. The identified clipping points are treated as restriction sites to generate new pairs of sequences.
Features
- Parallel Processing: Microsplit utilizes parallel processing to enhance performance and efficiency.
- Soft-Clipping and Hard-Clipping Detection: It detects soft-clipping ('S') and hard-clipping ('H') in CIGAR strings to identify chimeric reads.
- Fragment Generation: Generates new fragments by considering the identified clipping points as restriction sites.
- Error Margin Handling: Adds a fixed number of base pairs to new fragments to account for potential over-mapping by Bowtie2, ensuring more accurate downstream analysis.
- Output Paired Reads: Outputs both end-to-end aligned pairs and newly generated fragment pairs.
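The clipping detection described above can be illustrated with a minimal sketch. It is not Microsplit's own code: the function names `parse_cigar` and `clipping_events` are hypothetical, and the CIGAR string is parsed with a plain regular expression rather than through pysam so that the example runs without an alignment file.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '50=25S' into (length, op) tuples."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def clipping_events(cigar):
    """Return (side, op, length) for every soft or hard clip at the read ends.

    Hypothetical helper: clips can only occur at the two ends of a CIGAR
    string, so we walk inward from each end while the operation is S or H.
    """
    ops = parse_cigar(cigar)
    events = []
    i = 0
    while i < len(ops) and ops[i][1] in "SH":
        events.append(("left", ops[i][1], ops[i][0]))
        i += 1
    j = len(ops) - 1
    while j >= i and ops[j][1] in "SH":
        events.append(("right", ops[j][1], ops[j][0]))
        j -= 1
    return events


print(clipping_events("50=25S"))    # one soft clip on the right end
print(clipping_events("75="))       # end-to-end alignment, no clip
```

A read with at least one event would be a candidate chimeric read under this sketch; a read with an empty list would be passed through as an end-to-end aligned read.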
Installation
Microsplit is available on PyPI and can be installed using pip:
pip install microsplit
Usage
Before using Microsplit, you need to perform an initial alignment of the reads using Bowtie2 with the `--very-sensitive-local` mode and the `--xeq` option to obtain explicit CIGAR strings. Below is an example of how to use Microsplit from the command line:
microsplit --bam_for_file path/to/forward.bam \
--bam_rev_file path/to/reverse.bam \
--output_forward path/to/output_forward.fastq.gz \
--output_reverse path/to/output_reverse.fastq.gz \
--num_threads 8 \
--seed_size 20 \
--length_added 10
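The Bowtie2 pre-alignment required before this command might be scripted as in the sketch below. The helper `bowtie2_command`, the index prefix, and the file names are hypothetical, and aligning each mate separately as unpaired reads (`-U`) is an assumption based on Microsplit taking one forward and one reverse alignment file. The sketch only builds and prints the command; it does not run Bowtie2.

```python
import shlex


def bowtie2_command(index_prefix, fastq, out_sam, threads=8):
    """Build a Bowtie2 call for one mate file (hypothetical helper).

    --very-sensitive-local : local alignment preset, so chimeric reads get soft-clipped
    --xeq                  : write '=' and 'X' instead of 'M' in the CIGAR string
    """
    return [
        "bowtie2",
        "--very-sensitive-local",
        "--xeq",
        "-p", str(threads),
        "-x", index_prefix,
        "-U", fastq,
        "-S", out_sam,
    ]


# Placeholder paths, not taken from the Microsplit documentation.
cmd = bowtie2_command("genome_index", "reads_R1.fastq.gz", "forward.sam")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where Bowtie2 is installed. The same call would be repeated for the reverse mate file.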
Command-Line Arguments
- `--bam_for_file`: Path to the forward BAM file.
- `--bam_rev_file`: Path to the reverse BAM file.
- `--output_forward`: Path to the output forward FastQ file.
- `--output_reverse`: Path to the output reverse FastQ file.
- `--num_threads`: Total number of threads for parallel processing.
- `--seed_size`: Minimum size of a fragment to be generated.
- `--length_added`: Number of base pairs added to the new fragment after soft clipping to account for potential over-mapping by Bowtie2.
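When Microsplit is launched from a pipeline script, the arguments listed above can be assembled and sanity-checked first. The sketch below is one possible wrapper: `microsplit_command` is a hypothetical helper, and the validation rules (positive thread count and seed size, non-negative added length) are assumptions rather than documented constraints. Only the flags listed above are used.

```python
def microsplit_command(bam_for, bam_rev, out_for, out_rev,
                       num_threads, seed_size, length_added):
    """Assemble a microsplit call from the documented flags (hypothetical helper)."""
    # Assumed constraints: these checks are not taken from the Microsplit docs.
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    if seed_size < 1:
        raise ValueError("seed_size must be at least 1")
    if length_added < 0:
        raise ValueError("length_added must not be negative")
    return [
        "microsplit",
        "--bam_for_file", bam_for,
        "--bam_rev_file", bam_rev,
        "--output_forward", out_for,
        "--output_reverse", out_rev,
        "--num_threads", str(num_threads),
        "--seed_size", str(seed_size),
        "--length_added", str(length_added),
    ]


# Same values as the command-line example in the Usage section.
args = microsplit_command(
    "path/to/forward.bam", "path/to/reverse.bam",
    "path/to/output_forward.fastq.gz", "path/to/output_reverse.fastq.gz",
    num_threads=8, seed_size=20, length_added=10,
)
print(" ".join(args))
```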
Methodology
Microsplit processes Micro-C data by:
- Reading BAM Files: Reads the forward and reverse BAM files simultaneously.
- Identifying Clipping Events: Identifies soft-clipping and hard-clipping events from the CIGAR strings in the BAM files.
- Generating Fragments: Uses the clipping points as restriction sites to generate new fragments, adding a fixed number of base pairs (`length_added`) to account for potential over-mapping by Bowtie2.
- Outputting Paired Reads: Outputs both end-to-end aligned pairs and the newly generated fragment pairs.
The length added to new fragments helps to handle potential misalignments due to Bowtie2's tendency to over-map reads and not soft-clip enough, ensuring more accurate results.
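The role of `seed_size` and `length_added` can be made concrete with a small sketch of the splitting step. This is an interpretation, not Microsplit's implementation: the function `split_at_soft_clips` is hypothetical, it assumes that each soft-clipped fragment is extended by `length_added` bases into the aligned part of the read, and it handles only soft clips, because hard-clipped bases are not stored in the read sequence.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def split_at_soft_clips(seq, cigar, seed_size, length_added):
    """Cut a read at its soft-clip boundaries (hypothetical sketch).

    Returns fragments in read order (left clip, aligned core, right clip),
    keeping only those of at least seed_size bases.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases are absent from the sequence, so ignore H operations.
    ops = [(n, op) for n, op in ops if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0

    candidates = []
    if left:
        # Assumption: extend the clipped piece into the aligned part, since
        # the aligner may have mapped a few bases past the real junction.
        candidates.append(seq[:min(len(seq), left + length_added)])
    candidates.append(seq[left:len(seq) - right])
    if right:
        start = max(0, len(seq) - right - length_added)
        candidates.append(seq[start:])
    return [frag for frag in candidates if len(frag) >= seed_size]


read = "A" * 50 + "C" * 25
for frag in split_at_soft_clips(read, "50=25S", seed_size=20, length_added=10):
    print(len(frag), frag)
```

In this example the 25 clipped bases become a 35-base fragment, because 10 bases from the aligned side are added back. With a 5-base clip the extended fragment would be only 15 bases long and would be dropped by a `seed_size` of 20.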
License
Microsplit is released under the AGPLv3 license. The code is freely available on Gitbio.
Contributing
We welcome contributions! If you'd like to contribute, please fork the repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.
Contact
For any questions or issues, please contact samir.bertache.djenadi@gmail.com
Thank you for using Microsplit!