A Hi-C tool for cutting sequences using specified enzymes
PARASPLIT
Overview
Parasplit is a Python script that processes paired-end FASTQ files by fragmenting DNA sequences at the sites of the specified restriction enzymes. It handles large datasets efficiently by using pigz for multi-threaded decompression and compression.
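To illustrate how pigz can stream gzipped FASTQ files, here is a minimal sketch that reads and writes records through a `pigz` subprocess and falls back to Python's single-threaded `gzip` module when pigz is not on the `PATH`. The function names `read_fastq` and `write_fastq` are hypothetical and are not taken from Parasplit's source. The sketch also assumes standard four-line FASTQ records.

```python
import gzip
import shutil
import subprocess
from typing import Iterable, Iterator, Tuple

# One FASTQ record: (header, sequence, "+" line, quality), without newlines.
Record = Tuple[str, str, str, str]


def read_fastq(path: str, threads: int = 4) -> Iterator[Record]:
    """Yield records from a gzipped FASTQ, decompressing with pigz if installed."""
    proc = None
    if shutil.which("pigz"):
        proc = subprocess.Popen(["pigz", "-dc", "-p", str(threads), path],
                                stdout=subprocess.PIPE)
        stream = proc.stdout
    else:
        stream = gzip.open(path, "rb")  # fallback: single-threaded stdlib gzip
    try:
        while True:
            lines = [stream.readline() for _ in range(4)]
            if not lines[0]:
                break  # end of file
            yield tuple(line.decode().rstrip("\n") for line in lines)
    finally:
        stream.close()
        if proc is not None:
            proc.wait()


def write_fastq(path: str, records: Iterable[Record], threads: int = 4) -> None:
    """Write records to a gzipped FASTQ, compressing with pigz if installed."""
    chunks = (("\n".join(rec) + "\n").encode() for rec in records)
    if shutil.which("pigz"):
        with open(path, "wb") as out:
            # pigz reads plain text on stdin and writes gzip data to the file.
            proc = subprocess.Popen(["pigz", "-c", "-p", str(threads)],
                                    stdin=subprocess.PIPE, stdout=out)
            for chunk in chunks:
                proc.stdin.write(chunk)
            proc.stdin.close()
            if proc.wait() != 0:
                raise RuntimeError("pigz compression failed")
    else:
        with gzip.open(path, "wb") as out:
            for chunk in chunks:
                out.write(chunk)
```

To read a pair of files in lockstep, zip the two readers: `for r1, r2 in zip(read_fastq("R1.fq.gz"), read_fastq("R2.fq.gz")): ...`.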
Features
- Find and Utilize Restriction Enzyme Sites: Automatically identifies ligation sites from the provided enzyme names and generates regex patterns to locate these sites in sequences.
- Fragmentation: Splits sequences at restriction enzyme sites and discards fragments shorter than the specified seed size.
- Multi-threading: Processes large datasets efficiently by using multiple threads for decompression and compression.
- Custom Modes: Supports different pairing modes for sequence fragments (fr and all).
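The ligation-site step can be sketched as follows. In a Hi-C experiment, a digested end is filled in and religated, so the junction joins the left part of one site (up to the end of its overhang) to the right part of another site (from the start of its overhang). The cut positions below use the notation printed by Biopython's `RestrictionType.elucidate()`, where `^` marks the top-strand cut and `_` marks the bottom-strand cut. The positions are hard-coded so the sketch runs without Biopython. Parasplit itself retrieves sites from the Biopython database. The function names, the hybrid-junction handling for several enzymes, and the IUPAC-to-regex table are assumptions for illustration, not Parasplit's actual code.

```python
import itertools
import re
from typing import Dict, List, Tuple

# Cut-site notation as in Biopython's elucidate(); hard-coded (assumption).
ELUCIDATED: Dict[str, str] = {
    "EcoRI": "G^AATT_C",
    "HinfI": "G^ANT_C",
    "DpnII": "^GATC_",
    "HindIII": "A^AGCT_T",
}

# IUPAC ambiguity codes translated to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]"}


def split_site(elucidated: str) -> Tuple[str, str]:
    """Return the (left, right) site halves that end up in a ligation junction."""
    site, top, bottom = "", None, None
    for ch in elucidated:
        if ch == "^":
            top = len(site)      # top-strand cut position within the site
        elif ch == "_":
            bottom = len(site)   # bottom-strand cut position within the site
        else:
            site += ch
    lo, hi = min(top, bottom), max(top, bottom)
    # Left end keeps everything up to the end of the overhang; right end
    # starts at the beginning of the overhang.
    return site[:hi], site[lo:]


def ligation_sites(enzymes: List[str]) -> List[str]:
    """All junctions, including hybrids between two different enzymes."""
    halves = [split_site(ELUCIDATED[name]) for name in enzymes]
    junctions = {a[0] + b[1] for a, b in itertools.product(halves, repeat=2)}
    return sorted(junctions)


def ligation_regex(enzymes: List[str]) -> "re.Pattern[str]":
    """Compile one pattern matching any ligation site, IUPAC codes expanded."""
    alternatives = ["".join(IUPAC[b] for b in site)
                    for site in ligation_sites(enzymes)]
    return re.compile("|".join(alternatives))
```

For example, `ligation_sites(["EcoRI", "HinfI"])` (the enzymes used in the example command) yields the two homotypic junctions `GAATTAATTC` and `GANTANTC` plus the two hybrids `GAATTANTC` and `GANTAATTC`. The ambiguous `N` in the HinfI site is why a regex, rather than a plain substring search, is needed.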
Installation
Ensure you have Python 3 installed along with the required dependencies:
sudo apt-get install pigz
pip install parasplit
Usage
The script can be executed from the command line with various arguments to customize its behavior.
Command-Line Arguments
- --source_forward (str): Input file path for forward reads. Default is ../data/R1.fq.gz.
- --source_reverse (str): Input file path for reverse reads. Default is ../data/R2.fq.gz.
- --output_forward (str): Output file path for processed forward reads. Default is ../data/output_forward.fq.gz.
- --output_reverse (str): Output file path for processed reverse reads. Default is ../data/output_reverse.fq.gz.
- --list_enzyme (str): Comma-separated list of restriction enzymes. Default is "No restriction enzyme found."
- --mode (str): Mode of pairing fragments. Options are all or fr. Default is fr.
- --seed_size (int): Minimum length of fragments to keep. Default is 20.
- --num_threads (int): Number of threads to use for processing. Default is 8.
- --borderless: Do not conserve ligation sites in the output fragments.
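The interaction between --seed_size and --borderless can be illustrated with a small fragmentation function. The README does not say exactly where ligation sites end up. This sketch assumes that, by default, each fragment keeps the whole ligation site at its border, and that with borderless the site is removed from both neighbouring fragments. `fragment_read` is a hypothetical name. It takes any compiled regex, such as one built from the ligation sites of the chosen enzymes.

```python
import re
from typing import List, Tuple


def fragment_read(seq: str, qual: str, pattern: "re.Pattern[str]",
                  seed_size: int = 20,
                  borderless: bool = False) -> List[Tuple[str, str]]:
    """Split one read at each ligation-site match; keep fragments >= seed_size.

    Assumption (not confirmed by the README): by default each fragment keeps
    the whole ligation site at its border; borderless=True drops the site.
    """
    spans, start = [], 0
    for match in pattern.finditer(seq):
        if borderless:
            spans.append((start, match.start()))  # fragment ends before the site
            start = match.end()                    # next fragment starts after it
        else:
            spans.append((start, match.end()))     # fragment includes the site
            start = match.start()                  # next fragment also starts with it
    spans.append((start, len(seq)))
    # Slice sequence and quality identically, then apply the seed-size filter.
    return [(seq[a:b], qual[a:b]) for a, b in spans if b - a >= seed_size]
```

A read without any ligation site comes back as a single fragment, provided it is at least seed_size bases long.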
Example Command
parasplit --source_forward="../data/R1.fq.gz" --source_reverse="../data/R2.fq.gz" --output_forward="../data/output_forward.fq.gz" --output_reverse="../data/output_reverse.fq.gz" --list_enzyme=EcoRI,HinfI --mode=all --seed_size=20 --num_threads=8
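The example above sets --mode=all. The README lists the two modes but does not define them. This sketch assumes that `fr` pairs every fragment of the forward read with every fragment of the reverse read, and that `all` pairs every fragment with every other fragment from either read. This is an interpretation, not confirmed behavior, and `make_pairs` is a hypothetical name.

```python
import itertools
from typing import List, Tuple


def make_pairs(fwd: List[str], rev: List[str],
               mode: str = "fr") -> List[Tuple[str, str]]:
    """Build fragment pairs for one read pair (assumed semantics of --mode)."""
    if mode == "fr":
        # Forward-read fragments only against reverse-read fragments.
        return list(itertools.product(fwd, rev))
    if mode == "all":
        # Every unordered pair among all fragments of the read pair.
        return list(itertools.combinations(fwd + rev, 2))
    raise ValueError(f"unknown mode: {mode!r}")
```

Under this reading, `all` produces n(n-1)/2 pairs for n fragments in total, so its output grows quadratically with the number of ligation sites per read pair. `fr` produces one pair per forward/reverse combination.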
Main Script
- Pretreatment: Retrieves restriction sites from the Biopython database and allocates resources to the different processes.
- Read: Decompresses and reads the FASTQ files simultaneously, then sends the reads to a multiprocessing queue.
- Frag: Retrieves sequences from the queue, splits them into fragments at restriction enzyme sites, creates pairs, and sends them to an output multiprocessing queue.
- WriteAndControl: Streams data from the output queue to the output files, compressing it in parallel.
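The Read, Frag, and WriteAndControl stages form a producer/consumer pipeline. The sketch below shows that structure with bounded queues and one stop signal per worker. To stay short and portable, it substitutes `threading` and `queue.Queue` for the processes and `multiprocessing.Queue` that Parasplit uses. In CPython, threads do not run CPU-bound Python code in parallel, which is one reason to use processes for the Frag stage. The sentinel-based shutdown carries over unchanged. `run_pipeline` and its parameters are hypothetical.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

SENTINEL = object()  # unique end-of-stream marker passed between stages


def run_pipeline(items: Iterable[Any], process: Callable[[Any], Any],
                 n_workers: int = 2) -> List[Any]:
    """Read -> n Frag workers -> Write, connected by bounded queues."""
    in_q: queue.Queue = queue.Queue(maxsize=1000)
    out_q: queue.Queue = queue.Queue(maxsize=1000)
    written: List[Any] = []

    def reader() -> None:
        for item in items:
            in_q.put(item)
        for _ in range(n_workers):
            in_q.put(SENTINEL)  # one stop signal per worker

    def worker() -> None:
        while (item := in_q.get()) is not SENTINEL:
            out_q.put(process(item))
        out_q.put(SENTINEL)  # tell the writer this worker is done

    def writer() -> None:
        finished = 0
        while finished < n_workers:
            item = out_q.get()
            if item is SENTINEL:
                finished += 1
            else:
                written.append(item)  # stand-in for writing to a pigz pipe

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written
```

With several workers, results reach the writer in no guaranteed order. A real implementation must keep each forward fragment and its reverse partner together in one queue item so the two output files stay synchronized.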
Project architecture
Architecture diagram - License: CC BY-NC 4.0
Dependencies
- pigz
The tree structure of my project:
├── myproject/
│ ├── __init__.py
│ ├── main.py
│ ├── Frag.py
│ ├── Read.py
│ ├── Pretreatment.py
│ └── WriteAndControl.py
├── pyproject.toml
├── requirements-dev.txt
├── docs/
│ └── requirements.txt
├── test/
│ ├── __init__.py
│ ├── test_main.py
│ ├── input_data/
│ │ ├── R1.fq.gz
│ │ └── R2.fq.gz
│ └── output_data/
│ ├── output_ref_R1.fq.gz
│ ├── output_ref_R2.fq.gz
│ ├── output_ref_all_R1.fq.gz
│ └── output_ref_all_R2.fq.gz
└── README.md
Contact
For questions or issues, please contact samir.bertache.djenadi@gmail.com.
This README provides an overview of Parasplit's functionality, usage instructions, and implementation details. For more detailed information, refer to the script's source code and docstrings.
Download files
File details
Details for the file parasplit-1.1.4.tar.gz.
File metadata
- Download URL: parasplit-1.1.4.tar.gz
- Upload date:
- Size: 16.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2fce7575973af8e654a3055feb731589678a812adb85956590f2408da957903e |
| MD5 | a4a78635ac7bba0633b29786ca8b3c3e |
| BLAKE2b-256 | 21c79a3bf9b5ac2797a1e20f5265662786b07a44bc0ebc1ab22b702fa8a5d6a7 |
File details
Details for the file parasplit-1.1.4-py3-none-any.whl.
File metadata
- Download URL: parasplit-1.1.4-py3-none-any.whl
- Upload date:
- Size: 16.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b7fc3ce006509171b7a29d568bc00f66075e7374f645ef37bdfd3cbc59631a4c |
| MD5 | de5a3a9b18eb3c7936c340bc9f606870 |
| BLAKE2b-256 | 2a0852ade8ca3786c7f40b10abf64d4101cbbdcc759a791616e0c5d51d7d6483 |