A Hi-C tool for cutting sequences at the sites of specified restriction enzymes

PARASPLIT

Overview

Parasplit is a Python script designed to process paired-end FASTQ files by fragmenting DNA sequences at specified restriction enzyme sites. It efficiently handles large datasets by leveraging multi-threading for decompression and compression using pigz.
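As an illustration of the pigz-based decompression mentioned above, here is a minimal sketch of streaming a gzip-compressed FASTQ file through pigz from Python. The function name `read_gz_lines` is hypothetical and is not taken from Parasplit's source. The fallback to the standard `gzip` module is an addition made only so the example runs on machines without pigz.

```python
import gzip
import shutil
import subprocess
from typing import Iterator


def read_gz_lines(path: str, num_threads: int = 8) -> Iterator[str]:
    """Yield text lines from a gzip file, using pigz when it is installed.

    Hypothetical helper for illustration; not Parasplit's actual reader.
    """
    if shutil.which("pigz") is None:
        # Fallback (an assumption of this sketch) so it runs without pigz.
        with gzip.open(path, "rt") as handle:
            yield from handle
        return

    # pigz flags: -d decompress, -c write to stdout, -p number of threads.
    proc = subprocess.Popen(
        ["pigz", "-dc", "-p", str(num_threads), path],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        yield from proc.stdout
    finally:
        proc.stdout.close()
        proc.wait()
```

Streaming line by line keeps memory use flat regardless of file size, which is the main reason to pipe through an external decompressor rather than load the file at once.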

Features

  • Find and Utilize Restriction Enzyme Sites: Automatically identifies ligation sites from provided enzyme names and generates regex patterns to locate these sites in sequences.

  • Fragmentation: Splits sequences at restriction enzyme sites and keeps only the fragments that are at least as long as the specified seed size.

  • Multi-threading: Efficiently processes large datasets by utilizing multiple threads for decompression and compression.

  • Custom Modes: Supports different pairing modes for sequence fragments.
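The first two features can be sketched as follows. This is an illustrative approximation, not Parasplit's implementation. The `ENZYMES` table is hard-coded here as an assumption (the tool itself retrieves sites from Biopython), and the function names are hypothetical. Only 5'-overhang cutters and the `N` ambiguity code are handled. The sketch drops the matched junction from the fragments and does not model how the tool treats the junction bases.

```python
import re
from itertools import product

# Assumed recognition sites and top-strand cut offsets (illustrative only).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "HinfI": ("GANTC", 1),   # G^ANTC
    "DpnII": ("GATC", 0),    # ^GATC
}


def ligation_sites(enzyme_names):
    """Return the junctions expected after fill-in and blunt re-ligation."""
    lefts, rights = [], []
    for name in enzyme_names:
        site, cut = ENZYMES[name]
        lefts.append(site[: len(site) - cut])  # filled-in left end
        rights.append(site[cut:])              # filled-in right end
    # Any left end may ligate to any right end, so take every combination.
    return sorted({left + right for left, right in product(lefts, rights)})


def ligation_regex(enzyme_names):
    """Compile a single pattern that matches any of the junctions."""
    alternatives = [s.replace("N", "[ACGT]") for s in ligation_sites(enzyme_names)]
    return re.compile("|".join(alternatives))


def split_read(seq, pattern, seed_size=20):
    """Split a read at each junction and drop fragments below seed_size."""
    return [frag for frag in pattern.split(seq) if len(frag) >= seed_size]
```

For example, `ligation_sites(["EcoRI"])` gives `["GAATTAATTC"]`, and a read containing that junction is split into the flanking fragments, with any fragment shorter than `seed_size` discarded.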

Installation

Ensure you have Python 3 installed along with the required dependencies:

sudo apt-get install pigz
pip install parasplit

Usage

The script can be executed from the command line with various arguments to customize its behavior.

Command-Line Arguments

  • --source_forward (str): Input file path for forward reads. Default is ../data/R1.fq.gz.

  • --source_reverse (str): Input file path for reverse reads. Default is ../data/R2.fq.gz.

  • --output_forward (str): Output file path for processed forward reads. Default is ../data/output_forward.fq.gz.

  • --output_reverse (str): Output file path for processed reverse reads. Default is ../data/output_reverse.fq.gz.

  • --list_enzyme (str): Comma-separated list of restriction enzymes. Default is "No restriction enzyme found."

  • --mode (str): Mode of pairing fragments. Options are all or fr. Default is fr.

  • --seed_size (int): Minimum length of fragments to keep. Default is 20.

  • --num_threads (int): Number of threads to use for processing. Default is 8.

  • --borderless (flag): Do not conserve ligation sites.
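The arguments listed above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the documented names and defaults, not a copy of Parasplit's parser. The function name `build_parser`, the use of `choices` for `--mode`, and treating `--borderless` as a boolean switch are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="parasplit")
    parser.add_argument("--source_forward", type=str, default="../data/R1.fq.gz")
    parser.add_argument("--source_reverse", type=str, default="../data/R2.fq.gz")
    parser.add_argument(
        "--output_forward", type=str, default="../data/output_forward.fq.gz"
    )
    parser.add_argument(
        "--output_reverse", type=str, default="../data/output_reverse.fq.gz"
    )
    # Comma-separated enzyme names; the default string is the documented one.
    parser.add_argument(
        "--list_enzyme", type=str, default="No restriction enzyme found."
    )
    # Restricting the values with `choices` is an assumption of this sketch.
    parser.add_argument("--mode", type=str, choices=["all", "fr"], default="fr")
    parser.add_argument("--seed_size", type=int, default=20)
    parser.add_argument("--num_threads", type=int, default=8)
    # Assumed to be a simple on/off switch.
    parser.add_argument("--borderless", action="store_true")
    return parser
```

With this sketch, `build_parser().parse_args(["--list_enzyme=EcoRI,HinfI"])` yields a namespace whose `list_enzyme` value can be turned into a list with `.split(",")`.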

Example Command

parasplit --source_forward="../data/R1.fq.gz" --source_reverse="../data/R2.fq.gz" --output_forward="../data/output_forward.fq.gz" --output_reverse="../data/output_reverse.fq.gz" --list_enzyme=EcoRI,HinfI --mode=all --seed_size=20 --num_threads=8

Main Script

  • Pretreatment: Retrieves restriction sites from the Biopython database and allocates resources to the different processes.

  • Read: Decompresses and reads both FASTQ files simultaneously, then sends the reads to a multiprocessing queue.

  • Frag: Retrieves sequences from the queue, splits them into fragments at restriction enzyme sites, creates pairs, and sends them to a multiprocessing queue.

  • WriteAndControl: Streams data from the output queue to the output files, compressing in parallel.
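The Read, Frag, and WriteAndControl stages above form a producer/consumer pipeline connected by queues. The sketch below shows that pattern in a simplified form. To stay short and portable it swaps in threads and `queue.Queue` for the multiprocessing queues the script uses, matches a literal site string instead of a regex, and collects results in a list instead of writing compressed files. All function names are hypothetical, and the pairing modes are not modeled.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def reader(pairs, in_q):
    """Read stage: push (forward, reverse) sequences onto the input queue."""
    for pair in pairs:
        in_q.put(pair)
    in_q.put(SENTINEL)


def fragmenter(in_q, out_q, site, seed_size):
    """Frag stage: split both mates at `site` and keep long-enough fragments."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)  # forward the end-of-stream marker
            break
        fwd, rev = item
        fwd_frags = [f for f in fwd.split(site) if len(f) >= seed_size]
        rev_frags = [f for f in rev.split(site) if len(f) >= seed_size]
        out_q.put((fwd_frags, rev_frags))


def writer(out_q, results):
    """Write stage: drain the output queue (here into an in-memory list)."""
    while True:
        item = out_q.get()
        if item is SENTINEL:
            break
        results.append(item)


def run_pipeline(pairs, site, seed_size=20):
    """Wire the three stages together and return the collected results."""
    in_q, out_q = queue.Queue(maxsize=100), queue.Queue(maxsize=100)
    results = []
    threads = [
        threading.Thread(target=reader, args=(pairs, in_q)),
        threading.Thread(target=fragmenter, args=(in_q, out_q, site, seed_size)),
        threading.Thread(target=writer, args=(out_q, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Bounded queues apply backpressure: if the writer falls behind, the fragmenter blocks on `put` instead of accumulating data in memory. A single fragmenter thread also keeps the output in input order, which a multi-worker version would need to handle explicitly.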

Project architecture

Architecture diagram. License: CC BY-NC 4.0

Dependencies

  • pigz

Project tree structure:

		├── myproject/
		│   ├── __init__.py
		│   ├── main.py
		│   ├── Frag.py
		│   ├── Read.py
		│   ├── Pretreatment.py
		│   └── WriteAndControl.py
		├── pyproject.toml
		├── requirements-dev.txt
		├── docs/
		│   └── requirements.txt
		├── test/
		│   ├── __init__.py
		│   ├── test_main.py
		│   ├── input_data/
		│   │   ├── R1.fq.gz
		│   │   └── R2.fq.gz
		│   └── output_data/
		│       ├── output_ref_R1.fq.gz
		│       ├── output_ref_R2.fq.gz
		│       ├── output_ref_all_R1.fq.gz
		│       └── output_ref_all_R2.fq.gz
		└── README.md

Contact

For questions or issues, please contact samir.bertache.djenadi@gmail.com.


This README provides an overview of Parasplit's functionality, usage instructions, and implementation details. For more detailed information, refer to the script's source code and docstrings.
