Nuclease Cleavage Site and Overhang Identification
NuCSOI
Description
NuCSOI processes paired-end sequencing data to identify nuclease cleavage sites on plasmid references. The pipeline performs quality control, mapping, coverage analysis, and statistical analysis to identify cleavage sites with single base-pair resolution.
Installation
pip install nucsoi
External dependencies (install separately):
- fastp (quality control)
- bwa (read mapping)
- samtools (BAM processing)
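Because these tools are not installed by pip, it can help to confirm they are on PATH before running the pipeline. The following is a minimal sketch using only the Python standard library; the function name `missing_tools` is hypothetical and not part of NuCSOI.

```python
import shutil

# External tools named in the dependency list above.
REQUIRED_TOOLS = ("fastp", "bwa", "samtools")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing external tools:", ", ".join(missing))
else:
    print("All external tools found.")
```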
Inputs
- FASTQ files: Paired-end sequencing files (R1 and R2). Files are supplied in R1/R2 pairs, so the total number of files must be even.
- Plasmid reference: FASTA file containing the circular plasmid sequence.
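The even-number requirement can be checked before launching a run. This is a small sketch, assuming the files are listed in R1, R2, R1, R2, ... order; that ordering and the helper name `pair_fastq_files` are assumptions of this example, not documented NuCSOI behavior.

```python
def pair_fastq_files(paths):
    """Group a flat list of FASTQ paths into (R1, R2) tuples.

    Assumes the list alternates R1, R2, R1, R2, ... (an assumption of
    this sketch). Raises ValueError if the list is empty or odd-length.
    """
    if len(paths) == 0 or len(paths) % 2 != 0:
        raise ValueError(
            f"expected an even, non-zero number of FASTQ files, got {len(paths)}"
        )
    # Step through the list two entries at a time.
    return [(paths[i], paths[i + 1]) for i in range(0, len(paths), 2)]


pairs = pair_fastq_files(["R1.fastq.gz", "R2.fastq.gz"])
print(pairs)
```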
Usage
nucsoi -f R1.fastq.gz R2.fastq.gz -p plasmid.fasta -o output_dir/
Use nucsoi --help for all available options.
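When scripting several runs, the command line above can be assembled programmatically. The sketch below only builds the argument list from the flags documented in this README; the function name `build_nucsoi_command` is hypothetical, and actually executing the command (for example with `subprocess.run(cmd, check=True)`) requires nucsoi to be installed.

```python
import shlex


def build_nucsoi_command(fastq_files, plasmid, output_dir,
                         quality_cutoff=None, run_pipeline=False):
    """Build an argument list for the nucsoi CLI from documented flags."""
    cmd = ["nucsoi", "-f", *fastq_files, "-p", plasmid, "-o", output_dir]
    if quality_cutoff is not None:
        # -q / --quality-cutoff is passed through to fastp (default: 30).
        cmd += ["-q", str(quality_cutoff)]
    if run_pipeline:
        cmd.append("--run-pipeline")
    return cmd


cmd = build_nucsoi_command(
    ["R1.fastq.gz", "R2.fastq.gz"], "plasmid.fasta", "output_dir/"
)
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```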
Pipeline Stages
- Quality Control: Filters reads using fastp with a configurable quality threshold (default: Q30).
- Plasmid Mapping: Maps quality-filtered reads to the plasmid reference using BWA. Handles circular references.
- Coverage Analysis: Calculates coverage at each position. Identifies regions with coverage drop-offs.
- Position Analysis: Statistical analysis of mapping positions. Applies multiple testing correction (Bonferroni and Benjamini-Hochberg).
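The two multiple testing corrections named in the Position Analysis stage can be illustrated with textbook definitions. This is a sketch of the standard Bonferroni and Benjamini-Hochberg adjustments in plain Python, not NuCSOI's actual code; the function names and the example p-values are made up for illustration.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values, one per tested position (made-up numbers).
example = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(example))
print(benjamini_hochberg(example))
```

Bonferroni controls the family-wise error rate and is the stricter of the two; Benjamini-Hochberg controls the false discovery rate and typically retains more positions.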
Outputs
Results are written to the specified output directory:
output_dir/
├── inputs/
│ ├── raw_fastqgzs/ # Input FASTQ files
│ ├── qc_reads/ # Quality-controlled reads
│ └── plasmid/ # Plasmid reference
├── results/
│ └── plasmid_mapping_*/
│ ├── coverage_analysis.png
│ ├── coverage_data.csv
│ ├── coverage_zoomed_data.csv
│ ├── coverage_data.txt
│ ├── comprehensive_summary_plot.png
│ └── comprehensive_position_analysis.txt
├── scripts/ # Analysis scripts
├── configs.yaml # Configuration file
└── Makefile # Pipeline makefile
- coverage_analysis.png: Coverage plots for entire plasmid and zoomed regions
- coverage_data.csv: Coverage data for all positions
- coverage_zoomed_data.csv: Coverage data for zoomed regions
- coverage_data.txt: Coverage statistics
- comprehensive_summary_plot.png: Statistical analysis plots
- comprehensive_position_analysis.txt: Position statistics with multiple testing corrections
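The coverage drop-offs that the Coverage Analysis stage reports can be pictured on a plain list of per-position depths. The sketch below is illustrative only: the fold-change rule, the thresholds, and the function name are assumptions, not NuCSOI's method, and it does not read coverage_data.csv because the column layout of that file is not documented here. Position 0 is compared with the last position to reflect the circular plasmid.

```python
def find_coverage_dropoffs(depths, min_fold_drop=2.0, min_depth=10):
    """Return 0-based positions where depth falls sharply from the previous base.

    A position is flagged when the previous base has at least `min_depth`
    coverage and the current depth is at least `min_fold_drop` times lower.
    Both thresholds are arbitrary choices for this sketch.
    """
    n = len(depths)
    dropoffs = []
    for pos in range(n):
        prev = depths[(pos - 1) % n]  # position 0 wraps to the last base
        cur = depths[pos]
        if prev >= min_depth and cur * min_fold_drop <= prev:
            dropoffs.append(pos)
    return dropoffs


# Made-up depths with a sharp drop starting at position 2.
print(find_coverage_dropoffs([100, 100, 40, 40, 100, 100]))
```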
Options
- -f, --fastq-files: Paired FASTQ files (required)
- -p, --plasmid: Plasmid reference FASTA file (required)
- -o, --output-dir: Output directory (required)
- -q, --quality-cutoff: Quality cutoff for fastp (default: 30)
- --run-pipeline: Automatically run pipeline after setup
- --version: Show version number
- -h, --help: Show help message
License
MIT License
Download files
File details
Details for the file nucsoi-1.0.0.tar.gz.
File metadata
- Download URL: nucsoi-1.0.0.tar.gz
- Upload date:
- Size: 19.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2ad6c1fffdd13a66a58acfa4a265f564857a7af22cc5420888334894a7396e22 |
| MD5 | 7c0497de12cd3ae6b95d04ecd369c203 |
| BLAKE2b-256 | 5ee1749c8b4d814cf3a7bd9b769d15927d52e644e20b51ac9787487ec20b035a |
File details
Details for the file nucsoi-1.0.0-py3-none-any.whl.
File metadata
- Download URL: nucsoi-1.0.0-py3-none-any.whl
- Upload date:
- Size: 18.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 543a47d3a8e4699207ea36776f51c4a2eac46b780bd224d88f0645c7f3e17766 |
| MD5 | 0eeefd2629a48b5611e52f1027fc1aab |
| BLAKE2b-256 | f1408e164be6725fc64053c7a4f98358d72209752736679e4c23a634ca064124 |