Nuclease Cleavage Site and Overhang Identification

NuCSOI

Description

NuCSOI processes paired-end sequencing data to identify nuclease cleavage sites on plasmid references. The pipeline runs quality control, read mapping, coverage analysis, and statistical testing to locate cleavage sites at single-base-pair resolution.

Installation

pip install nucsoi

External dependencies (install separately):

  • fastp (quality control)
  • bwa (read mapping)
  • samtools (BAM processing)
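
Because these tools are not installed by pip, it can help to confirm they are on the PATH before starting a run. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is hypothetical and not part of NuCSOI.

```python
import shutil

# External tools the README lists as separate installs.
REQUIRED_TOOLS = ("fastp", "bwa", "samtools")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing external dependencies: " + ", ".join(missing))
    else:
        print("All external dependencies found.")
```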

Inputs

  • FASTQ files: Paired-end sequencing files (R1 and R2). Supply an even number of files so that every R1 file has a matching R2 file.
  • Plasmid reference: FASTA file containing the circular plasmid sequence.
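
The even-number requirement can be checked before launching the tool. The sketch below assumes the files are listed in consecutive R1, R2 order, as in the usage example; how NuCSOI itself pairs files is not stated here, and `pair_fastq_files` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def pair_fastq_files(paths: Sequence[str]) -> List[Tuple[str, str]]:
    """Group a flat list of FASTQ paths into (R1, R2) tuples.

    Assumption: mates are adjacent in the list (R1 first, then R2).
    """
    if len(paths) == 0 or len(paths) % 2 != 0:
        raise ValueError(
            f"Expected a non-empty, even number of FASTQ files, got {len(paths)}"
        )
    # Step through the list two entries at a time.
    return [(paths[i], paths[i + 1]) for i in range(0, len(paths), 2)]
```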

Usage

nucsoi -f R1.fastq.gz R2.fastq.gz -p plasmid.fasta -o output_dir/

Use nucsoi --help for all available options.
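
When NuCSOI is called from a script, the same command line can be assembled in Python. This is a sketch that uses only the flags documented in this README; the function names `build_nucsoi_command` and `run_nucsoi` are hypothetical wrappers, not part of the package.

```python
import subprocess
from typing import List, Optional, Sequence


def build_nucsoi_command(
    fastq_files: Sequence[str],
    plasmid: str,
    output_dir: str,
    quality_cutoff: Optional[int] = None,
    run_pipeline: bool = False,
) -> List[str]:
    """Assemble the nucsoi argument list from the documented options."""
    cmd = ["nucsoi", "-f", *fastq_files, "-p", plasmid, "-o", output_dir]
    if quality_cutoff is not None:
        cmd += ["-q", str(quality_cutoff)]
    if run_pipeline:
        cmd.append("--run-pipeline")
    return cmd


def run_nucsoi(*args, **kwargs) -> None:
    """Run nucsoi and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_nucsoi_command(*args, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.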

Pipeline Stages

  1. Quality Control: Filters reads using fastp with a configurable quality threshold (default: Q30).
  2. Plasmid Mapping: Maps quality-filtered reads to the plasmid reference using BWA. Handles circular references.
  3. Coverage Analysis: Calculates coverage at each position. Identifies regions with coverage drop-offs.
  4. Position Analysis: Performs statistical analysis of mapping positions and applies multiple-testing correction (Bonferroni and Benjamini-Hochberg).
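
To illustrate stages 3 and 4, the sketch below shows one way to flag coverage drop-offs on a circular reference and the textbook forms of the two corrections named above. It is not taken from the NuCSOI source: the function names, the fixed `min_drop` threshold, and the use of adjacent-position differences are all assumptions made for illustration.

```python
from typing import List, Sequence


def coverage_drops(depths: Sequence[int], min_drop: int) -> List[int]:
    """Return 0-based positions whose depth is at least `min_drop` lower
    than the preceding position, treating the reference as circular."""
    drops = []
    for i in range(len(depths)):
        previous = depths[i - 1]  # i == 0 wraps around to the last base
        if previous - depths[i] >= min_drop:
            drops.append(i)
    return drops


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Bonferroni controls the family-wise error rate and is the stricter of the two; Benjamini-Hochberg controls the false discovery rate, which is usually less conservative when many positions are tested.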

Outputs

Results are written to the specified output directory:

output_dir/
├── inputs/
│   ├── raw_fastqgzs/          # Input FASTQ files
│   ├── qc_reads/              # Quality-controlled reads
│   └── plasmid/               # Plasmid reference
├── results/
│   └── plasmid_mapping_*/
│       ├── coverage_analysis.png
│       ├── coverage_data.csv
│       ├── coverage_zoomed_data.csv
│       ├── coverage_data.txt
│       ├── comprehensive_summary_plot.png
│       └── comprehensive_position_analysis.txt
├── scripts/                   # Analysis scripts
├── configs.yaml              # Configuration file
└── Makefile                  # Pipeline makefile

  • coverage_analysis.png: Coverage plots for entire plasmid and zoomed regions
  • coverage_data.csv: Coverage data for all positions
  • coverage_zoomed_data.csv: Coverage data for zoomed regions
  • coverage_data.txt: Coverage statistics
  • comprehensive_summary_plot.png: Statistical analysis plots
  • comprehensive_position_analysis.txt: Position statistics with multiple testing corrections
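
After a run, a quick check that every expected result file exists can catch a stage that failed silently. The sketch below relies only on the directory layout listed above; `missing_result_files` is a hypothetical helper, and it makes no assumption about the contents of the files.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the output listing in this README.
EXPECTED_RESULT_FILES = (
    "coverage_analysis.png",
    "coverage_data.csv",
    "coverage_zoomed_data.csv",
    "coverage_data.txt",
    "comprehensive_summary_plot.png",
    "comprehensive_position_analysis.txt",
)


def missing_result_files(output_dir: str) -> Dict[str, List[str]]:
    """Map each results/plasmid_mapping_* directory to its missing files."""
    results_root = Path(output_dir) / "results"
    report = {}
    for run_dir in sorted(results_root.glob("plasmid_mapping_*")):
        if run_dir.is_dir():
            report[run_dir.name] = [
                name
                for name in EXPECTED_RESULT_FILES
                if not (run_dir / name).is_file()
            ]
    return report
```

An empty list for a directory means all six files are present; an empty dictionary means no plasmid_mapping_* directory was found at all.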

Options

  • -f, --fastq-files: Paired FASTQ files (required)
  • -p, --plasmid: Plasmid reference FASTA file (required)
  • -o, --output-dir: Output directory (required)
  • -q, --quality-cutoff: Quality cutoff for fastp (default: 30)
  • --run-pipeline: Run the pipeline automatically after setup
  • --version: Show version number
  • -h, --help: Show help message
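
For readers who want to see how an option set like this maps onto a parser, here is an argparse sketch that mirrors the list above. It is an illustration only, not NuCSOI's actual source; in particular, the `nargs="+"` handling of -f, the integer type of -q, and the program name `nucsoi-sketch` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="nucsoi-sketch",
        description="Sketch of a CLI with the options listed in the README.",
    )
    # Assumption: -f accepts one or more paths in a single flag.
    parser.add_argument("-f", "--fastq-files", nargs="+", required=True,
                        help="Paired FASTQ files")
    parser.add_argument("-p", "--plasmid", required=True,
                        help="Plasmid reference FASTA file")
    parser.add_argument("-o", "--output-dir", required=True,
                        help="Output directory")
    # Assumption: the cutoff is parsed as an integer Phred score.
    parser.add_argument("-q", "--quality-cutoff", type=int, default=30,
                        help="Quality cutoff for fastp (default: 30)")
    parser.add_argument("--run-pipeline", action="store_true",
                        help="Run the pipeline after setup")
    parser.add_argument("--version", action="version",
                        version="%(prog)s (sketch)")
    # -h/--help is added by argparse automatically.
    return parser
```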

License

MIT License
