
Chip-based CRISPR analysis


Introduction

---
title: Workflow
---
flowchart TD
  SP[(synthesize plasmids)] --> FP[(functional plasmids)] --> E[editing] --> SR[(sample reads)] --> A[alignment]

  FP --> A --> OEO[(observed editing outcomes)] --> C[correction] --> CEO[(corrected editing outcomes)]

  SP --> NP[(nonfunctional plasmids)] --> SR

  NP[(nonfunctional plasmids)] --> NCR[(negative control reads)] --> A2[alignment]

  FP --> A2 --> NCEO[(negative control editing outcomes)] --> C

We designed the naapam workflow to decouple CRISPR/Cas9 editing outcomes from synthesis errors in the reference plasmids. We first give an overview of the workflow and leave the technical details to the sections Discriminate functional and nonfunctional plasmids, Sequence alignment, and Correct observed editing outcomes using negative controls.

As shown in the diagram above, we apply a hard classification to the synthesized plasmids, splitting them into functional and nonfunctional plasmids. We assume that only functional plasmids can be edited by the CRISPR/Cas9 system: nonfunctional plasmids are transferred into the cell lines but are not edited. We therefore use the functional plasmids as references for analyzing editing outcomes.

In cell lines expressing Cas9, both nonfunctional plasmids and edited functional plasmids contribute to the apparent editing outcomes in the NGS reads. In the negative control (cell lines without Cas9), only nonfunctional plasmids contribute. Because plasmid synthesis is error-prone, a naive analysis often attributes synthesis errors to the CRISPR/Cas9 system, which overestimates the editing efficiency and distorts the overall editing profile. It is reasonable to assume that the abundance of nonfunctional plasmids is similar in cell lines with and without Cas9. Therefore, we can correct the editing profiles of the cell lines with Cas9 using those of the negative controls.

Discriminate functional and nonfunctional plasmids

block
  block:ID
    R1B["R1 barcode"]
    R1P["R1 primer"]
    TSS["G"]
    SG["sgRNA"]
    SC["scaffold"]
    TES["G"]
    T["target"]
    SEP["CAG"]
    B["barcode"]
    RCR2P["R2 primer'"]
    RCR2B["R2 barcode'"]
  end

We parse the components of each read in the control samples. We discriminate functional and nonfunctional plasmids based on the integrity and conservation of:

  • primer;
  • sgRNA;
  • scaffold;
  • barcode;
  • protospacer;
  • PAM;
  • transcription start and end sites, each marked by a G.

We also require that the barcode, sgRNA, and protospacer are consistent, i.e., that they all come from the same plasmid design.
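The checks above can be sketched as a hard classifier over control reads that have already been split into the components of the layout diagram. This is a minimal sketch, not naapam's implementation: the `Design` record, the component keys (`"r1_primer"`, `"tss"`, `"sgrna"`, and so on), and the exact-match comparison are assumptions made for illustration; naapam may parse reads differently or tolerate some mismatches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One row of a plasmid design file (hypothetical fields)."""
    r1_primer: str
    sgrna: str
    scaffold: str
    target: str  # protospacer followed by its PAM
    barcode: str
    r2_primer: str


def is_functional(parts: dict[str, str], designs: dict[str, Design]) -> bool:
    """Classify one parsed control read as functional (True) or not.

    `parts` maps hypothetical component names to the substrings parsed
    from the read; `designs` maps each designed barcode to its design.
    """
    # The barcode must belong to a designed plasmid.
    design = designs.get(parts.get("barcode", ""))
    if design is None:
        return False
    # Transcription start and end sites are each marked by a single G.
    if parts.get("tss") != "G" or parts.get("tes") != "G":
        return False
    # Every other component must be intact and come from the same design
    # as the barcode (exact match is an assumption of this sketch).
    expected = {
        "r1_primer": design.r1_primer,
        "sgrna": design.sgrna,
        "scaffold": design.scaffold,
        "target": design.target,
        "r2_primer": design.r2_primer,
    }
    return all(parts.get(name) == seq for name, seq in expected.items())
```

Keying the designs by barcode makes the consistency requirement a single lookup: once the barcode picks a design, the sgRNA and protospacer must match that same design, so a read combining the barcode of one plasmid with the sgRNA of another is rejected.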

Sequence alignment

We use the bioconda package rearr (version 1.0.11) to align the NGS reads to the functional plasmids and classify their editing types. rearr is packaged together with naapam, so you do not need to download it separately. Rearr uses an efficient and accurate chimeric alignment engine to call editing outcomes from raw reads. It is especially good at extracting predictable templated insertions resulting from staggered cleavage by the CRISPR/Cas9 system (Precise and Predictable CRISPR Chromosomal Rearrangements Reveal Principles of Cas9-Mediated Nucleotide Insertion). See the rearr documentation for more details.
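To make "predictable templated insertion" concrete, the sketch below computes the +1 insertion commonly attributed to staggered SpCas9 cleavage. It is an illustration only and says nothing about how rearr aligns reads. It assumes the blunt cut lies 3 nt upstream of the PAM and that a 1-nt 5' overhang is filled in, duplicating the nucleotide 4 nt upstream of the PAM; the function name is hypothetical.

```python
def predicted_plus1_insertion(target: str, pam_start: int) -> str:
    """Return `target` carrying the predicted +1 templated insertion.

    Assumes SpCas9: blunt cut 3 nt upstream of the PAM (which starts at
    index `pam_start`) plus a 1-nt 5' overhang whose fill-in duplicates
    the base immediately left of the blunt cut (position -4 from the PAM).
    """
    cut = pam_start - 3
    if cut < 1:
        raise ValueError("PAM is too close to the start of the target")
    templated_base = target[cut - 1]  # nucleotide at position -4
    return target[:cut] + templated_base + target[cut:]
```

For a 20-nt protospacer `ACGTACGTACGTACGTACGT` followed by the PAM `TGG` (`pam_start=20`), the sketch duplicates the `A` at position -4 and returns `ACGTACGTACGTACGTAACGTTGG`.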

Correct observed editing outcomes using negative controls

Let $w_0$ be the observed wild-type frequency of a functional plasmid in the control sample, and let $e_0^{(i)}$ be the observed frequency of editing outcome $i$ in the control sample (these reads actually come from nonfunctional plasmids). Then $w_0 + \sum_i e_0^{(i)} = 1$. Similarly, let $w$ be the observed wild-type frequency of the same functional plasmid in the sample with Cas9, and let $e^{(i)}$ be the observed frequency of editing outcome $i$ in that sample. Then $w + \sum_i e^{(i)} = 1$.

In the control, every read from a functional plasmid is wild type, so $w_0$ is the fraction of reads that come from functional plasmids. By the assumption that the abundance of nonfunctional plasmids is similar in cell lines with and without Cas9, the expected frequency of reads from functional plasmids (wild type + edited) in the sample with Cas9 is also $w_0$. Likewise, of the observed frequency $e^{(i)}$ of editing outcome $i$ in the sample with Cas9, we expect $e_0^{(i)}$ to come from nonfunctional plasmids. In summary, the corrected frequency of editing outcome $i$ is

$$ \frac{e^{(i)} - e_0^{(i)}}{w_0} $$

if wild type is included, and

$$ \frac{e^{(i)} - e_0^{(i)}}{\sum_j \left(e^{(j)} - e_0^{(j)}\right)} $$

if wild type is excluded. Subtracting the two normalization equations shows that this denominator equals $w_0 - w$. This method has been used in previous works: High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells, and Prediction of the sequence-specific cleavage activity of Cas9 variants.
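The two correction formulas can be sketched for a single functional plasmid as follows. This is a minimal sketch, not naapam's code: the function name, the dictionary inputs keyed by outcome name, and the `"WT"` key are assumptions. The corrected wild-type value $w / w_0$ is not stated above; it is added here because it makes the corrected frequencies sum to 1. The sketch does not clip negative differences and does not guard against a zero denominator, since the text does not say how those cases are handled.

```python
def correct_frequencies(
    w0: float,
    e0: dict[str, float],
    w: float,
    e: dict[str, float],
    include_wild_type: bool = True,
) -> dict[str, float]:
    """Correct editing-outcome frequencies of one functional plasmid.

    w0, e0: wild-type and outcome frequencies in the control (no Cas9).
    w, e:   wild-type and outcome frequencies in the sample with Cas9.
    """
    outcomes = set(e) | set(e0)
    # Excess of each outcome over the negative control: e^(i) - e_0^(i).
    excess = {i: e.get(i, 0.0) - e0.get(i, 0.0) for i in outcomes}
    if include_wild_type:
        corrected = {i: d / w0 for i, d in excess.items()}
        # Hypothetical "WT" key; w / w0 makes all values sum to 1.
        corrected["WT"] = w / w0
        return corrected
    # When both samples are normalized, this total equals w0 - w.
    total = sum(excess.values())
    return {i: d / total for i, d in excess.items()}
```

For example, with $w_0 = 0.8$, $e_0 = \{\text{del1}: 0.15, \text{ins1}: 0.05\}$, $w = 0.5$ and $e = \{\text{del1}: 0.35, \text{ins1}: 0.15\}$, the corrected frequencies are approximately 0.25 (del1), 0.125 (ins1) and 0.625 (wild type) with wild type included, and approximately 2/3 and 1/3 with it excluded.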

Install

$ pip install naapam

Dependencies

  • bowtie2
  • gawk
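Before running the notebooks, you can check that the dependencies are reachable. This sketch assumes the tools must be on `PATH` under the executable names `bowtie2` and `gawk`; the helper name is hypothetical and is not part of naapam.

```python
import shutil


def missing_tools(tools: tuple[str, ...] = ("bowtie2", "gawk")) -> list[str]:
    """Return the command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found.")
```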

Usage

Follow the notebooks in order:

  • align.ipynb
  • analyze.ipynb

Copy them out of the package with:

from importlib import resources
import shutil

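# Copy the bundled notebooks into the current working directory.
for file in ["align.ipynb", "analyze.ipynb"]:
    # as_file() yields a real filesystem path even if the package is zipped.
    with resources.as_file(resources.files("naapam.notebooks") / file) as src:
        shutil.copyfile(src=src, dst=file)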

Configure the directories and the plasmid file in the first cell of each notebook:

  • data_dir: directory containing the raw NGS reads.
  • root_dir: root directory for outputs.
  • plasmid_file: the design file of plasmids.
  • config_dir: directory for config files.
  • correct_dir: output directory of the corrected alignment results.
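A first cell might look like the sketch below. All paths are hypothetical placeholders, the file and directory names simply reuse the examples shipped with the package, and whether the notebooks expect `pathlib.Path` objects or plain strings is an assumption; check the first cell of each notebook before editing it.

```python
from pathlib import Path

# Hypothetical values; replace them with the locations of your own data.
data_dir = Path("raw_reads")  # raw NGS reads
root_dir = Path("results")  # root directory for all outputs
plasmid_file = Path("final_hgsgrna_libb_all_0811-NGG.csv")  # design file
config_dir = Path("filter_configs")  # config files
correct_dir = root_dir / "corrected"  # corrected alignment results
```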

Copy examples of plasmid files out of the package for reference.

from importlib import resources
import shutil

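for file in [
    "final_hgsgrna_libb_all_0811-NGG.csv",
    "final_hgsgrna_libb_all_0811_NAA_scaffold_nbt.csv",
]:
    # as_file() yields a real filesystem path even if the package is zipped.
    with resources.as_file(resources.files("naapam.plasmids") / file) as src:
        shutil.copyfile(src=src, dst=file)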

Copy the example config directory out of the package for reference.

from importlib import resources
import shutil

shutil.copytree(src=resources.files("naapam.filter_configs"), dst="filter_configs")



