Chip-based CRISPR analysis

Project description

Introduction

---
title: Workflow
---
flowchart TD
  SP[(synthesize plasmids)] --> FP[(functional plasmids)] --> E[editing] --> SR[(sample reads)] --> A[alignment]

  FP --> A --> OEO[(observed editing outcomes)] --> C[correction] --> CEO[(corrected editing outcomes)]

  SP --> NP[(nonfunctional plasmids)] --> SR

  NP[(nonfunctional plasmids)] --> NCR[(negative control reads)] --> A2[alignment]

  FP --> A2 --> NCEO[(negative control editing outcomes)] --> C

We designed the naapam workflow to decouple CRISPR/Cas9 editing outcomes from the synthesis errors of reference plasmids. We first give an overview of the workflow and leave the technical details to the sections on discriminating functional and nonfunctional plasmids, sequence alignment, and the correction of observed editing outcomes by negative control.

As shown in the diagram above, we apply a hard classification to the synthesized plasmids, splitting them into functional and nonfunctional plasmids. We assume that only functional plasmids can be edited by the CRISPR/Cas9 system: nonfunctional plasmids are transferred into the cell lines but are not edited. Based on this assumption, we use the functional plasmids as references when analyzing editing outcomes.

For cell lines expressing Cas9, both nonfunctional plasmids and edited functional plasmids contribute to the apparent editing outcomes in the NGS reads. For the negative control (cell lines without Cas9), only nonfunctional plasmids contribute. Plasmid synthesis is error-prone, and a naive analysis often attributes these synthesis errors to the CRISPR/Cas9 system; it therefore overestimates the editing efficiency and distorts the overall editing profile. A reasonable assumption is that the abundance of nonfunctional plasmids is similar in cell lines with and without Cas9. We can therefore correct the editing profiles of the cell lines with Cas9 using those of the negative controls.

Discriminate functional and nonfunctional plasmids

block
  block:ID
    R1B["R1 barcode"]
    R1P["R1 primer"]
    TSS["G"]
    SG["sgRNA"]
    SC["scaffold"]
    TES["G"]
    T["target"]
    SEP["CAG"]
    B["barcode"]
    RCR2P["R2 primer'"]
    RCR2B["R2 barcode'"]
  end

We parse the components of each read in the control samples. We discriminate functional and nonfunctional plasmids based on the integrity and conservation of:

  • primer;
  • sgRNA;
  • scaffold;
  • barcode;
  • protospacer;
  • PAM;
  • transcription start and end sites marked by G.

We also require that the barcode, sgRNA, and protospacer are consistent (that is, they come from the same plasmid design).
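The integrity and consistency checks above can be sketched as follows. This is a simplified illustration, not naapam's implementation: the primer and scaffold sequences, the fixed barcode length, and the names `Design` and `is_functional` are all placeholders assumed for the example, and the read barcodes and separate PAM check are left out.

```python
import re
from dataclasses import dataclass

# Placeholder sequences for illustration only; real primers and scaffolds differ.
R1_PRIMER = "ACGTAC"
SCAFFOLD = "GTTTTAGAGC"
R2_PRIMER_RC = "TTGGCC"
BARCODE_LEN = 8  # assumed fixed barcode length

# Layout: R1 primer, G, sgRNA, scaffold, G, target, CAG, barcode, R2 primer'.
PATTERN = re.compile(
    f"{R1_PRIMER}G(?P<sgrna>[ACGT]+?){SCAFFOLD}G"
    f"(?P<target>[ACGT]+?)CAG(?P<barcode>[ACGT]{{{BARCODE_LEN}}}){R2_PRIMER_RC}"
)


@dataclass(frozen=True)
class Design:
    """One plasmid design: the sgRNA, target and barcode that belong together."""

    sgrna: str
    target: str  # protospacer followed by the PAM
    barcode: str


def is_functional(read: str, designs: dict[str, Design]) -> bool:
    """Return True only if all components are intact and mutually consistent."""
    match = PATTERN.search(read)
    if match is None:
        # A fixed component (primer, G markers, scaffold, CAG) is damaged.
        return False
    design = designs.get(match["barcode"])
    if design is None:
        # The barcode does not belong to any known design.
        return False
    # sgRNA and target must come from the same design as the barcode.
    return match["sgrna"] == design.sgrna and match["target"] == design.target
```

Any read that fails one of these tests would be treated as coming from a nonfunctional plasmid; a production classifier would likely need to tolerate sequencing errors, which this exact-match sketch does not.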

Sequence alignment

We use the bioconda package rearr (version 1.0.11) to align the NGS reads to the functional plasmids and to classify their editing types. We package rearr together with naapam, so you do not need to download it separately. Rearr uses an efficient and accurate chimeric alignment engine to call editing outcomes from raw reads. It is especially good at extracting the predictable templated insertions that result from staggered cleavage by the CRISPR/Cas9 system (Precise and Predictable CRISPR Chromosomal Rearrangements Reveal Principles of Cas9-Mediated Nucleotide Insertion). See the documentation for more details about rearr.

Correction of observed editing outcomes by negative control

Let $w_0$ be the observed wild-type frequency of a functional plasmid in the control sample, and let $e_0^{(i)}$ be the observed frequency of editing outcome $i$ in the control sample (these reads actually come from nonfunctional plasmids). Then $w_0 + \sum_i e_0^{(i)} = 1$.

Similarly, let $w$ be the observed wild-type frequency of a functional plasmid in the sample with Cas9, and let $e^{(i)}$ be the observed frequency of editing outcome $i$ in the sample with Cas9. Then $w + \sum_i e^{(i)} = 1$.

By the assumption that the abundance of nonfunctional plasmids is similar in cell lines with and without Cas9, the expected frequency of functional plasmids (wild type + edited) in the sample with Cas9 is $w_0$. For editing outcome $i$, we expect that $e_0^{(i)}$ of its observed frequency $e^{(i)}$ in the cell line with Cas9 comes from nonfunctional plasmids.

In summary, the corrected frequency of editing outcome $i$ is

$$ \frac{e^{(i)} - e_0^{(i)}}{w_0} $$

if wild type is included, and

$$ \frac{e^{(i)} - e_0^{(i)}}{\sum_j (e^{(j)} - e_0^{(j)})} $$

if wild type is excluded. Because both sets of frequencies sum to 1, the denominator in the second case equals $w_0 - w$.
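The correction can be sketched in a few lines of Python. This is a minimal illustration of the two formulas, not naapam's implementation: the function name `correct_outcomes` and the dict-based inputs are assumptions made for the example.

```python
def correct_outcomes(e, e0, include_wild_type=True):
    """Correct observed editing-outcome frequencies by the negative control.

    e  : dict mapping outcome -> observed frequency in the sample with Cas9
    e0 : dict mapping outcome -> observed frequency in the control sample
    Wild-type frequencies are implied: w = 1 - sum(e), w0 = 1 - sum(e0).
    """
    outcomes = set(e) | set(e0)
    # Subtract the part of each outcome expected from nonfunctional plasmids.
    diff = {i: e.get(i, 0.0) - e0.get(i, 0.0) for i in outcomes}
    if include_wild_type:
        denominator = 1.0 - sum(e0.values())  # w0
    else:
        denominator = sum(diff.values())  # equals w0 - w
    if denominator <= 0:
        raise ValueError("no functional (or no edited) plasmids to normalize by")
    return {i: d / denominator for i, d in diff.items()}
```

For example, with a control of `{"del1": 0.05, "ins1": 0.05}` (so $w_0 = 0.9$) and a Cas9 sample of `{"del1": 0.35, "ins1": 0.15}`, the corrected frequencies are about 1/3 and 1/9 with wild type included, and 0.75 and 0.25 with wild type excluded. The sketch does not clip negative differences, which can arise from sampling noise.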

Install

$ pip install naapam

Dependencies

  • bowtie2
  • gawk
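
Both tools must be available on your PATH. A quick way to check from Python, using only the standard library (the helper name `missing_tools` is made up for this example):

```python
import shutil


def missing_tools(tools=("bowtie2", "gawk")):
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found.")
```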

Usage

Follow the notebooks in order:

  • align.ipynb
  • analyze.ipynb

Copy them out of the package with:

from importlib import resources
import shutil

for file in ["align.ipynb", "analyze.ipynb"]:
    shutil.copyfile(src=resources.files("naapam.notebooks") / file, dst=file)

You need to configure the directories and the plasmid file in the first cell of the notebooks.

  • data_dir: contains the raw NGS reads.
  • root_dir: root directory for outputs.
  • plasmid_file: the design file of plasmids.
  • config_dir: directory for config files.
  • correct_dir: output directory of the corrected alignment results.
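
Before running the notebooks, it can help to check that the inputs exist and to create the output directories. The helper below is a sketch under assumptions: `prepare_dirs` is a hypothetical name, not part of naapam, and whether the notebooks expect strings or `Path` objects is not stated here.

```python
from pathlib import Path


def prepare_dirs(data_dir, root_dir, plasmid_file, config_dir, correct_dir):
    """Check that the inputs exist and create the output directories."""
    inputs = {
        "data_dir": Path(data_dir),
        "plasmid_file": Path(plasmid_file),
        "config_dir": Path(config_dir),
    }
    missing = [name for name, path in inputs.items() if not path.exists()]
    if missing:
        raise FileNotFoundError(f"missing inputs: {', '.join(missing)}")
    outputs = {"root_dir": Path(root_dir), "correct_dir": Path(correct_dir)}
    for path in outputs.values():
        # Output locations are created on demand, including parents.
        path.mkdir(parents=True, exist_ok=True)
    return {**inputs, **outputs}
```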

Copy examples of plasmid files out of the package for reference.

from importlib import resources
import shutil

for file in [
    "final_hgsgrna_libb_all_0811-NGG.csv",
    "final_hgsgrna_libb_all_0811_NAA_scaffold_nbt.csv",
]:
    shutil.copyfile(src=resources.files("naapam.plasmids") / file, dst=file)

Copy an example config directory out of the package for reference.

from importlib import resources
import shutil

shutil.copytree(src=resources.files("naapam.filter_configs"), dst="filter_configs")
