Analyse the data for RNA stability assay
Inceptive Arrayed RNA Degradation Assay Analysis Code Documentation
The purpose of this code is to analyse data from the RNA stability assay.
Getting started
This repo contains a working toy example for running the code. You can also use it to cross-check that the data from your own experiment follows the expected format.
The input data for the example, together with the expected results, are in the data directory. The electropherograms contain control peaks for normalization: an added analyte of known size that was not subject to degradation.
The data has the format:
data
├── plate_1
│   ├── epg.csv
│   └── plate_map.csv
└── plate_2
    ├── epg.csv
    └── plate_map.csv
You can run it by executing:
arrayed_degradation --min_peak 600 --max_peak 2500 --data_dir_path data
This will create two new output files under data: result.csv containing half life estimates and statistics, and results.pdf, containing diagnostic plots and summary statistics.
Continue reading below for detailed instructions on how to use the tool for specific use cases.
Usage
Installation
pip install arrayed-degradation-assay
Running the analysis
The main script responsible for running the analysis is located in src/arrayed_degradation/analyze_experiment.py. Alternatively, you can run the arrayed_degradation command.
To list all arguments along with their descriptions, run:
arrayed_degradation --help
Arguments
Here is a list of the arguments of the analysis script:
Required arguments:
- --data_dir_path: the path to the directory with input data in the proper format; it can be either an absolute path or a path relative to the directory from which you run the script;
- --min_peak: the left bound of the target peak in nucleotides;
- --max_peak: the right bound of the target peak in nucleotides; the target peak will be searched for within the [min_peak, max_peak] interval;
Optional arguments:
- --disable_control_peak: disables the use of the control peak; if you haven't used a control analyte in this experiment, make sure to set this argument!
- --control_peak_min: the left bound of the control peak (for example p4p6) in nucleotides; the default is 235 nucleotides;
- --control_peak_max: the right bound of the control peak (for example p4p6) in nucleotides; the default is 310 nucleotides; the control peak will be searched for within the [control_peak_min, control_peak_max] interval;
- --time_unit: the time unit used in the timepoint labels of plate_map.csv: m for minutes, h for hours, d for days. Defaults to minutes.
- --remove_background: whether to remove background from the target peak. By default, the script does not remove background. If this argument is set, the script draws a line at the base of the detected peak and only the area above that line is considered for half-life calculations. Setting it is not advised because of unstable behaviour and less repeatable results in some cases.
- --rel_height: a parameter that controls the width of the target peak. It ranges from 0 to 1; the default is 0.85. The higher the value, the wider the bounds of the target peak. Do not exceed 0.95, as you will capture everything as the peak. 0.75 is safe but quite conservative. 0.85 captures mostly what you expect but can pick up some noise.
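To make the --remove_background description concrete, here is a minimal sketch of what "draw a line at the base of the peak and count only the area above it" can look like. It is not the package's implementation: the function names, the choice of peak bounds as list indices, and the trapezoidal integration are all assumptions made for illustration.

```python
def trapezoid_area(x, y):
    """Integrate y over x with the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))


def peak_area(sizes, intensities, left, right, remove_background=False):
    """Area of the trace between indices left and right (inclusive).

    With remove_background=True, a straight line joining the trace at the two
    bounds is subtracted first, so only the area above that line is counted.
    (Hypothetical helper; the real tool detects the bounds itself.)
    """
    x = sizes[left:right + 1]
    y = intensities[left:right + 1]
    if remove_background:
        slope = (y[-1] - y[0]) / (x[-1] - x[0])
        baseline = [y[0] + slope * (xi - x[0]) for xi in x]
        # Clamp at zero so dips below the baseline do not count as negative area.
        y = [max(yi - bi, 0.0) for yi, bi in zip(y, baseline)]
    return trapezoid_area(x, y)


# Synthetic trace: a peak at 1700 nt sitting on a flat background of 10 units.
sizes = [1500, 1600, 1700, 1800, 1900]
trace = [10.0, 30.0, 60.0, 30.0, 10.0]

total = peak_area(sizes, trace, 0, 4)
above = peak_area(sizes, trace, 0, 4, remove_background=True)
print(total, above)
```

In this toy trace the baseline is flat, so the two areas differ by exactly the background rectangle. With real data the baseline depends on where the peak bounds land, which is one reason the result can be less repeatable.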
Examples
Example 1 - simple run without control peak and target peak between 1500 and 2000 nts
arrayed_degradation --data_dir_path data --min_peak 1500 --max_peak 2000 --disable_control_peak
This command runs the analysis on the input data in the data folder (a path relative to the directory from which you run the script; an absolute path works as well). The target peak is searched for within the 1500-2000 nucleotide window of the electropherograms. The script does not use a control peak for normalization.
Example 2 - run with (1) control peak, (2) target peak background removal and (3) narrower peaks
arrayed_degradation --data_dir_path data --min_peak 1500 --max_peak 2000 --remove_background --rel_height 0.7
The difference from Example 1 is that the script now searches for the control peak and normalizes the electropherogram traces with respect to it. Control peak bounds are not provided, so the script uses the default bounds (235, 310) for the control sequence. Additionally, the script removes background from the target peak by drawing a line at the base of the peak (keep in mind that this might not work well in certain cases, such as migrating peaks, so use this argument with caution and double-check in results.pdf that the areas are determined correctly). Finally, the script makes the peak bounds narrower because rel_height is decreased from the default of 0.85 to 0.7.
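The idea behind control peak normalization can be sketched as follows. This is a simplified illustration, not the package's code: it integrates the whole search window instead of detecting a peak inside it, and the helper names are hypothetical. The default windows are taken from the examples above.

```python
def window_area(sizes, intensities, lo, hi):
    """Trapezoidal area of the trace restricted to lo <= size <= hi."""
    pts = [(s, v) for s, v in zip(sizes, intensities) if lo <= s <= hi]
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def normalized_target_area(sizes, intensities, target=(1500, 2000), control=(235, 310)):
    """Target window area divided by control window area (hypothetical helper)."""
    control_area = window_area(sizes, intensities, *control)
    if control_area <= 0:
        raise ValueError("no control signal found in the control window")
    return window_area(sizes, intensities, *target) / control_area


sizes = [200, 250, 300, 350, 1400, 1600, 1800, 2000, 2200]
well_a = [0.0, 40.0, 40.0, 0.0, 0.0, 50.0, 100.0, 50.0, 0.0]
# The same sample loaded at twice the amount: every intensity doubles.
well_b = [2 * v for v in well_a]

ratio_a = normalized_target_area(sizes, well_a)
ratio_b = normalized_target_area(sizes, well_b)
print(ratio_a, ratio_b)
```

Both wells give the same ratio even though the raw intensities differ by a factor of two, which is the point of dividing by an analyte that does not degrade.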
Input data format
Directory structure
The script assumes a strict directory structure where Fragment Analyzer outputs and plate maps are placed next to each other in subfolders. The plate map also has a strict structure, encoding the identity of the molecule in each well, as well as the time interval at which it was degraded.
Here is an example of the expected directory structure for an experiment with 3 plates:
my_experiment_dir
├── my_plate_1
│   ├── epg.csv
│   └── plate_map.csv
├── my_plate_2
│   ├── epg.csv
│   └── plate_map.csv
└── my_plate_3
    ├── epg.csv
    └── plate_map.csv
In this example we have a separate directory for our experiment called my_experiment_dir (you can of course name it differently). Within that directory we have 3 directories, one for each plate used in the experiment (my_plate_1, my_plate_2, etc.; again, the names can be different). Each plate's directory must contain 2 files named exactly plate_map.csv and epg.csv.
Please do not create any additional directories within the experiment folder, as the script will break. Additional files (like the ones with results) are okay.
Plate maps do not have to be identical across plates, meaning that you don't have to keep each sequence-timepoint pair in the same location.
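Before running the analysis, it can help to check the layout yourself. The sketch below is a standalone pre-flight check and not part of the package; check_experiment_dir is a hypothetical helper that treats every subdirectory of the experiment folder as a plate and reports the ones missing a required file.

```python
from pathlib import Path

REQUIRED_FILES = ("epg.csv", "plate_map.csv")


def check_experiment_dir(data_dir):
    """Return a list of problems; an empty list means the layout looks fine."""
    root = Path(data_dir)
    problems = []
    # Every subdirectory is assumed to be a plate; loose files are ignored.
    plate_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    if not plate_dirs:
        problems.append(f"{root.name}: no plate directories found")
    for plate in plate_dirs:
        for name in REQUIRED_FILES:
            if not (plate / name).is_file():
                problems.append(f"{plate.name}: missing {name}")
    return problems


if __name__ == "__main__":
    import sys

    for problem in check_experiment_dir(sys.argv[1] if len(sys.argv) > 1 else "data"):
        print(problem)
```

An extra, non-plate directory inside the experiment folder shows up here as a plate with both files missing, which matches the warning above about not adding other directories.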
Input files
Two input files are required to read the data for a given plate:
- plate_map.csv: a CSV file containing the well positions of the samples being analyzed.

  Here is a simplified example of what a proper plate_map file should look like:

  001_TP0,002_TP0,003_TP0,001_TP180,002_TP180,003_TP180
  004_TP0,005_TP0,006_TP0,004_TP180,005_TP180,006_TP180

  In this example, well A3 holds a sample labeled 003_TP0, which represents the RNA with ID 003 degraded for 0 minutes. Well A6 holds a sample labeled 003_TP180, which is the same RNA degraded for 180 min. Sample labels must have the format {rna_id}_TP{timepoint}, where rna_id is your molecule identifier and timepoint is an integer or float giving the timepoint (in minutes by default; this can be changed to hours or days with the --time_unit argument). Do not add any additional suffixes, as the script might not recognize the RNA and timepoint properly.

  For each RNA and each technical replicate you must have a sample with timepoint 0. Replicates of the same RNA-timepoint pair within one plate or across plates are acceptable. There are no strict layout requirements or assumptions for plating your samples, meaning that you can order them as you wish. Plate map files do not need to be the same for all of your plates. However, it is advised to think of physically distinct plates as technical replicates (so exact copies of each other) and to design your experiment to follow this pattern.

  Missing wells within the plate map are acceptable, so if you don't want to include a specific RNA-timepoint pair on a particular plate in the analysis, just remove it from the plate map and leave the cell empty. Keep in mind, though, that you should not drop timepoint 0, as it might lead to improper results (timepoint 0 is used as the reference/baseline point for the remaining timepoints). If you really want to remove timepoint 0, please also remove all the other timepoints that are paired with it.

  Do not add any other labels to the plate map besides actual samples. If your plate map contains a ladder, please remove it and leave the cell empty.

  Do not add any row or column names to plate_map.csv; doing so shifts the well assignment and produces nonsense results as a consequence.

  The script will inform you in the logs how many unique RNA ids, replicates and timepoints have been detected. Please double-check that the numbers match your expectations.
- epg.csv: a Fragment Analyzer electropherogram file (the result of capillary electrophoresis) with intensity traces and the corresponding nucleotide sizing.

  Here is a simplified toy example of what that file should look like:

  Size (nt),"A1: SampA1","A2: SampA2","A3: SampA3"
  1.16,6.93,3.43,8.81
  1.89,9.54,4.29,12.33
  2.62,12.63,6.80,16.83
  ...

  The first column must always be Size (nt), indicating the nucleotide length obtained from fitting the ladder (which is done within the Fragment Analyzer software). The other column names must contain a well identifier like A3 or F12.
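The conventions above (row and column position giving the well, the {rna_id}_TP{timepoint} label format, and well identifiers embedded in epg.csv column names) can be sketched in a few lines of Python. This is an illustration of the file formats only, not the package's parser: the function names, the regular expressions, and the choice to convert every timepoint to minutes are assumptions.

```python
import csv
import io
import re
import string

LABEL_RE = re.compile(r"^(?P<rna_id>.+)_TP(?P<timepoint>\d+(?:\.\d+)?)$")
WELL_RE = re.compile(r"\b([A-P])(\d{1,2})\b")
MINUTES_PER_UNIT = {"m": 1.0, "h": 60.0, "d": 1440.0}


def parse_plate_map(text, time_unit="m"):
    """Map well IDs such as 'A3' to (rna_id, timepoint in minutes)."""
    wells = {}
    for row_idx, row in enumerate(csv.reader(io.StringIO(text))):
        for col_idx, cell in enumerate(row):
            label = cell.strip()
            if not label:
                continue  # empty cells are wells excluded from the analysis
            match = LABEL_RE.match(label)
            if match is None:
                raise ValueError(f"unrecognised label {label!r}")
            # First CSV row is plate row A, first CSV column is plate column 1.
            well = f"{string.ascii_uppercase[row_idx]}{col_idx + 1}"
            minutes = float(match["timepoint"]) * MINUTES_PER_UNIT[time_unit]
            wells[well] = (match["rna_id"], minutes)
    return wells


def well_from_column(column_name):
    """Extract a well ID like 'A3' from an epg.csv column name, or None."""
    match = WELL_RE.search(column_name)
    return f"{match[1]}{int(match[2])}" if match else None


plate_map = (
    "001_TP0,002_TP0,003_TP0,001_TP180,002_TP180,003_TP180\n"
    "004_TP0,005_TP0,006_TP0,004_TP180,005_TP180,006_TP180\n"
)
wells = parse_plate_map(plate_map)
print(wells["A3"], wells["A6"])
print(well_from_column("A1: SampA1"), well_from_column("Size (nt)"))
```

The sketch also shows why a header row or column in plate_map.csv is harmful: every label would shift by one row or column and be assigned to the wrong well.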
Output files
After the analysis is run you will be provided with two result files that will be placed in the same directory as your input files.
- result.csv: a CSV file containing a table with RNA ids, half-life, decay rate, standard deviations and other metrics.
- results.pdf: a PDF file containing plots with analysis details such as sample traces, decay curves and summary plots.
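To show how half-life and decay rate relate to each other, here is a minimal sketch that assumes first-order decay, where the intact fraction follows exp(-k * t) and the half-life is ln(2) / k. The model and the log-linear fit through timepoint 0 are assumptions made for this illustration; this document does not state which fitting procedure the package uses, and fit_half_life is a hypothetical name.

```python
import math


def fit_half_life(timepoints, areas):
    """Fit area(t) / area(0) = exp(-k * t); return (decay_rate, half_life)."""
    pairs = sorted(zip(timepoints, areas))
    t0, a0 = pairs[0]
    if t0 != 0 or a0 <= 0:
        raise ValueError("a positive timepoint-0 measurement is required")
    # Least squares through the origin on ln(a / a0) = -k * t.
    later = [(t, a) for t, a in pairs[1:] if a > 0]
    den = sum(t * t for t, _ in later)
    if den == 0:
        raise ValueError("need at least one later timepoint with signal")
    k = -sum(t * math.log(a / a0) for t, a in later) / den
    half_life = math.log(2) / k if k > 0 else math.inf
    return k, half_life


# Peak areas that halve every 60 minutes.
decay_rate, half_life = fit_half_life([0, 60, 120, 180], [100.0, 50.0, 25.0, 12.5])
print(decay_rate, half_life)
```

The result is in the same time unit as the timepoints, so timepoints given in minutes yield a half-life in minutes.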
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.