mEdit
What is mEdit?
Program Structure
Features
- Reference Human Genome
  - mEdit uses the RefSeq human genome reference GRCh38.p14.
  - Alternatively, the user can provide a custom human assembly. [See db_set for details]
- Alternative Genomes
  - mEdit can work with alternative genomes, which are compared to the reference assembly.
  - Pangenomes made public by the HPRC are built into mEdit and can be included in the analysis in 'standard' mode.
- Flexible editing tool selection
  - Several endonucleases and base editors are built into mEdit and can be requested in any combination. [See options in guide_prediction]
  - Custom editing tools can also be ingested by mEdit. [See how to format custom editors in guide_prediction]
Getting Started
Prerequisites
PIP
- Make sure `gcc` is installed:
  sudo apt install gcc
- Also make sure your pip is up to date:
  python -m pip install --upgrade pip
- or:
  apt install python3-pip
Anaconda
mEdit utilizes Anaconda to build its own environments under the hood. In the example below, we assume a Linux x86_64 system. For other systems, follow the instructions on this page.
- Install Miniconda:
  - Download and run the installer:
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash ~/Miniconda3-latest-Linux-x86_64.sh
  - Set up channel priority and update conda:
    conda update --all
    conda config --set channel_priority strict
Mamba
- The officially supported way of installing Mamba is through Miniforge.
- The Miniforge repository holds the minimal installers for Conda and Mamba specific to conda-forge.
- Example install:
  wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
  bash Miniforge3-$(uname)-$(uname -m).sh
- Important warning:
  - The supported way of using Mamba requires that no other packages are installed on the `base` conda environment.
- Additional information on how to operate Mamba:
Installation
- mEdit is compatible with UNIX-based systems running on Intel processors and is available via PyPI:
pip install meditability
Running Tests
- As a Snakemake-based application, mEdit supports dry runs.
- A dry run checks that the supporting data and the I/O required by each process are present, without actually executing the run.
- All mEdit programs can be called with the `--dry` option.
Usage
- To obtain information on how to run mEdit and view its programs, execute it with the `--help` flag:
medit --help
| Command | Description |
|---|---|
| db_set | Set up the necessary background data to run mEdit. |
| list | Prints the current set of editors available on mEdit. |
| guide_prediction | The core mEdit program. Finds potential guides for the variants specified in the input by searching a diverse set of editors. |
| offtarget | Predicts off-target effects for the guides found. |
- There are four programs available in the current version:
  - db_set: Sets up the necessary background data to run mEdit. This downloads ~7GB of data.
  - list: Prints the current set of editors available on mEdit.
  - guide_prediction: Scans for potential guides for the variants specified in the input by searching a diverse set of editors.
  - offtarget: Predicts off-target effects for the guides found.
1. Database Setup
- Database Setup retrieves the information and datasets required to run mEdit. The contents include the reference human genome, HPRC pangenome VCF files, RefSeq, MANE, ClinVar, and more. See the database structure below.
mEdit db_set [-h] [-d DB_PATH] [-l] [-c CUSTOM_REFERENCE] [-t THREADS]
Parameters:
Reference Database Pre-Processing
| Argument | Description |
|---|---|
| -d DB_PATH | Path where the mEdit_database directory will be created ahead of the analysis. Requires ~7.5GB of on-disk storage. [default: ./mEdit_database] |
| -c CUSTOM_REFERENCE | Path to a custom human reference genome in FASTA format. Chromosome annotation must follow a >chrN format (case sensitive). |
| -t THREADS | Number of cores to use for parallel decompression of mEdit databases. |
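The `>chrN` header requirement for a custom reference can be checked before running db_set. The sketch below is not part of mEdit: the helper name `find_bad_headers` is hypothetical, and the accepted chromosome names (1-22, X, Y, M) are an assumption, since the table above only states the `>chrN` pattern.

```python
import re
from typing import Iterable, List

# Assumption: "N" covers autosomes 1-22 plus X, Y and M.
# mEdit's exact rule is not documented in this README.
CHR_HEADER = re.compile(r"^>chr([1-9]|1[0-9]|2[0-2]|X|Y|M)(\s.*)?$")


def find_bad_headers(lines: Iterable[str]) -> List[str]:
    """Return FASTA header lines that do not match the case-sensitive >chrN pattern."""
    bad = []
    for line in lines:
        line = line.rstrip("\n")
        # Only header lines (starting with '>') are checked; sequence lines are skipped.
        if line.startswith(">") and not CHR_HEADER.match(line):
            bad.append(line)
    return bad


example = [">chr1", "ACGT", ">Chr2", "ACGT", ">chrX extra description", "ACGT", ">2", "ACGT"]
print(find_bad_headers(example))
```

For a real file, the same function can be fed an open file handle, for example `find_bad_headers(open(path))`, because it only iterates over lines.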
2. Editor List
- The current version of mEdit stores 24 endonuclease editors and 29 base editors. The list program prints both sets along with the parameters used for guide prediction.
mEdit list [-h] [-d DB_PATH]
Parameters:
Available Editors and Base Editors (BEs)
| Argument | Description |
|---|---|
| -d DB_PATH | Path to the mEdit_database directory created using the db_set program. [default: ./mEdit_database] |

Output:
Available endonuclease editors:
-----------------------------
name: spCas9
pam, pam_is_first: NGG, False
guide_len: 20
dsb_position: -3
notes: requirements work for SpCas9-HF1, eSpCas9 1.1,spyCas9
5'-xxxxxxxxxxxxxxxxxxxxNGG-3'
-----------------------------
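The fields in an editor entry (`pam`, `pam_is_first`, `guide_len`, `dsb_position`) describe where a guide sits relative to its PAM. As a rough illustration only, and not mEdit's actual search code, the sketch below scans the forward strand of a sequence for spCas9-style sites where the PAM follows the guide. The function name `find_guides`, the returned field names, and the reading of `dsb_position` as an offset from the first PAM base are assumptions.

```python
import re
from typing import Dict, List

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]",
    "M": "[AC]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
}


def find_guides(seq: str, pam: str = "NGG", guide_len: int = 20,
                dsb_position: int = -3) -> List[Dict]:
    """Forward-strand search for guides followed by a PAM (pam_is_first = False)."""
    pam_re = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})({pam_re}))")
    hits = []
    for match in pattern.finditer(seq.upper()):
        pam_start = match.start() + guide_len
        hits.append({
            "guide": match.group(1),
            "pam": match.group(2),
            "pam_start": pam_start,                 # 0-based index of the first PAM base
            "cut_index": pam_start + dsb_position,  # the cut falls just before this index
        })
    return hits


demo = "TT" + "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
for hit in find_guides(demo):
    print(hit)
```

A complete search would also scan the reverse complement and handle editors whose PAM precedes the guide; both are left out here to keep the example short.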
3. Guide Prediction
- `guide_prediction` is the main program for finding guides given a list of variants. A query can be a pathogenic variant looked up in the ClinVar database or a de novo variant (de novo variants must be provided as genomic coordinates; see the `--qtype` option).
- mEdit first generates variant-incorporated gRNAs using the reference human genome. If the user chooses "fast", the search ends with the human reference genome. If the user chooses "standard" or "vcf", mEdit also goes on to predict the impact of alternative genomic variants taken from either the pangenome or the user-provided VCF file.
mEdit guide_prediction [-h] -i QUERY_INPUT [-o OUTPUT] [-d DB_PATH] [-j JOBTAG] [-m {fast,standard,vcf}] [-v CUSTOM_VCF] [--qtype {hgvs,coord}] [--editor EDITOR_REQUEST]
[--be BE_REQUEST] [--cutdist CUTDIST] [--dry] [--pam PAM] [--guidelen GUIDE_LENGTH] [--pamisfirst] [--dsb_pos DSB_POSITION]
[--edit_win EDITING_WINDOW] [--target_base {A,C,G,T}] [--result_base {A,C,G,T}] [--cluster] [-p PARALLEL_PROCESSES] [--ncores NCORES]
[--maxtime MAXTIME]
Parameters:
Input/Output Options
| Argument | Description |
|---|---|
| -i QUERY_INPUT | Path to a plain text file containing the query (or set of queries) for mEdit analysis. See --qtype for formatting options. |
| -o OUTPUT | Path to root directory where mEdit outputs will be stored. [default: mEdit_analysis_<jobtag>/] |
| -d DB_PATH | Path to the mEdit_database directory created using the db_set program. [default: ./mEdit_database] |
| -j JOBTAG | Tag associated with the current mEdit job. A random jobtag is generated by default. |
mEdit Core Parameters
| Argument | Description |
|---|---|
| -m {fast,standard,vcf} | Mode option determining how mEdit runs: fast - Uses one reference genome. standard - Uses a reference genome and pangenomes. vcf - Requires a custom VCF file. [default: standard] |
| -v CUSTOM_VCF | Path to a gzip-compressed VCF file for vcf mode. |
| --qtype {hgvs,coord} | Query type: hgvs - Uses RefSeq ID + HGVS nomenclature. coord - Uses hg38 1-based coordinates. [default: hgvs] |
| --editor EDITOR_REQUEST | Specifies the set of editors: clinical - Uses clinically relevant editors. custom - Requires --pam, --pamisfirst, --guidelen, --dsb_pos. |
| --be BE_REQUEST | Enables base editors: off - Disables base editor search. default - Uses ABE & CBE with NGG PAM and 4-8bp editing window. custom - Requires --pam, --guidelen, --edit_win, --target_base, --result_base. |
| --cutdist CUTDIST | Maximum distance between the variant start position and the editor cut site. (Not available for base editors). [default: 7] |
| --dry | Perform a dry run of mEdit. |
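When mEdit is driven from a script, it can help to assemble the command line in one place and validate the mode before launching anything. The sketch below is a hypothetical wrapper, not part of mEdit: the function name and the file name `queries.txt` are made up, and only flags listed in the usage line above are emitted. The executable is spelled `mEdit` as in that usage line; the help example in this README uses `medit`, so adjust the `executable` argument to whichever your installation provides.

```python
from typing import List, Optional

MODES = ("fast", "standard", "vcf")
QTYPES = ("hgvs", "coord")


def build_guide_prediction_cmd(query_input: str,
                               mode: str = "standard",
                               qtype: str = "hgvs",
                               custom_vcf: Optional[str] = None,
                               db_path: Optional[str] = None,
                               jobtag: Optional[str] = None,
                               dry: bool = False,
                               executable: str = "mEdit") -> List[str]:
    """Build an argument list for guide_prediction using only documented flags."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    if qtype not in QTYPES:
        raise ValueError(f"qtype must be one of {QTYPES}")
    # The table above states that vcf mode requires a custom VCF file.
    if mode == "vcf" and custom_vcf is None:
        raise ValueError("vcf mode requires a custom VCF file (-v)")

    cmd = [executable, "guide_prediction", "-i", query_input, "-m", mode, "--qtype", qtype]
    if custom_vcf is not None:
        cmd += ["-v", custom_vcf]
    if db_path is not None:
        cmd += ["-d", db_path]
    if jobtag is not None:
        cmd += ["-j", jobtag]
    if dry:
        cmd.append("--dry")
    return cmd


# Print a dry-run command for inspection; nothing is executed here.
print(" ".join(build_guide_prediction_cmd("queries.txt", mode="fast", dry=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the printed command looks right.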
Custom Editor Options
| Argument | Description |
|---|---|
| --pam PAM | Specifies the PAM sequence for custom guide or base editor searches. |
| --guidelen GUIDE_LENGTH | Guide sequence length for custom endonuclease/base editor searches. |
| --pamisfirst | Indicates if the PAM is before the guide sequence. |
| --dsb_pos DSB_POSITION | Double-strand cut site relative to PAM. Example: -3 for spCas9, 18,22 for Cas12. |
| --edit_win EDITING_WINDOW | Specifies editing window size (two comma-separated integers). Example: "4,8" for CBE. |
| --target_base {A,C,G,T} | Specifies the target base for base editor modification (e.g., "A" for ABE). |
| --result_base {A,C,G,T} | Specifies the base that the target base will be converted to (e.g., "G" for ABE). |
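To make the base editor options concrete, the sketch below shows one way an editing window such as "4,8" could be interpreted for a given guide. It is an illustration under stated assumptions, not mEdit's implementation: positions are assumed to be 1-based, counted from the PAM-distal end of the guide, with both window bounds inclusive, and every target base inside the window is assumed to be converted. All function names are hypothetical.

```python
from typing import List, Tuple


def parse_window(edit_win: str) -> Tuple[int, int]:
    """Parse two comma-separated integers such as "4,8" into (start, end)."""
    start, end = (int(part) for part in edit_win.split(","))
    if not 1 <= start <= end:
        raise ValueError(f"invalid editing window: {edit_win!r}")
    return start, end


def editable_positions(guide: str, edit_win: str, target_base: str) -> List[int]:
    """Return 1-based guide positions inside the window that hold the target base."""
    start, end = parse_window(edit_win)
    return [
        pos for pos, base in enumerate(guide.upper(), start=1)
        if start <= pos <= end and base == target_base.upper()
    ]


def apply_edit(guide: str, edit_win: str, target_base: str, result_base: str) -> str:
    """Convert every target base inside the window to the result base (simplification)."""
    positions = set(editable_positions(guide, edit_win, target_base))
    return "".join(
        result_base.upper() if pos in positions else base
        for pos, base in enumerate(guide.upper(), start=1)
    )


guide = "GACCATCGAAGCTTGGACTA"
print(editable_positions(guide, "4,8", "C"))
print(apply_edit(guide, "4,8", "C", "T"))
```

Converting every target base in the window also models bystander edits, which is why the second call changes both cytosines in positions 4 to 8 rather than only one.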
SLURM Options
| Argument | Description |
|---|---|
| --cluster | Request job submission through SLURM. [default: None] |
| -p PARALLEL_PROCESSES | Number of parallel processes for SLURM or local machine parallelization. [default: 1] |
| --ncores NCORES | Number of cores for each parallel process. [default: 2] |
| --maxtime MAXTIME | Maximum allowed time per parallel job. Format: H:MM:SS. Example: "2:00:00" for 2 hours. [default: 1:00:00] |
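A malformed `--maxtime` value is easy to catch before a job is submitted. The following sketch validates the H:MM:SS format described above and converts it to a `timedelta`. The helper name `parse_maxtime` is hypothetical, and allowing more than one digit for the hours field is an assumption.

```python
import re
from datetime import timedelta

# Hours: one or more digits (assumption); minutes and seconds: 00-59.
MAXTIME_RE = re.compile(r"^(\d+):([0-5]\d):([0-5]\d)$")


def parse_maxtime(value: str) -> timedelta:
    """Convert an H:MM:SS string into a timedelta, rejecting malformed input."""
    match = MAXTIME_RE.match(value)
    if match is None:
        raise ValueError(f"expected H:MM:SS, got {value!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


print(parse_maxtime("2:00:00").total_seconds())
```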
4. Off-target Prediction
- The `offtarget` program applies Guidescan2 to the guides found in guide_prediction and reports a summarized data set including the CFD score among other metrics.
mEdit offtarget [-h] [--dry] [-o OUTPUT] [-d DB_PATH] -j JOBTAG [--select_editors SELECT_EDITORS] [--dna_bulge DNA_BULGE] [--rna_bulge RNA_BULGE] [--max_mismatch MAX_MISMATCH] [--cluster] [-p PARALLEL_PROCESSES] [--ncores NCORES] [--maxtime MAXTIME]
Parameters:
Input/Output Options
| Argument | Description |
|---|---|
| -o OUTPUT | Path to the root directory where mEdit guide_prediction outputs were stored. "mEdit offtarget" cannot operate if this path is incorrect. [default: mEdit_analysis_<jobtag>/] |
| -d DB_PATH | Path to the mEdit_database directory created using the db_set program. [default: ./mEdit_database] |
| -j JOBTAG | Tag associated with the "mEdit guide_prediction" job. "mEdit offtarget" will use the OUTPUT option to access this JOBTAG. |
| --select_editors SELECT_EDITORS | Comma-separated list of editors to be analyzed for off-target effects. [default: all] |
| --dna_bulge DNA_BULGE | Sets the number of insertions in the off-target sequence. [default: 0] |
| --rna_bulge RNA_BULGE | Sets the number of deletions in the off-target sequence. [default: 0] |
| --max_mismatch MAX_MISMATCH | Maximum allowable number of mismatches in off-target analysis. [default: 3] |
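The effect of `--max_mismatch` can be pictured with a naive sliding-window comparison. The sketch below is only a conceptual illustration: Guidescan2 uses an indexed search rather than this brute-force loop, and the example ignores bulges (the equivalent of `--dna_bulge 0 --rna_bulge 0`), PAM checks, and the reverse strand. The function names and the toy sequences are hypothetical.

```python
from typing import List, Tuple


def count_mismatches(guide: str, site: str) -> int:
    """Count positions where two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))


def candidate_sites(guide: str, genome: str, max_mismatch: int = 3) -> List[Tuple[int, str, int]]:
    """Return (0-based start, site, mismatches) for every window within the mismatch limit."""
    size = len(guide)
    hits = []
    for start in range(len(genome) - size + 1):
        site = genome[start:start + size]
        mismatches = count_mismatches(guide, site)
        if mismatches <= max_mismatch:
            hits.append((start, site, mismatches))
    return hits


# Toy 8-nt guide for brevity; real guides are longer (20 nt for spCas9).
toy_genome = "GGGG" + "AAAACCCC" + "GGGG" + "AAATCCCC" + "GGGG"
for hit in candidate_sites("AAAACCCC", toy_genome, max_mismatch=1):
    print(hit)
```

Raising `max_mismatch` widens the set of reported sites quickly, which is one reason real tools rely on indexed search instead of scanning every window.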
SLURM Options
| Argument | Description |
|---|---|
| --cluster | Request job submission through SLURM. [default: None] |
| -p PARALLEL_PROCESSES | Number of parallel processes for SLURM or local machine parallelization. [default: 1] |
| --ncores NCORES | Number of cores for each parallel process. [default: 2] |
| --maxtime MAXTIME | Maximum allowed time per parallel job. Format: H:MM:SS. Example: "2:00:00" for 2 hours. [default: 1:00:00] |
License
Copyright ©20xx [see Other Notes, below]. The Regents of the University of California (Regents). All Rights Reserved. Permission to use, copy, modify, and distribute this software and its documentation for educational, research, and not-for-profit purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice, this paragraph and the following two paragraphs appear in all copies, modifications, and distributions. Contact The Office of Technology Licensing, UC Berkeley, 2150 Shattuck Avenue, Suite 408, Berkeley, CA 94704-1362, otl@berkeley.edu, for commercial licensing opportunities.
[Optional: Created by John Smith and Mary Doe, Department of Statistics, University of California, Berkeley.]
IN NO EVENT SHALL REGENTS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF REGENTS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
REGENTS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY, PROVIDED HEREUNDER IS PROVIDED "AS IS". REGENTS HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
FAQ
Cite us
Contact