EnzymeStructuralFiltering
A structural filtering pipeline that uses docking and active-site heuristics to prioritize ML-predicted enzyme variants for experimental validation. The tool processes superimposed ligand poses and filters them using geometric criteria such as distances and angles, with optional esterase-specific filters and nucleophilic-proximity checks.
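The geometric criteria above can be sketched as a minimal distance/angle filter. The cutoff value, function names, and pose representation below are illustrative assumptions, not the pipeline's actual API:

```python
import numpy as np

# Illustrative cutoff only; the real pipeline derives its thresholds from
# the esterase/nucleophile heuristics, not from this value.
MAX_NUC_DISTANCE = 3.5  # Angstrom

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def angle(a, b, c):
    """Angle at vertex b (degrees) formed by points a-b-c."""
    u = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))

def passes_distance_filter(nucleophile_xyz, carbonyl_c_xyz):
    """Keep a pose if the catalytic nucleophile sits close to the carbonyl carbon."""
    return distance(nucleophile_xyz, carbonyl_c_xyz) <= MAX_NUC_DISTANCE
```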
🚀 Features
- Analysis of enzyme-ligand docking using multiple docking tools (ML- and physics-based).
- Optional esterase or nucleophile-focused analysis.
- User-friendly pipeline requiring only a .pkl file and ligand SMILES strings as input.
- Different parts of the pipeline can be run independently of each other.
📦 Installation
Option 1: Install via pip
```bash
pip install enzyme-filtering-pipeline
```
Option 2: Clone the repository
```bash
git clone https://github.com/HelenSchmid/EnzymeStructuralFiltering.git
cd EnzymeStructuralFiltering
pip install .
```
🌱 Environment Setup
Using conda
```bash
conda env create -f environment.yml
conda activate filterpipeline
```
🔧 Usage Example
```python
import pandas as pd

from filtering_pipeline.pipeline import Pipeline

df = pd.read_pickle("DEHP-MEHP.pkl").head(5)

pipeline = Pipeline(
    df=df,
    ligand_name="TPP",
    ligand_smiles="CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC",  # SMILES string of the ligand
    smarts_pattern='[$([CX3](=O)[OX2H0][#6])]',  # SMARTS pattern of the chemical moiety of interest
    max_matches=1000,
    esterase=1,
    find_closest_nuc=1,
    num_threads=1,
    squidly_dir='filtering_pipeline/squidly_final_models/',
    base_output_dir="pipeline_output",
)
pipeline.run()
```
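The pipeline takes a pickled pandas DataFrame as input. A minimal sketch of writing one follows; the column names here are assumptions for illustration, so check the bundled DEHP-MEHP.pkl for the exact schema the pipeline expects:

```python
import pandas as pd

# Column names are assumptions for illustration; inspect the example
# DEHP-MEHP.pkl shipped with the repository for the real schema.
df = pd.DataFrame({
    "id": ["variant_1", "variant_2"],
    "sequence": ["MKTAYIAK", "MKAAYIAK"],  # toy sequences
})
df.to_pickle("my_variants.pkl")

# Round-trip check: the pipeline reads the same file back with read_pickle
df_back = pd.read_pickle("my_variants.pkl")
```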
Running the pipeline on multiple ligands at once
You can run the filtering pipeline for multiple ligands by using a simple Bash script that passes ligand names and their SMILES strings to a Python runner script.
```bash
#!/bin/bash
# Define ligands and their SMILES representations
declare -A LIGANDS
LIGANDS["tri_2_chloroethylPi"]="C(CCl)OP(=O)(OCCCl)OCCCl"
LIGANDS["DEHP"]="CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC"
LIGANDS["TPP"]="C1=CC=C(C=C1)OP(=O)(OC2=CC=CC=C2)OC3=CC=CC=C3"

# Create logs directory
mkdir -p logs

# Loop over each ligand and run the pipeline
for name in "${!LIGANDS[@]}"; do
    echo "Running for $name..."
    python benchmark_filtering_on_exp_tested_variants_run.py "$name" "${LIGANDS[$name]}" \
        2> "logs/${name}.err" \
        1> "logs/${name}.out"
    echo "Finished $name. Logs saved to logs/${name}.out and logs/${name}.err"
done
```
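If you prefer to stay in Python, the same loop can be sketched with `subprocess`. The script name and log layout mirror the Bash version above; `build_command` is a helper introduced here for illustration:

```python
import subprocess
from pathlib import Path

# Same ligands as the Bash example above
LIGANDS = {
    "DEHP": "CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC",
    "TPP": "C1=CC=C(C=C1)OP(=O)(OC2=CC=CC=C2)OC3=CC=CC=C3",
}

def build_command(name, smiles):
    """Assemble the runner invocation for one ligand."""
    return ["python", "benchmark_filtering_on_exp_tested_variants_run.py", name, smiles]

if __name__ == "__main__":
    Path("logs").mkdir(exist_ok=True)
    for name, smiles in LIGANDS.items():
        # Redirect stdout/stderr per ligand, as the Bash script does
        with open(f"logs/{name}.out", "w") as out, open(f"logs/{name}.err", "w") as err:
            subprocess.run(build_command(name, smiles), stdout=out, stderr=err)
```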
Each run invokes benchmark_filtering_on_exp_tested_variants_run.py, which looks like:
```python
import argparse

import pandas as pd

from filtering_pipeline.pipeline import Pipeline

# SMARTS patterns defining the substructure of interest per ligand
SMARTS_MAP = {
    "TPP": "[P](=O)(O)(O)",
    "DEHP": "[C](=O)[O][C]",
    "Monuron": "Cl",
}

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("ligand_name", type=str, help="Ligand name (e.g. TPP)")
    parser.add_argument("ligand_smiles", type=str, help="SMILES string of the ligand")
    return parser.parse_args()

def main():
    args = parse_args()
    smarts_pattern = SMARTS_MAP.get(args.ligand_name)  # None if the ligand is not in the map

    pipeline = Pipeline(
        df=pd.read_pickle("examples/DEHP-MEHP.pkl").head(2),
        ligand_name=args.ligand_name,
        ligand_smiles=args.ligand_smiles,
        smarts_pattern=smarts_pattern,
        max_matches=5000,
        find_closest_nuc=1,
        num_threads=1,
        squidly_dir="filtering_pipeline/squidly_final_models/",
        base_output_dir=f"pipeline_output_{args.ligand_name}",
    )
    pipeline.run()

if __name__ == "__main__":
    main()
```
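Note that `SMARTS_MAP.get` returns `None` for ligands missing from the map, which silently passes no pattern to the pipeline. One way to guard against this is an explicit `--smarts` override; the flag and the `resolve_smarts` helper below are illustrative additions, not part of the package:

```python
import argparse

# Subset of the map above, for illustration
SMARTS_MAP = {
    "TPP": "[P](=O)(O)(O)",
    "DEHP": "[C](=O)[O][C]",
}

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("ligand_name", type=str)
    parser.add_argument("ligand_smiles", type=str)
    parser.add_argument("--smarts", default=None,
                        help="Explicit SMARTS pattern, overriding SMARTS_MAP")
    return parser.parse_args(argv)

def resolve_smarts(args):
    """Fail loudly instead of passing smarts_pattern=None to the pipeline."""
    pattern = args.smarts or SMARTS_MAP.get(args.ligand_name)
    if pattern is None:
        raise SystemExit(f"No SMARTS pattern known for {args.ligand_name}; pass --smarts")
    return pattern
```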
File details
Details for the file enzyme_filtering_pipeline-0.0.5.tar.gz.
- Size: 38.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.8

| Algorithm | Hash digest |
|---|---|
| SHA256 | `df6f0ee2bfe535d06448a3ac7abb545b61e7d84394751e7d4fb5dbe0102b74a7` |
| MD5 | `5a0ce1b8306bbb18a976d0bebfb14828` |
| BLAKE2b-256 | `d15add969d8388db830ae5af7167c9179153ed70089bcd30004f6ec9fbcadc50` |
File details
Details for the file enzyme_filtering_pipeline-0.0.5-py3-none-any.whl.
- Size: 52.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.8

| Algorithm | Hash digest |
|---|---|
| SHA256 | `f72b88086225e8fdbdda85e6ac71cedbdb43db57c09e01963e19dbd3a461414f` |
| MD5 | `176c1fa59db7c6b79a3d0f2e80558c52` |
| BLAKE2b-256 | `8a919db30f50387654d7e1ca32b060ac0db5ed1980877afd214c40be3073562a` |