Docko

Docking for ligands: a disgusting mix of code combining the latest from Chai with old-school bro vina.

Made this for myself but others wanted to use it. Love only pls. Take it as it is <3 but if you notice bugs, please submit an issue, be a g.

Install

Make sure you have vina installed: https://autodock-vina.readthedocs.io/en/latest/installation.html

I have not found it to work when installed with pip; it needs the executable.

Works on mac and linux; you need big power tho for Chai, so would rec linux.
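
Since it is the executable that matters, a quick check that `vina` is visible on your PATH can save some confusion later. This is a minimal sketch using only the Python standard library; `find_executable` is a made-up helper name and not part of docko.

```python
import shutil


def find_executable(name='vina'):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == '__main__':
    path = find_executable('vina')
    if path is None:
        print('vina not found on PATH, install the executable first')
    else:
        print(f'vina found at {path}')
```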

conda create --name docko python=3.10.14 -y
conda activate docko
conda install -c conda-forge pdbfixer -y
conda config --env --add channels conda-forge
pip install git+https://github.com/chaidiscovery/chai-lab.git

Install docko now:

conda activate docko
pip install docko

Lucky last since vina is a b

You need to make a second environment just to prepare the ligand, I came across this issue when making all my stuff.

conda create --name vina python=3.9.7 -y
conda activate vina
conda install -c conda-forge numpy openbabel scipy rdkit -y
pip install meeko
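
One way to use the second environment without switching back and forth is `conda run`, which runs a command inside a named environment. The sketch below only builds (and optionally runs) such a command for meeko's `mk_prepare_ligand.py` script. The helper names and file names are placeholders, and this is not necessarily how docko does it internally.

```python
import subprocess


def ligand_prep_command(sdf_path, pdbqt_path, env_name='vina'):
    """Build a command that runs meeko's ligand preparation inside a conda env."""
    return ['conda', 'run', '-n', env_name,
            'mk_prepare_ligand.py', '-i', sdf_path, '-o', pdbqt_path]


def prepare_ligand(sdf_path, pdbqt_path, env_name='vina'):
    """Run the command and raise CalledProcessError if it fails."""
    subprocess.run(ligand_prep_command(sdf_path, pdbqt_path, env_name), check=True)
```

Calling `prepare_ligand('DEHP.sdf', 'DEHP.pdbqt')` would then run the conversion in the `vina` environment created above, assuming you already have the ligand as an sdf file.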

Quick start

Use case 1: you have a sequence and you want to bind it

Here your best bet is using Chai; this will automatically handle everything for you:

Example:

from docko import *

base_dir = 'some_folder' # A folder on your computer

run_chai('A0A0E3LLD2_METBA', # name
         'MSIEKIPGYTYGKTESMSPLNLEDLKLLKDSVMFTEEDEKYLKKAGEVLEDQVEEILDTWYGFVGSHPHLLYYFTSPDGTPNEEYLAAVRKRFSKWILDTCNRNYDQAWLDYQYEIGLRHHRTKKNRTDNVESVPNINYRYLVAFIYPITATIKPFLARKGHTSEEVEKMHQAWFKATVLQVALWSYPYVKQGDF', # sequence
         'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC', # ligand as smiles
         base_dir
        )

The outputs will now be in base_dir.

Say you have a csv of these and you want to make bound structures for all of them:

run_chai_df(output_dir, filename, entry_column='Entry', seq_column='Sequence', ligand_column='Substrate')

This runs Chai on a csv (filename) that contains your sequences (Sequence), ligands (Substrate) and entry names (Entry). It makes a new folder for each row using the entry name (so ideally you don't want dumb characters in there) and puts all these new folders in output_dir.
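
If you want to be safe about the entry names, you can clean them before writing the csv. A minimal sketch with the standard library: `safe_entry_name` and `write_chai_csv` are hypothetical helpers (not part of docko), and the column names just match the ones passed to run_chai_df above.

```python
import csv
import re


def safe_entry_name(name):
    """Replace anything that is not a letter, digit, underscore or hyphen with '_'."""
    return re.sub(r'[^A-Za-z0-9_-]', '_', name)


def write_chai_csv(rows, filename):
    """Write (entry, sequence, smiles) tuples with Entry/Sequence/Substrate columns."""
    with open(filename, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['Entry', 'Sequence', 'Substrate'])
        for entry, sequence, smiles in rows:
            writer.writerow([safe_entry_name(entry), sequence, smiles])
```

The resulting file can then be passed as filename to run_chai_df.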

Use case 2: you have a uniprot ID and you want to get the structure and bind a ligand with vina

Here you got told "oh wow physics informed models are the best, I don't trust ML!" (this will typically arise from someone over the age of 40). To humour them you can also run vina; you'll need to have it installed.

The smiles is your ligand as smiles, and base_dir is where you want your data to be output. Note that since we are passing protein_name='A0A0H2V871', which is a uniprot ID, it will automatically get the structure for us. If we weren't, we would need to pre-download the PDB structure, or fold it using an online server such as AF3 or Chai (you could run Chai and then remove the ligand as well - my fave option).

from docko import *

smiles = 'CCCCC(CC)COC(=O)C1=CC=CC=C1C(=O)OCC(CC)CCCC'

base_dir = 'some_folder' # A folder on your computer

dock(sequence='', 
    protein_name='A0A0H2V871', # Or the name/path of the file on your computer as pdb or cif.
    smiles=smiles, 
    ligand_name='DEHP', # name of your chemical no funny characters, the ligand will be made in a folder named this
    residues=[113, 114], # Residues of your active site, we position the ligand within here (I find the centroid of these guys)
    protein_dir=f'{base_dir}/', # Folder to save the proteins to
    ligand_dir=f'{base_dir}/', # Folder to save the input ligand to
    output_dir=f'{base_dir}/', # output folder with the docked ligand and config file
    pH=7.4, # pH to run docking at
    method='vina', # method can be vina, ad4, or diffdock
    size_x=5.0, # How far in x is allowed, think of this as a cloud around your residues or residue centroid
    size_y=5.0, 
    size_z=5.0,
    num_modes=9, # Number of binding modes to generate, 9 is vina's default
    exhaustivenes=32 ) # higher is better but slower, this is a default

# Just checks the output was logged --> this has your "energy data" about how good the docking was
import os
os.path.isfile(os.path.join(base_dir, 'A0A0H2V871-DEHP_log.txt'))
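
If you want the numbers rather than just confirmation that the file exists, vina prints a results table with one row per binding mode (mode, affinity in kcal/mol, and two RMSD columns). The sketch below pulls the affinities out of text shaped like that table. That the docko log contains vina's standard table is an assumption here, and `parse_vina_affinities` is a hypothetical helper, not part of docko.

```python
import re

# Matches result rows such as "   1       -7.5      0.000      0.000"
ROW = re.compile(
    r'^\s*(\d+)\s+(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*$'
)


def parse_vina_affinities(log_text):
    """Return a list of (mode, affinity_kcal_per_mol) tuples found in the text."""
    results = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            results.append((int(match.group(1)), float(match.group(2))))
    return results
```

More negative affinities mean a better predicted binding energy, so the first row is usually the one you care about.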

If you wanted to run it on a file you already have, just change protein_name to the path of your downloaded PDB file, e.g.:

protein_name=f'{base_dir}/data/test_existing.pdb',

This will then make the name of your directory test_existing and save the results in there. I guess just again don't have funny characters in your filename.
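
For the curious, the residues and size_x/size_y/size_z arguments boil down to a box centred on your active site. Here is a minimal sketch of the centroid part, assuming a standard fixed-column PDB file and using only the alpha carbons. `residue_centroid` is a hypothetical helper and docko's own calculation may differ (for example it may use all atoms).

```python
def residue_centroid(pdb_lines, residues, chain=None):
    """Average the CA coordinates of the given residue numbers."""
    wanted = set(residues)
    coords = []
    for line in pdb_lines:
        if not line.startswith('ATOM'):
            continue
        if line[12:16].strip() != 'CA':  # atom name, columns 13-16
            continue
        if chain is not None and line[21] != chain:  # chain ID, column 22
            continue
        if int(line[22:26]) in wanted:  # residue number, columns 23-26
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    if not coords:
        raise ValueError('none of the requested residues were found')
    n = len(coords)
    return tuple(sum(axis) / n for axis in zip(*coords))
```

With `open('protein.pdb')` as the source of lines (a placeholder file name), `residue_centroid(handle, [113, 114])` gives the point the box would be centred on under these assumptions.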

As above, you can also run with method='ad4'. It makes all these other random files, and again was something that someone asked me to do. It was seriously painful and I don't wish it on anyone else, so I have made it available. Basically it uses some rando forcefield that in some cases makes vina dock better. Who knows. LMK if you have an opinion.

Use case 3: you want to use diffdock and use up all the space on your computer

dock(sequence='', 
    protein_name='A0A0H2V871', # Or the name/path of the file on your computer as pdb or cif.
    smiles=smiles, 
    ligand_name='DEHP', # name of your chemical no funny characters, the ligand will be made in a folder named this
    residues=[113, 114], # Residues of your active site, we position the ligand within here (I find the centroid of these guys)
    protein_dir=f'{base_dir}/', # Folder to save the proteins to
    ligand_dir=f'{base_dir}/', # Folder to save the input ligand to
    output_dir=f'{base_dir}/', # output folder with the docked ligand and config file
    method='diffdock', # As above just change to diffdock
    )

Basically exactly as above, except you need to specify the method is diffdock. Note you need to have TRILL installed for this to work:

micromamba create -n TRILL python=3.10 ; micromamba activate TRILL
micromamba install -c pytorch -c nvidia pytorch=2.1.2 pytorch-cuda=12.1 torchdata
micromamba install -c conda-forge openbabel pdbfixer swig openmm smina fpocket vina openff-toolkit openmmforcefields setuptools=69.5.1
micromamba install -c bioconda foldseek pyrsistent
micromamba install -c "dglteam/label/cu121" dgl
micromamba install -c pyg pyg pytorch-cluster pytorch-sparse pytorch-scatter
pip install git+https://github.com/martinez-zacharya/lightdock.git@03a8bc4888c0ff8c98b7f0df4b3c671e3dbf3b1f git+https://github.com/martinez-zacharya/ECPICK.git setuptools==69.5.1
pip install trill-proteins

Other info

PDB or structure

You need to select your structure from PDB or, in lieu of that, use the alphafold3 server (https://alphafoldserver.com/).

Alternatively, if your IDs are PDB IDs or Uniprot IDs you can just pass those and it will get the structures for you.
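
If you are not sure which kind of identifier you are holding, the formats are easy to tell apart. This is a minimal sketch, not docko's own logic: `classify_protein_name` is a hypothetical helper, the UniProt pattern is the accession format published in the UniProt help pages, and the PDB pattern covers the classic four-character IDs only.

```python
import re

# UniProt accession format (6 or 10 characters)
UNIPROT = re.compile(
    r'^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$'
)
# Classic PDB IDs: a digit followed by three alphanumeric characters
PDB = re.compile(r'^[0-9][A-Za-z0-9]{3}$')


def classify_protein_name(name):
    """Return 'file', 'pdb', 'uniprot' or 'unknown' for a protein_name value."""
    if name.lower().endswith(('.pdb', '.cif')):
        return 'file'
    if PDB.match(name):
        return 'pdb'
    if UNIPROT.match(name):
        return 'uniprot'
    return 'unknown'
```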

If you use the alphafoldserver you'll get cif files and this works with that too!

Working with heme based files

Unfortunately, since alphafold is some new stuff and we're working with autodock vina, we will need to change the files a bit.

First, if we run the pipeline on an AF3 structure with a docked heme, the heme will be automatically "cleaned" (i.e. removed) before the pdbqt file is made.

So we need to re-add it afterwards, after converting it manually as well.

To convert it manually, we copy and paste (i know lol) the heme from the original pdb file (if you don't have this, go into a program like chimeraX and convert the .cif file to a .pdb file).

So you open up the alphafold pdb in a text editor and copy out the heme atoms, omitting the last one, the Fe, as vina doesn't like this one.

Then, we convert this manually by using obabel: obabel heme.pdb -o pdbqt > heme.pdbqt

Once this has been converted, we realise that vina doesn't like many of the tags, so we need to remove all of these. These include (but are probably not limited to):

ENDBRANCH
ROOT
BRANCH
ENDROOT

Then you can run the program as per usual :D I do this automatically within the scripts but thought I would mention it in case things fail for you (whoever you are.)
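
If you ever need to do the clean-up by hand, the two text-editing steps above can be sketched as follows. This is an illustration only, not the code docko runs: `drop_iron` and `strip_torsion_tags` are hypothetical helpers, and the tag list contains just the four tags named above.

```python
TAGS = {'ROOT', 'ENDROOT', 'BRANCH', 'ENDBRANCH'}


def drop_iron(pdb_lines):
    """Remove ATOM/HETATM records whose atom name (columns 13-16) is FE."""
    kept = []
    for line in pdb_lines:
        is_atom = line.startswith(('ATOM', 'HETATM'))
        if is_atom and line[12:16].strip().upper() == 'FE':
            continue
        kept.append(line)
    return kept


def strip_torsion_tags(pdbqt_lines):
    """Remove ROOT/ENDROOT/BRANCH/ENDBRANCH lines from a pdbqt file."""
    kept = []
    for line in pdbqt_lines:
        tokens = line.split()
        if tokens and tokens[0] in TAGS:
            continue
        kept.append(line)
    return kept
```

You would run `drop_iron` on the heme lines before the obabel conversion and `strip_torsion_tags` on the heme.pdbqt that comes out of it.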

References

(1) Martinez, Z. A.; Murray, R. M.; Thomson, M. W. TRILL: Orchestrating Modular Deep-Learning Workflows for Democratized, Scalable Protein Analysis and Engineering. bioRxiv October 27, 2023, p 2023.10.24.563881. https://doi.org/10.1101/2023.10.24.563881.
(2) Eberhardt, J.; Santos-Martins, D.; Tillack, A. F.; Forli, S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J. Chem. Inf. Model. 2021, 61 (8), 3891–3898. https://doi.org/10.1021/acs.jcim.1c00203.
(3) Chai Discovery. https://www.chaidiscovery.com/blog/introducing-chai-1 (accessed 2024-09-15).

THANKX

Lastly if you liked this, give it a star ****
