🚀 mini3di

A NumPy port of the foldseek code for encoding structures to 3di.

🗺️ Overview

foldseek is a method developed by van Kempen et al.[1] for the fast and accurate search of protein structures. To search protein structures at a large scale, it first encodes each 3D structure into a sequence over a structural alphabet, 3di, which captures tertiary amino acid interactions.

mini3di is a pure-Python package to encode 3D structures of proteins into the 3di alphabet, using the trained weights from the foldseek VQ-VAE model.

This library only depends on NumPy and is available for all modern Python versions (3.7+).

🔧 Installing

Install the mini3di package directly from PyPI, which hosts universal wheels that can be installed with pip:

$ pip install mini3di

💡 Example

mini3di provides a single Encoder class, which expects the 3D coordinates of the Cα, Cβ, N and C atoms from each peptide residue. For residues without Cβ (Gly), simply write the coordinates as math.nan. Call the encode_atoms method to get a sequence of 3di states:

from math import nan
import mini3di

encoder = mini3di.Encoder()
states = encoder.encode_atoms(
    ca=[[32.9, 51.9, 28.8], [35.0, 51.9, 26.6], ...],
    cb=[[ nan,  nan,  nan], [35.3, 53.3, 26.4], ...],
    n=[ [32.1, 51.2, 29.8], [35.3, 51.5, 28.1], ...],
    c=[ [34.4, 51.7, 29.1], [36.1, 51.1, 25.8], ...],
)
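If the coordinates start out as per-residue records rather than four parallel lists, a small helper can do the splitting and fill in math.nan for any missing Cβ. The sketch below is only an illustration: the input layout (one dict per residue, keyed by atom name) and the split_backbone name are assumptions, not part of mini3di.

```python
from math import isnan, nan

# Hypothetical input layout (an assumption for illustration): one dict per
# residue, mapping atom names to (x, y, z) coordinates.
residues = [
    # Glycine: no CB atom
    {"N": (32.1, 51.2, 29.8), "CA": (32.9, 51.9, 28.8), "C": (34.4, 51.7, 29.1)},
    {"N": (35.3, 51.5, 28.1), "CA": (35.0, 51.9, 26.6), "C": (36.1, 51.1, 25.8),
     "CB": (35.3, 53.3, 26.4)},
]

MISSING = (nan, nan, nan)


def split_backbone(residues):
    """Return (ca, cb, n, c) coordinate lists, one row per residue.

    A missing CB is written as NaN coordinates; CA, N and C are required,
    so a residue lacking one of them raises KeyError.
    """
    ca, cb, n, c = [], [], [], []
    for atoms in residues:
        ca.append(list(atoms["CA"]))
        cb.append(list(atoms.get("CB", MISSING)))
        n.append(list(atoms["N"]))
        c.append(list(atoms["C"]))
    return ca, cb, n, c


ca, cb, n, c = split_backbone(residues)
```

The four resulting lists have the same shape as the ca, cb, n and c arguments in the call above, so they can be passed to encode_atoms in the same way.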

The states returned as output will be a NumPy array of state indices. To turn it into a sequence, use the build_sequence method of the encoder:

sequence = encoder.build_sequence(states)
print(sequence)
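Conceptually, turning state indices into a sequence is a lookup of each index in an alphabet string. The sketch below shows that idea in plain Python; the alphabet, its ordering, and the indices_to_sequence name are assumptions made for illustration and may not match what mini3di uses internally, so rely on build_sequence for real data.

```python
# Assumption: a 20-letter alphabet in this order, for illustration only.
# The alphabet actually used by mini3di may differ.
HYPOTHETICAL_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def indices_to_sequence(states, alphabet=HYPOTHETICAL_ALPHABET):
    """Map each integer state index to its letter and join the letters."""
    return "".join(alphabet[int(i)] for i in states)


example = indices_to_sequence([3, 3, 15, 17])
print(example)
```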

The encoder can work directly with Biopython objects, if Biopython is available. The helper method encode_chain extracts the atom coordinates from a Bio.PDB.Chain and encodes them directly. For instance, to encode all the chains from a PDB file:

import pathlib

import mini3di
from Bio.PDB import PDBParser

encoder = mini3di.Encoder()
parser = PDBParser(QUIET=True)
struct = parser.get_structure("8crb", pathlib.Path("tests", "data", "8crb.pdb"))

for chain in struct.get_chains():
    states = encoder.encode_chain(chain)
    sequence = encoder.build_sequence(states)
    print(chain.get_id(), sequence)

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the BSD 3-clause license. It includes some code ported from foldseek, which is licensed under the GNU General Public License v3.0, and relicensed with the permission of the authors.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original foldseek authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

📚 References

  • [1] Kempen, Michel van, Stephanie S. Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron L. M. Gilchrist, Johannes Söding, and Martin Steinegger. ‘Fast and Accurate Protein Structure Search with Foldseek’. Nature Biotechnology, 8 May 2023, 1–4. doi:10.1038/s41587-023-01773-0.
