Tools for creating machine-learning datasets from macromolecular structure

Project description

Macromolecule Census is a tool for identifying high-quality, non-redundant subsets of the biological assemblies in the Protein Data Bank (PDB). A particular emphasis is on accommodating all kinds of macromolecules, not just proteins. Briefly, this process involves the following steps:

  • Rank each structure by metrics including clash score, resolution, $R_{free}$, Q-score, and NMR restraints.

  • Rank each assembly by biological relevance, subchain cover, and size.

  • Cluster protein, DNA, and RNA molecules by sequence similarity.

  • Cluster small molecules and branched polysaccharides by identity.
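
The first ranking step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `StructureMetrics` class, its field names, and the lexicographic order of the criteria are hypothetical and are not taken from the Macromolecule Census API. The only facts relied on are that lower resolution, $R_{free}$, and clash score values indicate better structures, while a higher Q-score is better. Structures missing a metric are sorted after those that report it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StructureMetrics:
    """Hypothetical container for per-structure quality metrics."""
    pdb_id: str
    resolution_A: Optional[float] = None  # lower is better
    r_free: Optional[float] = None        # lower is better
    clash_score: Optional[float] = None   # lower is better
    q_score: Optional[float] = None       # higher is better


def _lower_is_better(x: Optional[float]) -> Tuple[bool, float]:
    # Missing values sort last: (True, ...) compares greater than (False, ...).
    return (x is None, x if x is not None else 0.0)


def _higher_is_better(x: Optional[float]) -> Tuple[bool, float]:
    # Negate so that larger values sort first; missing values still sort last.
    return (x is None, -x if x is not None else 0.0)


def rank_key(s: StructureMetrics) -> tuple:
    # Assumed priority order: resolution, then R_free, then clash score,
    # then Q-score. The real tool may weigh these differently.
    return (
        _lower_is_better(s.resolution_A),
        _lower_is_better(s.r_free),
        _lower_is_better(s.clash_score),
        _higher_is_better(s.q_score),
    )


def rank_structures(structures: List[StructureMetrics]) -> List[StructureMetrics]:
    """Return the structures sorted best-first under the assumed criteria."""
    return sorted(structures, key=rank_key)
```

A lexicographic sort key keeps the sketch short. A weighted score or a per-method comparison (for example, treating NMR and cryo-EM entries separately) would be an equally plausible design.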

The primary use-case for this software is the creation of datasets for machine learning. This typically entails iterating through each assembly in ranked order, adding unique training examples as they appear.
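
The iteration described above can be sketched as a greedy selection. This is only an illustration under stated assumptions: the function name `select_unique_examples` and the input shape (pairs of an assembly identifier and the cluster identifiers of its members, already sorted best-first) are hypothetical and do not reflect the package's actual interface.

```python
from typing import Hashable, Iterable, Iterator, List, Sequence, Set, Tuple


def select_unique_examples(
    ranked_assemblies: Iterable[Tuple[str, Sequence[Hashable]]],
) -> Iterator[Tuple[str, List[Hashable]]]:
    """Yield (assembly_id, new_cluster_ids) for assemblies that add something new.

    `ranked_assemblies` is assumed to be sorted best-first, so the first
    assembly that contains a given cluster is the one that represents it.
    """
    seen: Set[Hashable] = set()
    for assembly_id, cluster_ids in ranked_assemblies:
        # Keep clusters not yet seen, dropping repeats within this assembly
        # while preserving their order.
        new = list(dict.fromkeys(c for c in cluster_ids if c not in seen))
        if new:
            seen.update(new)
            yield assembly_id, new
```

For example, given the hypothetical input `[("A1", [1, 2]), ("A2", [2]), ("A3", [2, 3, 3])]`, assembly `A2` is skipped because its only cluster was already contributed by the higher-ranked `A1`, and `A3` contributes only cluster `3`.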

Download files

Source Distribution

macromol_census-0.3.0.tar.gz (26.6 kB)

Uploaded Source

Built Distribution

macromol_census-0.3.0-py3-none-any.whl (33.9 kB)

Uploaded Python 3

File details

Details for the file macromol_census-0.3.0.tar.gz.

File metadata

  • Download URL: macromol_census-0.3.0.tar.gz
  • Upload date:
  • Size: 26.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for macromol_census-0.3.0.tar.gz
Algorithm Hash digest
SHA256 fc8a4ce1f8ed35e9895768988da44ac9043bfaa41f1a3519b6b09d21200c8f27
MD5 5ea944dbcaacd27a4abca2804aa8050e
BLAKE2b-256 176a85cdfbb00a91fb235513c2b755298577a5ad307c848823d112710b357f2b
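
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The following is a minimal sketch. The helper names `sha256_of` and `verify` are illustrative, and the local file path is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for macromol_census-0.3.0.tar.gz.
EXPECTED_SHA256 = "fc8a4ce1f8ed35e9895768988da44ac9043bfaa41f1a3519b6b09d21200c8f27"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumes the archive was downloaded into the current directory.
    print(verify("macromol_census-0.3.0.tar.gz", EXPECTED_SHA256))
```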

Provenance

The following attestation bundles were made for macromol_census-0.3.0.tar.gz:

Publisher: release.yml on kalekundert/macromol_census

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file macromol_census-0.3.0-py3-none-any.whl.

File hashes

Hashes for macromol_census-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b9892d3332338a5960e2aa8abe2832ad30f15c9c50417af98c99ce2aaa4cafcd
MD5 193bb444c2767a83d7f8b421995b4503
BLAKE2b-256 899a7ce5d7dd555b2d6ff613e6c187ba36cac5217fc0134daad00bf6756d4b96

Provenance

The following attestation bundles were made for macromol_census-0.3.0-py3-none-any.whl:

Publisher: release.yml on kalekundert/macromol_census
