Place upper bounds on assembly index using the grammar algorithm RePair.

Context-Free Grammars and String Assembly Index

A directed string assembly index calculator based on RePair, a heuristic for the smallest grammar problem. RePair quickly finds a short assembly path, but it is not guaranteed to find the shortest one, so the resulting path length serves as an upper bound on the assembly index. The method works best on strings, but it can also be applied to molecular graphs, as demonstrated below.
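To make the idea concrete, here is a minimal pure-Python sketch of RePair-style pair replacement on a string. It is not the implementation shipped in this package: the function names `repair_sketch` and `expand`, the tie-breaking rule (first pair seen wins), and the way joins are counted (one join per grammar rule, plus the joins needed to concatenate the final sequence) are all assumptions made for illustration.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair in seq."""
    counts = Counter()
    last_counted = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # In a run like "aaa", the pair at i overlaps the one counted at i - 1.
        if last_counted.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_counted[pair] = i
    return counts


def replace_pair(seq, pair, symbol):
    """Replace occurrences of pair with symbol, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair_sketch(s):
    """Return (rules, final_sequence, join_count) for the string s.

    Assumption: each rule X -> (A, B) costs one join, and concatenating
    the final sequence of m symbols costs m - 1 further joins.
    """
    seq = list(s)
    rules = {}  # nonterminal name -> (left symbol, right symbol)
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        # max() keeps the first pair seen among ties (insertion order).
        pair, count = max(counts.items(), key=lambda kv: kv[1])
        if count < 2:
            break
        # Names like "R0" cannot clash with single-character terminals.
        symbol = f"R{len(rules)}"
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    joins = len(rules) + max(len(seq) - 1, 0)
    return rules, seq, joins


def expand(rules, seq):
    """Expand a grammar back into the string it encodes."""
    def expand_symbol(sym):
        if sym in rules:
            left, right = rules[sym]
            return expand_symbol(left) + expand_symbol(right)
        return sym
    return "".join(expand_symbol(sym) for sym in seq)


rules, seq, joins = repair_sketch("abracadabra")
print(rules, seq)
print(joins)  # 7
```

Under this counting convention the sketch yields 7 joins for "abracadabra", compared with the trivial bound of 10 joins from appending one character at a time. The value returned by the package may follow a different convention.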

Installation

Prerequisites: networkx >= 3.4.2, rdkit >= 2024.03.5, matplotlib >= 3.9.2

Use pip to install this package.

pip install assemblycfg

Examples

The central function of this package, cfg.repair_with_pathways, returns three items: first, the integer path length, which upper bounds the assembly index; second, the list of virtual object strings used along the assembly path identified by RePair; and third, a networkx DiGraph object depicting the assembly path.

import assemblycfg as cfg
# l: path length (upper bound), vo: virtual objects, path: networkx DiGraph
l, vo, path = cfg.repair_with_pathways("abracadabra")
print(f'a("abracadabra") <= {l}')
print(f"Virtual objects used: {vo}")
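Any valid assembly path gives an upper bound on the assembly index, because the index is the length of the shortest such path. The sketch below checks a hand-written path for a directed string, where a join is a left-to-right concatenation. The helper `check_path` is hypothetical and is not part of assemblycfg.

```python
def check_path(target, joins):
    """Validate a list of (left, right) join steps and return the path length.

    Each operand must be a single character of the target or the product
    of an earlier join. At least one join is expected.
    """
    available = set(target)  # basic building blocks: single characters
    product = None
    for left, right in joins:
        for part in (left, right):
            if part not in available:
                raise ValueError(f"{part!r} was used before it was built")
        product = left + right
        available.add(product)
    if product != target:
        raise ValueError("the last join does not produce the target")
    return len(joins)


# A hand-written path that reuses "abra" once it has been built.
steps = [
    ("a", "b"),
    ("ab", "r"),
    ("abr", "a"),
    ("abra", "c"),
    ("abrac", "a"),
    ("abraca", "d"),
    ("abracad", "abra"),
]
print(check_path("abracadabra", steps))  # 7
```

Since this path is valid and has 7 steps, the assembly index of "abracadabra" is at most 7.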

You can visualize the pathway as follows:

import networkx as nx
import matplotlib.pyplot as plt
nx.draw(path, with_labels=True, font_weight='bold', pos=nx.spring_layout(path))
plt.show()

Note that these pathway visuals easily become unwieldy. We recommend the Python package AssemblyTheoryTools for more sophisticated pathway plotting functions.
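When a drawing is too cluttered to read, a plain-text listing of the pathway can be a lighter alternative. The sketch below uses only the standard library and takes a list of edges. It assumes that each edge points from a component to the object built from it, which is an assumption about the returned DiGraph and not something documented here. The helper name `build_order` is hypothetical.

```python
from graphlib import TopologicalSorter


def build_order(edges):
    """Return node labels so that every component precedes its products.

    Assumption: each edge is a (component, product) pair.
    """
    sorter = TopologicalSorter()
    for component, product in edges:
        sorter.add(product, component)  # a product depends on its component
    return list(sorter.static_order())


# Toy edge list standing in for the edges of a pathway graph.
toy_edges = [("a", "ab"), ("b", "ab"), ("ab", "abr"), ("r", "abr")]
for node in build_order(toy_edges):
    print(node)
```

With a networkx DiGraph, the same function could be called as `build_order(path.edges())`, provided the edge direction matches the assumption above.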

These methods can also be applied to the molecular assembly index. The function calculate_assembly_path_det can place a valid upper bound on the assembly index of any molecule, though it performs best on 'stringy' molecules like lipids. Starting from a SMILES string for cholesterol, we convert it into a networkx graph before passing it to the calculator.

import assemblycfg as cfg
# SMILES string for cholesterol
smi_str = "C[C@H](CCCC(C)C)[C@H]1CC[C@@H]2[C@@]1(CC[C@H]3[C@H]2CC=C4[C@@]3(CC[C@@H](C4)O)C)C"
# Convert the SMILES string into a networkx graph
molgraph = cfg.smi_to_nx(smi_str)
l, vo, path = cfg.calculate_assembly_path_det(molgraph)
print(f'a(Cholesterol) <= {l}')

For molecules, the returned virtual objects are also networkx graphs, each representing a molecular fragment.

See the examples folder for more examples of how to use the package.

These algorithms are described in Siebert et al. (In Prep); if you find this package useful, please cite this paper.
