Place upper bounds on the assembly index using the grammar-compression algorithm RePair.
Context-Free Grammars and String Assembly Index
A directed string assembly index calculator based on RePair, a fast heuristic for the smallest grammar problem. RePair quickly finds a short assembly path, but it is not guaranteed to find the shortest possible one, so the length of the path it finds is an upper bound on the assembly index. The method works best on strings, but it can also be applied to molecular graphs, as demonstrated below.
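To show why a grammar yields an upper bound, here is a minimal pure-Python sketch of the idea. It is not the package's implementation. It assumes that each binary rule X -> YZ counts as one join, and that the symbols left in the final top-level sequence are then joined left to right. Every step only joins objects that already exist, so the total is the length of a valid assembly path. The helper names (count_pairs, replace_pair, repair_upper_bound) are hypothetical. Ties between equally frequent pairs are broken by taking the first pair encountered.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair, scanning left to right."""
    counts = Counter()
    last_end = {}  # pair -> index just past its most recently counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_end.get(pair, -1) > i:  # overlaps the previous occurrence, e.g. in "aaa"
            continue
        counts[pair] += 1
        last_end[pair] = i + 2
    return counts


def replace_pair(seq, pair, symbol):
    """Replace non-overlapping occurrences of `pair`, left to right, with `symbol`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def repair_upper_bound(s):
    """Hypothetical helper: build a RePair grammar for `s` and count the joins it implies."""
    seq = list(s)
    rules = {}  # nonterminal -> (left, right); each rule is one join
    while len(seq) > 1:
        pair, freq = count_pairs(seq).most_common(1)[0]  # ties: first pair seen
        if freq < 2:
            break  # no pair repeats, so no further reuse is possible
        symbol = f"X{len(rules) + 1}"
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    # One join per rule, plus the joins needed to concatenate the top-level sequence.
    joins = len(rules) + max(len(seq) - 1, 0)
    return joins, rules, seq


length, rules, top = repair_upper_bound("abracadabra")
print(length)  # 7
print(top)  # ['X3', 'c', 'a', 'd', 'X3']
```

Because RePair is greedy, a different tie-breaking rule can produce a different grammar and a different bound. Any such bound is still valid, but none of them is guaranteed to be the assembly index itself.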
Installation
Prerequisites: networkx>=3.4.2, rdkit>=2024.03.5, matplotlib>=3.9.2
Use pip to install this package.
pip install assemblycfg
Examples
The central function of this package, cfg.repair_with_pathways, returns three items: the integer length of the assembly path found by RePair, which is an upper bound on the assembly index; the list of virtual-object strings used along that path; and a networkx DiGraph depicting the assembly path.
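To make these three return values concrete, here is a hedged pure-Python sketch. It is not the package's code. It turns a RePair grammar into the same kinds of items: a path length, a list of virtual-object strings, and a list of (ingredient, product) edges that stands in for the networkx DiGraph. The helper names and the counting convention are assumptions. The convention is one join per rule, plus one join per concatenation of the top-level sequence. The example uses one possible RePair grammar for "abracadabra": X1 → ab, X2 → X1 r, X3 → X2 a, with top-level sequence X3 c a d X3. This page does not say whether the package also lists the intermediate prefixes built while concatenating the top-level sequence. This sketch lists only the rule expansions as virtual objects.

```python
def expand(symbol, rules, cache):
    """Return the string a grammar symbol derives; terminals derive themselves."""
    if symbol not in rules:
        return symbol
    if symbol not in cache:
        left, right = rules[symbol]
        cache[symbol] = expand(left, rules, cache) + expand(right, rules, cache)
    return cache[symbol]


def pathway_from_grammar(rules, top):
    """Hypothetical helper: path length, virtual objects and join edges for a grammar."""
    if not top:
        return 0, [], [], ""
    cache = {}
    # Each rule X -> (L, R) is one join that produces the virtual object expand(X).
    virtual_objects = [expand(sym, rules, cache) for sym in rules]
    # (ingredient, product) pairs; repeated pairs (e.g. a -> aa twice) would merge in a DiGraph.
    edges = []
    for sym, (left, right) in rules.items():
        product = expand(sym, rules, cache)
        edges += [(expand(left, rules, cache), product), (expand(right, rules, cache), product)]
    # Concatenate the top-level sequence left to right, one join per step.
    current = expand(top[0], rules, cache)
    for sym in top[1:]:
        part = expand(sym, rules, cache)
        edges += [(current, current + part), (part, current + part)]
        current += part
    length = len(rules) + len(top) - 1
    return length, virtual_objects, edges, current


rules = {"X1": ("a", "b"), "X2": ("X1", "r"), "X3": ("X2", "a")}
length, vo, edges, target = pathway_from_grammar(rules, ["X3", "c", "a", "d", "X3"])
print(length, target)  # 7 abracadabra
print(vo)  # ['ab', 'abr', 'abra']
```

If networkx is installed, passing `edges` to `nx.DiGraph(edges)` gives a graph that can be drawn in the same way as the `path` object returned by the package.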
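import assemblycfg as cfg

# l: length of the assembly path found by RePair (an upper bound on the assembly index)
# vo: virtual-object strings used along the path; path: networkx DiGraph of the pathway
l, vo, path = cfg.repair_with_pathways("abracadabra")
print(f'a("abracadabra") <= {l}')
print(f"Virtual objects used: {vo}")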
You can visualize the pathway as follows:
import networkx as nx
import matplotlib.pyplot as plt
nx.draw(path, with_labels=True, font_weight='bold', pos=nx.spring_layout(path))
plt.show()
Note that these pathway plots quickly become unwieldy. For more sophisticated pathway plotting, we recommend the Python package AssemblyTheoryTools.
These methods can also be applied to the molecular assembly index. The function calculate_assembly_path_det places a valid upper bound on the assembly index of any molecule, though it performs best on 'stringy' molecules such as lipids. In the example below, we start from a SMILES string for cholesterol, convert it into a networkx graph, and pass that graph to the calculator.
import assemblycfg as cfg

# SMILES string for cholesterol
smi_str = "C[C@H](CCCC(C)C)[C@H]1CC[C@@H]2[C@@]1(CC[C@H]3[C@H]2CC=C4[C@@]3(CC[C@@H](C4)O)C)C"
molgraph = cfg.smi_to_nx(smi_str)  # convert the SMILES string to a networkx graph
l, vo, path = cfg.calculate_assembly_path_det(molgraph)
print(f'a(Cholesterol) <= {l}')
For molecules, the virtual objects returned in vo are also networkx graphs, each representing a molecular fragment.
See the examples folder for further examples of how to use the package.
These algorithms are described in Siebert et al. (In Prep); if you find this package useful, please cite this paper.