Library for molecule fragment operations.

These details have not been verified by PyPI

Intended Audience
- Education
- Science/Research
Natural Language
- English
Operating System
- MacOS
- POSIX :: Linux
Programming Language
Topic
- Scientific/Engineering :: Chemistry
Typing
- Typed

Project description

Xenosite Fragment

A library for processing molecule fragments.

Install from PyPI:

pip install xenosite-fragment

Create a fragment from a SMILES or a fragment SMILES string:

>>> str(Fragment("CCCC")) # valid SMILES
'C-C-C-C'

>>> str(Fragment("ccCC")) # not a valid SMILES
'c:c-C-C'

Create a fragment of a molecule from a string and, optionally, a list of the node IDs that belong to the fragment.

>>> F = Fragment("CCCCCCOc1ccccc1", [0,1,2,3,4,5])
>>> str(F)  # hexane
'C-C-C-C-C-C'

If IDs are provided, they MUST select a connected fragment.

>>> F = Fragment("CCCCCCOc1ccccc1", [0,10]) 
Traceback (most recent call last):
  ...
ValueError: Multiple components in graph are not allowed.
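The connectivity requirement can be illustrated without the library. The following is a minimal sketch, not the package's implementation: `is_connected` and the hand-written `BONDS` list are hypothetical names introduced here, and the bond list is transcribed by hand from the SMILES string used above.

```python
from collections import deque


def is_connected(bonds, node_ids):
    """Return True if node_ids induce a single connected component."""
    selected = set(node_ids)
    if not selected:
        return False
    # Keep only bonds whose two ends are both selected.
    neighbors = {n: set() for n in selected}
    for a, b in bonds:
        if a in selected and b in selected:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Breadth-first search from an arbitrary selected atom.
    start = next(iter(selected))
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbors[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == selected


# Hand-written bonds for "CCCCCCOc1ccccc1":
# atoms 0-5 are the chain, 6 is the oxygen, 7-12 form the ring.
BONDS = [(i, i + 1) for i in range(12)] + [(12, 7)]

chain_ok = is_connected(BONDS, [0, 1, 2, 3, 4, 5])  # the hexane chain
split_ok = is_connected(BONDS, [0, 10])             # two atoms far apart
```

Under these assumptions the chain selection is connected, while atoms 0 and 10 form two separate components, which is the situation the `ValueError` above reports.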

Get the canonical representation of a fragment:

>>> Fragment("O-C").canonical().string
'C-O'
>>> Fragment("OC").canonical().string
'C-O'
>>> Fragment("CO").canonical().string
'C-O'

Get the reordering of nodes used to create the canonical string representation. If remap=True, the IDs are remapped to the input representation used to initialize the Fragment.

>>> Fragment("COC", [1,2]).canonical(remap=True).reordering
[2, 1]
>>> Fragment("COC", [1,2]).canonical().reordering
[1, 0]
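One way to read the two results above: without remapping, the reordering indexes into the fragment's own node list, and with remap=True each index is translated back to the ID passed at construction. The sketch below is an interpretation that is consistent with this example, not the library's code; `remap_to_input_ids` is a hypothetical helper.

```python
def remap_to_input_ids(reordering, input_ids):
    """Translate fragment-local positions into the IDs given at construction."""
    return [input_ids[i] for i in reordering]


input_ids = [1, 2]         # node IDs passed to Fragment("COC", [1, 2])
local_reordering = [1, 0]  # result of canonical().reordering
remapped = remap_to_input_ids(local_reordering, input_ids)
print(remapped)
```

Here the local position 1 corresponds to input ID 2 and local position 0 to input ID 1, which reproduces the remapped result shown above.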

Match a fragment to a molecule. By default, the IDs returned correspond to the fragment's own IDs. If remap=True, the IDs correspond to the input representation used when the Fragment was initialized.

>>> smiles = "CCCC1CCOCN1"
>>> F = Fragment("CCCCCC") # hexane as a string
>>> list(F.matches(smiles)) # smiles string (least efficient)
[(0, 1, 2, 3, 4, 5)]

>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles(smiles)
>>> list(F.matches(mol))  # RDKit mol
[(0, 1, 2, 3, 4, 5)]

>>> mol_graph = Graph.from_molecule(mol)
>>> list(F.matches(mol, mol_graph)) # RDKit mol and Graph (most efficient)
[(0, 1, 2, 3, 4, 5)]

Every match is guaranteed to have the same fragment string as the fragment itself. This differs from standard SMARTS matching, and prevents rings from matching unclosed ring patterns:

>>> str(Fragment("C1CCCCC1")) # cyclohexane
'C1-C-C-C-C-C-1'

>>> assert str(Fragment("C1CCCCC1")) != str(F) # cyclohexane is not hexane
>>> list(F.matches("C1CCCCC1")) # Unlike SMARTS, no match!
[]
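To see why the six ring atoms cannot match hexane, it helps to count the bonds among the matched atoms. The sketch below is only an illustration of that consequence, under the assumption that a match is judged on all bonds between the matched atoms; the library itself is described as comparing fragment strings. `induced_bond_count` and the bond lists are hypothetical names written by hand.

```python
def induced_bond_count(bonds, node_ids):
    """Count the bonds whose two ends both lie in node_ids."""
    selected = set(node_ids)
    return sum(1 for a, b in bonds if a in selected and b in selected)


# Six carbons in a row, and the same six carbons closed into a ring.
HEXANE_BONDS = [(i, i + 1) for i in range(5)]
CYCLOHEXANE_BONDS = HEXANE_BONDS + [(5, 0)]

atoms = range(6)
fragment_bonds = induced_bond_count(HEXANE_BONDS, atoms)
ring_bonds = induced_bond_count(CYCLOHEXANE_BONDS, atoms)
same_shape = fragment_bonds == ring_bonds
```

A SMARTS-style substructure search only requires the five pattern bonds to be present, so it accepts the ring. Once the ring-closing bond is also taken into account, the two graphs differ, and so do their fragment strings.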

Efficiently create multiple fragments by reusing a precomputed graph:

>>> from rdkit import Chem
>>>
>>> mol = Chem.MolFromSmiles("c1ccccc1OCCC")
>>> mol_graph = Graph.from_molecule(mol)
>>>
>>> f1 = Fragment(mol_graph, [0])
>>> f2 = Fragment(mol_graph, [6,5,4])

Find matches to fragments:

>>> list(f1.matches(mol))
[(0,), (1,), (2,), (3,), (4,), (5,)]

>>> list(f2.matches(mol))
[(6, 5, 4), (6, 5, 0)]

Fragments can be compared for equality with other fragments or with strings. Two fragments are equal when their canonical forms are the same.

>>> Fragment("CCO") == Fragment("OCC")
True
>>> Fragment("CCO") == "C-C-O"
True

Note, however, that strings are not converted to canonical form, so a plain SMILES string does not compare equal to the fragment:

>>> Fragment("CCO") == "CCO"
False

Enumerate and compute statistics on all the subgraphs in a molecule:

>>> from xenosite.fragment.net import SubGraphFragmentNetwork
>>> N = SubGraphFragmentNetwork("CC1COC1")
>>> fragments = N.to_pandas()
>>> list(fragments.index)
['C-C', 'C', 'C-O-C', 'C-O', 'O', 'C-C1-C-O-C-1', 'C1-C-O-C-1', 'C-C-C-O', 'C-C(-C)-C', 'C-C-O', 'C-C-C']
>>> fragments["size"].to_numpy()
array([2, 1, 3, 2, 1, 5, 4, 4, 4, 3, 3])
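For intuition about what is being enumerated, here is a brute-force sketch that lists every connected atom subset of the same molecule using only the standard library. It is not how `SubGraphFragmentNetwork` works internally; `is_connected`, `connected_subgraphs`, and the hand-written `BONDS` list are hypothetical, and the approach is exponential in the number of atoms, so it only suits tiny graphs.

```python
from collections import Counter
from itertools import combinations


def is_connected(bonds, nodes):
    """Return True if the given non-empty atom set is a single component."""
    selected = set(nodes)
    neighbors = {n: set() for n in selected}
    for a, b in bonds:
        if a in selected and b in selected:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Depth-first search from an arbitrary selected atom.
    stack = [next(iter(selected))]
    seen = set(stack)
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == selected


def connected_subgraphs(n_atoms, bonds):
    """Yield every non-empty connected subset of atoms (brute force)."""
    for size in range(1, n_atoms + 1):
        for nodes in combinations(range(n_atoms), size):
            if is_connected(bonds, nodes):
                yield nodes


# Hand-written bonds for "CC1COC1": atom 0 is the methyl carbon,
# atoms 1-4 form the four-membered ring (atom 3 is the oxygen).
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1)]

subgraphs = list(connected_subgraphs(5, BONDS))
size_counts = Counter(len(s) for s in subgraphs)
```

This sketch finds more atom subsets than the 11 rows shown above, presumably because different subsets that share the same fragment string are reported as a single fragment.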

A smaller, more chemically meaningful set of fragments can be enumerated by collapsing all atoms in a ring into a single node during subgraph enumeration, so that a ring is either included whole or not at all.

>>> from xenosite.fragment.net import RingFragmentNetwork
>>> N = RingFragmentNetwork("CC1COC1")
>>> fragments = N.to_pandas()
>>> list(fragments.index)
['C-C1-C-O-C-1', 'C', 'C1-C-O-C-1']
>>> fragments["size"].to_numpy()
array([5, 1, 4])
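The ring-collapsing idea can be sketched as follows. This is an illustration under stated assumptions rather than the `RingFragmentNetwork` implementation: the ring atom sets are supplied by hand (in practice they might come from a toolkit's ring perception), rings are assumed not to share atoms, and `collapse_rings`, `ring_aware_fragments`, and `is_connected` are hypothetical helpers.

```python
from itertools import combinations


def is_connected(bonds, nodes):
    """Return True if the given non-empty node set is a single component."""
    selected = set(nodes)
    neighbors = {n: set() for n in selected}
    for a, b in bonds:
        if a in selected and b in selected:
            neighbors[a].add(b)
            neighbors[b].add(a)
    stack = [next(iter(selected))]
    seen = set(stack)
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == selected


def collapse_rings(n_atoms, bonds, rings):
    """Merge each ring into one supernode; return (groups, super_bonds)."""
    groups = [frozenset(r) for r in rings]
    in_ring = set().union(*groups)
    # Every non-ring atom becomes its own single-atom group.
    groups += [frozenset([a]) for a in range(n_atoms) if a not in in_ring]
    owner = {atom: i for i, g in enumerate(groups) for atom in g}
    # Keep only bonds that cross between two different groups.
    super_bonds = {
        tuple(sorted((owner[a], owner[b])))
        for a, b in bonds
        if owner[a] != owner[b]
    }
    return groups, sorted(super_bonds)


def ring_aware_fragments(n_atoms, bonds, rings):
    """Enumerate connected sets of supernodes and expand them to atoms."""
    groups, super_bonds = collapse_rings(n_atoms, bonds, rings)
    fragments = []
    for size in range(1, len(groups) + 1):
        for chosen in combinations(range(len(groups)), size):
            if is_connected(super_bonds, chosen):
                atoms = sorted(a for g in chosen for a in groups[g])
                fragments.append(tuple(atoms))
    return fragments


# "CC1COC1": atom 0 is the methyl carbon, atoms 1-4 form the ring.
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1)]
fragments = ring_aware_fragments(5, BONDS, rings=[{1, 2, 3, 4}])
```

With the ring treated as one node, the collapsed graph has only two nodes and one bond, so just three fragments remain: the methyl carbon, the ring, and the whole molecule. Their sizes agree with the sizes reported above.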

Release history

0a11 pre-release

Jul 23, 2023

0a10 pre-release

Jul 5, 2023

0a9 pre-release

Jul 1, 2023

0a8 pre-release

Jun 23, 2023

0a7 pre-release

Jun 17, 2023

0a5 pre-release

Jun 14, 2023

This version

0a4 pre-release

Jun 9, 2023

0a3 pre-release

Jun 9, 2023

Download files

Source Distribution

xenosite-fragment-0a4.tar.gz (23.8 kB)

Uploaded Jun 9, 2023 Source

File details

Details for the file xenosite-fragment-0a4.tar.gz.

File metadata

Download URL: xenosite-fragment-0a4.tar.gz
Upload date: Jun 9, 2023
Size: 23.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.10.8

File hashes

Hashes for xenosite-fragment-0a4.tar.gz
Algorithm	Hash digest
SHA256	`70fb37df360976b43fbd779cadde7975d4e977b1cbdeec147f2d334e48dc2271`
MD5	`036d08b94c1f7aeba81d2411fe78e79e`
BLAKE2b-256	`e93f5dff1cf8e6e53c0916f3b6c5c08bfe7efda4b7696e4ee8f2f3a55b71768d`