BigSMILES Parser



(still under development, but usable)

SMILES (simplified molecular-input line-entry system) is a line notation for molecules with deterministic molecular structures. BigSMILES is an extension of SMILES that adds support for molecules containing stochastic molecular structures. This package parses a BigSMILES string into an abstract syntax tree.
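To make the distinction concrete, the sketch below separates the deterministic SMILES parts of a string from its stochastic objects, which BigSMILES writes inside curly braces. It is only an illustration and not how this package parses; the function name split_stochastic is hypothetical and does not exist in bigsmiles.

```python
def split_stochastic(text: str) -> list[tuple[str, str]]:
    """Split a BigSMILES string into ('smiles', ...) and ('stochastic', ...) chunks.

    Hypothetical helper for illustration only; not part of the bigsmiles package.
    Only top-level curly braces start a new chunk; nesting is tracked by depth.
    """
    chunks = []
    depth = 0
    start = 0
    for i, char in enumerate(text):
        if char == "{":
            if depth == 0:
                # Emit any plain SMILES text that precedes this stochastic object.
                if i > start:
                    chunks.append(("smiles", text[start:i]))
                start = i
            depth += 1
        elif char == "}":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched '}}' at position {i}")
            if depth == 0:
                # Closing brace of a top-level stochastic object.
                chunks.append(("stochastic", text[start:i + 1]))
                start = i + 1
    if depth != 0:
        raise ValueError("unmatched '{'")
    if start < len(text):
        chunks.append(("smiles", text[start:]))
    return chunks


chunks = split_stochastic("CC{[>][<]CC(C)[>][<]}CC(C)=C")
for kind, value in chunks:
    print(f"{kind:10s} {value}")
```

The parser itself goes much further: it resolves atoms, bonds, branches and bonding descriptors inside each chunk.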

Learn more about BigSMILES Notation


Installation

A pip-installable package is available:

pip install bigsmiles

pypi: bigsmiles


Requirements / Dependencies

Python 3.10+


Basic Usage

Print Tree

Code:

import bigsmiles

polymer_string = "CC{[>][<]CC(C)[>][<]}CC(C)=C"
polymer = bigsmiles.BigSMILES(polymer_string)
polymer.print_tree()

Output:

BigSMILES: CC{[>][<]CC(C)[>][<]}CC(C)=C
├── Atom: C
├── Bond: 
├── Atom: C
├── Bond: 
├── StochasticObject: {[>][<]CC(C)[>][<]}
    └── StochasticFragment: [<]CC(C)[>]
        ├── BondDescriptorAtom: [<]
        ├── Bond: 
        ├── Atom: C
        ├── Bond: 
        ├── Atom: C
        ├── Branch: (C)
            ├── Bond: 
            └── Atom: C
        ├── Bond: 
        └── BondDescriptorAtom: [>]
├── Bond: 
├── Atom: C
├── Bond: 
├── Atom: C
├── Branch: (C)
    ├── Bond: 
    └── Atom: C
├── Bond: =
└── Atom: C


Abstract Syntax Tree

root node: BigSMILES

intermediate nodes: StochasticObject, StochasticFragment, Branch

leaf nodes: BondDescriptorAtom, Atom, Bond

The tree structure is built through the nodes attribute.
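Because children are stored in nodes, a generic depth-first walk is enough to visit the whole tree. The sketch below assumes only that intermediate nodes expose a nodes list, as described above. StubNode and walk are hypothetical names used to keep the example self-contained; they are not classes or functions from the bigsmiles package.

```python
from dataclasses import dataclass, field


@dataclass
class StubNode:
    """Stand-in for a parser node; NOT a class from the bigsmiles package."""
    label: str
    nodes: list = field(default_factory=list)


def walk(node, depth=0):
    """Yield (depth, node) pairs in depth-first order by following `nodes`."""
    yield depth, node
    # getattr with a default also covers objects that have no `nodes` attribute.
    for child in getattr(node, "nodes", []):
        yield from walk(child, depth + 1)


# A tiny hand-built tree shaped like part of the printed output above.
tree = StubNode("BigSMILES", [
    StubNode("Atom: C"),
    StubNode("Branch: (C)", [StubNode("Bond: "), StubNode("Atom: C")]),
])

lines = ["    " * depth + node.label for depth, node in walk(tree)]
print("\n".join(lines))
```

If the parsed objects expose nodes as described, the same walk function should accept a bigsmiles.BigSMILES instance directly, though that has not been verified here.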

Note: only the main attributes are shown in the diagram below.

classDiagram

    class BigSMILES {
        list: nodes
    }
    
    
    class StochasticObject {
        int: id_
        list: nodes
        BondingDescriptor: end_group_left
        BondingDescriptor: end_group_right
    }
    
    
    class StochasticFragment {
        int: id_
        list: nodes
    }
    
    
    class Branch {
        int: id_
        list: nodes
    }
    
    
    class BondDescriptorAtom {
        int: id_
        BondDescriptor: descriptor
        Bond: bond
    }
    
    
    class BondDescriptor {
        str: symbol
        int: index_
        Enum: type_
        list[BondDescriptorAtom]: instances
    }

    
    class Bond {
        int: id_
        str: symbol
        Enum: type_
        Atom: atom1
        Atom: atom2
        int: ring_id
    }
    
    class Atom {
        int: id_
        str: symbol
        Enum: type_
        int: isotope
        int: charge
        Enum: chiral
        int: valance
        bool: orgainic
        list[Bond]: bonds
    }

    BigSMILES --|> Atom
    BigSMILES --|> Bond
    BigSMILES --|> Branch
    BigSMILES --|> StochasticObject
    StochasticObject --|> StochasticFragment
    StochasticFragment --|> BondDescriptorAtom
    BondDescriptor --|> BondDescriptorAtom
    StochasticFragment --|> Atom
    StochasticFragment --|> Bond
    StochasticFragment --|> Branch
    StochasticFragment --|> StochasticObject
    Branch --|> BondDescriptorAtom
    Branch --|> StochasticObject
    Branch --|> Bond
    Branch --|> Atom
    


Advanced Options

Colored outputs

import bigsmiles

bigsmiles.Config.color_output = True

BigSMILES: (HTML and LaTeX versions shown)

CC { [>] [<] CC(C) [>] [<] } CC(C)=C

Features NOT implemented yet

  • Cis/Trans
  • Fragment Notation
  • ladder polymers
  • mixture notation '.'
  • reactions
  • Validation
    • Some validation is present, but more is needed:
      • Validate bonding descriptors matching including endgroups
