SCRIPT V3.1: A deterministic, RDKit-independent molecular notation with 100% round-trip stereo parity, materials science extensions, biopolymer support, and formal LALR grammar.

SCRIPT: Structural Chemical Representation In Plain Text

SCRIPT is a deterministic molecular notation system with an RDKit-independent core engine. Every molecule has exactly one canonical SCRIPT string. No ambiguity. No post-hoc sanitization.

Aspirin in SMILES:  CC(=O)Oc1ccccc1C(=O)O  (one of many valid forms)
Aspirin in SCRIPT:  CC(=O)OC:C:C:C:C:C&6:C(=O)O  (always and only this)

Install

# Core (no RDKit required)
pip install linearscript

# With RDKit bridge for SMILES interop
pip install linearscript[rdkit]
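
To confirm which variant ended up in an environment, one option is to probe for the modules without importing them. This is a minimal sketch, assuming the package exposes a top-level module named `script` (as the import lines in this README suggest) and that the RDKit extra makes the `rdkit` module importable. The helper name `installed_features` is hypothetical and not part of the package.

```python
from importlib.util import find_spec


def installed_features() -> dict:
    """Report which pieces are importable, without importing them."""
    return {
        # Assumption: the core installs as the top-level module "script".
        "core": find_spec("script") is not None,
        # Assumption: the [rdkit] extra pulls in the "rdkit" module.
        "rdkit_bridge": find_spec("rdkit") is not None,
    }


print(installed_features())
```

If `rdkit_bridge` is False, the core parsing and canonicalization examples should still apply, while the RDKit interop example will not import.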

Key Features

Feature | SMILES | SCRIPT
Canonical | No | Yes (DFS + Morgan)
Human-readable | Yes | Yes
Validation on parse | No | Yes (Sandhi state machine)
Organometallics | Partial | Full (dative, haptic, coordinate)
Alloys / Fractional occupancy | No | Yes (<~0.9>)
Crystallographic context | No | Yes ([[Rutile]])
Surface chemistry | No | Yes
Electronic / excited states | No | Yes (<s:3>, <*>)
Biopolymers (peptide / nucleic) | No | Yes ({A.G.S})
Query atoms (SMARTS-style) | No | Yes ([#6], [R], [v3])
Polymers / stochastic chains | No | Yes ({[CC]}n)
Reactions | Partial | Yes (3-part R>A>P)
RDKit-free core | No | Yes
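
The "DFS + Morgan" entry names a general family of canonicalization techniques: atoms are first ranked by refining an invariant over their neighbourhoods, and a depth-first traversal then writes them out in an order driven by those ranks. The sketch below illustrates only the ranking step, using a partition-refinement variant (sorted neighbour ranks) rather than the original sum-based Morgan scheme. It is not SCRIPT's implementation; the function name `morgan_ranks` and the adjacency-list input are assumptions made for illustration.

```python
def morgan_ranks(labels, adjacency):
    """Refine atom classes until the number of distinct classes stops growing.

    labels: initial atom invariants (here, element symbols).
    adjacency: one list of neighbour indices per atom.
    Returns one integer rank per atom; symmetric atoms share a rank.
    """
    def to_ranks(keys):
        order = {k: i for i, k in enumerate(sorted(set(keys)))}
        return [order[k] for k in keys]

    ranks = to_ranks(labels)
    while True:
        # New key: own rank plus the sorted multiset of neighbour ranks.
        keys = [(ranks[i], tuple(sorted(ranks[j] for j in adjacency[i])))
                for i in range(len(ranks))]
        refined = to_ranks(keys)
        if len(set(refined)) == len(set(ranks)):
            return refined
        ranks = refined


# Heavy atoms of ethanol (C-C-O) and propane (C-C-C).
print(morgan_ranks(["C", "C", "O"], [[1], [0, 2], [1]]))  # [0, 1, 2]
print(morgan_ranks(["C", "C", "C"], [[1], [0, 2], [1]]))  # [0, 1, 0]
```

In propane the two terminal carbons end with the same rank, which is the symmetry information a canonical traversal needs in order to break ties consistently.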

Quick Start

Parse and canonicalize

from script.parser import SCRIPTParser
from script.canonical import SCRIPTCanonicalizer

parser = SCRIPTParser()
result = parser.parse("CC(=O)Oc1ccccc1C(=O)O")  # aspirin from SMILES-style input

mol = result["molecule"]
print(len(mol.atoms))   # 13
print(len(mol.bonds))   # 13

canon = SCRIPTCanonicalizer().canonicalize_core(mol)
print(canon)   # CC(=O)OC:C:C:C:C:C&6:C(=O)O
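
The examples in this README read `result["molecule"]` and, in one case, `result["success"]` from the dictionary returned by `parse`. A small wrapper can make failures explicit instead of passing a missing molecule downstream. This is a sketch under assumptions: it supposes every result carries a `"success"` key that is falsy on failure, which the README does not state. `parse_or_raise` and `StubParser` are hypothetical names; the stub only exists so the sketch runs without linearscript installed.

```python
def parse_or_raise(parser, text):
    """Return the parsed molecule, or raise ValueError if parsing failed.

    Assumes parser.parse() returns a dict with "success" and "molecule"
    keys, as the examples in this README suggest.
    """
    result = parser.parse(text)
    if not result.get("success", False):
        raise ValueError(f"could not parse {text!r}")
    return result["molecule"]


class StubParser:
    """Stand-in parser (hypothetical): any non-blank string 'parses'."""

    def parse(self, text):
        ok = bool(text.strip())
        return {"success": ok, "molecule": text if ok else None}


print(parse_or_raise(StubParser(), "CC(=O)O"))
```

With the real package, an instance of `SCRIPTParser` would presumably be passed in place of the stub.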

Stereochemistry

result = parser.parse("C[C@H](O)C(=O)O")   # L-Lactic acid
mol = result["molecule"]
# Chirality stored in mol.chiral_centers — DFS-invariant, CIP-verified

Reactions

result = parser.parse("[C:1]OCO>>[C:1]O")  # reaction with atom mapping
rxn = result["molecule"]                    # Reaction object
print(rxn.reactants, rxn.products)
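
The feature table describes reactions as three-part `R>A>P`, presumably reactants, agents, and products by analogy with reaction SMILES; in `[C:1]OCO>>[C:1]O` the middle part is empty. The sketch below shows that layout with a naive split. It is only valid when `>` appears solely as the part separator, so it would not handle strings that also use angle-bracket annotations such as `<~0.9>`, and it is not the library's parser. `split_reaction` is a hypothetical name.

```python
def split_reaction(text):
    """Split 'reactants>agents>products' into three strings.

    Naive: assumes '>' occurs only as the separator between parts.
    """
    parts = text.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '>' separators in {text!r}")
    reactants, agents, products = parts
    return reactants, agents, products


print(split_reaction("[C:1]OCO>>[C:1]O"))  # ('[C:1]OCO', '', '[C:1]O')
```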

Materials Science

# Alloy with fractional site occupancy
result = parser.parse("Ti<~0.9>N<~0.1>")
mol = result["molecule"]
print(mol.atoms[0].occupancy)   # 0.9

# Crystallographic phase
result = parser.parse("[[Rutile]] Ti(O)2")
print(result["molecule"].macroscopic_context)   # "Rutile"

# Surface adsorption
result = parser.parse("[[Pt_111]] | >C=O")
print(result["success"])   # True

# Triplet oxygen
result = parser.parse("O=O<s:3>")
print(result["molecule"].atoms[-1].spin)   # 3
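
To make the `<~fraction>` occupancy annotation concrete, the following sketch pulls element and occupancy pairs out of a string with a regular expression. It is an illustration of the notation as shown in the alloy example, not the library's grammar: the pattern, the function name `occupancies`, and the check that occupancies lie between 0 and 1 are all assumptions.

```python
import re

# Element symbol followed by a <~fraction> annotation, e.g. Ti<~0.9>
OCCUPANCY = re.compile(r"([A-Z][a-z]?)<~(\d*\.?\d+)>")


def occupancies(text):
    """Return (element, occupancy) pairs for annotated atoms in text."""
    pairs = [(el, float(frac)) for el, frac in OCCUPANCY.findall(text)]
    for el, frac in pairs:
        # Assumption: a site occupancy is a fraction in [0, 1].
        if not 0.0 <= frac <= 1.0:
            raise ValueError(f"occupancy {frac} for {el} is outside [0, 1]")
    return pairs


print(occupancies("Ti<~0.9>N<~0.1>"))  # [('Ti', 0.9), ('N', 0.1)]
```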

Biopolymers

# Peptide chain (expands to atomic graph)
result = parser.parse("{A.G.S}")

# DNA oligonucleotide
result = parser.parse("{dA.dG.dC.dT}")

# Nucleotide modifications
result = parser.parse("{m5C.m6A.psU}")

RDKit interop

from rdkit import Chem
from script.rdkit_bridge import SCRIPTFromMol, MolFromSCRIPT

# SMILES -> SCRIPT
mol = Chem.MolFromSmiles("CN1CCC[C@H]1c2cccnc2")   # Nicotine
script_str = SCRIPTFromMol(mol)

# SCRIPT -> RDKit mol (100% InChI parity verified)
mol_back = MolFromSCRIPT(script_str)

Benchmark

Tested on a diverse 97-compound set (alkanes, rings, aromatics, stereocenters, drugs, natural products):

  • 100% InChI round-trip parity (SCRIPT -> RDKit -> InChI matches original)
  • 100% native round-trip (SCRIPT -> CoreMolecule -> SCRIPT)
  • 22/22 Materials Science tests passing (Alloys, Surfaces, Excited States)
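
A native round-trip check of the kind listed above can be expressed as a small harness: decode each string, encode it again, and collect every input that does not come back unchanged. This is a sketch, not the project's benchmark code. With the real package, `decode` and `encode` would presumably wrap `SCRIPTParser.parse` and `SCRIPTCanonicalizer.canonicalize_core`, which is an assumption; a toy codec stands in here so the example runs on its own. `round_trip_failures` is a hypothetical name.

```python
def round_trip_failures(strings, decode, encode):
    """Return the inputs for which encode(decode(s)) differs from s."""
    failures = []
    for s in strings:
        try:
            if encode(decode(s)) != s:
                failures.append(s)
        except Exception:
            # A crash on either leg counts as a failed round trip.
            failures.append(s)
    return failures


# Toy codec: the "molecule" is just a tuple of characters.
samples = ["CCO", "CC(=O)O"]
print(round_trip_failures(samples, decode=tuple, encode="".join))  # []
```

Reporting the failing inputs, rather than only a pass rate, makes it easier to see which structure classes break when parity drops below 100%.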

License

MIT with Commons Clause. Free for academic and non-commercial use. Commercial licensing available separately.


Developed by SCRIPT Development Team.

GitHub: sangeet01/script
