
SCRIPT V3.1: A deterministic, RDKit-independent molecular notation with 100% round-trip stereo parity, materials science extensions, biopolymer support, and a formal LALR grammar.


SCRIPT: Structural Chemical Representation In Plain Text

SCRIPT (linearscript on PyPI) is a deterministic molecular notation system with an RDKit-independent core engine. Every molecule has exactly one canonical SCRIPT string. No ambiguity. No post-hoc sanitization.

Aspirin in SMILES:  CC(=O)Oc1ccccc1C(=O)O  (one of many valid forms)
Aspirin in SCRIPT:  CC(=O)OC:C:C:C:C:C&6:C(=O)O  (always and only this)

Install

# Core (no RDKit required)
pip install linearscript

# With RDKit bridge for SMILES interop
pip install linearscript[rdkit]

Key Features

| Feature | SMILES | SCRIPT |
|---|---|---|
| Canonical | No | Yes (DFS + Morgan) |
| Human-readable | Yes | Yes |
| Validation on parse | No | Yes (Sandhi state machine) |
| Organometallics | Partial | Full (dative, haptic, coordinate) |
| Alloys / Fractional occupancy | No | Yes (<~0.9>) |
| Crystallographic context | No | Yes ([[Rutile]]) |
| Surface chemistry | No | Yes (` |
| Electronic / excited states | No | Yes (<s:3>, <*>) |
| Biopolymers (peptide / nucleic) | No | Yes ({A.G.S}) |
| Query atoms (SMARTS-style) | No | Yes ([#6], [R], [v3]) |
| Polymers / stochastic chains | No | Yes ({[CC]}n) |
| Reactions | Partial | Yes (3-part R>A>P) |
| RDKit-free core | No | Yes |
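
The "DFS + Morgan" entry refers to a family of canonicalization techniques in which atoms are first ranked by iteratively refined invariants and the string is then written by a depth-first traversal that follows those ranks. The sketch below illustrates only the generic Morgan-style refinement step on a bare graph. It is not SCRIPT's implementation; the function name `morgan_ranks` and the choice of atom degree as the starting invariant are assumptions made for illustration.

```python
from collections import defaultdict


def morgan_ranks(n_atoms, bonds, initial):
    """Refine atom ranks Morgan-style until no new classes appear.

    n_atoms: number of atoms, indexed 0..n_atoms-1
    bonds:   iterable of (i, j) index pairs
    initial: one sortable invariant per atom (here: the atom's degree)
    """
    neighbors = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def to_ranks(keys):
        # Map each distinct key to its position in sorted order.
        order = {k: i for i, k in enumerate(sorted(set(keys)))}
        return [order[k] for k in keys]

    ranks = to_ranks(initial)
    while True:
        # New key = own rank plus the sorted ranks of the neighbors.
        keys = [
            (ranks[i], tuple(sorted(ranks[j] for j in neighbors[i])))
            for i in range(n_atoms)
        ]
        new_ranks = to_ranks(keys)
        if len(set(new_ranks)) == len(set(ranks)):
            return ranks  # partition is stable
        ranks = new_ranks


# n-pentane carbon skeleton: C0-C1-C2-C3-C4, starting from atom degrees
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(morgan_ranks(5, chain, [1, 2, 2, 2, 1]))  # [0, 1, 2, 1, 0]
```

Symmetry-equivalent atoms (the two ends, and the two atoms next to them) end up with the same rank, which is why a real canonicalizer needs an additional deterministic tie-breaking step before the traversal.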

Quick Start

Parse and canonicalize

from script.parser import SCRIPTParser
from script.canonical import SCRIPTCanonicalizer

parser = SCRIPTParser()
result = parser.parse("CC(=O)Oc1ccccc1C(=O)O")  # aspirin from SMILES-style input

mol = result["molecule"]
print(len(mol.atoms))   # 13
print(len(mol.bonds))   # 13

canon = SCRIPTCanonicalizer().canonicalize_core(mol)
print(canon)   # CC(=O)OC:C:C:C:C:C&6:C(=O)O

Stereochemistry

result = parser.parse("C[C@H](O)C(=O)O")   # L-Lactic acid
mol = result["molecule"]
# Chirality is stored in mol.chiral_centers (DFS-invariant, CIP-verified)

Reactions

result = parser.parse("[C:1]OCO>>[C:1]O")  # reaction with atom mapping
rxn = result["molecule"]                    # Reaction object
print(rxn.reactants, rxn.products)
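
To make the three-part R>A>P layout concrete, here is a minimal standalone sketch that splits a reaction string into reactants, agents, and products. It does not use the linearscript parser. The helper name `split_reaction` is hypothetical, and it assumes that components within each part are separated by `.` (as in reaction SMILES) and that the string contains no `<...>` annotations or `{...}` chains, which would need a real tokenizer.

```python
def split_reaction(rxn: str):
    """Split 'R>A>P' into (reactants, agents, products) lists.

    Assumes a plain reaction string: exactly two '>' separators and
    '.'-separated components (an assumption, not SCRIPT's grammar).
    """
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts separated by '>', got {len(parts)}")
    # Drop empty components so that 'R>>P' yields an empty agent list.
    return tuple([c for c in part.split(".") if c] for part in parts)


reactants, agents, products = split_reaction("[C:1]OCO>>[C:1]O")
print(reactants)  # ['[C:1]OCO']
print(agents)     # []
print(products)   # ['[C:1]O']
```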

Materials Science

# Alloy with fractional site occupancy
result = parser.parse("Ti<~0.9>N<~0.1>")
mol = result["molecule"]
print(mol.atoms[0].occupancy)   # 0.9

# Crystallographic phase
result = parser.parse("[[Rutile]] Ti(O)2")
print(result["molecule"].macroscopic_context)   # "Rutile"

# Surface adsorption
result = parser.parse("[[Pt_111]] | >C=O")
print(result["success"])   # True

# Triplet oxygen
result = parser.parse("O=O<s:3>")
print(result["molecule"].atoms[-1].spin)   # 3
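
For quick inspection without loading the parser, the fractional-occupancy tags can also be pulled out of a string with a regular expression. This is a rough sketch based only on the `Element<~fraction>` pattern shown above. The helper `site_occupancies` and its regular expression are assumptions and cover only simple one- or two-letter element symbols, not the full SCRIPT grammar.

```python
import re

# Element symbol followed by an occupancy tag such as <~0.9>
OCCUPANCY_TAG = re.compile(r"([A-Z][a-z]?)<~(\d*\.?\d+)>")


def site_occupancies(text: str):
    """Return (element, occupancy) pairs, checking each value is in (0, 1]."""
    pairs = [(el, float(frac)) for el, frac in OCCUPANCY_TAG.findall(text)]
    for el, frac in pairs:
        if not 0.0 < frac <= 1.0:
            raise ValueError(f"occupancy for {el} out of range: {frac}")
    return pairs


print(site_occupancies("Ti<~0.9>N<~0.1>"))  # [('Ti', 0.9), ('N', 0.1)]
```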

Biopolymers

# Peptide chain (expands to atomic graph)
result = parser.parse("{A.G.S}")

# DNA oligonucleotide
result = parser.parse("{dA.dG.dC.dT}")

# Nucleotide modifications
result = parser.parse("{m5C.m6A.psU}")
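
The chain syntax is a brace-delimited, dot-separated list of residue codes. As a minimal illustration of that tokenization only (not the expansion to an atomic graph, which the library performs), the sketch below maps standard one-letter amino-acid codes to three-letter names. The helper `peptide_residues` is hypothetical and handles only the 20 standard amino acids.

```python
# Standard one-letter to three-letter amino-acid codes
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def peptide_residues(chain: str):
    """Tokenize a '{A.G.S}'-style chain into three-letter residue names."""
    if not (chain.startswith("{") and chain.endswith("}")):
        raise ValueError("chain must be wrapped in braces")
    tokens = chain[1:-1].split(".")
    unknown = [t for t in tokens if t not in ONE_TO_THREE]
    if unknown:
        raise ValueError(f"unknown residue codes: {unknown}")
    return [ONE_TO_THREE[t] for t in tokens]


print(peptide_residues("{A.G.S}"))  # ['Ala', 'Gly', 'Ser']
```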

RDKit interop

from rdkit import Chem
from script.rdkit_bridge import SCRIPTFromMol, MolFromSCRIPT

# SMILES -> SCRIPT
mol = Chem.MolFromSmiles("CN1CCC[C@H]1c2cccnc2")   # Nicotine
script_str = SCRIPTFromMol(mol)

# SCRIPT -> RDKit mol (InChI parity verified on the 97-compound benchmark set)
mol_back = MolFromSCRIPT(script_str)

Benchmark

Tested on a diverse 97-compound set (alkanes, rings, aromatics, stereocenters, drugs, natural products):

  • 100% InChI round-trip parity (SCRIPT -> RDKit -> InChI matches original)
  • 100% native round-trip (SCRIPT -> CoreMolecule -> SCRIPT)
  • 22/22 Materials Science tests passing (Alloys, Surfaces, Excited States)
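
A native round-trip check of this kind can be reproduced with a small harness. The sketch below is an assumption about how such a test might be structured, not the project's benchmark code: `round_trip_report` is a hypothetical helper that takes any canonicalization function and reports the inputs for which canonicalizing twice gives a different string than canonicalizing once.

```python
from typing import Callable, Iterable


def round_trip_report(strings: Iterable[str], canonicalize: Callable[[str], str]):
    """Return (passed, total, failures) for an idempotence check."""
    failures = []
    total = 0
    for s in strings:
        total += 1
        once = canonicalize(s)
        twice = canonicalize(once)
        if once != twice:
            failures.append((s, once, twice))
    return total - len(failures), total, failures


# With linearscript installed, the Quick Start calls could be wrapped as:
#   parser, canon = SCRIPTParser(), SCRIPTCanonicalizer()
#   canonicalize = lambda s: canon.canonicalize_core(parser.parse(s)["molecule"])

# Stand-in canonicalizer so the sketch runs without the library.
def toy_canonicalize(s: str) -> str:
    return "".join(sorted(s))


passed, total, failures = round_trip_report(["CCO", "OCC"], toy_canonicalize)
print(f"{passed}/{total} round-trips stable")  # 2/2 round-trips stable
```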

License

MIT with Commons Clause. Free for academic and non-commercial use. Commercial licensing available separately.


Developed by the SCRIPT Development Team. GitHub: script-notation/script

