SCRIPT V3.1: A deterministic, RDKit-independent molecular notation with 100% round-trip stereo parity on a 97-compound benchmark, materials science extensions, biopolymer support, and a formal LALR grammar.
SCRIPT: Structural Chemical Representation In Plain Text
SCRIPT is a deterministic molecular notation system with an RDKit-independent core engine. Every molecule has exactly one canonical SCRIPT string. No ambiguity. No post-hoc sanitization.
Aspirin in SMILES: CC(=O)Oc1ccccc1C(=O)O (one of many valid forms)
Aspirin in SCRIPT: CC(=O)OC:C:C:C:C:C&6:C(=O)O (always and only this)
Install
# Core (no RDKit required)
pip install linearscript
# With RDKit bridge for SMILES interop
pip install "linearscript[rdkit]"  # quoted so shells such as zsh do not expand the brackets
Key Features
| Feature | SMILES | SCRIPT |
|---|---|---|
| Canonical | No | Yes (DFS + Morgan) |
| Human-readable | Yes | Yes |
| Validation on parse | No | Yes (Sandhi state machine) |
| Organometallics | Partial | Full (dative, haptic, coordinate) |
| Alloys / Fractional occupancy | No | Yes (<~0.9>) |
| Crystallographic context | No | Yes ([[Rutile]]) |
| Surface chemistry | No | Yes (\|, as in [[Pt_111]] \| >C=O) |
| Electronic / excited states | No | Yes (<s:3>, <*>) |
| Biopolymers (peptide / nucleic) | No | Yes ({A.G.S}) |
| Query atoms (SMARTS-style) | No | Yes ([#6], [R], [v3]) |
| Polymers / stochastic chains | No | Yes ({[CC]}n) |
| Reactions | Partial | Yes (3-part R>A>P) |
| RDKit-free core | No | Yes |
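The "DFS + Morgan" entry names a well-known family of techniques: atoms are first ranked by a graph invariant that does not depend on input order, and a depth-first traversal then writes them out in rank order. As a rough illustration of the ranking half only, here is a minimal Morgan-style extended-connectivity sketch on a plain adjacency dict. The function name `morgan_values` and the data layout are assumptions made for this example; they are not the package's API, and the package's own ranking may differ.

```python
def morgan_values(adjacency, max_iter=100):
    """Return a Morgan-style extended-connectivity value for each atom.

    adjacency maps an atom index to the list of its neighbour indices.
    (Hypothetical helper for illustration, not part of linearscript.)
    """
    # Start from the degree of each atom.
    values = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    n_classes = len(set(values.values()))
    for _ in range(max_iter):
        # Replace each value by the sum of the neighbours' values.
        refined = {atom: sum(values[n] for n in nbrs)
                   for atom, nbrs in adjacency.items()}
        n_refined = len(set(refined.values()))
        # Stop as soon as refinement no longer separates more atoms.
        if n_refined <= n_classes:
            break
        values, n_classes = refined, n_refined
    return values


# Isobutane heavy-atom graph: central carbon 1 bonded to carbons 0, 2, 3.
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
# The same molecule with the atoms numbered differently (centre is atom 0).
relabelled = {3: [0], 0: [3, 1, 2], 1: [0], 2: [0]}

print(sorted(morgan_values(isobutane).values()))
print(sorted(morgan_values(relabelled).values()))
```

Both calls print the same multiset of values regardless of atom numbering, which is the property a canonicalizer relies on. Atoms that end up with equal values still need a deterministic tie-break before the traversal.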
Quick Start
Parse and canonicalize
from script.parser import SCRIPTParser
from script.canonical import SCRIPTCanonicalizer
parser = SCRIPTParser()
result = parser.parse("CC(=O)Oc1ccccc1C(=O)O") # aspirin from SMILES-style input
mol = result["molecule"]
print(len(mol.atoms)) # 13
print(len(mol.bonds)) # 13
canon = SCRIPTCanonicalizer().canonicalize_core(mol)
print(canon) # CC(=O)OC:C:C:C:C:C&6:C(=O)O
Stereochemistry
result = parser.parse("C[C@H](O)C(=O)O") # L-Lactic acid
mol = result["molecule"]
# Chirality is stored in mol.chiral_centers; it is DFS-invariant and CIP-verified
Reactions
result = parser.parse("[C:1]OCO>>[C:1]O") # reaction with atom mapping
rxn = result["molecule"] # Reaction object
print(rxn.reactants, rxn.products)
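To make the three-part R>A>P layout concrete, the sketch below splits a reaction string into reactants, agents, and products using only the standard library. It is an illustration of the layout, not the package's parser: `split_reaction` is a hypothetical helper, it assumes `>` acts as a separator only outside `[...]` and `<...>` annotations, and it does not handle the surface notation (`| >C=O`) shown under Materials Science.

```python
def split_reaction(text):
    """Split 'R>A>P' into (reactants, agents, products).

    Hypothetical helper: '>' inside [...] or closing a <...> annotation
    is kept as part of the current field instead of splitting.
    """
    parts, current = [], []
    square = angle = 0
    for ch in text:
        if ch == "[":
            square += 1
        elif ch == "]" and square:
            square -= 1
        elif ch == "<":
            angle += 1
        elif ch == ">":
            if angle:
                angle -= 1          # closes an annotation such as <s:3>
            elif not square:
                parts.append("".join(current))  # top-level separator
                current = []
                continue
        current.append(ch)
    parts.append("".join(current))
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, found {len(parts)}: {text!r}")
    reactants, agents, products = parts
    return reactants, agents, products


print(split_reaction("[C:1]OCO>>[C:1]O"))  # ('[C:1]OCO', '', '[C:1]O')
```

An empty middle field, as in `>>`, simply means no agents were given.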
Materials Science
# Alloy with fractional site occupancy
result = parser.parse("Ti<~0.9>N<~0.1>")
mol = result["molecule"]
print(mol.atoms[0].occupancy) # 0.9
# Crystallographic phase
result = parser.parse("[[Rutile]] Ti(O)2")
print(result["molecule"].macroscopic_context) # "Rutile"
# Surface adsorption
result = parser.parse("[[Pt_111]] | >C=O")
print(result["success"]) # True
# Triplet oxygen
result = parser.parse("O=O<s:3>")
print(result["molecule"].atoms[-1].spin) # 3
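For readers who want to see how the annotations above are laid out, the following sketch pulls the `[[...]]` context prefix and the `<~x>` occupancy tags out of a string with regular expressions. It only mirrors the surface syntax shown in the examples; the helper names and patterns are assumptions for illustration, and the package's grammar-based parser is the authority on what is valid.

```python
import math
import re

# Assumed patterns, derived only from the examples in this README.
CONTEXT_RE = re.compile(r"^\[\[([^\]]+)\]\]\s*")
OCCUPANCY_RE = re.compile(r"([A-Z][a-z]?)<~(\d*\.?\d+)>")


def read_context(text):
    """Return (context, remainder); context is None if there is no prefix."""
    match = CONTEXT_RE.match(text)
    if match is None:
        return None, text
    return match.group(1), text[match.end():]


def read_occupancies(text):
    """Return [(element, occupancy), ...] for every element<~x> tag."""
    return [(element, float(value))
            for element, value in OCCUPANCY_RE.findall(text)]


context, rest = read_context("[[Rutile]] Ti(O)2")
print(context, rest)

sites = read_occupancies("Ti<~0.9>N<~0.1>")
print(sites)
# Occupancies sharing one site would be expected to sum to 1.
print(math.isclose(sum(value for _, value in sites), 1.0))
```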
Biopolymers
# Peptide chain (expands to atomic graph)
result = parser.parse("{A.G.S}")
# DNA oligonucleotide
result = parser.parse("{dA.dG.dC.dT}")
# Nucleotide modifications
result = parser.parse("{m5C.m6A.psU}")
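The chain notation reads as a brace-delimited list of residue tokens separated by dots. The sketch below tokenizes such a string and, for peptides, maps the standard one-letter amino acid codes to their three-letter names. It is an assumption-laden illustration: `chain_tokens` and `peptide_names` are hypothetical helpers, and the expansion to an atomic graph that the package performs is not reproduced here.

```python
# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def chain_tokens(text):
    """Split '{A.G.S}' into ['A', 'G', 'S'] (hypothetical helper)."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"not a brace-delimited chain: {text!r}")
    return text[1:-1].split(".")


def peptide_names(text):
    """Map each one-letter residue token to its three-letter name."""
    return [ONE_TO_THREE[token] for token in chain_tokens(text)]


print(peptide_names("{A.G.S}"))         # ['Ala', 'Gly', 'Ser']
print(chain_tokens("{dA.dG.dC.dT}"))    # ['dA', 'dG', 'dC', 'dT']
```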
RDKit interop
from rdkit import Chem
from script.rdkit_bridge import SCRIPTFromMol, MolFromSCRIPT
# SMILES -> SCRIPT
mol = Chem.MolFromSmiles("CN1CCC[C@H]1c2cccnc2") # Nicotine
script_str = SCRIPTFromMol(mol)
# SCRIPT -> RDKit mol (InChI parity verified on the benchmark set; see Benchmark)
mol_back = MolFromSCRIPT(script_str)
Benchmark
Tested on a diverse 97-compound set (alkanes, rings, aromatics, stereocenters, drugs, natural products):
- 100% InChI round-trip parity (SCRIPT -> RDKit -> InChI matches original)
- 100% native round-trip (SCRIPT -> CoreMolecule -> SCRIPT)
- 22/22 Materials Science tests passing (Alloys, Surfaces, Excited States)
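A parity figure like the ones above can be computed with a small harness that converts each item forward and back and counts how many survive unchanged under a chosen comparison key. The sketch below shows the shape of such a check with stand-in conversion functions. `round_trip_parity` is a hypothetical helper, and the toy conversions are placeholders; in the actual benchmark the conversions would be the package's and the key would be something like an InChI string.

```python
def round_trip_parity(items, forward, backward, key=lambda x: x):
    """Return (matches, total) for item -> forward -> backward round trips.

    Hypothetical harness: forward/backward are any pair of converters,
    key maps a result to the value used for comparison.
    """
    matches = sum(
        1 for item in items
        if key(backward(forward(item))) == key(item)
    )
    return matches, len(items)


def reverse(text):
    # Stand-in lossless conversion: reversing twice restores the input.
    return text[::-1]


samples = ["CCO", "c1ccccc1"]

print(round_trip_parity(samples, reverse, reverse))      # (2, 2)
# A lossy pair: lower-casing then upper-casing loses the aromatic case.
print(round_trip_parity(samples, str.lower, str.upper))  # (1, 2)
```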
License
MIT with Commons Clause. Free for academic and non-commercial use. Commercial licensing available separately.
Developed by the SCRIPT Development Team.
GitHub: sangeet01/script
File details
Details for the file linearscript-3.1.1.tar.gz.
File metadata
- Download URL: linearscript-3.1.1.tar.gz
- Upload date:
- Size: 37.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d90b1c6b00504e2cd8743e4baea9246f02210418ab7fd1eb99fb5127130f6014 |
| MD5 | 37decd859841bed8cb9929345b9e1000 |
| BLAKE2b-256 | 15fa8703f4f9229c49b499786b78cc3337a5a28370e69510e75f74bf07c2b8be |
File details
Details for the file linearscript-3.1.1-py3-none-any.whl.
File metadata
- Download URL: linearscript-3.1.1-py3-none-any.whl
- Upload date:
- Size: 42.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f95f845813694f0e2df60ef900903ceb108f6faa9242fb7c7dcea450bac5ed8e |
| MD5 | 03d9ddf4b0c67c6116df6232dbf976d7 |
| BLAKE2b-256 | b7d262ecd6db82fe98001b582880036716a6f7bc90ae46f86e44d11082356c35 |