Mini sandbox for a JavaScript interpreter.

dcnr-javascriptbox

A self-contained, standard-library-only Python toolkit that tokenizes, parses, and executes a JavaScript-like language. The AST design mirrors Python's ast module; the interpreter is embeddable and extensible with host-provided functions, objects, and classes.


Table of contents

  1. Project structure
  2. Quick start
  3. Parser — programmer's reference
  4. Interpreter — programmer's reference
  5. Implementation deep-dive
  6. Extending the parser
  7. Extending the interpreter
  8. Supported language features
  9. Edge cases & known limitations
  10. Future extensions roadmap

1. Project structure

dcnr-jsbox/
│
├── jsparse/                    # Parser package
│   ├── __init__.py             # Public API: tokenize, parse, dump
│   ├── errors.py               # LexError, ParseError (with caret messages)
│   ├── tokens.py               # TokenType enum, Token dataclass, tokenize()
│   ├── ast_nodes.py            # AST dataclasses + iter_child_nodes
│   ├── parser.py               # Recursive-descent Parser
│   └── pprint.py               # dump() — ast.dump-style pretty-printer
│
├── jsexec/                     # Interpreter package
│   ├── __init__.py             # Public API: Interpreter, value types
│   ├── errors.py               # JSRuntimeError + control-flow signals
│   ├── values.py               # UNDEFINED, JSFunction, JSNativeFunction,
│   │                           #   JSClass, JSObject
│   ├── environment.py          # Environment (lexical scope chain)
│   └── interpreter.py          # Tree-walking Interpreter (dispatch table)
│
└── demos
    ├── demo.py                     # Parser demo (tokenize + parse + dump)
    ├── demo_exec.py                # Interpreter demo (full extension API)
    ├── demo_functions.py           # function + lambda forms verification
    ├── demo_operators.py           # typeof / instanceof / delete / void verification
    ├── demo_dowhile.py             # do { } while verification
    ├── demo_switch.py              # switch / case verification
    ├── demo_forinof.py             # for ... in / for ... of verification
    ├── demo_trycatch.py            # try / catch / finally / throw verification
    ├── demo_regex.py               # /pattern/flags regex literal verification
    ├── demo_template.py            # `template ${expr}` string verification
    └── README.md                   # This file

2. Quick start

from jsparse import tokenize, parse, dump
from jsexec import Interpreter, UNDEFINED

# ── Parse ──
source = 'let x = 2 + 3; print(x);'
tokens  = tokenize(source)     # list[Token], ending with EOF
program = parse(source)        # ast_nodes.Program
print(dump(program))           # indented multi-line AST

# ── Execute ──
interp = Interpreter()
interp.register_function("print", lambda *a: print(*a))
interp.run(program)            # prints: 5

Run the bundled demos:

python demo.py              # parser demo
python demo_exec.py         # interpreter demo
python demo_functions.py    # function/lambda verification
python demo_operators.py    # typeof / instanceof / delete / void
python demo_dowhile.py      # do { } while verification
python demo_switch.py       # switch / case verification
python demo_forinof.py      # for ... in / for ... of verification
python demo_trycatch.py     # try / catch / finally / throw verification
python demo_regex.py        # regex literal verification
python demo_template.py     # template string verification

3. Parser — programmer's reference

3.1 Tokenizer (jsparse.tokens)

from jsparse import tokenize, Token, TokenType

tokenize(source: str) -> list[Token]

Splits source into a flat list of Token objects terminated by a final Token(type=TokenType.EOF).

Token fields

Field Type Description
type TokenType Enum member identifying the token category
value object Parsed value: int/float for NUMBER, str for STRING/IDENT, True/False/None for booleans/null, keyword text otherwise
line int 1-based line number where the token starts
column int 1-based column number where the token starts
start int Character offset into source where the token starts
end int Character offset just past the last char (half-open)

TokenType members

Literals: NUMBER, STRING, IDENT

Keywords: KW_VAR, KW_LET, KW_CONST, KW_FUNCTION, KW_RETURN, KW_IF, KW_ELSE, KW_WHILE, KW_FOR, KW_TRUE, KW_FALSE, KW_NULL, KW_UNDEFINED, KW_NEW, KW_BREAK, KW_CONTINUE, KW_LAMBDA, KW_TYPEOF, KW_INSTANCEOF, KW_DELETE, KW_VOID, KW_DO, KW_SWITCH, KW_CASE, KW_DEFAULT, KW_IN, KW_OF, KW_TRY, KW_CATCH, KW_FINALLY, KW_THROW

Punctuation: LPAREN, RPAREN, LBRACE, RBRACE, LBRACKET, RBRACKET, COMMA, SEMI, COLON, DOT, QUESTION, ARROW (=>)

Operators: ASSIGN (=), PLUS, MINUS, STAR, SLASH, PERCENT, EQ (==), NEQ (!=), SEQ (===), SNEQ (!==), LT, GT, LTE, GTE, AND (&&), OR (||), NOT (!), PLUSPLUS, MINUSMINUS, PLUSEQ, MINUSEQ, STAREQ, SLASHEQ

Misc: EOF

Regex: REGEX — emitted only in expression contexts (when / cannot be the division operator). Its .value is a 2-tuple (pattern, flags), both strings. See §8 for the disambiguation rules.

Template: TEMPLATE — emitted for `text${expr}more` literals. Its .value is a 2-tuple (quasis, expr_sources) where quasis is the list of literal-text chunks (length N+1) and expr_sources is a list of (source, line, col) triples (length N) for each ${...} placeholder. The parser re-runs itself on each placeholder source to produce the embedded expression's AST.
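
To make the (quasis, expr_sources) shape concrete, here is a minimal sketch of how a template body could be split. It is not the library's lexer: split_template is a hypothetical helper, it returns plain source strings instead of (source, line, col) triples, and it ignores escapes and braces inside nested string literals.

```python
def split_template(body: str):
    """Split the text between the backticks into (quasis, expr_sources).

    Hypothetical helper for illustration only; positions are omitted.
    """
    quasis, exprs = [], []
    chunk = []  # characters of the literal chunk currently being built
    i = 0
    while i < len(body):
        if body.startswith("${", i):
            # Scan to the matching '}', tolerating nested braces.
            depth, j = 1, i + 2
            while j < len(body) and depth:
                if body[j] == "{":
                    depth += 1
                elif body[j] == "}":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated ${...} placeholder")
            quasis.append("".join(chunk))
            chunk = []
            exprs.append(body[i + 2 : j - 1])
            i = j
        else:
            chunk.append(body[i])
            i += 1
    quasis.append("".join(chunk))  # trailing chunk, possibly empty
    return quasis, exprs


quasis, exprs = split_template("hi, ${name}! You are ${age + 1}.")
print(quasis)  # N+1 literal chunks
print(exprs)   # N placeholder sources
```

The invariant len(quasis) == len(exprs) + 1 always holds, which is what lets an evaluator interleave the two lists without special cases.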

Keyword map

The KEYWORDS dict in tokens.py maps source-level keyword strings to their TokenType. To add a keyword, insert one entry here and one enum member above.

KEYWORDS = {
    "var": TokenType.KW_VAR,   "let": TokenType.KW_LET,
    "const": TokenType.KW_CONST, "function": TokenType.KW_FUNCTION,
    "return": TokenType.KW_RETURN, "if": TokenType.KW_IF,
    "else": TokenType.KW_ELSE,  "while": TokenType.KW_WHILE,
    "for": TokenType.KW_FOR,    "true": TokenType.KW_TRUE,
    "false": TokenType.KW_FALSE, "null": TokenType.KW_NULL,
    "undefined": TokenType.KW_UNDEFINED, "new": TokenType.KW_NEW,
    "break": TokenType.KW_BREAK, "continue": TokenType.KW_CONTINUE,
    "lambda": TokenType.KW_LAMBDA,
    "typeof": TokenType.KW_TYPEOF, "instanceof": TokenType.KW_INSTANCEOF,
    "delete": TokenType.KW_DELETE, "void": TokenType.KW_VOID,
}

Lexer features

Feature Details
Comments // line comments; /* ... */ block comments (nesting not supported); unterminated block comments raise LexError
Identifiers [a-zA-Z_$][a-zA-Z0-9_$]*; checked against KEYWORDS
Numbers Decimal integers, floats (1.5), exponents (1e5, 2.5E-3), hex (0xFF)
Strings Single or double quoted; escape sequences: \n \t \r \0 \\ \' \" \`
Operators Longest-match first: === / !== (3-char), then two-char combos (==, !=, <=, >=, &&, ||, ++, --, +=, -=, *=, /=, =>), then single characters
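
The longest-match rule for operators can be sketched as follows. This is an illustration under assumptions, not the code in tokens.py: the tables THREE, TWO, and ONE and the function scan_operators are hypothetical names, and the sketch returns raw operator strings rather than Token objects.

```python
# Hypothetical operator tables, derived from the TokenType list above.
THREE = ("===", "!==")
TWO = ("==", "!=", "<=", ">=", "&&", "||", "++", "--",
       "+=", "-=", "*=", "/=", "=>")
ONE = tuple("=+-*/%<>!(){}[],;:.?")


def scan_operators(src: str) -> list[str]:
    """Split a string of operators/punctuation using longest-match-first."""
    out, i = [], 0
    while i < len(src):
        if src[i].isspace():
            i += 1
            continue
        # Try the widest candidates first so '===' never lexes as '==' '='.
        for width, table in ((3, THREE), (2, TWO), (1, ONE)):
            piece = src[i : i + width]
            if piece in table:
                out.append(piece)
                i += width
                break
        else:
            raise ValueError(f"Unexpected character {src[i]!r} at offset {i}")
    return out


print(scan_operators("=== == = !== =>"))
```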

3.2 AST nodes (jsparse.ast_nodes)

Every node is a @dataclass inheriting from Node. Position info is stored in kw-only fields line and col (both default to 0). The iter_fields helper skips these so pretty-printing focuses on structure.

Base class

@dataclass
class Node:
    line: int = field(default=0, kw_only=True)
    col:  int = field(default=0, kw_only=True)

Statement nodes

Node Fields Description
Program body: list[Statement] Root node
Block body: list[Statement] { ... } block
ExpressionStatement expression: Expression Expression used as statement
VariableDeclaration kind: str, declarations: list[VariableDeclarator] var/let/const
VariableDeclarator name: str, init: Expression? Single binding
FunctionDeclaration name: str, params: list[str], body: Block function name(...) { ... }
ReturnStatement argument: Expression? return expr;
IfStatement test, consequent, alternate? if (...) ... else ...
WhileStatement test, body while (...) ...
ForStatement init?, test?, update?, body C-style for
BreakStatement (no fields) break;
ContinueStatement (no fields) continue;

Expression nodes

Node Fields Description
Identifier name: str Variable reference
Literal value: Any, raw: str Number, string, bool, null, undefined
RegexLiteral pattern: str, flags: str /abc/i, /^[a-z]+$/g
TemplateLiteral quasis: list[str], expressions: list[Expression] `hi, ${name}!`
ArrayExpression elements: list[Expression] [1, 2, 3]
ObjectExpression properties: list[ObjectProperty] { key: value }
ObjectProperty key: Identifier|Literal, value: Expression Single property
FunctionExpression name: str?, params: list[str], body: Block function (x) { ... }
ArrowFunction params: list[str], body: Expr|Block, expression: bool (x) => x + 1 or lambda x: x + 1
UnaryOp op: str, operand, prefix: bool !x, -x, +x
UpdateOp op: str, operand, prefix: bool ++x, x--
BinaryOp op: str, left, right a + b, a * b, a < b, etc.
LogicalOp op: str, left, right a && b, a || b
AssignmentExpression op: str, target, value x = 1, x += 2
ConditionalExpression test, consequent, alternate a ? b : c
MemberAccess object, property, computed: bool obj.x or obj[expr]
CallExpression callee, arguments: list[Expression] fn(a, b)
NewExpression callee, arguments: list[Expression] new Cls(a, b)

Tree-walking helpers

from jsparse.ast_nodes import iter_fields, iter_child_nodes

for name, value in iter_fields(node):    # yields (field_name, value), skips line/col
    ...

for child in iter_child_nodes(node):     # yields direct Node children (recursive visitor friendly)
    ...

3.3 Parser (jsparse.parser)

from jsparse import parse
from jsparse.parser import Parser

program = parse(source)            # convenience wrapper
# or
parser = Parser(source)
program = parser.parse()           # returns Program node

The Parser class is a recursive-descent parser with precedence climbing for expressions. It consumes the token list produced by tokenize() and emits AST nodes.

Key internal methods

Method Returns Role
parse() Program Entry point: statement loop until EOF
_parse_statement() Node Statement-level dispatch by token type
_parse_var_declaration() VariableDeclaration var / let / const with declarators
_parse_function_declaration() FunctionDeclaration function name(...) { ... }
_parse_return() ReturnStatement return expr?;
_parse_if() IfStatement if (...) ... else ...
_parse_while() WhileStatement while (...) ...
_parse_for() ForStatement C-style for (...; ...; ...) ...
_parse_block() Block { statement* }
_parse_expression_statement() ExpressionStatement Wraps any expression as a statement
_parse_assignment() Node Top of the expression precedence chain
_try_parse_arrow() ArrowFunction? Lookahead for => arrow syntax
_parse_conditional() Node Ternary ? :
_parse_logical_or() Node ||
_parse_logical_and() Node &&
_parse_equality() Node ==, !=, ===, !==
_parse_relational() Node <, >, <=, >=
_parse_additive() Node +, -
_parse_multiplicative() Node *, /, %
_parse_unary() Node !, -, +, ++, -- (prefix)
_parse_postfix() Node ++, -- (postfix)
_parse_call() Node (), .prop, [expr] chains
_parse_primary() Node Literals, identifiers, grouping, new, lambda, etc.
_parse_lambda() ArrowFunction All lambda surface forms
_parse_function_expression() FunctionExpression function name?(...) { ... }
_parse_new() NewExpression new callee(args)
_parse_array_literal() ArrayExpression [elements]
_parse_object_literal() ObjectExpression { key: value, ... }
_binary_left(sub, ops) Node Generic left-assoc binary chain helper

Token helpers

Helper Description
_peek(off=0) Look at a token without consuming
_check(*types) Is the next token one of these types?
_match(*types) Consume and return the token if it matches, else None
_expect(type_, what="") Consume or raise ParseError
_consume_optional_semi() Eat a ; if present (permissive ASI)

Semicolons

Semicolons are optional. The parser calls _consume_optional_semi() after statements — it eats a ; if present, otherwise continues. This is a permissive approximation of JavaScript's ASI.


3.4 Pretty-printer (jsparse.pprint)

from jsparse import dump

text = dump(node)                                    # default: 2-space indent, no position
text = dump(node, indent=4, include_position=True)   # @line:col on every node

Output looks like ast.dump():

Program(
  body=[
    VariableDeclaration(
      kind='let',
      declarations=[
        VariableDeclarator(
          name='x',
          init=Literal(
            value=42,
            raw='42'
          )
        )
      ]
    )
  ]
)

3.5 Error handling (jsparse.errors)

Exception Raised by When
JSParseError (base) Never directly; common base for below
LexError Tokenizer Bad character, unterminated string/comment
ParseError Parser Unexpected token, missing expected token

All carry message, line, column, and optional source. Their __str__ renders a caret-style diagnostic:

ParseError: Expected RPAREN, got SEMI (';') at line 3, column 12
    foo(bar;
           ^
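
A caret diagnostic like the one above can be produced with a few lines of string handling. The sketch below is an assumption about the layout (source line printed verbatim, caret placed under the 1-based column); render_diagnostic is a hypothetical helper, not the library's __str__.

```python
def render_diagnostic(kind, message, line, column, source=None):
    """Format an error header plus, when source is known, a caret line."""
    header = f"{kind}: {message} at line {line}, column {column}"
    if source is None:
        return header
    lines = source.splitlines()
    if not 1 <= line <= len(lines):
        return header  # position outside the source: header only
    caret = " " * (column - 1) + "^"
    return f"{header}\n{lines[line - 1]}\n{caret}"


src = "let a = 1;\nlet b = 2;\n    foo(bar;"
report = render_diagnostic(
    "ParseError", "Expected RPAREN, got SEMI (';')", 3, 12, src
)
print(report)
```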

3.6 Grammar reference

program        := statement* EOF

statement      := varDecl | funcDecl | returnStmt
                | ifStmt | whileStmt | forStmt
                | breakStmt | continueStmt
                | block | exprStmt

varDecl        := ('var'|'let'|'const') declarator (',' declarator)* ';'?
declarator     := IDENT ('=' assignment)?
funcDecl       := 'function' IDENT '(' params? ')' block
returnStmt     := 'return' assignment? ';'?
ifStmt         := 'if' '(' assignment ')' statement ('else' statement)?
whileStmt      := 'while' '(' assignment ')' statement
forStmt        := 'for' '(' (varDecl | exprStmt | ';')
                          assignment? ';'
                          assignment? ')' statement
breakStmt      := 'break' ';'?
continueStmt   := 'continue' ';'?
block          := '{' statement* '}'
exprStmt       := assignment ';'?

assignment     := conditional ( ('='|'+='|'-='|'*='|'/=') assignment )?
conditional    := logicalOr ( '?' assignment ':' assignment )?
logicalOr      := logicalAnd ('||' logicalAnd)*
logicalAnd     := equality   ('&&' equality)*
equality       := relational (('=='|'!='|'==='|'!==') relational)*
relational     := additive   (('<'|'>'|'<='|'>='|'instanceof') additive)*
additive       := multiplicative (('+'|'-') multiplicative)*
multiplicative := unary      (('*'|'/'|'%') unary)*
unary          := ('!'|'-'|'+'|'++'|'--'|'typeof'|'void'|'delete') unary
                | postfix
postfix        := call ('++' | '--')?
call           := primary ( '(' args? ')' | '.' IDENT | '[' assignment ']' )*

primary        := NUMBER | STRING | 'true' | 'false' | 'null' | 'undefined'
                | IDENT | '(' assignment ')' | arrayLit | objectLit
                | funcExpr | lambdaExpr | 'new' call | arrowFn

arrayLit       := '[' (assignment (',' assignment)* ','?)? ']'
objectLit      := '{' (prop (',' prop)* ','?)? '}'
prop           := (IDENT | STRING) ':' assignment
funcExpr       := 'function' IDENT? '(' params? ')' block

lambdaExpr     := 'lambda' '(' params? ')' block           -- block body
                | 'lambda' '(' params? ')' '=>'? assignment -- expression body
                | 'lambda' (IDENT (',' IDENT)*)? ':' assignment -- Python-style

arrowFn        := IDENT '=>' (assignment | block)
                | '(' params? ')' '=>' (assignment | block)

Operator precedence (lowest → highest):

Level Operators / construct Associativity
1 =, +=, -=, *=, /= Right
2 ? : Right
3 || Left
4 && Left
5 ==, !=, ===, !== Left
6 <, >, <=, >=, instanceof Left
7 +, - Left
8 *, /, % Left
9 !, -, +, ++, --, typeof, void, delete (pre) Right (unary)
10 ++, -- (post)
11 (), ., [] Left (call)

4. Interpreter — programmer's reference

4.1 Interpreter class

from jsexec import Interpreter

interp = Interpreter()

Public methods

Method Description
run(program: Program) -> Any Execute a parsed program. Returns the value of the last expression statement.
register_function(name, callable) -> JSNativeFunction Bind a Python callable as a const global.
register_object(name, obj) -> Any Bind a value as a const global. Dicts auto-wrapped as JSObject.
register_class(cls: JSClass) -> JSClass Register a JSClass under cls.name as a const global.

Properties

Property Type Description
globals Environment Top-level scope; all registered values live here.

4.2 Environment & scoping

from jsexec import Environment

Environment implements a lexical scope chain with parent pointers.

Methods

Method Description
declare(name, value, kind) Bind name in the current scope. kind is "let", "const", or "var". Rejects duplicate let/const declarations.
declare_var(name, value) var hoisting: walks up to the nearest is_function_scope=True frame, binds there.
get(name) -> Any Walk the scope chain upward. Raises JSRuntimeError if not found.
has(name) -> bool Walk the scope chain; returns whether the name exists.
assign(name, value) -> Any Walk chain to find the binding, update it. Raises on const reassignment or if name doesn't exist.
child(function_scope=False) Create a new child Environment linked to this one.

Binding kinds

Kind Block-scoped? Reassignable? Hoisted?
let Yes Yes No (lives in declaring block)
const Yes No No
var No Yes Yes (to nearest function scope)

When does a new Environment open?

Situation How
Every { } block env.child()
for statement's init env.child()
Every function call env.child(function_scope=True) — this is the var hoisting boundary
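
The scope-chain behaviour described in this section can be sketched in a few dozen lines. This is a simplified stand-in, not jsexec.environment: ScopeError replaces JSRuntimeError, bindings are stored as [value, kind] lists, and has() is omitted.

```python
class ScopeError(Exception):
    """Stand-in for JSRuntimeError in this sketch."""


class Environment:
    def __init__(self, parent=None, function_scope=False):
        self.parent = parent
        self.is_function_scope = function_scope
        self.bindings = {}  # name -> [value, kind]

    def child(self, function_scope=False):
        return Environment(self, function_scope)

    def declare(self, name, value, kind="let"):
        # Duplicate let/const in the same scope is rejected.
        if name in self.bindings and kind in ("let", "const"):
            raise ScopeError(f"'{name}' has already been declared")
        self.bindings[name] = [value, kind]

    def declare_var(self, name, value):
        # Hoist to the nearest function scope (or the root).
        env = self
        while env.parent is not None and not env.is_function_scope:
            env = env.parent
        env.bindings[name] = [value, "var"]

    def _find(self, name):
        env = self
        while env is not None:
            if name in env.bindings:
                return env.bindings[name]
            env = env.parent
        raise ScopeError(f"'{name}' is not defined")

    def get(self, name):
        return self._find(name)[0]

    def assign(self, name, value):
        binding = self._find(name)
        if binding[1] == "const":
            raise ScopeError(f"cannot reassign const '{name}'")
        binding[0] = value
        return value


global_env = Environment(function_scope=True)
block_env = global_env.child()        # what a { } block would open
block_env.declare("a", 1, "let")      # stays in the block
block_env.declare_var("b", 2)         # hoisted to global_env
block_env.declare("PI", 3.14, "const")
```

After the block ends, a is unreachable from global_env while b is still visible there, which is the let versus var difference summarized in the binding-kinds table.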

4.3 Runtime value types

All values live in jsexec.values.

Type Purpose
UNDEFINED Singleton sentinel (_Undefined()), falsy, repr is "undefined". Distinct from Python None, which represents JS null.
JSNativeFunction Wraps a host Python callable. Fields: name, fn. Protocol: call(interp, args) -> Any.
JSFunction User-defined function. Fields: name, params, body (Block AST), closure (Environment), bound_this.
JSObject Dict-backed object with optional class link. Fields: properties: dict, cls: JSClass?. Methods: get(name), set(name, value).
JSClass Class definition. Fields: name, methods: dict, attributes: dict, init: callable?. Methods: instantiate(interp, args), lookup_method(name).

Callable protocol. Anything with a .call(interp, args) method can be invoked from JS code. Plain Python callables also work — the interpreter falls back to fn(*args).
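
The callable protocol amounts to a two-step check at every call site. The sketch below shows one plausible shape for it; invoke and NativeFunction are hypothetical stand-ins, and TypeError substitutes for JSRuntimeError.

```python
def invoke(callee, interp, args):
    """Call a JS-visible value: prefer .call(interp, args), else fn(*args)."""
    call = getattr(callee, "call", None)
    if callable(call):
        return call(interp, args)
    if callable(callee):
        return callee(*args)  # plain Python callable fallback
    raise TypeError(f"{callee!r} is not callable")


class NativeFunction:
    """Minimal stand-in for JSNativeFunction (fields: name, fn)."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def call(self, interp, args):
        return self.fn(*args)


add = NativeFunction("add", lambda a, b: a + b)
print(invoke(add, None, [1, 2]))   # goes through .call
print(invoke(max, None, [1, 5]))   # goes through the fallback
```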

JSFunction.bind(this_obj) returns a copy of the function with bound_this set. The interpreter calls this automatically on obj.method() invocations so this is correctly bound.

JSClass.instantiate(interp, args) creates a JSObject with cls=self, copies class attributes as initial properties, then calls init (if set) with the instance and constructor arguments.


4.4 Control flow internals

Control flow is implemented with lightweight BaseException subclasses in jsexec.errors. They inherit from BaseException (not Exception) so that an ordinary except Exception handler in host code never catches them by accident.

Signal Raised by Caught by
BreakSignal break statement for / while / do-while / switch
ContinueSignal continue statement for / while / do-while loops
ReturnSignal return statement JSFunction.call()
ThrowSignal throw statement try block (or surfaces as JSRuntimeError at top level)

break and continue outside a loop produce a clear JSRuntimeError("'break' used outside of a loop", line, col) thanks to an _inside_loop depth counter on the interpreter. Function calls save/restore this counter so a break inside a function body defined inside a loop is correctly flagged as invalid.
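
The reason for deriving the signals from BaseException can be seen in a small stand-alone sketch. The class names match the table above, but run_loop and body are hypothetical and only model how a loop handler might catch the signals.

```python
class BreakSignal(BaseException):
    pass


class ContinueSignal(BaseException):
    pass


def run_loop(items, body):
    """Run body(item) per item; return the items whose body completed."""
    completed = []
    for item in items:
        try:
            body(item)
        except ContinueSignal:
            continue
        except BreakSignal:
            break
        completed.append(item)
    return completed


def body(n):
    try:
        if n == 2:
            raise ContinueSignal()
        if n == 4:
            raise BreakSignal()
    except Exception:
        # Never reached: the signals are not Exception subclasses,
        # so a broad host-side handler lets them pass through.
        raise AssertionError("signal was swallowed")


print(run_loop([1, 2, 3, 4, 5], body))
```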


4.5 Truthiness rules

Value Truthy?
false No
null (Python None) No
undefined (UNDEFINED) No
0, 0.0 No
"" (empty string) No
Everything else Yes (including [], {}, "0")
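
The table translates directly into a small predicate. This is a sketch under assumptions: is_truthy is a hypothetical name and _Undefined here is a local stand-in for the library's sentinel. Note that Python's own bool() would not do, because it treats [] and {} as falsy.

```python
class _Undefined:
    def __bool__(self):
        return False

    def __repr__(self):
        return "undefined"


UNDEFINED = _Undefined()  # local stand-in for jsexec's sentinel


def is_truthy(value):
    """Apply the JS-style truthiness rules from the table above."""
    if value is None or value is UNDEFINED:
        return False
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0
    if isinstance(value, str):
        return value != ""
    return True  # lists, dicts, objects, functions, ...


for sample in (0, "", None, UNDEFINED, [], {}, "0"):
    print(repr(sample), is_truthy(sample))
```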

4.6 Member access & property protocol

obj.prop and obj[expr] resolve through _member_get():

obj type Behavior
JSObject obj.get(key) → own properties, then class methods/attrs
JSClass Class-level: static attributes first, then methods
dict Python dict .get(key, UNDEFINED)
list .length → len(); numeric index → element; out-of-range → UNDEFINED
str .length → len(); numeric index → character
anything Falls back to Python getattr(obj, key); callables are auto-wrapped in JSNativeFunction

Assignment through _member_set():

obj type Behavior
JSObject obj.set(key, value)
dict obj[key] = value
list Numeric index; auto-grows with UNDEFINED fill if past end
other Raises JSRuntimeError
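
The list, str, and dict rows of both tables can be sketched as below. This covers only those rows: the JSObject, JSClass, and getattr fallback branches are left out, UNDEFINED is a local stand-in, TypeError substitutes for JSRuntimeError, and returning UNDEFINED for an out-of-range string index is an assumption.

```python
UNDEFINED = object()  # local stand-in for the library's sentinel


def member_get(obj, key):
    if isinstance(obj, dict):
        return obj.get(key, UNDEFINED)
    if isinstance(obj, (list, str)):
        if key == "length":
            return len(obj)
        if isinstance(key, int) and not isinstance(key, bool):
            return obj[key] if 0 <= key < len(obj) else UNDEFINED
    return UNDEFINED  # remaining branches are omitted in this sketch


def member_set(obj, key, value):
    if isinstance(obj, dict):
        obj[key] = value
    elif isinstance(obj, list) and isinstance(key, int) and key >= 0:
        # Auto-grow: pad with UNDEFINED up to the target index.
        obj.extend([UNDEFINED] * (key + 1 - len(obj)))
        obj[key] = value
    else:
        raise TypeError(f"cannot set {key!r} on {type(obj).__name__}")
    return value


arr = [1]
member_set(arr, 3, "x")   # grows the list to length 4
print(member_get(arr, "length"))
```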

4.7 Extension API

Registering a custom function

interp.register_function("print", lambda *args: print(*args))
interp.register_function("sqrt", lambda x: x ** 0.5)

The callable receives Python-native values: numbers are int/float, strings are str, booleans are bool, null is None, undefined is UNDEFINED, arrays are list, objects are JSObject.

Registering a custom object

# Dict → auto-wrapped as JSObject, so dot access works from JS:
interp.register_object("config", {"name": "app", "version": 42})

# Or pass a JSObject directly:
from jsexec import JSObject
interp.register_object("state", JSObject(properties={"count": 0}))

Registering a custom class

from jsexec import JSClass, JSNativeFunction, JSObject
# _wrap_method is the small helper from demo_exec.py, reproduced further below.

def _init(instance: JSObject, x, y):
    instance.set("x", x)
    instance.set("y", y)

def _length(instance: JSObject):
    return (instance.get("x") ** 2 + instance.get("y") ** 2) ** 0.5

def _scale(instance: JSObject, factor):
    instance.set("x", instance.get("x") * factor)
    instance.set("y", instance.get("y") * factor)

Point = JSClass(
    name="Point",
    attributes={"kind": "2D"},                          # class-level attrs
    init=JSNativeFunction("Point.init", _init),         # constructor
    methods={
        "length": _wrap_method(_length),                # instance methods
        "scale":  _wrap_method(_scale),
    },
)
interp.register_class(Point)

From JS code:

let p = new Point(3, 4);
print(p.x, p.y);          // 3 4
print(p.length());         // 5.0
p.scale(2);
print(p.x, p.y);          // 6 8
print(Point.kind);         // "2D"

Method-wrapping protocol: any object with .bind(instance) → copy and .call(interp, args) can serve as a method. See _wrap_method() in demo_exec.py for a minimal implementation:

class _BoundableMethod:
    def __init__(self, fn, this=None):
        self.fn = fn
        self._this = this
        self.name = fn.__name__

    def bind(self, instance):
        return _BoundableMethod(self.fn, this=instance)

    def call(self, interp, args):
        return self.fn(self._this, *args)

def _wrap_method(py_fn):
    return _BoundableMethod(py_fn)

4.8 typeof / instanceof / delete / void

All four are parsed as standard operators (no special-cased syntax) and implemented entirely inside the interpreter. They are real keywords — typeof, instanceof, delete, and void are reserved and cannot be used as identifiers.

typeof operand — unary, returns a string

Operand kind Result
undefined "undefined"
null "object" (JS quirk, preserved)
Boolean (true / false) "boolean"
Number (int / float) "number"
String "string"
JSFunction, JSNativeFunction, JSClass, any callable "function"
JSObject, dict, list, anything else "object"

Special rule: typeof <undeclaredIdent> returns "undefined" instead of raising — this matches JavaScript and is a common feature-detection idiom. Only direct identifier operands get this treatment; typeof undeclaredObj.prop still raises because the .prop access is evaluated.
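
The typeof table and the undeclared-identifier rule can be sketched as two small functions. Both names are hypothetical, UNDEFINED is a local stand-in, and a plain dict plays the role of the scope chain.

```python
class _Undefined:
    def __repr__(self):
        return "undefined"


UNDEFINED = _Undefined()  # local stand-in for the library's sentinel


def js_typeof(value):
    """Map a runtime value to its typeof string, following the table above."""
    if value is UNDEFINED:
        return "undefined"
    if value is None:
        return "object"          # the preserved JS quirk for null
    if isinstance(value, bool):  # must precede the int check
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if callable(value) or callable(getattr(value, "call", None)):
        return "function"
    return "object"


def typeof_identifier(name, scope):
    """typeof applied directly to an identifier never raises."""
    if name not in scope:
        return "undefined"
    return js_typeof(scope[name])


print(js_typeof(None), js_typeof(True), js_typeof(1.5), js_typeof(len))
print(typeof_identifier("missing", {}))
```

The bool check comes before the number check because Python's bool is a subclass of int; reversing the order would report true as "number".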

value instanceof cls — binary, returns a boolean

Sits at the relational precedence level (same as <, >, <=, >=).

cls argument Behavior
JSClass True iff value is a JSObject with cls set to exactly that class
Python type Falls back to isinstance(value, cls)
Anything else (number, function, etc.) False

Without a prototype chain, this implementation does not currently recognize value instanceof someJSFunction — host-defined JSClass is the canonical class facility and what instanceof reasons about.

delete target — unary, returns a boolean

Target form Behavior
obj.prop / obj[expr] on JSObject Removes property; returns True
obj.prop / obj[expr] on dict dict.pop(key, None); returns True
arr[i] on a Python list Sets arr[i] = UNDEFINED; returns True
Plain identifier (delete x) No-op; returns False (lexical bindings are not removable)
Other host containers No-op; returns False

Deleting a non-existent property is not an error — it returns True, mirroring JS.

void operand — unary, always returns undefined

Evaluates operand for its side effects, discards the result, and returns the UNDEFINED singleton. The classic use is void 0 as a guaranteed-undefined value, but any expression works.

let x = 5;
void (x = x + 10);   // returns undefined; x is now 15

The three unary operators each produce a UnaryOp AST node with op set to the keyword ("typeof", "delete", or "void"); instanceof produces a BinaryOp with op="instanceof".


5. Implementation deep-dive

5.1 Lexer implementation

The lexer in tokens.py is a hand-written scanner (no regex in the hot path). It maintains:

  • i — current character index into the source string
  • line / col — 1-based position tracking
  • tokens — output list being built

Main loop: _skip_ws_and_comments() → _scan_token() → repeat until end of source, then append an EOF token.

_scan_token() dispatch by first character:

  1. [a-zA-Z_$] → _scan_ident(): reads the full word and looks it up in the KEYWORDS dict; stores True/False/None as Python values for boolean/null literals.
  2. [0-9] → _scan_number(): handles decimal integers, floats (1.5), exponents (1e5, 2.5E-3), and hex (0xFF). Stores int or float as the value.
  3. ' or " → _scan_string(): handles escape sequences (\n, \t, \r, \0, \\, \', \", \`). Raises LexError on unterminated strings or stray newlines.
  4. Otherwise: tries 3-char operators (===, !==), then 2-char operators (from the two_map dict), then single-char operators (from the single_map dict). Falls through to LexError("Unexpected character") if nothing matches.

Comments: _skip_ws_and_comments() handles both // line comments (consume until newline) and /* ... */ block comments (consume until closing */, raising LexError if unterminated).

5.2 Parser implementation

The parser in parser.py is a recursive-descent parser with a _binary_left() helper for left-associative binary operator chains:

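def _binary_left(self, sub, ops, cls=BinaryOp):
    node = sub()                         # parse the first, higher-precedence operand
    while self._peek().type in ops:      # ops maps TokenType -> operator text
        tok = self.tokens[self.pos]; self.pos += 1
        right = sub()
        # position keyword arguments (line, col) are omitted in this excerpt
        node = cls(op=ops[tok.type], left=node, right=right)
    return node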

Expression parsing chains these calls from lowest to highest precedence:

_parse_assignment
  └→ _try_parse_arrow (lookahead)
  └→ _parse_conditional
       └→ _parse_logical_or
            └→ _parse_logical_and
                 └→ _parse_equality
                      └→ _parse_relational
                           └→ _parse_additive
                                └→ _parse_multiplicative
                                     └→ _parse_unary
                                          └→ _parse_postfix
                                               └→ _parse_call
                                                    └→ _parse_primary
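
To show how the chain and the left-associative helper interact, here is a self-contained miniature with only two precedence levels. MiniParser is a hypothetical class, it tokenizes with a regular expression for brevity (unlike the hand-written lexer), and it builds tuples instead of AST nodes.

```python
import re


class MiniParser:
    """Two-level recursive-descent parser: additive > multiplicative > primary."""

    def __init__(self, source):
        self.tokens = re.findall(r"\d+|[-+*/%()]", source)
        self.pos = 0

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _binary_left(self, sub, ops):
        node = sub()
        while self._peek() in ops:
            op = self.tokens[self.pos]
            self.pos += 1
            node = (op, node, sub())  # fold to the left
        return node

    def parse_additive(self):
        return self._binary_left(self.parse_multiplicative, ("+", "-"))

    def parse_multiplicative(self):
        return self._binary_left(self.parse_primary, ("*", "/", "%"))

    def parse_primary(self):
        tok = self._peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        if tok == "(":
            node = self.parse_additive()
            if self._peek() != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
            return node
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


print(MiniParser("1 - 2 - 3").parse_additive())
print(MiniParser("1 + 2 * 3").parse_additive())
```

Each level calls the next-tighter level for its operands, so precedence falls out of the call order and associativity falls out of the loop in _binary_left.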

Statement parsing in _parse_statement() checks the current token's type and dispatches:

def _parse_statement(self):
    tok = self._peek()
    if tok.type in (KW_VAR, KW_LET, KW_CONST): return self._parse_var_declaration()
    if tok.type is KW_FUNCTION and self._peek(1).type is IDENT:
        return self._parse_function_declaration()
    if tok.type is KW_RETURN: return self._parse_return()
    if tok.type is KW_IF:     return self._parse_if()
    if tok.type is KW_WHILE:  return self._parse_while()
    if tok.type is KW_FOR:    return self._parse_for()
    if tok.type is KW_BREAK:
        ...  # consume, optional semi, return BreakStatement
    if tok.type is KW_CONTINUE:
        ...  # ditto for ContinueStatement
    if tok.type is LBRACE:    return self._parse_block()
    return self._parse_expression_statement()  # fallthrough

Arrow function detection uses lookahead in _try_parse_arrow(): it checks for the two-token sequence IDENT '=>', or scans ahead through balanced parentheses to confirm '(' ... ')' '=>', before committing to the arrow parse path. If the lookahead fails, it returns None and the parser falls through to normal expression parsing without having consumed any tokens.
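
The lookahead can be sketched over a plain list of token strings. looks_like_arrow is a hypothetical function that only answers yes or no; unlike _try_parse_arrow() it builds no AST and does not validate the parameter list.

```python
def looks_like_arrow(tokens, pos=0):
    """Return True if tokens[pos:] has the shape of an arrow function.

    Pure lookahead: the token list and position are never modified.
    """
    def at(i):
        return tokens[i] if i < len(tokens) else None

    first = at(pos)
    if first is not None and first.isidentifier():
        return at(pos + 1) == "=>"          # IDENT '=>'
    if first != "(":
        return False
    depth = 0
    for i in range(pos, len(tokens)):       # scan to the matching ')'
        if tokens[i] == "(":
            depth += 1
        elif tokens[i] == ")":
            depth -= 1
            if depth == 0:
                return at(i + 1) == "=>"    # '(' ... ')' '=>'
    return False                            # unbalanced parentheses


print(looks_like_arrow(["(", "a", ",", "b", ")", "=>", "a"]))
print(looks_like_arrow(["(", "a", "+", "b", ")", "*", "2"]))
```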

Lambda parsing is triggered by the KW_LAMBDA token in _parse_primary(). The _parse_lambda() method accepts four surface forms and produces an ArrowFunction node for all of them:

Form Example Body type
Parenthesized + block lambda (a, b) { return a+b; } Block
Parenthesized + expression lambda (x) => x * x Expression
Python-style with params lambda a, b: a + b Expression
Python-style zero-arg lambda: 42 Expression

5.3 Interpreter dispatch table

interpreter.py maps each AST class to a handler using a plain Python dict:

self._dispatch = {
    A.Program:               self._exec_program,
    A.Block:                 self._exec_block,
    A.ExpressionStatement:   self._exec_expression_statement,
    A.VariableDeclaration:   self._exec_variable_declaration,
    A.FunctionDeclaration:   self._exec_function_declaration,
    A.ReturnStatement:       self._exec_return,
    A.IfStatement:           self._exec_if,
    A.WhileStatement:        self._exec_while,
    A.ForStatement:          self._exec_for,
    A.BreakStatement:        self._exec_break,
    A.ContinueStatement:     self._exec_continue,
    A.Literal:               self._eval_literal,
    A.Identifier:            self._eval_identifier,
    A.ArrayExpression:       self._eval_array,
    A.ObjectExpression:      self._eval_object,
    A.FunctionExpression:    self._eval_function_expr,
    A.ArrowFunction:         self._eval_arrow,
    A.UnaryOp:               self._eval_unary,
    A.UpdateOp:              self._eval_update,
    A.BinaryOp:              self._eval_binary,
    A.LogicalOp:             self._eval_logical,
    A.AssignmentExpression:  self._eval_assignment,
    A.ConditionalExpression: self._eval_conditional,
    A.MemberAccess:          self._eval_member,
    A.CallExpression:        self._eval_call,
    A.NewExpression:         self._eval_new,
}

The single _evaluate(node, env) method performs the lookup:

def _evaluate(self, node, env):
    handler = self._dispatch.get(type(node))
    if handler is None:
        raise JSRuntimeError(f"No handler for {type(node).__name__}")
    return handler(node, env)

Naming convention:

  • _exec_* — statement handlers; return None (side-effects only)
  • _eval_* — expression handlers; return a runtime value

Adding a new node = one line in _dispatch + one handler method.
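
The "one line plus one handler" claim can be demonstrated on a toy evaluator. Everything here (Num, Add, Neg, MiniInterpreter, ExtendedInterpreter) is hypothetical and only mirrors the dispatch pattern; RuntimeError stands in for JSRuntimeError, and extending through a subclass is just one possible way to add the entry.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: float


@dataclass
class Add:
    left: object
    right: object


class MiniInterpreter:
    def __init__(self):
        # AST class -> bound handler, looked up by exact type.
        self._dispatch = {
            Num: self._eval_num,
            Add: self._eval_add,
        }

    def _evaluate(self, node, env):
        handler = self._dispatch.get(type(node))
        if handler is None:
            raise RuntimeError(f"No handler for {type(node).__name__}")
        return handler(node, env)

    def _eval_num(self, node, env):
        return node.value

    def _eval_add(self, node, env):
        return self._evaluate(node.left, env) + self._evaluate(node.right, env)


# Adding a node: one class, one dispatch entry, one handler.
@dataclass
class Neg:
    operand: object


class ExtendedInterpreter(MiniInterpreter):
    def __init__(self):
        super().__init__()
        self._dispatch[Neg] = self._eval_neg

    def _eval_neg(self, node, env):
        return -self._evaluate(node.operand, env)


print(ExtendedInterpreter()._evaluate(Add(Num(2), Neg(Num(5))), {}))
```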

5.4 Scoping & hoisting implementation

Each _exec_block opens a child environment:

def _exec_block(self, node, env):
    block_env = env.child()
    for stmt in node.body:
        self._evaluate(stmt, block_env)

var declarations use env.declare_var() which walks up to the nearest is_function_scope=True environment:

def declare_var(self, name, value):
    env = self
    while env.parent is not None and not env.is_function_scope:
        env = env.parent
    env.bindings[name] = _Binding(value=value, kind="var")

The global Environment is created with is_function_scope=True, so var declarations at the top level always land there.

5.5 Function calls, closures & this

When a FunctionDeclaration or FunctionExpression is evaluated, the interpreter captures the current env as closure:

fn = JSFunction(name=..., params=..., body=..., closure=env)

JSFunction.call() creates a fresh frame whose parent is the closure (not the call site), giving correct lexical scoping:

frame = Environment(parent=self.closure)
frame.is_function_scope = True
for i, p in enumerate(self.params):
    frame.declare(p, args[i] if i < len(args) else UNDEFINED, kind="let")

Named function expressions get an extra intermediate scope so the function can refer to itself by name without leaking into the outer scope:

if node.name:
    inner = env.child()
    fn = JSFunction(name=node.name, params=node.params, body=node.body, closure=inner)
    inner.declare(node.name, fn, kind="const")

Arrow functions / lambdas with expression bodies are wrapped in a synthetic Block([ReturnStatement(body)]) so JSFunction.call sees a uniform shape.

this binding: _eval_call detects method-style calls (obj.method()) and calls fn.bind(receiver) which sets bound_this. Inside the function frame, this is declared as a const. For host-defined methods, any object implementing .bind(instance) + .call(interp, args) participates in the same protocol.

Loop-depth across function boundaries: JSFunction.call() saves interp._inside_loop, resets it to 0 for the function body, and restores it in a finally block. A break in a function body that is not inside a loop of that same function is therefore still flagged as an error, even when the function is defined or called inside a loop.
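
The save/reset/restore step can be sketched on its own. The Interp class and BreakOutsideLoop exception below are hypothetical; only the _inside_loop counter and the try/finally shape come from the description above.

```python
class BreakOutsideLoop(Exception):
    pass


class Interp:
    def __init__(self):
        self._inside_loop = 0

    def exec_break(self):
        if self._inside_loop == 0:
            raise BreakOutsideLoop("break outside loop")

    def call_function(self, body):
        saved = self._inside_loop
        self._inside_loop = 0  # a function body starts outside any loop
        try:
            return body()
        finally:
            self._inside_loop = saved  # restored even if the body raises

    def run_loop_once(self, body):
        self._inside_loop += 1
        try:
            return body()
        finally:
            self._inside_loop -= 1


interp = Interp()
flagged = False
try:
    # Models: while (...) { f(); } where the body of f is a bare `break`.
    interp.run_loop_once(lambda: interp.call_function(interp.exec_break))
except BreakOutsideLoop:
    flagged = True

print(flagged)              # True
print(interp._inside_loop)  # 0
```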


6. Extending the parser

6.1 Adding a new keyword / token

Step 1: Add a TokenType member in tokens.py:

class TokenType(Enum):
    # ...existing members...
    KW_SWITCH = auto()

Step 2: Register the keyword in the KEYWORDS dict:

KEYWORDS = {
    # ...existing entries...
    "switch": TokenType.KW_SWITCH,
}

That's it — the lexer will now emit Token(type=KW_SWITCH, value="switch", ...) whenever it sees the word switch in source code.

For a new operator character (e.g., |): add the enum member, then add an entry to single_map (1-char) or two_map (2-char) inside _scan_token().
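
The keyword registration in steps 1 and 2 boils down to a dictionary lookup on each scanned word. The sketch below is a deliberately tiny stand-in: classify_words is a hypothetical helper that only looks at identifier-like words, and it returns plain tuples rather than the library's Token objects.

```python
import re
from enum import Enum, auto


class TokenType(Enum):
    IDENT = auto()
    KW_WHILE = auto()
    KW_SWITCH = auto()  # newly added member


KEYWORDS = {
    "while": TokenType.KW_WHILE,
    "switch": TokenType.KW_SWITCH,  # newly registered keyword
}


def classify_words(source):
    """Return (type, text) pairs for identifier-like words only."""
    out = []
    for word in re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", source):
        # Whole-word lookup: anything not in KEYWORDS is an identifier.
        out.append((KEYWORDS.get(word, TokenType.IDENT), word))
    return out


tokens = classify_words("switch (switchboard) while")
print(tokens)
```

Because the lookup is on the whole word, switchboard stays an identifier even though it starts with a keyword.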

6.2 Adding a new statement

Step 1: Define an AST node in ast_nodes.py:

@dataclass
class SwitchStatement(Node):
    discriminant: Expression = None
    cases: List["SwitchCase"] = field(default_factory=list)

@dataclass
class SwitchCase(Node):
    test: Optional[Expression] = None   # None = default case
    consequent: List[Statement] = field(default_factory=list)

Step 2: Add the node to __all__ in ast_nodes.py.

Step 3: In parser.py, import the new node and add a dispatch branch in _parse_statement():

def _parse_statement(self):
    tok = self._peek()
    # ...existing branches...
    if tok.type is TokenType.KW_SWITCH:
        return self._parse_switch()
    # ...

Step 4: Write the parse method:

def _parse_switch(self) -> SwitchStatement:
    kw = self._expect(TokenType.KW_SWITCH)
    self._expect(TokenType.LPAREN)
    disc = self._parse_assignment()
    self._expect(TokenType.RPAREN)
    self._expect(TokenType.LBRACE)
    cases = []
    while not self._check(TokenType.RBRACE, TokenType.EOF):
        cases.append(self._parse_switch_case())
    self._expect(TokenType.RBRACE)
    return SwitchStatement(discriminant=disc, cases=cases,
                           line=kw.line, col=kw.column)

6.3 Adding a new expression or operator precedence level

To add a new operator between existing levels (e.g., bitwise OR | between logical AND and equality):

Step 1: Add PIPE = auto() to TokenType and "|" to the single-char map in _scan_token().

Step 2: Insert a new method and wire it into the chain. The chain is:

_parse_logical_and → calls → _parse_equality

Insert between them:

def _parse_logical_and(self) -> Node:
    return self._binary_left(self._parse_bitwise_or,    # ← changed target
                             {TokenType.AND: "&&"}, cls=LogicalOp)

def _parse_bitwise_or(self) -> Node:                    # ← new level
    return self._binary_left(self._parse_equality,
                             {TokenType.PIPE: "|"})

Step 3: Optionally add a BitwiseOp AST node if you want it distinct from BinaryOp, or reuse BinaryOp with op="|".
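
The effect of inserting a level into the chain can be checked with a toy parser. This is a sketch under simplifying assumptions: tokens are plain strings, nodes are (op, left, right) tuples, and MiniParser is a hypothetical class; the real parser works on Token objects and builds BinaryOp / LogicalOp nodes.

```python
class MiniParser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def _peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _binary_left(self, operand, ops):
        # Left-associative loop: operand (op operand)*
        left = operand()
        while self._peek() in ops:
            op = self.tokens[self.pos]
            self.pos += 1
            left = (op, left, operand())
        return left

    def parse_logical_and(self):
        return self._binary_left(self.parse_bitwise_or, {"&&"})

    def parse_bitwise_or(self):  # the newly inserted level
        return self._binary_left(self.parse_equality, {"|"})

    def parse_equality(self):
        return self._binary_left(self.parse_primary, {"==", "!="})

    def parse_primary(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok


tree = MiniParser(["a", "==", "b", "|", "c", "&&", "d"]).parse_logical_and()
print(tree)  # ('&&', ('|', ('==', 'a', 'b'), 'c'), 'd')
```

The nesting shows that == binds tighter than |, which in turn binds tighter than &&, matching the position of the new method in the chain.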

6.4 Worked example: do { } while (...)

Full walkthrough of adding a new statement from start to finish.

tokens.py:

class TokenType(Enum):
    # ...
    KW_DO = auto()

KEYWORDS = {
    # ...
    "do": TokenType.KW_DO,
}

ast_nodes.py:

@dataclass
class DoWhileStatement(Node):
    test: Expression = None
    body: Statement = None

Add "DoWhileStatement" to __all__.

parser.py — import and dispatch:

from .ast_nodes import ..., DoWhileStatement

def _parse_statement(self):
    tok = self._peek()
    # ...existing branches...
    if tok.type is TokenType.KW_DO:
        return self._parse_do_while()
    # ...

Parse method:

def _parse_do_while(self) -> DoWhileStatement:
    kw = self._expect(TokenType.KW_DO)
    body = self._parse_statement()
    self._expect(TokenType.KW_WHILE)
    self._expect(TokenType.LPAREN)
    test = self._parse_assignment()
    self._expect(TokenType.RPAREN)
    self._consume_optional_semi()
    return DoWhileStatement(test=test, body=body,
                            line=kw.line, col=kw.column)

Files touched: tokens.py (2 lines), ast_nodes.py (4 lines + 1 in __all__), parser.py (15 lines).


7. Extending the interpreter

7.1 Adding a handler for a new AST node

Step 1: No extra import is needed. The AST module is imported via from jsparse import ast_nodes as A, so A.DoWhileStatement is available as soon as the node has been added to ast_nodes.py.

Step 2: Add an entry in the _dispatch dict inside __init__:

A.DoWhileStatement: self._exec_do_while,

Step 3: Implement the handler. Convention:

  • Name: _exec_* for statements, _eval_* for expressions.
  • Signature: (self, node: A.TheNode, env: Environment) -> Any.
  • Statements return None; expressions return the computed value.
  • Use self._evaluate(child_node, env) to recurse into children.

def _exec_do_while(self, node: A.DoWhileStatement,
                   env: Environment) -> None:
    self._inside_loop += 1
    try:
        while True:
            try:
                self._evaluate(node.body, env)
            except ContinueSignal:
                pass
            except BreakSignal:
                break
            if not _is_truthy(self._evaluate(node.test, env)):
                break
    finally:
        self._inside_loop -= 1

Files touched: interpreter.py only (1 line in _dispatch, ~15 lines for the method).
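
The control flow of the handler can be exercised without the interpreter. In this sketch run_do_while is a hypothetical helper, the body and test are ordinary Python callables, and the two signal classes are local stand-ins for the ones in jsexec/errors.py.

```python
class BreakSignal(Exception):
    pass


class ContinueSignal(Exception):
    pass


def run_do_while(body, test):
    """Run body at least once, then repeat while test() is truthy."""
    while True:
        try:
            body()
        except ContinueSignal:
            pass   # skip to the test, as `continue` would
        except BreakSignal:
            break  # leave the loop without evaluating the test
        if not test():
            break


log = []
state = {"i": 0}


def body():
    state["i"] += 1
    if state["i"] == 2:
        raise ContinueSignal()  # iteration 2 never reaches the log
    if state["i"] == 5:
        raise BreakSignal()
    log.append(state["i"])


run_do_while(body, test=lambda: True)
print(log)  # [1, 3, 4]

ran = []
run_do_while(lambda: ran.append("once"), test=lambda: False)
print(ran)  # ['once']
```

The second call shows the defining property of do-while: the body runs once even though the test is false from the start.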

7.2 Adding a new control-flow construct

If your new construct needs non-local control flow (like throw / catch):

Step 1: Define a signal in jsexec/errors.py:

class ThrowSignal(_ControlSignal):
    """Carries the thrown value."""
    def __init__(self, value: Any = None):
        self.value = value

Step 2: Raise it in the handler:

def _exec_throw(self, node, env):
    raise ThrowSignal(self._evaluate(node.argument, env))

Step 3: Catch it in the owning construct:

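def _exec_try(self, node, env):
    try:
        self._evaluate(node.block, env)
    except ThrowSignal as e:
        # Assumes catch_block is None when the catch clause is absent
        # (try / finally): the signal must keep propagating in that case.
        if node.catch_block is None:
            raise
        catch_env = env.child()
        catch_env.declare(node.catch_param, e.value, kind="let")
        self._evaluate(node.catch_block, catch_env)
    finally:
        if node.finally_block is not None:
            self._evaluate(node.finally_block, env)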
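
The raise / catch / finally interplay maps directly onto Python's own exception machinery, which the following sketch demonstrates. run_try is a hypothetical helper taking callables instead of AST blocks, and ThrowSignal is redefined locally so the example is self-contained.

```python
class ThrowSignal(Exception):
    """Carries the thrown value."""

    def __init__(self, value=None):
        super().__init__(value)
        self.value = value


def run_try(block, catch=None, finally_=None):
    try:
        return block()
    except ThrowSignal as e:
        if catch is None:
            raise  # try/finally without catch: keep propagating
        return catch(e.value)
    finally:
        if finally_ is not None:
            finally_()  # runs on every exit path


def thrower():
    raise ThrowSignal("boom")


events = []
result = run_try(thrower,
                 catch=lambda v: f"caught {v}",
                 finally_=lambda: events.append("finally"))
print(result)  # caught boom
print(events)  # ['finally']

propagated = False
try:
    run_try(thrower, finally_=lambda: events.append("finally"))
except ThrowSignal:
    propagated = True
print(propagated, events)  # True ['finally', 'finally']
```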

7.3 Worked example: switch / case

Assuming the parser produces SwitchStatement and SwitchCase nodes from section 6.2:

interpreter.py:

# In __init__:
A.SwitchStatement: self._exec_switch,

def _exec_switch(self, node: A.SwitchStatement, env: Environment) -> None:
    disc = self._evaluate(node.discriminant, env)
    matched = False
    self._inside_loop += 1       # allow break inside switch
    try:
        for case in node.cases:
            if not matched:
                if case.test is None:    # default case
                    matched = True
                elif _equals(disc, self._evaluate(case.test, env),
                             strict=True):
                    matched = True
            if matched:
                try:
                    for stmt in case.consequent:
                        self._evaluate(stmt, env)
                except BreakSignal:
                    return           # break exits the switch
    finally:
        self._inside_loop -= 1

Files touched: interpreter.py only (1 dispatch entry + ~20 lines).
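
One design point worth checking is the position of default. The single-pass handler above selects default as soon as it is reached, so a default clause written before a matching case wins; in JavaScript, default is used only when no case matches, wherever it appears. If that behaviour is wanted, a two-pass variant is one option. The sketch below is hypothetical and detached from the AST: run_switch, DEFAULT and emit are illustrative names, actions are callables that return True to signal break, and the type check is only a rough stand-in for strict equality.

```python
DEFAULT = object()  # marker for the default clause


def run_switch(disc, cases):
    """cases: list of (test, action); action() returns True to signal break."""
    start = None
    default_index = None
    # Pass 1: find the first matching case, remembering where default is.
    for i, (test, _action) in enumerate(cases):
        if test is DEFAULT:
            default_index = i
        elif type(test) is type(disc) and test == disc:
            start = i
            break
    if start is None:
        start = default_index
    if start is None:
        return
    # Pass 2: run from the entry point with C-style fall-through.
    for _test, action in cases[start:]:
        if action() is True:
            return


out = []


def emit(label, brk=False):
    def action():
        out.append(label)
        return brk
    return action


cases = [
    (DEFAULT, emit("default", brk=True)),
    (1, emit("one")),
    (2, emit("two", brk=True)),
    (3, emit("three")),
]

run_switch(1, cases)
print(out)  # ['one', 'two']

out.clear()
run_switch(9, cases)
print(out)  # ['default']
```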


8. Supported language features

Statements

Feature                         Syntax example
Variable declaration            var x = 1; / let y = 2; / const z = 3;
Multiple declarators            let a = 1, b = 2;
Function declaration            function foo(a, b) { return a + b; }
Return                          return expr; or bare return;
If / else                       if (x > 0) { ... } else { ... }
While loop                      while (cond) { ... }
Do-while loop                   do { ... } while (cond);
For loop (C-style)              for (let i = 0; i < 10; i = i + 1) { ... }
For-in loop                     for (let k in obj) { ... } (yields property keys)
For-of loop                     for (let v of arr) { ... } (yields values; works on arrays, strings, dicts)
Switch / case                   switch (x) { case 1: ...; default: ... } (C-style fall-through)
Try / catch / finally / throw   try { ... } catch (e) { ... } finally { ... } / throw expr; (catches JSRuntimeError too)
Break / continue                break; / continue; (inside loops only)
Block                           { let x = 1; ... }

Expressions

Feature                         Syntax example
Numeric literals                42, 3.14, 1e5, 0xFF
String literals                 "hello", 'world', "tab:\there"
Boolean / null / undefined      true, false, null, undefined
Identifiers                     x, myVar, $, _foo
Arithmetic                      +, -, *, /, %
String concatenation            "hello" + " " + "world"
Comparison                      <, >, <=, >=
Equality                        ==, !=, ===, !==
Logical                         &&, ||, !
Assignment                      =, +=, -=, *=, /=
Update                          ++x, x++, --x, x--
Type / reflection               typeof x, x instanceof Cls, delete obj.prop, void expr
Ternary                         cond ? a : b
Member access                   obj.prop, arr[i]
Function call                   fn(a, b)
new                             new Point(3, 4)
Array literal                   [1, 2, 3] (trailing comma OK)
Object literal                  { name: "x", value: 1 } (trailing comma OK)
Regex literal                   /abc/i, /^[a-z]+$/g (with .test / .exec)
Template string                 `hello, ${name}!` (escapes, newlines, nesting OK)
Function expression             function (x) { return x; }
Named function expression       function fact(n) { ... fact(n-1); }
Arrow function                  (x) => x + 1, (a, b) => { return a + b; }
Lambda (parenthesized + block)  lambda (x) { return x * x; }
Lambda (parenthesized + expr)   lambda (x) => x * x
Lambda (Python-style)           lambda a, b: a + b
Lambda (zero-arg)               lambda: 42
Comments                        // line, /* block */

9. Edge cases & known limitations

Handled edge cases

  • Unterminated strings / block comments → LexError with line/column.
  • === / !== — 3-char tokens parsed before 2-char and 1-char.
  • Right-associative assignment: a = b = c parses as a = (b = c).
  • Right-associative ternary: a ? b ? c : d : e parses correctly.
  • Optional semicolons: never required; consumed when present.
  • Trailing commas in [1, 2,] and {a: 1,} are accepted.
  • new Foo(args) vs new Foo: both work; the no-args form produces NewExpression(callee=Foo, arguments=[]).
  • Named function expressions self-bind: the name is visible inside the body but does not leak to the outer scope.
  • break / continue outside loops → JSRuntimeError (not a silent bug). Loop-depth tracking is saved/restored across function calls.
  • const reassignment → JSRuntimeError.
  • Division by zero → JSRuntimeError.
  • String + number coercion: "x" + 1 → "x1" (JS-like).
  • null == undefined is true; null === undefined is false.
  • Implicit globals: assigning to an undeclared name creates a var in the global scope (sloppy-mode behavior).

Known limitations

  • No destructuring (let {a, b} = obj;).
  • No spread / rest (...args).
  • No class keyword — classes are host-provided via register_class.
  • No import / export.
  • No bitwise / shift operators (&, |, ^, <<, >>).
  • No strict ASI — semicolons are always optional everywhere.
  • No prototype chain — only single-level class → instance. Consequently, instanceof checks JSObject.cls identity rather than walking a chain.

10. Future extensions roadmap

Each item below is designed to be a clean, localized addition following the patterns described in sections 6 and 7:

Feature                         Parser work                                     Interpreter work
switch / case                   (implemented; see §7.3 and demo_switch.py)      (implemented)
try / catch / throw             (implemented; see §7.2 and demo_trycatch.py)    (implemented; also catches JSRuntimeError)
Template strings                (implemented; see demo_template.py)             (implemented; uses host _to_string)
Regex literals                  (implemented; see §7.x and demo_regex.py)       (implemented; wraps Python re module)
for ... of / for ... in         (implemented; see demo_forinof.py)              (implemented)
Destructuring                   VariableDeclarator.name → pattern node          Extend _exec_variable_declaration
Spread / rest                   Spread node in args, params, array literals     Unpack in relevant eval methods
class keyword                   ClassDeclaration node, method definitions       Convert to JSClass at runtime
import / export                 ImportDeclaration etc.                          Module loader subsystem
Bitwise / shift                 Token types + one precedence level each         Cases in _eval_binary
Strict mode ASI                 Track newlines on tokens; real insertion rules  No interpreter change
NodeVisitor / NodeTransformer   (no parser change)                              Generic visitor using iter_child_nodes
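
For the last roadmap row, a generic visitor could follow the shape of Python's ast.NodeVisitor. The sketch below is one possible design, not the planned implementation: the node classes and the iter_child_nodes function are simplified local stand-ins for those in jsparse/ast_nodes.py, and NumCollector is a hypothetical example subclass.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Node:
    pass


@dataclass
class Num(Node):
    value: float = 0


@dataclass
class BinaryOp(Node):
    op: str = "+"
    left: Node = None
    right: Node = None


@dataclass
class Block(Node):
    body: List[Node] = field(default_factory=list)


def iter_child_nodes(node):
    """Yield direct child nodes, looking inside list-valued fields too."""
    for f in fields(node):
        value = getattr(node, f.name)
        if isinstance(value, Node):
            yield value
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, Node):
                    yield item


class NodeVisitor:
    def visit(self, node):
        # Dispatch on the class name, falling back to a generic traversal.
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        for child in iter_child_nodes(node):
            self.visit(child)


class NumCollector(NodeVisitor):
    def __init__(self):
        self.values = []

    def visit_Num(self, node):
        self.values.append(node.value)


tree = Block(body=[BinaryOp("+", Num(1), BinaryOp("*", Num(2), Num(3)))])
collector = NumCollector()
collector.visit(tree)
print(collector.values)  # [1, 2, 3]
```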

Design principles

  • Dataclasses everywhere — every AST node and runtime value is a @dataclass; equality, repr, and field iteration come for free.
  • One concern per file — tokens, AST, parser, errors, pretty-printer, environment, values, interpreter are all separate modules.
  • Dispatch tables over giant if/elif chains — both the parser (statement dispatch) and interpreter (node dispatch) use lookups.
  • Errors always carry position: LexError, ParseError, and JSRuntimeError all include line and column.
  • Extending = small, local changes — add a token, an AST node, a parse method, a dispatch entry, and a handler. No file requires changes to more than a few lines.
