Mini sandbox for a JavaScript interpreter.
dcnr-javascriptbox
A self-contained, standard-library-only Python toolkit that tokenizes,
parses, and executes a JavaScript-like language. The AST design mirrors
Python's ast module; the
interpreter is embeddable and extensible with host-provided functions,
objects, and classes.
Table of contents
- Project structure
- Quick start
- Parser — programmer's reference
- 3.1 Tokenizer
- 3.2 AST nodes
- 3.3 Parser
- 3.4 Pretty-printer
- 3.5 Error handling
- 3.6 Grammar reference
- Interpreter — programmer's reference
- Implementation deep-dive
- Extending the parser
- Extending the interpreter
- Supported language features
- Edge cases & known limitations
- Future extensions roadmap
1. Project structure
dcnr-jsbox/
│
├── jsparse/ # Parser package
│ ├── __init__.py # Public API: tokenize, parse, dump
│ ├── errors.py # LexError, ParseError (with caret messages)
│ ├── tokens.py # TokenType enum, Token dataclass, tokenize()
│ ├── ast_nodes.py # AST dataclasses + iter_child_nodes
│ ├── parser.py # Recursive-descent Parser
│ └── pprint.py # dump() — ast.dump-style pretty-printer
│
├── jsexec/ # Interpreter package
│ ├── __init__.py # Public API: Interpreter, value types
│ ├── errors.py # JSRuntimeError + control-flow signals
│ ├── values.py # UNDEFINED, JSFunction, JSNativeFunction,
│ │ # JSClass, JSObject
│ ├── environment.py # Environment (lexical scope chain)
│ └── interpreter.py # Tree-walking Interpreter (dispatch table)
│
└── demos
├── demo.py # Parser demo (tokenize + parse + dump)
├── demo_exec.py # Interpreter demo (full extension API)
├── demo_functions.py # function + lambda forms verification
├── demo_operators.py # typeof / instanceof / delete / void verification
├── demo_dowhile.py # do { } while verification
├── demo_switch.py # switch / case verification
├── demo_forinof.py # for ... in / for ... of verification
├── demo_trycatch.py # try / catch / finally / throw verification
├── demo_regex.py # /pattern/flags regex literal verification
├── demo_template.py # `template ${expr}` string verification
└── README.md # This file
2. Quick start
from jsparse import tokenize, parse, dump
from jsexec import Interpreter, UNDEFINED
# ── Parse ──
source = 'let x = 2 + 3; print(x);'
tokens = tokenize(source) # list[Token], ending with EOF
program = parse(source) # ast_nodes.Program
print(dump(program)) # indented multi-line AST
# ── Execute ──
interp = Interpreter()
interp.register_function("print", lambda *a: print(*a))
interp.run(program) # prints: 5
Run the bundled demos:
python demo.py REM parser demo
python demo_exec.py REM interpreter demo
python demo_functions.py REM function/lambda verification
python demo_operators.py REM typeof / instanceof / delete / void
python demo_dowhile.py REM do { } while verification
python demo_switch.py REM switch / case verification
python demo_forinof.py REM for ... in / for ... of verification
python demo_trycatch.py REM try / catch / finally / throw verification
python demo_regex.py REM regex literal verification
python demo_template.py REM template string verification
3. Parser — programmer's reference
3.1 Tokenizer (jsparse.tokens)
from jsparse import tokenize, Token, TokenType
tokenize(source: str) -> list[Token]
Splits source into a flat list of Token objects terminated by a final
Token(type=TokenType.EOF).
Token fields
| Field | Type | Description |
|---|---|---|
| type | TokenType | Enum member identifying the token category |
| value | object | Parsed value: int/float for NUMBER, str for STRING/IDENT, True/False/None for booleans/null, keyword text otherwise |
| line | int | 1-based line number where the token starts |
| column | int | 1-based column number where the token starts |
| start | int | Character offset into source where the token starts |
| end | int | Character offset just past the last char (half-open) |
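Because start and end form a half-open range, slicing the source with them recovers the token's raw text. The sketch below illustrates this with a stand-in Token dataclass; the field names follow the table above, but the class itself and the hand-built token are assumptions made for illustration, not the definitions in jsparse.tokens.

```python
from dataclasses import dataclass


@dataclass
class Token:  # hypothetical stand-in that mirrors the documented fields
    type: str
    value: object
    line: int
    column: int
    start: int
    end: int


source = "let x = 42;"

# Hand-built token for the literal 42: it occupies offsets 8 and 9,
# so the half-open range is [8, 10) and the 1-based column is 9.
tok = Token(type="NUMBER", value=42, line=1, column=9, start=8, end=10)

# Slicing with the half-open range yields exactly the token text.
raw_text = source[tok.start:tok.end]
token_length = tok.end - tok.start
```

The same slice is handy for error messages or for re-parsing a sub-span of the source.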
TokenType members
Literals: NUMBER, STRING, IDENT
Keywords: KW_VAR, KW_LET, KW_CONST, KW_FUNCTION, KW_RETURN,
KW_IF, KW_ELSE, KW_WHILE, KW_FOR, KW_TRUE, KW_FALSE,
KW_NULL, KW_UNDEFINED, KW_NEW, KW_BREAK, KW_CONTINUE,
KW_LAMBDA, KW_TYPEOF, KW_INSTANCEOF, KW_DELETE, KW_VOID,
KW_DO, KW_SWITCH, KW_CASE, KW_DEFAULT, KW_IN, KW_OF,
KW_TRY, KW_CATCH, KW_FINALLY, KW_THROW
Punctuation: LPAREN, RPAREN, LBRACE, RBRACE, LBRACKET,
RBRACKET, COMMA, SEMI, COLON, DOT, QUESTION, ARROW (=>)
Operators: ASSIGN (=), PLUS, MINUS, STAR, SLASH,
PERCENT, EQ (==), NEQ (!=), SEQ (===), SNEQ (!==),
LT, GT, LTE, GTE, AND (&&), OR (||), NOT (!),
PLUSPLUS, MINUSMINUS, PLUSEQ, MINUSEQ, STAREQ, SLASHEQ
Misc: EOF
Regex: REGEX — emitted only in expression contexts (when / cannot
be the division operator). Its .value is a 2-tuple (pattern, flags),
both strings. See §8 for the disambiguation rules.
Template: TEMPLATE — emitted for `text${expr}more` literals.
Its .value is a 2-tuple (quasis, expr_sources) where quasis is the
list of literal-text chunks (length N+1) and expr_sources is a list of
(source, line, col) triples (length N) for each ${...} placeholder.
The parser re-runs itself on each placeholder source to produce the
embedded expression's AST.
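The split into quasis and placeholder sources can be pictured with a small scanner. This is a simplified sketch, not the tokenizer's code: the function name split_template is hypothetical, it returns plain source strings rather than (source, line, col) triples, and it ignores escapes and braces inside nested string literals.

```python
def split_template(body):
    """Split the text between the backticks into (quasis, expr_sources)."""
    quasis, exprs = [], []
    chunk = []
    i = 0
    while i < len(body):
        if body.startswith("${", i):
            # Close the current literal chunk before the placeholder.
            quasis.append("".join(chunk))
            chunk = []
            # Scan to the matching '}' while tracking nested braces.
            depth, j = 1, i + 2
            while j < len(body) and depth:
                if body[j] == "{":
                    depth += 1
                elif body[j] == "}":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated ${...} placeholder")
            exprs.append(body[i + 2:j - 1])
            i = j
        else:
            chunk.append(body[i])
            i += 1
    quasis.append("".join(chunk))  # trailing chunk, possibly empty
    return quasis, exprs


quasis, exprs = split_template("hi, ${user.name}! total: ${a + b}")
```

With N placeholders the result always has N+1 quasis, which is why the trailing chunk is appended even when it is empty.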
Keyword map
The KEYWORDS dict in tokens.py maps source-level keyword strings to
their TokenType. To add a keyword, insert one entry here and one enum
member above.
KEYWORDS = {
"var": TokenType.KW_VAR, "let": TokenType.KW_LET,
"const": TokenType.KW_CONST, "function": TokenType.KW_FUNCTION,
"return": TokenType.KW_RETURN, "if": TokenType.KW_IF,
"else": TokenType.KW_ELSE, "while": TokenType.KW_WHILE,
"for": TokenType.KW_FOR, "true": TokenType.KW_TRUE,
"false": TokenType.KW_FALSE, "null": TokenType.KW_NULL,
"undefined": TokenType.KW_UNDEFINED, "new": TokenType.KW_NEW,
"break": TokenType.KW_BREAK, "continue": TokenType.KW_CONTINUE,
"lambda": TokenType.KW_LAMBDA,
"typeof": TokenType.KW_TYPEOF, "instanceof": TokenType.KW_INSTANCEOF,
"delete": TokenType.KW_DELETE, "void": TokenType.KW_VOID,
}
Lexer features
| Feature | Details |
|---|---|
| Comments | // line comments; /* ... */ block comments (nesting not supported); unterminated block comments raise LexError |
| Identifiers | [a-zA-Z_$][a-zA-Z0-9_$]*; checked against KEYWORDS |
| Numbers | Decimal integers, floats (1.5), exponents (1e5, 2.5E-3), hex (0xFF) |
| Strings | Single or double quoted; escape sequences: \n \t \r \0 \\ \' \" \` |
| Operators | Longest-match first: === / !== (3-char), then two-char combos (==, !=, <=, >=, &&, \|\|, ...), then single-char operators |
3.2 AST nodes (jsparse.ast_nodes)
Every node is a @dataclass inheriting from Node. Position info is
stored in kw-only fields line and col (both default to 0). The
iter_fields helper skips these so pretty-printing focuses on structure.
Base class
@dataclass
class Node:
    line: int = field(default=0, kw_only=True)
    col: int = field(default=0, kw_only=True)
Statement nodes
| Node | Fields | Description |
|---|---|---|
| Program | body: list[Statement] | Root node |
| Block | body: list[Statement] | { ... } block |
| ExpressionStatement | expression: Expression | Expression used as statement |
| VariableDeclaration | kind: str, declarations: list[VariableDeclarator] | var/let/const |
| VariableDeclarator | name: str, init: Expression? | Single binding |
| FunctionDeclaration | name: str, params: list[str], body: Block | function name(...) { ... } |
| ReturnStatement | argument: Expression? | return expr; |
| IfStatement | test, consequent, alternate? | if (...) ... else ... |
| WhileStatement | test, body | while (...) ... |
| ForStatement | init?, test?, update?, body | C-style for |
| BreakStatement | (no fields) | break; |
| ContinueStatement | (no fields) | continue; |
Expression nodes
| Node | Fields | Description |
|---|---|---|
| Identifier | name: str | Variable reference |
| Literal | value: Any, raw: str | Number, string, bool, null, undefined |
| RegexLiteral | pattern: str, flags: str | /abc/i, /^[a-z]+$/g |
| TemplateLiteral | quasis: list[str], expressions: list[Expression] | `hi, ${name}!` |
| ArrayExpression | elements: list[Expression] | [1, 2, 3] |
| ObjectExpression | properties: list[ObjectProperty] | { key: value } |
| ObjectProperty | key: Identifier\|Literal, value: Expression | Single property |
| FunctionExpression | name: str?, params: list[str], body: Block | function (x) { ... } |
| ArrowFunction | params: list[str], body: Expr\|Block, expression: bool | (x) => x + 1 or lambda x: x + 1 |
| UnaryOp | op: str, operand, prefix: bool | !x, -x, +x |
| UpdateOp | op: str, operand, prefix: bool | ++x, x-- |
| BinaryOp | op: str, left, right | a + b, a * b, a < b, etc. |
| LogicalOp | op: str, left, right | a && b, a \|\| b |
| AssignmentExpression | op: str, target, value | x = 1, x += 2 |
| ConditionalExpression | test, consequent, alternate | a ? b : c |
| MemberAccess | object, property, computed: bool | obj.x or obj[expr] |
| CallExpression | callee, arguments: list[Expression] | fn(a, b) |
| NewExpression | callee, arguments: list[Expression] | new Cls(a, b) |
Tree-walking helpers
from jsparse.ast_nodes import iter_fields, iter_child_nodes
for name, value in iter_fields(node):   # yields (field_name, value), skips line/col
    ...
for child in iter_child_nodes(node):    # yields direct Node children (recursive visitor friendly)
    ...
3.3 Parser (jsparse.parser)
from jsparse import parse
from jsparse.parser import Parser
program = parse(source) # convenience wrapper
# or
parser = Parser(source)
program = parser.parse() # returns Program node
The Parser class is a recursive-descent parser with precedence climbing
for expressions. It consumes the token list produced by tokenize() and
emits AST nodes.
Key internal methods
| Method | Returns | Role |
|---|---|---|
| parse() | Program | Entry point: statement loop until EOF |
| _parse_statement() | Node | Statement-level dispatch by token type |
| _parse_var_declaration() | VariableDeclaration | var / let / const with declarators |
| _parse_function_declaration() | FunctionDeclaration | function name(...) { ... } |
| _parse_return() | ReturnStatement | return expr?; |
| _parse_if() | IfStatement | if (...) ... else ... |
| _parse_while() | WhileStatement | while (...) ... |
| _parse_for() | ForStatement | C-style for (...; ...; ...) ... |
| _parse_block() | Block | { statement* } |
| _parse_expression_statement() | ExpressionStatement | Wraps any expression as a statement |
| _parse_assignment() | Node | Top of the expression precedence chain |
| _try_parse_arrow() | ArrowFunction? | Lookahead for => arrow syntax |
| _parse_conditional() | Node | Ternary ? : |
| _parse_logical_or() | Node | \|\| |
| _parse_logical_and() | Node | && |
| _parse_equality() | Node | ==, !=, ===, !== |
| _parse_relational() | Node | <, >, <=, >= |
| _parse_additive() | Node | +, - |
| _parse_multiplicative() | Node | *, /, % |
| _parse_unary() | Node | !, -, +, ++, -- (prefix) |
| _parse_postfix() | Node | ++, -- (postfix) |
| _parse_call() | Node | (), .prop, [expr] chains |
| _parse_primary() | Node | Literals, identifiers, grouping, new, lambda, etc. |
| _parse_lambda() | ArrowFunction | All lambda surface forms |
| _parse_function_expression() | FunctionExpression | function name?(...) { ... } |
| _parse_new() | NewExpression | new callee(args) |
| _parse_array_literal() | ArrayExpression | [elements] |
| _parse_object_literal() | ObjectExpression | { key: value, ... } |
| _binary_left(sub, ops) | Node | Generic left-assoc binary chain helper |
Token helpers
| Helper | Description |
|---|---|
| _peek(off=0) | Look at a token without consuming |
| _check(*types) | Is the next token one of these types? |
| _match(*types) | Consume and return the token if it matches, else None |
| _expect(type_, what="") | Consume or raise ParseError |
| _consume_optional_semi() | Eat a ; if present (permissive ASI) |
Semicolons
Semicolons are optional. The parser calls _consume_optional_semi()
after statements — it eats a ; if present, otherwise continues. This is
a permissive approximation of JavaScript's ASI.
3.4 Pretty-printer (jsparse.pprint)
from jsparse import dump
text = dump(node) # default: 2-space indent, no position
text = dump(node, indent=4, include_position=True) # @line:col on every node
Output looks like ast.dump():
Program(
body=[
VariableDeclaration(
kind='let',
declarations=[
VariableDeclarator(
name='x',
init=Literal(
value=42,
raw='42'
)
)
]
)
]
)
3.5 Error handling (jsparse.errors)
| Exception | Raised by | When |
|---|---|---|
| JSParseError | (base) | Never directly; common base for below |
| LexError | Tokenizer | Bad character, unterminated string/comment |
| ParseError | Parser | Unexpected token, missing expected token |
All carry message, line, column, and optional source. Their
__str__ renders a caret-style diagnostic:
ParseError: Expected RPAREN, got SEMI (';') at line 3, column 12
foo(bar;
^
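A caret diagnostic of this shape can be produced from just the message, the 1-based position, and the source text. The helper below is a sketch under that assumption; format_diagnostic is a hypothetical name, and the real __str__ in jsparse.errors may format details differently.

```python
def format_diagnostic(kind, message, source, line, column):
    """Render 'Kind: message at line L, column C' plus a caret line."""
    header = f"{kind}: {message} at line {line}, column {column}"
    lines = source.splitlines()
    if not 1 <= line <= len(lines):
        return header  # no source line available to show
    src_line = lines[line - 1]
    caret = " " * (column - 1) + "^"  # column is 1-based
    return f"{header}\n{src_line}\n{caret}"


report = format_diagnostic(
    "ParseError",
    "Expected RPAREN, got SEMI (';')",
    "let a = 1;\nfoo(bar;",
    line=2,
    column=8,
)
report_lines = report.splitlines()
```

The caret lands under the offending character because the padding is column - 1 spaces.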
3.6 Grammar reference
program := statement* EOF
statement := varDecl | funcDecl | returnStmt
| ifStmt | whileStmt | forStmt
| breakStmt | continueStmt
| block | exprStmt
varDecl := ('var'|'let'|'const') declarator (',' declarator)* ';'?
declarator := IDENT ('=' assignment)?
funcDecl := 'function' IDENT '(' params? ')' block
returnStmt := 'return' assignment? ';'?
ifStmt := 'if' '(' assignment ')' statement ('else' statement)?
whileStmt := 'while' '(' assignment ')' statement
forStmt := 'for' '(' (varDecl | exprStmt | ';')
assignment? ';'
assignment? ')' statement
breakStmt := 'break' ';'?
continueStmt := 'continue' ';'?
block := '{' statement* '}'
exprStmt := assignment ';'?
assignment := conditional ( ('='|'+='|'-='|'*='|'/=') assignment )?
conditional := logicalOr ( '?' assignment ':' assignment )?
logicalOr := logicalAnd ('||' logicalAnd)*
logicalAnd := equality ('&&' equality)*
equality := relational (('=='|'!='|'==='|'!==') relational)*
relational := additive (('<'|'>'|'<='|'>='|'instanceof') additive)*
additive := multiplicative (('+'|'-') multiplicative)*
multiplicative := unary (('*'|'/'|'%') unary)*
unary := ('!'|'-'|'+'|'++'|'--'|'typeof'|'void'|'delete') unary
| postfix
postfix := call ('++' | '--')?
call := primary ( '(' args? ')' | '.' IDENT | '[' assignment ']' )*
primary := NUMBER | STRING | 'true' | 'false' | 'null' | 'undefined'
| IDENT | '(' assignment ')' | arrayLit | objectLit
| funcExpr | lambdaExpr | 'new' call | arrowFn
arrayLit := '[' (assignment (',' assignment)* ','?)? ']'
objectLit := '{' (prop (',' prop)* ','?)? '}'
prop := (IDENT | STRING) ':' assignment
funcExpr := 'function' IDENT? '(' params? ')' block
lambdaExpr := 'lambda' '(' params? ')' block -- block body
| 'lambda' '(' params? ')' '=>'? assignment -- expression body
| 'lambda' (IDENT (',' IDENT)*)? ':' assignment -- Python-style
arrowFn := IDENT '=>' (assignment | block)
| '(' params? ')' '=>' (assignment | block)
Operator precedence (lowest → highest):
| Level | Operators / construct | Associativity |
|---|---|---|
| 1 | =, +=, -=, *=, /= | Right |
| 2 | ? : | Right |
| 3 | \|\| | Left |
| 4 | && | Left |
| 5 | ==, !=, ===, !== | Left |
| 6 | <, >, <=, >=, instanceof | Left |
| 7 | +, - | Left |
| 8 | *, /, % | Left |
| 9 | !, -, +, ++, --, typeof, void, delete (pre) | Right (unary) |
| 10 | ++, -- (post) | n/a |
| 11 | (), ., [] | Left (call) |
4. Interpreter — programmer's reference
4.1 Interpreter class
from jsexec import Interpreter
interp = Interpreter()
Public methods
| Method | Description |
|---|---|
| run(program: Program) -> Any | Execute a parsed program. Returns value of last expression statement. |
| register_function(name, callable) -> JSNativeFunction | Bind a Python callable as a const global. |
| register_object(name, obj) -> Any | Bind a value as a const global. Dicts auto-wrapped as JSObject. |
| register_class(cls: JSClass) -> JSClass | Register a JSClass under cls.name as a const global. |
Properties
| Property | Type | Description |
|---|---|---|
| globals | Environment | Top-level scope; all registered values live here. |
4.2 Environment & scoping
from jsexec import Environment
Environment implements a lexical scope chain with parent pointers.
Methods
| Method | Description |
|---|---|
| declare(name, value, kind) | Bind name in the current scope. kind is "let", "const", or "var". Rejects duplicate let/const declarations. |
| declare_var(name, value) | var hoisting: walks up to the nearest is_function_scope=True frame, binds there. |
| get(name) -> Any | Walk the scope chain upward. Raises JSRuntimeError if not found. |
| has(name) -> bool | Walk the scope chain; returns whether the name exists. |
| assign(name, value) -> Any | Walk chain to find the binding, update it. Raises on const reassignment or if name doesn't exist. |
| child(function_scope=False) | Create a new child Environment linked to this one. |
Binding kinds
| Kind | Block-scoped? | Reassignable? | Hoisted? |
|---|---|---|---|
| let | Yes | Yes | No (lives in declaring block) |
| const | Yes | No | No |
| var | No | Yes | Yes (to nearest function scope) |
When does a new Environment open?
| Situation | How |
|---|---|
| Every { } block | env.child() |
| for statement's init | env.child() |
| Every function call | env.child(function_scope=True); this is the var hoisting boundary |
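The scoping rules above fit in a few dozen lines. The class below is a simplified stand-in written for this README, not the code in jsexec.environment: ScopeError replaces JSRuntimeError, bindings are stored as [value, kind] lists, and the duplicate-declaration check is one plausible reading of "rejects duplicate let/const declarations".

```python
class ScopeError(Exception):
    """Stand-in for JSRuntimeError in this sketch."""


class Environment:
    def __init__(self, parent=None, is_function_scope=False):
        self.parent = parent
        self.is_function_scope = is_function_scope
        self.bindings = {}  # name -> [value, kind]

    def child(self, function_scope=False):
        return Environment(self, function_scope)

    def declare(self, name, value, kind):
        # Assumption: any redeclaration via let/const in one scope is an error.
        if name in self.bindings and kind in ("let", "const"):
            raise ScopeError(f"'{name}' has already been declared")
        self.bindings[name] = [value, kind]

    def declare_var(self, name, value):
        # Hoist to the nearest function-scope frame (or the root).
        env = self
        while env.parent is not None and not env.is_function_scope:
            env = env.parent
        env.bindings[name] = [value, "var"]

    def _find(self, name):
        env = self
        while env is not None:
            if name in env.bindings:
                return env.bindings[name]
            env = env.parent
        raise ScopeError(f"'{name}' is not defined")

    def get(self, name):
        return self._find(name)[0]

    def assign(self, name, value):
        binding = self._find(name)
        if binding[1] == "const":
            raise ScopeError(f"cannot reassign const '{name}'")
        binding[0] = value
        return value


globals_env = Environment(is_function_scope=True)
block = globals_env.child()           # what a { } block would open
block.declare("a", 1, "let")          # stays in the block
block.declare_var("b", 2)             # hoisted to globals_env
globals_env.declare("k", 10, "const")
```

After this runs, b is visible from the global frame while a is not, which is the let versus var difference from the binding-kinds table.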
4.3 Runtime value types
All values live in jsexec.values.
| Type | Purpose |
|---|---|
| UNDEFINED | Singleton sentinel (_Undefined()), falsy, repr → "undefined". Distinct from Python None which represents JS null. |
| JSNativeFunction | Wraps a host Python callable. Fields: name, fn. Protocol: call(interp, args) -> Any. |
| JSFunction | User-defined function. Fields: name, params, body (Block AST), closure (Environment), bound_this. |
| JSObject | Dict-backed object with optional class link. Fields: properties: dict, cls: JSClass?. Methods: get(name), set(name, value). |
| JSClass | Class definition. Fields: name, methods: dict, attributes: dict, init: callable?. Methods: instantiate(interp, args), lookup_method(name). |
Callable protocol. Anything with a .call(interp, args) method can be
invoked from JS code. Plain Python callables also work — the interpreter
falls back to fn(*args).
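The callable protocol with its plain-callable fallback might look roughly like this. The helper name call_value and the Doubler class are hypothetical; the sketch only assumes what the paragraph above states, namely that .call(interp, args) is preferred and fn(*args) is the fallback.

```python
def call_value(interp, callee, args):
    """Invoke callee using the .call(interp, args) protocol if present."""
    call = getattr(callee, "call", None)
    if callable(call):
        return call(interp, args)      # protocol objects get the interpreter
    if callable(callee):
        return callee(*args)           # plain Python callables get bare args
    raise TypeError(f"{callee!r} is not callable")


class Doubler:
    """Hypothetical host object that implements the protocol."""

    def call(self, interp, args):
        return args[0] * 2


via_protocol = call_value(None, Doubler(), [21])
via_fallback = call_value(None, max, [1, 5, 3])
```

Checking for .call first lets protocol objects receive the interpreter instance, which plain callables never see.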
JSFunction.bind(this_obj) returns a copy of the function with
bound_this set. The interpreter calls this automatically on
obj.method() invocations so this is correctly bound.
JSClass.instantiate(interp, args) creates a JSObject with
cls=self, copies class attributes as initial properties, then calls
init (if set) with the instance and constructor arguments.
4.4 Control flow internals
Control flow is implemented with lightweight BaseException subclasses in
jsexec.errors. They inherit from BaseException (not Exception) so
user code's normal exception handling never catches them.
| Signal | Raised by | Caught by |
|---|---|---|
| BreakSignal | break statement | for / while / do-while / switch |
| ContinueSignal | continue statement | for / while / do-while loops |
| ReturnSignal | return statement | JSFunction.call() |
| ThrowSignal | throw statement | try block (or surfaces as JSRuntimeError at top level) |
break and continue outside a loop produce a clear
JSRuntimeError("'break' used outside of a loop", line, col) thanks to
an _inside_loop depth counter on the interpreter. Function calls
save/restore this counter so a break inside a function body defined
inside a loop is correctly flagged as invalid.
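The reason for deriving the signals from BaseException can be shown in isolation. In this sketch the two signal classes and the run_while driver are simplified stand-ins, not the interpreter's code; the body deliberately contains a broad `except Exception` handler to show that it never intercepts the signals.

```python
class BreakSignal(BaseException):
    """Stand-in for the interpreter's break signal."""


class ContinueSignal(BaseException):
    """Stand-in for the interpreter's continue signal."""


def run_while(test, body):
    """Drive a loop, translating signals into Python control flow."""
    iterations = 0
    while test():
        iterations += 1
        try:
            body()
        except BreakSignal:
            break
        except ContinueSignal:
            continue
    return iterations


state = {"i": 0, "seen": []}


def body():
    state["i"] += 1
    try:
        if state["i"] == 2:
            raise ContinueSignal()
        if state["i"] == 4:
            raise BreakSignal()
    except Exception:
        # A host-style catch-all: it does not see BaseException signals.
        state["seen"].append("swallowed")
    state["seen"].append(state["i"])


count = run_while(lambda: state["i"] < 10, body)
```

Iteration 2 is skipped and iteration 4 ends the loop, and the catch-all handler never fires.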
4.5 Truthiness rules
| Value | Truthy? |
|---|---|
| false | No |
| null (Python None) | No |
| undefined (UNDEFINED) | No |
| 0, 0.0 | No |
| "" (empty string) | No |
| Everything else | Yes (including [], {}, "0") |
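Python's own bool() would treat [] and {} as falsy, so the table cannot be implemented by delegating to it. A sketch of a check that follows the table literally is shown below; is_truthy and the _Undefined stand-in are illustrative names, and values the table does not list (such as NaN) are not modelled.

```python
class _Undefined:
    """Stand-in for the UNDEFINED singleton."""

    def __bool__(self):
        return False

    def __repr__(self):
        return "undefined"


UNDEFINED = _Undefined()


def is_truthy(value):
    """Apply the JS-style truthiness table to a Python-side value."""
    if value is None or value is UNDEFINED:
        return False
    if isinstance(value, bool):        # check bool before int: bool is an int
        return value
    if isinstance(value, (int, float)):
        return value != 0
    if isinstance(value, str):
        return value != ""
    return True                         # lists, dicts, objects, functions


samples = {repr(v): is_truthy(v) for v in (0, 0.0, "", "0", [], {}, None, UNDEFINED)}
```

The final `return True` is what makes empty arrays and objects truthy, unlike in Python.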
4.6 Member access & property protocol
obj.prop and obj[expr] resolve through _member_get():
| obj type | Behavior |
|---|---|
| JSObject | obj.get(key) → own properties, then class methods/attrs |
| JSClass | Class-level: static attributes first, then methods |
| dict | Python dict .get(key, UNDEFINED) |
| list | .length → len(); numeric index → element; out-of-range → UNDEFINED |
| str | .length → len(); numeric index → character |
| anything | Falls back to Python getattr(obj, key); callables are auto-wrapped in JSNativeFunction |
Assignment through _member_set():
| obj type | Behavior |
|---|---|
| JSObject | obj.set(key, value) |
| dict | obj[key] = value |
| list | Numeric index; auto-grows with UNDEFINED fill if past end |
| other | Raises JSRuntimeError |
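The dict, list, and str rows of the two tables can be sketched without the rest of the interpreter. The functions below are illustrative stand-ins for _member_get() and _member_set(): they cover only those three host types, use a plain sentinel for UNDEFINED, raise TypeError where the interpreter would raise JSRuntimeError, and assume integer indexes. Returning UNDEFINED for an out-of-range string index is an assumption, since the table only specifies it for lists.

```python
UNDEFINED = object()  # stand-in sentinel for this sketch


def member_get(obj, key):
    if isinstance(obj, dict):
        return obj.get(key, UNDEFINED)
    if isinstance(obj, (list, str)):
        if key == "length":
            return len(obj)
        if isinstance(key, int) and 0 <= key < len(obj):
            return obj[key]
        return UNDEFINED               # out of range or unknown property
    raise TypeError(f"cannot read property {key!r}")


def member_set(obj, key, value):
    if isinstance(obj, dict):
        obj[key] = value
    elif isinstance(obj, list) and isinstance(key, int) and key >= 0:
        while len(obj) <= key:         # auto-grow with UNDEFINED fill
            obj.append(UNDEFINED)
        obj[key] = value
    else:
        raise TypeError(f"cannot set property {key!r}")
    return value


arr = [1]
member_set(arr, 3, "x")                # grows the list to length 4
config = {}
member_set(config, "name", "app")
```

Writing past the end fills the gap with the sentinel, so reading one of the skipped slots yields UNDEFINED rather than an error.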
4.7 Extension API
Registering a custom function
interp.register_function("print", lambda *args: print(*args))
interp.register_function("sqrt", lambda x: x ** 0.5)
The callable receives Python-native values: numbers are int/float,
strings are str, booleans are bool, null is None, undefined is
UNDEFINED, arrays are list, objects are JSObject.
Registering a custom object
# Dict → auto-wrapped as JSObject, so dot access works from JS:
interp.register_object("config", {"name": "app", "version": 42})
# Or pass a JSObject directly:
from jsexec import JSObject
interp.register_object("state", JSObject(properties={"count": 0}))
Registering a custom class
from jsexec import JSClass, JSNativeFunction, JSObject

def _init(instance: JSObject, x, y):
    instance.set("x", x)
    instance.set("y", y)

def _length(instance: JSObject):
    return (instance.get("x") ** 2 + instance.get("y") ** 2) ** 0.5

def _scale(instance: JSObject, factor):
    instance.set("x", instance.get("x") * factor)
    instance.set("y", instance.get("y") * factor)

# _wrap_method is the helper from demo_exec.py shown under "Method-wrapping protocol" below.
Point = JSClass(
    name="Point",
    attributes={"kind": "2D"},                   # class-level attrs
    init=JSNativeFunction("Point.init", _init),  # constructor
    methods={
        "length": _wrap_method(_length),         # instance methods
        "scale": _wrap_method(_scale),
    },
)
interp.register_class(Point)
From JS code:
let p = new Point(3, 4);
print(p.x, p.y); // 3 4
print(p.length()); // 5.0
p.scale(2);
print(p.x, p.y); // 6 8
print(Point.kind); // "2D"
Method-wrapping protocol: any object with .bind(instance) → copy and
.call(interp, args) can serve as a method. See _wrap_method() in
demo_exec.py for a minimal implementation:
class _BoundableMethod:
    def __init__(self, fn, this=None):
        self.fn = fn
        self._this = this
        self.name = fn.__name__

    def bind(self, instance):
        return _BoundableMethod(self.fn, this=instance)

    def call(self, interp, args):
        return self.fn(self._this, *args)

def _wrap_method(py_fn):
    return _BoundableMethod(py_fn)
4.8 typeof / instanceof / delete / void
All four are parsed as standard operators (no special-cased syntax) and
implemented entirely inside the interpreter. They are real keywords —
typeof, instanceof, delete, and void are reserved and cannot be
used as identifiers.
typeof operand — unary, returns a string
| Operand kind | Result |
|---|---|
| undefined | "undefined" |
| null | "object" (JS quirk, preserved) |
| Boolean (true / false) | "boolean" |
| Number (int / float) | "number" |
| String | "string" |
| JSFunction, JSNativeFunction, JSClass, any callable | "function" |
| JSObject, dict, list, anything else | "object" |
Special rule: typeof <undeclaredIdent> returns "undefined" instead
of raising — this matches JavaScript and is a common feature-detection
idiom. Only direct identifier operands get this treatment; typeof undeclaredObj.prop still raises because the .prop access is evaluated.
value instanceof cls — binary, returns a boolean
Sits at the relational precedence level (same as <, >, <=, >=).
| cls argument | Behavior |
|---|---|
| JSClass | True iff value is a JSObject with cls set to exactly that class |
| Python type | Falls back to isinstance(value, cls) |
| Anything else (number, function, etc.) | False |
Without a prototype chain, this implementation does not currently
recognize value instanceof someJSFunction — host-defined JSClass is
the canonical class facility and what instanceof reasons about.
delete target — unary, returns a boolean
| Target form | Behavior |
|---|---|
| obj.prop / obj[expr] on JSObject | Removes property; returns True |
| obj.prop / obj[expr] on dict | dict.pop(key, None); returns True |
| arr[i] on a Python list | Sets arr[i] = UNDEFINED; returns True |
| Plain identifier (delete x) | No-op; returns False (lexical bindings are not removable) |
| Other host containers | No-op; returns False |
Deleting a non-existent property is not an error — it returns
True, mirroring JS.
void operand — unary, always returns undefined
Evaluates operand for its side effects, discards the result, and
returns the UNDEFINED singleton. The classic use is void 0 as a
guaranteed-undefined value, but any expression works.
let x = 5;
void (x = x + 10); // returns undefined; x is now 15
The three unary operators produce a UnaryOp AST node with op set to the
keyword ("typeof", "delete", or "void"); instanceof produces a
BinaryOp with op="instanceof".
5. Implementation deep-dive
5.1 Lexer implementation
The lexer in tokens.py is a hand-written scanner (no regex in the
hot path). It maintains:
- i: current character index into the source string
- line / col: 1-based position tracking
- tokens: output list being built
Main loop: _skip_ws_and_comments() → _scan_token() → repeat until
end of source, then append an EOF token.
_scan_token() dispatch by first character:
- [a-zA-Z_$] → _scan_ident(): reads the full word, looks it up in the KEYWORDS dict; stores True/False/None as Python values for boolean/null literals.
- [0-9] → _scan_number(): handles decimal integers, floats (1.5), exponents (1e5, 2.5E-3), hex (0xFF). Stores int or float as the value.
- ' or " → _scan_string(): handles escape sequences (\n, \t, \r, \0, \\, \', \", \`). Raises LexError on unterminated strings or stray newlines.
- Otherwise: tries 3-char operators (===, !==), then 2-char operators (from the two_map dict), then single-char operators (from the single_map dict). Falls through to LexError("Unexpected character") if nothing matches.
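The longest-match rule in the last bullet can be sketched as a lookup that tries the widest table first. The three tables here are small illustrative subsets standing in for the real three-char check, two_map, and single_map, and ValueError stands in for LexError; only operators and whitespace are handled.

```python
THREE_CHAR = {"===": "SEQ", "!==": "SNEQ"}
TWO_CHAR = {"==": "EQ", "!=": "NEQ", "<=": "LTE", ">=": "GTE",
            "&&": "AND", "||": "OR", "=>": "ARROW"}
ONE_CHAR = {"=": "ASSIGN", "<": "LT", ">": "GT", "!": "NOT",
            "+": "PLUS", "-": "MINUS"}


def scan_operator(src, i):
    """Return (token_name, next_index) using longest-match-first lookup."""
    for width, table in ((3, THREE_CHAR), (2, TWO_CHAR), (1, ONE_CHAR)):
        text = src[i:i + width]
        if text in table:
            return table[text], i + width
    raise ValueError(f"Unexpected character {src[i]!r}")


def scan_all(src):
    """Tokenize a string that contains only operators and whitespace."""
    out, i = [], 0
    while i < len(src):
        if src[i].isspace():
            i += 1
            continue
        name, i = scan_operator(src, i)
        out.append(name)
    return out


names = scan_all("=== == = !== =>")
greedy = scan_all("<==")   # '<=' wins over '<', leaving a lone '='
```

Trying the wider tables first is what keeps === from being read as == followed by =.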
Comments: _skip_ws_and_comments() handles both // line comments
(consume until newline) and /* ... */ block comments (consume until
closing */, raising LexError if unterminated).
5.2 Parser implementation
The parser in parser.py is a recursive-descent parser with a
_binary_left() helper for left-associative binary operator chains:
def _binary_left(self, sub, ops, cls=BinaryOp):
    node = sub()
    while self._peek().type in ops:
        tok = self.tokens[self.pos]; self.pos += 1
        right = sub()
        node = cls(op=ops[tok.type], left=node, right=right, ...)
    return node
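A stripped-down parser makes it easier to see how chaining this helper yields both left associativity and precedence. MiniParser below is a hypothetical toy: tokens are plain Python values instead of Token objects, nodes are (op, left, right) tuples instead of BinaryOp instances, and only two precedence levels exist.

```python
class MiniParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["EOF"]
        self.pos = 0

    def _peek(self):
        return self.tokens[self.pos]

    def _binary_left(self, sub, ops):
        # Parse one operand, then fold each following operator to the left.
        node = sub()
        while self._peek() in ops:
            op = self.tokens[self.pos]
            self.pos += 1
            node = (op, node, sub())
        return node

    def parse_additive(self):
        return self._binary_left(self.parse_multiplicative, {"+", "-"})

    def parse_multiplicative(self):
        return self._binary_left(self.parse_primary, {"*", "/", "%"})

    def parse_primary(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok


# 1 - 2 - 3 * 4  parses as  (1 - 2) - (3 * 4)
tree = MiniParser([1, "-", 2, "-", 3, "*", 4]).parse_additive()
```

Each level calls the next-tighter level as its operand parser, so multiplication groups before subtraction, and the loop folds repeated operators of the same level from the left.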
Expression parsing chains these calls from lowest to highest precedence:
_parse_assignment
└→ _try_parse_arrow (lookahead)
└→ _parse_conditional
└→ _parse_logical_or
└→ _parse_logical_and
└→ _parse_equality
└→ _parse_relational
└→ _parse_additive
└→ _parse_multiplicative
└→ _parse_unary
└→ _parse_postfix
└→ _parse_call
└→ _parse_primary
Statement parsing in _parse_statement() checks the current token's
type and dispatches:
def _parse_statement(self):
    tok = self._peek()
    if tok.type in (KW_VAR, KW_LET, KW_CONST):          return self._parse_var_declaration()
    if tok.type is KW_FUNCTION and peek(1) is IDENT:    return self._parse_function_declaration()
    if tok.type is KW_RETURN:                           return self._parse_return()
    if tok.type is KW_IF:                               return self._parse_if()
    if tok.type is KW_WHILE:                            return self._parse_while()
    if tok.type is KW_FOR:                              return self._parse_for()
    if tok.type is KW_BREAK:      # consume, optional semi, return BreakStatement
    if tok.type is KW_CONTINUE:   # ditto for ContinueStatement
    if tok.type is LBRACE:                              return self._parse_block()
    return self._parse_expression_statement()           # fallthrough
Arrow function detection uses lookahead in _try_parse_arrow(): it checks
for the two-token sequence IDENT '=>', or scans ahead through balanced
parentheses to confirm '(' ... ')' '=>' before committing to the arrow
parse path. If lookahead fails, it returns None and the parser falls
through to normal expression parsing without consuming any tokens.
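The lookahead itself is a pure function of the token stream, which is what lets the parser back out without consuming anything. The sketch below assumes tokens are represented by their type names as strings; looks_like_arrow is a hypothetical helper, not the body of _try_parse_arrow().

```python
def looks_like_arrow(tokens, pos):
    """Return True if an arrow function starts at tokens[pos]."""
    if pos >= len(tokens):
        return False
    if tokens[pos] == "IDENT":
        # Single-parameter form: IDENT '=>'
        return pos + 1 < len(tokens) and tokens[pos + 1] == "ARROW"
    if tokens[pos] != "LPAREN":
        return False
    # Parenthesized form: find the matching ')' and look one token further.
    depth = 0
    for i in range(pos, len(tokens)):
        if tokens[i] == "LPAREN":
            depth += 1
        elif tokens[i] == "RPAREN":
            depth -= 1
            if depth == 0:
                return i + 1 < len(tokens) and tokens[i + 1] == "ARROW"
    return False  # unbalanced parentheses


single = looks_like_arrow(["IDENT", "ARROW", "IDENT", "EOF"], 0)
multi = looks_like_arrow(
    ["LPAREN", "IDENT", "COMMA", "IDENT", "RPAREN", "ARROW", "IDENT", "EOF"], 0)
grouping = looks_like_arrow(
    ["LPAREN", "IDENT", "PLUS", "LPAREN", "IDENT", "RPAREN", "RPAREN",
     "STAR", "NUMBER", "EOF"], 0)
```

The depth counter is what keeps a nested group such as (a + (b)) * 2 from being mistaken for a parameter list.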
Lambda parsing is triggered by the KW_LAMBDA token in
_parse_primary(). The _parse_lambda() method accepts four surface
forms and produces an ArrowFunction node for all of them:
| Form | Example | Body type |
|---|---|---|
| Parenthesized + block | lambda (a, b) { return a+b; } | Block |
| Parenthesized + expression | lambda (x) => x * x | Expression |
| Python-style with params | lambda a, b: a + b | Expression |
| Python-style zero-arg | lambda: 42 | Expression |
5.3 Interpreter dispatch table
interpreter.py maps each AST class to a handler using a plain Python
dict:
self._dispatch = {
A.Program: self._exec_program,
A.Block: self._exec_block,
A.ExpressionStatement: self._exec_expression_statement,
A.VariableDeclaration: self._exec_variable_declaration,
A.FunctionDeclaration: self._exec_function_declaration,
A.ReturnStatement: self._exec_return,
A.IfStatement: self._exec_if,
A.WhileStatement: self._exec_while,
A.ForStatement: self._exec_for,
A.BreakStatement: self._exec_break,
A.ContinueStatement: self._exec_continue,
A.Literal: self._eval_literal,
A.Identifier: self._eval_identifier,
A.ArrayExpression: self._eval_array,
A.ObjectExpression: self._eval_object,
A.FunctionExpression: self._eval_function_expr,
A.ArrowFunction: self._eval_arrow,
A.UnaryOp: self._eval_unary,
A.UpdateOp: self._eval_update,
A.BinaryOp: self._eval_binary,
A.LogicalOp: self._eval_logical,
A.AssignmentExpression: self._eval_assignment,
A.ConditionalExpression: self._eval_conditional,
A.MemberAccess: self._eval_member,
A.CallExpression: self._eval_call,
A.NewExpression: self._eval_new,
}
The single _evaluate(node, env) method performs the lookup:
def _evaluate(self, node, env):
    handler = self._dispatch.get(type(node))
    if handler is None:
        raise JSRuntimeError(f"No handler for {type(node).__name__}")
    return handler(node, env)
Naming convention:
- _exec_*: statement handlers; return None (side-effects only)
- _eval_*: expression handlers; return a runtime value
Adding a new node = one line in _dispatch + one handler method.
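The "one line plus one handler" claim can be tried on a toy evaluator that uses the same dispatch pattern. Everything here is a stand-in written for illustration: the Num, Add, and Neg node classes are hypothetical, RuntimeError replaces JSRuntimeError, and the new handler is registered from outside only to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: float


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Neg:  # the "new" node type added after the fact
    operand: object


class MiniInterpreter:
    def __init__(self):
        self._dispatch = {
            Num: self._eval_num,
            Add: self._eval_add,
        }

    def _evaluate(self, node, env):
        handler = self._dispatch.get(type(node))
        if handler is None:
            raise RuntimeError(f"No handler for {type(node).__name__}")
        return handler(node, env)

    def _eval_num(self, node, env):
        return node.value

    def _eval_add(self, node, env):
        return self._evaluate(node.left, env) + self._evaluate(node.right, env)


interp = MiniInterpreter()
# One dispatch entry plus one handler is all a new node needs.
interp._dispatch[Neg] = lambda node, env: -interp._evaluate(node.operand, env)
result = interp._evaluate(Add(Num(2), Neg(Num(5))), env=None)
```

Lookup is keyed on the exact type, so a subclass of a node would need its own entry.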
5.4 Scoping & hoisting implementation
Each _exec_block opens a child environment:
def _exec_block(self, node, env):
    block_env = env.child()
    for stmt in node.body:
        self._evaluate(stmt, block_env)
var declarations use env.declare_var() which walks up to the nearest
is_function_scope=True environment:
def declare_var(self, name, value):
    env = self
    while env.parent is not None and not env.is_function_scope:
        env = env.parent
    env.bindings[name] = _Binding(value=value, kind="var")
The global Environment is created with is_function_scope=True, so var
declarations in the top-level always land there.
5.5 Function calls, closures & this
When a FunctionDeclaration or FunctionExpression is evaluated, the
interpreter captures the current env as closure:
fn = JSFunction(name=..., params=..., body=..., closure=env)
JSFunction.call() creates a fresh frame whose parent is the closure
(not the call site), giving correct lexical scoping:
frame = Environment(parent=self.closure)
frame.is_function_scope = True
for i, p in enumerate(self.params):
    frame.declare(p, args[i] if i < len(args) else UNDEFINED, kind="let")
Named function expressions get an extra intermediate scope so the function can refer to itself by name without leaking into the outer scope:
if node.name:
    inner = env.child()
    fn = JSFunction(name=node.name, ..., closure=inner)
    inner.declare(node.name, fn, kind="const")
Arrow functions / lambdas with expression bodies are wrapped in a
synthetic Block([ReturnStatement(body)]) so JSFunction.call sees a
uniform shape.
this binding: _eval_call detects method-style calls
(obj.method()) and calls fn.bind(receiver) which sets bound_this.
Inside the function frame, this is declared as a const. For
host-defined methods, any object implementing .bind(instance) +
.call(interp, args) participates in the same protocol.
Loop-depth across function boundaries: JSFunction.call() saves
interp._inside_loop, resets it to 0 for the function body, and
restores it in a finally block. This ensures break inside a function
(even one defined inside a loop) is correctly flagged.
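The save, reset, and restore steps around a call can be isolated from the rest of the interpreter. In this sketch MiniInterp, run_loop, and call_function are hypothetical stand-ins for the loop handlers and JSFunction.call(), and RuntimeError stands in for JSRuntimeError.

```python
class BreakSignal(BaseException):
    """Stand-in for the interpreter's break signal."""


class MiniInterp:
    def __init__(self):
        self._inside_loop = 0

    def exec_break(self):
        if self._inside_loop == 0:
            raise RuntimeError("'break' used outside of a loop")
        raise BreakSignal()

    def run_loop(self, body):
        """Run a single-iteration loop body with the depth counter raised."""
        self._inside_loop += 1
        try:
            body()
        except BreakSignal:
            pass
        finally:
            self._inside_loop -= 1

    def call_function(self, body):
        """Function bodies start at loop depth 0, whatever the call site."""
        saved = self._inside_loop
        self._inside_loop = 0
        try:
            return body()
        finally:
            self._inside_loop = saved


interp = MiniInterp()
interp.run_loop(interp.exec_break)          # legal: break directly in a loop

error = None
try:
    # break inside a function that is merely called from within a loop
    interp.run_loop(lambda: interp.call_function(interp.exec_break))
except RuntimeError as exc:
    error = str(exc)
```

The finally blocks guarantee the counter returns to its previous value even when the body raises.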
6. Extending the parser
6.1 Adding a new keyword / token
Step 1: Add a TokenType member in tokens.py:
class TokenType(Enum):
    # ...existing members...
    KW_SWITCH = auto()
Step 2: Register the keyword in the KEYWORDS dict:
KEYWORDS = {
    # ...existing entries...
    "switch": TokenType.KW_SWITCH,
}
That's it — the lexer will now emit Token(type=KW_SWITCH, value="switch", ...)
whenever it sees the word switch in source code.
For a new operator character (e.g., |): add the enum member, then
add an entry to single_map (1-char) or two_map (2-char) inside
_scan_token().
6.2 Adding a new statement
Step 1: Define an AST node in ast_nodes.py:
@dataclass
class SwitchStatement(Node):
    discriminant: Expression = None
    cases: List["SwitchCase"] = field(default_factory=list)

@dataclass
class SwitchCase(Node):
    test: Optional[Expression] = None  # None = default case
    consequent: List[Statement] = field(default_factory=list)
Step 2: Add the node to __all__ in ast_nodes.py.
Step 3: In parser.py, import the new node and add a dispatch branch
in _parse_statement():
def _parse_statement(self):
    tok = self._peek()
    # ...existing branches...
    if tok.type is TokenType.KW_SWITCH:
        return self._parse_switch()
    # ...
Step 4: Write the parse method:
def _parse_switch(self) -> SwitchStatement:
    kw = self._expect(TokenType.KW_SWITCH)
    self._expect(TokenType.LPAREN)
    disc = self._parse_assignment()
    self._expect(TokenType.RPAREN)
    self._expect(TokenType.LBRACE)
    cases = []
    while not self._check(TokenType.RBRACE, TokenType.EOF):
        cases.append(self._parse_switch_case())
    self._expect(TokenType.RBRACE)
    return SwitchStatement(discriminant=disc, cases=cases,
                           line=kw.line, col=kw.column)
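The method above calls _parse_switch_case(), which is not shown. A plausible shape for it is sketched below against a tiny standalone parser, so it runs without the package. The tuple tokens, the KW_CASE, KW_DEFAULT, COLON, NUMBER, and STMT type names, and the use of single tokens in place of real expressions and statements are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

Token = Tuple[str, str]   # (type, text); a hypothetical simplification


@dataclass
class SwitchCase:
    test: Optional[Any] = None            # None = default case
    consequent: List[Any] = field(default_factory=list)


class MiniParser:
    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens + [("EOF", "")]
        self.pos = 0

    def _check(self, *types: str) -> bool:
        return self.tokens[self.pos][0] in types

    def _expect(self, type_: str) -> Token:
        if not self._check(type_):
            raise SyntaxError(f"expected {type_}, got {self.tokens[self.pos]}")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def _parse_switch_case(self) -> SwitchCase:
        if self._check("KW_DEFAULT"):
            self._expect("KW_DEFAULT")
            test = None
        else:
            self._expect("KW_CASE")
            test = self._expect("NUMBER")[1]      # stand-in for an expression
        self._expect("COLON")
        body = []
        # Statements run until the next label or the end of the switch body.
        while not self._check("KW_CASE", "KW_DEFAULT", "RBRACE", "EOF"):
            body.append(self._expect("STMT")[1])  # stand-in for a statement
        return SwitchCase(test=test, consequent=body)


parser = MiniParser([
    ("KW_CASE", "case"), ("NUMBER", "1"), ("COLON", ":"),
    ("STMT", "a()"), ("STMT", "break"),
    ("KW_DEFAULT", "default"), ("COLON", ":"), ("STMT", "b()"),
    ("RBRACE", "}"),
])
first = parser._parse_switch_case()
second = parser._parse_switch_case()
```

Stopping the statement loop at the next case or default label, rather than at a break, is what leaves fall-through to the interpreter.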
6.3 Adding a new expression or operator precedence level
To add a new operator between existing levels (e.g., bitwise OR |
between logical AND and equality):
Step 1: Add PIPE = auto() to TokenType and "|" to the
single-char map in _scan_token().
Step 2: Insert a new method and wire it into the chain. The chain is:
_parse_logical_and → calls → _parse_equality
Insert between them:
def _parse_logical_and(self) -> Node:
    return self._binary_left(self._parse_bitwise_or,  # ← changed target
                             {TokenType.AND: "&&"}, cls=LogicalOp)

def _parse_bitwise_or(self) -> Node:  # ← new level
    return self._binary_left(self._parse_equality,
                             {TokenType.PIPE: "|"})
Step 3: Optionally add a BitwiseOp AST node if you want it distinct
from BinaryOp, or reuse BinaryOp with op="|".
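To see why inserting one method changes precedence, it helps to look at what a helper like _binary_left does. The sketch below is a guess at its general shape, not the package's implementation: it works on pre-split string tokens and builds plain tuples instead of AST nodes, and the cls parameter is left out.

```python
from typing import Any, Callable, Dict, List


class MiniParser:
    """Hypothetical parser over pre-split tokens such as ["a", "&&", "b"]."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def _binary_left(self, operand: Callable[[], Any],
                     ops: Dict[str, str]) -> Any:
        # Parse one operand, then fold in `op operand` pairs left to right.
        left = operand()
        while self.pos < len(self.tokens) and self.tokens[self.pos] in ops:
            op = ops[self.tokens[self.pos]]
            self.pos += 1
            left = (op, left, operand())
        return left

    def parse_logical_and(self) -> Any:
        return self._binary_left(self.parse_bitwise_or, {"&&": "&&"})

    def parse_bitwise_or(self) -> Any:            # the newly inserted level
        return self._binary_left(self.parse_equality, {"|": "|"})

    def parse_equality(self) -> Any:
        return self._binary_left(self.parse_primary, {"==": "==", "!=": "!="})

    def parse_primary(self) -> Any:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok


tree = MiniParser("a == b | c && d".split()).parse_logical_and()
chain = MiniParser("a | b | c".split()).parse_logical_and()
```

Each level parses its operands with the next-tighter level, so == binds tighter than |, which binds tighter than &&, and repeated operators at one level group to the left.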
6.4 Worked example: do { } while (...)
Full walkthrough of adding a new statement from start to finish.
tokens.py:
class TokenType(Enum):
    # ...
    KW_DO = auto()

KEYWORDS = {
    # ...
    "do": TokenType.KW_DO,
}
ast_nodes.py:
@dataclass
class DoWhileStatement(Node):
    test: Expression = None
    body: Statement = None
Add "DoWhileStatement" to __all__.
parser.py — import and dispatch:
from .ast_nodes import ..., DoWhileStatement

def _parse_statement(self):
    tok = self._peek()
    # ...existing branches...
    if tok.type is TokenType.KW_DO:
        return self._parse_do_while()
    # ...
Parse method:
def _parse_do_while(self) -> DoWhileStatement:
    kw = self._expect(TokenType.KW_DO)
    body = self._parse_statement()
    self._expect(TokenType.KW_WHILE)
    self._expect(TokenType.LPAREN)
    test = self._parse_assignment()
    self._expect(TokenType.RPAREN)
    self._consume_optional_semi()
    return DoWhileStatement(test=test, body=body,
                            line=kw.line, col=kw.column)
Files touched: tokens.py (2 lines), ast_nodes.py (4 lines + 1 in
__all__), parser.py (15 lines).
7. Extending the interpreter
7.1 Adding a handler for a new AST node
Step 1: No extra import is needed. interpreter.py imports the AST module
with from jsparse import ast_nodes as A, so A.DoWhileStatement is available
as soon as the node has been added to ast_nodes.py.
Step 2: Add an entry in the _dispatch dict inside __init__:
A.DoWhileStatement: self._exec_do_while,
Step 3: Implement the handler. Convention:
- Name: _exec_* for statements, _eval_* for expressions.
- Signature: (self, node: A.TheNode, env: Environment) -> Any.
- Statements return None; expressions return the computed value.
- Use self._evaluate(child_node, env) to recurse into children.
def _exec_do_while(self, node: A.DoWhileStatement,
                   env: Environment) -> None:
    self._inside_loop += 1
    try:
        while True:
            try:
                self._evaluate(node.body, env)
            except ContinueSignal:
                pass
            except BreakSignal:
                break
            if not _is_truthy(self._evaluate(node.test, env)):
                break
    finally:
        self._inside_loop -= 1
Files touched: interpreter.py only (1 line in _dispatch, ~15 lines
for the method).
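The dispatch-table convention and the do-while control flow can be exercised together in a small standalone sketch. MiniInterpreter, Call, and DoWhile are hypothetical; environments and loop-depth tracking are omitted, and Python callables stand in for the body and test subtrees.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class BreakSignal(Exception):
    pass


class ContinueSignal(Exception):
    pass


@dataclass
class Call:                # hypothetical node: invoke a Python callable
    fn: Callable[[], Any]


@dataclass
class DoWhile:             # hypothetical node mirroring DoWhileStatement
    test: Call
    body: Call


class MiniInterpreter:
    def __init__(self) -> None:
        # Node class -> handler; a new node type needs exactly one new entry.
        self._dispatch: Dict[type, Callable[[Any], Any]] = {
            Call: self._eval_call,
            DoWhile: self._exec_do_while,
        }

    def _evaluate(self, node: Any) -> Any:
        return self._dispatch[type(node)](node)

    def _eval_call(self, node: Call) -> Any:
        return node.fn()

    def _exec_do_while(self, node: DoWhile) -> None:
        while True:
            try:
                self._evaluate(node.body)
            except ContinueSignal:
                pass                   # continue still proceeds to the test
            except BreakSignal:
                break
            if not self._evaluate(node.test):
                break


log: List[int] = []


def body() -> None:
    log.append(len(log))
    if len(log) == 2:
        raise ContinueSignal()


MiniInterpreter()._evaluate(
    DoWhile(test=Call(lambda: len(log) < 3), body=Call(body)))

ran_once: List[int] = []
MiniInterpreter()._evaluate(
    DoWhile(test=Call(lambda: False), body=Call(lambda: ran_once.append(1))))
```

The second call shows the defining property of do-while: the body runs once even though the test is false from the start.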
7.2 Adding a new control-flow construct
If your new construct needs non-local control flow (like throw / catch):
Step 1: Define a signal in jsexec/errors.py:
class ThrowSignal(_ControlSignal):
    """Carries the thrown value."""
    def __init__(self, value: Any = None):
        self.value = value
Step 2: Raise it in the handler:
def _exec_throw(self, node, env):
    raise ThrowSignal(self._evaluate(node.argument, env))
Step 3: Catch it in the owning construct:
def _exec_try(self, node, env):
    try:
        self._evaluate(node.block, env)
    except ThrowSignal as e:
        catch_env = env.child()
        catch_env.declare(node.catch_param, e.value, kind="let")
        self._evaluate(node.catch_block, catch_env)
    finally:
        if node.finally_block is not None:
            self._evaluate(node.finally_block, env)
7.3 Worked example: switch / case
Assuming the parser produces SwitchStatement and SwitchCase nodes from
section 6.2:
interpreter.py:
# In __init__:
A.SwitchStatement: self._exec_switch,

def _exec_switch(self, node: A.SwitchStatement, env: Environment) -> None:
    disc = self._evaluate(node.discriminant, env)
    matched = False
    self._inside_loop += 1  # allow break inside switch
    try:
        for case in node.cases:
            if not matched:
                if case.test is None:  # default case
                    matched = True
                elif _equals(disc, self._evaluate(case.test, env),
                             strict=True):
                    matched = True
            if matched:
                try:
                    for stmt in case.consequent:
                        self._evaluate(stmt, env)
                except BreakSignal:
                    return  # break exits the switch
    finally:
        self._inside_loop -= 1
Files touched: interpreter.py only (1 dispatch entry + ~20 lines).
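One subtlety: the loop above treats default as matched as soon as it is reached, so it behaves like JavaScript only when default is written last. In JavaScript, default is selected only if no case matches, wherever it appears. If that matters for your scripts, a two-pass variant is one option. The sketch below is a standalone illustration, not the package's code: cases are simplified to (value, statements) pairs, None marks default, and Python's == stands in for strict equality.

```python
from typing import Any, Callable, List, Optional, Tuple

# Each case is (test_value, statements); test_value None marks `default`.
# Hypothetical simplification: real case tests are expressions to evaluate.
Case = Tuple[Optional[Any], List[Callable[[], None]]]


class BreakSignal(Exception):
    pass


def run_switch(disc: Any, cases: List[Case]) -> None:
    # Pass 1: find the first case whose test equals the discriminant.
    start = next((i for i, (test, _) in enumerate(cases)
                  if test is not None and test == disc), None)
    # Pass 2: fall back to `default`, wherever it appears.
    if start is None:
        start = next((i for i, (test, _) in enumerate(cases)
                      if test is None), None)
    if start is None:
        return
    try:
        for _, statements in cases[start:]:   # C-style fall-through
            for stmt in statements:
                stmt()
    except BreakSignal:
        pass                                  # break exits the switch


out: List[str] = []


def say(text: str) -> Callable[[], None]:
    return lambda: out.append(text)


def brk() -> None:
    raise BreakSignal()


cases: List[Case] = [
    (None, [say("default"), brk]),
    (1, [say("one")]),
    (2, [say("two"), brk]),
]
run_switch(1, cases)     # matches case 1, then falls through into case 2
run_switch(9, cases)     # no match, so default runs even though it is first
```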
8. Supported language features
Statements
| Feature | Syntax example |
|---|---|
| Variable declaration | var x = 1; / let y = 2; / const z = 3; |
| Multiple declarators | let a = 1, b = 2; |
| Function declaration | function foo(a, b) { return a + b; } |
| Return | return expr; or bare return; |
| If / else | if (x > 0) { ... } else { ... } |
| While loop | while (cond) { ... } |
| Do-while loop | do { ... } while (cond); |
| For loop (C-style) | for (let i = 0; i < 10; i = i + 1) { ... } |
| For-in loop | for (let k in obj) { ... } (yields property keys) |
| For-of loop | for (let v of arr) { ... } (yields values; works on arrays, strings, dicts) |
| Switch / case | switch (x) { case 1: ...; default: ... } (C-style fall-through) |
| Try / catch / finally / throw | try { ... } catch (e) { ... } finally { ... } / throw expr; (catches JSRuntimeError too) |
| Break / continue | break; / continue; (inside loops; break is also allowed inside switch) |
| Block | { let x = 1; ... } |
Expressions
| Feature | Syntax example |
|---|---|
| Numeric literals | 42, 3.14, 1e5, 0xFF |
| String literals | "hello", 'world', "tab:\there" |
| Boolean / null / undefined | true, false, null, undefined |
| Identifiers | x, myVar, $, _foo |
| Arithmetic | +, -, *, /, % |
| String concatenation | "hello" + " " + "world" |
| Comparison | <, >, <=, >= |
| Equality | ==, !=, ===, !== |
| Logical | &&, ||, ! |
| Assignment | =, +=, -=, *=, /= |
| Update | ++x, x++, --x, x-- |
| Type / reflection | typeof x, x instanceof Cls, delete obj.prop, void expr |
| Ternary | cond ? a : b |
| Member access | obj.prop, arr[i] |
| Function call | fn(a, b) |
| new | new Point(3, 4) |
| Array literal | [1, 2, 3] (trailing comma OK) |
| Object literal | { name: "x", value: 1 } (trailing comma OK) |
| Regex literal | /abc/i, /^[a-z]+$/g (with .test / .exec) |
| Template string | `hello, ${name}!` (escapes, newlines, nesting OK) |
| Function expression | function (x) { return x; } |
| Named function expression | function fact(n) { ... fact(n-1); } |
| Arrow function | (x) => x + 1, (a, b) => { return a + b; } |
| Lambda (parenthesized + block) | lambda (x) { return x * x; } |
| Lambda (parenthesized + expr) | lambda (x) => x * x |
| Lambda (Python-style) | lambda a, b: a + b |
| Lambda (zero-arg) | lambda: 42 |
| Comments | // line, /* block */ |
9. Edge cases & known limitations
Handled edge cases
- Unterminated strings / block comments → LexError with line/column.
- === / !==: 3-char tokens are matched before 2-char and 1-char tokens.
- Right-associative assignment: a = b = c parses as a = (b = c).
- Right-associative ternary: a ? b ? c : d : e parses as a ? (b ? c : d) : e.
- Optional semicolons: never required; consumed when present.
- Trailing commas in [1, 2,] and {a: 1,} are accepted.
- new Foo(args) vs new Foo: both work; the no-args form produces NewExpression(callee=Foo, arguments=[]).
- Named function expressions self-bind: the name is visible inside the body but does not leak to the outer scope.
- break / continue outside loops → JSRuntimeError (not a silent bug). Loop-depth tracking is saved/restored across function calls.
- const reassignment → JSRuntimeError.
- Division by zero → JSRuntimeError.
- String + number coercion: "x" + 1 → "x1" (JS-like).
- null == undefined is true; null === undefined is false.
- Implicit globals: assigning to an undeclared name creates a var in the global scope (sloppy-mode behavior).
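Two of the rules in this list, the null/undefined equality pair and string concatenation, are easy to misread, so here is a hedged sketch of the behavior they describe. It is not the package's _equals or binary-operator code: UNDEFINED is a local stand-in, Python's None is assumed to model JS null, and no other coercions are attempted.

```python
from typing import Any


class _Undefined:
    """Hypothetical singleton standing in for the UNDEFINED value."""

    def __repr__(self) -> str:
        return "undefined"


UNDEFINED = _Undefined()


def _is_nullish(value: Any) -> bool:
    # Assumption: Python None models JS null.
    return value is None or value is UNDEFINED


def equals(a: Any, b: Any, strict: bool) -> bool:
    """Cover only the null/undefined rule; everything else uses Python ==."""
    if _is_nullish(a) or _is_nullish(b):
        if strict:
            return a is b                        # null === undefined is false
        return _is_nullish(a) and _is_nullish(b)  # null == undefined is true
    return a == b


def add(a: Any, b: Any) -> Any:
    """If either operand is a string, concatenate; otherwise add numbers."""
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)
    return a + b


loose = equals(None, UNDEFINED, strict=False)
strict = equals(None, UNDEFINED, strict=True)
joined = add("x", 1)
```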
Known limitations
- No destructuring (let {a, b} = obj;).
- No spread / rest (...args).
- No class keyword: classes are host-provided via register_class.
- No import / export.
- No bitwise / shift operators (&, |, ^, <<, >>).
- No strict ASI: semicolons are always optional everywhere.
- No prototype chain: only single-level class → instance. Consequently, instanceof checks JSObject.cls identity rather than walking a chain.
10. Future extensions roadmap
Each item below is designed to be a clean, localized addition following the patterns described in sections 6 and 7:
| Feature | Parser work | Interpreter work |
|---|---|---|
| switch / case | (implemented; see §7.3 and demo_switch.py) | (implemented) |
| try / catch / throw | (implemented; see §7.2 and demo_trycatch.py) | (implemented; also catches JSRuntimeError) |
| Template strings | (implemented; see demo_template.py) | (implemented; uses host _to_string) |
| Regex literals | (implemented; see §7.x and demo_regex.py) | (implemented; wraps Python re module) |
| for ... of / for ... in | (implemented; see demo_forinof.py) | (implemented) |
| Destructuring | VariableDeclarator.name → pattern node | Extend _exec_variable_declaration |
| Spread / rest | Spread node in args, params, array literals | Unpack in relevant eval methods |
| class keyword | ClassDeclaration node, method definitions | Convert to JSClass at runtime |
| import / export | ImportDeclaration etc. | Module loader subsystem |
| Bitwise / shift | Token types + one precedence level each | Cases in _eval_binary |
| Strict mode ASI | Track newlines on tokens; real insertion rules | No interpreter change |
| NodeVisitor / NodeTransformer | (no parser change) | Generic visitor using iter_child_nodes |
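For the last roadmap row, a generic visitor in the style of Python's ast.NodeVisitor could look roughly like the sketch below. The node classes and the iter_child_nodes shown here are local stand-ins so the example runs on its own; the package's real helper and node definitions may differ.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Iterator, List


@dataclass
class Node:
    pass


@dataclass
class Identifier(Node):
    name: str = ""


@dataclass
class BinaryOp(Node):
    op: str = ""
    left: Any = None
    right: Any = None


@dataclass
class Block(Node):
    body: List[Node] = field(default_factory=list)


def iter_child_nodes(node: Node) -> Iterator[Node]:
    """Stand-in for the package helper: yield Node-valued fields in order."""
    for f in fields(node):
        value = getattr(node, f.name)
        if isinstance(value, Node):
            yield value
        elif isinstance(value, list):
            yield from (item for item in value if isinstance(item, Node))


class NodeVisitor:
    """Dispatch to visit_<ClassName>, falling back to generic_visit."""

    def visit(self, node: Node) -> Any:
        method = getattr(self, "visit_" + type(node).__name__,
                         self.generic_visit)
        return method(node)

    def generic_visit(self, node: Node) -> None:
        for child in iter_child_nodes(node):
            self.visit(child)


class NameCollector(NodeVisitor):
    """Example subclass: record every identifier in source order."""

    def __init__(self) -> None:
        self.names: List[str] = []

    def visit_Identifier(self, node: Identifier) -> None:
        self.names.append(node.name)


tree = Block([BinaryOp("+", Identifier("a"),
                       BinaryOp("*", Identifier("b"), Identifier("c")))])
collector = NameCollector()
collector.visit(tree)
```

A NodeTransformer would follow the same dispatch but return replacement nodes from each visit method.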
Design principles
- Dataclasses everywhere: every AST node and runtime value is a @dataclass; equality, repr, and field iteration come for free.
- One concern per file: tokens, AST, parser, errors, pretty-printer, environment, values, interpreter are all separate modules.
- Dispatch tables over giant if/elif chains: both the parser (statement dispatch) and interpreter (node dispatch) use lookups.
- Errors always carry position: LexError, ParseError, and JSRuntimeError all include line and column.
- Extending = small, local changes: add a token, an AST node, a parse method, a dispatch entry, and a handler. No file requires changes to more than a few lines.