Skip to main content

A package for compiling and interpreting S Language code

Project description

S Compiler & Interpreter

Table of Contents

Installation

Install (and update) using pip:

pip install -U s_interpreter

or you can clone the repository using git:

git clone https://github.com/SamuelSill/s_interpreter.git

A Simple Example

The following is a program in S that always returns the value of X1:

# Sugar for performing a non-conditional jump
> GOTO {Label L}
        Z <- Z + 1
        IF Z != 0 GOTO {L}

# Sugar for incrementing variable V1 by the value of V2
> {Variable V1} += {Variable V2}
        IF {V2} != 0 GOTO A
        GOTO E
    [A] {V2} <- {V2} - 1
        Z <- Z + 1
        IF {V2} != 0 GOTO A
    [B] Z <- Z - 1
        {V2} <- {V2} + 1
        {V1} <- {V1} + 1
        IF Z != 0 GOTO B
    [E] Y <- Y

# The program starts here
> MAIN
        Y += X1

Compiling/Running the program:

# Compile the program (in file main.slang) to binary file 'output.txt':
s_compiler -f main.slang -b output.txt

# Now 'output.txt' contains *pure* S language code, no sugars.
# Run the interpreter on the binary file on input '42'
s_interpreter -b output.txt 42

The console output:

Output: 42

You can learn more about the slang syntax here.

You can learn more about the basic syntax of the S Language here.

The S Compiler


The S Compiler is the central module of this repository.

In short, it allows us to easily compile slang files to binaries that the S Interpreter can run.

Compiler Usage

To compile a given slang file to a binary file, run the following command:

s_compiler -f /slang/file/path -o /binary/file/path

You could also pass a flag to print the encoding of the program instead, like so:

s_compiler -f /slang/file/path --encode

But be careful, as program encodings can grow incredibly large even with very few instructions, so the compiler could throw an error instead.

You could pass the flag print to print the program to stdout like so:

s_compiler -f /slang/file/path --print

You can also pass the verbose flag for additional prints regarding the compilation process like so:

s_compiler -f /slang/file/path -o /binary/file/path --verbose

You can also provide the input program by passing its encoding (a number) instead of a slang file like so:

s_compiler -d {program-encoding} -o /binary/file/path

Slang Files

In order to compile S Language code, we write it in .slang files as a convention.

Notice that the compiler is inherently case and whitespace insensitive. The compiler only considers the line continuations between instructions, and between separate words/variables in the same instruction.

.slang files are structured into sections, as follows:

MAIN

The MAIN section will contain the actual program you wish to compile. Please note that it must appear at the end of the file.

Take the following example:

# This program always returns 'Y = X1' 
> MAIN
        IF X != 0 GOTO A
        Z <- Z + 1
        IF Z != 0 GOTO E
    [A] X <- X - 1
        Y <- Y + 1
        IF X != 0 GOTO A
  • Again, note that the indentation is only for ease of reading, and the compiler completely ignores it.

Sugar Sections

The previous example uses the pattern of the GOTO L syntactic sugar.

What if we wanted to use this pattern multiple times? We can define and use a syntactic sugar like so:

> GOTO {Label L}
        Z <- Z + 1
        IF Z != 0 GOTO {L}

> MAIN
        IF X != 0 GOTO A
        GOTO E
    [A] X <- X - 1
        Y <- Y + 1
        IF X != 0 GOTO A

A few details to notice:

  • Notice that in the definition of the sugar, we specified the type of L. This sugar will only be used if we pass it with a valid label.
  • Notice the usage of variable Z inside the GOTO L sugar. When the compiler expands the usage of the sugar, it replaces all internal variables used in the sugar to variables guaranteed to be unused anywhere else in the program. The only exception to this rule is with I/O variables (Y/X1, X2, ...). They can be manipulated internally from sugar code.
  • A given sugar may only use other sugars that are defined before it in the file.
  • Sugar definitions are allowed to overlap in terms of their usage patterns. The sugar that will eventually be used in the compiled output is the first in the file that matches the string.
Sugar Types

When defining sugars, we need to specify the type of their arguments.

The supported sugar argument types are:

  • Label - To only match labels
  • Variable - To only match variables
  • Const - To only match const numbers (e.g 0, 1, 2, 3, ...)
  • Numeric - To match both variables and consts

Note that you can overload sugar definitions only differing by the sugar arguments' types!

While it seems pretty clear as to how Label/Variable are used, how would one use Const in valid S Language?

Well, there is only one way to use a Const, and that is with the REPEAT statement.

REPEAT Statement

Say for example we wanted to implement the sugar for V += 5.

Well, we'll just write it like so:

> {Variable V} += 5
      {V} <- {V} + 1
      {V} <- {V} + 1
      {V} <- {V} + 1
      {V} <- {V} + 1
      {V} <- {V} + 1

But what if we wanted to implement the sugar V += k for ANY const number k?

  • (Notice the difference here between a const number k and a variable V. While on a variable we can perform our regular S instructions, a const is a value that's known IN COMPILE-TIME and is not stored anywhere)

In that case, we'd have to use the REPEAT statement:

> {Variable V} += {Const K}
      {REPEAT K}
      {V} <- {V} + 1
      {END REPEAT}

You might be thinking: "This is cheating! REPEAT is not a sugar nor is it a supported instruction!".

Well, as explained before, syntactic sugars are only A CONVENTION that we can all agree on when writing S Language code.

The REPEAT keyword is a convention for consts, that the lines shall be repeated as the size of the const.

If you still think it's cheating, you're welcome to implement V += k as many times as you want for different k's.

The REPEAT Tradeoff

Currently, the only supported functionality of Const types is the REPEAT statement. The initial plan was to implement another functionality called {REPEAT I TO K}.

Similarly to REPEAT, it would repeat the block K times, but it would also allow us to use another compile-time variable I, that would represent the current block repetition index. This way, we can repeat the block while still changing the behavior of how it generates based on the value of const variable I. We could even perform a nested {REPEAT I} based on const value I.

But we can go even further: Given a const K, we can write a Python function that generates different instructions based on K's value! After all, sugars are a convention and therefore if we all agree that a given sugar is compiled by the output of a Python function, there should be no problem!

But if we do that, we might as well write our code in Python or some other high level language instead of S.

So that is why consts only support the REPEAT keyword. Any additional keyword that we add will remove us of the challenge of writing in S, and defeat the purpose altogether.

You can still do meta-programming in S if you want though. Just not with consts (as S does not natively support them).

The Numeric Type

While the Const type helps reduce code repetition, there could still be code repetition between const/variable implementations.

Suppose we've already implemented sugars for V <- 0 and V1 += V2, and now we wish to implement V1 <- V2.

The straightforward approach would be to implement it as follows:

> {Variable V1} <- {Variable V2}
      {V1} <- 0
      {V1} += {V2}

This implementation works fine, but now we decide to implement the same for consts. So we implement the sugar V += K, and now we need to implement V <- K:

> {Variable V} <- {Const K}
      {V} <- 0
      {V} += {K}

But notice the code repetition between the two implementations! Yuck!

Well, Numeric comes to the rescue:

> {Variable V} <- {Numeric N}
      {V} <- 0
      {V} += {N}

The numeric helps us create the two implementations at once! This works as intended. When we use the sugar for consts, the compiler uses sugar V += K to expand the second line, and when we use the sugar for variables, it uses V1 += V2.

You might think this is pretty minor, but imagine you wanted to implement V1 <- V2 + V3. You'd have to implement the same sugar 4 times to allow for both const/variable usages.

The Pitfalls of Sugars
Sugar Argument Edge Cases

Suppose you've implemented the sugars {Variable V} <- {Const K} and {Variable V1} -= {Variable V2} correctly, and you wish to use them to implement the sugar {Variable OUT} <- {Const K} - {Variable V}.

Well, the straightforward approach is to implement it like so:

> {Variable OUT} <- {Const K} - {Variable V}
        {OUT} <- {K}
        {OUT} -= {V}

But there's an edge case here that needs to be addressed. Can you see it?

What if we expand the line Z <- 100 - Z? It would expand to:

Z <- 100
Z -= Z

No matter the initial value of Z, the given lines will set it to 0!

So let's fix the case for when V is the same variable as OUT:

> {Variable OUT} <- {Const K} - {Variable V}
        Z <- {V}
        {OUT} <- {K}
        {OUT} -= Z

Now you may argue that copying the value of V to Z in the common case where OUT is not the same variable as V is inefficient.

That would be quite silly of you, as it's funny to argue about efficiency in this language to begin with, but you'd still be right.

Well we can add an exception to this case! We can overload the sugar like so:

> {Variable OUT} <- {Const K} - {Variable OUT}
        Z <- {OUT}
        {OUT} <- {K}
        {OUT} -= Z

> {Variable OUT} <- {Const K} - {Variable V}
        {OUT} <- {K}
        {OUT} -= {V}

The order here MATTERS! As explained previously, the sugar chosen to expand a given line will be the first in the file that matches it. If declared the other way around, the more general-case sugar will always take place, and we'll be left with the same bug we started with.

Sugar Internal Variables

Suppose you wished to initialize a given variable Z to 5 in the MAIN section.

You could simply write it as Z <- Z + 1 5 times. After all, Z is initialized to 0, so we don't need to set it to 0 beforehand.

But what about sugars? You might think that it's safe to do the same with sugars, as the compiler guarantees that the sugar's internal variables are not used anywhere else in the program.

Well, while you may be right that the compiler takes care of us with regard to the uniqueness of the variables, a problem amy still occur if the sugar is used inside a loop.

Let's see an example.

Suppose you've implemented {Variable V1} <- {Variable V2} correctly. Let's look at the following code:

> {Variable OUT} <- 1
      Z <- Z + 1
      {OUT} <- Z

> MAIN
  [A] Y <- 1
      X <- X - 1
      IF X != 0 GOTO A

You would assume for input X = 2 the value of Y would be 1. But it would actually be 2, as the lines that Y <- 1 is expanded to work with the same variable, whatever the compiler may choose it to be.

Of course this issue would not be present if we were to write our MAIN section like so:

> MAIN
    Y <- 1
    Y <- 1

This code will in fact always return 1, as the two lines expand to different lines, hence with different Z variables.

Sugar local variables may seem to behave like static variables initialized to 0 at first, but they're only shared between different 'calls' to the same occurrence of the sugar in our code.

As a result of this, a guideline for writing your sugars should be to make sure the internal variables you use are initialized to 0, either by actively zero-ing them at the start of the sugar, or by zero-ing them before exiting the sugar.

You don't have to do that though if you don't actually care about the exact value of your variable (e.g GOTO L).

The Algorithm

Sugar Parsing

Firstly, the compiler parses the different sugar sections and generates regex patterns that would detect potential usages for each sugar. It also validates the sugar's most basic syntax, like making sure that for each REPEAT keyword comes an END REPEAT eventually.

Compiling The Code

The compiler starts with the MAIN section.

It goes over each instruction in the section:

  • If it's a supported S instruction, just compile it as usual
  • Otherwise, go over all known sugars in order, and find the first matching sugar.
    • If there is none, the program does not compile.
    • Otherwise, the compiler saves the sugar and its matching occurrence for later compilation.

The compiler then goes over all sugars that were saved for later compilation and expands each sugar in-place (using the previously compiled instructions to avoid name collision).

Sugar Expansion

The sugar expansion algorithm is probably where most of the brains of the compiler lies.

It's a bit too intricate to be discussed here, so it is recommended you read the algorithm in file compiler.py, under the SyntacticSugar.compile function.

The S Interpreter

While the S Compiler does all the work of compiling .slang files to binaries, the interpreter's purpose is only to run those compiled binaries.

Interpreter Usage

To run the interpreter on a given binary file, run the following command:

s_interpreter <x1-input> <x2-input> ... <xn-input> -b /binary/file/path

The interpreter will print out the result of the binary on the given input (variable Y).

You could also pass an additional flag to print extra info about the run performed like so:

s_interpreter <x1-input> <x2-input> ... <xn-input> -b /binary/file/path --run_info

The S Language


The S Language is a model of computation that is taught in course Theory of Computation at The Academic College of Tel Aviv, Israel.

The Model

Variables

The main thing to note about the S Language, is that it only supports non-negative integers.

The language supports three different variable types:

  • Input Variables: X1/X, X2, X3...
  • Temporary/Work Variables: Z1/Z, Z2, Z3...
  • One Output Variable: Y1/Y

Labels

You can mark instructions in the S Language with labels.

The label names could only be of the following:

A1/A, B1/B, C1/C, D1/D, E1/E, A2, B2, ..., E2, A3, B3, ...

Instructions

The S Language only supports 4 different instructions:

Given any variable V and label L:

  • V <- V - A blank instruction (do nothing)
  • V <- V + 1 - Increment V by 1
  • V <- V - 1 - If V is 0 do nothing, Otherwise decrement V by 1
  • IF V != 0 GOTO L - If V isn't 0, jump to the first occurrence of label L (if nonexistent the program exits), otherwise continue from the next instruction.

The Program

A program written in the S Language is composed of a series of instructions, with or without labels.

At the start of the program, all variables are initialized to 0.

Only the input variables (X1, X2, ...) are initialized with the user input. All input variables that weren't given before running the program are also initialized to 0.

A given program may behave in one of two ways:

  • Run infinitely, in which case it returns nothing
  • Finish running, in which case ONLY the value of variable Y is returned

The program runs the instructions sequentially one after the other (unless a jump is performed), and may exit by two ways:

  • If it reaches the end of the program
  • If it performs a jump to a nonexistent label.

Example

Let's write a program that always returns the value of its first input (X1).

Our first approach would be to write something like this:

[A] X <- X - 1
    Y <- Y + 1
    IF X != 0 GOTO A

This program does the trick for all inputs except for X = 0.

Let's fix the edge case:

    IF X != 0 GOTO A
    Z <- Z + 1
    IF Z != 0 GOTO E
[A] X <- X - 1
    Y <- Y + 1
    IF X != 0 GOTO A

Notice that in order to jump unconditionally to non-existent label E (and therefore exit), we had to increment a temporary variable to make sure it's not 0.

Syntactic Sugars

Due to the simplicity of the S Language, we might want to define conventions for patterns that often appear in our code.

Take for example the unconditional jump pattern to some label L:

Z <- Z + 1
IF Z != 0 GOTO L

We can agree that whenever someone writes GOTO L in their program, it would mean that they wrote the above pattern.

Of course the Z variable would have to be unique and unused anywhere in the program, and not hard-coded Z. We wouldn't want our sugars to change the rest of the program's functionality.

This DOES NOT MEAN that GOTO L is now a supported instruction in the S Language. It's just a convention between us programmers.

To compile a program that uses syntactic sugars, we'd still need to expand them to their underlying instructions.

The syntactic sugars will allow us to write higher level code while still respecting the language's restrictions.

Encoding

Since our language only supports integers, we might want to encode structured data into integers so that we can process it.

PairWise

If we wish to encode the pair <a, b> where a, b are non-negative integers, we follow the formula:

$$\left< a, b \right> = 2^a(2b + 1) - 1$$

ListWise

If we wish to encode the list [e1, e2, ..., en], where all elements are non-negative integers, we follow the formula:

$$[e_1, e_2, ..., e_n] = \sum_{i=1}^n p_i^{e_i}$$

Where pi is the prime at position i (e.g p1=2, p3=5)

  • Notice that adding 0 to the end of the list does not change its encoding. Therefore, if we wish to know the length of the list, we assume it won't end with 0 elements.

Instruction Encoding

Encoding Variables: If we wish to encode a given variable, we just look at its index in the following sequence:

Y, X1, Z1, X2, Z2, X3, Z3, ...

  • For example, #Y = 1, #Z2 = 5 etc.

Encoding Labels: If we wish to encode a given label, we just look at its index in the following sequence:

A1, B1, C1, D1, E1, A2, B2, ...

  • For example, A1 = 1, #C1 = 3 etc.

Encoding Instructions: If we wish to encode a given instruction (with or without label), it is encoded as <a, <b, c>>, where:

  • a:
    • 0 if the instruction has no label
    • Otherwise the encoding of the label
  • b:
    • 0 if V <- V
    • 1 if V <- V + 1
    • 2 if V <- V - 1
    • (2 + #L) if IF V != 0 GOTO L
  • c: #V - 1 where V is the variable in the instruction

Program Encoding

If we wish to encode the program:

I1
I2
.
.
.
In

We just follow the formula:

#[#I1, #I2, ..., #In] - 1

  • Notice that the encoding for Y <- Y is 0. This means that adding such instruction without a label to the end of a program will not change its encoding. This is problematic, so as a convention, no program will end with Y <- Y without a label.

Conventions

  • If the index of a variable/label is 1, we can omit it.
  • No program will end with a non-labeled Y <- Y instruction.
  • Although it's not mandatory, label E (E1) is never used as a label of an instruction, so that we can always jump to it if we wish to exit the program.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

s_interpreter-0.0.5.tar.gz (26.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

s_interpreter-0.0.5-py3-none-any.whl (19.9 kB view details)

Uploaded Python 3

File details

Details for the file s_interpreter-0.0.5.tar.gz.

File metadata

  • Download URL: s_interpreter-0.0.5.tar.gz
  • Upload date:
  • Size: 26.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for s_interpreter-0.0.5.tar.gz
Algorithm Hash digest
SHA256 10e8059397d166b416b30af208d83b260f98fd414ac3eba664fdea77ad400f3d
MD5 3b85aefb79f6419df9c56a1ccaf3bcc0
BLAKE2b-256 e4bf7d83303058899997604937841bd49cfd808efbc86582dfd32a716d79a2e5

See more details on using hashes here.

File details

Details for the file s_interpreter-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: s_interpreter-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 19.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for s_interpreter-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 da3ee8dd632ceaebd9d7ddcc2f9577ebccbc7abbb9bc2a35620d96a585a3ef77
MD5 9eaa51a013804d7d13f05da105c89525
BLAKE2b-256 e403aea9d95d4d480d74c4383b422f75d795971c087ed2932851b25a5bb37ffa

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page