Playing with Forth

As I described in the Previous article, I’m interested in Forth language as of recently and want to bootstrap it on Javascript.

In summary, Forth is a stack based programming language, as many people heard before. However, it is also has some peculiar meta programming capability that makes it harder to classify: If the direct stack handling looks between C e ASM, its syntax extension and code abstraction makes higher than anything I worked before.

For me it is actually a different philosophy to computing, a road forked many years ago and evolved in a different perspective, with emphasis in abstracting code instead of the data. It can be classified as structured programming, it has control structures and heavily induces you to organize the code in procedures. It however has no baked in type checking, you need to keep track of it on your own. You also have to keep track of balancing the stack, as it is famously known for.

I still want to play with it, so let’s begin implementing it in Javascript! I’m heavily basing myself on https://forth-standard.org/standard/core, to use the conventional names and behavior, but complete compliance is not the objective (at least right now, be gone yak!). The core is the minimal required, with just over 100 words, to call it a complete Forth.

Forth programs mostly consists on a sequence of words, separated by space. We can push integer on the data stack (also called the stack) by simply typing it, like 123 (pushes 123) and 999 42 (pushes 999, then pushes 42). To be able to do that we can use a simple js Array like this:

function createStack() {
  /** @type {number[]} */
  const stack = []
  return {
    push: val => stack.push(val | 0), // |0 ensures int32
    pop: () => stack.pop(),
    size: () => stack.length,
    top: () => stack[stack.length - 1],
    dump: () => stack.map(v => v),
    clear: () => stack.splice(0),
  }
}
const dataStack = createStack()

// These save some keystrokes, as these functions are frequently used
const { push, pop } = dataStack

We can type this on the console or on the nodejs REPL to get a feel of this and see the changes on dataStack. We don’t want other manipulations on the stack, so I encapsulated it inside the createStack() function. The functions are self-explanatory, dump() being the only outlier, but it just copies the stack for easier debug.

Other words works as functions, but instead of parameters explicitly, the word just access the stacks, takes what it needs and pushes its results, if any. for example 40 2 + pushes 40 and 2, then the word + pops both and puts its results (42) on stack. It is a convention to use a notation to describe the stack effects as ( <before> -- <after>), for example:

And yep, the word ( starts a comment, that goes up to the next ). As it is a word, so it needs a space to be properly parsed: this is valid ( a comment), but this isn’t (a comment).

I also use a similar notation from the standard to describe examples code -> stack, with code on the left and the stack after it on the right:

 2 2 + -> 4
-7 7 + -> 0
   9 1 -> 9 1
     + -> 10
10 6 - -> 4

As we are building an interpreter, we need be able to recognize the name and its equivalent action. In Forth this is usually done with a dictionary that is implemented as a linked list. But we have Objects in javascript that are basically a string to any dictionary already, so let’s use that:

function createDictionary() {
  /** @type {Object.<string, Definition>} */
  const dictionary = {}
  return {
    /** @returns {Definition} */
    define: (name, action) => dictionary[name] = { name, action },
    find: name => dictionary[name],
    dump: () => Object.values(dictionary),
  }
}
const dictionary = createDictionary()
dictionary.define('+', () => push(pop() + pop()))
dictionary.define('-', () => push(-pop() + pop()))
dictionary.define('DROP', pop)
dictionary.define('DUP', () => push(dataStack.top()))
dictionary.define('DEPTH', () => push(dataStack.size()))
dictionary.define('SWAP', () => {
  const a = pop()
  const b = pop()
  push(a)
  push(b)
})

I defined some of the example words, so we already can interpret one word at time by trying to find the word, calling it if found, or parse and push as number otherwise. We could also, naively, get an input line somehow, split it on spaces and have a rudimentary evaluator, like this:

const evaluateWord = word => {
  const a = dictionary.find(word)?.action;
  if (a) { a() } else { push(word) }
}
const evaluateRudely = str => str.split(/\s+/).forEach(evaluateWord)

This is not really correct because some words can actually consume the input ahead as a parameter, like the ( ...comments... ) we mentioned above or VARIABLE <NAME>, that creates a variable named <NAME>. It is enough to test the words we have defined so far, tough.

Going back to the words described above, we can define the word NEGATE in terms of already existing ones: 0 SWAP -. To demonstrate, let’s assume we have a integer n on the top of our stack, then you can see the steps below will work:

0    -> n 0 
SWAP -> 0 n
-    -> -(n)

So instead of defining it as a js function, lets use our Forth to compile it! We could define some function like () => evaluateRudely('0 SWAP -'), but this is will parse and interpret every time call it, which is slow, ugly and boring (and wrong too considering the scope rules). So instead, let’s create a way to compile this to js. First we create an array and store all word’s actions there, so we can refer then by its index:

// All code is stored here
const codeSpace = [push]

// The address of LIT, as it is not visible on dictionary
const xtLIT = codeSpace.length - 1

// Assign the index for already defined words
for (const d of dictionary.dump()) {
  d.code = codeSpace.length
  codeSpace.push(d.action)
}

function addressOf(word) {
  const xt = dictionary.find(word)?.code ?? null
  if (xt === null)
    throw new Error(`No word defined with name ${word}`)
  return xt
}

// Fills both codeSpace and dictionary
function define(name, action) {
  const code = codeSpace.push(action) - 1
  const d = dictionary.define(name, action)
  d.code = code
  return d
}

With this we can describe a composed function as an array of their code ids, like [/*push 0*/*, addressOf('SWAP'), addressOf('-)] and we just need to have go through it. We need to have some way to describe “push”, I will go by the tradition of threading the code and have something like [xtLIT, 0, 6 /* SWAP*/, 2 /*-*/]. Then we just need an executor function to go through it.

// Another stack, but mainly used for return address bookkeeping
let returnStack = createStack()

// Execute sequential steps within a thread
// Return the next pc, with the next thread xt on returnStack
function executorStep(thread, pc) {
  const currentAddress = returnStack.pop()
  while (pc < thread.length) {
    const address = thread[pc++]
    const action = codeSpace[address]
    if (action.thread) {
      returnStack.push(currentAddress)
      returnStack.push(pc)
      returnStack.push(address)
      return 0
    }
    action.apply(null, thread.slice(pc, pc += action.length))
  }
  return returnStack.pop()
}
// Execute the thread recursively
function executor(root) {
  const savedStack = returnStack

  returnStack = createStack()
  returnStack.push(-1)
  let pc = 0
  while (returnStack.size()) {
    const address = returnStack.top()
    const thread = address === -1 ? root : codeSpace[address].thread
    pc = executorStep(thread, pc)
  }
  returnStack = savedStack
}

The executor is somewhat complex, but in summary it steps through (with executorStep()) the threads until the returnStack is empty. executorStep, on the other hand, just iterates through the steps until it either pushes or pops the returnStack.

With that we just need somehow to compile words into the lists and wrap it in a method, so I define the Compiler class as follows:

// Mounts a compilation unit for us
function Compiler() {
  this.buffer = []
}
Compiler.prototype.code = function (xt, ...params) {
  this.buffer.push(xt, ...params)
  return this
}
Compiler.prototype.word = function (word) {
  return this.code(addressOf(word));
}
Compiler.prototype.lit = function (num) {
  return this.code(xtLIT, +num)
}
Compiler.prototype.build = function () {
  const thread = this.buffer
  const runner = () => executor(thread)
  runner.thread = thread
  return runner
}
define('NEGATE', new Compiler().lit(0).word('SWAP').word('-').build())
define('INVERT', new Compiler().word('NEGATE').lit(1).word('-')
  .build())

This above allows defining words based on existing ones. As examples, I defined NEGATE (the unary -) and INVERT (the binary not). You can test them with the evaluateRudely above and see they working:

    10 -> 10
NEGATE -> -10
NEGATE -> 10
INVERT -> -11
INVERT -> 10

This above allows defining words based on existing ones. We could have some sort of compile() though, so we could just use forth syntax directly. Let’s start by first defining a correct way to parse the text with the InputBuffer below.

// InputBuffer 
function InputBuffer(text) {
  this.buffer = text
  this.offset = 0
}
let inputBuffer = new InputBuffer('')

InputBuffer.prototype.match = function (delim) {
  if (this.offset >= this.buffer.length) {
    return false
  }
  const ch = this.buffer[this.offset]
  if (ch === delim) return true
  return delim === ' ' && !!ch.match(/[\x00-\x20\x7f]/)
}
InputBuffer.prototype.skip = function (delim) {
  const len = this.buffer.length
  const off = this.offset
  while (this.offset < len) {
    if (!this.match(delim)) {
      break
    }
    this.offset++
  }
  return this.offset - off
}
InputBuffer.prototype.parse = function (delim) {
  const len = this.buffer.length
  const off = this.offset

  while (this.offset < len) {
    if (this.match(delim)) {
      const text = this.buffer.slice(off, this.offset)
      this.offset++ // Discard the delimiter
      return text
    }
    this.offset++
  }
  return this.buffer.slice(off)
}
InputBuffer.prototype.parseName = function () {
  this.skip(' ')
  return this.parse(' ')
}

While only parseName() was necessary for now, I made it composed by skip() and parse() to support non word parameters, like the ( and \:

define('(', () => inputBuffer.parse(')')).immediate = true
define('\\', () => inputBuffer.parse('\n')).immediate = true

Observe we mark them both as immediate. This is so we can execute the comments even when compiling. Their effect is just ignoring text, but other commands could do more complicated things.

We can now create our compile() function and even its sibling evaluate():

// Stores the current word being compiled
let compiler = new Compiler()

// Flag to know if we are compiling or interpreting right now
let compiling = false

// Do just one step with the given word, respecting the compiling flag
function doStep(word) {
  const d = dictionary.find(word)
  if (d) {
    if (!compiling || d.immediate) {
      return d.action()
    }
    compiler.code(d.code)
  } else if (compiling) {
    compiler.lit(word)
  } else {
    push(word)
  }
}
// Run though our current inputBuffer until the end
function run() {
  let word
  while ((word = inputBuffer.parseName())) {
    doStep(word)
  }
}
// Evaluate code!
function evaluate(text) {
  const savedBuffer = inputBuffer
  inputBuffer = new InputBuffer(text)
  run()
  inputBuffer = savedBuffer
}
// Compile fragment into the existing compiler
Compiler.prototype.compile = function (text) {
  const savedCompiler = compiler
  const savedCompiling = compiling

  compiler = this
  compiling = true
  evaluate(text)

  compiler = savedCompiler
  compiling = savedCompiling
  return this
}
// Compile!
function compile(text) {
  return new Compiler().compile(text).build()
}

Sp now we can use compile to create functions using our syntax. First, lets go back and define for once the words OVER and ROT:

// These allow using return stack as a temporary stack
define('>R', () => returnStack.push(pop()))
define('R>', () => push(returnStack.pop()))
define('R@', () => push(returnStack.top()))
define('OVER', compile('>R DUP R> SWAP'))
define('ROT', compile('>R SWAP R> SWAP'))

We now have a working evaluate method, a few arithmetic words and a few stack manipulation words, I almost can make an useful interpreter but we are still missing some way to get feedback from the operations as We don’t have any way to output anything. So lets fix that:

// Pops number top of the stack and show its value plus space.
define('.', () => output(pop().toString() + ' '))

// Pops number top of the stack and show it as a UTF16 code point.
define('EMIT', () => output(String.fromCharCode(pop())))

// Shows the entire stack. No popping is involved
define('.S', () => output(dataStack.dump().join(', ') + '\n'))

// Prints space
define('CR', compile('10 EMIT'))

// Prints new line
define('SPACE', compile('32 EMIT'))

// Pushes First CHAR on the stack
define('CHAR', () => push(inputBuffer.parseName().charCodeAt(0)))

This assumes we have an output function that will somehow work. One easy way to define it would be:

let outBuffer = ''
function output(str) {
  const newLineIndex = str.lastIndexOf('\n')
  if (newLineIndex === -1) {
    outBuffer += str
  } else {
    console.log(outBuffer + str.slice(0, newLineIndex))
    outBuffer = str.slice(newLineIndex + 1)
  }
}

You can see an example page here, using the component created previously. If you speak nodejs, then you could take the input using readline module like this:

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
  prompt: '% ',
})

rl.prompt()

rl.on('line', line => {
  if (line.trim() === '.quit') {
    console.log('Bye!')
    rl.close()
    return
  }
  evaluate(line)
  evaluate('CHAR O EMIT CHAR k EMIT CR')
  rl.prompt()
}).on('close', () => {
  process.exit(0)
})

You can make some tests, like:

2 2 + . / Prints 4
CHAR H EMIT CHAR i EMIT / A bastard hello world

But we can’t yet define our own words, nor do any sort of control flow nor even really make a hello world conveniently. That’s what I’ll cover on the next part: Forth innards exposed. See you there!

Code examples available on Gist and includes the cli and the browser interpreters.

Updated on 17 November, 2021: Add some type comments and synonyms for commonly used functions push, pop and top.