Forth innards exposed
I continued playing with Forth. TLDR: Here the source and here the live demo. Here on the description of what I have done in order to make it Turing complete. In hindsight I should had divided the work better, in more regular chunks, but now it’s done.
Last on previous post, we got to define new words using existing words by the compile(), but we can not define new ones within our interpreter yet. One step to achieve that is to have means of switching between compile and interpreter mode. To understand this let’s first see a limitation with our current compiler.
Let’s create an word to add the .
ascii value on stack:
define('DOT', compile('CHAR .')) // Won't work!
It won’t work as we wanted: It will try to get the current input buffer when it is called, for example DOT +
will put +
on the stack and then print its value as number. We need some way to make CHAR to be executed on compilation time and make its value available on run time. For this we have these new words:
// ( C: x -- ) ( -- x )
// Pos value at compile time and makes current definition push it
// on its run time
define('LITERAL', () => { compiler.lit(pop()) }).immediate = true
// ( -- ) Exits compilation mode
define('[', () => compiling = false).immediate = true
// ( -- ) Enters compilation mode
define(']', () => compiling = true)
With this, we can temporarily go to execution mode execute some words and then go back to compile mode and append its result to it:
define('DOT', compile('[ CHAR . ] LITERAL') // This works
Now it works like we wanted to, when evaluate DOT
it will push the ascii value for ‘.’ on stack just like we wanted. We also now have a mechanism to switch between compilation and execute mode, so we can use it to implement the word that lets you define other words, the colon-definition:
let curDef = ''
// ( "<spaces>name"-- ) parse name and append to new definition
define('DEFER', () => curDef = inputBuffer.parseName())
define(':', compile('DEFER ]'))
define('BUILD', () => {
push(codeSpace.push(compiler.build()) - 1)
compiler = new Compiler()
})
define('PERSIST-DEF', () => {
const xt = pop()
dictionary.define(curDef, codeSpace[xt]).code = xt
})
define('COMPILE,', () => compiler.code(pop()))
define('POSTPONE', () => {
const name = inputBuffer.parseName()
const d = dictionary.find(name)
if (!d) {
throw new Error(`word ${name} not found ${dictionary.dump().map(a => a.name)} `)
}
if (d.immediate) {
compiler.code(d.code)
} else {
compiler.lit(d.code).word('COMPILE,')
}
}).immediate = true
define(';', compile('BUILD PERSIST-DEF POSTPONE [')
).immediate = true
With this implemented, let’s implement IMMEDIATE, that allows us to define [CHAR]
that makes the DOT above simpler:
// Marks last definition as immediate
define('IMMEDIATE', () =>
dictionary.find(curDef).immediate = true)
// ( C: "<spaces>name" -- ) ( -- x ) x is the first char from name
evaluate(': [CHAR] CHAR POSTPONE LITERAL ; IMMEDIATE')
The DOT can be defined as simple as:
: DOT [CHAR] . ;
Another type of useful defining word is CODE
. This allows us defining javascript words withing the interpreter:
define('CODE', () => {
curDef = inputBuffer.parseName()
const start = inputBuffer.offset
let delimCandidate
do {
const end = inputBuffer.offset
delimCandidate = inputBuffer.parseName()
if (delimCandidate === 'END-CODE') {
return define(curDef, eval(inputBuffer
.buffer.slice(start, end - 1)))
}
} while (delimCandidate)
throw new Error('SOURCE finished before END-CODE was found')
})
This is a little hacky, but does its job and allow us to develop everything in forth here on. It will involve some copy and pasting, but this should be worth it in the end.
Cool, right? With that out of way, let start thinking on variables and constants. We can make constants by simply defining a word, like : BL 32 ;
, which just pushes the value of space (0x20) into stack. But here shines the meta programming characteristics of Forth: We can define a word that defines this for us:
: CONSTANT : POSTPONE LITERAL POSTPONE ; ;
0 CONSTANT FALSE
FALSE INVERT CONSTANT TRUE
32 CONSTANT BL
See, we defined CONSTANT, which pops from stack and assign the value to the given name. I used it to defined the constants FALSE
, TRUE
and BL
, and then even used the value of FALSE to define TRUE! But of course we could use some variables too, but now we need someplace to store them. So first we create the following to manage dataSpace storage:
// 1KiB of RAM!
const dataSpace = new DataView(new ArrayBuffer(1024))
const herePtr = 8
dataSpace.setInt32(herePtr, 12)
push(herePtr)
( a-addr -- x ) \ gets the cell value stores at a-addr
CODE @
() => push(dataSpace.getInt32(pop()))
END-CODE
( x a-addr -- ) \ set the value of a-addr to x
CODE !
() => dataSpace.setInt32(pop(), pop())
END-CODE
\ Creating a variable manually. The pos of HERE is on Stack
CONSTANT HERE
: ALLOT ( n -- ) \ allocates (or de-allocates if n < 0) n bytes
HERE @ + HERE ! ;
: CELL+ ( a-addr1 -- a-addr2 ) \ a-addr2 is a-addr1 plus one cell
4 + ;
: CELLS ( n1 -- n2 ) \ n2 os the address in bytes for n1 cells
DUP + DUP + ;
( c-addr -- x ) \ Like @ but only gets a UTF16 Char code
CODE C@
() => dataStack.push(dataSpace.getUint16(pop()))
END-CODE
( x c-addr -- ) \ Like ! but only sets a UTF16 Char code
CODE C!
() => dataSpace.setUint16(pop(), pop())
END-CODE
: CHAR+ ( c-addr1 -- c-addr2 ) \ c-addr2 is c-addr1 plus 1 UTF16 char
2 + ;
: CHARS ( n1 -- n2 ) \ n2 os the address in bytes for n1 UTF16 chars
DUP + ;
CODE B@
() => dataStack.push(dataSpace.getUint8(pop()))
END-CODE
In summary, HERE returns the pointer to the next available byte. @ dereferences a pointer, ! sets the value pointed by a pointer. ALLOTS allocates and de-allocates space.
I organized the code into multiple lines, but the interpreter does not handle this well, so I put the sources in their respective .4th
files and use the following code to wrap then into javascript:
const fs = require('fs')
const funcNames = []
for (const source of process.argv.slice(2)) {
const lastSlash = source.lastIndexOf('/')
const lastDot = source.lastIndexOf('.')
const name = source.slice(lastSlash > 0 ? lastSlash : 0,
lastDot > lastSlash ? lastDot : source.length)
const funcName = 'load'
+ name[0].toUpperCase()
+ name.slice(1).replace(/-(\w)/g, (val) => val[1].toUpperCase())
funcNames.push(funcName)
console.log(`function ${funcName}(evaluate) {`)
const code = fs.readFileSync(source, 'utf8')
.trim()
.replace(/([\\'"`])/g, '\\$1')
.replace(/\n/g, '\\n\'+\n \'')
console.log(` return evaluate(\n \'${code}\'\n )`)
console.log('}')
}
console.log('if (typeof module !== \'undefined\') {')
for (const funcName of funcNames) {
console.log(` module.exports.${funcName} = ${funcName}`)
}
console.log('}')
With this I can
$ node wrapSource.js *.4th > core.js
And the I just need to make sure core.js is properly imported and the load* functions are called in order.
Back on track, we now can define the words to manipulate variables. Now on variables:
: CREATE ( "<spaces>name" -- ) \ Create pointer to contiguous data
HERE @ CONSTANT ;
: VARIABLE ( "<spaces>name" -- ) \ Create pointer and allocate 1 cell
CREATE 1 CELLS ALLOT ;
: , ( x -- ) \ Allocates an extra cell and stores x there
HERE @ ! 1 CELLS ALLOT ;
: C, ( x -- ) \ Allocates an extra cell and stores x there
HERE @ C! 1 CHARS ALLOT ;
These words, specially ALLOT and the comma words, also allow accessing multiple addresses like an array.
The previously created CODE
is an optional word, from the programming toolset. It allows defining code for the underlining platform, generally assembly, but can be anything the implementation want it to be. The .S
I implemented on previous article was also from this toolset. These are useful for tools for programming and debugging, so I decide to fill the remaining words of the basic part:
( -- ) \ Show all WORDS currently defined
CODE WORDS
() => output(dictionary.dump()
.map(d => `${d.code}\t${d.name}\n`).join(''))
END-CODE
: ? ( a-addr -- ) \ Print the value stored at this a-address
@ . ;
( "<spaces>name" -- ) \ Print the given word in a readable format
CODE SEE
() => {
const name = inputBuffer.parseName()
const d = dictionary.find(name)
if (!d) {
throw new Error(`Can't find word named ${name}`)
}
const postfix = d.immediate ? 'IMMEDIATE ' : ''
const action = d.action
const body = action.thread
? `() => executor([${action.thread.join(', ')}])`
: action.toString().trim()
output(`CODE ${d.name}\n ${body}\nEND-CODE ${postfix}`)
}
END-CODE
These complete the basic set of of the optional Programming tools, by the way. .S
was already here, so no need to go back to it. WORDS just list all the words we have, ?
gets the value at given address and SEE lets us see the word definition (this last is not as friendly as I wanted)
But we still lack basic control-flow necessary to get the remaining DUMP. Let’s fix this by first defining those low level control-flow:
: BRANCH ( n -- ) \ Branch by offset n (relative to next)
R> + >R ;
: RECURSE ( R: -- th ) \ Call the current word being executed
R> R@ SWAP >R >R 0 >R ;
: EXIT ( R: th -- ) \ Return early from a word
R> R> DROP DROP ;
( x -- TRUE | FALSE ) \ Returns whether the top is equal to zero
CODE 0=
() => push(pop() ? 0 : -1)
END-CODE
: 0BRANCH ( n x -- ) \ Branch by offset n, when x is 0
0= NEGATE DUP + BRANCH DROP EXIT R> + >R ;
( -- th-addr ) \ Gets address reserved to compile next instruction
CODE PC@
() => push(compiler.buffer.length)
END-CODE
: LIT@ ( C: -- orig ) \ Compiles a literal with value to be defined
0 POSTPONE LITERAL PC@ ;
( C: x orig -- ) \ Defines a pending literal created by LIT@
CODE LIT!
() => { compiler.buffer[pop() - 1] = pop() }
END-CODE
RECURSE
and EXIT
are actually pretty useful as they are, RECURSE calling recursively the current word and EXIT being equivalent of return
in other programming languages. 0=
is also pretty simple too, it just compare to zero and pushes TRUE or FALSE. BRANCH
and 0BRANCH
, however, are pretty much low level, but allow us to build the higher level control flow, as such:
: IF ( C: -- orig ) ( x -- ) \ Jumps to next unpaired THEN or ELSE
\ when the x is 0
LIT@ POSTPONE SWAP POSTPONE 0BRANCH PC@ ; IMMEDIATE
: THEN ( C: orig -- ) \ Closes the last unpaired IF or ELSE
NEGATE PC@ + SWAP LIT! ; IMMEDIATE
: ELSE ( C: orig1 -- orig2 ) \ Closes the last unpaired IF and if its
\ x was not 0 jumps to next unpaired THEN
LIT@ POSTPONE BRANCH >R POSTPONE THEN R> PC@ ; IMMEDIATE
: BEGIN ( C: -- dest ) \ Starts a unbounded loop
PC@ ; IMMEDIATE
: UNTIL ( C: dest -- ) ( x -- ) \if x is zero, jumps to dest
PC@ - 4 - POSTPONE LITERAL POSTPONE SWAP POSTPONE 0BRANCH
; IMMEDIATE
: AGAIN ( C: dest -- ) ( x -- ) \ jumps to dest
PC@ - 3 - POSTPONE LITERAL POSTPONE BRANCH ; IMMEDIATE
: WHILE ( C: dest -- orig dest ) ( x -- ) \ jumps to next unpaired
\ REPEAT if x is zero
POSTPONE IF ROT ; IMMEDIATE
: REPEAT ( C: orig dest -- ) \ when last unpaired WHILE's x was not
\ zero, jump to dest
POSTPONE AGAIN POSTPONE THEN ; IMMEDIATE
: ?DUP ( n -- n n | 0 ) \ Duplicate only if it is not zero
DUP IF DUP THEN ;
: DUMP ( addr u -- ) \ Print all values within range
BEGIN ?DUP WHILE 1 - >R DUP B@ . 1 + R> REPEAT DROP ;
There is a few more control flows, but this is good enough for me right now, so I leave as an exercise for the reader :stuck_out_tongue:
Finally, lets deal with strings. I first create a few functions to allow moving string from and to our memory. I also will reserve 80 chars for temporary storing inputs. So finally we can implement SKIP, PARSE and PARSE-NAME. With that we can create several basic words below.
function storeStr(pos, str) {
for (const ch of str) {
dataSpace.setUint16(pos, ch.charCodeAt())
pos += 2
}
return str.length
}
function pushStr(pos, str) {
storeStr(pos, str)
push(pos)
push(str.length)
}
function loadStr(pos, len) {
const arr = new Array(len)
for (let i = 0; i < len; i++) {
arr[i] = String.fromCharCode(dataSpace
.getUint16(pos + i * 2))
}
return arr.join('')
}
function popStr() {
const sz = pop()
return loadStr(pop(), sz)
}
const scratchPtr = dataSpace.getInt32(herePtr)
dataSpace.setInt32(herePtr, scratchPtr + 160)
( n "nnnn*" -- ) \ Skip zero or more consecutive chars with value n
CODE SKIP
() => inputBuffer.skip(String.fromCharCode(pop()))
END-CODE
( n "[^n]*n" -- c-addr u ) \ Parse 0 or more characters until the next
\ char n is found or the input is exausted. Return the address and
\ size of parsed string. The delimiter is excluded, but not counted
CODE PARSE () => {
const str = inputBuffer.parse(String.fromCharCode(pop()))
pushStr(scratchPtr, str.slice(0, 80))
} END-CODE
( "<spaces>name" -- c-addr u ) \ Skip consecutive spaces, parse a
\ valid name and return its address and count
CODE PARSE-NAME () => {
const str = inputBuffer.parseName()
pushStr(scratchPtr, str.slice(0, 80))
} END-CODE
: 2DUP ( n1 n2 -- n1 n2 n1 n2 ) \ Duplicate the 2 elements on top
OVER OVER ;
: MOVE ( addr1 addr2 u -- ) \ Copy u bytes from addr1 to addr2
BEGIN ?DUP WHILE >R OVER @ OVER ! >R 1 + R> 1 + R> 1 - REPEAT
DROP DROP ;
: S" ( C: ".*<quote>" ) ( -- c-addr u ) \ Save content up to next
\ quote on compilation time and push it addres and chars count on
\ run time
[CHAR] " PARSE >R HERE @ SWAP OVER R@ CHARS DUP ALLOT
MOVE POSTPONE LITERAL R> POSTPONE LITERAL ; IMMEDIATE
: TYPE ( c-addr u -- ) \ Print u chars starting at c-addr
BEGIN ?DUP WHILE >R DUP C@ EMIT CHAR+ R> 1 - REPEAT DROP ;
: ." ( C: ".*<quote>" -- ) \ Save content up to next quote on compile
\ time and prints it on runtime
POSTPONE S" POSTPONE TYPE ; IMMEDIATE
: .( ( ".*<c-paren>" -- ) \ Save content up to next closing bracket
\ and print immediatelly
[CHAR] ) PARSE TYPE ; IMMEDIATE
Finally, to add the final touch to this, let’s have a few more meta functions and implement an interpreter itself using forth.
( "<spaces>name" -- xt ) \ Pushes the execution token for name
CODE '
() => push(addressOf(inputBuffer.parseName()))
END-CODE
: ['] ( C: "<spaces>name" -- ) ( -- xt ) \ On compilation, parses the
\ next name and on runtime pushes its execution token
' POSTPONE LITERAL ; IMMEDIATE
( c-addr u -- xt n | 0 ) \ Looks for string and return its xt and 1
\ if immediate or -1 if not. If not found reuturn just 0
CODE FIND-S () => {
const sz = pop()
const addr = pop()
const d = dictionary.find(loadStr(addr, sz))
if (d) {
push(d.code)
push(d.immediate ? 1 : -1)
} else {
push(addr)
push(sz)
push(0)
}
} END-CODE
( x*i xt -- x*j ) \ Pops the given execution token and executes it
CODE EXECUTE
() => codeSpace[pop()]()
END-CODE
( -- flag ) \ Pushes -1 if compiling, 0 if executing
CODE IS-COMPILING
() => push(compiling ? -1 : 0)
END-CODE
( n -- flag ) \ Return -1 if n less than 0, zero if equals or larger
CODE 0<
() => push(pop() < 0 ? -1 : 0)
END-CODE
: 0> ( n -- flag ) \ Return -1 if n larger than 0, zero if equals or
\ less
NEGATE 0< ;
: 0<> ( n -- flag ) \ Returns -1 if n is not equals to 0
0= INVERT ;
: =
- 0= ;
: <
- 0< ;
: >
- 0> ;
: <>
= INVERT ;
( x1 x2 -- x3 ) \ Return the binary 'or' between x1 and x2
CODE OR
() => push(pop() | pop())
END-CODE
( x1 x2 -- x3 ) \ Return the binary 'and' between x1 and x2
CODE AND
() => push(pop() & pop())
END-CODE
: WITHIN ( test low high -- flag )
2DUP < IF
>R OVER > INVERT SWAP R> < AND
ELSE
>R OVER > INVERT SWAP R> < OR
THEN
;
( x1 x2 -- x3 ) \ multiply
CODE *
() => push(pop() * pop())
END-CODE
: >NUMBER ( prev c-addr1 u1 -- curr c-addr2 n2 )
ROT >R -1
BEGIN OVER 0<> AND WHILE
OVER C@ DUP [CHAR] 0 [CHAR] 9 1 + WITHIN IF
[CHAR] 0 - R> 10 * + >R 1 - SWAP 1 CHARS + SWAP
-1
ELSE DROP 0 THEN
REPEAT
R> ROT ROT
;
VARIABLE ABORTED
0 ABORTED !
: ABORT ( -- ) \
1 ABORTED !
;
: ?ABORT ( n -- )
IF ." ?" ABORT THEN
;
: >NUMBER-WORD ( c-addr u -- n )
OVER C@ DUP [CHAR] - = IF
DROP 1 - SWAP
CHAR+ SWAP
0 ROT ROT
>NUMBER
?ABORT
DROP
NEGATE
ELSE
[CHAR] + = IF
1 - SWAP
CHAR+ SWAP
THEN
0 ROT ROT
>NUMBER
?ABORT
DROP
THEN
;
VARIABLE STATE
0 STATE !
: INTERPRET ( x*i -- y*j )
BEGIN PARSE-NAME ?DUP ABORTED @ 0= AND WHILE
FIND-S ?DUP IF
-1 = STATE @ AND IF
COMPILE,
ELSE
EXECUTE
THEN
IS-COMPILING STATE !
ELSE
>NUMBER-WORD
STATE @ IF POSTPONE LITERAL THEN
THEN
REPEAT
DROP
ABORTED @ IF ." Err" 0 ABORTED ! DROP ELSE ." Ok" THEN CR
;
And voilà, we have minimal weird forth system! As you can see on the sources here, we even use the INTERPRET as it at least correctly flag invalid words as error. You can also see the live demo here and inspect the source.
This was a little tiresome, but I still want to make some adjustments. First, I want to move the stacks an the user defined words to dataSpace, and other stuff that I now see a flawed, but this really is too big already, so Bye for now.