Update September 2016 :
I've added a token table which describes all the tokens that dflat understands - basically this is the list of statements, commands, functions and operators.
Core Features
The core features of dflat are:
- Integer only maths - just the 4 basic operators, plus comparison, shift and logical OR and AND
- Line numbers only used by the editor to store program lines in a sequence - no dflat commands directly refer to the line numbers.
- User defined procedures with parameters which can be made local. This is heavily inspired by the way BBC Basic did things.
- Simplified variable naming: Every variable has a prefixed type qualifier (e.g. %a is an integer variable). The opportunity with this is to perform easier syntax checking and runtime execution, which it does - sort of.
- Conditional and looping constructs : IF..THEN..ELSE..ENDIF, REPEAT..UNTIL, WHILE..WEND
- Array handling of scalar (integer variables) and strings. Integer variables can be 2 dimensional, strings can only have one dimension in addition to the length dimension that every string must have (this approach is inspired by Atari Basic)
- Execution can start at any procedure invoked from the command line - my own convention is to use "_start()"
Keywords
I thought perhaps I should just dump the keywords, functions and operators that dflat supports. Anyone with a memory of BBC, Atari or Oric BASIC will know most of these commands.
Current list of supported symbols:
- print, println : prints a sequence to output device, with or without new line
- def, enddef : define a procedure start and end
- dim : dimension one or more variables (up to 2 dimensions)
- repeat, until : repeat a section of code until a condition is true
- for x,y,z, next : initiate an iterative loop with starting value x, end value y and step size z. Works like a BASIC for loop but the variable cannot be specified in the next keyword
- if, else, elseif, endif : execute a section of code if a condition is true, else execute another section, else execution another section if another condition is true. Control passes to the next statement after endif.
- list : list the program
- load 'x', "file" : load a dflat program named "file" from device 'x', where x==0 means the serial port, and x==1 means the SD card
- save 'x', "file" : save a dflat program names "file" to device 'x', where x is as above
- input : take a stream of CR terminated characters and put in a destination variable (e.g. input $a)
- mode x : change screen to mode x, where x=0 means 40 column text mode and x=1 means 32 column colour mode
- plot x,y,c : at coordinates x,y plot character c. If c is an integer then plots the byte value, else if c is a string then plots the string from x,y
- cursor x : Hide the screen cursor (1) or unhide it (0)
- cls : clear the screen
- poke a,x, doke a,x : single or double byte write of value x to memory location a
- peek(a), deek(a) : single or double byte read from memory location a
- stick(x) : read joystick, and with mask x (to allow only a certain signal to be tested for). The joystick signals are active 0 as follows : bit 7 = fire, bit 6 = left, bit 5 = right, bit 4 = up, bit 3 = down.
- key(x) : read a key from the input stream synchronously (x=1) or asynchronously (x=0)
- setvdp reg, val: Set VDP register reg to value val (low control of the VDP is possible with this command)
- sprite sp, x, y, ch, col : set sprite number sp to position x,y using character pattern ch and colour col
- spritepos sp, x, y : set sprite number sp to position x,y
- sound ch,per,vol : set tone channel ch to period per and volume vol. If vol == 0 then channel will use envelope to modulate volume. If ch == 0 then set the period of the noise channel (vol is ignored)
- play te, ne, env, per: set up the tone and envelope configuration. te = 3 bit number for tone enable, ne = 3 bit number for noise mix with each tone channel, env = 4 bit envelope setting, per = 16 bit period.
- music ch, oct, note, vol: play music on channel ch, with note (0-11) and octave oct (0-6) and volume vol (0-15)
- left($a,%x), right($a,%x), mid($a,%x,%y),len($a) : standard string routines to get left, right, mid and length of a string
Some of you may recognise the sound commands - they are based on the approach used by Oric BASIC. This is both for nostalgia reasons and because both my computer and the Oric use the same sound chip.
I have a few unresolved bugs / limitations which I need to figure out and do something about:
- It's quite easy to crash dflat by mismatching variable types, so need more robust handling at the cost of a bit of speed
- String expressions are very limited. The best way to work on strings is to steadily build it up over a series of statements as it could go horribly wrong when trying to do it all at once!
- Numeric operator precedence is a bit bespoke! Every operator has a precedence e.g. + is higher than -, * is higher than /. So there is no such thing as operators with equal precedence. I will need to add a precedence table to fix this at some point - previously I thought I had made something that is logical but much simpler to implement, but I should have realised that Djikstra would have come up with this already if it was possible!
I now have the following features:
- Expression handler now does standard order of precedence. I thought I would need to use the shunting yard algorithm but when I started looking at it I realised it was much simpler (see below for more information).
- Basic data types of string and integer, with array capabilities for both.
- Flow of execution constructs: procedures, repeat..until loop, for..next loop, if-else-endif conditional execution.
- Input and output: input, print, println
- Basic graphics commands: mode, plot, cls
- Memory commands: Poke, Peek, Doke, Deek
Benchmarks
dflat has enough capability to try out benchmarks that test the performance of the language. I have used PCW (a UK monthly computer magazine) BASIC benchmarks from the early-mid 80s as this represents the closest era that my homebrew is equivalent to. The most complex benchmark tests the trig functions, but as my language is integer only (for now), the next most complex benchmark (number 7) is the one I have used.
I have tried to reproduce the original benchmark using dflat as faithfully as reasonable without trying to optimise further. Here is original and dflat code:
PCW Benchmark from early 1980s: 20 LET k=0 25 DIM m(5) 30 LET k=k+1 40 LET a=k/2*3+4-5 45 GOSUB 700 46 FOR l=1 TO 5 47 LET m(l)=a 48 NEXT l 50 IF k<1000 THEN GOTO 30 700 RETURN dflat equivalent from 2016! 100 def_bm7() 110 println "Start" 200 %k=0 210 dim %m[5] 220 repeat 230 %k=%k+1 240 %a=((%k/2)*3)+4-5 245 _sub() 260 for %l=1,5,1 270 %m[%l]=%a 290 next 300 until %k==1000 305 println "Stop" 310 enddef 320 def_sub() 330 enddef
I got the benchmarks from the following site (* don't click on the nonsense adverts *):
http://www.geocities.ws/peterochocki/computers/pcwbm.html
Near the bottom of this page is a very interesting table of benchmarks for various old to new(ish) computers. I mentioned before that my expectation was that my restricted integer-only language should be quite a bit faster than most 1980s 6502 and Z80 based systems, save for the BBC Micro which is known to have a very speedy implementation.
I ran the benchmark a few times, and the result is....drum roll.... 7.5 seconds. I am very happy with this, it easily beats all the 6502 and Z80 machines of that era - including the BBC. Typical timings are ZX Spectrum 77.6s, Commodore 64 47.5, Atari 800XL 60.1s, ZX80 (integer only) 39.2s, BBC B 21.9s.
However, I need to acknowledge that aside from the simplicity (and therefore speed) of only working with integers, my system is also clocked at 2.68Mhz, which is significantly faster than all the other machines including the BBC. However, even adjusting for clock speed, dflat is significantly faster, and that is down to the major factor of unsigned integer arithmetic, plus extensive tokenisation and optimisation before and during runtime. For example:
- All variables are converted to look-ups as part of the syntax checking
- For and Repeat loop addressed are stored on the stack, so no inefficient line searching needed during iteration
- Once a procedure is invoked the first time, its address is cached so that further invocations are direct
I will now go on to some essential functions (save, load) followed by some useful graphics and sound functions.
The main challenge now is ROM space. Currently I have less than 1.5K remaining. This is starting to feel tight as I have a few more features to implement (e.g. more graphics and sound), but I haven't really space optimised the code so hopefully it will be enough. Plus I know that I can free up almost 1K of font data if I am desperate.
Introduction
This log is going to document my investigations and implementation of an interpreted language capable of running from ROM. Once doing this, my system will have the built in facility to write, debug and run programs without any external dependency - at the moment I have to use an assembler on the PC.
Initial Research
So, where to start in building a computer language? This has been done many times before of course, but the purpose of doing it for myself is the personal learning and hobby interest I have in knowing how stuff works. And of course the slightly egotistical pleasure of having made a language for my own requirements.
I have done a lot of background reading, relearning about lexers, parsers etc. I say relearning because this is taking back to my computer science undergraduate days - it's reassuring to find the concepts haven't changed dramatically.
The academic side of understanding how to design and build a programming language is good, but I have real world constraints - an 8 bit CPU with limited RAM and ROM!
I looked at Forth - a very compact language, fairly well suited for 6502. In fact I remember the 48k of the Oric-1 came with Forth on tape (alas, I had the 16k version). So if it can work on a 1Mhz 6502, I am sure it would be fine on my homebrew. But the reason for the compactness of Forth is the stack orientation, terseness and reverse polish notation which basically means that expression evaluation and parsing is made rather more straightforward.
So Forth is out, it doesn't feel like 'progress' enough for me! My first language was BASIC (I am of that age), so might be a natural choice. But I wanted to make an advance on 80s BASIC by having no line numbers - and therefore all program control is through structured constructs like procedures, functions and blocks.
The implication of no line numbers is that I would need a full-screen editor capability and this was going to take yet more ROM space (which elsewhere I have noted I am using up quite rapidly).
So I have decided that my language will use line numbers - but only as a way of inserting, editing and removing lines. The first execution statement will not be the lowest line number, but a 'main' type procedure. All flow of control will be through structured programming constructs - there will be no GOTO or GOSUB!
So I now need to develop the syntax. I'll post examples of the syntax as I progress - still thinking through the grammar.
Syntax
I spent a while trying to work out a syntax which was human readable but also easy for the machine to decode. Keeping it easy for a 6502 to decode will help me keep the code small and running fast.
Variables:
All variables must be pre-fixed with a type - one of $, %, ^ or ! e.g.
- %a : integer variable a
- ^a : byte / character variable a
- $a : string variable a
- !a : floating point variable a
I've decided not to support floating point right now - need to get something working first.
The program editor uses line numbers, but does not have control and flow keywords which refer to line numbers directly - the line numbers are purely to organise code in memory. Some examples below:
100 ; This is a comment 110 ; Next line is an example of a conditional. 115 ; ':' separates statements on same line 120 if %a > %b then:%a = %b:else:%b = %a:endif 130 ; _ is reserved to defined and invoke user functions 140 _myfunction(1, 2) 150 ; The next line defines a user function 160 def_myfunction(%a, %b) 162 ; make variables local scope explicitly else they are 163 ; globally accessible 165 local %a 170 return %a + %b 180 enddef 190 ; for, while and repeat all supported 200 repeat 210 %a = %a + 1 220 until %a > 10 230 while %a <> 0 240 %a = %a - 1 250 wend 275 ; for loop syntax simplified to specifying start, end, step 270 for %a = 1, 10, 1 280 println %a 290 next %a 295 ; strings have to be dimensioned before use, as do number arrays 300 dim $a[10], %a[10,10] 310 ; Note in the above that $a and %a are different variables
Will stop there for the moment - let me know what you think of the syntax. It has been inspired by various influences including:
- Atari 8 bit BASIC e.g. approach to strings and the pre-tokenisation of text
- BBC 8 bit BASIC e.g. user functions and local variables
- Oric-1 BASIC e.g. repeat..until loop (although other BASICs have this too)
- General BASICs e.g. for..next loops
Whether I get all the constructs working the way I want will depend on space.
Numeric expression tokenisation and runtime
This rather wordy subtitle is capturing some notes on progress. One of the more complex areas I have come across is expression parsing and execution. Let me give an abstract example of an expression using a simplified BNF type notation, restricted to numbers only:
<expr> = <term>[<op><term>]
<term> = digit[digit]
digit = 0..9
<op> = + | -
The above notation allows simple expressions like
- 1 + 2 - 3
Great, I can basically parse a syntax tree like this linearly. But what if a term needs to be more complex i.e.
<term> = digit[digit] | (<expr>)
So this allows me to make expressions like
- 1 + (2 - 3)
This is all still quite graspable I hope you will agree. Basically <expr> is recursive as it refers to itself (as part of the definition of <term>, and nicely implementable using a friendly language like C or Java, but how to do it nicely using a lowly 6502 with 3 registers!?!?
The main thing to worry about at assembly level is saving state when recursing, or trying to eliminate state altogether. Actually that's not possible - we have to know where we recursed from to be able to return to that point. But that part is looked after by the 6502 itself - making a subroutine call (JSR) automatically pushes the program counter on to the processor stack.
The interpreter takes a line of input and attempts to tokenise and syntax check before saving to program space (or executing if it is in immediate mode). I have an input pointer and output pointer which tell me where the next character to be processed from of the input is, and where the next tokenised character will go. I only advance these pointers when I am committed to taking them. Examples of where I may not be committed is checking the operator symbol table to match against multi-byte characters (the above simplified example misses operators like '<<' and '>>'.
So that means I can have an assembly routine called (say) tok_nexpr to tokenise a numeric expression. I only advance the input pointer when I am committed to consuming it, and when I see the '(', I call tok_nexpr again. On return I do a mandatory check for ')' as part of the syntax check.
So that looks pretty straightforward, once I had built all the scaffolding routines to deal with non-bracketed expressions, it was easy to extend to arbitrarily complex expressions to be tokenised.
But the runtime is more complex. Now we're not simply checking the input stream, but executing the tokenised output stream. An expression like 1 + (2 - 3) requires executing in the correct order - i.e. the brackets have to be evaluated before the rest of the expression.
I do this by maintaining an argument and operator stack when executing the numeric evaluation routine called (say) rt_neval. As the runtime is executing, any 'terms' it sees are pushed on the argument stack and any 'operators' are pushed on to the operator stack. When the routine first runs, it pushes a 'floor' value on the operator stack so it knows how much of this stack was used during this evaluation. When the routine has pushed all arguments and operators, it is ready to process. It does this simply by popping an operator, and then invoking the operator code (e.g. the '+' operator will invoke the addition routine). The operator code pops two values off the stack, does the appropriate calculation and pushes the result back on the stack. The next operators is then popped, and so on until the 'floor' value is found, when means this iteration of the routine has completed. At this point a result is on the argument stack, and the routine returns.
When a bracket is encountered, the rt_eval is invoked recursively, and when it eventually returns, a close bracket is expected (it will always be present as the tokeniser checked for this).
Having just written all this down sounds quite simple! Recursion in assembly can be a bit mind numbing - the trick is to not think too hard about the thread of execution, but concentrate on the logic and what needs to be saved and restored during recursive iteration. The beauty of recursion is that if it is got right, then the complexity that it can handle is a natural outflow of the algorithm (see picture for a couple of small examples).
So, what this all means is that I basically have a functional tokeniser and runtime evaluation routine for numbers. Strings will be easier as I will simply process string expressions left to right with no operator precedence. In fact I am feeling lazy at the moment so I may not implement it even for the numeric evaluation routine - the reason is that I can always guarantee an order of evaluation using brackets. That will avoid having to re-order arguments and operators using the well documented 'shunting yard' algorithm first published by Djikstra (if you haven't read about him, please do, he's a really key figure in computer science).
UPDATE: I have implemented operator precedence and it was rather easier than I thought it would be. When I actually started to look at this properly, I realised that I didn't need to use Dijkstra's algorithm at all. This is due to the fact that rather than pushing arguments and operators on to their stacks and then processing them all in stack order, I can simply do the following to implement operator precedence:
- Before pushing a new operator on to the operator stack, check what is at the top of the operator stack
- If the top of the operator stack is higher precedence than the new operator, then execute the top of stack operator, which will result in the operator on the stack being popped and processed.
- If the top of the operator stack is not higher precedence than the new operator, then push the new operator on to the stack as per normal.
I am now going to busy myself finalising variable handling (I need to be able to declare and access arrays), and then I can turn to iteration and conditional structures - at the moment the interpreter runs sequentially through each line, which doesn't make for being able to make very interesting programs!
The core dflat now has additional useful commands to help with games, namely:
- The ability to poke and peek both main ram and video ram
- The ability to set sprite attributes (position, character, colour)
- The ability to set up sounds on the AY-3-8910
- The ability to read the joystick
I'm sure you can guess where this is heading.. So I will be creating a simple game using graphics and sound to demonstrate both the hardware working and the speed of dflat.
I have built enough of the core language to try out benchmarks from the 1980s. A popular and credible magazine of the 1980s in the UK was Personal Computer World (PCW). They had a range of benchmarks to test review machines, so I have subjected my system to one of these as a trial. The results have impressed me - dflat is quite fast! See later in the blog for more information.
Discussions
Become a Hackaday.io Member
Create an account to leave a comment. Already have an account? Log In.
Yes, there must be a tension between creating a language interface which is very close to what the specific hardware does, and creating one which is general (portable) but can't make the most of any specific chip.
Are you sure? yes | no
Very interesting! Is it worth chatting to Peter Noyes and sharing notes on what commands are needed to support games programming, and what the interfaces should look like? Even if you can't agree, you might each learn from the other's ideas!
(See his Dodo post here: http://forum.6502.org/viewtopic.php?f=2&t=4257 )
Are you sure? yes | no
Hi BigEd. Yes I'm aware of Peter's excellent efforts, it's very good indeed. I will take a close look at Peter's software thoughts. Mine choices were influenced directly by the hardware I'm using (e.g. the TM9918 sprite capability and the AY-3-8910 sound controls) and the BASIC languages I used (mainly Oric-1 and Atari 800XL).
Are you sure? yes | no