Jun-16 12:09 PM
As mentioned before, the core 'goto' implementation is in jump_linenum(), and works by restarting parsing of the entire program from the top until the desired line is found. Here it is in compacted form:
static void jump_linenum(int linenum)
{
    tokenizer_init(program_ptr);
    while(tokenizer_num() != linenum)
    {
        do
        {
            //... eat tokens; emit an error if we reach end-of-stream
        }
        while(tokenizer_token() != TOKENIZER_NUMBER);
    }
}
I did not want to do a major re-engineering of ubasic, so I decided to keep the rest of the system as-is and rework only this method.
The strategy I used was to build a 'cache' of line-number-to-program-text-offset mappings. When jump_linenum() is invoked, the cache is consulted to find the offset into the program text, and the parser is set up there. Because tokenizer_init() is a cheap call, I used it to do that setup by simply pretending that the program starts at the target line we found, rather than fiddling with the internals of the parser state. A hack, but a convenient one; ultimately this exercise was to prove that improving jump_linenum() would have a material impact, and to measure how much.
For simplicity, my 'cache' was just an array of structs, my 'lookup' was a linear search, and I allocated a fixed block of structs. In a production system this could be improved in all sorts of ways, of course, but here I just wanted to see whether the approach was worthwhile at all.
The struct I used was simple, and I added a 'setup' method, called at program initialization time, to populate the cache:
//HHH a line number lookup cache; this is just proof-of-concept junk
typedef struct
{
    uint16_t _nLineno;    //the line number
    uint16_t _nIdx;       //offset into program text
} LNLC_ENT;

LNLC_ENT g_nLines[1000];    //mm-mm-mm. just takin' memory like it's free, and limiting the program to 1000 lines
int g_nLineCount = -1;

#define COUNTOF(arr) (sizeof(arr)/sizeof((arr)[0]))

void __setupGotoCache(void)
{
    tokenizer_init ( program_ptr );
    g_nLineCount = 0;    //the beginning is a very delicate time
    while ( tokenizer_token () != TOKENIZER_ENDOFINPUT )
    {
        int nLineNo;
        const char* pTok;
        const char* __getAt (void);    //(added to tokenizer.c)
        //we must be on a line number
        if ( tokenizer_token () != TOKENIZER_NUMBER )
        {
            g_nLineCount = -1;    //purge 'goto' cache
            sprintf ( err_msg, "Unspeakable horror\n" );
            stdio_write ( err_msg );
            tokenizer_error_print ();
            longjmp ( jbuf, 1 );
        }
        //remember where we are
        nLineNo = tokenizer_num ();
        pTok = __getAt ();
        //create an entry
        g_nLines[g_nLineCount]._nLineno = nLineNo;
        g_nLines[g_nLineCount]._nIdx = pTok - program_ptr;
        //whizz on past this line
        while ( tokenizer_token () != TOKENIZER_ENDOFINPUT &&
                tokenizer_token () != TOKENIZER_CR )
        {
            tokenizer_next ();
        }
        if ( TOKENIZER_CR == tokenizer_token () )
        {
            tokenizer_next ();
        }
        //next entry
        ++g_nLineCount;
        //too many lines?
        if ( g_nLineCount >= COUNTOF ( g_nLines ) )
        {
            g_nLineCount = -1;    //purge 'goto' cache
            sprintf ( err_msg, "Program too long to cache!\n" );
            stdio_write ( err_msg );
            tokenizer_error_print ();
            longjmp ( jbuf, 1 );
        }
    }
    //did we get anything? if not, purge 'goto' cache
    if ( 0 == g_nLineCount )    //pretty boring program!
        g_nLineCount = -1;      //purge 'goto' cache
    //be kind, rewind
    tokenizer_init ( program_ptr );
}
So, we simply whizz through the program, tossing away tokens (much like the original jump_linenum() implementation), and take note of the text offset each time we reach a start-of-line. But... we do this only once, instead of every time there is a transfer of control.
The jump_linenum() implementation was altered thusly:
/*static*/ void jump_linenum(int linenum)
{
    //alternative implementation uses line number lookup cache
    int nIdx;
    for ( nIdx = 0; nIdx < g_nLineCount; ++nIdx )
    {
        //binary compare is better than
        if ( g_nLines[nIdx]._nLineno == linenum )
        {
            //our hacky 'goto' implementation is to re-init the tokenizer
            //at the start of the line we found. This works, and saves a
            //lot of haquery in the tokenizer module to reposition within
            //the existing program image.
            tokenizer_init ( &program_ptr[g_nLines[nIdx]._nIdx] );
            break;
        }
    }
    if ( nIdx == g_nLineCount )
    {
        sprintf ( err_msg, "Unreachable line %d called from %d\n", linenum, last_linenum );
        stdio_write ( err_msg );
        tokenizer_error_print ();
        longjmp ( jbuf, 1 );    //jumps back to where setjmp was called - making setjmp now return 1
    }
}
Ta-da! Some closing observations:
- the fixed allocation of the lookup structure is probably sub-optimal. I arbitrarily chose 1000 entries, which is wasteful and limiting. Some memory management is probably worthwhile.
- it would be interesting to compare this against a trivial implementation that keeps no lookup cache, but instead skips each unwanted line without tokenizing it, perhaps using something like strchr() to find each end-of-line.
- a lookup structure other than a linearly searched list might be worthwhile.
- ...?
But this should be good enough for now. On to quantifying the impact of this stuff...