Truth In Numbers

Jun-16 12:09 PM

With the two new mechanisms available, it was time to test. Since the motivation for the activity was to get a feel for the overall improvement, I decided to test a little differently. This time I did not use the profiler, but rather put a simple stopwatch around the execution of the program. The test program was written to run forever, so I altered it to run for just 10 generations and then exit. Since I was running this on Windows, I used the QueryPerformanceCounter() API to make my stopwatch:

int main(int argc, char* argv[])
{
    LARGE_INTEGER liFreq;    //... so chic
    LARGE_INTEGER liCtrStart, liCtrEnd;
    QueryPerformanceFrequency ( &liFreq );

    QueryPerformanceCounter ( &liCtrStart );
    //setup to execute program.

    ubasic_init ( _achProgram, false );
    //run the program until error 'thrown'
    do {
        if ( !setjmp ( jbuf ) )
            ubasic_run();
        else
        {
            printf ( "\nBASIC error\n" );
            break;
        }
    } while ( ! ubasic_finished() );

    QueryPerformanceCounter ( &liCtrEnd );
    double dur = (double)(liCtrEnd.QuadPart - liCtrStart.QuadPart) / (double)liFreq.QuadPart;
    printf ( "duration:  %f sec\n", dur );

    return 0;
}

This should give the 'bottom line' numbers on how these two improvements worked out. Plus, I wanted to see it with my own eyes!
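For anyone trying this off Windows, here is a minimal sketch of a comparable stopwatch using the C11 timespec_get() call. The helper names (elapsed_seconds, time_call, report_duration) are made up for illustration and are not part of the project. One assumption to keep in mind: TIME_UTC is wall-clock time, so unlike QueryPerformanceCounter() it is not guaranteed to be monotonic.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: seconds between two timestamps.
   Subtracting field by field avoids losing precision in a double. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical helper: run fn() once and return how long it took.
   Uses wall-clock time (TIME_UTC), which is an assumption of this sketch. */
double time_call(void (*fn)(void))
{
    struct timespec start, end;
    timespec_get(&start, TIME_UTC);
    fn();
    timespec_get(&end, TIME_UTC);
    return elapsed_seconds(start, end);
}

/* Print in the same format as the stopwatch in main() above. */
void report_duration(double dur)
{
    printf("duration:  %f sec\n", dur);
}
```

Wrapping the whole interpreter run in one function and passing it to time_call() would give the same kind of 'bottom line' number as the Windows version.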

Also, I wanted to subtract off the contribution of screen I/O. Screen I/O on the Windows console is quite slow, and letting it contribute to the numbers would be meaningless for the target platform, which has a completely different I/O system (SPI LCD? I don't know). Fortunately this was easy, since the I/O from the BASIC is character oriented: I simply put a conditional compile around the actual output routines:

//this lets us suppress screen output, which skews timing numbers
//towards console IO away from our program's computation time
//#define NO_OUTPUT 1

//write null-terminated string to standard output
uint8_t stdio_write ( int8_t * data )
{
#if NO_OUTPUT
    return 0;
#else
    return (uint8_t) puts ( (const char*)data );
#endif
}

//write one character to standard output
uint8_t stdio_c ( uint8_t data )
{
#if NO_OUTPUT
    return data;
#else
    return (uint8_t)_putch ( data );
#endif
}

Testing time! I created 4 test scenarios:

1. no modifications (other than what was required to run on the desktop)
2. improved tokenizer only
3. improved goto logic only
4. improved tokenizer and goto logic

I also ran each scenario in two flavors: with screen output suppressed, and with screen output included.

I ran each configuration a few times and used the 'best of three' timing for each. I used 'best of three' because wall-clock timing this way can be affected by all sorts of other stuff coincidentally going on in the operating system at the time, so I wanted to have a little insulation from that.
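The 'best of three' selection can be sketched as a small helper that keeps the smallest of the measured durations. The reasoning is that interference from the operating system can only add time to a run, so the minimum is the sample least disturbed by it. The name best_of is hypothetical and not part of the project.

```c
#include <stddef.h>

/* Hypothetical helper: return the smallest of n measured durations.
   Returns -1.0 when there are no samples to choose from. */
double best_of(const double *durations, size_t n)
{
    if (n == 0)
        return -1.0;

    double best = durations[0];
    for (size_t i = 1; i < n; i++) {
        if (durations[i] < best)
            best = durations[i];    /* keep the least-disturbed run */
    }
    return best;
}
```

In use, the three stopwatch readings for one configuration would go into an array and the result of best_of() would be the number reported for that configuration.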

With screen output suppressed:

configuration                             duration (sec)   relative speed
baseline implementation                   5.244856         1
improved tokenizer                        0.450086         11.653
improved goto logic                       1.494159         3.510239
both improved tokenizer and goto logic    0.130637         40.14831
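The 'relative speed' figures are simply the baseline duration divided by each configuration's duration. A minimal sketch of that arithmetic, using the suppressed-output durations listed above (the function names are hypothetical):

```c
#include <stdio.h>

/* relative speed = baseline duration / measured duration */
double relative_speed(double baseline_sec, double duration_sec)
{
    return baseline_sec / duration_sec;
}

/* Recompute the relative speeds from the suppressed-output durations. */
void print_relative_speeds(void)
{
    const char *name[4] = {
        "baseline implementation",
        "improved tokenizer",
        "improved goto logic",
        "both improved tokenizer and goto logic"
    };
    const double duration[4] = { 5.244856, 0.450086, 1.494159, 0.130637 };

    for (int i = 0; i < 4; i++) {
        printf("%-40s %f\n", name[i],
               relative_speed(duration[0], duration[i]));
    }
}
```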

And to observe how much the screen I/O contributes to runtime in my test program:

configuration                             duration (sec)   relative speed
baseline implementation                   7.029072         1
improved tokenizer                        1.969379         3.569181
improved goto logic                       3.020004         2.327504
both improved tokenizer and goto logic    1.592177         4.414755

So there it was! It turned out my original intuition was right: improving the tokenizer did indeed yield the biggest improvement, and my later hunch that I should focus on the goto logic instead was wrong. But both improvements are worthwhile, and if you use them together, even in this simple way, you can expect something like a 40x improvement in program speed on a program like this one. 40x. I'll take it!

Also, running with the screen I/O in, you can see that it is indeed rather slow, and swamps out a lot of the BASIC improvement gains. In this case, the screen I/O added a roughly fixed cost of about 1.5 sec to the total execution time (the difference ranges from 1.46 to 1.78 sec across the four configurations). These particular numbers don't mean much for the target platform, but it's probably worth keeping in mind.
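The screen I/O cost can be estimated by subtracting each suppressed-output duration from the matching with-output duration. The sketch below does that, and also shows how adding the same fixed cost to both runs shrinks the apparent speedup. The function names are hypothetical, and treating the cost as exactly fixed is an assumption that the measurements only approximately support.

```c
#include <stdio.h>

/* Extra time attributable to screen output for one configuration. */
double io_overhead(double with_output_sec, double suppressed_sec)
{
    return with_output_sec - suppressed_sec;
}

/* Speedup observed when the same fixed cost is added to both runs.
   Assumes the cost is identical for both, which is only roughly true. */
double speedup_with_fixed_cost(double baseline_sec, double improved_sec,
                               double fixed_sec)
{
    return (baseline_sec + fixed_sec) / (improved_sec + fixed_sec);
}

/* Print the screen I/O overhead for the four measured configurations. */
void print_io_overheads(void)
{
    const char *name[4] = {
        "baseline implementation",
        "improved tokenizer",
        "improved goto logic",
        "both improved tokenizer and goto logic"
    };
    const double suppressed[4]  = { 5.244856, 0.450086, 1.494159, 0.130637 };
    const double with_output[4] = { 7.029072, 1.969379, 3.020004, 1.592177 };

    for (int i = 0; i < 4; i++) {
        printf("%-40s %f sec\n", name[i],
               io_overhead(with_output[i], suppressed[i]));
    }
}
```

With a fixed cost of zero, speedup_with_fixed_cost() reduces to the plain ratio of the two durations; any positive fixed cost pulls the result toward 1, which is the 'swamping' effect described above.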

I wrote up the final results and sent them off to [my correspondent on the project]. It was suggested that I write up the process as a project here, since someone might find it interesting; hence this blog.

Project complete!
