
Now Megahal thinks that it knows how to code?

A project log for Teapot in The Garden

Imagine a teapot filled with an infinite number of fortune cookies. If it usually gave good advice, would you do as you are told?

glgorman, 07/29/2025 at 05:56

Here is an excerpt from the relevant transcript, which I have corrected for spelling and punctuation to improve readability.

User:    I am how I am, I mean I am who I am.  Who are you, and how are
User:    You? That seems so very basic.  Don't you agree that there is
User:    some utility in being polite, at least most of the time, unless
User:    You are some kind of musical who writes lyrics for punk, or
User:    death metal!
MegaHAL: Se, i.e., song lyrics, poetry, humor, etc.
User:    Yes, pretty much anything goes there, with respect to poetry or
User:    humor, but otherwise, if you could somehow keep the
User:    conversation on track that would be nice.  I am not trying to
User:    derail your train by  any means, I just want to know where this
User:    Whole thing is headed?
MegaHAL: Still, this gives me an idea - since what the upper window
MegaHAL: represents, is in the unit place value 9 bin would want to know
MegaHAL: where the sample rate for the results, so far?
User:    I think you totally lost it?  I would like to sample portraits
User:    of  George Washington to place value nine, in my collection. 
User:    That would be a nice view from any window.  Or should I say, it
User:    should be able to buy a nice view.
MegaHAL: Whether or not to if you think that a can have one for only
MegaHAL: $30,000.00.
User:    A nice view, on a cruise maybe, but I don't think that you are
User:    going to get a beach front condo for that price, these days.
MegaHAL: A project log for modelling neuronal spike codes.
User:    You keep talking about that.  When are we going to get back to
User:    working on that?
MegaHAL: You are on the port of the overall concept, defining what a
MegaHAL: context that requires Unicode awareness.
User:    Eventually, we will have Unicode awareness. I have to do it in
User:    any case, I think if I want to have Zodiac fonts.
MegaHAL: The bottom line on all of this contradicts any of the same
MegaHAL: concepts as apl, as if to implement lisp, mad slip, pascal, and
MegaHAL: c, and so on.
User:    You might be a one-liner in APL.  Do you know what that
User:    implies?
MegaHAL: [Implies, independent thinking.
User:    I am not sure what APL implies, more like unrestrained
User:    perplexity, or something  along those lines.
MegaHAL: Key_info("or",relop,orop.
User:    Or something else, you have in mind, maybe.  What do you think
User:    that  the existence of APL implies?
MegaHAL: Everything is cheese!  everything is mazes! Or else there is
MegaHAL: any concept to apl at all.
User:    So APL is all about Cheese and mazes?  I suppose that some
User:    problems in natural language, if we are trying to string random
User:    sentences together might be like getting a reward, i.e., cheese,
User:    for finding a path from point A to point B.  Sure, why not? That
User:    would imply something about chain of reasoning, I suppose
User:    that you are onto something there.  Yet can you make a solver
User:    come up with a proof of the problem at hand, which is to
User:    Calculate the position of the moon?
MegaHAL: Void add_node(tree *node, int position.
User:    I wish it were that simple.  I suppose that that would be a good
User:    idea, maybe keep track of celestial bodies, in some kind of
User:    network of tree structures.
MegaHAL: Static void free_tree(tree.
User:    Just how are we going to arrange information about celestial
User:    bodies in tree like structures?
MegaHAL: Operator node* () { return reinterpret_cast<tree*>(this.
User:    What are we going to be doing the reinterpret cast on, I know
User:    that that works really well in the memory allocator, but I
User:    think that the object factory needs more work, and you still
User:    don't know how to make use of it yet.
MegaHAL: Char *make_output.
User:    Yeah, that is the whole idea, make some output.  Like my
User:    fortune cookie example if you could somehow put an infinite
User:    number of fortune cookies into a virtual teapot and then
User:    calculate a new horoscope at least once a second.  Have you
User:    been thinking about how to go about actually coding that one
User:    yet, or maybe I will ask Eliza to do it!
MegaHAL: Writeln(output,"null source file.
User:    A null source file?  What exactly is that?  Do you mean
User:    something that  contains a bunch of meaningless data, like a
User:    genetic algorithm, or are you asking me to open an empty file
User:    and copy whatever it is you are suggesting into it. Trust me,
User:    your coding skills need a LOT of work.
MegaHAL: Key_info("or",relop,orop.
User:    A null source file, for now, an empty file, that is, that will
User:    contain key info.  I suppose that I could do that.  Sure, why
User:    not.  Then what?
MegaHAL: Dictionary *aux = null.

Does this imply that conversation about the theory of design is more important for an AI than trying to achieve AGI by learning from design?  If so, then maybe the best way to do a chain of reasoning is by learning from studies, i.e., from the analysis of chains of reasoning.

In any case, I have been cleaning up the GUI, so that the application looks and feels more like a traditional instant messaging program when it is being used for conversation with any of the bots that it might be used with.  Now, of course, I still need to get the bots to be able to talk to each other, among other things.  Then, parsing out GPS strings will pretty much be a no-brainer.

Then again, in the Eliza camp, I have fixed a number of issues.  In particular, I am now using the VS2022-compatible version of my framelisp library in VS2005 with the modified Eliza code, which is, of course, broken.  Yet this is an important step nonetheless, as we shall soon see.

The reason why this is important is, among other things, that the library that is doing most of the work for the Eliza app, or which will be doing most of the work, now has some debugging information available after the fix-ups.  When Eliza processes a sentence, a tokenizer function, which seems to be fairly well behaved at this point, turns it into a linked list of char pointers.  Ideally, we should be turning input text into a linked list of dictionary pointers and doing some other memory management stuff, which will obviously be important when processing large volumes of text.

Yet, keep in mind something very important that I also mentioned earlier, and that is that this version of my framelisp library is also, eventually, going to be fully callable from CUDA code, just as my Pascal style IO routines are already callable, as can be seen from this test, which is from "Deep Sneak - for lack of a better name".

So yeah, like I just said, or at least hinted at, there are a bunch of regressions and the like that need to be worked through, besides wanting something that will run on an Arduino, or an NVIDIA GPU, or anything for which a suitable C++ compiler exists.  So now one of the things that I need to do is extend the Eliza algorithm to parse GPS data, i.e., NMEA strings.  It should be easy to route and tokenize these, and then parse out the specific pieces of data that I want, without needing an 800 megabyte string-to-JSON converter, or a whole bunch of other un-needed "stuff".  What I need is a parser that is more aware of BNF or context-free grammars, without all of the overhead (read: bloat) that a Perl or Python regex-based solution would require, either.

To write a parser from scratch, therefore, that can handle NMEA message strings, I could try creating a custom Eliza pattern matcher that detects any of the following strings at the beginning of a message.

$GPGSA
$GPRMC
$GPGGA
$GPGSV

Then maybe modify the Eliza code so that when one of these strings is detected, it will pass the already tokenized message to a custom function.  That function might be invoked through a mapper from pattern matches to function pointers, which is more elegant, I think, than using a simple switch statement to handle the downstream process dispatch.  Then, when the special handler receives the string, it can use an object factory method to scan the remainder of the already tokenized string, and thereafter pack an appropriate C++ object with the relevant parameters.
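The mapper idea can be sketched roughly as follows. This is only an illustration, not the Eliza code itself: `token_list`, `nmea_handler`, the four `handle_*` functions, and `dispatch_nmea` are all hypothetical names, a vector of strings stands in for the linked list of char pointers, and the handlers just return a label where a real one would call the object factory. Since it is not assumed here whether the tokenizer delivers `$GPGGA` as one token or as `$` followed by `GPGGA`, both forms are accepted.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the tokenizer output (the post uses a linked
// list of char pointers; a vector of strings keeps the sketch short).
typedef std::vector<std::string> token_list;

// Hypothetical handler signature: takes the tokenized sentence and returns
// a short label. A real handler would pack a C++ object instead.
typedef std::string (*nmea_handler)(const token_list &);

static std::string handle_gpgga(const token_list &) { return "fix data"; }
static std::string handle_gprmc(const token_list &) { return "recommended minimum data"; }
static std::string handle_gpgsa(const token_list &) { return "DOP and active satellites"; }
static std::string handle_gpgsv(const token_list &) { return "satellites in view"; }

// Build the map from sentence identifier to handler function pointer.
static std::map<std::string, nmea_handler> make_nmea_table()
{
    std::map<std::string, nmea_handler> table;
    table["GPGGA"] = handle_gpgga;
    table["GPRMC"] = handle_gprmc;
    table["GPGSA"] = handle_gpgsa;
    table["GPGSV"] = handle_gpgsv;
    return table;
}

// Look up the handler for a tokenized line and call it.
// Returns an empty string when the line is not a known NMEA sentence.
std::string dispatch_nmea(const token_list &tokens)
{
    static const std::map<std::string, nmea_handler> table = make_nmea_table();
    if (tokens.empty())
        return std::string();

    // Accept either "$GPGGA" as one token or "$" followed by "GPGGA".
    std::string id;
    if (tokens[0] == "$" && tokens.size() > 1)
        id = tokens[1];
    else if (tokens[0].size() > 1 && tokens[0][0] == '$')
        id = tokens[0].substr(1);
    else
        return std::string();

    std::map<std::string, nmea_handler>::const_iterator it = table.find(id);
    if (it == table.end())
        return std::string();
    return it->second(tokens);
}
```

Compared with a switch statement, supporting another sentence type here means adding one table entry and one handler, and the dispatch function itself does not change.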

Of course, right now, the substitutions in Eliza look something like this:

struct subst
{
    char *wordin;
    char *wordout;
    subst();
    subst (char *str1, char *str2)
    {
        wordin = str1;
        wordout = str2;
    }
};

 Whereas the keyword matching uses a similar, but slightly different concept.

struct key_info
{
    char *key_str;
    int      responses;
    int      first_reply;
    int      current_reply;
    int      last_reply;
    key_info(char *str, int N)
    {
        key_str = str;
        responses = N;
    }
};

Right now, my tokenizer isn't correctly processing compound "keywords", or perhaps I should say "key phrases" like "I DON'T", since they are being parsed into sequences of separate tokens, as when the phrase "I don't think" gets transformed into the token sequence "YOU", " ", "don", "'", "t", " ", "think", and so on.  So that issue also needs to be addressed in some form or fashion.
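One possible way to address this, sketched under several assumptions: treat an apostrophe as part of a word when it sits between a word character and a letter, and then match a key phrase as a run of whole tokens. The names `tokenize_words` and `begins_with_phrase` are hypothetical. Unlike the tokenizer described above, this sketch drops whitespace instead of emitting it as tokens, upper-cases every word, and does no pronoun substitution.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split text into upper-case words and single punctuation characters.
// An apostrophe that follows a word character and precedes a letter is
// kept inside the word, so "don't" becomes the single token "DON'T".
std::vector<std::string> tokenize_words(const std::string &text)
{
    std::vector<std::string> tokens;
    std::string word;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(text[i]);
        bool inner_apostrophe = (c == '\'') && !word.empty()
            && i + 1 < text.size()
            && std::isalpha(static_cast<unsigned char>(text[i + 1])) != 0;
        if (std::isalnum(c) != 0 || inner_apostrophe)
        {
            word += static_cast<char>(std::toupper(c));
            continue;
        }
        if (!word.empty())
        {
            tokens.push_back(word);
            word.clear();
        }
        if (std::isspace(c) == 0)
            tokens.push_back(std::string(1, text[i]));  // punctuation token
    }
    if (!word.empty())
        tokens.push_back(word);
    return tokens;
}

// True when the token sequence starts with the given key phrase.
bool begins_with_phrase(const std::vector<std::string> &tokens,
                        const std::string &phrase)
{
    std::vector<std::string> key = tokenize_words(phrase);
    if (key.empty() || key.size() > tokens.size())
        return false;
    return std::equal(key.begin(), key.end(), tokens.begin());
}
```

With this rule, "I don't think" yields the three tokens `I`, `DON'T`, `THINK`, so a key phrase such as "I DON'T" can be compared token by token.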

A hashtag system would also be nice to have, so that when a hashtag word or phrase is detected, a narrative can be "fired up" that could perhaps be "streamed" one line at a time into the conversation, perhaps until another hashtag phrase comes along, or something like that.  Not an urgent requirement, but potentially useful, in any case.

In the meantime, while I am continuing to fix up the GUI and the debugging system, why not simply copy and paste some GPS strings into Eliza, just so we can see what happens?

O.K., so far, so good.  The tokenizer is doing its job quite nicely, without any complaints. I did notice, however, that the string "$GPGSV" is being split into two tokens, a "$" token and the "GPGSV" part.  Something that I wasn't expecting. Yet now that I think about it, maybe this could be a useful feature.  What if we check for a "$" symbol at the beginning of each line, and then check for an actual command or message type identifier?  This would be faster than testing for every possible command in a list of commands, by whatever means, whether that means a tree search, linear search, hash table lookup, or whatever.  In any case, obviously, the tokenizer needs work.  Maybe I will make further modifications to the already extensively rewritten INSYMBOL function from UCSD Pascal.

Let's take a quick look at an excerpt of a C++ port of the function INSYMBOL from the original UCSD Pascal compiler, circa 1979, or thereabouts, even if, according to the original source listings, INSYMBOL was written sometime around 1976.  You can find the entire file, all 992 lines of it, i.e., my C++ version, somewhere in and among the project files for another one of my projects on Git.

It looks like it starts out with a call to a PEEK function, which raises a really weird question: what if, instead of wanting to peek at a file or a streambuf object, I want to peek at a string, or at the next character in an already tokenized stream of tokens?  So some of this code might be redundant.  Yet, then again, maybe not.  Ideally, this piece of code should be able to handle the NMEA strings, just as if they were a parameter list to a Pascal procedure or function.

Of course, I am also thinking about something else altogether, right about now, and that is also this darling piece of code from my C++ version of Atari Genetic Algorithms.

int atari::main()
{
    int t1,t2;
    int iter = ITERATIONS;
    int m,n;
    bool verbose = true;

    // first initialize the dataset
    vector<neural::dataset> datasets;
    datasets.resize (NUMBER_OF_MODELS);    
    for (n=0;n<(int)datasets.size();n++)
    {
        datasets[n].bind (functions::square);
        datasets[n].initialize (NP);
    }
    // now construct the neural network
    vector<neural::gnn> models;
    models.resize (NUMBER_OF_MODELS);
    for (n=0;n<(int)models.size();n++)
    {
        models[n].bind (&datasets[n]);
        models[n].init (n,iter);
    }
    t1 = GetTickCount();

    bool result;
    models[0].report_eval (false);
    for (n=0;n<iter;n++)
        for (m=0;m<(int)models.size();m++)
        {
            result = models[m].train (n);
            if (result==true)
            {
                models[m].store_solution(n);
            }
            if ((result==true)&&(verbose==true))
            {
                writeln (output,"training data stored ... ");
                models[m].report(n);
            }
        }
    t2 = GetTickCount();
    writeln(output);
    writeln (output,"Training took ",(t2-t1)," msec.");
    neural::report_weights (models[0].w1);
    models[0].report_eval (true);
    
    return 0;
}

Now take a closer look at this part, if you haven't caught on to what I am thinking about already:

for (n=0;n<(int)models.size();n++)
{
    models[n].bind (&datasets[n]);
    models[n].init (n,iter);
}

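It's that models[n].bind (&datasets[n]) part that I want to point out.  It turns out that in this case, "bind" is an otherwise trivial, but extremely important function.  Here is the dataset flavor of it, the one called earlier as datasets[n].bind (functions::square), which takes a pointer to a function.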

bool neural::dataset::bind (double(*arg)(double))
{
    m_fptr=arg;
    return (m_fptr!=NULL);
}

 Like I said, trivial, but extremely important, because this technique allows one to completely change the behavior of an application with just a few lines of code.  Like if I wanted to do the Atari Neural Network stuff with a logarithmic function, or a cubic, or something like that, and no, I am not planning on trying to solve the Kepler problem with this!  However interesting that idea might be at some point in the future.

No, what I am thinking of is making the Pascal INSYMBOL function a lot more versatile, by allowing it to "bind" to a linked list of pointers to char, for example, as a kind of "data source", even though there would be some efficiency issues with doing this, i.e. if we end up checking and validating strings "one character at a time" more than once.  Yet maybe we also gain something, security-wise, if we do it right, from the point of view of "stateful inspection", that is to say, if we want a more general way of "sanitizing" data, as it comes in and as it goes out of a server, or load balancer, or firewall, or whatever, in some other domain.  
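A rough sketch of what such a bindable data source might look like follows. It is not the INSYMBOL port: `char_source`, `string_source`, `token_source`, and `scanner` are hypothetical names, a vector of strings stands in for the linked list of pointers to char, and the scanner only knows how to read unsigned numbers and single expected characters. The point is that the scanning code stays the same no matter which source it is bound to.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface: any source that yields one character at a time.
struct char_source
{
    virtual ~char_source() {}
    virtual bool eof() const = 0;
    virtual char peek() const = 0;  // next character, not consumed
    virtual void advance() = 0;     // consume the next character
};

// Source backed by a plain string.
class string_source : public char_source
{
    std::string m_text;
    std::size_t m_pos;
public:
    explicit string_source(const std::string &text) : m_text(text), m_pos(0) {}
    bool eof() const { return m_pos >= m_text.size(); }
    char peek() const { return eof() ? '\0' : m_text[m_pos]; }
    void advance() { if (!eof()) ++m_pos; }
};

// Source backed by a token sequence, read as one continuous stream.
class token_source : public char_source
{
    std::vector<std::string> m_tokens;
    std::size_t m_index;  // current token
    std::size_t m_pos;    // position inside the current token
    void skip_finished()
    {
        while (m_index < m_tokens.size() && m_pos >= m_tokens[m_index].size())
        {
            ++m_index;
            m_pos = 0;
        }
    }
public:
    explicit token_source(const std::vector<std::string> &tokens)
        : m_tokens(tokens), m_index(0), m_pos(0) { skip_finished(); }
    bool eof() const { return m_index >= m_tokens.size(); }
    char peek() const { return eof() ? '\0' : m_tokens[m_index][m_pos]; }
    void advance() { if (!eof()) { ++m_pos; skip_finished(); } }
};

// Scanner that works on whatever source it is bound to.
class scanner
{
    char_source *m_src;
    bool digit_next() const
    {
        return m_src != NULL && !m_src->eof()
            && std::isdigit(static_cast<unsigned char>(m_src->peek())) != 0;
    }
public:
    scanner() : m_src(NULL) {}
    bool bind(char_source *src) { m_src = src; return m_src != NULL; }

    // Consume one expected character; false if something else is next.
    bool accept(char c)
    {
        if (m_src == NULL || m_src->eof() || m_src->peek() != c)
            return false;
        m_src->advance();
        return true;
    }

    // Read an unsigned decimal number; false if no digit is next.
    bool read_unsigned(unsigned long &value)
    {
        if (!digit_next())
            return false;
        value = 0;
        while (digit_next())
        {
            value = value * 10 + static_cast<unsigned long>(m_src->peek() - '0');
            m_src->advance();
        }
        return true;
    }
};
```

Note that `token_source` ignores token boundaries, so two adjacent digit tokens would be read as one number. That is exactly the kind of re-validation cost and ambiguity mentioned above, and a real version would probably need to expose the boundaries.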

So, it is more than just another layer of filtering or token processing and translation that I am thinking of here.  Yet what if Megahal is right when it suggests:

DICTIONARY *aux = NULL;

Maybe each subprocess needs its own "dictionary" so that strings can be stored as linked lists of dictionary pointers, at least until they reach a certain size, and then there could be vectors of pointers to dictionary or symbol table entries.  That means that we now have to deal with persistence, and scoping, and all of that.
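As a minimal sketch of the per-process dictionary idea, assuming nothing about MegaHAL's own DICTIONARY type: each distinct word is stored once, and a sentence becomes a sequence of pointers into that store. The names `word_dictionary`, `word_sequence`, `encode`, and `decode` are hypothetical, and a vector is used throughout rather than starting with a linked list and switching later.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-process word store. std::set never moves its elements,
// so pointers to stored words stay valid as more words are added.
class word_dictionary
{
    std::set<std::string> m_words;
public:
    // Return the one shared pointer for this word, adding it if needed.
    const std::string *intern(const std::string &word)
    {
        return &*m_words.insert(word).first;
    }
    std::size_t size() const { return m_words.size(); }
};

// A sentence stored as pointers into the dictionary rather than as text.
typedef std::vector<const std::string *> word_sequence;

// Replace each token with its dictionary pointer.
word_sequence encode(word_dictionary &dict, const std::vector<std::string> &tokens)
{
    word_sequence out;
    for (std::size_t i = 0; i < tokens.size(); ++i)
        out.push_back(dict.intern(tokens[i]));
    return out;
}

// Rebuild readable text, joining the words with single spaces.
std::string decode(const word_sequence &words)
{
    std::string text;
    for (std::size_t i = 0; i < words.size(); ++i)
    {
        if (i != 0)
            text += ' ';
        text += *words[i];
    }
    return text;
}
```

Because repeated words share one pointer, comparing two words becomes a pointer comparison. The pointers are only valid while their dictionary is alive, which is where the persistence and scoping questions come in.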

In the meantime, since MegaHAL's coding skills still leave a lot to be desired, and since, for the time being, it appears that I am going to have to do most of the coding work, I went ahead and wrote a new function that takes the string that the tokenizer extracts from the NMEA message as the GMT time and converts it to local hours, minutes, and seconds.

void solar_elements::set_time (char *str)
{
    COleDateTime _dt = COleDateTime::GetCurrentTime();
    m_year = _dt.GetYear ();
    m_month = _dt.GetMonth ();
    m_day = _dt.GetDay ();

    ASSERT (strlen(str)==6);
    char str_h [4], str_m [4], str_s [4];
    strncpy_s (str_h,4,&str[0],2);
    strncpy_s (str_m,4,&str[2],2);
    strncpy_s (str_s,4,&str[4],2);
    m_hour = atoi(str_h);
    m_minute = atoi(str_m);
    m_second = atoi(str_s);
    m_time_zone = -8;    // offset from GMT in hours, hard-coded for now

    if (m_time_zone!=0)
        m_hour = (m_hour+24+m_time_zone)%24;
    m_daylight_savings = 0;
}

Pretty simple stuff, even though I thought I was trying to get away from excessive use of the char * string type, but this will have to do for now.  Old habits die hard.  Maybe I should be using my Pascal style pstring class here, but it is more important, I think, to get something that works "right now" than it is to pursue a more elegant solution.
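For comparison, the same hour arithmetic can be sketched without MFC and with a checked return value in place of the ASSERT. The `clock_time` struct and the `gmt_field_to_local` function are hypothetical stand-ins, the time zone is assumed to be a whole number of hours, and, as in the function above, the date is not adjusted when the hour wraps past midnight.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical plain struct standing in for the m_hour, m_minute and
// m_second members used in the post.
struct clock_time
{
    int hour, minute, second;
};

// Convert the leading "hhmmss" of an NMEA time field to local time.
// zone_hours is the whole-hour offset from GMT, e.g. -8.
// Returns false, leaving out untouched, when the field is malformed.
bool gmt_field_to_local(const char *field, int zone_hours, clock_time &out)
{
    if (field == NULL || std::strlen(field) < 6)
        return false;
    for (int i = 0; i < 6; ++i)
        if (std::isdigit(static_cast<unsigned char>(field[i])) == 0)
            return false;

    int h = (field[0] - '0') * 10 + (field[1] - '0');
    int m = (field[2] - '0') * 10 + (field[3] - '0');
    int s = (field[4] - '0') * 10 + (field[5] - '0');
    if (h > 23 || m > 59 || s > 59)
        return false;

    // Wrap into 0..23; the inner remainder may be negative, so add 24 again.
    out.hour = ((h + zone_hours) % 24 + 24) % 24;
    out.minute = m;
    out.second = s;
    return true;
}
```

Only the first six characters are inspected, so a field that carries fractional seconds after them is still accepted.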

Elsewhere, parsing the NMEA strings is starting to look like this, i.e., something that works "for now", but where there is arguably also a lot of room for improvement.

void ELIZA::parse_nmea ()
{
    char *str;
//    check for "$" as the first token
    m_process.rewind ();
    if (m_process.eof())
	return;
    m_process.get (str);
    int test;
    test = btext::compare (str,"$");
    if (test==0)
	writeln (debug_term,"$ Token Found:  Possible NMEA message!");
    else
	return;
    if (m_process.eof())
	return;
    m_process.get (str);
    test = btext::compare (str,"GPGGA");
    if (test==0)
	writeln (debug_term,"GPGGA Token indicating NMEA message!");
    else
	return;
    if (m_process.eof())
	return;
    m_process.get (str);
    test = btext::compare (str,",");
    if (test!=0) {
	writeln (debug_term,"Invalid Sequence, \",\" expected");
	return;
    }
    if (m_process.eof())
        return;
    m_process.get (str);
    int t = atoi (str);
    writeln (debug_term,"Parsed GMT time as int = ",t);

    m_s->set_time (str);
    m_s->compute_local_aspect ();
    char test_str [256];
    int M,H,S;
    H = m_s->m_hour;
    M = m_s->m_minute;
    S = m_s->m_second;
    double az, el;
    az = m_s->m_az;
    el = m_s->m_el;
    sprintf_s (test_str,256,"LOCAL time = %02d:%02d:%02d  AZIMUTH = %f, EL = %f",H,M,S,az,el);
    writeln (output,test_str);
}

Nothing exciting here, although I am sure that there is room for optimization.  For now, the idea of using the Pascal tokenizer will have to wait, although I still like the idea of having another layer of automatic type detection, which from a certain point of view might be far superior, or at least safer, than the otherwise very risky use of functions like sscanf, at least according to modern coding standards.  Running this inside the Eliza main loop produced this output.

$ Token Found:  Possible NMEA message!
GPGGA Token indicating NMEA message!
Parsed GMT time as int = 14306
LOCAL time = 17:43:06  AZIMUTH = 281.351135, EL = 11.174071

So the brute force technique, while it may be messy, is still quite effective.  Yet I am still poking the local longitude and latitude into the solar_aspect initialization data from preset values.  Obviously, I am going to want to get those values from GPS also, most likely with a bunch of other stuff, like the number of satellites, signal to noise, and other things that are worth a closer look.
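When the position fields are eventually read from the GGA sentence, they will need a small conversion: NMEA reports latitude as ddmm.mmmm followed by an N or S field, and longitude as dddmm.mmmm followed by E or W. A minimal sketch of that conversion to signed decimal degrees follows. The function name `nmea_to_degrees` is hypothetical, and it assumes the numeric field arrives as one string, which may mean reassembling it first if the tokenizer splits at the decimal point.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <string>

// Convert an NMEA position field (ddmm.mmmm or dddmm.mmmm) plus its
// hemisphere letter into signed decimal degrees. North and east are
// positive, south and west negative. Returns false on malformed input,
// leaving degrees untouched.
bool nmea_to_degrees(const std::string &field, char hemisphere, double &degrees)
{
    if (field.empty())
        return false;

    char *end = NULL;
    double raw = std::strtod(field.c_str(), &end);
    if (end == field.c_str() || *end != '\0' || raw < 0.0)
        return false;

    double whole = std::floor(raw / 100.0);  // the dd or ddd part
    double minutes = raw - whole * 100.0;    // the mm.mmmm part
    if (minutes >= 60.0)
        return false;

    double value = whole + minutes / 60.0;
    switch (hemisphere)
    {
    case 'N':
    case 'E':
        break;
    case 'S':
    case 'W':
        value = -value;
        break;
    default:
        return false;
    }
    degrees = value;
    return true;
}
```

For example, a latitude field of 4807.038 with hemisphere N is 48 degrees plus 7.038 minutes, or about 48.1173 degrees.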
