
Deep Sneak: For Lack of a Better Name

The mixture-of-experts and multi-model meme is in play, so let the reindeer games begin, even if we missed Christmas with Deep-Seek.

We have all read the news by now, and perhaps no one knows where this train is headed quite yet. Yet somebody is going to be the very first to get the Deep-Seek R1 model to run on a Raspberry Pi or a Propeller, or otherwise explore just how low we can go with any and all variations of Arduino hardware. What will it be able to do? Shoot pool? Perform DJ beat matching? Find the love of your life? Run for president?

As we all know by now, Deep Seek launched recently and has rapidly overtaken GPT and all of the rest as the leading AI in terms of cost efficiency, to say the very least. But will it run on a Propeller, or a Pi, or on an Android, or even an Arduino device for that matter? And what about standalone support under Windows? Obviously, there will be many applications where some big-data users will not want to run their proprietary data through a third-party cloud, whether it is Google's cloud, Amazon's whatever, Microsoft's Azure, or something else.

So, whether it is big pharma, or the NSA ... chances are there are many others who know that they can't ever really know for sure whether or not it is safe to run their special-needs AI applications on an outsourced platform. Hence anything that brings down the cost of running one's own private, trusted, and hopefully secure cloud addresses an essential need that is not being adequately met in the current marketplace.

Likewise, I think that if an even better AI could be made to run on something like a Pi cluster, that would not only be interesting in and of itself, but also useful to the community at large as better methods for running distributed workloads are developed. Bluesky is another thing that comes to mind right now, since Bluesky is built on the Authenticated Transfer (AT) protocol, and apparently there are already people claiming that the AT protocol will run on a Raspberry Pi, whether as a part of the Bluesky network or on a completely separate network. So this could get very interesting indeed.

What if we could build a Pi cluster and create an echelon of chatbots, where each bot is a so-called expert on at least one subject while otherwise having some general conversational ability? That does not mean that I want to turn an army of bots loose on Bluesky, of course. Rather, I am thinking along a path where it might be possible to train multiple bots on different subjects and see how well they do when they are Deep-Seek R1 based vs. prior approaches. So, we turn the bots loose on each other and let them learn from each other.

This is, of course, an ambitious project, one that will not produce instant results overnight. Yet let's embrace the chaos and see what happens anyway.

atari_genetics.cpp

Here is the C++ version of the Atari genetic algorithm. I adapted Deep Seek's translation so that I could piggyback it into a library that also runs the classic ELIZA, and which uses Pascal-style writeln I/O instead of printf, etc., for the debugging messages.

plain - 1.82 kB - 02/23/2025 at 19:42


convert.cpp

C++ version of the Deep Seek source file convert.py

plain - 13.00 kB - 02/10/2025 at 03:55


compatibility.h

Temporary helper file used to provide some missing prototypes, etc., for the initial Deep Seek porting efforts, i.e., if C17 and PyTorch are not available for the platform of interest, and for studying dependencies, etc.

plain - 2.19 kB - 02/10/2025 at 03:55


  • Back into the Maze of Twisty Little Passages

    glgorman 3 days ago 0 comments

    So I am starting to figure out some of the issues with the original Atari genetic algorithm code, and there are a bunch of them, even though the original code does actually appear to work - sort of. For those who love to stare at debugging sessions, this will explain volumes. Maybe I could ask Deep Seek for an explanation of what is happening here?

    First, the good news. I completely refactored the C code that Deep Seek gave me into something that is, for now, in C++, and which I think I can eventually adapt to run on a Propeller, an Arduino, or an NVIDIA GPU, as discussed in the previous log entry. It is also quite easy to see that every element of the array m_out0 is getting filled with the same value, which I am pretty sure was also happening in the original Atari BASIC code. So clearly, some more work needs to be done on the topology of this network. Still, I have definitely seen some substantial improvement in the performance of the algorithm by adding some gain and bias parameters.

    Since I have a short, simple function that can easily be modified to work according to some different theory of operation, I can therefore proceed with some of my ideas about running something like this in multiple threads, and/or adding some kind of evolutionary selection component, as previously discussed. In Windows, of course, a simple call to a training function via AfxBeginThread is all that would be required to do the actual model training and instance evaluation, in and of itself. Furthermore, all of that should work just fine without needing a whole bunch of locks, semaphores, or event handles - that is to say, until we get to the reportage sections.
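
    Roughly speaking, and only as a sketch, the per-model worker could reuse the train and store_solution calls from the existing loop; the train_model wrapper and the cast of the model pointer below are assumptions for illustration, not code from the framework:

    // one worker per model, launched via AfxBeginThread (MFC supplies UINT/LPVOID)
    UINT train_model (LPVOID pParam)
    {
        // each worker owns exactly one model, so no locking is needed here
        neural::ann *model = (neural::ann*)pParam;
        for (int n=0; n<ITERATIONS; n++)
        {
            bool improved = model->train(n);
            if (improved)
                model->store_solution(n);   // reporting is deferred to the IO thread
        }
        return 0;                           // thread exit code expected by AfxBeginThread
    }
    
    // e.g., one thread per model:
    //   for (m=0; m<(int)models.size(); m++)
    //       AfxBeginThread (train_model, &models[m]);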

    Obviously, we don't want multiple threads attempting to call WRITELN at the same time. Yet if I add a critical section to the code where the reporting takes place, then this could result in blocking problems from time to time. Perhaps a simple mail-slot system would be the way to go, so that a thread can create a report and then "submit it" to the IO queue as a completed text object. That doesn't seem all that hard, given that most of what I would need already exists in my PASCAL-style IO classes for WRITELN, etc., and I could perhaps modify the WRITELN class to allow writing to a TextObject, maybe something like this:

    #include "../Frame Lisp/intrinsics.h"
    #include "text_objects.h"
    
    ...
    TextObject debug_text;
    
        lisp::begin_message(debug_text);             // start composing the report off to the side
        for (i=0; i<np; i++)
        {
            x = m_ds->x(i);
            y = m_ds->y(i);
            arg = double(i)/m_ds->size();
            value = eval0(w1,i);
            writeln (debug_text," >>> X=",x," Y=",y, " G=",value);
        }
        lisp::end_message(debug_text);               // seal the completed text object
        lisp::send_message(output,debug_text);       // hand the finished report off to the IO queue
    ...
    

    That way I can put all of the memory management, as well as the synchronization objects, in the LISP and/or PASCAL library text-manipulation and IO handlers - hopefully in such a way that no blocking calls ever have to be made from within this otherwise critical loop.
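
    Just to make the mail-slot idea concrete, here is a minimal sketch using only the standard library; in the real code the payload would presumably be a TextObject rather than a std::string, and the names report_queue, submit, and next are made up for illustration:

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>
    
    class report_queue
    {
        std::queue<std::string> m_reports;
        std::mutex m_lock;
        std::condition_variable m_ready;
    public:
        void submit (std::string report)       // called by the worker threads
        {
            {
                std::lock_guard<std::mutex> guard(m_lock);
                m_reports.push(std::move(report));   // the only critical section is a push
            }
            m_ready.notify_one();
        }
        std::string next ()                    // called only by the IO thread
        {
            std::unique_lock<std::mutex> guard(m_lock);
            m_ready.wait(guard, [this]{ return !m_reports.empty(); });
            std::string report = std::move(m_reports.front());
            m_reports.pop();
            return report;                     // the IO thread then calls WRITELN on it
        }
    };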

    And I still need to get back to reading the code to see how Deep Seek really works, as well as putting some of this to use with my braiding functions. Fortunately, quite a bit is already done, at least as far as all of this other IO stuff is concerned.

    Something to keep in mind, therefore, is that there is already a project on Hackaday, as well as a source code repository on GitHub, for a framework that I was working on - hard as it is to believe right now - a bespoke application builder that I made in 2019 with the intent that it allow me to interact with Propeller and Arduino projects over a USB connection, among other things.

    Therefore, a lot of the heavy lifting has...

    Read more »

  • Genetics, Death and Taxes vs. Creation, Concurrency and Evolution

    glgorman 02/25/2025 at 19:53 0 comments

    Well, for whatever it is worth, I have started work on parallelizing the genetic algorithm for approximating a mathematical function by using a neural network, and even though I haven't started yet on actually running it as a multi-threaded process, the results are nonetheless looking quite promising!  

    An interesting question arises, therefore: is it better to train a single genetic neural network for, let's say, 2048 iterations, or to run, let's say, 32 models in parallel while iterating over each model 64 times? It appears that allowing multiple genetic models to compete against each other, according to an evolutionary metaphor, is the better way to go, at least for the time being. Interestingly enough, since I have not yet started running this on multiple threads, I think that if memory constraints permit, this code just might be back-portable to whatever legacy hardware might be of interest.

    O.K., then - for now the code is starting to look like this, and soon I will most likely be putting some of this up on GitHub, just as soon as I iron out a few kinks.

    int atari::main()
    {
        int t1,t2;
        int iter = ITERATIONS;
        int m,n;
    
        // first initialize the datasets
        vector<neural::dataset> datasets;
        datasets.resize (NUMBER_OF_MODELS);
        for (n=0;n<(int)datasets.size();n++)
            datasets[n].initialize (NP);
    
        // now construct the neural networks
        vector<neural::ann> models;
        models.resize (NUMBER_OF_MODELS);
        for (n=0;n<(int)models.size();n++)
        {
            models[n].bind (&datasets[n]);
            models[n].init (n,iter);
        }
        t1 = GetTickCount();
    
        // run every model through every iteration, keeping each model's best solution
        bool result;
        models[0].report0(false);
        for (n=0;n<iter;n++)
        for (m=0;m<(int)models.size();m++)
        {
            result = models[m].train (n);
            if (result==true)
            {
                models[m].store_solution(n);
                models[m].report(n);
            }
        }
        t2 = GetTickCount();
        writeln(output);
        writeln (output,"Training took ",(t2-t1)," msec.");
        models[0].report1();
        models[0].report0(true);
        return 0;
    }

    It's really not all that much more complicated than the code that Deep Seek generated - mainly some heavy refactoring and breaking things out into separate classes, etc. Yet a whole bunch of other stuff comes to mind, since it would almost be a one-liner at this point to try this on CUDA, on the one hand. On the other hand, there is also the idea of getting under the hood of how Deep Seek works, since the question always comes to mind of how one might train an actual DOOM expert, or a hair expert for that matter, so as to be able to run in the Deep Seek environment.

    Stay Tuned!  Obviously, there should be some profound implications as to what might be accomplished next.  Here are some ideas that I am thinking of right now:

    1. While evolving multiple models in parallel, I could perhaps make some kind of "leader board" that looks like the current rank table for gainers or losers on the NASDAQ, or the current leaders in the Indy 500. Easy enough to do Linux style, if I make a view that looks like what you see when you do a SHOW PROCESSLIST in MySQL or run ps in Linux, for example (see the sketch after this list).
    2. What if, whenever a model shows an "improvement" over its previous best result, I not only store the results, but also make a clone of that particular model, so that there would now...
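
    As for the leader-board idea in item 1, a rough sketch might be nothing more than sorting model indices by their best distance so far and printing a rank table. This assumes the neural::ann class from the listing above, plus hypothetical best_distance() and id() accessors that the class does not currently expose:

    #include <algorithm>
    #include <cstdio>
    #include <vector>
    
    void print_leader_board (std::vector<neural::ann> &models)
    {
        // rank[] holds model indices, sorted so that the best model comes first
        std::vector<size_t> rank(models.size());
        for (size_t i=0; i<rank.size(); i++)
            rank[i] = i;
        std::sort(rank.begin(), rank.end(), [&](size_t a, size_t b)
            { return models[a].best_distance() < models[b].best_distance(); });
    
        std::printf("%-6s %-8s %-12s\n", "rank", "model", "best DIST");
        for (size_t i=0; i<rank.size(); i++)
            std::printf("%-6u %-8d %-12g\n", (unsigned)(i+1),
                models[rank[i]].id(), models[rank[i]].best_distance());
    }
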
    Read more »

  • Navigating the Twisted Convoluted Maze of Dark Passages

    glgorman 02/24/2025 at 01:59 0 comments

    Or some other snooty remark, like I said earlier. So, I have started on a C++ class based on the article that I saw about doing neural networks, with genetic algorithms no less, on an Atari 800XL:

    class ann
    {
    protected:
        int NH,NN,NP;               // NH = hidden units, NN = NH+1 (total nodes), NP = training points
        int NM, NG;                 // NM = number of weights (2*NH)
        double DB_INIT, F_INIT;     // initial best distance and initial mutation factor
        double F, DB, DF;           // current mutation factor, best distance so far, factor decrement
        double DIST;                // RMS error of the current candidate
        double (*m_fptr)(double);   // target function to approximate (see bind below)
        vector<double> x0, y0;      // training inputs and target outputs
        vector<double> U, wb, out;  // trial weights, best weights so far, node outputs
    
    public:
        ann ();
        void init (int i);
        bool bind (double(*fptr)(double));
        void randomize ();
        void train(int iter);
        void store_solution (int N);
        double test (double val);
    };
    

    Obviously there is a lot more of this, but what I really think that I should be doing is to actually put some of the graphics capabilities of my framework to good use.  That would be kind of fun to watch in action, of course, whether on a PC, or on a classic 8-bit system.  C++ is of course convenient, but maybe I could eventually try this in Apple II Pascal, or using Flex-C on the Parallax Propeller P2, just as soon as I figure out what it is that I am actually trying to do with it.  Then of course there is also the task of figuring out how this relates to the land of all things LLM related.  Maybe I should mention a conversation that I had with a chatbot a couple of years ago, about the Yang-Mills conjecture, since some of that might turn out to actually be relevant, finally - after all of this time.

    Maybe I am thinking about the idea that I had about pasting oscilloscope data onto the surface of a cylinder, or warping the same data onto a sphere, and then how I realized how easily that leads into either special or general relativity. That sort of thing could be a use case for an LLM, insofar as working out the theory goes, but also a use case for neural-network-based equation solvers.
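
    Just as a back-of-the-envelope sketch of the cylinder idea - none of this is from my framework, and the point struct, the radius nudging, and the one-turn-per-360-samples assumption are all made up for illustration - wrapping a trace might look something like this:

    #include <cmath>
    #include <vector>
    
    struct point3 { double x, y, z; };
    
    std::vector<point3> wrap_on_cylinder (const std::vector<double> &samples,
                                          double radius, double pitch)
    {
        const double TWO_PI = 6.283185307179586;
        std::vector<point3> result;
        for (size_t i=0; i<samples.size(); i++)
        {
            double theta = TWO_PI * double(i) / 360.0;   // one full turn per 360 samples
            double r = radius + samples[i];              // the sample value nudges the radius
            point3 p;
            p.x = r * std::cos(theta);
            p.y = r * std::sin(theta);
            p.z = pitch * theta / TWO_PI;                // advance along the axis each turn
            result.push_back(p);
        }
        return result;
    }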

    So obviously there is more to this game than just regular tessellation, OpenGL, CUDA, or whatever. Yet I am thinking about something else, of course, and that is the icosahedron, which tessellates so much more nicely than the sphere. If we take the icosahedron and subdivide each face into four triangles, we get an 80-sided polytope, which can, of course, be further subdivided to get a surface with, let's say, 320 faces - which, interestingly enough, is pretty close to the number of dimples on most golf balls.
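
    For what it is worth, the subdivision step itself is simple enough. Here is a minimal stand-alone sketch (not code from my framework) of splitting each spherical triangle into four and re-projecting the edge midpoints onto the unit sphere, which is what takes the 20 faces to 80, then 320, and so on:

    #include <cmath>
    #include <vector>
    
    struct vec3 { double x, y, z; };
    
    static vec3 normalize (vec3 v)
    {
        double len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
        vec3 r = { v.x/len, v.y/len, v.z/len };
        return r;
    }
    
    static vec3 midpoint_on_sphere (vec3 a, vec3 b)
    {
        vec3 m = { (a.x+b.x)*0.5, (a.y+b.y)*0.5, (a.z+b.z)*0.5 };
        return normalize(m);                 // project the midpoint back onto the sphere
    }
    
    struct triangle { vec3 a, b, c; };
    
    // one level of subdivision: each input face becomes four output faces
    std::vector<triangle> subdivide (const std::vector<triangle> &faces)
    {
        std::vector<triangle> out;
        for (size_t i=0; i<faces.size(); i++)
        {
            vec3 ab = midpoint_on_sphere(faces[i].a, faces[i].b);
            vec3 bc = midpoint_on_sphere(faces[i].b, faces[i].c);
            vec3 ca = midpoint_on_sphere(faces[i].c, faces[i].a);
            triangle t0 = { faces[i].a, ab, ca };
            triangle t1 = { ab, faces[i].b, bc };
            triangle t2 = { ca, bc, faces[i].c };
            triangle t3 = { ab, bc, ca };
            out.push_back(t0); out.push_back(t1);
            out.push_back(t2); out.push_back(t3);
        }
        return out;
    }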

    Thus, in the meantime, I can definitely see some serious use cases for neural networks of whatever type for some kind of vertex nudging, for example. Yet, as suggested, it could get very interesting when we consider the possible role of LLMs as well, whether in actual code or in developing the design concept.

    Even better if Deep Seek or something similar should turn out to be useful for generating models that can not only be run on the desktop, but also be trained locally.

    In the meantime, however, we can have some more fun with our C++ version of the original genetic algorithm by making some things a bit more generic. I am surprised, for example, that the original BASIC code simply fills an array in a loop by squaring the training values directly. Why not use the DEF FN idiom that most BASICs have to define a callable function that takes an argument and returns a value, just like we are accustomed to in a higher-level language such as C, or even Pascal or Fortran for that matter? Maybe it is just too slow to do in BASIC. Well, that doesn't mean that we can't do it in C++, however.
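
    In C++, the equivalent of DEF FN is just a function pointer (or std::function), so the target function becomes a parameter instead of a hard-coded x*x. A tiny stand-alone sketch of the idea, separate from the actual bind method shown next:

    static double square (double x) { return x*x; }
    
    static void fill_dataset (double *xs, double *ys, int np, double (*fn)(double))
    {
        for (int i=0; i<np; i++)
        {
            xs[i] = (i - 0.5) * (1.0/np);   // same spacing the translated code uses for HQ
            ys[i] = fn(xs[i]);              // any target function can be swapped in here
        }
    }
    
    // e.g.  fill_dataset (HQ, YQ, NP, square);   // or sin, a cubic, etc.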

    Let's take a look at two functions, bind and square:

    bool ann::bind (double(*arg)(double))
    {
        m_fptr=arg;
    ...
    Read more »

  • Let's adapt some Atari BASIC code that does a genetic algorithm?

    glgorman 02/23/2025 at 19:05 0 comments

    Elsewhere on Hackaday right now there is an article that describes another person's efforts to train a neural network to model a mathematical function, such as y=x^2. Perhaps this approach might be useful to help with the alignment issues in my braiding algorithm. Might as well try it, right? Even though I have an idea in the works that I could just as well export a list of vector-drawn control points for the braid and then use a Fourier-transform-based low-pass filter to smooth out some of the irregularities, as previously described. So, I tried giving the BASIC code from that project to Deep Seek, and I asked it to convert the code to C for me! Much to my surprise, the code ran the first time.

    Here is the Hackaday Article about using Atari BASIC to implement a genetic algorithm.

    And here is a cropped screen shot from the author's YouTube video:

    So naturally, I had to try and see if Deep Seek could handle the otherwise fairly straightforward translation problem. Obviously, this is using a sledgehammer to crush ants, maybe, but worth a try.

    Deep Seek was even nice enough to provide an explanation as to how the code works!

    1. **Initialization**: The constants and variables are defined at the beginning. Arrays are used to store the values of `HQ`, `YQ`, `U`, `MB`, and `O`.
    2. **Random Initialization**: The `MB` array is initialized with random values using `srand` and `rand`.
    3. **Training Loop**: The main training loop iterates `ITER` times. In each iteration, the `U` array is updated with new values based on `MB` and a random factor.
    4. **Distance Calculation**: The distance `DIST` is calculated based on the difference between `YQ` and the output `O`.
    5. **Update**: If the calculated distance is less than `DB`, `DB` is updated, and the `MB` array is updated with the values from `U`. The factor `F` is also decreased by `DF`.
    6. **Output**: The current iteration number and distance are printed.
    This C code should replicate the functionality of the original Atari BASIC program. Note that some parts of the BASIC code were incomplete or unclear, so assumptions were made to fill in the gaps.

    OK - very nice. We can grab screenshots of source code from YouTube and translate the code to other languages! So now let's look at the generated code, to which I made some very minor changes in order to adapt it to my framework.

    #include "stdafx.h"
    #include "../Frame Lisp/intrinsics.h"
    #include "eliza.h"
    #include <math.h>
    #include <time.h>
    
    #define PI 3.14159265
    #define ITER 32
    #define F_INIT 2.5
    #define NH 4
    #define NN (NH + 1)
    #define NP 3
    #define NM (2 * NH)
    #define NG 10
    #define DR (1.0 / NP)
    #define DB_INIT 1000000.0
    
    void train_ann() {
        double HQ[NP], YQ[NP];
        double U[NN];
        double MB[NN], O[NN];
        double F = F_INIT, DF = F / ITER, DB = DB_INIT;
        double DIST;
        int I, J, K, N;
    
        // Initialize points
        for (I = 0; I < NP; I++) {
            HQ[I] = (I - 0.5) * DR;
            YQ[I] = HQ[I] * HQ[I];
        }
    
        // Initialize MB with random values
        srand(time(NULL));
        for (I = 0; I < NM; I++) {
            MB[I] = (double)rand() / RAND_MAX;
        }
    
        // Training loop
        for (N = 0; N < ITER; N++) {
            DIST = 0.0;
            for (I = 0; I < NM; I++) {
                U[I] = MB[I] + F * ((double)rand() / RAND_MAX);
            }
    
            for (K = 0; K < NP; K++) {
                for (J = 0; J < NN - 1; J++) {
                    double SUM = 0.0;
                    for (I = 0; I < NH; I++) {
                        SUM += U[I] * HQ[K];
                    }
                    O[J] = sin(SUM * PI);
                }
    
                O[NN - 1] = 0.0;
                for (I = 0; I < NH; I++) {
                    O[NN - 1] += U[I + NH] * O[I];
                }
                O[NN - 1] = sin(O[NN - 1] * PI);
    
                DIST += pow(YQ[K] - O[NN - 1], 2);
            }
            DIST = sqrt(DIST / NP);
            writeln (output, N, " ",DIST, " ", DB);
    
            if (DB > DIST) {
                DB = DIST;
                for (I = 0; I < NH; I++) {
                    MB[I] = U[I];
                }
                F -= DF;
            }
        }
    }
    
    int atari::main()
    {
        train_ann();
        return 0;
    }
    

     Now we simply have to try running it.  So let's sneak a test case into my hair drawing code, so that it will dump some debugging info at startup.

    Now I just need to go back and...

    Read more »

  • Transformers, Transformers - Oh, Where art Thou?

    glgorman 02/20/2025 at 14:38 0 comments

    I don't know why I never thought about this until now, or maybe I did - but I just wasn't far enough along in my development efforts for it to matter. Yet all of a sudden, I have an idea - something that maybe I should follow up on, because I think it is going to turn out to be very useful in transformer land. First, let's take a look at something that I am looking at right now in a debugging session.

    Here you can see the values of some of the member variables associated with the camera class that I use when working with GDI-based stuff in 3D, and one of the things that this lets me do is write code that looks like this:

    void twist::draw_strand1 (camera &a, COLORREF c, int i)
    {
        SETCOLOR _(a,c);
        MATH_TYPE theta= 0.0;
        MATH_TYPE w = -m_width;
        MATH_TYPE m_offset;
        MATH_TYPE step_size;
    
        if (i%2==1)
            m_offset = -m_width*0.25;
        else
            m_offset = m_width*0.25;
        
        _vector pos = xyz (theta,i);
        pos[0]+=m_offset;
        pos[2]+=(m_length*i*0.5);
        a.move_to_ex (pos);
        step_size = STEP_SIZE;
        for (;theta<=360.0;theta+=step_size)
        {
            pos = xyz (theta,i);
            pos[0]+=m_offset;
            pos[2]+=(m_length*i*0.5);
            a.line_to_ex (pos);
        }
    }
    

    So the most important part here, I suppose, is the ease with which I can simply create objects by drawing in 3D, while doing a bunch of OpenGL-like stuff such as pushing and popping the necessary object transformations, and in effect generating vertex lists, as well as face and object data in some cases. That in turn allows for the creation of .obj-file-like intermediate representations, which can then be played back at will, saved to a file, or even 3D printed in some cases. Yet for some reason, even though my MFC application is Multi-Document and Multi-View capable, I have until now only made use of one camera at a time when rendering a particular scene - so that, in effect, I have a CameraView class in MFC that derives from CView, and that in turn allows whichever window is having its OnDraw method called by the framework to translate the 3D view into the final GDI calls for onscreen rendering. Otherwise, the more general conversion of primitive objects into fully GL-compatible objects is still a work in progress.
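
    To be clear about what I mean by an intermediate representation, here is a minimal sketch of the record-the-draw-calls idea using a stand-in recorder class; this is not the actual camera class, and the names and the .obj-style output are just for illustration:

    #include <cstdio>
    #include <vector>
    
    struct vertex { double x, y, z; };
    
    class draw_recorder
    {
        std::vector<vertex> m_vertices;
        std::vector<std::vector<size_t> > m_lines;   // each entry is one polyline
    public:
        void move_to (vertex v)                      // starts a new polyline
        {
            m_lines.push_back(std::vector<size_t>());
            m_lines.back().push_back(add(v));
        }
        void line_to (vertex v)                      // extends the current polyline
        {
            m_lines.back().push_back(add(v));
        }
        void dump_obj (FILE *f) const                // "v" and "l" records, .obj style
        {
            for (size_t i=0; i<m_vertices.size(); i++)
                std::fprintf(f, "v %g %g %g\n",
                    m_vertices[i].x, m_vertices[i].y, m_vertices[i].z);
            for (size_t i=0; i<m_lines.size(); i++)
            {
                std::fprintf(f, "l");
                for (size_t j=0; j<m_lines[i].size(); j++)
                    std::fprintf(f, " %u", (unsigned)(m_lines[i][j]+1));  // .obj is 1-based
                std::fprintf(f, "\n");
            }
        }
    private:
        size_t add (vertex v) { m_vertices.push_back(v); return m_vertices.size()-1; }
    };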

    Yet what would happen if, for a sufficiently complex object, or for a complex scene that might have doors, windows, mirrors, etc., I then create one camera for each point of view within the scene - or, perhaps better still, try binding multiple instances of the camera class to some of the more complex objects themselves? Or, in a so-called "attentional" system, I try to find a meaningful way to bind multiple camera-like objects to the so-called transformers within a generative AI. Maybe this recapitulates some of the earliest ideas of the programming language Smalltalk, in the sense that not only should you be able to tell an object to "draw itself", but also, if I wanted to have multiple regions of braided hair, each braid or bundle could take "ownership" or at least "possession" of a thread, let's say out of a thread pool - not just for concurrency reasons, but also for code readability and overall efficiency reasons, just so long as RAM is cheap.

    So, if a "groom" object might be associated with 100,000 hairs, for example, then how does sizeof(camera) affect performance if we bind the camera object to the object instead of to the view?

    Testing this code snippet, I find no obvious effects on the program's behavior, other than to learn that the current size of the camera class...

    Read more »

  • This May Be or May Knot Be Such a Good Idea?

    glgorman 02/19/2025 at 18:02 0 comments

    First, let's take a look back at something that I was working on last summer, and that is my very happy "Teapot in the Garden" rendering.

    How nice! Yes, how very nice indeed! Because I just saw something in the news the other day: NVIDIA has announced their own TensorFlow AI-based hair-drawing routines that supposedly allow for something like 240 FPS rendering, and stuff like that, i.e., with something like a 97% reduction in CPU/GPU utilization. O.K., now that THEY are finally getting their act together, I must contemplate, therefore: can we do better? Well, for now, I am happy with my shader - even if it is still mostly running on CPU + OpenGL. Yet here might be another possible use for Deep Seek, maybe? That is, if I can get Deep Seek running, with or without PyTorch, in my own framework. So let's do something else, therefore, like revisiting the whole theory of braids - since the shader is pretty much ready for use in production code.

    So yesterday I went on a coding binge - and started writing some code to do some simple GDI-based drawing of some braids, and for starters - I came up with a simple twist drawing routine.

    After playing with the code, I finally came up with this mess - which might not be as terrible as it looks, since at least I am at the point where the overall topology is mostly correct, even though there are obviously some alignment issues with how the different segments are being assembled, and this is something that I want to discuss further.

    But first, let's have another view along the x-axis, i.e., so that we are looking at a projection mostly onto the YZ plane.

    Now to understand what is happening here, take a look at the code for the "xyz" function:

    _vector twist::xyz (MATH_TYPE l, int i)
    {
        MATH_TYPE theta,x1,y1,dz;
        theta = l*(TWO_PI/180.0);
        x1 = m_width*(-cos(theta*0.5));
        if (i%2==0)
            y1 = m_radius*(1-cos(theta));
        else
            y1 = -m_radius*(1-cos(theta));
        dz = l*m_length*(1/360.0);
        _vector result (x1,y1,dz);   
        return result;
    }
    
    void braid::set_colors(int i)
    {
        const COLORREF r = (int)COLOR::red;
        const COLORREF g = (int)COLOR::green;
        const COLORREF b = (int)COLOR::blue;
        struct ctab
        {
            COLORREF c[3];
        };
        // all six permutations of the three strand colors
        ctab colors[] =
        {
            {r,g,b},
            {b,g,r},
            {g,b,r},
            {r,b,g},
            {b,r,g},
            {g,r,b},
        };
        c1 = colors[i].c[0];
        c2 = colors[i].c[1];
        c3 = colors[i].c[2];
    }

    Elsewhere, of course, I am calling the "xyz" function to find the values of some points along the control lines for my braid, while also calling another function in the same set of loops in order to decide what color to draw each segment with. The segment-coloring function is, of course, magic, since it relies on various permutations which I derived empirically and put into a lookup table. Yet for efficiency reasons this might be a better way, I hope, than trying to deduce some other magic formula that looks at the layer index and strand id and then uses division and/or modular arithmetic to determine the type of strand to draw, whether it goes over or under, and how to color it. So I'll live with it - for now. Yet clearly something needs to be done with that xyz function - and with how it is being invoked.

    Maybe I could treat the position of my control lines as complex numbers in the xy plane, and then treat the z-axis as time, and then compute the Fourier transform of the functions that define the position of each strand.  Then a little low-pass filtering might...

    Read more »

  • Time to jump right in and swim with the big fish?

    glgorman 02/06/2025 at 01:42 0 comments

    Alright, here goes. In an earlier series of projects, I was chatting with classic AI bots like ELIZA, MegaHAL, and my own home-grown Algernon. Hopefully I am on the right track - that is to say, if I am not missing out on something really mean in these shark-infested waters - and here is why I think so. Take a look at what you see here:

    This is what I got as of January 27 when I downloaded the source files from GitHub for Deep-Seek.  Other than some config files, it looks like all of the Python source fits quite nicely in 58K or less.  Of course that doesn't include the dependencies on things like torch, and whatever else it might need.  But model.py is just 805 lines.  I checked.  Now let's look at something different.

    This is, of course, a screenshot of a debugging session where I was doing things like counting the number of times that the word rabbit occurs in Alice in Wonderland, and so on. Maybe one approach to having a Mixture of Experts would require that we have some kind of framework. It is as if Porsche or Ferrari were to start giving away free engines to anyone, just for the asking, except that you have to bring your existing Porsche or Ferrari in to the dealer for installation, which would of course be "not free" - unless you could convince the dealership that you have your own mechanics, your own garage, etc., and don't really need help with much of anything - just give me the stupid engine!

    Well, in the software biz, what I am saying, therefore, is that I have built my own framework, and that it is just a matter of getting to the point where I can, more or less, drop in a different engine - assuming that I can capture and display images, tokenize text files, etc., all based on some sort of "make system", whether it is a Windows-based bespoke application, like what you see here, or whether it is based on things like Bash and make under Linux. Thus, what seems to be lacking in the worlds of LLaMA as well as Deep Seek is a content management system - something that can handle text and graphics, like this:

    Yet we also need to be able to do things like handle wavelet data when processing speech or music, or when experimenting with the spectral properties of different kinds of noise, for example:

    Of course, if you have ever tried writing your own DOOM WAD editor from scratch, you might be on the right track to creating your own AI ... one which can either learn to play DOOM, or else just might be able to create an infinite number of DOOM-like worlds. Of course, we all want so much more, don't we?

    Alright then, first let's take a peek at some of the source code for Deep-Seek and see for ourselves if we can figure out just what exactly it is doing, just in case we want a noise expert, or a gear expert, or something else altogether!

    Are you ready - silly rabbit?

    class MoE(nn.Module):
        """
        Mixture-of-Experts (MoE) module.
    
        Attributes:
            dim (int): Dimensionality of input features.
            n_routed_experts (int): Total number of experts in the model.
            n_local_experts (int): Number of experts handled locally in distributed systems.
            n_activated_experts (int): Number of experts activated for each input.
            gate (nn.Module): Gating mechanism to route inputs to experts.
            experts (nn.ModuleList): List of expert modules.
            shared_experts (nn.Module): Shared experts applied to all inputs.
        """
        def __init__(self, args: ModelArgs):
            """
            Initializes the MoE module.
    
            Args:
                args (ModelArgs): Model arguments containing MoE parameters.
            """
            super().__init__()
            self.dim = args.dim
            assert args.n_routed_experts % world_size == 0
            self.n_routed_experts = args.n_routed_experts
            self.n_local_experts = args.n_routed_experts // world_size
            self.n_activated_experts = args.n_activated_experts
            self.experts_start_idx = rank * self.n_local_experts
            self.experts_end_idx = self.experts_start_idx + self.n_local_experts
    ...
    Read more »

