
Deep Sneak: For Lack of a Better Name

The mixture-of-experts and multi-model meme is in play, so let's let the reindeer games begin, even if we missed Christmas with DeepSeek.

We have all read the news by now, and no one quite knows where this train is headed yet. Still, somebody is going to be the very first to try to get DeepSeek's R1 model to run on a Raspberry Pi or a Propeller, or otherwise explore just how low we can go on any and all variations of Arduino hardware. What will it be able to do? Shoot pool? Perform DJ beat matching? Find the love of your life? Run for president?

As we all know by now, DeepSeek launched recently and has rapidly overtaken GPT and all the rest as the leading AI in terms of cost efficiency, to say the very least.  But will it run on a Propeller, or a Pi, or on an Android device for that matter?  And what about standalone support under Windows?  Obviously, there will be many applications where big-data users might not want to run their proprietary data on some third-party cloud, even if it is Google's cloud, Amazon's whatever, Microsoft's Azure, or something else.

So, whether it is big pharma, or the NSA, or any of the many others who know they can never really be sure whether it is safe to run their special-needs AI applications on an outsourced platform: anything that brings down the cost of getting one's own private, trusted, and hopefully secure cloud addresses an essential need that the current marketplace does not.

Likewise, I think that if an even better AI could be made to run on something like a Pi cluster, that would not only be interesting in and of itself, but also useful to the community at large as better methods for running distributed workloads are developed.  Bluesky is another thing that comes to mind right now, since Bluesky is built on the Authenticated Transfer (AT) Protocol, and apparently there are already people claiming that the AT Protocol will run on a Raspberry Pi, whether as part of the Bluesky network or on a completely separate network.  So this could get very interesting indeed.

What if we could build a Pi cluster and create an echelon of chat-bots, where each bot is a so-called expert on at least one subject, while otherwise having some general conversational ability?  That does not mean that I want to turn an army of bots loose on Bluesky, of course.  Rather, I am thinking along a path where it might be possible to train multiple bots on different subjects and see how well they do if they are DeepSeek R1 based, versus prior approaches.  Then we turn the bots loose on each other, and let them learn from each other.
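As a thought experiment, dispatching a question to the right expert bot in such an echelon could start out as crude as keyword scoring. Everything below is a hypothetical sketch (the topic names, the keyword sets, the scoring rule are all made up for illustration); a real mixture-of-experts gate is learned, not hand-coded:

```python
# Hypothetical sketch: route a prompt to the best-matching "expert" bot
# by naive keyword overlap. Real MoE routing is a learned gating network.
EXPERTS = {
    "audio":    {"beat", "tempo", "dj", "music", "wavelet"},
    "games":    {"doom", "wad", "pool", "billiards"},
    "embedded": {"raspberry", "pi", "propeller", "arduino"},
}

def route(prompt: str) -> str:
    words = set(prompt.lower().split())
    # Score each expert by how many of its keywords appear in the prompt,
    # then hand the prompt to the highest-scoring expert.
    scores = {name: len(words & kw) for name, kw in EXPERTS.items()}
    return max(scores, key=scores.get)

print(route("Write a DOOM wad editor"))  # → games
```

A learned gate would replace the keyword sets with trained weights, but the shape of the problem, score every expert and pick the winners, is the same.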

This is, of course, an ambitious project, one that will not produce instant results overnight.  Yet let's embrace the chaos and see what happens anyway.

  • Time to jump right in and swim with the big fish?

glgorman, 12 hours ago

Alright, here goes.  In an earlier series of projects, I was chatting with classic AI bots like ELIZA, MegaHAL, and my own home-grown Algernon.  Hopefully I am on the right track, that is to say, not missing something really mean in these shark-infested waters, and here is why I think so.  Take a look at what you see here:

This is what I got as of January 27, when I downloaded the source files for DeepSeek from GitHub.  Other than some config files, it looks like all of the Python source fits quite nicely in 58K or less.  Of course, that doesn't include dependencies like torch and whatever else it might need.  But model.py is just 805 lines.  I checked.  Now let's look at something different.

This is, of course, a screenshot of a debugging session where I was doing things like counting the number of times the word rabbit occurs in Alice in Wonderland, and so on.  Maybe one approach to having a mixture of experts would require some kind of framework.  It is as if Porsche or Ferrari were to start giving away free engines to anyone just for the asking, except that you have to bring your existing Porsche or Ferrari in to the dealer for installation, which would of course be "not free", unless you could convince the dealership that you have your own mechanics, your own garage, etc., and don't really need help with much of anything.  Just give me the stupid engine!
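    For what it's worth, the rabbit-counting exercise itself is nearly a one-liner in Python. This is just a sketch run against an inline sample sentence (the bespoke Windows tool in the screenshot does much more):

```python
import re

def count_word(text: str, word: str) -> int:
    # Case-insensitive whole-word match: "Rabbit" and "rabbit" both count,
    # but "rabbits" embedded in a longer word does not.
    return len(re.findall(rf"\b{re.escape(word)}\b", text, re.IGNORECASE))

sample = "The White Rabbit ran by; Alice followed the rabbit down the hole."
print(count_word(sample, "rabbit"))  # → 2
```

Pointing the same function at the full Project Gutenberg text of Alice in Wonderland is just a matter of reading the file first.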

Well, in the software biz, what I am saying, therefore, is that I have built my own framework, and that it is just a matter of getting to the point where I can, more or less, drop in a different engine.  That assumes I can capture and display images, tokenize text files, etc., all based on some sort of "make system", whether it is a bespoke Windows application like what you see here, or whether it is based on things like Bash and make under Linux.  Thus, what seems to be lacking in the worlds of LLaMA as well as DeepSeek is a content management system, something that can handle text and graphics, like this:
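    A minimal text-tokenizing pass of the kind such a framework would need might look like the following. This is a whitespace-and-punctuation sketch for illustration only, not DeepSeek's actual byte-pair-encoding tokenizer:

```python
import re

def tokenize(text: str) -> list[str]:
    # Split lowercase text into runs of letters/digits, with each
    # punctuation character kept as its own token.
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())

print(tokenize("Curiouser and curiouser!"))
# → ['curiouser', 'and', 'curiouser', '!']
```

A real LLM tokenizer maps these further into subword units and integer ids, but a pass like this is where a home-grown content pipeline would start.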

Yet we also need to be able to handle things like wavelet data, when processing speech or music, or when experimenting with the spectral properties of different kinds of noise, for example:
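    As a rough illustration of the noise-spectrum idea, here is a quick numpy sketch that generates white noise and reshapes it into approximate pink (1/f power) noise by scaling each frequency bin's amplitude by 1/sqrt(f). This is a toy under stated assumptions, not the tooling shown in the screenshot:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
white = rng.standard_normal(n)

# Shape white noise into approximate pink noise: divide each FFT bin's
# amplitude by sqrt(f) so the power spectrum falls off as 1/f.
spectrum = np.fft.rfft(white)
freqs = np.fft.rfftfreq(n)
freqs[0] = freqs[1]                      # avoid divide-by-zero at DC
pink = np.fft.irfft(spectrum / np.sqrt(freqs), n)

def low_freq_fraction(x):
    # Fraction of total spectral power in the lowest eighth of the bins.
    p = np.abs(np.fft.rfft(x)) ** 2
    return p[: len(p) // 8].sum() / p.sum()

# Pink noise concentrates far more of its energy at low frequencies.
print(low_freq_fraction(white) < low_freq_fraction(pink))  # → True
```

The same frequency-domain reshaping trick gives brown noise with a 1/f exponent of 2, blue noise with a rising spectrum, and so on.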

Of course, if you have ever tried writing your own DOOM WAD editor from scratch, you might already be on the right track to creating your own AI, one which can either learn to play DOOM, or else just might be able to create an infinite number of DOOM-like worlds.  Of course, we all want so much more, don't we?

Alright then, first let's take a peek at some of the source code for DeepSeek and see for ourselves if we can figure out just what exactly it is doing, just in case we want a noise expert, or a gear expert, or something else altogether!

    Are you ready - silly rabbit?

    class MoE(nn.Module):
        """
        Mixture-of-Experts (MoE) module.
    
        Attributes:
            dim (int): Dimensionality of input features.
            n_routed_experts (int): Total number of experts in the model.
            n_local_experts (int): Number of experts handled locally in distributed systems.
            n_activated_experts (int): Number of experts activated for each input.
            gate (nn.Module): Gating mechanism to route inputs to experts.
            experts (nn.ModuleList): List of expert modules.
            shared_experts (nn.Module): Shared experts applied to all inputs.
        """
        def __init__(self, args: ModelArgs):
            """
            Initializes the MoE module.
    
            Args:
                args (ModelArgs): Model arguments containing MoE parameters.
            """
            super().__init__()
            self.dim = args.dim
            assert args.n_routed_experts % world_size == 0
            self.n_routed_experts = args.n_routed_experts
            self.n_local_experts = args.n_routed_experts // world_size
            self.n_activated_experts = args.n_activated_experts
            self.experts_start_idx = rank * self.n_local_experts
            self.experts_end_idx = self.experts_start_idx + self.n_local_experts
    ...
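    The gate attribute in the listing above is what picks which experts fire for a given token. Stripped of the distributed bookkeeping (world_size, rank, and the local expert slice), top-k routing reduces to something like this numpy sketch. The weights here are random placeholders, and this is a simplification, not DeepSeek's exact gate, which adds bias terms and expert grouping on top:

```python
import numpy as np

def top_k_route(x, gate_weights, k=2):
    """Score every expert for input x, keep the top-k, and softmax their
    scores so the selected experts' outputs can be mixed as a weighted sum."""
    scores = x @ gate_weights             # one affinity score per expert
    top = np.argsort(scores)[-k:]         # indices of the k best experts
    w = np.exp(scores[top] - scores[top].max())
    return top, w / w.sum()               # expert ids, mixing weights

# Toy dimensions: 8-dim token features, 4 experts, activate 2 per token.
rng = np.random.default_rng(1)
dim, n_experts = 8, 4
gate_weights = rng.standard_normal((dim, n_experts))
x = rng.standard_normal(dim)

experts, weights = top_k_route(x, gate_weights, k=2)
print(len(experts), weights.sum())        # 2 experts, weights summing to 1
```

In the real module, each selected expert (a small feed-forward network) processes the token and the results are summed with these mixing weights, plus the output of the always-on shared experts.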