
RAIN Mark II Supercomputer Trainer

High-Performance Computing for Everyone

RAIN is an open-source project to design and build open, efficient, and accessible supercomputers. RAIN's mission is to make supercomputing accessible to a wider and more diverse audience and to encourage the development of new, innovative, and compelling high-performance applications.

This Hackaday.io project is focused on the RAIN Mark II Supercomputer Trainer. The goal of this phase of the RAIN project is to create a small, inexpensive computer with the same architecture as a large-scale cluster, to facilitate learning and the development of new high-performance applications while making the system practical for a wide range of programmers, designers, and users to own and operate.

Completing the Mark II SCT will provide a platform for research and development on the next phase of the project, which will make the design even more open (switching from ARM to RISC-V) and more powerful.

Up until now I've been documenting my work on RAIN (formerly known as *Raiden*) on my blog at https://jjg.2soc.net/category/rain/.  This includes information about the previous phase of the project (Mark I) and longer discussions of the philosophy behind the work (and why I think it matters).

Here I'm going to focus on documenting ongoing work on a specific machine, the RAIN Mark II Personal Supercomputer. 

Mark II is an 8-node ARM computing cluster using a Gigabit Ethernet interconnect.  It combines PINE64 SOPINE modules with a PINE A64 single-board computer to create a distributed-memory supercomputer with up to 32 ARM cores, 16 vector/GPU cores, and 16 GB of memory in a small, clean (and, I think, beautiful), easy-to-use package.

In addition to the hardware, the RAIN project is about making the creation of new high-performance applications approachable by people with little or no experience with supercomputers.  My goal is to do for supercomputers what the personal computer did for mainframes.  This means more than putting the box on someone's desk; it also means giving people the tools to use it the way they want to.

The design has gone through many revisions, and I've recently abandoned some of my own work in favor of the PINE64 Clusterboard.  I didn't take this change lightly, since my designs for assembling arrays of PINE A64 boards are more flexible and scalable than the Clusterboard, but the board is so perfectly suited to Mark II's design and meets my goals for the machine with so little compromise that it's the obvious choice.

This means I can get the machine operational faster, and design a machine that is much easier for others to reproduce (sticking to off-the-shelf components is one of the design goals for Mark II).  All of this means I can turn my attention to the software side of the system sooner and I think that's where I have the most value to provide anyway.

panel_interface_pcb_1.4.0.pdf

Interface electronics to connect Clusterboard/SOPINE modules to front-panel controls (pcb)

Adobe Portable Document Format - 207.65 kB - 04/28/2018 at 15:32


panel_interface_schematic_1.1.0.pdf

Interface electronics to connect Clusterboard/SOPINE modules to front-panel controls (schematic)

Adobe Portable Document Format - 44.82 kB - 04/28/2018 at 15:30


a64_bracket_v1.3.stl

Magnetic mount for A64 board (rear end)

sla - 266.81 kB - 03/15/2018 at 14:45


a64_bracket_face_v1.3.stl

Magnetic mount for A64 board (face end)

sla - 280.46 kB - 03/15/2018 at 14:43


clusterboard_mounting_arm_v1.5.stl

Magnetic mount for Clusterboard (4 needed)

sla - 297.94 kB - 03/15/2018 at 14:43



  • Mark IIb and breaking the 20 Gflops barrier

    jason.gullickson • 09/28/2018 at 13:53

    Now that Mark II has been operational more regularly, I’ve noticed that working on it in the current case is really inconvenient. Something with a removable top would make a lot more sense (which is why machines of a similar design had this feature), but when I started designing Mark II I couldn’t find anything like that which would work with the original full-size PINE A64-based design.

    After switching to the Clusterboard, I took another look and found a case with a removable top that might work. I over-analyzed the specs and finally broke-down and just ordered one to measure the fit directly.


    The new case is considerably smaller in the “horizontal” direction, larger in the “vertical” one (I put these in quotes because the way length/width/height are measured makes this... difficult). The Clusterboard and one A64 do fit inside this new case, but it’s a pretty tight fit.


    I took this downtime opportunity to try a few other modifications I’d been putting-off while the machine was working. The first was to try a different Linux build to test my theories about performance limitations found previously. I’ve made a few attempts at “swapping out” the kernel of the Armbian images I’ve been using on the cluster nodes but the only result was “bricking” a couple of them. After that I decided it was worth trying the “stock” Ubuntu build that PINE64 provides for the SOPINE module just to see what happens. I thought I read that this would be a non-starter due to issues with the network interface and the Clusterboard, but figured it was worth a shot.

    As it turned out, the image worked fine. I was able to boot up a single node using the new image, configure it for the cluster, and run HPL on it with no obvious problems. I was, however, able to see that thermal management was working (unlike on the previous build) and that it was throttling down the CPU to keep overheating at bay. This gave me a reason to try some active thermal management.

    I’d picked up some fat little copper heatsinks for the SOPINE modules a while back, but since the previous Linux build didn’t produce any temperature feedback, it wasn’t clear that they would make any difference. Now that I can see the CPU getting throttled due to overheating, it makes sense to give them a try.


    Using the heatsinks alone didn’t reduce the incidence of throttling much, so I coupled that with a (very janky) fan setup and tested again. This made a big difference, and in fact completely eliminated the CPU throttling messages from the logs.


    This looked pretty good, so I pulled the rest of the cluster nodes, re-imaged their SD cards and installed heatsinks. The only problem I ran into bringing these online was the hard-coded MAC address in the O/S image, which resulted in only one of the nodes being accessible via the network. Once I identified the problem it was a straightforward fix to make each node re-generate its MAC address, and they all came up correctly.
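    If you hit the same cloned-image problem, here’s an illustrative sketch (not the fix I actually used, which simply let each node re-generate its own address) of one way to derive a stable, locally-administered MAC per node from a node identifier, so freshly imaged SD cards don’t collide on the network:

    ```python
    # Illustrative only: derive a unique, locally-administered MAC per node so
    # cloned O/S images don't all share the same hard-coded address.
    # The hashing approach and the node_id source are assumptions for this sketch.
    import hashlib

    def node_mac(node_id: str) -> str:
        digest = hashlib.sha256(node_id.encode()).digest()
        # 0x02 sets the "locally administered" bit and keeps the address unicast.
        octets = [0x02] + list(digest[:5])
        return ":".join(f"{o:02x}" for o in octets)

    print(node_mac("sopine-node-3"))   # e.g. 02:xx:xx:xx:xx:xx, stable per node_id
    ```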

    Now that I had a recipe for setting up the cluster nodes to run HPL, doing so went pretty smoothly and I was able to run a full-power test in short order. The results speak for themselves: 22.12 Gflops.

    This is still a long way from my target of 50 Gflops, but breaking through the previous ceiling is encouraging, especially because I was able to correctly identify the cause of the limitation. Next steps will be to spend some time tuning the contents of this new O/S image, as well as the overall configuration, now that the cluster nodes can run at full clock speed (or perhaps beyond…?).

    I’m also very happy with how the new case is working out, and I’m planning on re-designing the front panel to fit the new case. This design will be called “Mark IIb”, and I have a number of changes in mind beyond simply resizing the panel, but more on that later…

  • Corrections

    jason.gullickson • 06/27/2018 at 20:02

    There were a number of errors in my last post.

    Error #1

    The most egregious was in my interpretation of the High-Performance Linpack (hpl) results. I was mystified by the fact that as I added nodes to the cluster, performance would improve to a point and then suddenly drop off. I attributed this to some misconfiguration of the test, or perhaps a bottleneck in the interconnect, etc. In the back of my mind I had a nagging idea that there was some sort of “order-of-magnitude” error, but I couldn’t see it. Then, while reading a paper by a computer scientist writing about running hpl, it jumped out at me. The author said they measured 41.77 Tflops, but when I looked at their hpl output it looked like 4.1 Tflops…

    Then I realized my mistake.

    Taken from “HowTo – High Performance Linpack (HPL)” by Mohamad Sindi

    There are a few extra characters tacked on to the Gflops number at the end of hpl’s output:

    1.826e+01

    When I first started using hpl, that last bit was always e+00, so I didn’t pay any attention to it. At some point, though, it went from e+00 to e+01 without me noticing, and that change is significant: the suffix is scientific notation, so 1.826e+01 means 1.826 × 10, or 18.26 Gflops, not 1.826.
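    To make that concrete, here’s a tiny sketch that pulls the Gflops field out of an hpl result line (the column layout shown is the typical T/V, N, NB, P, Q, Time, Gflops ordering; the numbers are made up for illustration):

    ```python
    # Interpret the Gflops field from an hpl result line.
    # hpl prints the value in scientific notation, so the e+NN suffix matters.
    def parse_gflops(result_line: str) -> float:
        return float(result_line.split()[-1])   # last column is Gflops

    line = "WR11C2R4   35840   128    4    4   490.52   1.826e+01"
    print(parse_gflops(line))   # prints 18.26: e+01 multiplies by ten, so ~18 Gflops, not ~1.8
    ```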

    Since I didn’t notice this, I was mis-reading my results and doing a lot of troubleshooting to try and figure out why my performance dropped through the floor after I added a fifth node to the cluster. Once I realized the mistake I went back and reviewed the Mark I logs and sure enough, that mistake sent me on a wild goose chase and resulted in a seriously erroneous assessment of Mark I’s performance.

    The good news is that this means Mark II is considerably faster than I thought it was when I wrote about it earlier. The bad news is that this means Mark I is also faster, and my conclusion that Mark II was exceeding Mark I’s performance was incorrect. Knowing what I know now, Mark I was roughly 4x faster than Mark II in the 4×4 configuration.

    Performance parity between the two systems seemed too good to be true, so in a way this is a relief. It’s a little disappointing, but it doesn’t undermine the value of Mark II, because in terms of physical size, cost and power consumption Mark II easily out-paces Mark I. I don’t have detailed power consumption measurements for either system yet, but if we compare their theoretical maximum power consumption, it’s pretty clear that in terms of watts per Gflop, Mark II is an improvement (a quick check of the arithmetic follows below):

    • Mark I: 40 Gflops, 4,000 watts (800 watts * 5 chassis) = 100 watts per Gflop
    • Mark II: 18 Gflops, 75 watts (15 amps @ 5 volts * 1 chassis) = 4 watts per Gflop
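    This is just the arithmetic above, written out:

    ```python
    # Quick check of the watts-per-Gflop figures listed above.
    systems = {
        "Mark I":  {"gflops": 40, "watts": 800 * 5},   # 5 chassis at ~800 W each
        "Mark II": {"gflops": 18, "watts": 15 * 5},    # 15 A at 5 V, single chassis
    }
    for name, s in systems.items():
        print(f"{name}: {s['watts'] / s['gflops']:.1f} W per Gflop")
    # prints: Mark I: 100.0 W per Gflop, Mark II: 4.2 W per Gflop
    ```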

    Error #2

    The second mistake I made relates to the clock speed of Mark II’s compute nodes. The SOPINE module’s maximum clock speed is 1.2 GHz, and this is what I used when determining the theoretical peak (Rpeak) and efficiency values for the machine. I knew this speed might be reduced during the test due to inadequate cooling (CPU frequency scaling), but I thought it was a reasonable starting point.

    Based on this, I was disappointed to find that no matter how hard I tweaked the hpl configuration, I couldn’t break 25% efficiency. I even added an external fan to see if extra cooling could move the needle, but it seemed to have no effect. I took a closer look at one of the compute nodes during a test run, and that’s when the real problem became apparent.

    I wasn’t able to find the CPU temps in the usual places, and I couldn’t find the CPU clock speed either. I had read something about issues like this being related to kernel versions, so I looked into which kernel the compute nodes were running. As it turns out, they are running the “mainline” kernel, and the kernel notes on the Armbian website state that CPU frequency scaling isn’t supported in this kernel. This means the OS can’t slow down the CPU if it gets too hot, so the clock is locked at a very low speed for safety’s sake. From what I can tell that speed is 408 MHz, around 1/3 of the speed I expected the CPU's...
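    For reference, these are the “usual places” I mean. A minimal sketch that reads them, assuming the standard Linux cpufreq and thermal sysfs paths (which simply aren’t there when the kernel doesn’t support them):

    ```python
    # Read current CPU clock and temperature from the standard sysfs locations.
    # On a kernel without cpufreq/thermal support these files don't exist.
    from pathlib import Path

    def read_sysfs(path):
        p = Path(path)
        return p.read_text().strip() if p.exists() else None

    freq_khz = read_sysfs("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")
    temp_mc = read_sysfs("/sys/class/thermal/thermal_zone0/temp")

    print("clock:", f"{int(freq_khz) / 1000:.0f} MHz" if freq_khz else "no cpufreq support")
    print("temp: ", f"{int(temp_mc) / 1000:.1f} C" if temp_mc else "no thermal zone exposed")
    ```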


  • Back in Bl...ue

    jason.gullickson • 06/22/2018 at 19:26

    Thanks to a gracious gift of gadgetry for Father’s Day, I was able to spend last weekend increasing Mark II’s computing power to maximum capacity!


    Not only did I have the parts, but I also had the time (again thanks to Jamie & Berty) and had very ambitious plans for pushing Mark II across the finish line.

    Things didn’t go quite as planned (I spent one of the two days fighting O/S problems of all things) but I was able to get all 8 nodes online by the end of the weekend.


    Since then I’ve been working on running High-Performance Linpack (hpl) to see how Mark II’s performance stacks up against Mark I. After a few runs, I was not only able to match the best results I achieved with Mark I, but to slightly exceed them.

    The best hpl performance I could get out of Mark I was 9.403 Gflops. Using a similar hpl configuration (adjusted for the reduced memory size per node), I was able to get 9.525 Gflops out of Mark II. On both machines this result was achieved with a 4×4 configuration (4 nodes, 16 cores), so even though it wasn’t the maximum capacity of either machine, it’s a pretty good apples-to-apples comparison between the two architectures.

    That said, there’s more performance to be had. I’ve learned a lot about running hpl since I recorded Mark I’s results, and I’m confident I can figure out what was stopping me from improving the numbers by adding nodes back then. In addition to better hpl tuning skills, I’m able to run all nodes of Mark II without throwing a circuit breaker, which was not the case with Mark I. This makes iterating on hpl configs a lot easier.
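    For anyone following along, this is roughly how a run gets sized. The 80% memory fraction, the NB value and the helper below are common rules of thumb rather than my exact HPL.dat:

    ```python
    # Rule-of-thumb HPL sizing (not the actual HPL.dat used here): size the
    # N x N double-precision matrix to ~80% of total RAM, round N down to a
    # multiple of the block size NB, and pick a near-square process grid P x Q.
    import math

    def hpl_suggestion(nodes, ram_gb_per_node, ranks_per_node, nb=192, mem_fraction=0.8):
        total_bytes = nodes * ram_gb_per_node * 1024**3 * mem_fraction
        n = int(math.sqrt(total_bytes / 8))          # 8 bytes per double
        n -= n % nb                                  # round down to a multiple of NB
        ranks = nodes * ranks_per_node
        p = max(d for d in range(1, int(math.sqrt(ranks)) + 1) if ranks % d == 0)
        return {"N": n, "NB": nb, "P": p, "Q": ranks // p}

    # 4 SOPINE nodes, 2 GB RAM each, one MPI rank per core (the 4x4 grid above)
    print(hpl_suggestion(nodes=4, ram_gb_per_node=2, ranks_per_node=4))
    # {'N': 29184, 'NB': 192, 'P': 4, 'Q': 4}
    ```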

    The theoretical maximum hpl performance (Rpeak) for Mark II is 76 Gflops. Achieving this is impossible due to memory constraints, transport overhead, etc., but I think 75% of Rpeak is not unreasonable, which would yield a measured peak performance (Rmax) of around 57 Gflops. This would rank Mark II right around the middle of the Top 500 Supercomputer Sites in 1999.
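    For the curious, the back-of-the-envelope math behind that Rpeak figure looks like this. The 2 double-precision flops per cycle per Cortex-A53 core is my assumption, but it reproduces the ~76 Gflops number:

    ```python
    # Back-of-the-envelope Rpeak for the full 32-core cluster.
    cores = 8 * 4                # 8 SOPINE modules, 4 Cortex-A53 cores each
    clock_ghz = 1.2              # SOPINE maximum clock
    flops_per_cycle = 2          # assumed: one double-precision FMA per cycle

    rpeak = cores * clock_ghz * flops_per_cycle
    print(f"Rpeak ~ {rpeak:.1f} Gflops")                 # ~76.8 Gflops
    print(f"75% of Rpeak ~ {0.75 * rpeak:.1f} Gflops")   # ~57.6 Gflops, the Rmax target
    ```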

    RAIN Mark II’s hpl performance should be similar to the room-sized, 300 kW Cray T3E (example pictured above)

    Not too shabby for a $500.00 machine you can hold in your hand.


    Still, there’s a long way to go from my current results of not-quite 10 Gflops to 57. I think there’s a lot to improve in how I’m configuring hpl, and I also think there is work to do at the O/S and hardware level. Minimally, I’m going to need to increase the machine’s cooling capacity (right now I don’t even have heatsinks on the SoCs, so I know the CPU throttle is kicking in). So if I’m able to find an hpl config that doesn’t lose performance as I add nodes to the test, and I can keep the system cool enough that CPU throttling doesn’t kick in, I should be able to get much closer to that Rmax value.

    But even with no improvement, this test confirms the validity of the Mark II design. The original goal for this phase of the project was to establish the difference in performance between the Mark I machine and a similar cluster built from ARM single-board computers. The difference, surprisingly, is that the ARM machine has a slight node-for-node edge over the “traditional” Mark I.

    Perhaps just as important, Mark II achieves this with significantly less cost, physical size, thermal output and electricity consumption. Once I complete the power supply electronics for Mark II I’ll be able to get more precise measurements of power consumption, but even if Mark II operates at its maximum power consumption, that’s still 1/10 of the power consumed by Mark I.

    This is a significant milestone in the RAIN project and marks the “official” end of the Mark II phase. I plan to finish Mark II’s implementation (wiring front-panel controls for all nodes, designing & fabricating the front-end interface board, etc.) and generate a “reference design” for others who would like to recreate my results (perhaps even offer a kit?), but with these results in hand I can confidently enter into the Mark III...


  • Pump the Brakes

    jason.gullickson • 05/21/2018 at 15:54


    Looks like work on RAIN Mark II will be slowing-down a bit for a couple of reasons:

    First, the snow has receded which means work that can only be done during the month we call “summer” takes priority over anything that can be done inside (plus there’s just lots of fun things to do outside…).

    Second, it looks like I misunderstood how the 2018 Hackaday Prize works. I assumed that having your project in the top 20 positions of the leaderboard when a challenge concluded was what they meant by “The top twenty projects from each challenge will be awarded $1000 and will move on to the finals…”. Instead, they selected 20 projects using some other process, and RAIN Mark II didn’t make that cut.

    This is a bummer, because I invested time and effort promoting the competition with the misguided idea that doing so could result in injecting some resources into the project (which could have dramatically accelerated its progress). I should have spent this time working on the project instead.

    But it’s a good reminder to me of the pitfalls of competition, and that any project whose success relies on it is vulnerable to competition’s inherent inefficiencies. I’m glad to have had an opportunity to have these inclinations put in-check while the stakes are lower than they would be later on in the project.

    The whole experience has caused me to reflect on the purpose of the project itself and I have a renewed focus as a result. The next step for Mark II will be to complete the assembly of an ARM-based cluster with the same node count as the Intel-based Mark I machine. Once this is complete, I can duplicate Mark I’s run of the hpl benchmark on Mark II and have an apples-to-apples comparison of the performance difference between the two architectures. This was the original purpose of building Mark II, and once this is known it will be possible to describe an ARM-based system with equivalent power to an Intel-based system and determine at what scale ARM outperforms Intel in terms of processing power vs. total system efficiency (cost, power consumption, cooling, physical space, etc.).

    This will complete the work on the hardware side of Mark II. I can then move on both to the software aspect of Mark II and to using the Mark II hardware as a platform for the development of Mark III hardware components.

    I’ll need around $250.00 worth of hardware to get the system to a point where these tests can be run. I’m selling-off the Mark I hardware in an effort to cover it, but it’s indeterminate how long this will take and as such progress will stall until this is complete.

    Thank-you to everyone who took the time to support the project on Hackaday.io.

  • Names

    jason.gullickson • 04/28/2018 at 15:03

    Based on feedback, conversations and thoughts about the short- and long-term goals of the RAIN project, I’ve decided to make some changes to the naming convention of the series. I’m renaming RAIN Mark II from Personal Supercomputer to Supercomputer Trainer.


    The term “trainer” refers to the education-oriented machines and systems typical of the ’70s and ’80s, and I think it suits both the form and function of the current iteration of the Mark II machine well. While the overall goal of the RAIN project is to produce an open-source supercomputer, there is a lot of dispute as to exactly what defines a supercomputer, and using the term in an unqualified way to refer to Mark II machines appears to be controversial.

    So, instead of wasting time arguing semantics, I think renaming the machine puts an end to that discussion while making its objectives clearer. This also helps me focus on the goals of this stage of the project, and might help the machine connect with the most appropriate audience as well.

    In addition to renaming the current project, I’m also going to begin using the term “Type” to refer to a specific incarnation of each machine in the series. For example, the Supercomputer Trainer will now be referred to as “Type 1”. If another machine is designed as part of the Mark II series, it will be referred to as “Type 2”. This makes it clearer where each machine belongs in the overall evolution of the RAIN project (now that I’m producing custom hardware and electronics, I need a simpler way to relate parts to the machines they belong to).

    I’ve begun this renaming process which has resulted in some changes in the source repository. Updates to specific components (the front panel, for example) will be applied as new versions of the component designs are produced.

  • Beginner’s Luck (KiCad Part 2)

    jason.gullickson • 04/23/2018 at 18:12

    In the last log I left off with a 2-D printed version of the printed circuit board I designed in KiCad, which was enough to confirm that the board would fit the board I’m designing it for. This felt like a lot of progress!


    However, there was still a lot to do, primarily making the actual electrical connections between the components on the board. For some dumb reason I thought this would be easy, but I ran into problems getting the traces to connect, preventing them from overlapping, and so on.

    Part of the problem is that I have no real experience in the more abstract (i.e., not related to learning the software) aspects of PCB design. Luckily Bob was willing to help me out and gave me a list of steps to use as a starting point when laying out a board:

    1. Lay out the connectors where they have to go
    2. Logically place groups of components where they need to go (power section should be in one area, microcontroller + decoupling caps in another area, etc.)
    3. Refine component placement to have as few overlapping airwires as possible to ease routing
    4. Route length sensitive traces first
    5. Route other traces
    6. Route power traces, but do a ground pour to make it easier
    7. Tidy up
    8. Lay out silkscreen and be as descriptive as possible

    In addition to this, Bob said traces could be 6 mil wide minimum, but to aim for 10 mil in general and 12 mil for power.

    This advice helped a lot, and between it and finding a “mode” for laying traces that worked for me, I was able to connect all the parts and get the board to pass the Design Rules Check (DRC). Bob eyeballed my layout, gave me some tips on improving it, and said it looked like it would work.


    It was around this time that I noticed an error in the pin assignment where the panel driver board connects to the headers on the Clusterboard. The driver board will connect using a right-angle connector, but I had designed it with a straight connector in mind, so pin 1 would end up in the wrong place. This was easily fixed by rotating the connector on the board, but since this connector has traces leading to almost every other component, the layout became a complete mess. Instead of fighting with it, I took the opportunity to redo the board from scratch and apply everything I’ve learned so far.

    The second time around went much faster and I think it looks nicer as well.


    After reviewing the design (for what felt like the millionth time), I was reasonably sure I hadn’t made another mistake like the connector one, and decided it was ready to upload to OSH Park for production.


    This process went very smoothly. I don’t have any personal experience to compare it to (this being the first board I’ve designed to be produced this way), but the website was easy to use, the visualizations and “checklist” guides were very helpful, and I felt like I had a clear idea of what the final product would look like when they were done.

    Now it’s just a matter of waiting.


    The boards showed up a few days early and I was lucky enough to have time to try one out (the minimum order was 3 boards).  I had only enough parts on-hand to complete one, but this was intentional because I assumed that I would have made some mistakes and that I could order more parts after I fixed the design and re-ordered a second batch of boards.


    After assembly, I tried to quell my excitement and properly check-out the board in steps (inspired by Bob’s article) to reduce the chances of burning-up the new panel driver board or worse, the precious Clusterboard.

    First, I tested the driver board alone to make sure power was flowing to the right places. Then I attached it to the Clusterboard and checked the i2c bus to see if it was showing up correctly. Finally, I wired up the LEDs and toggle switch and ran the Python script.
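    I won’t reproduce the actual script here, but as a rough illustration of that last step, here’s a hypothetical sketch that blinks LEDs through an 8-bit I2C I/O expander (the PCF8574-style part, its 0x20 address and the bus number are assumptions, not necessarily what’s on the panel driver board):

    ```python
    # Hypothetical check-out sketch, not the actual panel-driver script.
    # Assumes the front-panel LEDs sit behind a PCF8574-style 8-bit I2C expander
    # at address 0x20 on bus 1; the real board's chip, address and bus may differ.
    import time
    from smbus2 import SMBus

    EXPANDER_ADDR = 0x20   # assumed I2C address of the LED expander

    with SMBus(1) as bus:
        for pattern in (0x01, 0x02, 0x04, 0x08, 0xFF, 0x00):
            bus.write_byte(EXPANDER_ADDR, pattern)   # each bit drives one LED
            time.sleep(0.2)
    ```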


    Much to my surprise, the LEDs lit up and the switch correctly toggled between display modes!

    Something was not quite right...


  • It's time for me to Ki(Cad)

    jason.gullickson • 04/09/2018 at 16:38

    I’ve been putting off learning to use an EDA for quite a while. You see, I am old, and I learned how to read and draw schematics when I was young, so I learned how to do it on paper, and it’s very hard to give up doing something you can do fast and well for something that is slow, frustrating, and that you suck at.

    In the previous RAIN PSC design, I could get away with using the plentiful GPIO pins available through the PINE A64’s 40-pin connector to drive the front-panel LEDs and switches. However, adopting the Clusterboard demanded a different approach, and I settled on developing an i2c-based interface. This has worked well on the breadboard, but it requires building an interface board for each node, and fabricating something like that out of perfboard (or some other “homebrew” technique) repeatedly is not very practical.

    8 of these are probably not going to fit…

    Regardless, I thought I could at least cobble together a prototype of the driver board for a single node, so I could pack everything inside the case for a while. A couple of weeks passed as I tried various options, and I’ve come to accept that designing a custom board is the only way to go, and that the best way to go about it is getting my design into an EDA.

    I’ve put this off for a long time because I’ve made several attempts at learning several EDAs and it’s always been frustrating. There’s a lot of upside to learning to use these tools, but paper and pencil is so fast and so natural for me that it’s really hard to give that up. Nonetheless, if I’m going to get boards made from my design, the options are to either learn this or have someone else do it, and doing it yourself makes you smarter, so that’s the way I decided to go.

    First attempt (schematic is OK, layout on the other hand…)

    I could have learned a number of different packages, but I chose KiCad because it’s open-source and one of the goals for RAIN is to be a completely open-source computer. There are a number of other open-source EDAs as well, but I got a lot of recommendations to use KiCad, and even though it’s considered more difficult to learn, I’ve been told it’s worth it. I also knew that KiCad ran fairly well on my A64-based laptop, which would allow me to design RAIN’s hardware on the PSC itself.

    I started with Getting Started in KiCad (seems like a logical place to start, right?) and slowly made my way through the tutorial, stopping whenever I became the least bit tired or frustrated. I’ve found this to be a good way to learn something I’m not looking forward to learning, because these forced stops cultivate some excitement and curiosity about returning to the task.

    Second attempt, much closer

    I was able to maintain this discipline over the course of a weekend and while I wasn’t able to finish the design, I made a lot of progress and learned a lot more than I expected about the tool. Based on what I’ve learned, I feel pretty confident that I will be able to design this board successfully, and continue to use an EDA for all of my future electronics projects.

    Test-fit confirms the form-factor, and also illuminates some design problems

    There’s still work to do before I finalize the design and send it out to have a prototype board made, but what remains is squarely within my comfort zone.

    I need to determine whether or not the i2c pins on the Clusterboard need to be pulled up to 3.3 V like they do on the A64 (which would be a drag, because the Clusterboard’s pins don’t supply 3.3 V), and I need to sort out some software problems on the SOPINE module just to confirm that the driver circuit will work the same as it did when I had it connected to the A64 for testing earlier. Once these two things are sorted I can finalize the design of the board and order a copy.
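    In the meantime, a quick way to sanity-check the bus from the SOPINE side is to probe every address and see what answers (a rough Python equivalent of i2cdetect; the bus number and the use of the smbus2 library are assumptions):

    ```python
    # Rough Python equivalent of i2cdetect: probe each 7-bit address on the bus
    # and report what answers. If nothing responds without external pull-ups,
    # that's a strong hint the Clusterboard's i2c pins need them.
    from smbus2 import SMBus

    def scan(bus_num=1):                 # bus number is an assumption
        found = []
        with SMBus(bus_num) as bus:
            for addr in range(0x03, 0x78):
                try:
                    bus.read_byte(addr)
                    found.append(hex(addr))
                except OSError:
                    pass
        return found

    print("devices responding:", scan())
    ```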

    With any luck it will work and I’ll be able to pack everything back in the case and focus on the software side of things until I scrape-up enough cash to order more panel drivers...


  • On the road

    jason.gullickson • 03/19/2018 at 04:50

    Just a quick note to say that it will probably be a week or so before I post another update on the project.  I’m currently on a road trip across the western U.S. and won’t be back in the lab until around 04/01.

    When I do get back, I’ll probably work on moving the panel driver from the breadboard to something that can be installed in the chassis.

    Also I see some of you have left comments and even expressed interest in joining the project.  I will get back to all of you as soon as I’m back from the trip.

    Thanks again for the interest and support!

  • Arms & Legs

    jason.gullickson • 03/16/2018 at 01:24

    Technically, the Clusterboard fits inside the case I’ve been designing around, but it doesn’t fit inside the “endcaps” so it can’t be mounted directly to the steel of the case using the mounting holes on the board.


    To address this, I sketched-up some adapters to “relocate” the mount points somewhere more appropriate. Since I’m still not 100% sure where everything will belong in the final configuration of the chassis, I came up with a more flexible way to mount the board: magnets!


    I also need to mount a single PINE A64 board to serve as the “front-end node” so I whipped-up a couple of magnetic mounts for this board as well.

    I wasn’t able to find appropriate magnets locally so I had to wait for some to arrive from The Internet. In the meantime I switched-gears and worked on writing a little software to drive the panel’s display.

    This has never happened before…

    When the magnets arrived I was stunned to see they fit perfectly on the first try. However I didn’t have any glue on-hand that was right for the job. Since I was tired of waiting I thought about how I might modify the mounts to eliminate the need for glue. This turned-out to be easier than expected and after two iterations I had working, glueless mounting brackets.


    All in all, they work pretty well. There is an alignment problem that keeps all four feet on the Clusterboard from engaging the inside of the case completely, but I think this will be strong enough to safely move on to the next step: stuffing everything inside the box.


    The idea of modifying a part just because you don’t want to run out and buy some glue would have seemed ridiculous before I had a 3D printer, but now it’s easier and faster to just “run off a new part”. The result is not only faster, it’s also a better part. This is one of the things I love about 3D printing: the ability to iterate at a pace similar to writing software, and to let the robots do the work.


  • Ambitions, plans and kits?

    jason.gullickson • 03/15/2018 at 15:10

    It's very exciting to see other people interested in this project.  Thank-you for your feedback and encouragement!

    In addition to releasing my work as open-source (so others can reproduce the machine), I'm considering developing kits (and perhaps a small number of assembled systems) once I have a design that is stable, reliable and repeatable.

    I haven't gone too far down that road yet because there's still a lot to do, and I wasn't sure how many other people might be interested in owning a machine like this, but if there's interest (and I can wrangle the resources) I'll seriously consider it.

    Starting a computer company is something I've dreamed about since I was a kid banging-out BASIC on my VIC-20.  It would be kind of poetic if in doing so, I could help put other kids on the same path.



Discussions

Dan Miller wrote 03/24/2018 at 14:46

Does this serve a different use-case than some of the services provided by Amazon or Microsoft through AWS or Azure designed around high-performance computing?


jason.gullickson wrote 04/02/2018 at 14:25

I would say they share *applications* more than use-cases.  This iteration of RAIN (Mark II) could be used for some of the same parallel-computing applications you might use a cloud service for, but this iteration is not designed to scale dynamically the way a cloud-based system can.

The goal of the larger RAIN project is to provide a more open *alternative* to cloud-based high-performance computing that doesn't have the complexity, security, and privacy problems associated with cloud services.  Mark II is focused on producing a desktop machine that mimics the architecture of these systems at a cost that allows more people to learn how to write software for, and put to use, high-performance clusters.  It also provides a platform for the development of new hardware and software to provide higher performance and distributed-computing scalability.


David H Haffner Sr wrote 03/18/2018 at 19:23

I love this project and I could have used this thing about 20 years ago :)


ajlitt wrote 03/15/2018 at 16:24

I really like the idea of making esoteric hardware public!

There are already HPC job schedulers that do what you describe, but typically manage one cluster / HPC system at a time.  One example that's used widely and is GPL'd is SLURM: https://slurm.schedmd.com/ .  I don't know if it would be possible to extend it to do what you're looking for, but it would be neat to try to make it work on small embedded clusters like yours.


jason.gullickson wrote 04/02/2018 at 14:29

I definitely want to leverage existing open-source software whenever possible, both to expedite development of the system and to make it more compatible with existing applications and developers' experience.

When I built the Mark I machine I used ROCKS (http://www.rocksclusters.org/), which provided a number of tools for building, configuring, and managing the cluster.  I'm planning to take some of the tools that ROCKS provides and build on them to make RAIN even easier to own and operate.

I've looked at a couple of different job schedulers, in addition to other interfaces to make creating and running parallel programs easier, but I'll be sure to check out SLURM as well; thanks for the tip!


riktw wrote 03/15/2018 at 08:43

Very nice-looking project, I like the old-school case with blinkenlights. Is this type of case still available somewhere?


jason.gullickson wrote 03/15/2018 at 14:22

Thanks! 

It took me a while to find that case, but once I knew what to call it I found several on eBay; here's an example:

https://www.ebay.com/itm/Blue-Metal-Electronic-Enclosure-Project-Case-DIY-Junction-Box-110-250-190mm-USA/263534465032?hash=item3d5be0d008:g:7O8AAOSwf31anhEx

It's not *exactly* what I wanted; I really wish it had a removable top (it would make the machine a lot easier to work on).  I have found some with removable tops, but they are considerably bigger (and more expensive) and don't seem to capture the form of the old machines as well.


Mark Rehorst wrote 03/14/2018 at 16:50

This is a very interesting project.  How will you keep the coin miners and DDoS elements out of the network of personal supercomputers, or is that sort of thing the purpose of setting up the network?


jason.gullickson wrote 03/14/2018 at 19:13

At the moment I don't intend to prevent any specific use case (even boring ones :) )


That said, I'm working on ways to make this network less valuable for these types of applications.  For example, one thing I'm considering is that the way you earn "credit" to use the network is by allowing others' jobs to run on your system when it's idle.  This puts a natural ceiling on the amount of network power available to any individual user.

Something like this doesn't make abusing the network *impossible*, but it makes it less attractive than using, say, a botnet of pwned IoT devices...

You bring up a good point though and it's something I think about as I noodle on how the global network might work.  That's part of the reason I'm focusing on a stand-alone system first and taking my time before introducing the increased risks of distributed systems.


davedarko wrote 03/12/2018 at 20:00

very clean case design, I like that a lot!


jason.gullickson wrote 03/12/2018 at 20:55

Thanks!  I had originally planned to use an opaque panel on the final design (this being just a prototype), but everyone seems to like the transparent one.  We'll see if I can keep it tidy-looking as more of the electronics fall into place...

