Project | Clunky McCluster


Inventory
07/29/2021 at 00:38 • 0 comments
Some old stuff, some new stuff...
I received more stuff and here is the situation:
- Raspberry Pi:
  - 4× 3B+ (making one high performance quad)
  - 2× 3B
  - 2× 2B (together with the 3B, that makes another quad with slightly less performance)
  - 8× 1B+ (that's 2 quad with low performance)
- Hubs :
  - 4× 5 ports
  - 4× 8 ports
- RJ45: I have found 4× very short patch cables. I need more but don't want to order them, so I crimp them myself, cut to length. I received a bag of RJ45 plugs and can finally reuse the old broken cables that have accumulated over the years...
- SD cards : Found several from old projects, and bought 12 during the sales: 6×8GB and 6×16GB. These should be enough for 3 quads. Usually I keep my images limited to 4GB; more is possible but I prefer when the storage is managed properly and centralised. Ideally the SD cards are mounted read-only to prevent wear.
- To ease SD card duplications, I ordered a few USB adapters. It seems that my internal port has gone kaputt anyway...
- Temperature management : I should by default stick a heat spreader on all the Pi's CPUs, just to keep the temperature reasonable and prevent the automatic clock throttling from kicking in. An extra undervolted 12V fan will increase airflow without being too noisy.
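To check whether the heat spreaders and the fan are doing their job, each node could log its own CPU temperature. Here is a minimal sketch, assuming the usual Linux thermal sysfs node (`/sys/class/thermal/thermal_zone0/temp`, in millidegrees Celsius). The 60 °C warning level is only a placeholder I picked for illustration, not an official throttling threshold.

```python
from pathlib import Path

# Assumption: standard Linux thermal sysfs node, value in millidegrees Celsius.
THERMAL_NODE = Path("/sys/class/thermal/thermal_zone0/temp")
# Assumption: placeholder warning level, to be tuned per board.
WARN_CELSIUS = 60.0


def parse_millidegrees(raw: str) -> float:
    """Convert the sysfs string (e.g. '48312\\n') to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def too_hot(celsius: float, limit: float = WARN_CELSIUS) -> bool:
    """True when the reading reaches the chosen warning level."""
    return celsius >= limit


if __name__ == "__main__":
    if THERMAL_NODE.exists():
        temp = parse_millidegrees(THERMAL_NODE.read_text())
        print(f"{temp:.1f} C", "WARN" if too_hot(temp) else "ok")
    else:
        print("no thermal node on this machine")
```

Run from cron or a small loop, this gives a rough idea of which slot in the stack needs more airflow.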
So far it seems I can shoot for a cluster of 4 quads, or 16 Pis, or 64 cores if I slowly replace the old single-core boards with newer ones. I hope that the overall throughput will help me with my current projects... The nice thing is that it is scalable and features can be added progressively. Another order of magnitude can then be gained once the GPU is harnessed. For now I need brute-force trivial jobs and it could evolve later, for example with distributed memory, so short and fast messages will become critical.
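The board and core counts can be checked with a few lines. This is only a sketch of the arithmetic. The counts come from the inventory list, the cores per board are the usual figures for each model (quad-core for the 2B, 3B and 3B+, single-core for the 1B+), and the names are made up for the example.

```python
# Inventory from the list above: model -> (board count, cores per board)
INVENTORY = {
    "3B+": (4, 4),
    "3B": (2, 4),
    "2B": (2, 4),
    "1B+": (8, 1),
}
BOARDS_PER_QUAD = 4


def totals(inventory):
    """Return (boards, quads, cores) for an inventory dictionary."""
    boards = sum(count for count, _ in inventory.values())
    cores = sum(count * per_board for count, per_board in inventory.values())
    return boards, boards // BOARDS_PER_QUAD, cores


boards, quads, cores = totals(INVENTORY)
print(boards, quads, cores)  # 16 4 40

# Same slots once every single-core board is replaced by a quad-core one.
upgraded = {model: (count, 4) for model, (count, _) in INVENTORY.items()}
print(totals(upgraded)[2])  # 64
```

So the cluster starts at 40 cores and only reaches 64 once the eight 1B+ boards are swapped out.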
Where am I ?
07/29/2021 at 00:02 • 7 comments

One of the features that this project can provide (if implemented) is helping each node find its static address.
Usually one would either use DHCP or static IPs for the Ethernet port. DHCP makes it tedious to know who is who and requires a master node to handle probing the network, while static IPs require manual management, at least modifying each config file during the SD card duplication. Tedious.
The FPGA provides 4 ports and is intended to communicate with other quads, so each FPGA has its own address, which can be set up with jumpers/DIL switches/hex encoding wheels/whatever. And each Pi port is static so it can have its own fixed address.
Let's say there are 4 Pi ports, so 2 hardwired bits, and the FPGA can fetch its own address from a few GPIO pins (to a 74HC165 or 4017 for example, or even charlieplexing? SPI requires 4 wires, I²C only two but is more complex). This port address can then be read/fetched from the respective Pi which will then execute the program related to its address. Let's be reasonable and say we have 6 bits for the quad address, that makes a byte for each Pi which can then configure its own static IP address. That would be the best of both worlds :-)
Another aspect is that, if several quads need to be connected, each quad must know its own address to send and receive packets. No routing is possible without addresses.
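The address scheme described above (6 bits of quad address plus 2 hardwired port bits) can be sketched as follows. The bit order (quad bits on top), the function names and the `10.0.1.0` base address are assumptions chosen for the example. Nothing is decided yet on the real hardware, and the netmask would have to be wider than /24 (say /16) so that bytes 0 and 255 remain usable host addresses.

```python
import ipaddress

PORT_BITS = 2  # 4 Pi ports per quad (from the text)
QUAD_BITS = 6  # quad address width suggested in the text
# Assumption: hypothetical private range, one address per node byte.
BASE = ipaddress.IPv4Address("10.0.1.0")


def node_byte(quad: int, port: int) -> int:
    """Pack the quad address and the Pi port number into one byte."""
    if not 0 <= quad < (1 << QUAD_BITS):
        raise ValueError("quad address out of range")
    if not 0 <= port < (1 << PORT_BITS):
        raise ValueError("port number out of range")
    return (quad << PORT_BITS) | port


def split_node_byte(value: int):
    """Inverse of node_byte: return (quad, port)."""
    return value >> PORT_BITS, value & ((1 << PORT_BITS) - 1)


def static_ip(quad: int, port: int) -> str:
    """Derive the node's static IP from its position in the cluster."""
    return str(BASE + node_byte(quad, port))


print(node_byte(5, 3))  # 23
print(static_ip(5, 3))  # 10.0.1.23
```

Each Pi would read its byte from the FPGA at boot and configure its interface accordingly, so the same SD card image works in any slot.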
Bare metal
07/27/2021 at 21:37 • 0 comments

For now I focus on the Pi3B+ to get quick results with the least amount of effort and without breaking the already sad-looking bank. Pi4s draw too much power and run too hot, but they could be used, as well as many other compatible devices. In fact the key is simply the form factor and in particular the 40-pin header. So "model A"s could be used as well, but I stick to the Bs because they have an Ethernet port that makes management very practical.

Now if I use As, the other management process will be through the Wifi link, which saves cables and a physical hub, but requires a dedicated Wifi manager: things have only been moved around... But going further, Ethernet and Wifi will not be a great communication medium. The FPGA is already there to help the applications interconnect, so what about... exposing the link to the Linux system more broadly? Write a block or character device driver?

Going further, Linux will have little to do beyond managing the hardware health, configuring stuff, allocating memory and storage... No need for a full-blown distribution, right? So I'm looking at "bare metal" systems so that the SBC can boot fast, avoid wearing out the µSD card, ... I can start with a basic OS image and remove tons of things, then add a custom application for exposing the interco. Unfortunately, because of systemd, long gone is the time when you could simply add "init=/bin/bash" to the kernel's command line :-(

I even wonder if/how I could let the whole system boot off the interco, thus saving the µSD card altogether. I know people have already made their own bootware, such as Tristan for the Pi3, but this normally goes on flash storage.

But for now and during development, I stick to the standard OS...
FPGA selection
07/27/2021 at 21:12 • 0 comments
For this project, I enlist the ProASIC3 family because
1. I know it very well,
2. It's suited for the task (no need of ultra-high-performance features, the speed is right)
3. The price is OK (look them up on eBay : A3P250 is around $15)
4. The PQ208 package is reasonably easy to solder at home (so the end user could swap parts or hack it further)
5. The SW is free (as in free beer), works on Linux and Windows, not as crazy to install as others I've tried, and offers a choice of locks for the licence (it's not crazy constraining). Just make sure you get a compatible FlashPro JTAG probe, or a suitable equivalent.
6. I have stock.
Now let's look at the product table from the official site:

A3P125 is the smallest in QFP208 and is capable of minimal functions, though one detail matters. Not only are just 133 GPIO available, but there are only 2 I/O banks, read: only 2 independent voltage zones. The others have 4 banks whose voltages can vary independently, meaning: you can hotswap. Wouldn't it be nice if you could shut down, remove or add a node while the cluster is operating?

So the "minimum specification" for my potato cluster (youtube reference) is the A3P250PQG208. It has 3072 LUT3 gates (serving either as logic or DFF) which is enough for normal interco management, and 8 small dual-port SRAM blocks that are easily configured as FIFO (when enabling the dedicated circuit). I have pushed that type of chip easily above 60MHz with real designs and synthesis around 100MHz is possible with some care. This is the range of frequency where the Pi's GPIO pins can operate rather reliably so it's a great match.

The file A3P_QFP208_pinout.txt shows the pinout differences between various chip densities so a single PCB can accommodate most of them. If I can go far enough, the pin layout files will be public of course.
From there on, if the A3P250 is too tight for you, you can look at the A3P400 (50% more resources) and the A3P600 (24 SRAM blocks and 7K LUT3) for when your routing protocols get crazy and you need more buffering (that provides a depth of maybe 4 or 5 FIFOs or 2KB per Pi, which is getting overkill unless you have a lousy protocol).
The top of the line is the A3P1000 with its 32 SRAM blocks and 11K tiles. I don't know what you'd want to do with that, unless you want to integrate a softcore CPU and/or more sophisticated interfaces instead of a basic message-passing link between quads. At least you have the choice.
Beyond that you'll find the A3PE1500 and A3PE3000. They're just massive and expensive. I doubt anybody would use that so I don't check the pinout compatibility. The A3PE1500 however has 6 independent GPIO zones so that could add another benefit (better hotplug support), but at a high cost.
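To get a feel for the buffering budget, here is a rough sketch of how the SRAM blocks could be split between the 4 Pi ports. The block counts are the ones quoted above. The 512 bytes per block is my assumption (one 4,608-bit block used as 512 words with 8 data bits each), to be checked against the datasheet, and the function name is made up.

```python
# SRAM block counts quoted in the text.
SRAM_BLOCKS = {"A3P250": 8, "A3P600": 24, "A3P1000": 32}
PI_PORTS = 4
# Assumption: one block used as a 512-byte FIFO (to be checked in the datasheet).
BYTES_PER_BLOCK = 512


def fifo_budget(chip: str, blocks_per_pi: int):
    """Return (buffer bytes per Pi, blocks left for the inter-quad links)."""
    total = SRAM_BLOCKS[chip]
    used = PI_PORTS * blocks_per_pi
    if used > total:
        raise ValueError("not enough SRAM blocks on this chip")
    return blocks_per_pi * BYTES_PER_BLOCK, total - used


print(fifo_budget("A3P600", 4))  # (2048, 8)
print(fifo_budget("A3P250", 1))  # (512, 4)
```

Under these assumptions, 4 blocks per Pi on the A3P600 give the 2KB mentioned above and still leave 8 blocks for the links between quads.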
Getting started...
07/22/2021 at 01:41 • 4 comments
This project starts because I need to crunch a lot of numbers in parallel. One of the available methods is to reuse, and then upgrade, my collection of Raspberry Pi left from past projects.

OTOH this particular cluster project relies on the 40-pin GPIO connector which appeared a while back, at the end of the 1st generation. Luckily, my inventory contains 9pc RPi B+ v1.2 from 2014 (aww, 7 years old now...) and this is enough to get started !

Performance-wise, we'd need 8 boards of this single-core ARM at 700MHz to reach the throughput of one Pi3B+, which is quad-core and clocked at twice the speed, or half as fast as my i7 laptop. A cluster of 2 quads with the old boards is then a mock-up, a demonstrator and a prototype, where I will later replace/upgrade the boards with faster versions. The old v1B+ boards serve to test and weather the bugs and shorts before the more expensive, faster boards enter duty. At this moment, I wonder if the Pi3A+ would do the trick: still fast but cheaper, smaller, and Ethernet can then be replaced by onboard WiFi.
I also have a pair of Pi 2B (quad 900MHz), and some Pi 3B+ (quad 1400MHz) should arrive soon. With some basic thermal management measures, I'll try to overclock them a bit.
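The "8 old boards for one Pi3B+" figure comes from a very crude model: throughput ≈ cores × clock, ignoring the better IPC, caches and memory of the newer cores. As a sketch, with names invented for the example:

```python
# model -> (cores, clock in MHz), figures quoted in the text
BOARDS = {
    "1B+": (1, 700),
    "2B": (4, 900),
    "3B+": (4, 1400),
}


def score(model: str) -> int:
    """Crude throughput estimate: cores x MHz (ignores IPC and memory)."""
    cores, mhz = BOARDS[model]
    return cores * mhz


def boards_needed(slow: str, fast: str) -> int:
    """How many 'slow' boards match one 'fast' board, rounded up."""
    return -(-score(fast) // score(slow))


print(boards_needed("1B+", "3B+"))  # 8
print(boards_needed("1B+", "2B"))   # 6
```

The real gap is likely larger since the newer cores do more per clock, so this is a lower bound at best.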

With all those boards, several quad-clusters can be implemented so I can work on interconnecting the quads. With the planned upgrades, making the cluster heterogeneous, I must not only consider many independent clock domains, but also speeds...

The inventory also covers the necessary accessories :
- Ethernet : at least 3 hubs with 5 ports, more might hide here or there. I'll need many short patch cables as well, not sure how many I have left.
- microSD cards : it's sales season so I'm looking around, for 4 or 8GB ones. I found 6 so far.
- Power supplies : moot :-) Power comes through the GPIO port.
- Female sockets : I found the appropriate 2×20 right angled female header and must wait a few weeks for delivery.
- The rest : 5V sources, A3PxxxPQ208, fans and bare proto PCB are in stock. They form the core of the project, around which I add features...
When the proto PCB is validated, I can then open EAGLE to lay out the pre-series.
