• Log #1: The Architecture – Why 5U, 40 Nodes, and a RISC-V BMC?

    Null Runner, 12 hours ago

    Welcome to Project Aether!

    If you are reading this, you probably know the struggle of building a bare-metal cluster for OpenStack or Kubernetes at home. Stacking standard SBCs leaves you with a cable nightmare, no out-of-band (OOB) management, and fragile power delivery. Project Aether isn't just another carrier board; the goal is to design a legitimate, modular supercomputer in a box that brings data-center reliability to the homelab.
    But before routing a single trace in KiCad, we had to define the physical and power limits of the system. Here is the architectural DNA of the Compute Blade.


    ### 1. The Power and Thermal Reality: Why 5U?

    We are designing a backplane that takes up to 10 hot-swappable blades. Each blade hosts 4 compute nodes (supporting CM4/CM5 or custom ARM/RISC-V modules). That is **40 independent nodes** in a single chassis.
    Adding up the maximum load of the compute modules, the network switches, and the BMCs gives a power budget of around **1000 W**, or roughly 100 W per blade. A 5U form factor is the sweet spot: it gives us the physical volume to route that much 12 V power safely across the backplane, allows standard 120 mm/140 mm fans (crucial for homelab acoustic comfort), and leaves mechanical room for future PCIe or OCuLink expansion modules.
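
    To put those numbers in perspective, here is a quick back-of-the-envelope sketch of what the budget means in amps on the 12 V rail. It only uses the figures quoted above; the even split across blades is a simplifying assumption, and fans and conversion losses are ignored.

```python
# Back-of-the-envelope power budget from the figures quoted in this log.
BLADES = 10
NODES_PER_BLADE = 4
CHASSIS_BUDGET_W = 1000.0   # approximate chassis budget from the text
RAIL_V = 12.0               # backplane distribution voltage

# Assumption: the budget is split evenly across the blades.
total_nodes = BLADES * NODES_PER_BLADE
per_blade_w = CHASSIS_BUDGET_W / BLADES
backplane_a = CHASSIS_BUDGET_W / RAIL_V
per_blade_a = per_blade_w / RAIL_V

print(f"{total_nodes} nodes, {per_blade_w:.0f} W per blade")
print(f"{backplane_a:.1f} A on the 12 V backplane, {per_blade_a:.1f} A per blade slot")
```

    Roughly 83 A at 12 V is why the backplane needs real copper and real volume, and why each blade slot has to be sized for more than 8 A continuous.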

    ### 2. The Brain: Pivoting to a RISC-V BMC

    To achieve true "Design-to-Cost" efficiency, we made a radical choice for the Blade's Baseboard Management Controller (BMC). We bypassed standard ARM chips and selected the **WCH CH32V307VCT6** RISC-V MCU.
    This chip is a hardware hacker's dream for carrier boards:
    * **Native Ethernet PHY:** It integrates a 10 Mbps Ethernet PHY on-chip (faster links would need an external PHY on its Ethernet MAC), so the TX/RX pairs can be wired directly to the backplane for true OOB management.
    * **8x Hardware UARTs:** We use 4 of these to wire directly into the serial consoles of all 4 compute nodes simultaneously. No more PIO state-machine hacks.
    * **Staggered Spin-up:** It drives the SG Micro SGM2588 load switches to power up the 4 nodes one after another, smoothing out the inrush of the roughly 100 W load handled by our Richtek hot-swap controller.
    * **Telemetry:** It monitors the TI INA226 (I2C) for surgical power metrics and TMP1075 sensors for thermal zones.
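
    The spin-up and telemetry duties above can be modeled on a host before any firmware exists. The sketch below is not the BMC firmware: the function names and the 250 ms stagger delay are hypothetical, chosen only for illustration. The register scaling (1.25 mV per LSB for the INA226 bus voltage, 0.0625 °C per LSB in a left-justified 12-bit field for the TMP1075) follows the TI datasheets.

```python
from typing import Iterator, Tuple

NODE_COUNT = 4
STAGGER_MS = 250  # hypothetical delay between node power-ups


def power_on_schedule(node_count: int = NODE_COUNT,
                      stagger_ms: int = STAGGER_MS) -> Iterator[Tuple[int, int]]:
    """Yield (time_ms, node_index): one load-switch enable per step."""
    for node in range(node_count):
        yield node * stagger_ms, node


def ina226_bus_volts(raw: int) -> float:
    """Convert the INA226 bus voltage register (1.25 mV per LSB) to volts."""
    return raw * 1.25 / 1000.0


def tmp1075_celsius(raw: int) -> float:
    """Convert the 16-bit TMP1075 temperature register to degrees Celsius.

    The reading is a 12-bit two's-complement value in bits 15..4,
    with 0.0625 degC per LSB.
    """
    value = (raw & 0xFFFF) >> 4
    if value & 0x800:        # sign bit of the 12-bit field
        value -= 0x1000
    return value * 0.0625


for t_ms, node in power_on_schedule():
    print(f"t={t_ms:4d} ms: enable node {node}")

print(ina226_bus_volts(9600), "V")      # raw value for a 12 V rail
print(tmp1075_celsius(0x1900), "degC")  # raw value for 25 degC
```

    On the real MCU the schedule would become timer-driven GPIO writes and the raw values would come from I2C reads, but the arithmetic stays the same.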

    ### 3. The Network Aggregation

    Instead of routing 40 individual gigabit cables, each blade features an on-board **Realtek RTL8372N-CG** L2+ SDN Switch. It aggregates the 4 nodes and outputs a single 10GBASE-KR high-speed link straight through the edge connector to the backplane.
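
    A quick sanity check on that uplink, as a minimal sketch. It assumes each node port runs at 1 Gb/s, which matches the "gigabit cables" it replaces but is not a measured figure.

```python
# Per-blade uplink headroom and chassis-level cable count.
BLADES = 10
NODES_PER_BLADE = 4
NODE_LINK_GBPS = 1.0   # assumption: gigabit node ports
UPLINK_GBPS = 10.0     # one 10GBASE-KR link per blade

downstream_gbps = NODES_PER_BLADE * NODE_LINK_GBPS
oversubscription = downstream_gbps / UPLINK_GBPS
node_cables = BLADES * NODES_PER_BLADE

print(f"Per blade: {downstream_gbps:.0f} Gb/s of node traffic into a {UPLINK_GBPS:.0f} Gb/s uplink")
print(f"Oversubscription ratio: {oversubscription:.1f}:1")
print(f"Backplane links: {BLADES} instead of {node_cables} individual cables")
```

    At 0.4:1 the uplink is not oversubscribed, so all 4 nodes can saturate their ports at once without contending for the backplane link.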

    ### What's Next?


    Let me know in the comments what you think of the CH32V307 choice!