DMA on the STM32H7
DMA on the STM32H7 is a beast, with each incremental hardware improvement exposed through a different interface. There's the BDMA, the regular DMA, & finally the MDMA. The main one used for accessing GPIOs is the regular DMA.
The main use of DMA is making a parallel bus by firing data at all 16 lines of a GPIO register & using a timer as the clock. Despite being a 400MHz core, bit banging a GPIO only goes at 16MHz, so you need some kind of hardware support. There are a few limitations. The only timers which can drive DMA transfers over GPIOs are TIM1 & TIM8. Only DMA2 can access the GPIOs. The most useful information came from:
https://community.st.com/thread/41701-stm32f7-dma-memory-to-gpio-by-timer-problem
a complete listing which actually works, once you move the address pointer to AXI RAM & fix all the mistakes he discovered. The STM32F7 code is interchangeable with the STM32H7.
https://community.st.com/thread/48054-stm32h7-spi-does-not-work-with-dma
note about the address pointer.
The TIM_HandleTypeDef has an array of DMA_HandleTypeDefs which cause various timer events to trigger DMA transfers.
FIFOMode must be DMA_FIFOMODE_ENABLE & FIFOThreshold is key to maximizing the bandwidth. DMA_FIFO_THRESHOLD_1QUARTERFULL gave the best results.
MemBurst only worked with DMA_MBURST_SINGLE.
HAL_DMA_Start is the command which provides the src & dst addresses. You have to call SCB_CleanInvalidateDCache(); before & after this, since DMA doesn't touch the cache. The address for a GPIO input is (uint32_t)&(GPIOC->IDR) & for the output is (uint32_t)&(GPIOC->ODR).
__HAL_TIM_ENABLE_DMA is the command which starts the actual data transfer, when using timer triggers.
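Strung together, the settings above might look roughly like the sketch below. It is not the listing from the forum thread. The names start_bus_dma, bus_words & BUS_LEN are made up for illustration, & the choice of DMA2_Stream1, DMA_REQUEST_TIM1_UP & GPIOC as the output port are assumptions. It also assumes the timer is already configured & running, & that the buffer has been placed in AXI RAM by the linker.

```c
#include <stdint.h>

#ifdef USE_HAL_DRIVER   /* defined by STM32Cube projects; the sketch only builds there */
#include "stm32h7xx_hal.h"

#define BUS_LEN 1024u                 /* hypothetical buffer length */
static uint16_t bus_words[BUS_LEN];   /* has to be linked into AXI RAM, not DTCM */

/* Configure one memory-to-GPIO stream & arm it on the timer's update event. */
HAL_StatusTypeDef start_bus_dma(TIM_HandleTypeDef *htim, DMA_HandleTypeDef *hdma)
{
    __HAL_RCC_DMA2_CLK_ENABLE();

    hdma->Instance = DMA2_Stream1;                 /* assumed stream */
    hdma->Init.Request = DMA_REQUEST_TIM1_UP;      /* assumed trigger: TIM1 update */
    hdma->Init.Direction = DMA_MEMORY_TO_PERIPH;
    hdma->Init.PeriphInc = DMA_PINC_DISABLE;       /* always the same GPIO register */
    hdma->Init.MemInc = DMA_MINC_ENABLE;           /* walk through the buffer */
    hdma->Init.PeriphDataAlignment = DMA_PDATAALIGN_HALFWORD;
    hdma->Init.MemDataAlignment = DMA_MDATAALIGN_HALFWORD;
    hdma->Init.Mode = DMA_NORMAL;
    hdma->Init.Priority = DMA_PRIORITY_VERY_HIGH;
    hdma->Init.FIFOMode = DMA_FIFOMODE_ENABLE;
    hdma->Init.FIFOThreshold = DMA_FIFO_THRESHOLD_1QUARTERFULL;
    hdma->Init.MemBurst = DMA_MBURST_SINGLE;
    hdma->Init.PeriphBurst = DMA_PBURST_SINGLE;
    if (HAL_DMA_Init(hdma) != HAL_OK) {
        return HAL_ERROR;
    }

    /* Attach the stream to the timer's update slot in its DMA handle array. */
    htim->hdma[TIM_DMA_ID_UPDATE] = hdma;
    hdma->Parent = htim;

    /* Flush the cache before & after handing over the addresses. */
    SCB_CleanInvalidateDCache();
    HAL_StatusTypeDef status = HAL_DMA_Start(hdma, (uint32_t)bus_words,
                                             (uint32_t)&(GPIOC->ODR), BUS_LEN);
    SCB_CleanInvalidateDCache();
    if (status != HAL_OK) {
        return status;
    }

    /* Nothing moves until the timer's update event is allowed to request DMA. */
    __HAL_TIM_ENABLE_DMA(htim, TIM_DMA_UPDATE);
    return HAL_OK;
}
#endif /* USE_HAL_DRIVER */
```

For a reader stream, the direction would flip to DMA_PERIPH_TO_MEMORY & the peripheral address would be the IDR register instead.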
When using multiple timers to drive clock pins & DMA streams, you have to synchronize the timers. The easiest way is to set all the timer_handle.Instance->CNT registers to starting values found by probing with a scope. All the CNT registers have to be set inside a __disable_irq(); __enable_irq(); block. Similarly, all the __HAL_TIM_ENABLE_DMA calls need to be made with the IRQs disabled.
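A minimal sketch of that synchronization step, assuming one timer drives the clock pin & a second one triggers the DMA stream. The function name sync_timers & both start values are hypothetical. The real offsets have to come from the scope.

```c
#include <stdint.h>

#ifdef USE_HAL_DRIVER   /* defined by STM32Cube projects; the sketch only builds there */
#include "stm32h7xx_hal.h"

/* Load both counters & arm the DMA request with no interrupt able to
 * slip in between & skew the phase of the two timers. */
void sync_timers(TIM_HandleTypeDef *clock_tim, TIM_HandleTypeDef *dma_tim,
                 uint32_t clock_start, uint32_t dma_start)
{
    __disable_irq();
    clock_tim->Instance->CNT = clock_start;   /* offset found with the scope */
    dma_tim->Instance->CNT = dma_start;       /* offset found with the scope */
    __HAL_TIM_ENABLE_DMA(dma_tim, TIM_DMA_UPDATE);
    __enable_irq();
}
#endif /* USE_HAL_DRIVER */
```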
You must call HAL_DMA_Abort, HAL_DMA_DeInit, & HAL_DMA_Init to restart a DMA transfer.
In the STM32H7, buffers for GPIO DMA transfers now have to live in the AXI RAM (0x24000000) or the SRAM1, SRAM2, SRAM3 domains, but not the DTCM-RAM (0x20000000).
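A buffer landing in DTCM by accident is easy to miss, so a small address check can help. The sketch below is only an illustration: the function name dma_buffer_ok is made up, & the region sizes are assumed from the STM32H743 memory map, so other H7 parts need different numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Base & size of one RAM region. */
typedef struct {
    uint32_t base;
    uint32_t size;
} mem_region_t;

/* Regions the regular DMA can use, sizes assumed for the STM32H743.
 * DTCM-RAM at 0x20000000 is deliberately absent. */
static const mem_region_t dma_ok_regions[] = {
    { 0x24000000u, 512u * 1024u },  /* AXI RAM */
    { 0x30000000u, 128u * 1024u },  /* SRAM1 */
    { 0x30020000u, 128u * 1024u },  /* SRAM2 */
    { 0x30040000u,  32u * 1024u },  /* SRAM3 */
};

/* Returns true if [addr, addr + len) lies entirely inside one listed region.
 * Conservative: a buffer straddling two adjacent banks is rejected. */
bool dma_buffer_ok(uint32_t addr, uint32_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < sizeof dma_ok_regions / sizeof dma_ok_regions[0]; i++) {
        const mem_region_t *r = &dma_ok_regions[i];
        if (addr >= r->base && len <= r->size && addr - r->base <= r->size - len) {
            return true;
        }
    }
    return false;
}
```

On the target, the check could run once at startup with the buffer's address cast to uint32_t, before anything is handed to HAL_DMA_Start.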
Speed limitations
The main problem is a single DMA stream writing a GPIO from AXI-RAM maxes out at 28.5MHz. Any higher & the GPIO stalls every 8 samples. The DMA doesn't really directly access memory, but uses a FIFO. The FIFO appears to get starved if the timer fires too fast. The network analyzer project needs 1 writer DMA stream & 2 reader DMA streams to move 10 bits out & 20 bits in.
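One possible reading of the 8 sample stall: each DMA stream has a 4 word (16 byte) FIFO, which holds exactly 8 samples when the GPIO is written 16 bits at a time. That connection is a guess, not something the measurements prove. The helper name fifo_samples is made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_FIFO_BYTES 16u   /* each stream has a 4 word FIFO */

/* Number of samples the FIFO holds at a threshold given in quarters (1..4),
 * for samples of 1, 2 or 4 bytes. */
uint32_t fifo_samples(uint32_t quarters, uint32_t sample_bytes)
{
    assert(quarters >= 1u && quarters <= 4u);
    assert(sample_bytes == 1u || sample_bytes == 2u || sample_bytes == 4u);
    return (DMA_FIFO_BYTES * quarters / 4u) / sample_bytes;
}
```

With 2 byte samples, a full FIFO is 8 samples & the 1QUARTERFULL threshold is 2 samples.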
Using 3 DMA streams to move 30 GPIO lines, the speed drops to 11.7MHz & the streams just lock up if they go any faster. It's disappointing a 400MHz core has such slow I/O. The good news is you can copy data to DTCM-RAM (0x20000000) with the CPU & perform calculations without interfering with the DMA transfers.
It should be noted 11.7MHz is a lot higher than 28.5MHz / 3 = 9.5MHz, so you get more total throughput by having more DMA streams in parallel: 3 x 11.7MHz is 35.1 million transfers per second against 28.5 million for a single stream. There was more speed to be had.
Overclocking the STM32H7
In the 3 DMA stream case of 11.7MHz, it would be nice to get an even 12MHz. You can get a few percent more clock cycles through overclocking....
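The reason a small clock bump can matter is that the timer only divides its input clock by whole numbers. The sketch below shows the arithmetic with hypothetical figures: a 200MHz timer clock (half of a 400MHz core, an assumption about the clock tree) & a 204MHz timer clock after a 2% overclock. The function names are made up, & nothing here says whether the DMA streams keep up at the resulting rate.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest whole divider that keeps the update rate at or below target_hz.
 * With the prescaler at 0, the auto-reload value would be divider - 1. */
uint32_t timer_divider(uint32_t timer_hz, uint32_t target_hz)
{
    assert(target_hz > 0u && timer_hz >= target_hz);
    return (timer_hz + target_hz - 1u) / target_hz;   /* ceiling division */
}

/* Update rate a divider actually produces, rounded down to whole Hz. */
uint32_t update_rate_hz(uint32_t timer_hz, uint32_t divider)
{
    assert(divider > 0u);
    return timer_hz / divider;
}
```

With a 200MHz timer clock & a 12MHz target, the divider comes out as 17, which gives about 11.76MHz. The next divider down, 16, jumps to 12.5MHz. With a 204MHz timer clock the same divider of 17 lands on exactly 12MHz.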
Your notes and frustrations on DMA are ones that I've seen before. There's not a lot of people writing about their experiences with STM32 DMA so I'm happy to see this, and quite interested in hearing about how you go about troubleshooting these problems. Getting it working is one thing (and I've done some simple DMA proofs of concept) but testing to ensure it's working right is something that I don't know how to approach.