Processed 1035 blocks of 16384 total blocks. Total time: 43.820000ms Render time: 43.351000ms Time taken for 100 frames: 4412ms Time per frame: 44120.000us FPS: 22.7
Less than half a millisecond, or just over 1% of the total time. The remaining ~98.9% (43.351ms of 43.820ms) is spent in the 3D rendering code, and of that, probably about 99% goes to computing and drawing individual pixels, not even matrix multiplication or any actual 3D projection. OK, so let's see how much time we add just by adding a few if blocks to wrap the texture coordinates:
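// Almost all textures tile well, so wrap out-of-range texture coordinates
// back into [0, 1]. This only corrects coordinates that are at most one tile out.
if (kx > 1) kx -= 1;
else if (kx < 0) kx += 1;
if (ky > 1) ky -= 1;
else if (ky < 0) ky += 1;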
Processed 1035 blocks of 16384 total blocks. Total time: 46.214000ms Render time: 45.750000ms Time taken for 100 frames: 4652ms Time per frame: 46520.000us FPS: 21.5
That's a big increase for a simple if block. The reason is that the fragment code runs an enormous number of iterations, so that if block is probably executed millions of times per frame. Altogether the if blocks add about 2.39ms (46.214ms vs. 43.820ms), or roughly 5.5%. That's a lot when the whole frame has to fit in under about 16.7ms for 60FPS. If a simple if block like that adds that much time, what about the other if blocks in the fragment code?
inline void fragmentShaderRaw(DisplayBuffer* display, int x, int y, uint8_t z, Color textureColor) {
    uint32_t displayIndex = display->width * y + x;
    uint8_t* depthLoc = &(display->depthArray[displayIndex]);
    // Depth test: a larger z is farther away, so discard the fragment
    // if the stored depth is already nearer.
    if (*depthLoc < z)
        return;
    // High nibble 0: the fragment has no color; discard.
    if ((textureColor >> 4) == 0)
        return;
    Color fragColor = textureColor;
    // Apply this fragment to the framebuffer.
    Color* outColor = &display->colorArray[displayIndex];
    if ((fragColor >> 4) == 15) {
        // High nibble 15: overwrite the stored color.
        *outColor = fragColor;
    } else {
        // Otherwise combine with the stored color, keeping only the low 3 bits.
        *outColor |= fragColor;
        *outColor &= 0x7;
    }
    // Record the new depth for this pixel.
    *depthLoc = z;
}
Clock Speed (MHz) | Frame Time (ms) | FPS
600 | 45.85 | 21.6
720 | 38.2 | 26.0
816 | 33.71 | 29.4
912 | 30.16 | 32.9
960 | 28.66 | 34.6
Some of these chips can run as high as 1.08GHz, while others I have can't go above 600MHz, so there really is a chip lottery with the Teensy 4.1. The chip I tested here started crashing at 1.08GHz, so I left that speed out of the table. In any case, it's promising that the frame rates are this good at this resolution.
Discussions
You might want to look into raycasting-like, DDA-style algorithms for this, rather than rendering polygons.
e.g.:
https://s-macke.github.io/VoxelSpace/
https://github.com/cgyurgyik/fast-voxel-traversal-algorithm/blob/master/overview/FastVoxelTraversalOverview.md
Thank you, I'll have to check this out! It would definitely be great to get rid of the z-buffer. While browsing last night I also found this DOOM port to Teensy 4.1, which seems to run quite well and may also offer some good optimization inspiration: https://github.com/Jean-MarcHarvengt/MCUME I haven't read through it yet, but it seems very interesting.
I definitely want to get the render time down, because I still have to send the image to graphics memory, which takes time too, not to mention simulating the world. Hopefully I can use DMA for the image transfer, but I'm not sure yet.