-
MORE SPEEEEEEEEED!
09/04/2017 at 20:16 • 0 comments
At last, after some long evenings working through various issues I've rewritten large portions of the TinyFPGA Programmer firmware and Python module to be much, much faster.
How fast? For a small design it can erase, program, and verify flash in 3 seconds. For a large design utilizing the entire FPGA it takes about 10 seconds. For comparison, the official Diamond Programmer and Lattice Download Cable take about 15 seconds for the MachXO2 1200 FPGAs.
Fast bitstream program time matters because it means you can verify your changes on the real FPGA faster. Fast programming of flash means you don't have to worry about a power glitch or power loss to the board wiping out the SRAM configuration. The latest configuration bitstream will always be loaded.
How did I enable such a large improvement in speed? It comes down to recognizing the inefficiencies in the system and implementing optimizations that work around them:
Bitbanging the PIC's GPIOs over USB is slow
Bitbanging increases the amount of USB traffic the PIC needs to process and doesn't allow for a fast inner loop. Below is a waveform of the JTAG pins while the TinyFPGA Programmer is writing the #TinyFPGA A-Series FPGA's flash. The sections marked A are times when the firmware was processing incoming and outgoing USB packets. The sections marked B are times when the firmware was actively driving the JTAG pins, but was only able to achieve about 15 kHz. These two inefficiencies add up to a lot of wasted time.
The solution is to add commands to shift many bytes worth of data all at once. This reduces the overall amount of USB traffic the PIC needs to process and allows for a very tight inner loop.
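As a rough host-side sketch of what a bulk shift command could look like: the bits are packed eight to a byte and sent behind a small header. The opcode value, the one-byte length field, and the LSB-first bit order are all assumptions made for illustration, not the actual firmware's packet format.

```python
# Hypothetical placeholder opcode; the real firmware's encoding may differ.
HYPOTHETICAL_SHIFT_OPCODE = 0x40


def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, first bit in the LSB."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def build_bulk_shift(bits):
    """Build one packet: [opcode][bit count][packed data bytes]."""
    if not 0 < len(bits) <= 255:
        raise ValueError("this sketch uses a single length byte (1..255 bits)")
    return bytes([HYPOTHETICAL_SHIFT_OPCODE, len(bits)]) + pack_bits(bits)


packet = build_bulk_shift([1, 1, 0, 0])
```

With a layout like this the firmware receives whole bytes of JTAG data and can clock them out in a tight loop, instead of handling one USB command per pin change.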
Synchronous communication over USB is slow
Writing a command to the PIC over USB, then waiting for a response takes at least a few milliseconds of time. This happened every time the programmer needed to wait for a status bit to clear or verify data from the FPGA. The section marked C in the waveform below shows where the Python application was waiting for a response from the PIC before it would send new commands.
There are multiple optimizations here:
- First optimization is to enable polling to occur completely within the PIC microcontroller. Now the next programmer command can be executed immediately after the polling successfully finishes.
- Second optimization is to allow the PIC microcontroller to verify the data itself without having to send it back to the Python application.
- Third optimization is to send information to the host only if absolutely required. If a POLL command or SHIFT command fails due to a mismatch, a status packet is sent back. Otherwise the status is not sent to the host Python application until it is requested at the end of the programming command stream.
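The three optimizations above can be modeled in a few lines of Python. This is only a behavioral model of the idea, not the firmware itself (which runs on the PIC): the class name, the one-byte failure packet, and the failure counter are hypothetical.

```python
class FirmwareModel:
    """Toy model: poll and verify locally, report to the host only on failure."""

    FAILURE_PACKET = b"\x01"  # hypothetical status packet contents

    def __init__(self, read_status):
        self.read_status = read_status  # callable returning the status byte
        self.failures = 0

    def poll(self, mask, expected, max_tries):
        """Poll inside the 'firmware'; return None (nothing sent) on success."""
        for _ in range(max_tries):
            if (self.read_status() & mask) == expected:
                return None
        self.failures += 1
        return self.FAILURE_PACKET

    def verify(self, shifted_out, expected):
        """Compare read-back data locally instead of sending it to the host."""
        if shifted_out == expected:
            return None
        self.failures += 1
        return self.FAILURE_PACKET

    def final_status(self):
        """Status requested by the host at the end of the command stream."""
        return bytes([self.failures])


# Busy bit (bit 0) is set for two reads, then clears.
statuses = iter([0x01, 0x01, 0x00])
fw = FirmwareModel(lambda: next(statuses))
poll_reply = fw.poll(mask=0x01, expected=0x00, max_tries=10)
```

On the success path nothing travels back over USB, so the host can keep streaming commands without waiting on a round trip.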
Blocking writes are slow
Every time the Python programmer module writes to the serial port, the write appears to be a blocking operation and the process gets context-switched. This adds a few milliseconds during which the programmer sits idle waiting for commands to process.
To hide this latency I increased the buffer to 256 bytes to enable several packets to be queued up to transmit at once. This seems to be enough to keep the programmer hardware fed with commands while the Python application is blocked.
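A minimal sketch of the batching idea, assuming the buffering happens on the host side: small command packets are collected and handed to the serial port in chunks of up to 256 bytes, so each blocking write carries many commands. `BufferedCommandWriter` and `RecordingPort` are hypothetical names; `RecordingPort` only stands in for a real serial port object with a `write()` method.

```python
class RecordingPort:
    """Stand-in for a serial port that remembers each write it receives."""

    def __init__(self):
        self.writes = []

    def write(self, data):
        self.writes.append(bytes(data))
        return len(data)


class BufferedCommandWriter:
    """Collect small packets and send them in chunks of `capacity` bytes."""

    def __init__(self, port, capacity=256):
        self.port = port
        self.capacity = capacity
        self._pending = bytearray()

    def queue(self, packet):
        self._pending += packet
        # Send full chunks as soon as they are available.
        while len(self._pending) >= self.capacity:
            self.port.write(bytes(self._pending[:self.capacity]))
            del self._pending[:self.capacity]

    def flush(self):
        """Send whatever is left, e.g. at the end of the command stream."""
        if self._pending:
            self.port.write(bytes(self._pending))
            self._pending.clear()


port = RecordingPort()
writer = BufferedCommandWriter(port)
for _ in range(100):
    writer.queue(b"\x80\x90\x80\x90")  # 100 four-byte packets = 400 bytes
writer.flush()
```

Here 100 small packets turn into two serial writes (256 bytes, then the remaining 144) rather than 100 separate blocking calls.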
Default Lattice SVF files are inefficient
Lattice SVF files contain large delays within polling loops, and program unused rows unnecessarily.
Now that I understand the programming protocol very well, I wrote a custom JEDEC file parser that determines exactly what JTAG commands to issue. I was able to reduce the wait time between status polls to speed up polling. I was also able to program only the rows that have non-zero data.
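The row-skipping part is simple to sketch. Assuming the parser has already turned the JEDEC fuse data into a list of fixed-size rows (16 bytes per row is used here purely for illustration), only rows with at least one non-zero byte need to be written. The function name is hypothetical.

```python
def rows_to_program(rows):
    """Yield (row_index, data) for every row that is not all zeros."""
    for index, data in enumerate(rows):
        if any(data):  # any() is true if at least one byte is non-zero
            yield index, data


rows = [
    bytes(16),                # all zeros: skipped
    b"\x01" + bytes(15),      # one bit set: programmed
    bytes(16),                # all zeros: skipped
    bytes([0xFF]) * 16,       # fully set: programmed
]
todo = list(rows_to_program(rows))
```

Because skipped rows leave gaps, the programmer has to set the flash address before each run of rows rather than relying on a continuous auto-incrementing write.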
Compiler optimizations
I'm not too happy about the final optimization. My firmware ran up against the program flash size of the PIC16F1455, so I had to get a demo license of the XC8 PRO compiler from Microchip in order to reduce the code size. This also had the side effect of speeding up the serial data shift loops.
The resulting waveform
Time A: The firmware still pauses to process incoming packets, but these occur less often. Additionally the Python application sends more data per serial write operation so there are always commands for the programmer firmware to execute. These gaps tend to be 100-200 microseconds in length now compared to 1-5 milliseconds before.
Time B: Shifting serial data is now much faster. Brute-force bitbanging over USB would operate at about 15 kHz, but now the optimized data shifting routines operate at about 1 MHz.
Time C: Polling occurs completely within the firmware now, so the next command is executed immediately after the status bit is cleared.
All of these optimizations added up mean that a lowly PIC16F1455 with only Full-Speed 12 Mbit/s USB can erase, program, and verify the MachXO2 FPGAs at least as fast as, if not faster than, the official hardware and tools from Lattice. I plan on selling these on my Tindie store for less than $10.
The latest code is committed to GitHub. There are a couple of bugs to fix and more testing to perform, and I need to integrate this into the TinyFPGA Programmer GUI, but I am very happy with current progress.
-
More Speed
09/01/2017 at 07:22 • 0 comments
Well, I took another look at the MPLAB Code Configurator for the firmware and realized I was only running the CPU core at something like 8 MHz. So I modified the multiplier so it's running at 48 MHz, and the improvement was significant. Flash program and verify now takes 35 seconds and @Xark reports SRAM programming takes only 2 seconds. Very nice! This is before any bulk serial optimizations have been added. I'm very happy with this result.
Next steps: I want to add the bulk serial optimizations in and see even better performance. Then I will add support for programming .jed and .bit files directly, and finally I will add the module into the TinyFPGA Programmer GUI that exists for #TinyFPGA B-Series.
Once that is done and working to my satisfaction I'll be making some more revisions to the PCB. I think it makes sense to break out all the PIC's pins so that the board can be used for other purposes as well. That way I can add support for UART and SPI along with its existing GPIO capabilities. That will make it a dirt cheap programmer that can be used for many things.
-
It works! And next steps...
08/31/2017 at 07:29 • 0 comments
I squashed a few more bugs in my Python code and was able to successfully program and verify the flash in a #TinyFPGA A-Series board. At the same time that I'm developing a Python module to communicate with the #TinyFPGA Programmer, @Xark has been working on a solution using Lattice ispVM to more closely integrate with the Lattice tools. He's discovered the SRAM can be programmed in about a dozen seconds using the #TinyFPGA Programmer. Flash programming and verifying on the other hand is taking the Python module about 3 minutes.
I believe flash programming is so slow because there is currently a lot of overhead for each JTAG bit of data transferred. Combine that with verification and it just takes time. To remedy this I'm planning on adding a new serial acceleration command. This command will allow the JTAG data to be transferred across USB about 16x faster and should reduce the overhead in the FW as well.
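One way to arrive at a figure of about 16x, under assumptions that are mine rather than stated here: bitbanging costs two command bytes per JTAG bit (one with TCK low, one with TCK high), while a bulk command carries eight JTAG bits per byte. Packet headers and USB framing are ignored.

```python
def bitbang_bytes(n_bits):
    """USB bytes when each JTAG bit needs a TCK-low and a TCK-high command."""
    return 2 * n_bits


def bulk_bytes(n_bits):
    """USB bytes when JTAG data is packed eight bits per byte (no headers)."""
    return (n_bits + 7) // 8


n_bits = 1024
ratio = bitbang_bytes(n_bits) / bulk_bytes(n_bits)
print(bitbang_bytes(n_bits), bulk_bytes(n_bits), ratio)
```

For 1024 bits this gives 2048 bytes versus 128 bytes, a factor of 16 in USB traffic.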
-
Programmer Python Module
08/21/2017 at 05:44 • 0 comments
I started development of the Python module for interfacing with the programmer hardware. Currently the module reads a standard SVF file from the Lattice tools and attempts to play back all the commands over JTAG. It appears as if it should be working according to my logic analyzer, but there are some bugs to track down. Maybe a few more days.
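The first stage of an SVF player is splitting the file into statements. The sketch below is not the module's actual parser; it only relies on two properties of the SVF format: comments start with `!` or `//` and run to the end of the line, and each statement ends with a semicolon and may span several lines. The function name and the sample values are made up.

```python
def svf_statements(text):
    """Yield each SVF statement as a list of whitespace-separated tokens."""
    cleaned = []
    for line in text.splitlines():
        # Strip '!' and '//' comments, which run to the end of the line.
        for marker in ("!", "//"):
            pos = line.find(marker)
            if pos != -1:
                line = line[:pos]
        cleaned.append(line)
    # Statements are terminated by ';' and may span multiple lines.
    for chunk in " ".join(cleaned).split(";"):
        tokens = chunk.split()
        if tokens:
            yield tokens


SAMPLE = """
! arbitrary example values, not taken from a real Lattice file
SIR 8 TDI (E0);
SDR 32 TDI (00000000)
       TDO (DEADBEEF);   // statement continues across lines
RUNTEST IDLE 2 TCK;
"""
statements = list(svf_statements(SAMPLE))
```

A player would then dispatch on the first token of each statement (`SIR`, `SDR`, `RUNTEST`, and so on) and turn the rest into JTAG pin activity.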
Started using the MPLAB debugger tools, pretty slick stuff. But it sure looks silly having the PIC programmer stacked on top of the FPGA programmer stacked on top of the FPGA board:
Stay tuned!
-
Programmer Protocol
08/19/2017 at 16:45 • 0 comments
From the TinyFPGA Programmer GitHub:
Serial Protocol
The programmer firmware appears as a generic USB serial port when you connect it to a computer. Control of the GPIO pins on the programmer is through this simple serial interface.
Command Format
Commands are encoded as 8-bit bytes with a command type field and data payload. The payload is typically a 6-bit bitmap representing the GPIO pins of the programmer.
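
| Bit | 7:6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|
| Field | Command Type | TMS | TCK | TDI | TDO | RC1 | RC0 |

Commands

| Opcode | Command |
|---|---|
| 0 | Configure Input/Output |
| 1 | Extended Command (Unused) |
| 2 | Set Outputs |
| 3 | Set Outputs and Sample Inputs |

Configure Input/Output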
For each of the GPIO pins, set the direction of the pin.
- 1: Set GPIO pin n to INPUT
- 0: Set GPIO pin n to OUTPUT
Extended Command
Reserved for future command expansion.
Set Outputs
Set each of the output pins to the given values.
Set Outputs and Sample Inputs
Set each of the output pins to the given values and return a byte representing the current values of the input pins.
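The command format above maps directly onto a few lines of Python. The opcodes and bit positions come from the tables; the names `Opcode` and `encode_command` are just illustrative, not identifiers from the real module.

```python
from enum import IntEnum


class Opcode(IntEnum):
    CONFIGURE_IO = 0
    EXTENDED = 1
    SET_OUTPUTS = 2
    SET_OUTPUTS_AND_SAMPLE = 3


# Pin bit masks, from bit 5 (TMS) down to bit 0 (RC0).
TMS, TCK, TDI, TDO, RC1, RC0 = (1 << n for n in (5, 4, 3, 2, 1, 0))


def encode_command(opcode, pins):
    """Combine the 2-bit command type (bits 7:6) with the 6-bit pin bitmap."""
    if not 0 <= pins <= 0x3F:
        raise ValueError("pin bitmap must fit in 6 bits")
    return (int(opcode) << 6) | pins


# TDO is driven by the FPGA, so configure it as an input (1 = input).
configure = encode_command(Opcode.CONFIGURE_IO, TDO)
# Drive TMS and TCK high, everything else low.
set_pins = encode_command(Opcode.SET_OUTPUTS, TMS | TCK)
```

Each command is a single byte, so a stream of them can be built with `bytes([...])` and written to the serial port in one go.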
General Usage
For serial interfaces like JTAG this protocol divides the maximum possible bandwidth by 8 from the USB to JTAG interface. This means we might get 0.5 MHz JTAG programming speed. That speed is actually fast enough to transfer all the data to the FPGA in a few seconds. However, the configuration flash on the FPGA needs a fair bit of time after erase and write operations, which will slow down the programming operation.
What can really slow down programming is the turnaround time for reading back data from the FPGA. For the most part data is going in one direction from the host computer to the programmer to the FPGA. For these cases we can use the Set Outputs command and not wait for any data to return. However there are times when we may need to poll a status bit to see if the FPGA has finished an erase or write operation. In these cases we will want to also sample the inputs and check the status. These should not be timing sensitive because the FPGA is already busy.
Verifying the configuration data on the other hand could take a long time if not done carefully. The application talking to the programmer should make sure to write as many commands as it can before attempting to read back the data from the serial interface. Rather than paying a penalty for the turnaround time on every read bit, we pay for it after reading dozens of bytes. This should allow read-back of the configuration data to be relatively quick and painless.
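The write-many-then-read pattern can be sketched as follows. Two things are assumed here: each sample command returns exactly one byte, and that byte uses the same pin layout as the command payload. `LoopbackPort` is a hypothetical stand-in that echoes TDI back on TDO; a real serial port object with `write()` and `read(n)` methods would take its place.

```python
# Opcodes and pin bits from the protocol tables.
SET_OUTPUTS = 2
SET_OUTPUTS_AND_SAMPLE = 3
TCK = 1 << 4
TDI = 1 << 3
TDO = 1 << 2


def build_shift_commands(bits):
    """Two command bytes per JTAG bit: TCK low, then TCK high with a sample."""
    out = bytearray()
    for bit in bits:
        tdi = TDI if bit else 0
        out.append((SET_OUTPUTS << 6) | tdi)
        out.append((SET_OUTPUTS_AND_SAMPLE << 6) | tdi | TCK)
    return bytes(out)


def shift_and_read(port, bits):
    """Queue every command in one write, then collect all replies in one read."""
    port.write(build_shift_commands(bits))
    replies = port.read(len(bits))
    return [bool(reply & TDO) for reply in replies]


class LoopbackPort:
    """Stand-in for the serial port: echoes TDI back on TDO for each sample."""

    def __init__(self):
        self.writes = 0
        self._replies = bytearray()

    def write(self, data):
        self.writes += 1
        for cmd in data:
            if cmd >> 6 == SET_OUTPUTS_AND_SAMPLE:
                self._replies.append(TDO if cmd & TDI else 0)

    def read(self, n):
        out, self._replies = bytes(self._replies[:n]), self._replies[n:]
        return out


port = LoopbackPort()
read_back = shift_and_read(port, [1, 0, 1, 1])
```

The turnaround cost is paid once per call to `shift_and_read` rather than once per bit, which is the point of batching the commands before reading.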
-
TinyFPGA Programmer Development Status
08/16/2017 at 01:19 • 0 comments
I'm happy with the basic design of the programmer hardware and firmware. It's really cheap and tiny and that was the goal. There are a few revisions I would like to make however:
- Additional staggered JTAG through-hole header. The JTAG header was designed with staggered holes so that it can be used naked like a socket. However, I discovered some pin headers are thinner than others. Because of this I want to add an additional through-hole JTAG header with more aggressive offsets to use on these thinner pins.
- Surface mount JTAG header footprint. It would be nice to have the option to use a right-angle surface mount female socket for the JTAG connector. This would allow for that.
- Power and status LEDs? These are common...but I'm considering assembling a lot of programmers by hand. The fewer components I have to add the easier it is for me to assemble. At least I can add the footprints.
- 3.3 volt and ground access. A couple of through-holes for power and ground could be useful if you want to power your FPGA board from the programmer.
These changes will be pretty easy, but they are not the next task to tackle. Instead I need to focus on the code used to talk to the programmer. The programmer hardware itself is more like a USB-GPIO device than anything else. The Python code talking to the programmer will implement the JTAG protocol and FPGA programming sequences. This is what I need to implement now. The first plan is to implement an SVF JTAG player that will read SVF files generated by the Lattice tools and use them to program and test the FPGAs. Once I have that I should be able to figure out the commands and sequences to program the FPGA over JTAG.
Stay tuned here and the project GitHub page for updates.