-
Skipping ahead a bit
08/05/2017 at 02:20 • 0 comments
I've gotten a SAM E70 Xplained development board in the mail today. I still need to figure out what to do about a development environment. I'm strongly inclined to use an ARM compiler on the Mac command line, as that's where I'm most comfortable, but the weight of opinion does seem to be behind running Atmel Studio on a Windows VM.
Meanwhile, one of the nice things about potentially using an ATSAMS70E19 for this is that it has a built-in TRNG. That means the whole boost converter and avalanche transistor circuit can go away.
But to make up for that, I need to figure out some way to multiplex the HSMCI port, since the S70 only supports a single SD slot. What really complicates the hell out of things is that the HSMCI interface to an SD card uses six signals: data lines 0-3, a command line, and a clock line (plus, of course, power and ground).
All of the signals except the clock are bidirectional.
My fervent hope is that all of the data lines can be shared and that just the clock line can be switched back and forth with a simple pair of gates. If that isn't going to fly, then the only choice is forgoing the HSMCI interface and just using SPI. We already know that works from the current generation of Orthrus.
In theory, a 25 MHz single-bit SPI link could transfer around 3 MB/sec. Starting from such low raw throughput would make it hard for the rest of the system not to drag it down further, particularly since (so far as I'm aware) there's no support for pipelined or multiplexed I/O over USB. A 25 MHz 4-bit setup could do around 12 MB/sec, which is much more in line with expectations. The only thing that would really get in the way is that we need to intersperse the AES computation with every 16 bytes of I/O.
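Just to show where those numbers come from, here's a quick back-of-the-envelope sketch in Python (raw bus rates only - it ignores all command and protocol overhead, so real-world numbers will be lower):

```python
# Rough raw throughput for the candidate SD interfaces (no protocol overhead).

def raw_throughput_mbs(clock_hz, bus_width_bits):
    """Bytes per second delivered by a bus of the given width, in MB/sec."""
    return clock_hz * bus_width_bits / 8 / 1e6

print("1-bit SPI @ 16 MHz: %.1f MB/sec" % raw_throughput_mbs(16e6, 1))  # ~2.0
print("1-bit SPI @ 25 MHz: %.1f MB/sec" % raw_throughput_mbs(25e6, 1))  # ~3.1
print("4-bit SD  @ 25 MHz: %.1f MB/sec" % raw_throughput_mbs(25e6, 4))  # ~12.5
```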
Going to a faster ARM processor would let us go from 16 MHz SPI to 25 MHz, as well as to 480 Mb/sec USB. But I'm dubious that those changes plus the faster AES engine will, by themselves, be enough for us to crack the 1 MB/sec barrier.
-
Best Product Semi-finals!
08/01/2017 at 20:37 • 1 comment
I am overjoyed that Orthrus has been chosen as one of 20 semi-finalists for the Hackaday Prize Best Product round. I totally did not see this coming (to be truthful, I expected my other entry to move on). I've not been doing a lot with Orthrus of late, mostly because the current design as it exists on Tindie hasn't sold even one unit and I had other, more interesting irons in the fire.
But all of that changed today!
The basic functionality of Orthrus is there today, but it's just too slow to be taken seriously. If this is going to be worthy of the label Best Product, then it needs to be at least within an order of magnitude or so of customary USB/SD mass storage device speeds - something north of 1 MB/sec (instead of the current 150 kB/sec) - while retaining the current basic feature set and operational characteristics.
The last long entry held out hope for the ATSAM4E16E, but it only supports full-speed USB. Given our new expectations, we need a controller capable of high-speed USB (480 Mb/sec, not 12). That, of course, will bring with it a whole new set of challenges - primarily getting the interface wiring just right. At first glance, the AT32UC3A4128S looks like it might be a contender. They're $6.10 @ Q:1 from DigiKey and in stock. But in addition to the aforementioned high-speed USB challenges, this chip brings with it the challenge of programming over JTAG (which I've never done before) and BGA reflow (which I've also never done). And since it's BGA, that means moving to a RoHS reflow process, which - again - is something I've never done. I'm also going to have to figure out how to use the hardware SD interface on this chip, as well as adapting the existing firmware to the UC3 architecture generally (the good news is that LUFA does support it).
It's always really nerve-wracking to have so many "firsts" all at one time... the really hard part is if it doesn't work, it's not always easy to tell which of the firsts is the one you've gotten wrong. Fingers crossed.
EDIT:
After a nice Twitter conversation with MarkAtMicrochip, another contender is the ATSAMS70N19. There's a nice eval board for the E70, which Mark explained is a superset, so I've ordered one to start getting familiar with the toolchain and architecture and whatnot. One of the remaining questions will be how to multiplex two SD card sockets across a single HSMCI interface, but I can't imagine there isn't some easy way to do that with just a GPIO pin as a "slot select" and some external buffers.
-
Chip choice for next gen
05/21/2017 at 01:13 • 1 comment
I had a very nice chat just now with a very nice guy from Microchip (he used to work for Atmel before the acquisition). I didn't ask permission to use his name, so I won't mention it, but he was quite helpful. I went over my requirements for the next-gen Orthrus and he recommended the ATSAM4E16E. I haven't looked at the datasheet myself yet, but he said it has high-speed USB, hardware AES (with 256 bit keys) and 4-bit SD support, which is exactly the feature set I need.
Since this requires me to get a whole new toolchain for 32-bit micros anyway, I'm not really beholden to AVR over ARM or PIC (I believe this one is an ARM Cortex-M4). And (again, I haven't verified it - I'm typing this from Maker Faire) these come in QFN packages as opposed to BGA, and I'm familiar with the former but not the latter.
So I think that's definitely a direction to go in.
-
USB VID
05/11/2017 at 16:35 • 0 comments
I've started a GoFundMe campaign to get a USB VID.
If you google around, you'll find a couple of avenues for obtaining product IDs for open hardware projects. I've inquired of both of them, and the silence has been deafening. Another avenue is to use Microchip's VID, since I'm using Microchip chips. Unfortunately, they haven't fixed their sign-up widget since acquiring Atmel, and they're not answering their e-mails either.
So I have no choice but to "squat" on 0xf055 for the short term and try and raise money to obtain a legitimate USB VID longer term.
I actually hope that this campaign gets enough visibility to put some pressure on the USB-IF to solve the problem of USB VID/PID allocation for small manufacturers and makers. That would be a better solution than all of us trying to raise $5K each to get a range of 65,536 PIDs of which we will each use a tiny fraction.
-
Switching off the RNG
05/08/2017 at 16:20 • 0 comments
A lot of folks have said that leaving a transistor in avalanche mode is bad for its long-term health, but I'm not convinced. The aspects of transistor health that continuous avalanche operation degrades matter for the normal use of a transistor, but in this case the transistor has no other function.
Still, if nothing else it seems a waste of power to run the 20 V boost converter and avalanche circuit continuously when they're needed for only a few milliseconds every once in a while.
To that end, I've decided future hardware will include a logic output from the controller to the boost converter's !SHDN pin to allow the avalanche supply to be turned on and off.
But there's a wrinkle there: you can turn off a boost controller, but there will still be a conduction path from the boost input supply through the inductor and catch diode to the output. Without taking extra steps, you can never turn a boost supply completely off.
Fortunately, there's an easy solution to this, and it's apparently a classic one. You connect a P-channel MOSFET to the output of the boost converter and tie its gate to the input power supply. A P-channel MOSFET is high impedance ("off") when the gate voltage is (nearly) equal to the source voltage, which is the case when the boost converter isn't switching. Usually you turn on a P-channel MOSFET by dropping the gate voltage, but in this case it turns on because the source voltage rises relative to the gate. The result is a true on-off controlled "high" voltage power supply - exactly what we want. This doesn't switch the inverter chip itself on and off, but that's okay: it should wind up in a stable state without any input and take relatively little power on its own.
-
On diffusion
05/08/2017 at 04:39 • 0 comments
In searching around for information about the state of the art in WDE, I came across this particularly interesting article.
One problem with whole-disk encryption is that you're generally not allowed to alter the block size. At this point, it's almost completely universal that we use disks (or pseudo-disks) that are simple one-dimensional arrays of 512 byte blocks.
One desirable quality of encryption is that you'd like to know if someone tried to tamper with the ciphertext. In general, this means either using authenticated modes or adding a MAC to the ciphertext. Unfortunately, this means that the ciphertext (or ciphertext plus MAC) is longer than the plaintext. For WDE, this is untenable.
Since we can't add any bits to the block to authenticate the content, the best we can do is use encryption to perturb errors, so that an adversary can't, for example, flip arbitrary bits in the ciphertext in order to flip the same bits in the plaintext. Such an adversary would be able to modify files in place, which is almost as good as being able to read them.
XEX (or XTS) will cause a 16-byte corruption when decrypting a block that has a single bit flipped. That blunts an attacker's ability to modify files. It would, however, be better if the mode could cause an entire block to be corrupted beyond recognition if a single bit of the ciphertext is altered. This property is called diffusion. Diffusion and confusion are two basic properties of a cipher. Confusion means that each bit of the ciphertext relies on more than one bit of the key, and that different bits of the key combine in an unpredictable pattern to alter bits of the ciphertext. Diffusion means more or less the same thing with regard to the plaintext during encryption and ciphertext during decryption: altering one input bit will cause radical changes to the entire output. Both confusion and diffusion are necessary to prevent statistical analysis of a cipher. This was all worked out by Shannon in 1945.
Ideally, we'd use a cipher with a 4096-bit block size for WDE, but that isn't practical. XEX provides confusion by perturbing the plaintext and ciphertext on both sides of the encryption operation, but because it handles each 16-byte AES block individually, it supplies no diffusion beyond the block boundary.
So far as I can find, there haven't been significant advances on the diffusion front for WDE since the BitLocker post was written. So far as I am aware, most solutions still use plain XTS (or XEX), meaning that a single ciphertext bit flip corrupts only the 16-byte-aligned AES block that contains it and nothing beyond. That certainly blunts bit-flipping attacks, but it doesn't eliminate every possibility of their being effective.
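To make that bit-flip behavior concrete, here's a little demonstration in Python using the cryptography library (this is just an illustration of the mode, not the Orthrus firmware): flip one ciphertext bit in an XTS-encrypted 512-byte sector and only the 16-byte AES block containing it comes back garbled.

```python
# In XTS/XEX, flipping one ciphertext bit corrupts only the 16-byte AES block
# that contains it; the rest of the sector still decrypts correctly.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(64)                    # AES-256-XTS: two 256-bit keys concatenated
tweak = (42).to_bytes(16, 'little')     # the sector number, encoded as the tweak
sector = bytes(512)                     # a 512-byte sector of zeros as plaintext

enc = Cipher(algorithms.AES(key), modes.XTS(tweak)).encryptor()
ct = bytearray(enc.update(sector) + enc.finalize())

ct[100] ^= 0x01                         # flip a single ciphertext bit

dec = Cipher(algorithms.AES(key), modes.XTS(tweak)).decryptor()
pt = dec.update(bytes(ct)) + dec.finalize()

damaged = [i for i in range(0, 512, 16) if pt[i:i+16] != sector[i:i+16]]
print("corrupted 16-byte blocks start at:", damaged)   # just [96]
```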
What does this mean for Orthrus? Not much. Orthrus differs from most WDE systems in that Orthrus isn't really intended to be a primary volume (not something on which you'd install an operating system to boot) so much as an offline storage system. It's intended to take away the job of key management for a particular, limited use case. So we're going to stick with XEX.
-
The impact of XEX
05/07/2017 at 03:14 • 0 comments
Just for completeness' sake, I coded up an implementation of XEX for Orthrus to do a speed comparison. It's a third slower - around 150 KB/sec instead of 225 KB/sec. I'm fairly confident that most of this stems from the fact that the encryption can no longer be precomputed in the background and must be done interactively as each block is read from or written to the card. It's not as bad as I had feared, but it's certainly an impact on what is already quite a slow mass storage device.
Still, I think the weaknesses of straight counter mode make the changeover to an encryption mode that is very widely used for this purpose seem like a good move. With this change, we can truly say with a straight face that we're doing whole-disk encryption using universally accepted standards.
Incidentally, if you google it, you'll find that most implementations of WDE talk about using XTS rather than XEX. However, the two are equivalent if the disk sector size is an even multiple of the cipher block size, which is the case for us. Some implementations use two separate keys - one to encrypt the nonce to form the tweak and one to encrypt or decrypt the data. However, the value of doing that seems (in the literature) to be disputed, so we just use the same key for both. If we had to pick two different keys, we could do so by cutting the volume ID in half and performing the key derivation twice - once on each half.
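For reference, this is roughly what single-key XEX looks like - a sketch in Python using the cryptography library's raw AES-ECB as the block cipher; the function and variable names are mine, not anything from the Orthrus firmware. The tweak is the encrypted sector number, multiplied by alpha in GF(2^128) for each successive 16-byte block:

```python
# Single-key XEX over one sector: C[j] = E_K(P[j] ^ T[j]) ^ T[j],
# where T[0] = E_K(sector_number) and T[j+1] = T[j] * alpha in GF(2^128).
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def mul_alpha(t):
    """Multiply a 16-byte tweak by alpha (x) in GF(2^128), IEEE 1619 convention."""
    out, carry = bytearray(16), 0
    for i in range(16):
        out[i] = ((t[i] << 1) | carry) & 0xFF
        carry = t[i] >> 7
    if carry:
        out[0] ^= 0x87                  # reduce by x^128 + x^7 + x^2 + x + 1
    return bytes(out)

def xex_encrypt_sector(key, sector_number, plaintext):
    assert len(plaintext) % 16 == 0
    ecb = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    tweak = ecb.update(sector_number.to_bytes(16, 'little'))
    out = bytearray()
    for j in range(0, len(plaintext), 16):
        masked = bytes(a ^ b for a, b in zip(plaintext[j:j+16], tweak))
        out += bytes(a ^ b for a, b in zip(ecb.update(masked), tweak))
        tweak = mul_alpha(tweak)
    return bytes(out)
```

Decryption is the mirror image: the same tweak sequence, with AES decryption in the middle instead of encryption.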
-
Crypto standards validation
05/02/2017 at 14:39 • 0 comments
It turns out that the method I'm using to derive the volume key is just AES-CMAC-PRF as described in RFC 4615. In other words, we're just calculating the AES-CMAC-PRF with the concatenation of the two card keys as the "key" and the volume ID as the "data."
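For reference, here's what that derivation looks like as a sketch in Python using the cryptography library (the function and variable names are mine, not the firmware's):

```python
# AES-CMAC-PRF-128 (RFC 4615): a PRF built on AES-CMAC that takes a
# variable-length key. Here the "key" is the two card keys concatenated
# and the "data" is the volume ID.
from cryptography.hazmat.primitives.cmac import CMAC
from cryptography.hazmat.primitives.ciphers import algorithms

def aes_cmac(key, msg):
    c = CMAC(algorithms.AES(key))
    c.update(msg)
    return c.finalize()

def aes_cmac_prf_128(vkey, msg):
    # RFC 4615 step 1: if the key isn't exactly 128 bits, compress it to
    # 128 bits by CMACing it under an all-zero key.
    k = vkey if len(vkey) == 16 else aes_cmac(bytes(16), vkey)
    # Step 2: the PRF output is the AES-CMAC of the message under that key.
    return aes_cmac(k, msg)

# volume_key = aes_cmac_prf_128(card_key_a + card_key_b, volume_id)
```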
On the other hand, counter mode isn't the best choice for the block encryption. If an adversary can force you to write a known plaintext to a disk block and then observe the encrypted result, they can discover the pre-ciphertext stream for that block. It is then possible for them to trivially recover any plaintext written to that block anytime after that. The only mitigation possible for this scenario is to use a mode that includes the plaintext in the cipher usage itself (rather than just XORing it as the last step). XEX mode is widely used in whole-disk encryption and has this property. The trouble with this for Orthrus is that it means that the pre-ciphertext can no longer be pre-computed in the background, so performance would suffer, possibly fatally (performance is already quite constrained compared to other microSD card readers).
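Here's a toy demonstration of that keystream-reuse problem - again a Python sketch with the cryptography library rather than anything from the firmware: with a fixed per-block counter, one known plaintext/ciphertext pair for a block exposes everything written to that block afterwards.

```python
# Counter-mode keystream reuse: a known plaintext written to a block reveals
# the keystream for that block, which then decrypts anything written there later.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)
nonce = os.urandom(16)                   # per-block counter value, fixed for this block

def ctr_encrypt(data):
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return enc.update(data) + enc.finalize()

known_pt = b"A" * 512                    # plaintext the adversary tricked us into writing
keystream = bytes(a ^ b for a, b in zip(ctr_encrypt(known_pt), known_pt))

secret_pt = os.urandom(512)              # something sensitive written to the same block later
recovered = bytes(a ^ b for a, b in zip(ctr_encrypt(secret_pt), keystream))
print(recovered == secret_pt)            # True - recovered without the key
```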
So Orthrus will retain counter mode, at least for the initial version. That means that Orthrus won't be resilient against more sophisticated attacks which assume an adversary can force various requests of his choosing.
An improved-performance version of Orthrus would have high-speed USB and perhaps a native SDHC controller of some sort. There are more sophisticated microcontrollers that have these features, and they might have the horsepower to support XEX mode as well (and possibly use AES-256), but they're 144-pin TQFP or BGA packages and at least double the price of the current device. Not out of the question, but not... today.
-
USART in SPI master mode FTL
04/30/2017 at 23:57 • 0 comments
I took a scalpel to my Orthrus prototype today to swap the wiring for MOSI and SCK so that I could try out USART0 in SPI master mode. It took quite a bit of swearing to get the kludge wires to work, but they finally do, and at least with the code I've written, USART0 in SPI master mode is around 5% slower than straight-up SPI.
I'm surprised by this, but I've stared at the code and experimented with it for a while now, and I can't see any improvements to be made. The USART code works; it just doesn't work any better.
So with that, the final performance number I'm getting across a small variety of SD card makes is around 225 KB/sec.
It's possible that a future version of LUFA might bring improvements in performance - in particular, the ability to use ping-pong buffers might be a big boost (if it's the USB performance that's throttling the system). To test that out, I replaced the disk block read method with one that skips all of the SPI stuff and just returns zeros. That achieved a throughput of ~270 KB/sec - only 20% faster than actually doing the I/O properly.
So with that, I'm going to declare that v2.0.1 is ready for prime time. v2.0.2 just swaps SCK and MOSI. I will keep that change going forward just in case there's some sort of epiphany down the road that makes USART SPI mode work, but there's no reason not to release the current design now.
-
DMA FTW!
04/29/2017 at 04:47 • 0 comments
After a lot of fussing around this evening, I finally got DMA-based AES working.
It turns out we have to use three DMA channels to get it working - one each to transfer the key and the nonce into the AES engine, and a third to transfer the pre-ciphertext out. The first two can run simultaneously, and there's tricky logic in the ISR (which is shared by both of those channels) to figure out when both transfers have finished before starting AES. The third channel triggers on AES completion, and its ISR checks for completion, increments the counter and kicks off the two inward channels.
The net result is a 20% speed boost. We're now up to 220 KB/sec. And that tops out this hardware rev. We'll have to wait for the next one to come back to see how much (if anything) we get from USART in SPI master mode. And that will likely mark the completion of the project.
EDIT: If that wasn't enough, I followed it up with automatic AES triggering. That gets rid of the first two ISRs, which gives us another 5 kB/sec. Now AES automatically starts when the key and data are filled in, and then channel 2 is triggered when it's done. The ISR for channel 2 just checks for completion, increments the nonce counter and triggers channels 0 and 1.