By whygee on Friday 27 July 2012, 04:08 - Architecture
I just received an email from another "hacker" who raised a lot of interesting questions. I answered by email and I share here some important ideas and insights.
Nikolay wrote :
> To be honest, I was more interested in YASEP16, because that would much harder task to solve compared to YASEP32 (32-bits are generally much more easier to define an instruction format, and if one sticks to fixed-length 32-bit instructions & encoded instruction format, things will look generally at least acceptable, if not even good).
I don't see what is so hard. It just happens that if you can do more, you can do less. It's explained there :
> Let me start few steps away - I enjoyed to see that you're finally pissed-off of using separate tools and sources, and you had started to play with an integrated model for the CPU, that's used for rtl/docs/assembler/disassembler. I think this is so major step, that I can't even find any cool words for this. My understanding was that lots of interesting CPU projects die too early because the burden of supporting all the separate tools in a compatible state for both the developers usage and the community just quickly exausts the peolpe and they stop pushing forward. Having the model-based approach will hopefully leave more time for fun for the hobby-CPU designers :D.
I suppose that you refer to a hobby project in particular, right ? :-) In fact, building a whole, free, self-contained and EASY TO USE toolset was the main motivation : I think that the YASEP is a sub-standard architecture as it is now (yet it is being polished), but since it is so simple, I can also work on getting the tools RIGHT for a more "potent architecture" (whether totally new or dusted off from the archives...)
> Another interesting thing to see was that the YASEP16 & YASEP32 partially share the instruction format. I was wondering whether the actual intention was to have forward binary compatibility with the YASEP32?
This part of the architecture is not well understood and I still struggle to explain it correctly. It is explained there http://yasep.org/#!doc/16-32
YASEP16 and YASEP32 are binary compatible on many levels (see the link above). They do not "partially share" the instruction format. The instruction format and decoder are 95% identical. Bus widths change and some 32-bits specific instructions are invalid in 16-bits mode. That's all.
In fact, you could create a single CPU core that executes both 16-bits and 32-bits code with only minor alterations of the decoder. Switching from 16-bits to 32-bits mode should only affect memory addressing and organisation.
Furthermore, I once needed a MCU that would only handle 14-bits data : I see no real limitation that prevents me from making an N-bit CPU (N<=32), as long as the bus width is equal to or larger than that of the instruction address bus. Then YASEP16 becomes just a particular subset of the architecture family.
> Now some questions for the instructions - I looked at the instruction formats and also at several instructions. Is there any difference between FORM_iR and FORM_Ri (used by GET & PUT)?
This is addressed in http://yasep.org/#!doc/forms#FORM_iR :
"This is just another way of writing iR when the instruction uses the immediate value as a destination address."
In other words, it's just there to make the assembly language conform to rule 1, which says that the destination is always the last part. See http://yasep.org/#!doc/asm#asm_ila
Physically, iR is encoded as Ri :
http://yasep.org/#!ASM/asm#a?PUT%20A5%204 : PUT A5 4 => 4762h
http://yasep.org/#!ASM/asm#a?GET%204%20A5 : GET 4 A5 => 4742h
The immediate "4" stays in the same place. However, it is clearer to keep a single writing rule : the source operand is easier to spot because its position does not depend on the opcode.
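To see how close the two encodings are, here is a small Python check that uses only the two words quoted above. The helper name `nibbles` is made up for this illustration and is not part of the YASEP tools, and the sketch makes no claim about which field holds which operand.

```python
# Compare the two 16-bit encodings quoted in the text.
PUT_A5_4 = 0x4762  # PUT A5 4
GET_4_A5 = 0x4742  # GET 4 A5

def nibbles(word):
    """Split a 16-bit word into four 4-bit fields, most significant first."""
    return [(word >> shift) & 0xF for shift in (12, 8, 4, 0)]

diff = PUT_A5_4 ^ GET_4_A5  # bits that differ between the two words

print(nibbles(PUT_A5_4))  # [4, 7, 6, 2]
print(nibbles(GET_4_A5))  # [4, 7, 4, 2]
print(f"{diff:04X}h")     # 0020h
```

Only one bit differs between the two words, which is consistent with the operands staying where they are while only the opcode changes.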
> According to basic math instructions - I saw that ADD & SUB generate Carry/Borrow, but I'm not sure whether these instructions actually use this flag. Generally it's usefull when implementing arithmetics with integers wider than the CPU registers (and it's major pain in the ass when it's not available).
Carry/Borrow are used only by conditional instructions.
In practice I have seen that ADDC and SUBB are quite rare and I couldn't justify the added opcodes. Having a specific set of conditions, however, is far more interesting. So if you want to add with carry/borrow, it's a bit awkward but here is one way :
    ; R1 + R2 => R3:R4
    ADD R1 R2 R3   ; generate the carry
    ADD 1 R4 carry ; suppose R4 was cleared.
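The two-instruction sequence can be modelled in Python to check the arithmetic. This is only a sketch under assumptions: a 16-bit datapath, R3 receiving the low word and R4 the carry, and function names (`add16`, `add_wide`) invented for the example.

```python
MASK16 = 0xFFFF  # assumed 16-bit datapath (YASEP16)

def add16(a, b):
    """Model a 16-bit ADD: return (result, carry_out)."""
    total = (a & MASK16) + (b & MASK16)
    return total & MASK16, total >> 16

def add_wide(r1, r2):
    """Model 'ADD R1 R2 R3' followed by 'ADD 1 R4 carry' with R4 cleared."""
    r3, carry = add16(r1, r2)   # ADD R1 R2 R3 ; generates the carry
    r4 = 0                      # R4 is supposed to be cleared beforehand
    if carry:                   # the second ADD only executes on carry
        r4, _ = add16(1, r4)
    return r3, r4

low, high = add_wide(0xFFFF, 0x0001)
print(f"{high:04X}:{low:04X}")  # 0001:0000
```

The same pattern extends to wider sums: the conditional `ADD 1` plays the role that ADDC would play on other architectures, at the cost of one extra instruction.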
It's a bit unusual, but possible. All the necessary data are here, and if you want to do number crunching, use a more appropriate CPU :-) This one is for doing "control" stuff.
> Other things for commenting - I was a little surprised to see the assembler format like "MOV Rx Ry" and I was wondering what influenced you when designing the ASM syntax?
Experience with several architectures :-) Before F-CPU I had already made a few assemblers for existing and hypothetical architectures. Simpler is better but expressiveness and consistency must not be forgotten : one must see the intent in the source code.
> SHR, SAR, SHL, ROR, ROL - I was pleasantly surprised to see a full set of shift operations on a small micro. I have also one question about the shift/rotate operations - I didn't see anywhere that the Carry flag is used/updated by these opcodes. Generally playing with the Carry is used when converting between receiving/transmitting 1-bit data (think for software SPI/I2C/UART). I suppose that your intention is to use instead logical operations to extract the LSB, like "AND 1 R3" for example?
Shift/rotate don't take the carry. I can't remember how many times I had to clear the carry flag before doing a shift on "another widespread architecture". How annoying.
If you want to shift a bit out, there's an easy way :
    ADD R1 R1 ; => carry !
    OR 1 R2 CARRY;
Or you can :
    SHL 1 R1
    OR 1 R2 MSB1 R1
Or the other way :
    SHR 1 R1
    OR 1 R2 LSB1 R1
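The first variant above (ADD R1 R1, then OR 1 R2 on carry) can be checked with a short Python model. It assumes a 16-bit register width, and the function name `shift_bit_out` is hypothetical. The MSB1/LSB1 variants are not modelled here because their exact condition semantics are not spelled out in this post.

```python
MASK16 = 0xFFFF  # assumed 16-bit registers

def shift_bit_out(r1, r2):
    """Model 'ADD R1 R1' then 'OR 1 R2 CARRY'.

    Adding R1 to itself shifts it left by one; the carry out is the
    old bit 15. The OR is only executed when the carry is set.
    """
    total = (r1 & MASK16) * 2
    carry = total >> 16
    r1 = total & MASK16
    if carry:
        r2 |= 1
    return r1, r2

print(shift_bit_out(0x8001, 0))  # (2, 1)
print(shift_bit_out(0x4000, 0))  # (32768, 0)
```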
Another method, if you want to have an arbitrary bit order, is to use a condition on R1's individual bits :
    ; count the number of bits set in R1's 16 LSB:
    MOV 0 R2
    ADD 1 R2 BIT0 0
    ADD 1 R2 BIT0 1
    ADD 1 R2 BIT0 2
    ADD 1 R2 BIT0 3
    ADD 1 R2 BIT0 4
    ADD 1 R2 BIT0 5
    ADD 1 R2 BIT0 6
    ADD 1 R2 BIT0 7
    ADD 1 R2 BIT0 8
    ADD 1 R2 BIT0 9
    ADD 1 R2 BIT0 10
    ADD 1 R2 BIT0 11
    ADD 1 R2 BIT0 12
    ADD 1 R2 BIT0 13
    ADD 1 R2 BIT0 14
    ADD 1 R2 BIT0 15
(just an example, there are better and faster ways)
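For reference, the intent of the unrolled sequence (one conditional increment per bit of R1) looks like this in Python. The sketch models the stated goal, counting the set bits among the 16 LSB, and not the precise semantics of the BIT0 condition; `count_bits16` is a name invented for the example.

```python
def count_bits16(r1):
    """Count the set bits among the 16 least significant bits of r1."""
    r2 = 0                      # MOV 0 R2
    for bit in range(16):       # one conditional ADD per bit position
        if (r1 >> bit) & 1:     # condition on an individual bit of R1
            r2 += 1             # ADD 1 R2
    return r2

print(count_bits16(0x8001))  # 2
print(count_bits16(0xFFFF))  # 16
```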
Note that this feature makes sense in a microcontroller, whose purpose is to handle bits. It could be unavailable in a more "streamlined" version because of pipeline delays, but you never know.
> CMPU, CMPS, UMIN, UMAX, SMIN, SMAX - just plain awesome. "Dude, this is not your grandfather's PIC!"
That's one way of seeing this. Notice that MIN/MAX are not available on certain implementations though. I can play with the inhibition of writeback to the destination register because I don't want to add a conditional MUX in the datapath : too much wire load, and that would slow the whole system down. But it can "partially work" with the inhibition trick on some operand combinations.
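One possible reading of the inhibition trick, sketched in Python: when the destination is also one of the two operands, an unsigned MIN needs no multiplexer, because the datapath only ever carries the other operand and the comparison result simply enables or inhibits the writeback. This is an assumption made for illustration, not a description of the actual RTL, and `umin_inplace` is a hypothetical name.

```python
def umin_inplace(regs, src, dst):
    """Model 'dst = min(src, dst)' using only a write-enable signal.

    The value presented for writeback is always regs[src]; the unsigned
    comparison decides whether the write happens at all.
    """
    write_enable = regs[src] < regs[dst]
    if write_enable:
        regs[dst] = regs[src]
    return write_enable

regs = {"R1": 5, "R2": 9}
print(umin_inplace(regs, "R1", "R2"), regs["R2"])  # True 5
print(umin_inplace(regs, "R1", "R2"), regs["R2"])  # False 5
```

Under this reading, a form whose destination differs from both sources would still need to select between two values, which would explain why the trick only covers some operand combinations.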
> The signed 4-bit immediate is the victim of the short instruction format :D.
I think I have found a good balance because short immediates appear very often. However I wish I had room for 8-bits operands. Anyway, the whole thing remains orthogonal and (almost) simple. It's a compromise and there will always be annoying cases.
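As a reminder of what a signed 4-bit immediate can express, here is a minimal Python sketch of the sign extension. The function name `sext4` is made up for the example, and the position of the field inside the instruction word is deliberately left out.

```python
def sext4(field):
    """Sign-extend a 4-bit field to a Python integer in the range -8..7."""
    field &= 0xF
    return field - 16 if field & 0x8 else field

print([sext4(f) for f in range(16)])
# [0, 1, 2, 3, 4, 5, 6, 7, -8, -7, -6, -5, -4, -3, -2, -1]

# The same value as it would appear on a 16-bit datapath:
print(f"{sext4(0xF) & 0xFFFF:04X}h")  # FFFFh
```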
> I was joking before several days that if we don't have support for immediate values, we won't have any data to load because there's no way to create the data in RAM in the first place (which is not true of course, reading hard-wired constants from a special reg and/or incrementing/shifting/performing bit ops will still provide valuable way to enter data in the programs, but still... it sucks compared to the immediates. Actually, my opinion is that the immediate is always a victim of any ISA - when having fixed instructions you either sacrifice immediates for register addresses and more operands, or you sacrifice the operand count (and reuse one of the operands as both source & destination, doh), or you have multiple instruction formats that had to be decoded simultaneously in order to provide the needed flexibility (that's not cool, but it's inevitable price to pay).
Those are the same dilemmas for everybody :-)
> About the instruction condition codes - are these available only for the YASEP32? And also, are they emulated by the high-level assembler when generating machine code for YASEP16?
They are available for both YASEP16 and YASEP32. The datapath width is not the instruction width.
> Btw, I didn't found a description of all the programmer-visible registers, so I'm not sure what are these used for (they look somehow like memory index & memory window registers, but that's a wild guess).
I am working on the related page at this moment. I work on both the French and English versions to keep them in sync. An older version is there : http://yasep.org/yasep2009/docs/registers.html
However, several things have changed : I'm adding "register parking", the numbering has changed (R1-R5 instead of R0-R4), plus other subtleties.
> About call/return functionality - I saw that you have done some preliminary work on it. I'm by no way expert on VHDL (I typically write for my hobby in Verilog), but I checked the RTL and to me it looks like that there's no other way to modify the PC register - it's not part of the general purpose register file, so instructions like CALL/RETURN will be inevitable, imho. Nevertheless, I would be happy if you can share your thoughts on this important topic.
It is very important and I resisted for a long while before adopting a particular solution. My main problem is that it's inherently a 2-writes operation. It is necessary to treat PC independently, which raises a lot of issues. But for now, it seems to work in the microYASEP pipeline. It may evolve : for example, I have not implemented "call with offset" (CALL2) because I believe it's a slippery slope, but it's "technically possible" so hey...
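To make the "2-writes" point concrete, here is a minimal Python sketch of what a call has to update in a single instruction. Everything in it is an assumption for illustration: the state is a plain dictionary, the link register is chosen by the caller, the instruction is taken to be 2 bytes long, and addresses are 16 bits wide. It is not the microYASEP implementation.

```python
MASK16 = 0xFFFF  # assumed 16-bit addresses

def call(state, target, link_reg, insn_size=2):
    """Model a call: two destinations are written by one instruction."""
    return_addr = (state["PC"] + insn_size) & MASK16
    state[link_reg] = return_addr   # write 1: return address to a register
    state["PC"] = target & MASK16   # write 2: new program counter

state = {"PC": 0x0100, "R1": 0}
call(state, 0x0200, "R1")
print(f"PC={state['PC']:04X}h R1={state['R1']:04X}h")  # PC=0200h R1=0102h
```

A register file with a single write port cannot perform both updates in one cycle unless PC is handled separately, which is the constraint described above.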
> PUT, GET - I'm not sure how these functions access the SFRs (they look unimplemented in the VHDL).
They were implemented for a prototype and the SR map has not yet been standardised.
> Btw, I would typically advise against separating the address space (separation in any form) like memory and I/O spaces - it's much more straight-forward to move & manipulate data around with unified instructions and to access memory-mapped peripherals (but in the same address space). Of course, the inevitable price for peripherals/SFRs is the address decoding - it's either partial & ugly, or full and expensive :).
I make a distinction because :
- memory is meant for high-speed, bulk transfers that MAY be performed out of order and with latency. Memory is for data and instructions.
- SRs are "serialising" and take effect immediately, which is critical for control. Memory mappings, inter-thread protection, configuration of peripherals etc. will be done there.
I hope this clears some misunderstandings :-)
20200411:
It was indeed an interesting discussion. It was a signal to me that the YASEP is overly complex and hard to "get" at first sight despite my efforts to document it.
I updated the links to point to http://archives.yasep.org/yasep2013