Memory model
12/23/2024 at 02:02
A big part of the programming model is the memory model : the organisation of data in memory, their roles and restrictions...
POSEVEN follows the CDI (Control/Data/Instructions) model and has three memory spaces that are independent from each other : a pointer for one area has no meaning in another area. All areas can be paged out and cached.
1) Stack
The first and simplest memory space is the control stack. It is not accessed directly or indirectly by the program : hardware manages the addressing, and the pointers have tags automatically added to them to prevent any confusion. Each level of the stack contains two words : a data word and a pointer+tags word. Each level counts as "one" increment of the stack pointer, and a couple of "limit" indices help catch abnormal conditions. This separate and protected stack is essential for the implementation of fast and efficient calls to other modules. There is no special constraint on the size of the index or the stack area, but it is usually limited by the word size (32 or 64 bits).
The stack space is specific to the thread. Nothing else (except thread #0 and its surrogate) can access it, and the only reasons to access it from outside the running thread are setup, diagnosis and debugging.
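To make the two-word levels and the "limit" indices more concrete, here is a minimal Python sketch of how such a control stack could behave. Everything in it is an assumption for illustration (the names ControlStack and ControlStackFault, the tag strings, the default limits); the real thing lives in hardware and is never visible to the program.

```python
class ControlStackFault(Exception):
    """Stands in for the hardware trap raised on an abnormal condition."""


class ControlStack:
    """Hypothetical model: each level holds a data word and a pointer+tag word."""

    def __init__(self, low_limit=0, high_limit=1024):
        self.low_limit = low_limit      # popping at this index traps
        self.high_limit = high_limit    # pushing at this index traps
        self.index = low_limit          # stack pointer, counted in levels
        self.levels = {}

    def push(self, data, pointer, tag):
        if self.index >= self.high_limit:
            raise ControlStackFault("overflow")
        self.levels[self.index] = (data, pointer, tag)
        self.index += 1                 # one level = one increment

    def pop(self, expected_tag):
        if self.index <= self.low_limit:
            raise ControlStackFault("underflow")
        data, pointer, tag = self.levels[self.index - 1]
        if tag != expected_tag:         # the tag prevents any confusion
            raise ControlStackFault("tag mismatch")
        self.index -= 1
        del self.levels[self.index]
        return data, pointer
```

A failed pop leaves the stack untouched, so the kernel or a debugger could still inspect it afterwards.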
2) Program
The second area is the program space. All instructions have the same size so incrementing the instruction pointer (PC) directly points to the next instruction. There is no data and the program cannot read "data" from this space. The address size for a module is limited to 24 bits, probably even 22 bits (amounting to 16MB if instructions are 32 bits wide). The module can address itself with direct addresses embedded in the opcodes. Indirect or computed addressing is strictly limited.
Each program space is specific to a module. An IPC instruction switches the thread to a different module with its new program space. Pointers/instruction addresses for one module are irrelevant to a different module.
The program space is mostly homogeneous. The exception is the first 64K instructions which, at certain aligned addresses, can contain IPE instructions to allow jumping from a different module.
3) Data
Data belongs to its own space. It is not homogeneous in its function, but the structure is always the same: a linear space made of 8-bit bytes, with a pointer that is as wide as words (32 or 64 bits).
The MSBs of the address/pointer select one of four main areas. The top MSB is the private (0)/public (1) flag. The lower MSBs divide this space even more, and the address generators can trap when a pointer calculation overflows or underflows, creating a pointer to a different sub-area.
00 : Data Stack, where parameters and recursive short-term blocks are stored.
01 : Local/Thread-private data (private heap, local/private data, variables)
10 : Module shared data, constants, holding the module's state, semaphores, configuration...
11 : Interchange area, used to allocate buffers that can be sent to other modules (message passing, "shuttles")
A pointer or address has a different meaning and shareability depending on the area/MSB:
- a private pointer (MSB=0) can't be shared at all and will have a totally different meaning in another thread.
- a module-shared area pointer (10) can be shared among threads running said module, but it's irrelevant in any other module.
- an interchange area pointer (11) is the only type of pointer that can have meaning across all the threads and modules, BUT a given page can only be owned by one thread, which can then "yield" it to another recipient of its choice to transfer data.
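The decoding of the top two bits and the trap on area-crossing pointer arithmetic can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are made up, only the four main areas are modelled (not the finer sub-areas), and the word width is a parameter.

```python
AREAS = {
    0b00: "data stack",
    0b01: "thread-private data",
    0b10: "module-shared data",
    0b11: "interchange",
}


def area_of(pointer, width=32):
    """Return the area selected by the two most significant bits."""
    return AREAS[(pointer >> (width - 2)) & 0b11]


def is_public(pointer, width=32):
    """The top MSB is the private(0)/public(1) flag."""
    return (pointer >> (width - 1)) & 1 == 1


def add_offset(pointer, offset, width=32):
    """Pointer arithmetic that traps when the result leaves its area."""
    result = pointer + offset
    if (result < 0 or result >> width
            or area_of(result, width) != area_of(pointer, width)):
        raise OverflowError("pointer crossed into a different area")
    return result
```

For example, adding 1 to 0x3FFFFFFF (the last byte of the data stack area on a 32-bit machine) would land in the thread-private area, so the sketch raises instead of returning the new pointer.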
Overall, it's not simple but it's not overly complicated either. It does not require complex supporting hardware, circuits or mechanisms and it is reasonably efficient for 32-bit and 64-bit machines. Unlike a monolithic kernel, it starts to make sense when the system implements tens or hundreds of threads and modules simultaneously, even in an embedded system.
07/11/2024 at 00:41
POSEVEN is based on a platform (with a proper mix of hardware and software features) that provides/implements the following two fundamental elements :
1. Threads
A thread is a stream of execution : a sequence of instructions that are fetched, decoded and operated upon, one after the other, with occasional, conditional changes in the sequence that are controlled by the program and/or external data.
Nothing new here.
A POSEVEN system may implement an arbitrary number of threads, which may be limited by hardware constraints.
A thread can be started, paused, resumed, interrupted, frozen, thawed, swapped in/out or terminated at any time.
A POSEVEN thread has a "global" context that contains
- hidden, private and public properties
- capabilities, access rights, tokens...
- a protected stack space
- a private as well as a shared data address space.
A thread executes code from one module at a time.
2. Modules
A POSEVEN system hosts a number of modules: they contain an area of code to execute, associated with a common data memory area.
The code is a static collection of instructions to be executed by a thread. The thread can switch from one module to another by "calling" entry points in the desired module's trampoline area, located in the first instructions of the module.
The common data area is accessed only by the threads that execute the code of the associated module. Since any number of threads can execute a module simultaneously, all the code must be reentrant, and the relevant variables must be protected by semaphores, spinlocks or mutexes.
Threat model
POSEVEN imposes modularity to promote code reuse and contain/isolate the flaws. The POSEVEN system assumes that any thread or module could be "rogue". It is the implementer's job to ensure that the "kernel" module (#0) and the "kernel" thread (ThId#0) provide the appropriate "root of trust".
Any module can call, or be called by, a module that is potentially altered (accidentally or intentionally) so a strict separation of the modules and threads is further required.
Communication between modules and/or threads uses strict safe interfaces:
- The trampoline area lets a module filter calls and decide which are legitimate. Each module enforces its own (potentially dynamic) policy, by reading the "capabilities" and other attributes of the calling module and thread.
- Data is shared through a common data addressing area, where pointers are valid for pairs of threads. One thread may read and the other may write (but not both).
- Calling a module is often preceded by "shielding" the thread's data stack from reads & writes, and clearing all irrelevant registers to prevent any leaks. Extra data may be left over on top of the stack by a callee module, but it can't alter the data of the caller module.
- Each thread has its own protected stack that can only be accessed by itself, in order, through specific instructions, to prevent leaks, alterations, ROP (Return-Oriented Programming). Only the kernel and an authorised debugger can access another thread's stack's contents.
- Paged memory helps enforce some of the security policies and helps map, or remap, code and data areas. For example : page mapping allows a given thread to access the trampoline area of another module, which enables dynamic module substitution.
- "Resources" (hardware-controlled) may only be accessed by the "kernel" (thread #0) or one other thread, whose number is set by the kernel. In other words : the kernel manages which thread may access a given hardware resource at each time, and only one thread may handle a peripheral at a time, preventing any contention or race condition.
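The last rule of the list (a resource belongs to the kernel or to exactly one other thread chosen by the kernel) is simple enough to sketch in Python. The Resource class and its method names are hypothetical; they only illustrate the ownership check, not any actual hardware register.

```python
KERNEL_THREAD = 0   # thread #0, the root of trust


class Resource:
    """Hypothetical model of one hardware-controlled resource."""

    def __init__(self, name):
        self.name = name
        self.owner = KERNEL_THREAD       # at first only the kernel may touch it

    def assign(self, caller, new_owner):
        """Only the kernel decides which single thread handles the resource."""
        if caller != KERNEL_THREAD:
            raise PermissionError("only the kernel assigns resources")
        self.owner = new_owner

    def access(self, thread_id):
        if thread_id not in (KERNEL_THREAD, self.owner):
            raise PermissionError(f"thread {thread_id} does not own {self.name}")
        return f"thread {thread_id} uses {self.name}"
```

Reassigning the resource implicitly revokes the previous owner, so two user threads can never contend for the same peripheral.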
POSEVEN cannot prevent hardware alteration or physical bypass of access rights/tokens. However it is designed to prevent, detect and isolate "unwanted" behaviour without getting too much in the way of an application, which is free to define its own policy. Adherence to security and safety rules is the responsibility of the system designer and implementation, but the system is considered unsafe unless proved otherwise.
Name change
12/16/2023 at 03:06
NPM is no more, welcome the new flashy and witty name : POSEVEN
Yes it's a deliberate pun, which seems to have been overlooked for decades, and the domain names seem uncluttered (yet). So I got .com as well. You never know.
NPM was never intended to remain a definitive name for this project/development/specification. Due to my ADHD, the name collision was a deliberate personal incentive to find a way better one, just like all those "project code names" such as Aurora, Nebula or Eclipse you see in the industry. I wanted to start the design without being hindered by the lack of a definitive umbrella name, and the ideas matter more than the label, just a few letters, put on them. But this lack was becoming a pain recently. Inspiration never warns anyway, and when it arrives, it strikes with its obviousness.
It's seven characters long and I'd have loved a much shorter one but they are all taken already.
What's after POSEVEN ? POEIGHT. So a system written using these principles would naturally be a POEM. Sweet, but now it's become totally corny.
Anyway, I'm still considering this development in the background, while I work on other things like the #YGREC8. Stay tuned.
Update: #YGREC32 will be the first dedicated platform !
SSS : the Single-Stack Syndrome
09/08/2023 at 17:20
I remember as a teen, knowing BASIC and PASCAL, I explored the weird world of the 8086 so I could program it efficiently. And I was, and still am, scratching my head over (among other things) the concept of the stack as it was implemented by this CPU...
I knew about stacks already, being familiar with the 6809. Stacks are cool and great. But the 8086 implements frames in a contrived, uselessly elaborate way. And you know what ? Stacks are one of the main weaknesses of x86, particularly when coupled with C's unbounded and unchecked arrays, it's a highway to bugs.
Stacks are one of the most fundamental structures in computer science and have been studied to death. Being curious about FORTH, I know that a single stack is not inevitable, as there are other programming languages with more than one. Indeed FORTH was created on a computer featuring two stacks. Some languages also implement their own split stack under the hood. And that's very fine. So what happened ?
Well, C happened more than 50 years ago, growing out of B, which started on a PDP-7. Belonging to a certain tradition and having to run on tightly limited resources, having only one stack was a compromise that worked well enough for the time and the constraints, so it spread wildly and even overcame PASCAL.
The single stack has one major problem in C : you can't return composite variables on the stack so functions must resort to pointers and other stupid workarounds, which makes C a development hell once you go beyond the basic examples. The standard libs are a freak show. And I won't rant again about the insanity that comes from mixing control and variable data.
The thing is : C structures (and the SSS) crystallised in many subsequent designs. Worse : it underpins modern compilers and processor architectures ! The syndrome turns into a plague.
The NPM defines a strict split structure that separates data from control values.
- The Control Stack is tightly controlled and only affected by the program's instructions (CALL, RETURN, IPC/IPE/IPR, THROW/CATCH etc.). The data reside in a dedicated memory space to enforce security. Nothing bad should happen as long as the program and the hardware's integrity are ensured.
- The data stack is user-defined in the user's data space and can be placed anywhere there, in fact there could be any number of data stacks. It's up to the programmer's tastes and needs, who could shoot himself in the foot but this will not break everything. Anyway, one main data stack seems indispensable at least to support "frames" that common languages expect.
This is expanded in the sub-project #A stack.
This is a way to get both safety (the control stack can't be tampered with by data injection, only by modifying the read-only program) and performance (since separate access increases the ILP and there are fewer indirections due to programming kludges). The user can optimise the data stack(s) at will and even not use one when unneeded. The user stacks can grow up or down, be allocated dynamically...
OTOH the control stack enforces security by using tagged values that define the type of the stack's contents. This is discussed at https://hackaday.io/project/8774-f-cpu/log/222000-tagged-control-stack and will certainly expand. This tagged control stack (TCS ?) can mix critical information of the program's flow state, including error handling, inter- and intra-module calls, or even outer loops. However registers and other states must be saved through the normal datapath : apart from the Instruction Pointer, there is no provision to save the other states on the TCS, whose name implies it's only for control.
Saving the state of the other data can be performed explicitly by code sequences and/or specific instructions. NPM/POSEVEN does not specify context switches.
Executable files' structure
07/17/2023 at 03:24
The same file format/structure is used for every executable module : the kernel, the libraries, the applications... It does not use ELF or other existing relocatable formats because it does not need the same features. There is no symbol relocation like with POSIX systems, for example.
The file contains 3 main sections :
- The list of required modules, in full text and official form, in a precise order.
- The data, relocated/moved to the negative range of the private addressable data space
- The code itself, moved to the private instruction address space
Loading such a new module is simple:
- create the list of module dependencies by scanning the textual lists, look the names up and sign the dependency (with the module ID and such) to get a 64-bit hash for each called module. See the last log Code space structure and management. This may become recursive, beware of the depth.
- copy the data to the appropriate location and map the data space
- copy the instructions to the new private instruction space
- scan the first 64Ki instructions to create the lookup table of the entry points.
- call the first entry point to run initialisation.
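The loading steps above can be sketched as a small recursive loader in Python. Nothing here is specified by the project : the dictionary layout of a module image, the MAX_DEPTH guard, the sequential module numbering and the use of BLAKE2b to get a 64-bit value are all assumptions made for the illustration, and calling entry 0 is left as a comment.

```python
import hashlib

MAX_DEPTH = 8  # arbitrary guard against runaway recursion (assumption)


def dependency_hash(name, module_id):
    """Placeholder 64-bit signature of a dependency; the real scheme is not specified."""
    digest = hashlib.blake2b(f"{name}:{module_id}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def load_module(name, files, loaded, depth=0):
    """Load `name` and, recursively, the modules it lists. Returns its module ID."""
    if name in loaded:
        return loaded[name]["id"]
    if depth > MAX_DEPTH:
        raise RecursionError(f"dependency chain too deep at {name}")
    image = files[name]                          # KeyError if the module is unknown
    for dep in image["requires"]:                # step 1: resolve the textual list
        load_module(dep, files, loaded, depth + 1)
    module_id = len(loaded)
    loaded[name] = {
        "id": module_id,
        "deps": [dependency_hash(d, loaded[d]["id"]) for d in image["requires"]],
        "data": bytes(image["data"]),            # step 2: copy the data section
        "code": list(image["code"]),             # step 3: copy the instructions
        # step 4: scan the first 64Ki instructions for entry points
        "entries": [i for i, op in enumerate(image["code"][:1 << 16]) if op == "IPE"],
    }
    # step 5 would call entry point 0 here to run the initialisation
    return module_id
```

The depth guard also catches circular dependency lists, which would otherwise recurse forever.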
The first list can be empty: for example a minimalistic kernel or a basic library. These codes can be called by any other modules and threads and it's up to each module to select which rights or capabilities are protected and how. Thus a program can be built with a group of modules that call each other, registering each other's thread or module IDs to enable richer functions without the risk of interference from other programs.
A number of entry points are to be allocated for basic enumeration and general management.
- Entry 0 is for start/init : it's run after all the dependencies have been loaded, it can return immediately (for a kernel or lib) or go on until ... whatever (a program, a daemon, you name it)
- Entry 1 is for suspend
- Entry 2 is for restore/resume
- Entry 3 is called just before shutting down/exiting/unloading the module
- Entry 4 for enumeration of available entry points and getting the versions, name, max. num. of entries, configuration...
When an entry point is called, the module can choose :
- Accept anything anyway
- Check a semaphore to accept to go further
- Check the thread ID (given by the IPE instruction)
- Check the module ID (also given by the IPE instruction)
- reject everything
The rejection can be either a polite one (IPR) or a TRAP (to signal the kernel that something fishy is happening and the caller must be flagged as intrusive).
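As an illustration of these choices, here is a hedged Python sketch of an entry point policy. The Verdict enum, make_policy and its parameters are invented names; a real module would do this with a few instructions right after its IPE.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    IPR = "polite return"
    TRAP = "trap to kernel"


def make_policy(allowed_threads=None, allowed_modules=None, gate=None,
                on_reject=Verdict.IPR):
    """Build a check(thread_id, module_id) function for one entry point.

    None means "do not filter on this criterion"; `gate` is any callable
    standing in for a semaphore test.
    """
    def check(thread_id, module_id):
        if gate is not None and not gate():
            return on_reject
        if allowed_threads is not None and thread_id not in allowed_threads:
            return on_reject
        if allowed_modules is not None and module_id not in allowed_modules:
            return on_reject
        return Verdict.ACCEPT
    return check


accept_all = make_policy()
reject_all = make_policy(allowed_threads=set(), on_reject=Verdict.TRAP)
```

The caller IDs passed to check() are the ones delivered by the IPE instruction, so the callee never has to trust values supplied by the caller itself.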
If the kernel must load external libs, then the bootloader must provide the appropriate access and functions to load more than one module. Ideally, the kernel must provide its own loader though but it's not a critical consideration yet.
The module can use 2 methods, maybe simultaneously:
- mix the code with the entry points. Compact, fast and suitable for few entry points.
- jump from the trampoline zone almost immediately : suitable when many entry points are provided in the 64K instruction zone
The entry points would ideally be aligned to 8-instruction boundaries to align the jump target to the cache lines, but it's not a requirement.
So let's imagine a simple library that provides a single int2str (convert integer to string) function:
- File header (could be "PM2\n") [32 bits]
- File checksum [32 bits]
- Some sort of field specifying the ISA/CPU/word and instruction sizes
- Size of the dependency list [32 bits] (0 in this case because no external module is required).
- For each module name in ASCII, a PascalZ with an 8-bit prefix and NULL terminator
- Size of constant data section [32 bits] (if larger than 31 bits, it's clearly a problem)
- Constant section
- "loaded" flag (cleared)
- some constant array such as "0123456789ABCDEF" (the lookup table for ASCII chars)
- Size of variable data section (initialised to 0 and placed below the already negative data range of the above section)
- Size of the code
- Code section
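The beginning of such a file can be sketched with Python's struct module. Several details are assumptions because the layout above leaves them open : little-endian fields, a dependency list size counted in bytes, a length prefix that does not count the terminator, a zero checksum placeholder, and the ISA descriptor is skipped entirely.

```python
import struct

MAGIC = b"PM2\n"   # the 32-bit header suggested in the list above


def pascalz(name):
    """Encode a name as PascalZ: 8-bit length prefix, ASCII bytes, NUL terminator."""
    raw = name.encode("ascii")
    if len(raw) > 255:
        raise ValueError("name does not fit an 8-bit length prefix")
    return bytes([len(raw)]) + raw + b"\x00"


def read_pascalz(buf, pos):
    """Decode one PascalZ string at `pos`; return (name, position after it)."""
    length = buf[pos]
    end = pos + 1 + length
    if buf[end] != 0:
        raise ValueError("missing NUL terminator")
    return buf[pos + 1:end].decode("ascii"), end + 1


def pack_prefix(names):
    """Magic, a zero checksum placeholder, then the sized dependency list."""
    deps = b"".join(pascalz(n) for n in names)
    return MAGIC + struct.pack("<II", 0, len(deps)) + deps
```

For the int2str library, pack_prefix([]) gives just the magic, the checksum placeholder and a dependency list size of 0.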
The code will look like this and will load at logical address 0 for each module in its own addressing space (there is no need of ASLR because a pointer in one address space can't address another address space, you must go through the IPC/IPE duo to jump, and returns can't be forced from outside)
IPE ThrID, ModID   // entry 0 : init
    // Save the calling Thread ID to local variable (register) ThrID and the module's ID to ModID
    // but they are not required here so just ignore the values
    If Loaded_Flag != 0 then TRAP_FLAG   // already loaded, complain to the kernel
    // More initialisation can be done here
    Loaded_Flag = 1
    IPR   // Module is loaded, go on with your day
IPE ThrID, ModID   // entry 1 : suspend
    IPR   // who cares, not implemented
IPE ThrID, ModID   // entry 2 : resume
    IPR   // who cares, nothing to restore
IPE ThrID, ModID   // entry 3 : shutdown
    IPR   // who cares, nothing to save or close
IPE ThrID, ModID   // entry 4 : enum
    IPR   // who cares, there is only one function to provide, but this could be updated later
IPE ThrID, ModID   // entry 5 : the actual useful function (no filtering on ThrID or ModID because it's harmless)
    (do the actual computation and string stuff here)
    IPR
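For completeness, the "actual computation" behind entry 5 could look like the following Python sketch, which uses the constant lookup table from the file layout. The signature, the default base of 16 and the sign handling are assumptions; the original only names the function.

```python
DIGITS = "0123456789ABCDEF"   # the constant array stored in the module's data


def int2str(value, base=16):
    """Convert an integer to its textual form using the lookup table."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("unsupported base")
    if value == 0:
        return "0"
    sign = "-" if value < 0 else ""
    value = abs(value)
    out = []
    while value:
        value, digit = divmod(value, base)   # peel off the least significant digit
        out.append(DIGITS[digit])
    return sign + "".join(reversed(out))
```

The function only reads constant data and keeps its state in locals, so it is reentrant without needing a semaphore.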
In other situations, the filtering of ThrID and ModID can be evaluated in the trampoline zone, then the code jumps outside the trampoline to have more room to do its job.
Nothing forces the first instruction to be IPE but that wouldn't make sense otherwise. It is not possible to read the instruction space, only to execute instructions, so maybe there could be code there but why ? So maybe the first instruction entry point is "implicit", no need of an IPE and the kernel calls it directly ?
Anyway the above would be easily mapped to an object-oriented language, with a bit more technical stuff that could be made visible to the programmer.
Important note : all the modules and programs and entry points MUST be totally re-entrant so using the shared values must be protected by semaphores !
Code space structure and management
07/16/2023 at 18:28
Here is a more formal definition for one aspect of NPM, which was first imagined with the YASEP architecture (see Log#30 : Fast and secure InterProcess Communications and Log#55 : More about the IPC instructions)
So here we go (again).
In memory, the computer contains a collection of separate addressing spaces, each with its own size and corresponding to a given code module.
These addressing spaces are instruction-grained: each address points to one individual, sequential instruction, regardless of its size. Thus every index gives a valid instruction and the program counter increments normally. Instruction size or packing is out of the equation for the program itself (fixed instructions sizes are recommended).
Each module could also have its own dedicated shared data space (in the negative range ? for constants, semaphores, configuration, whatever, but it must be "safe") but the actual data must be held somewhere else, in the local thread's data context (that's another story for another log).
The addressing space for the code modules can be any (positive) length, but
- The shorter the better (to prevent feature creep and bug-prone cruft accumulation)
- The first 64Ki is a trampoline zone that could be jumped to arbitrarily from anywhere.
One module can CALL its own code but must go through the trampoline to call any other module, using a dedicated hardware stack and a trio of instructions, that enforce safety and access rights (through user code and strict separation) while maintaining performance (through static and dynamic caching).
The trampoline contains 64Ki instructions that can only be addressed by a specific instruction (called IPC for InterProcess Call), and it may only jump to another specific instruction (called IPE for InterProcess Entry) that manages a dedicated, hardware-enforced stack. IPE can only exist in the first 64K of the code space (or it will trigger a fault) and any IPC to an instruction other than IPE will fault as well. The third instruction IPR (InterProcess Return) completes the collection and jumps back to the previous module (indicated by the dedicated hardware stack).
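Here is a toy Python model of the three instructions and their faults. It is a sketch under assumptions : opcodes are plain strings, the Machine and IpcFault names are invented, and for brevity the push on the dedicated stack is done by ipc() whereas the text gives that role to IPE.

```python
TRAMPOLINE_SIZE = 1 << 16   # 64Ki instructions


class IpcFault(Exception):
    """Stands in for the hardware fault/trap."""


class Machine:
    def __init__(self, modules):
        self.modules = modules      # module number -> list of opcodes
        self.ipc_stack = []         # dedicated stack: (module, return address)
        self.module = 0
        self.pc = 0

    def ipc(self, target_module, entry):
        """InterProcess Call: may only land on an IPE inside the trampoline."""
        code = self.modules.get(target_module)
        if code is None or not 0 <= entry < min(len(code), TRAMPOLINE_SIZE):
            raise IpcFault("invalid module or entry")
        if code[entry] != "IPE":
            raise IpcFault("IPC target is not an IPE instruction")
        self.ipc_stack.append((self.module, self.pc + 1))
        self.module, self.pc = target_module, entry

    def ipe(self):
        """InterProcess Entry: faults when found outside the trampoline."""
        if self.pc >= TRAMPOLINE_SIZE:
            raise IpcFault("IPE outside the trampoline zone")
        self.pc += 1

    def ipr(self):
        """InterProcess Return: go back to the module recorded on the stack."""
        if not self.ipc_stack:
            raise IpcFault("nothing to return to")
        self.module, self.pc = self.ipc_stack.pop()
```

Since the return address lives only on the dedicated stack, the callee has no way to redirect the return from its data space.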
The trampoline is meant to speed up repeated accesses to libraries (for example) so the IPC instruction contains the 16-bit constant index of the desired trampoline entry (while the module number may be stored in a register). However a program cannot know this index in advance so it must first encode the entry point number, which is later translated by a kernel-driven lookup at module load time.
A module cannot be stored and distributed with the IPC instruction itself, because the entry points will change with time independently from the program, so it uses the IPL (InterProcess Lookup) instruction instead, which traps :
- The handler checks that the module number is valid
- The handler checks that the entry point number exists (not out of range)
- The handler changes the opcode from IPL to IPC and stores the actual trampoline index in the 16-bit constant field of the opcode.
- The handler re-runs the instruction that trapped
Thus on the second run, the cached entry point has been integrated into the running code and saves some time. The interprocess call should be almost as fast as a regular CALL instruction.
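The trap handler's job can be sketched in Python as a small patching function. The tuple encoding of an instruction, the shape of module_table and the LookupFault name are all assumptions chosen for the illustration.

```python
class LookupFault(Exception):
    """Stands in for flagging the offending caller."""


def handle_ipl(code, pc, module_table):
    """Trap handler: rewrite the IPL at code[pc] into an IPC, then re-run it.

    `code` is a list of (opcode, module, constant) tuples and `module_table`
    maps a module number to its list of trampoline indices, in entry order.
    """
    opcode, module, entry_number = code[pc]
    if opcode != "IPL":
        raise LookupFault("trap did not come from an IPL")
    if module not in module_table:
        raise LookupFault("invalid module number")
    entries = module_table[module]
    if not 0 <= entry_number < len(entries):
        raise LookupFault("entry point out of range")
    # Cache the translation in the instruction's 16-bit constant field.
    code[pc] = ("IPC", module, entries[entry_number])
    return pc   # the same address is executed again, now as an IPC
```

After the first call the patched instruction never traps again, which is what makes the interprocess call almost as cheap as a regular CALL.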
If the called module has been taken down or updated, the calling module can simply be "reloaded" into the uncached state, to re-trigger the dynamic translation.
The numbers of the modules called by a given module are also translated at load time. Let's say a computer can load up to 64K modules simultaneously, each module may be allocated a different random number after a system restart. OTOH a program or module encodes the number in its own way.
In particular : each module contains a list of the "human readable names" starting at index 1 (constant index 0 is the microkernel's fundamental functions). This helps "match" the program's dependencies with the loaded modules, which are translated by the kernel (module indices => textual names => running modules indices) and updated in the module's internal tables.
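A minimal Python sketch of this translation, assuming the kernel keeps a name to running-number dictionary (the `running` argument and the function name are hypothetical):

```python
def translate_modules(names, running):
    """Map a module's local indices to the numbers of the running modules.

    names[i] is the human-readable name stored at local index i + 1;
    local index 0 is reserved for the microkernel (running module 0).
    """
    table = [0]
    for name in names:
        if name not in running:
            raise LookupError(f"module {name!r} is not loaded")
        table.append(running[name])
    return table
```

With running = {"strings": 41, "net": 7}, a module that lists "net" then "strings" gets the table [0, 7, 41], which it can load into registers before issuing its calls.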
The program loader must extract the list of names of all the called modules, in the same order as in the calling module's list, and for each name,
- Check that the name is valid and available
- Check that the module is loaded in memory. If not,
- the module is read and installed into memory
- its initialisation code (entry point 0) is executed (This could recursively trigger loading more modules)
- The trampoline's lookup table is built by scanning the first 64K for the IPE instruction and storing the indices in the lookup table
Thus, each module can get the current list of the other installed modules from its own lookup table, and load these numbers into registers.
There is a potential problem here though:
IPC requires 2 numbers : the index of the entry point to call (a constant) and the number of the module (which does not fit in the instruction and is often held in a register). The lookup happens with both numbers but only one is stored in the IPC instruction, so there is a possible mismatch.
The first solution is to encode the module number as a constant in the instruction : 255 modules is a lot to be called so 8 bits would be enough but this would require the processor to contain a lookup table for each module.... which is not conceivable.
In the worst case, the multi-level protection mechanisms ensure that any spurious call to an invalid entry point gets reported and the offending calling module gets flagged by the kernel to be shut down.
- Only the kernel can modify the code space of a module, so a program can't fix itself to scan another module's entry points, in search of "widgets".
- Any IPC to an invalid module number and/or invalid entry point will trap and flag the offender
- The other purpose of the IPE instruction is that the callee gets the info from the caller (its thread ID, module ID and other data) so it can allow the thread to proceed to execute desired function. Or not.
With these mechanisms, modules and threads can be filtered to create protected herds of modules to create a larger modular program, shielded from others.
An even better solution would be to "sign" the IPC instructions to link them with the (un?)translated module number, but this requires more room inside the instruction... So maybe this could be hashed with the register-held value (a 32 or 64-bit value)
This system strikes a balance between HW and SW overheads. The hardware features should be easy to schedule without microcode, all the rest is software-assisted hence 1) light on transistors 2) easy to debug and update 3) pipelineable and non-blocking...
It should remain true to the RISC principles and avoid all the pitfalls of the iAPX432 or i386. The ISA must implement the 64K trampoline zone, the 4 dedicated opcodes, some trapping entries and, more delicate, the hardware stack (which is different from the normal internal calling mechanism, and tightly protected). Each module should also have a list of the callees and callers (and when the number of callers gets to 0, the module can be flagged for flushing from memory).
It is also independent from all the paged memory mechanisms, which could also be used to enforce some of the above protection mechanisms (thus moving even more protection to the software side).
But with only a single, secure, flexible and lightweight mechanism, it implements what is required to make an Operating System : the kernel (module n°0), the libraries and the programs. There is no real distinction between them, except the n°0 which has all the rights.
The watered-down OO Paradigm
06/16/2023 at 03:09
How can I easily and quickly describe the programming model ?
It originates from a study of microkernels but now I realise it features "Object-Oriented" traits. So it's a sort of object-oriented microkernel, or whatever...
I was really interested to learn the ideas of GNU HURD, in particular that "everything is a server" and the user/root split is replaced by credentials. Today it looks great to me as I poke holes into the POSIX paradigms, but it was not obvious back in the day, despite the claims of the benefits. It takes time to balance them with the drawbacks, the most prominent one being to re-learn to code, structure and manage this concept, which will percolate slowly, very slowly through a software collection. So slowly in fact that GNU simply provides a POSIX compatibility layer, making the whole effort moot...
OTOH, basic OO programming has some simple concepts: an object is a module that bundles data and code, of which there are private and public ones. Inheritance provides enhancements, and a few more clever concepts that I understand but find no use in my projects. I have written a Java program for a project 25 years ago and I still dislike it, though the necessary boilerplate seems to be more or less accepted, while the similar verboseness of Ada/VHDL is often criticised by newcomers. Whatever.
The convergence of OOp and capabilities-based microkernels however provides something really interesting. The overall structure of NPM can be used to implement a strict OOp system but we're not going for an iAPX432 remake, the ABI is very simple and requires little complexity on the platform side.
The similarity became apparent when I saw that the module's entry points could be seen as an object's public functions. Data are private unless shared with a specific mechanism (zero-copy becomes tricky though). The "init" entry point is close to the "new" or constructor of an object. The more I look, the more I see the parallels and it appears that object-oriented languages could be ported, with some modifications, which would lower the barrier of entry in this new world.
Does that mean I'm making a Java or C++ OS and processor ? No because it will remain language agnostic, though OO code would be quite easy to program and port there. Pure procedural code is probably easier to write but will require the understanding and knowledge of the paradigm, but if I say "it's OOp-like" then people will adopt many prerequisite coding habits. It might even justify the limitation of the size of the entry point area because people will see and understand why a program should not be overly complex. However there is no support for things like virtual tables or anything dynamic.
The BIG problem will be with the data pointers, as there is no "system-level heap". Modules/objects must communicate via private, peer-to-peer channels with some kind of address space right delegation. Most OO programs don't have to deal with different address spaces or access rights tied to an object, because classical OO programs share a single executable's virtual space and the OS does the policing.
OO programs could be made a bit more robust by adopting some of the proposed mechanisms but the classical OO structure is already stringent enough to help with adoption.
Introduction to the new model
05/30/2020 at 18:59
In the previous log (3. The cellular allegory), we find that there is some degree of similarity between eukaryote cells and the model I'm describing in this project.
Differences are :
- Binary coding is used, instead of quaternary (I'm nitpicking, I know)
- Programs communicate with direct calls, register values and transient memory blocks, instead of mRNA
- Fine-grained rights are enforced by an extra sub-level, under control of a specific program that enforces protections, as well as housekeeping for the thread ID and other essential features (like allowing code to load in the executable area)
- Programs are inherently parallel and can be executed by any number of threads simultaneously (whereas DNA transcription of one gene can only be physically performed once at a time : obviously an RNA or DNA strand can only be "sensed" by a single molecule at a time).
I will now try to describe the elements of the programming model :
- The rights
- The programs
- The threads
- The data memory
The rights
are properties and/or credentials that enable or inhibit access to a critical resource, such as
- the input-output ports, or communications outside the execution context
- the paging mechanism
- the program memory (loading and/or reading the code of programs)
- the properties of the programs
- etc.
There is one rule here, inspired by other OSes : it is only possible to drop/lose rights ! Otherwise, any program could get access to resources it shouldn't, by mistake and/or malevolently. So the whole system is designed in a "top-down" fashion where a first/initial program starts with all the possible rights, dispatches them to other sub-programs, with each of them having only the minimum required rights to perform their job.
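The "drop only" rule maps naturally onto a bitmask intersection. The following Python sketch is just an illustration : the Rights flags mirror the examples listed above and delegate() is a made-up name.

```python
from enum import Flag


class Rights(Flag):
    IO = 1
    PAGING = 2
    PROGRAM_MEMORY = 4
    PROGRAM_PROPERTIES = 8


def delegate(parent, requested):
    """A child gets at most what its parent holds: rights can only be dropped."""
    return parent & requested
```

Starting from an initial program holding every right, each delegation can only shrink the set : a sub-program asking for a right that its parent has already dropped simply does not get it.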
Surrogate programs can serve as gatekeepers : they perform the I/O tasks for example, while filtering data and enforcing protocols. They have their own filters for who can use which provided service. This allows dynamic, fine-grained access to necessary features, and even cascading "server programs", while keeping the system "flat" (no "privileged program" because no program has all the rights).
More about this in Basics.
The cellular allegory
05/30/2020 at 05:06 • 0 comments
Here is one way to explain the "vision" of the programming model I try to define.
Initially I wanted to use robots in a factory, but a more natural example abounds on Earth : cells. More specifically : Eukaryotes. They have a nucleus that contains, among others, chromosomes, each made of genes, each written with codons of three nucleotides.
The analogy would then be :
- Each nucleotide is equivalent to 2 bits, so it makes a quaternary "computer"
- Each codon has 3 quaternary values, or 64 codes, vaguely equivalent to a processor instruction
- Codons are assembled in sequences to make genes, or "computer routines"
- The genes are packed in chromosomes, like a computer's program
- The cell contains several chromosomes, or programs, that can "work" in parallel...
- Some extra functions are provided by Mitochondria, such as peripheral management or energy processing...
- Communication is provided by "messenger RNA" (simplified)
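The arithmetic behind the first two points (2 bits per nucleotide, 3 nucleotides per codon, hence 4^3 = 64 codes) can be checked with a small sketch. The mapping of letters to bit values is an arbitrary assumption, and `codon_to_code` is a hypothetical helper.

```python
# Assumed mapping of nucleotides to 2-bit values (the order is arbitrary).
NUCLEOTIDE_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "U": 0b11}


def codon_to_code(codon):
    """Pack a 3-letter codon into a 6-bit integer in the range 0..63."""
    if len(codon) != 3:
        raise ValueError("a codon has exactly three nucleotides")
    code = 0
    for letter in codon:
        code = (code << 2) | NUCLEOTIDE_BITS[letter]
    return code


# Enumerate every possible codon to confirm there are 64 distinct codes.
all_codes = {
    codon_to_code(a + b + c)
    for a in NUCLEOTIDE_BITS
    for b in NUCLEOTIDE_BITS
    for c in NUCLEOTIDE_BITS
}
```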
Our "new computer" can perform parallel execution but requires of course synchronisation and semaphores. The number of execution threads is not specified and can vary wildly, as it is hypothetical so far.
Our computer has "threads" : a thread is an active instance of a program. Any number of instances could be running at any time... It depends on the implementation. The program starts from one "chromosome" and can then request services from other programs/chromosomes. The callee can refuse or accept, depending on its own policy. This is performed with the "IPC/IPE/IPR" mechanism.
Communication is essential and "zero-copy" is required. How is our mRNA implemented ? Small chunks of data would fit in the registers, larger blocks require memory. The paging system ensures that a block of data has only one owner and the sender "yields" a data block ownership to the callee. To prevent dangling and zombie blocks, in case the callee has an issue, the callee must "accept" the new block (otherwise the block is garbage-collected). This block could then be yielded again to another program, and passed along a string of "genes" to perform any required processing.
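The yield/accept handshake can be modelled as follows. This is only a sketch under my own assumptions : the `Pager` and `Block` names are hypothetical, threads are identified by plain strings, and `collect` is assumed to run once the callee has returned or failed.

```python
class Block:
    def __init__(self, data, owner):
        self.data = data
        self.owner = owner      # exactly one owner at a time
        self.pending = None     # callee the block has been yielded to


class Pager:
    """Toy model of block ownership tracking; all names are hypothetical."""

    def __init__(self):
        self.blocks = set()

    def allocate(self, owner, data):
        block = Block(data, owner)
        self.blocks.add(block)
        return block

    def yield_block(self, sender, block, callee):
        if block.owner != sender:
            raise PermissionError("only the owner can yield a block")
        # The sender loses the block immediately: no copy, no shared access.
        block.owner = None
        block.pending = callee

    def accept(self, callee, block):
        if block.pending != callee:
            raise PermissionError("block was not yielded to this callee")
        block.owner = callee
        block.pending = None

    def collect(self):
        """Reclaim blocks that were yielded but never accepted."""
        orphans = {b for b in self.blocks if b.owner is None}
        self.blocks -= orphans
        return len(orphans)
```

A block accepted by "B" can be yielded again to "C", which gives the chain of "genes" described above. If "C" never accepts, the next `collect` reclaims the block, so nothing is left dangling.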
Paging ensures that any access outside of the message traps. It's not foolproof because data will rarely fill a whole page... Smaller and bigger pages are required (with F-CPU we talked about 512, 4096, 32768 and 262144 bytes)
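To see why several page sizes help, here is a small sketch that picks the smallest page able to hold a message and reports the slack, that is, the bytes past the message where a stray access would not trap. The page sizes are the ones quoted above; `pick_page` is a hypothetical helper.

```python
PAGE_SIZES = (512, 4096, 32768, 262144)  # bytes, from the F-CPU discussion


def pick_page(message_bytes):
    """Return (page_size, slack) for the smallest page that holds the message."""
    if message_bytes < 0:
        raise ValueError("message size cannot be negative")
    for size in PAGE_SIZES:
        if message_bytes <= size:
            # slack = unprotected bytes between the end of the message
            # and the end of the page
            return size, size - message_bytes
    raise ValueError("message larger than the biggest page")
```

For example a 100-byte message lands in a 512-byte page with 412 bytes of slack, while a 513-byte message needs a 4096-byte page and leaves 3583 bytes unprotected.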
This also means that there must be a sort of fast and simple page allocator to yield pages of data at high rates, and also provide some garbage collection and TLB cleanup.
There is a shared space and a private area for all the threads. The private area implements the stack(s) and resides in the negative addresses. The positive addresses are shared though only one thread can "own" a block at once. So there are fewer chances of stack disruption.
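The split between negative (private) and positive (shared, single owner) addresses could be checked like this. It is a sketch only : the 4096-byte block granularity, the `owners` table and the `can_access` function are assumptions, and bounds checking of the private area is left out.

```python
BLOCK_SIZE = 4096  # assumed block granularity for this sketch


def can_access(thread, address, owners):
    """Decide whether `thread` may touch `address`.

    `owners` maps a shared block index to the thread that currently owns it.
    """
    if address < 0:
        # Negative addresses : the thread's own private area (stacks...).
        return True
    # Zero and positive addresses : shared space, one owner per block.
    return owners.get(address // BLOCK_SIZE) == thread
```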
As described before, the "yield" operation marks a block as candidate to belonging to the callee. The "accept" operation can then validate and acknowledge the reception and eventually remap the block into its private space.
Processor support is required for the best performance but these features could be implemented on existing systems through emulation. It would be slow (yes the Hurd was slow too) but will help refine the API.
Similarly : no mention of a "kernel" can be found : every "program" is structurally identical. All of them provide a function of some sort and the only difference is the access rights they provide. Each "thread" either inherits these rights from the thread that creates it, or can selectively "drop" some of those rights. This is not even a "microkernel" approach if there is no kernel !
However in the first implementations, no hardware provides those features and a nanokernel is necessary to perform these while we develop the software stack.
Read as Zero
05/30/2020 at 03:36 • 7 comments
I'm not sure why it's not widespread yet, or why it's not implemented or even spoken about but...
Imagine you free some memory, and the kernel reclaims it for other purposes. It can never be sure that the physical data left in RAM will not be read by another thread to exfiltrate precious information. So the kernel spends some of its precious time clearing page after page, just in case, writing 0s all over the place to clean up after the patrons.
It's something the hardware could do, by marking a page as "read as zero" or "trap on read dirty" in the TLB. Writes would not trap and you can read your own freshly written data. In fact it's as if you allocated a new cache line without reading the cache...
The cache knows about the dirty bits, and a coarser "dirty map" could be stored per page to help with cache lines. For a 4096-byte page, that's 128 bits, one for each cache line if lines are 256 bits wide.
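A software simulation of the idea, assuming a 4096-byte page and 256-bit (32-byte) lines. The `ReadAsZeroPage` class is hypothetical and this variant returns zeros on a clean line instead of trapping. The first write to a line clears it instead of fetching the stale contents, like allocating a cache line without reading memory.

```python
LINE_BYTES = 32                    # 256-bit cache lines
PAGE_BYTES = 4096                  # assumed page size
LINES = PAGE_BYTES // LINE_BYTES   # 128 lines, so a 128-bit dirty map


class ReadAsZeroPage:
    def __init__(self, stale):
        self.ram = bytearray(stale)  # leftover data from the previous owner
        self.dirty = 0               # one bit per line, all clear on allocation

    def _is_dirty(self, line):
        return (self.dirty >> line) & 1 == 1

    def write(self, offset, data):
        for i, byte in enumerate(data):
            pos = offset + i
            line = pos // LINE_BYTES
            if not self._is_dirty(line):
                # First write to this line : clear it, never fetch stale data.
                start = line * LINE_BYTES
                self.ram[start:start + LINE_BYTES] = bytes(LINE_BYTES)
                self.dirty |= 1 << line
            self.ram[pos] = byte

    def read(self, offset, length):
        out = bytearray()
        for pos in range(offset, offset + length):
            line = pos // LINE_BYTES
            # Clean lines read as zero, whatever is physically in RAM.
            out.append(self.ram[pos] if self._is_dirty(line) else 0)
        return bytes(out)
```

The previous owner's bytes stay in RAM until overwritten, but the new owner can never observe them, and nobody had to scrub the page up front.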
It's still preliminary but I'm sure people have worked on this already... Because scrubbing data was a thing for a long time.