-
A critical concept
07/10/2024 at 01:00 • 0 commentsBCFJ (as it is now known) draws some aspects and inspirations from Forth, which is a TIL. I want "introspection", dynamic redefinitions and flexible object code generation.
But FORTH has a big problem. Well, several but let's focus on this one : the stack with a single type of data (usually a word-wide integer). Hence why I have started to define a typing system in 9. Typing (for realz).
But that's just one part of the story. I want to replace the way the whole language is structured, in fact I'd love to "structure FORTH" and "Pascalise" it. I want to make a well-defined interface for functions, just like you do in C/Pascal/Ada/VHDL so what is a function and how do you define it ?
At first glance, a function is first defined by a name (a string type) and a pointer to the code, or address. You can associate some source code and a few ad hoc metaproperties. FORTH gets along with it because the ABI and the callee/caller interface is just the stack, with as much implicitness as possible "to shave layers and get more performance", at the cost that we know:
- No accountability
- No explicit interface : you have to run the actual code to know what it actually does and it hits the halting problem (if you ever dare to try formal-proofing Forth code).
- Type confusion
Hence why I steer toward the strong typing world. Whenever a language tries to be "smart" and attempts to guess what the user wants, you get WAT-level madness. You can try to explain it away but the layers of semantic and other cruft can't go away.
So a function is also an interface : Pascal/Ada/VHDL strictly define it, what goes in and what goes out. The interface is a list of input parameters and results/outputs. They both reside on the data/call stack (the function's context), as well as temporary data.
So here comes the "brilliant idea" that currently drives my research:
A function can be defined as a name associated to the bundle of 3 structures (the input, the temporary, and the output data). This is of course a high-level, unoptimised view but this is a great guide for the design of a language, as it creates a uniform, simplified picture. All you have to do is define a flexible, convenient, effective typing system, and you get your functions easily, without having to deal with early optimisations (looking at you, Ada).
When you see things like this, implementation and debugging become simpler than usual compiled languages, without falling into crazy semantic traps as with Scheme/Haskell/JavaScript...
The "tri-structure" that defines a function is quite easy to turn into a directed-data-depency-graph and it is not difficult to process & optimise, for example to eliminate duplicates or infer copies... Which will be a lower-level view at the generated code, while the original code can remain straight-forward to examine/debug.
You can imagine a function that is declared as:
function some_processing( u8 a, u16 b, u32 c) returns (u8 d, *u16 e) is u8 t, v; begin ... ... some code ... end;
This means that the "bundle" is actually a higher-level struct with 3 elements, such as
struct struct_some_function is struct arguments is u8 a, u16 b, u32 c end argument, struct locals is u8 t, v end locals struct results is u8 d, *u16 e end results end struct;
I hope you get the idea.
I don't know of a language that does this transform, can anyone comment on this ?
-
Typing (for realz)
05/20/2024 at 02:41 • 0 commentsHello 2024, here I am again !
The old log 5. Typing had a few ideas but some water has flowed under the bridges since...
Concerning the "just an integer" idea, I have discarded it because it's just a bat5h1t crazy time bomb that's furiously ticking and eager to blow just like it did in C. So I have decided that all the types should be fully defined at declaration time.
But what if I want a "whatever" type, just to get the ball rolling and not care until later ? Enters Mr Cockroft with his insane DEC64 format. Look at https://www.crockford.com/dec64.html ... To say that I endorse it is an exaggeration but in the context of what I intend to use if for (prototype algorithms before refining the implementation), it's "good enough" and provides some convenience for integer platforms. It is somewhat inspired by the JavaScript tradition and provides 56 bits of integerness, some weird scaling rules, but it is not as inconvenient as IEEE754 and I guess I can use if for some DSP work for example.
So here are the scalar types :
- Padding (P) (no type, no read or write, no meaning)
- Unsigned integer (U prefix) (not I ! because Integer is not clearly describing signedness)
- Signed integer (S prefix)
- Floating point (IEEE754, F prefix)
- Dec floating point (prefix D)
- Boolean (B)
- undefined ?
The type upper case letter is followed by a number that is a power of 2, at least 8. So valid sizes are :
- P : P8, P16, P32, P64...
- U : U8, U16, U32, U64, U128, U256, U512, U1024, U2048...
- S : S8, S16, S32, S64, S128, S256, S512, S1024, S2048, S4096...
- F : F8, F16, F32, F64, F128 (not sure F256 exists or is useful, F40 and F80 exist though)
- D : only D16, D32 and D64 make sense so far.
- B: whatever.
Modifiers:
- pointer
- SIMD flag (yes, a SIMD "vector" can be considered as a scalar because it can be held in a register)
- "const" flag : read-only
- write-only flag
- volatile
padding, ro and wo can be combined into a 2-bit field:
00 : padding (no read, no write) 01 : read only (const flag) 10 : write only (could be a "sink" or dummy) 11 : normal variable
Even that is not able to fully describe a scalar value. So a sort of "syntax" is needed...
I'm reinventing some sort of ASN.1 binary syntax in fact! But adapted and constrained to the types I expect to handle under the hood of my toy language.
Anyway a scalar type descript will not fit into a byte.
The size field is 4 bits to accommodate 16 possible sizes in bytes:
8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32K 64K 128K 256K
256K is a lot... but you never know and the bits are there. You're free to set your own limit for the number of bits you want to support.
I have also defined 5 types: U/S/F/D/B, so that's 3 bits with some margin. Unicode points are just a subset, for example.
Then the modifiers:
- volatile: 1 bit
- pad/ro/wo/var: 2 bits
- SIMD : 1 bit
- pointer: 1 bit (excludes some other flags and types)
Another property I would like to add is overflow behaviour:
- saturate
- wraparound
- ... (forgot one)
- trap
That fits in two more bits.
The total is 5+3+4+2=14 bits, fitting in a U16 scalar with 2 bits left for extensions. Because I have not found yet a way to describe a fixed point integer yet.
So what's the purpose of this internal, binary, unambiguous representation ?
Oh, there are many reasons to do so.
First it is a great way to compare function prototypes without crazy hassles later.
Imagine, you describe a function with a list of parameters, it gets encoded in a binary chain, so that chain is all you need to make sure an API matches between caller and callee. Just compare the chain for equality.
It is the basis for a typing hierarchy and a bytecode-like version of a program, which can be later described unambiguously across languages.
The remaining 2 bits can encode the type of the U16:
- 00: scalar
- 01: array
- 10: struct
- 11 : union ? that's how you alias incompatible types and is potentially dangerous, so it is reserved in certain modes. Pointer casting may be another solution. That's a Pandora's box waiting to be opened.
Aligned Strings are missing as well.
I also need to define a range, that would be in a struct probably.
But since it is a "bootstrap language" it is not required to support every bell and whistle, right ?
Edit :
I have forgotten the "Function" type...
-
2024
05/11/2024 at 08:02 • 0 comments2024 is already here and a few things are brewing.
- Most importantly #POSEVEN is taking shape
- #Aligned Strings format has been developed
- #A stack is coming to life, boosting POSEVEN and #F-CPUand #YGREC32 !
What a time to be alive.
For now I'm focusing on the structure of the threaded interpreter and how it could evolve to become a front-end for a compiler for itself that is written in its own language.
https://muforth.dev/threaded-code/ and Wikipedia have interesting ideas.
It is also very interesting that BASIC or JavaScript have the ability to store their own source code in intermediary form, that is easy to then reinspect.
.
-
Programming is too complex...
08/05/2018 at 14:12 • 0 commentsIn'The Problem With Programming and How To Fix It' and other posts, slashdot nails that the software world has already reached critical mass and is exploding.
Wirth tried to rationalise algorithmics but Pascal is rarely used anymore. Ada has grown too much and the rest can be called "esoteric"...
-
Standard types
04/02/2018 at 04:25 • 0 commentsBCFJ's fundamental type is equivalent to BASIC's number, or C's "int". It's enough to get a few things done but won't go far...
Scalar types :
Defining the size explicitly is important, otherwise many problems appear.
Integer numbers are either signed or unsigned so the following types should be supported (if the processor can handle it) :
S8, S16, S32, S64 : Signed integer of 8/16/32/64 bits
U8, U16, U32, U64 : Unsigned integer of 8/16/32/64 bits
F16, F32, F64 : Floating point of 16/32/64 bits(I didn't add support for logarithmic, unlike in the early F-CPU, which was a dead-end...)
Characters... Are another thing. ASCII is only a relic now and UTF8 is the norm, which can have a crazy dynamic range. The norm limits the size to 21 bits (assembled from 4 bytes) so there is no actual "character" type as in C because the representation can vary wildly. However, intermediary types are possible for various representations (UNICODE point or byte serialised).
Strings too can have multiple representations (point or byte) and can't be handled directly in the core language. Yet this is very important and useful so a careful and simple design is required early on. Without proper string handling, no parser is possible... Usually, strings are very easy to process in plain ASCII but I don't want to make an ASCII mode at first because UTF8 will never get implemented later. Is the development of UTF-8 library holding everything back ?
-
Typing
04/02/2018 at 03:26 • 15 commentsWhat's best ? Pascal's strong typing or JS' weak typing ?
Both have excellent arguments but also significant drawbacks.
It's hard to reconcile both.
Since BCFJ starts as an interpreted language, typing can be weak and in fact, the first type is "a number" (think "int" in C).
Later, when objects ("blobs") are introduced (à la JS) a stronger typing mechanism can be introduced. The trick is to have/check the "type" attribute, which points to a collection of handlers that then resolve the types, convert or compute values...
So BCFJ does not enforce a typing mechanism or precise resolution behaviours from the beginning (this removes a LOT of red tape from the language). Later, conventions can be added to prevent inconsistencies or adopt the preferred behaviours.
So BCFJ starts "naked", like in FORTH, with an "integer" type, which is then completed... See the next logs.
Oh my... -
Units
03/31/2018 at 04:52 • 0 commentsHave you ever used Turbo Pascal ? or Ada ? Or VHDL ?
In TP, you can make a program, but also share code with what's called "units". They are named "libraries" in VHDL.
In BCFJ, everything is a "unit".
A "unit" is code, linked to other units by a collection of entry points.
- One entry point is meant for initialisation (it's called "init", and might be the first ever entry point)
- Others can provide functions to accepted processes.
When "init" is void/empty/absent, the unit is comparable to a shared library/dll.
The "Init" can also be equivalent to the "main" in C/POSIX. Other entry points are not absolutely required but they can provide asynchronous signal handling for the unit.
Unlike POSIX shared libraries, a direct call is not possible, units have separate and protected address spaces. The calling process must perform an Inter Process Call and send the information through the registers and shared memory spaces.
Some units can be IPCalled and access the calling process' address space BUT can't use any other space at the same time : they provide shared routines, that's all. Data leaks shouldn't be possible unless explicitly requested...
Remember : units do not have to trust each other. The caller and the callee should not assume benevolence, security and/or safety. A unit could be replaced by a different version between IPCs, either for maintainance or because something went wrong... So don't let any unit's code access your private data.
This is where "capabilities" and process properties become essential. In order to let other units call you or accept your call, you need reliable information about their rights. For example you can't let anybody call your "init" entry point (only a given process can do this and its process ID might change, but not its access rights)
-
Note for later
03/16/2018 at 13:56 • 0 commentsNobody reads the early design notes. But dear future self, I warn you about the dangers of this.
https://thedailywtf.com/articles/A_Case_of_the_MUMPS
https://thedailywtf.com/articles/MUMPS-Madness
...
On the sunnier side, https://blog.codinghorror.com/a-scripter-at-heart/
-
The need for a preprocessor
03/10/2018 at 11:05 • 0 commentsCompiles languages often have a preprocessor (I'm looking at you, Ada/VHDL...). Scripting languages don't.
Since this is based on a scripting engine (with an interpreter that manages files inclusions etc.) there is no need for preprocessing or substitution. The dynamic environment does the preprocessor's work as well as "elaboration" (like in Ada/VHDL). The "source code" contains not only the instructions to execute but also how to execute them...
-
A bit of historical perspective on early language design
03/09/2018 at 05:31 • 0 commentshttps://www.bell-labs.com/usr/dmr/www/chist.html has been added to the Files section.