Close
0%
0%

BCFJ

Because I want to create a good-for-all language borrowing qualities from Bash, BASIC, C, Forth and JavaScript

Similar projects worth following

One of those things I have been thinking about for countless years...

A language that can be very simple and easy to use, like BASIC but with the power of Bash and others.

A language that can span many levels, from the startup firmware of a machine, to user applications, from the machine code to the abstract concepts.

A language that is efficient, to build optimised code like in C (not just by making a powerful compiler but letting you guide and iterate the compilation)

A language that can work with interpreted and compiled code (like FORTH but more advanced ?)

...

This language is inspired by Bash/C/FORTH/JavaScript as well as a few others (LISP/Scheme or the Pascal/Ada/VHDL lineage for example) for:

  • ease of programming (no crazy syntax, as Python advocates, so no obscure stack stuff like FORTH)
  • security/safety (inherent checks like Ada)
  • performance (so you can tweak a script into fine-tuned machine code at will)
  • self-sufficience (store the system in a Flash/ROM)
  • interactive console use (like Bash)
  • building the OS
  • support of my CPU's features (so the F-CPU and the YASEP have something to run)

This keeps many nice things we have come to expect from today's programming languages but some concepts diverge :

  • Don't bother with POSIX
  • Enforce sandboxed and separate units of code (safety, modularity, reuse) with capability-based access rights. Just like HURD's "everything is a server".
  • Several subsets can be enabled/disabled depending on the use :
    - machine code generation is allowed only in the compiler and assembler context
    - hardware features, IO, protections etc are allowed only in the kernel modules
    - implicit dynamic features are not allowed for code that will be compiled (like VHDL code that can't be synthesised)
    - code introspection (like in FORTH or JS's "eval") only available in development mode (introspecting code can't be compiled)

The main idea is to create a baseline interpreted language that is used to compile itself. It must be able to generate machine code from its own source code, starting from a basic assembler and evolving into a more featured compiler. The interpreter's command line can also serve as a classic shell to administrate the computer.


Logs:
1. A bit of historical perspective on early language design
2. The need for a preprocessor
3. Note for later
4. Units
5. Typing
6. Standard types
7. Programming is too complex...
8. 2024
9. Typing (for realz)
10. A critical concept
.
.

Chistory.html

The Development of the C Language Dennis M. Ritchie (Bell Labs/Lucent Technologies) Copyright 1993 Association for Computing Machinery, Inc. This article was presented at Second History of Programming Languages conference, Cambridge, Mass., April, 1993.

HyperText Markup Language (HTML) - 63.66 kB - 03/09/2018 at 05:32

Download

  • A critical concept

    Yann Guidon / YGDES07/10/2024 at 01:00 0 comments

    BCFJ (as it is now known) draws some aspects and inspirations from Forth, which is a TIL. I want "introspection", dynamic redefinitions and flexible object code generation.

    But FORTH has a big problem. Well, several but let's focus on this one : the stack with a single type of data (usually a word-wide integer). Hence why I have started to define a typing system in 9. Typing (for realz).

    But that's just one part of the story. I want to replace the way the whole language is structured, in fact I'd love to "structure FORTH" and "Pascalise" it. I want to make a well-defined interface for functions, just like you do in C/Pascal/Ada/VHDL so what is a function and how do you define it ?

    At first glance, a function is first defined by a name (a string type) and a pointer to the code, or address. You can associate some source code and a few ad hoc metaproperties. FORTH gets along with it because the ABI and the callee/caller interface is just the stack, with as much implicitness as possible "to shave layers and get more performance", at the cost that we know:

    • No accountability
    • No explicit interface : you have to run the actual code to know what it actually does and it hits the halting problem (if you ever dare to try formal-proofing Forth code).
    • Type confusion

    Hence why I steer toward the strong typing world. Whenever a language tries to be "smart" and attempts to guess what the user wants, you get WAT-level madness. You can try to explain it away but the layers of semantic and other cruft can't go away.

    So a function is also an interface : Pascal/Ada/VHDL strictly define it, what goes in and what goes out. The interface is a list of input parameters and results/outputs. They both reside on the data/call stack (the function's context), as well as temporary data.

    So here comes the "brilliant idea" that currently drives my research:

    A function can be defined as a name associated to the bundle of 3 structures (the input, the temporary, and the output data). This is of course a high-level, unoptimised view but this is a great guide for the design of a language, as it creates a uniform, simplified picture. All you have to do is define a flexible, convenient, effective typing system, and you get your functions easily, without having to deal with early optimisations (looking at you, Ada).

    When you see things like this, implementation and debugging become simpler than usual compiled languages, without falling into crazy semantic traps as with Scheme/Haskell/JavaScript...

    The "tri-structure" that defines a function is quite easy to turn into a directed-data-depency-graph and it is not difficult to process & optimise, for example to eliminate duplicates or infer copies... Which will be a lower-level view at the generated code, while the original code can remain straight-forward to examine/debug.

    You can imagine a function that is declared as:

    function some_processing( u8 a, u16 b, u32 c)
       returns (u8 d, *u16 e) is
     u8 t, v;
    begin
    ...
    ... some code
    ...
    end;

    This means that the "bundle" is actually a higher-level struct with 3 elements, such as

    struct struct_some_function is
      struct arguments is
        u8 a, u16 b, u32 c
      end argument,
      struct locals is
        u8 t, v
      end locals
      struct results is
        u8 d, *u16 e
      end results
    end struct;

    I hope you get the idea.

    I don't know of a language that does this transform, can anyone comment on this ?

  • Typing (for realz)

    Yann Guidon / YGDES05/20/2024 at 02:41 0 comments

    Hello 2024, here I am again !

    The old log 5. Typing had a few ideas but some water has flowed under the bridges since...

    Concerning the "just an integer" idea, I have discarded it because it's just a bat5h1t crazy time bomb that's furiously ticking and eager to blow just like it did in C. So I have decided that all the types should be fully defined at declaration time.

    But what if I want a "whatever" type, just to get the ball rolling and not care until later ? Enters Mr Cockroft with his insane DEC64 format. Look at https://www.crockford.com/dec64.html ... To say that I endorse it is an exaggeration but in the context of what I intend to use if for (prototype algorithms before refining the implementation), it's "good enough" and provides some convenience for integer platforms. It is somewhat inspired by the JavaScript tradition and provides 56 bits of integerness, some weird scaling rules, but it is not as inconvenient as IEEE754 and I guess I can use if for some DSP work for example.

    So here are the scalar types :

    • Padding (P) (no type, no read or write, no meaning)
    • Unsigned integer (U prefix) (not I ! because Integer is not clearly describing signedness)
    • Signed integer (S prefix)
    • Floating point (IEEE754, F prefix)
    • Dec floating point (prefix D)
    • Boolean (B)
    • undefined ?

    The type upper case letter is followed by a number that is a power of 2, at least 8. So valid sizes are :

    • P : P8, P16, P32, P64...
    • U : U8, U16, U32, U64, U128, U256, U512, U1024, U2048...
    • S : S8, S16, S32, S64, S128, S256, S512, S1024, S2048, S4096...
    • F : F8, F16, F32, F64, F128 (not sure F256 exists or is useful, F40 and F80 exist though)
    • D : only D16, D32 and D64 make sense so far.
    • B: whatever.

    Modifiers:

    • pointer
    • SIMD flag (yes, a SIMD "vector" can be considered as a scalar because it can be held in a register)
    • "const" flag : read-only
    • write-only flag
    • volatile

    padding, ro and wo can be combined into a 2-bit field:

    00 : padding (no read, no write)
    01 : read only (const flag)
    10 : write only (could be a "sink" or dummy)
    11 : normal variable
    

    Even that is not able to fully describe a scalar value. So a sort of "syntax" is needed...

    I'm reinventing some sort of ASN.1 binary syntax in fact! But adapted and constrained to the types I expect to handle under the hood of my toy language.

    Anyway a scalar type descript will not fit into a byte.

    The size field is 4 bits to accommodate 16 possible sizes in bytes:

    8 16 32 64 128 256 512 1024
    2048 4096 8192 16384 32K 64K 128K 256K
    

    256K is a lot... but you never know and the bits are there. You're free to set your own limit for the number of bits you want to support.

    I have also defined 5 types: U/S/F/D/B, so that's 3 bits with some margin. Unicode points are just a subset, for example.

    Then the modifiers:

    • volatile: 1 bit
    • pad/ro/wo/var: 2 bits
    • SIMD : 1 bit
    • pointer: 1 bit (excludes some other flags and types)

    Another property I would like to add is overflow behaviour:

    • saturate
    • wraparound
    • ... (forgot one)
    • trap

    That fits in two more bits.

    The total is 5+3+4+2=14 bits, fitting in a U16 scalar with 2 bits left for extensions. Because I have not found yet a way to describe a fixed point integer yet.

    So what's the purpose of this internal, binary, unambiguous representation ?

    Oh, there are many reasons to do so.

    First it is a great way to compare function prototypes without crazy hassles later.

    Imagine, you describe a function with a list of parameters, it gets encoded in a binary chain, so that chain is all you need to make sure an API matches between caller and callee. Just compare the chain for equality.

    It is the basis for a typing hierarchy and a bytecode-like version of a program, which can be later described unambiguously across languages.

    The remaining 2 bits can encode the type of the U16:

    • 00: scalar
    • 01: array
    • 10: struct
    • 11 : union ? that's how you alias incompatible types and is potentially dangerous, so it is reserved in certain modes. Pointer casting may be another solution....
    Read more »

  • 2024

    Yann Guidon / YGDES05/11/2024 at 08:02 0 comments

    2024 is already here and a few things are brewing.

    What a time to be alive. 

    For now I'm focusing on the structure of the threaded interpreter and how it could evolve to become a front-end for a compiler for itself that is written in its own language.

    https://muforth.dev/threaded-code/ and Wikipedia have interesting ideas.

    It is also very interesting that BASIC or JavaScript have the ability to store their own source code in intermediary form, that is easy to then reinspect.

     .

  • Programming is too complex...

    Yann Guidon / YGDES08/05/2018 at 14:12 0 comments

    In'The Problem With Programming and How To Fix It' and other posts, slashdot nails that the software world has already reached critical mass and is exploding.

    Wirth tried to rationalise algorithmics but Pascal is rarely used anymore. Ada has grown too much and the rest can be called "esoteric"...

  • Standard types

    Yann Guidon / YGDES04/02/2018 at 04:25 0 comments

    BCFJ's fundamental type is equivalent to BASIC's number, or C's "int". It's enough to get a few things done but won't go far...

    Scalar types :

    Defining the size explicitly is important, otherwise many problems appear.

    Integer numbers are either signed or unsigned so the following types should be supported (if the processor can handle it) :

    S8, S16, S32, S64 : Signed integer of 8/16/32/64 bits
    U8, U16, U32, U64 : Unsigned integer of 8/16/32/64 bits
    F16, F32, F64 : Floating point of 16/32/64 bits

    (I didn't add support for logarithmic, unlike in the early F-CPU, which was a dead-end...)

    Characters... Are another thing. ASCII is only a relic now and UTF8 is the norm, which can have a crazy dynamic range. The norm limits the size to 21 bits (assembled from 4 bytes) so there is no actual "character" type as in C because the representation can vary wildly. However, intermediary types are possible for various representations (UNICODE point or byte serialised).

    Strings too can have multiple representations (point or byte) and can't be handled directly in the core language. Yet this is very important and useful so a careful and simple design is required early on. Without proper string handling, no parser is possible... Usually, strings are very easy to process in plain ASCII but I don't want to make an ASCII mode at first because UTF8 will never get implemented later. Is the development of UTF-8 library holding everything back ?

  • Typing

    Yann Guidon / YGDES04/02/2018 at 03:26 15 comments

    What's best ? Pascal's strong typing or JS' weak typing ?

    Both have excellent arguments but also significant drawbacks.

    It's hard to reconcile both.

    Since BCFJ starts as an interpreted language, typing can be weak and in fact, the first type is "a number" (think "int" in C).

    Later, when objects ("blobs") are introduced (à la JS) a stronger typing mechanism can be introduced. The trick is to have/check the "type" attribute, which points to a collection of handlers that then resolve the types, convert or compute values...

    So BCFJ does not enforce a typing mechanism or precise resolution behaviours from the beginning (this removes a LOT of red tape from the language). Later, conventions can be added to prevent inconsistencies or adopt the preferred behaviours.

    So BCFJ starts "naked", like in FORTH, with an "integer" type, which is then completed... See the next logs.


    Oh my...

  • Units

    Yann Guidon / YGDES03/31/2018 at 04:52 0 comments

    Have you ever used Turbo Pascal ? or Ada ? Or VHDL ?

    In TP, you can make a program, but also share code with what's called "units". They are named "libraries" in VHDL.

    In BCFJ, everything is a "unit".

    A "unit" is code, linked to other units by a collection of entry points.

    • One entry point is meant for initialisation (it's called "init", and might be the first ever entry point)
    • Others can provide functions to accepted processes.

    When "init" is void/empty/absent, the unit is comparable to a shared library/dll.

    The "Init" can also be equivalent to the "main" in C/POSIX. Other entry points are not absolutely required but they can provide asynchronous signal handling for the unit.

    Unlike POSIX shared libraries, a direct call is not possible, units have separate and protected address spaces. The calling process must perform an Inter Process Call and send the information through the registers and shared memory spaces.

    Some units can be IPCalled and access the calling process' address space BUT can't use any other space at the same time : they provide shared routines, that's all. Data leaks shouldn't be possible unless explicitly requested...

    Remember : units do not have to trust each other. The caller and the callee should not assume benevolence, security and/or safety. A unit could be replaced by a different version between IPCs, either for maintainance or because something went wrong... So don't let any unit's code access your private data.

    This is where "capabilities" and process properties become essential. In order to let other units call you or accept your call, you need reliable information about their rights. For example you can't let anybody call your "init" entry point (only a given process can do this and its process ID might change, but not its access rights)

  • Note for later

    Yann Guidon / YGDES03/16/2018 at 13:56 0 comments

    Nobody reads the early design notes. But dear future self, I warn you about the dangers of this.

    https://thedailywtf.com/articles/A_Case_of_the_MUMPS

    https://thedailywtf.com/articles/MUMPS-Madness


    ...

    On the sunnier side, https://blog.codinghorror.com/a-scripter-at-heart/

  • The need for a preprocessor

    Yann Guidon / YGDES03/10/2018 at 11:05 0 comments

    Compiles languages often have a preprocessor (I'm looking at you, Ada/VHDL...). Scripting languages don't.

    Since this is based on a scripting engine (with an interpreter that manages files inclusions etc.) there is no need for preprocessing or substitution. The dynamic environment does the preprocessor's work as well as "elaboration" (like in Ada/VHDL). The "source code" contains not only the instructions to execute but also how to execute them...

  • A bit of historical perspective on early language design

    Yann Guidon / YGDES03/09/2018 at 05:31 0 comments

    https://www.bell-labs.com/usr/dmr/www/chist.html has been added to the Files section.

View all 10 project logs

Enjoy this project?

Share

Discussions

Yann Guidon / YGDES wrote 07/05/2024 at 00:27 point

To read :

https://sinclairql.speccy.org/archivo/docs/books/Threaded_interpretive_languages.pdf

https://archive.org/details/R.G.LoeligerThreadedInterpretiveLanguagesTheirDesignAndImplementationByteBooks1981/page/n15/mode/2up

  Are you sure? yes | no

Yann Guidon / YGDES wrote 11/09/2021 at 08:17 point

https://www.youtube.com/watch?v=2mnYf7L7Amw

  Are you sure? yes | no

Yann Guidon / YGDES wrote 10/14/2018 at 18:09 point

I'm practising C quite heavily and teaching Python and this makes me think a lot about what features are good and bad in both...

I'm also considering implications in parallelism, "leaf subthreads" that can be executed more explicitly by VLIW processors, and other ideas...

  Are you sure? yes | no

Yann Guidon / YGDES wrote 09/02/2018 at 13:38 point

I should listen to those who succeeded :-P 

https://www.youtube.com/watch?v=Sg4U4r_AgJU

  Are you sure? yes | no

Yann Guidon / YGDES wrote 05/20/2018 at 01:33 point

A nice find by Peter Forth : http://beza1e1.tuxen.de/articles/forth.html
I'd love to come up with such a simple and clean path for BCFJ :-)

  Are you sure? yes | no

Morning.Star wrote 03/05/2018 at 06:33 point

Nice, its about time somebody made a decent language. Keeping my eye on this Yann :-)

  Are you sure? yes | no

Yann Guidon / YGDES wrote 04/02/2018 at 03:31 point

Don't hold your breath tooooo long, as I've been tinkering with many of these ideas  for as long as I've learned to program. I've often thought "why is this feature so lousy ? Why couldn't it be done differently ?" So I have collected a pile of thoughts...

Now I realise I can't design my CPUs without some sort of decent support language. I can't procrastinate anymore...

  Are you sure? yes | no

Similar Projects

Does this project spark your interest?

Become a member to follow this project and never miss any updates