I have to credit @Gravis for challenging me in the comments. Even though the result does not go in the direction he seems to want, feedback is progress and the exchange has indirectly brought some useful bits. The broad lines are defined, but there are still details to perfect.
A quirky little detail I didn't properly consider before is how to handle the NULL value. It belongs to type 0, since its 2 LSB are cleared. There is no incentive to swap it with type 3 again: type 0 is never dereferenced (so there is no risk of a NULL pointer crashing the program), and type 3 has an easy mask-based routine to fetch the total length, up to 24 bits.
However, type 0 gives code points, so NULL translates to a character of value 0. This can cause confusion in cases where a pointer is returned as an error code, like malloc() failing. Such a NULL shouldn't insert a 0 in the stream (at most, possibly, a Unicode glyph that says "error").
Furthermore a "placeholder" value is defined to indicate that the entry in the list must be skipped. Currently this is defined as value 4 (100b).
Something has to be shuffled around to make the whole thing safer. It is easier to reassign the less-used placeholder value, giving the following table:
Attribution | MSB | other bits | b2 | b1-b0 |
NULL | 0 | all 0 | 0 | 00 |
Unicode point (including value 0) | 0 | Unicode value | 1 | 00 |
Placeholder / skip marker | 1 | all 1 | 1 | 00 |
The value of the placeholder/skip marker can now be -4, because Unicode values (21 bits at most) can never reach the MSB.
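The table can be turned into a few C predicates. This is only a minimal sketch: the names `word_t`, `PLACEHOLDER`, `is_null()`, `encode_code_point()` and friends are hypothetical, and it assumes the word is an `intptr_t` of at least 32 bits using two's complement, with the code point stored directly above b2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit layout comes from the table. */
typedef intptr_t word_t;

#define TYPE_MASK   ((word_t)3)   /* b1-b0: type tag, 00 for type UP    */
#define POINT_FLAG  ((word_t)4)   /* b2: set for code points and skip   */
#define PLACEHOLDER ((word_t)-4)  /* every bit set except b1-b0         */

static bool is_type_up(word_t w)     { return (w & TYPE_MASK) == 0; }
static bool is_null(word_t w)        { return w == 0; }
static bool is_placeholder(word_t w) { return w == PLACEHOLDER; }

/* MSB clear (w > 0), b2 set, b1-b0 cleared. */
static bool is_code_point(word_t w)
{
    return w > 0 && (w & 7) == POINT_FLAG;
}

/* Assumption: the code point sits in the bits just above b2. */
static word_t encode_code_point(uint32_t cp)
{
    assert(cp <= 0x10FFFF);  /* highest valid Unicode code point */
    return (word_t)(((uintptr_t)cp << 3) | 4u);
}

static uint32_t decode_code_point(word_t w)
{
    assert(is_code_point(w));
    return (uint32_t)((uintptr_t)w >> 3);
}
```

With this layout the code point 0 is encoded as 4, so it can no longer be confused with a NULL returned by a failed allocation, and -4 still has its 2 LSB cleared, so it stays in type 0.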
I don't know yet how to display a NULL value. Maybe U+26A0 ⚠? The replacement character U+FFFD � is already used for a different purpose and should be avoided to prevent confusion. U+1F6AB 🚫 or U+1F5F2 🗲 could also represent an error.
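To make the intended behaviour concrete, here is a rough sketch of how a list of type 0 words could be expanded into code points. Everything in it is an assumption made for illustration: the function name `expand_up_words()`, the choice of U+26A0 as `NULL_GLYPH` (the glyph is not decided yet), the code point being stored just above b2, and the fact that words of other types are simply ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NULL_GLYPH  0x26A0u        /* placeholder choice, not final */
#define SKIP_MARKER ((intptr_t)-4) /* placeholder / skip marker     */

/* Writes the code points found in list[0..n-1] to out[] and returns
 * how many were written. out[] must have room for n entries. */
size_t expand_up_words(const intptr_t *list, size_t n, uint32_t *out)
{
    assert(list != NULL && out != NULL);
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        intptr_t w = list[i];

        if (w == SKIP_MARKER) {
            continue;                     /* entry must be skipped */
        } else if (w == 0) {
            out[written++] = NULL_GLYPH;  /* NULL shows up as an error glyph */
        } else if (w > 0 && (w & 7) == 4) {
            /* MSB clear, b2 set, b1-b0 cleared: a Unicode point */
            out[written++] = (uint32_t)((uintptr_t)w >> 3);
        }
        /* Other types are out of scope for this sketch and ignored. */
    }
    return written;
}
```

A NULL in the list then appears as a visible warning sign instead of silently inserting a character of value 0, while the skip marker produces nothing at all.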
The summary of this change is below:

Type UP means "Unicode Point or Placeholder". Now let's transform the specification into working code.
Discussions
Consider U+2400, ␀ for replacing nulls