
Attributes

A project log for aStrA : Aligned Strings format with attributes

Developing a more flexible pointer and structure format that solves POSIX and C's historical problems.

Yann Guidon / YGDES, 12/16/2024 at 05:08

So far, the attributes describe the types of characters that are found in a string:

// The attributes/flags
#define MASK_ASTRA_TESTED      (  1)
#define MASK_ASTRA_UTF8        (  2)
#define MASK_ASTRA_UTF32       (  4)
#define MASK_ASTRA_ERROR       (  8)
#define MASK_ASTRA_CONTROL     ( 16)
#define MASK_ASTRA_CHAR_UPPER  ( 32)
#define MASK_ASTRA_CHAR_LOWER  ( 64)
#define MASK_ASTRA_DIGIT       (128)
#define MASK_ASTRA_PUNCTUATION (256)
#define MASK_ASTRA_SEPARATOR   (512)
#define MASK_ASTRA_CHARACTER  \
    ( MASK_ASTRA_CHAR_UPPER | MASK_ASTRA_CHAR_LOWER )
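To illustrate how these attributes could be filled in, here is a minimal sketch that scans a plain ASCII buffer and ORs together the class of every byte. The function name astra_scan_ascii, the 32-bit attribute word and the mapping of whitespace to MASK_ASTRA_SEPARATOR are all assumptions made for this example, not part of the aStrA code. It relies on the <ctype.h> classifiers in the default "C" locale and does no UTF-8 decoding, so the UTF8/UTF32/ERROR flags are never set here.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

// Attribute masks copied from the log above.
#define MASK_ASTRA_TESTED      (  1)
#define MASK_ASTRA_CONTROL     ( 16)
#define MASK_ASTRA_CHAR_UPPER  ( 32)
#define MASK_ASTRA_CHAR_LOWER  ( 64)
#define MASK_ASTRA_DIGIT       (128)
#define MASK_ASTRA_PUNCTUATION (256)
#define MASK_ASTRA_SEPARATOR   (512)

// Hypothetical helper: returns the OR of the attributes of every byte.
// Assumption: whitespace counts as a separator and is tested before
// iscntrl(), so '\t' and '\n' set SEPARATOR rather than CONTROL.
// Bytes >= 0x80 match no class in the "C" locale and add no flag.
static uint32_t astra_scan_ascii(const char *s, size_t len)
{
    uint32_t attr = MASK_ASTRA_TESTED;   // the string has been examined
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (isspace(c))       attr |= MASK_ASTRA_SEPARATOR;
        else if (iscntrl(c))  attr |= MASK_ASTRA_CONTROL;
        else if (isupper(c))  attr |= MASK_ASTRA_CHAR_UPPER;
        else if (islower(c))  attr |= MASK_ASTRA_CHAR_LOWER;
        else if (isdigit(c))  attr |= MASK_ASTRA_DIGIT;
        else if (ispunct(c))  attr |= MASK_ASTRA_PUNCTUATION;
    }
    return attr;
}
```

For example, scanning "Ab1 ," sets TESTED, CHAR_UPPER, CHAR_LOWER, DIGIT, SEPARATOR and PUNCTUATION, so a later consumer can tell at a glance that the string holds no control characters.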

Some hidden masks are also defined to drive the UTF-8 decoder. But it recently occurred to me that the remaining bits could serve other purposes, in particular to help with the list types:

// Flags used by types 3/F3:
#define MASK_ASTRA_TYPE0   (4096)   // List contains Unicode code points
#define MASK_ASTRA_TYPE1   (8192)   // List contains pointers to 255-byte strings
#define MASK_ASTRA_TYPE2   (16384)  // List contains pointers to 65535-byte strings
#define MASK_ASTRA_TYPE3   (32768)  // List contains references to other lists

These 4 bits must be cleared in types F1 and F2. Think of them as canaries: if any of them is set in such a string, something went wrong.
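A canary test of this kind boils down to a single mask comparison. The sketch below assumes a 32-bit flags word and introduces a hypothetical grouping mask, MASK_ASTRA_LIST_TYPES, and a hypothetical function name; neither comes from the aStrA sources.

```c
#include <stdbool.h>
#include <stdint.h>

#define MASK_ASTRA_TESTED  (    1)
#define MASK_ASTRA_UTF8    (    2)
#define MASK_ASTRA_TYPE0   ( 4096)
#define MASK_ASTRA_TYPE1   ( 8192)
#define MASK_ASTRA_TYPE2   (16384)
#define MASK_ASTRA_TYPE3   (32768)

// Hypothetical convenience mask grouping the four list-type bits.
#define MASK_ASTRA_LIST_TYPES \
    (MASK_ASTRA_TYPE0 | MASK_ASTRA_TYPE1 | MASK_ASTRA_TYPE2 | MASK_ASTRA_TYPE3)

// Hypothetical check for F1/F2 strings: returns true when all four
// canary bits are clear. A set bit suggests a corrupted header or a
// list (3/F3) being mistaken for a plain string.
static bool astra_canaries_intact(uint32_t flags)
{
    return (flags & MASK_ASTRA_LIST_TYPES) == 0;
}
```

A plain string with only character attributes passes, while any stray list-type bit makes the check fail.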

For types 3/F3, these flags help when processing certain kinds of lists: for example, if only MASK_ASTRA_TYPE0 is set, the list is equivalent to a UTF-32/UCS-4 string.
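The UTF-32 shortcut can be detected by checking that TYPE0 is the only list-type bit present, ignoring the character attributes in the low bits. This is a sketch with a hypothetical function name and an assumed 32-bit flags word.

```c
#include <stdbool.h>
#include <stdint.h>

#define MASK_ASTRA_TESTED  (    1)
#define MASK_ASTRA_TYPE0   ( 4096)
#define MASK_ASTRA_TYPE1   ( 8192)
#define MASK_ASTRA_TYPE2   (16384)
#define MASK_ASTRA_TYPE3   (32768)
#define MASK_ASTRA_LIST_TYPES \
    (MASK_ASTRA_TYPE0 | MASK_ASTRA_TYPE1 | MASK_ASTRA_TYPE2 | MASK_ASTRA_TYPE3)

// Hypothetical test for a 3/F3 list: true when the list holds only
// Unicode code points, so it can be handled like a UTF-32/UCS-4 buffer.
static bool astra_list_is_utf32(uint32_t flags)
{
    return (flags & MASK_ASTRA_LIST_TYPES) == MASK_ASTRA_TYPE0;
}
```

A list that mixes code points with string pointers (TYPE0 | TYPE1) is rejected, since its elements are no longer uniform.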

Some implementations may want to forbid MASK_ASTRA_TYPE3 in a list, to prevent recursive definitions.
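Such a policy could be enforced when a list is accepted, by testing the single TYPE3 bit. The allow_nested parameter and the function name are hypothetical, used only to show the idea.

```c
#include <stdbool.h>
#include <stdint.h>

#define MASK_ASTRA_TYPE0   ( 4096)
#define MASK_ASTRA_TYPE1   ( 8192)
#define MASK_ASTRA_TYPE3   (32768)

// Hypothetical policy check: an implementation that forbids nesting
// rejects any list whose flags announce references to other lists.
static bool astra_list_allowed(uint32_t flags, bool allow_nested)
{
    if (!allow_nested && (flags & MASK_ASTRA_TYPE3))
        return false;   // would allow recursive definitions
    return true;
}
```

Because the flag is stored in the list header, the check costs one AND and never needs to walk the list's elements.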

This leaves only 2 bits for extensions...

...

20241228: only 1 bit is left for extensions, as one is now reserved to signal that the string does not contain characters but ... types.
