So far, the attributes describe the types of characters that are found in a string:
// The attributes/flags
#define MASK_ASTRA_TESTED ( 1)
#define MASK_ASTRA_UTF8 ( 2)
#define MASK_ASTRA_UTF32 ( 4)
#define MASK_ASTRA_ERROR ( 8)
#define MASK_ASTRA_CONTROL ( 16)
#define MASK_ASTRA_CHAR_UPPER ( 32)
#define MASK_ASTRA_CHAR_LOWER ( 64)
#define MASK_ASTRA_DIGIT (128)
#define MASK_ASTRA_PUNCTUATION (256)
#define MASK_ASTRA_SEPARATOR (512)
#define MASK_ASTRA_CHARACTER \
( MASK_ASTRA_CHAR_UPPER | MASK_ASTRA_CHAR_LOWER )
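As a minimal sketch of how such flags can be queried, the helpers below test an attribute word against one or several masks. The names `astra_attr_t`, `astra_has_any` and `astra_has_all` are hypothetical, and the 16-bit width of the attribute word is an assumption, not something the listing above specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks copied from the listing above */
#define MASK_ASTRA_CHAR_UPPER ( 32)
#define MASK_ASTRA_CHAR_LOWER ( 64)
#define MASK_ASTRA_DIGIT      (128)
#define MASK_ASTRA_CHARACTER \
  ( MASK_ASTRA_CHAR_UPPER | MASK_ASTRA_CHAR_LOWER )

/* The composite mask is simply the OR of both case flags */
static_assert(MASK_ASTRA_CHARACTER == 96, "upper | lower");

/* Hypothetical type: the width of the attribute word is assumed */
typedef uint16_t astra_attr_t;

/* True if at least one flag of 'mask' is set in 'attr' */
static bool astra_has_any(astra_attr_t attr, astra_attr_t mask) {
  return (attr & mask) != 0;
}

/* True if every flag of 'mask' is set in 'attr' */
static bool astra_has_all(astra_attr_t attr, astra_attr_t mask) {
  return (attr & mask) == mask;
}
```

With this, a string that holds lowercase letters and digits but no uppercase letter satisfies `astra_has_any(attr, MASK_ASTRA_CHARACTER)` but not `astra_has_all(attr, MASK_ASTRA_CHARACTER)`.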
Some hidden masks are defined to drive the UTF-8 decoder. It occurred to me lately that the remaining bits could be used for other purposes, in particular to help with the list types:
// Flags used by 3/F3 type :
#define MASK_ASTRA_TYPE0 (4096) // List contains Unicode points
#define MASK_ASTRA_TYPE1 (8192) // List contains 255-byte string pointer
#define MASK_ASTRA_TYPE2 (16384) // List contains 65535-byte string pointer
#define MASK_ASTRA_TYPE3 (32768) // List contains reference to another list
These 4 bits must be cleared in types F1 and F2. Think of them as canaries: if one of them is set there, something is wrong.
For types 3/F3, they help with processing certain kinds of lists. For example, if only MASK_ASTRA_TYPE0 is present, then the list is equivalent to UTF-32/UCS-4.
Some implementations will not want MASK_ASTRA_TYPE3 in a list, to prevent recursive definitions.
This leaves only 2 bits for extensions...
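The checks described above could look like the following sketch. `MASK_ASTRA_TYPE_ALL` and the three helper functions are hypothetical names introduced for illustration, and the attribute word is again assumed to be 16 bits wide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks copied from the listing above */
#define MASK_ASTRA_TYPE0 ( 4096)
#define MASK_ASTRA_TYPE1 ( 8192)
#define MASK_ASTRA_TYPE2 (16384)
#define MASK_ASTRA_TYPE3 (32768)

/* Hypothetical helper mask: the four list-type bits together */
#define MASK_ASTRA_TYPE_ALL \
  ( MASK_ASTRA_TYPE0 | MASK_ASTRA_TYPE1 \
  | MASK_ASTRA_TYPE2 | MASK_ASTRA_TYPE3 )

/* The four flags occupy the top nibble of a 16-bit word */
static_assert(MASK_ASTRA_TYPE_ALL == 0xF000, "bits 12 to 15");

/* Types F1/F2: the four bits act as canaries and must all be 0 */
static bool astra_canaries_clear(uint16_t attr) {
  return (attr & MASK_ASTRA_TYPE_ALL) == 0;
}

/* Types 3/F3: only TYPE0 set means the list holds code points only,
   so it can be handled like UTF-32/UCS-4 */
static bool astra_list_is_utf32_like(uint16_t attr) {
  return (attr & MASK_ASTRA_TYPE_ALL) == MASK_ASTRA_TYPE0;
}

/* An implementation that forbids recursion rejects TYPE3 */
static bool astra_list_is_flat(uint16_t attr) {
  return (attr & MASK_ASTRA_TYPE3) == 0;
}
```

Masking with `MASK_ASTRA_TYPE_ALL` before comparing keeps the character-class flags in the low bits from interfering with the list-type test.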
...
20241228: only 1 bit is left for extensions, as one is now reserved to signal that the string does not contain characters but ... types.