Project | reprint: modern printf


Code: Exponential notation
04/12/2016 at 16:00 • 0 comments
Exponential notation (aka 'Scientific' notation) represents a numeric value as a mantissa and an exponent. In decimal notation, the mantissa is a value in the interval [1, 10). This notation is convenient for numbers with many digits (really, really small fractions or really large numbers). printf only prints floats or doubles in exponential notation, as in:
```
/* Prints 4.200000e+01 (x is promoted to double) */
float x = 42.0;
printf("%e", x);
```
In reprint, exponential notation is not bound to the type of input; it is simply a method of output. Both integers and floating point values can be represented in exponential notation:
```
/* Print 4.200000e1 */
float x = 42.0;
reprintf("\f.fr", x);

/* Print 4.200000e1 */
int y = 42;
reprintf("\f.r", y);
```
Printing integers in exponential notation may seem silly, but it's perfectly valid from a mathematical standpoint. If you are dealing with very large noisy counters, then exponential notation could neaten their appearance.
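To show what "exponential notation as an output method" involves for an integer, here is a minimal sketch in standard C that formats a `long` in the same `4.200000e1` style shown above (no '+' sign, no exponent padding). `int_to_exp` and its `prec` parameter are hypothetical, not part of reprint, and digits beyond `prec` are truncated rather than rounded to keep the sketch short.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats v as d.dddddde<exp> with `prec` fractional
   digits, e.g. (42, 6) -> "4.200000e1". Extra digits are truncated, not rounded. */
int int_to_exp(char *buf, size_t cap, long v, int prec)
{
    char digits[32];
    char mant[64];
    int m = 0;

    /* Magnitude as unsigned long so that LONG_MIN does not overflow */
    unsigned long mag = v < 0 ? 0UL - (unsigned long)v : (unsigned long)v;
    snprintf(digits, sizeof digits, "%lu", mag);
    size_t n = strlen(digits);

    if (v < 0)
        mant[m++] = '-';
    mant[m++] = digits[0];              /* leading digit: in [1, 9], or 0 when v == 0 */
    if (prec > 0) {
        mant[m++] = '.';
        for (int i = 1; i <= prec && m < (int)sizeof mant - 1; ++i)
            mant[m++] = (size_t)i < n ? digits[i] : '0';
    }
    mant[m] = '\0';

    /* The exponent counts the digits after the leading one (0 when v == 0) */
    return snprintf(buf, cap, "%se%d", mant, (int)n - 1);
}
```

For example, `int_to_exp(buf, sizeof buf, -123456, 2)` yields `-1.23e5`.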
Internals: the asset of GOTO
04/08/2016 at 12:06 • 3 comments

TRIGGER WARNING: The following content may harm the sensibilities of those who hate GOTO.
reprint_cb() is a state machine that interprets a format string and outputs characters based on its current state. Its control flow naturally revolves around a single dispatch point that executes the correct code segment to, say, output a numeric digit or a bitfield. Typically, state machines are implemented with an integer representing the state and a corresponding switch statement that performs the dispatch.
On embedded systems this wastes space and time. switch() statements entail a lookup table at best, and a sequence of if statements at worst. Stepping through the generated code instruction by instruction makes one very aware of this waste. Consequently, I wanted to do what our forefathers, the assembly programmers, could do: JMP or BR to an arbitrary address. Unfortunately, Standard C leaves this capability out, presumably to protect the masses.
However, GCC supports labels as values as an extension. Thus, instead of maintaining an integer state, I can store the starting address of the segment I want to execute next (hint: this is reprint's program counter). Instead of wading through if statements and jump table lookups, dispatch is a single branch instruction. The code size and execution time savings become readily apparent on an MSP430, the first embedded processor to run reprint.
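To make the idea concrete, here is a small self-contained state machine (a word counter, not reprint_cb() itself) that keeps its state as a label address using GCC's labels-as-values extension: `&&label` takes the address of a label and `goto *ptr` jumps to it. Clang accepts the same syntax; compilers limited to Standard C do not. The function name `count_words` is made up for the example.

```c
#include <stddef.h>

/* Counts whitespace-separated words. The current state is stored as the
   address of the code that handles the next character (GCC extension). */
size_t count_words(const char *s)
{
    size_t words = 0;
    void *state = &&in_space;   /* the machine's "program counter" */

    for (;;) {
        char c = *s++;
        if (c == '\0')
            return words;
        goto *state;            /* one indirect branch, no switch */

    in_space:
        if (c != ' ' && c != '\t' && c != '\n') {
            ++words;
            state = &&in_word;
        }
        continue;

    in_word:
        if (c == ' ' || c == '\t' || c == '\n')
            state = &&in_space;
        continue;
    }
}
```

Each character costs one indirect jump instead of a switch lookup; how much that saves in practice depends on the compiler and target.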
References:
https://github.com/codehero/reprint/blob/b683bbac82bf796f5aa68ad6c2b894d698c5b4e7/src/reprint.c#L283-L299
https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
Great write-up by Eli Bendersky:
http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables
Code: Fixed point output
04/06/2016 at 20:20 • 2 comments
On microprocessors, fixed point is typically preferred over floating point. Naive programmers who do not know this can unknowingly encumber their firmware with printf, strtod, and the associated floating point code. reprint supports fixed point output.
```
/* Printing to the thousandths place -> "42.042" */
int x = 42042;
reprintf("\f3<r", x);

/* Printing to the thousandths place with printf */
printf("%d.%03d", x / 1000, x % 1000);
```
In reprint, we simply load 3 into Register 3 (identified by the "<" character). This is the number of places we shift the decimal point to the left, and printing requires no extra calculation. Oh, and if the 3 is omitted, the shift factor is specified as part of the varargs.
In printf, we must calculate 2 separate values, a division by 1000 and a remainder, in order to split our source value into its integral and fractional parts. We must also remember to zero pad the second number so its leading zeros show up.
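The decimal-shift idea can be sketched in standard C without splitting the value into integral and fractional parts: format the digits once, then insert the decimal point `shift` places from the right, zero-padding when the value has fewer digits than the shift. `format_fixed` is a hypothetical helper that mimics the behaviour described above; it is not reprint's implementation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints v with the decimal point moved `shift`
   places to the left, e.g. (42042, 3) -> "42.042", (5, 2) -> "0.05". */
int format_fixed(char *buf, size_t cap, long v, unsigned shift)
{
    char digits[32];
    char out[64];
    size_t m = 0;

    if (shift > 20)
        return -1;                            /* keep the sketch's buffers safe */
    unsigned long mag = v < 0 ? 0UL - (unsigned long)v : (unsigned long)v;
    int n = snprintf(digits, sizeof digits, "%lu", mag);

    if (v < 0)
        out[m++] = '-';
    /* Integral part: all but the last `shift` digits, or "0" if none remain */
    if ((unsigned)n > shift) {
        memcpy(out + m, digits, (size_t)n - shift);
        m += (size_t)n - shift;
    } else {
        out[m++] = '0';
    }
    if (shift > 0) {
        out[m++] = '.';
        /* Leading zeros of the fraction when v has too few digits */
        for (unsigned i = (unsigned)n; i < shift; ++i)
            out[m++] = '0';
        size_t frac = (unsigned)n < shift ? (size_t)n : shift;
        memcpy(out + m, digits + n - frac, frac);
        m += frac;
    }
    out[m] = '\0';
    return snprintf(buf, cap, "%s", out);
}
```

Unlike the `x / 1000, x % 1000` approach, this also handles negative values, since the sign is emitted once before any digits.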
Which one do you think is simpler?

Code: Printing bitfields

04/05/2016 at 03:48 • 1 comment

Tightly packed data typically ignores 8 bit boundaries, which entails much shifting and masking. reprintf eliminates the need for programmers to write their own shifts and masks. The following code prints an IPv4 header.

```
/* \fN;ncw prints N bits of the data loaded in the Value register (in decimal by default). */
const char test_reprint_ipv4[] = 
    "\f\r0=cq"                     // Specify packed data
                                   // and load 16 bits; no printing
    "Version:           \f4;ncw\n" // Print  4 bits;  4 total
    "Header Words:      \f4;ncw\n" // Print  4 bits;  8 total
    "DSCP:              \f6;ncw\n" // Print  6 bits; 14 total
    "ECN:               \f2;cw\n"  // Print  2 bits; 16 total
    "Total Bytes:       \fcnq\n"   // Print 16 bit value
    "Identification:    \fcnq\n"   // Print 16 bit value
    "\f0=cq"                       // Load  16 bits; no printing
    "Flags:             \f&3;ncw\n" // Print  3 bits in binary;  3 total
    "Fragment Offset:   \f13;cw\n"  // Print 13 bits;           16 total
    "TTL:               \fcp\n"    // Print  8 bit value
    "Protocol:          \fcp\n"    // Print  8 bit value
    "Header Checksum:   \fcnq\n"   // Print 16 bit value
    "Source IP:         \fcp.\fcp.\fcp.\fcp\n"  // Print 4 1 byte values
    "Dest IP:           \fcp.\fcp.\fcp.\fcp\n"; // Print 4 1 byte values

reprintf_ptr(test_reprint_ipv4, incoming_packet);
```

The packing directive "\r" indicates the data format is tightly packed, rather than struct packed.

The input specifier "cq" corresponds to "uint16_t", so exactly 16 bits are loaded into the Value register. The "0=" sequence loads 0 into Register 4, which for formatted integer output governs the number of significant digits printed. Printing 0 significant digits is essentially a no-op but leaves the value loaded in the Value register. The input modifier "n" indicates the input datum is big-endian.

The input specifier "cw" specifically calls out bitfields and assumes the bit data was already loaded into the Value Register. The ';' character identifies Register 3, which is the parameter governing how many bits are output.

Using printf, the equivalent code (without the binary flag output of course) is:

```
/* Using printf. The format string may appear simpler, but correctly extracting
the data from the packet just to put it on the stack is a painful task.
I'm not even sure if that part is right... */
const char test_printf_ipv4[] = 
    "Version:           %u\n"
    "Header Words:      %u\n"
    "DSCP:              %u\n"
    "ECN:               %u\n"
    "Total Bytes:       %u\n"
    "Identification:    %u\n"
    "Flags:             %x\n"
    "Fragment Offset:   %u\n"
    "TTL:               %u\n"
    "Protocol:          %u\n"
    "Header Checksum:   %u\n"
    "Source IP:         %u.%u.%u.%u\n"
    "Dest IP:           %u.%u.%u.%u\n";

/* incoming_packet is assumed to be a uint8_t array. Multi-byte fields are
   big-endian on the wire, so they are assembled byte by byte. */
printf(test_printf_ipv4
    ,incoming_packet[0] >> 4
    ,incoming_packet[0] & 0xF
    ,incoming_packet[1] >> 2
    ,incoming_packet[1] & 0x3
    ,incoming_packet[2] << 8 | incoming_packet[3]
    ,incoming_packet[4] << 8 | incoming_packet[5]
    ,incoming_packet[6] >> 5
    ,(incoming_packet[6] & 0x1F) << 8 | incoming_packet[7]
    ,incoming_packet[8]
    ,incoming_packet[9]
    ,incoming_packet[10] << 8 | incoming_packet[11]
    ,incoming_packet[12]
    ,incoming_packet[13]
    ,incoming_packet[14]
    ,incoming_packet[15]
    ,incoming_packet[16]
    ,incoming_packet[17]
    ,incoming_packet[18]
    ,incoming_packet[19]);
```
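Loosely, what the ';' register and the "cw" specifier spare the programmer from writing is an MSB-first bit reader like the one below. This is a hedged sketch under the assumption that fields are packed most significant bit first, as in the IPv4 header; `bitreader_t` and `read_bits` are hypothetical names, and reprint itself works from its Value register rather than a struct like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first reader for tightly packed bit fields. */
typedef struct {
    const uint8_t *data;
    size_t bitpos;          /* next bit to read, counted from data[0]'s MSB */
} bitreader_t;

/* Returns the next `nbits` (1..32) bits as an unsigned value. */
uint32_t read_bits(bitreader_t *br, unsigned nbits)
{
    uint32_t v = 0;
    while (nbits--) {
        uint8_t byte = br->data[br->bitpos >> 3];
        unsigned bit = 7u - (unsigned)(br->bitpos & 7u);
        v = (v << 1) | ((byte >> bit) & 1u);
        br->bitpos++;
    }
    return v;
}
```

Reading the header then becomes a sequence of `read_bits(&br, 4)`, `read_bits(&br, 4)`, `read_bits(&br, 6)` and so on, mirroring the bit counts in the reprint format string.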

Code: Printing values in "ones and zeros"
04/04/2016 at 01:10 • 1 comment
Sometimes fixing a bit twiddling or other low level bug comes down to showing the individual bits. printf does not support printing an integer in radix 2, despite originating from times when code was closer to the metal. The code in reprint is as follows:
```
/* Print 42 as binary */
reprintf("\f&r", 42);
```
Octal and hex of course are supported:
```
/* Print 42 as hex */
reprintf("\f$r", 42);

/* Print 42 as octal */
reprintf("\f%r", 42);
```
The use of '$', '%', and '&' is not entirely arbitrary, as the three characters are sequential in value:
1. '$' is 0x24 and selects hexadecimal
2. '%' is 0x25 and selects octal
3. '&' is 0x26 and selects radix 2 (binary)
4. Default output is in decimal
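Because the three selectors are consecutive characters, choosing the radix can be a table lookup indexed by `c - '$'`. The sketch below illustrates that idea plus a generic unsigned-to-string conversion; `radix_for` and `utoa_radix` are hypothetical names, not reprint's API, and the lowercase hex digits are an arbitrary choice.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: '$' (0x24) -> 16, '%' (0x25) -> 8, '&' (0x26) -> 2,
   anything else -> 10 (the default). */
unsigned radix_for(char sel)
{
    static const unsigned table[] = { 16, 8, 2 };
    unsigned idx = (unsigned)(sel - '$');   /* wraps to a large value below '$' */
    return idx < 3 ? table[idx] : 10;
}

/* Hypothetical: writes v in `radix` (2..16) into buf.
   Returns buf, or NULL if buf is too small. */
char *utoa_radix(unsigned long v, unsigned radix, char *buf, size_t cap)
{
    char tmp[sizeof v * CHAR_BIT];          /* room for the longest, radix-2 output */
    char *p = tmp + sizeof tmp;

    do {                                    /* digits come out least significant first */
        *--p = "0123456789abcdef"[v % radix];
        v /= radix;
    } while (v != 0);

    size_t n = (size_t)(tmp + sizeof tmp - p);
    if (n + 1 > cap)
        return NULL;
    memcpy(buf, p, n);
    buf[n] = '\0';
    return buf;
}
```

With this, 42 comes out as "101010" for '&', "52" for '%' and "2a" for '$'.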
Code: Indentation
04/03/2016 at 13:05 • 0 comments
A common pattern in output is to indent a line based on its depth in a hierarchy (e.g., JSON or XML, or function depth when debugging). Though the output is the same, the method differs between reprint and printf:
```
/* Indentation with N spaces on reprint and printf */
int N = 10;

reprintf("\f=ep", N, ' ');

printf("%*s", N, "");
```
- In printf, we use the left pad approach but pass ε (the empty string) as the value to be padded. Since printf hardcodes the pad character to ' ', the result is N spaces.
- In reprint, the command is to repeatedly print a specific character N times. The syntax breaks down as follows:
1. \f: Formatted output field header
2. =: Store the corresponding integer in the '=' register (register 4).
3. ep: The data input type is a character, 8 bits.
In reprint, numeric parameters are specified by the user as loading register values (much like a microprocessor). The numeric value does not have any meaning without a corresponding input type. So when reprint parses 'e', the meaning of the register is understood to be character repetition.
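The repeat-a-character operation itself is easy to sketch in standard C, and unlike the `%*s` trick it works with any character, such as tabs or dots for a tree view. `repeat_char` is a hypothetical helper, not part of reprint or the C library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes `count` copies of `c` into buf and NUL-terminates.
   Output is truncated to fit `cap`; returns the number of characters written. */
size_t repeat_char(char *buf, size_t cap, char c, int count)
{
    if (cap == 0)
        return 0;
    size_t n = count > 0 ? (size_t)count : 0;
    if (n > cap - 1)
        n = cap - 1;                /* leave room for the terminator */
    memset(buf, c, n);
    buf[n] = '\0';
    return n;
}
```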
Code: No worries about left pad
04/02/2016 at 11:30 • 0 comments
If you are not familiar with the Node.js leftpad debacle: someone wrote a single function to pad a string to a particular length by adding spaces on the left. He exported this function as a library, and thousands of projects depended on it. After he removed his library, thousands of projects failed to build because they were missing this one simple function.
Thankfully, there are no worries here as reprint supports left pad!
```
reprintf("\f5r", 42);

printf("%5i", 42);
```
In this case, we print 42 with a column width of 5, which pads 3 space (' ') characters before the 42. Unlike in printf, the pad character can be arbitrary, but that is another post.
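As a point of reference, left padding with a caller-chosen character takes only a few lines of standard C. `left_pad` is a hypothetical name for this sketch; with `pad == ' '` it should produce the same text as `printf("%5s", s)` for strings that fit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: right-aligns s in a field of `width` characters,
   filling on the left with `pad`. Strings longer than width are not cut.
   Returns the output length, or -1 if buf is too small. */
int left_pad(char *buf, size_t cap, const char *s, size_t width, char pad)
{
    size_t len = strlen(s);
    size_t fill = width > len ? width - len : 0;
    if (fill + len + 1 > cap)
        return -1;
    memset(buf, pad, fill);
    memcpy(buf + fill, s, len + 1);         /* also copies the terminating NUL */
    return (int)(fill + len);
}
```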
Code: Printing signed integers
04/01/2016 at 10:24 • 0 comments
Here is a head-to-head comparison of reprintf and printf:
```
/* char */
char sc = -42;
printf("The answer is %hhi", sc);
reprintf("The answer is \fp", sc);

/* short */
short ss = -4242;
printf("The answer is %hi", ss);
reprintf("The answer is \fq", ss);

/* int */
int si = -424242;
printf("The answer is %i", si);
reprintf("The answer is \fr", si);
```
Most C programmers (consciously or not) default to printing signed integers. In reprint, this is the easiest format to output as it requires only a single letter (p, q, or r) to follow \f. Namely,
1. 'p' corresponds to "char"
2. 'q' corresponds to "short int"
3. 'r' corresponds to plain "int"
4. 's' corresponds to "long int"
The reason for starting at 'p' is simply that its corresponding hex value is 0x70, putting it at the top of its column in the ASCII table. Thus if we look at the lower 3 bits of each character:
1. ('p' & 0x7) == 0
2. ('q' & 0x7) == 1
3. ('r' & 0x7) == 2
4. ('s' & 0x7) == 3
There are even more integer types defined by C and referenced by characters beyond 's', but that is enough for now.
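Decoding the size from the low 3 bits takes one mask in code. The sketch below is a hypothetical `format_signed` helper, not reprint's internals, and it also shows a detail any printf-like function must handle: char and short arguments arrive through varargs promoted to int, so they are read as int and then narrowed.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: prints one signed argument whose size is chosen by the
   low 3 bits of 'p'..'s'. `spec` is an int, not a char, because va_start's
   last named parameter must not have a type changed by default promotions. */
int format_signed(char *buf, size_t cap, int spec, ...)
{
    va_list ap;
    long v = 0;
    int ok = 1;

    va_start(ap, spec);
    switch (spec & 0x7) {
    case 0: v = (signed char)va_arg(ap, int); break;  /* char was promoted to int */
    case 1: v = (short)va_arg(ap, int);       break;  /* short was promoted to int */
    case 2: v = va_arg(ap, int);              break;
    case 3: v = va_arg(ap, long);             break;
    default: ok = 0;                          break;  /* low bits 4..7: not handled here */
    }
    va_end(ap);
    return ok ? snprintf(buf, cap, "%ld", v) : -1;
}
```

Note that only the low 3 bits are examined, so the sketch does not reject characters outside 'p'..'s' that share those bits.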
Contrast this with printf, where 'hh' selects the smallest* integer size, a single 'h' the second smallest, no modifier the "normal" size, and 'l' a bigger one. Arbitrary much?
*(On some platforms char is 32 bits, the same size as the other types.)
Design: Populating bitwise registers from the format string
03/31/2016 at 13:56 • 0 comments
The Output Control Register, Input Control Register and Input Size Register are populated from the lower bits of the characters in the format string, streamlining the parsing procedure:
1. Upon parsing a Field Header, set Output Control Register Bit 13
2. A packing directive may immediately follow and set Input Control Register Bit 7
3. In general, the Output Control and Input Control registers are set by the lower bits of the character data in the format field string.
4. The Flag characters each correspond to a single bit in the Output Control Register.
5. Output Control Register Bits 2, 5, 8 are set to 1 if the corresponding Selector character appears in the conversion specifier, and are 0 otherwise.
6. A final Input Size character sets the lower 4 bits of the Input Size Register.
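A rough sketch of steps 1, 2 and 6 follows. Only Output Control bit 13, Input Control bit 7 and the 4-bit Input Size field come from the list; the `regs_t` layout, the name `parse_spec`, and the rule that any character from 'p' to 'w' terminates the specifier are assumptions made for illustration, and flags and selectors (steps 3 to 5) are simply skipped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file; sizes and layout are illustrative. */
typedef struct {
    uint16_t out_ctrl;   /* Output Control Register */
    uint8_t  in_ctrl;    /* Input Control Register */
    uint8_t  in_size;    /* Input Size Register */
} regs_t;

/* Parses one conversion specifier starting at s. Assumption: any character
   from 'p' to 'w' is the terminating Input Size; everything between the
   header and it is skipped. Returns a pointer past the specifier, or NULL. */
const char *parse_spec(const char *s, regs_t *r)
{
    if (*s != '\f')
        return NULL;
    r->out_ctrl |= (uint16_t)(1u << 13);         /* step 1: field header */
    ++s;
    if (*s == '\r') {                            /* step 2: packing directive */
        r->in_ctrl |= (uint8_t)(1u << 7);
        ++s;
    }
    for (; *s != '\0'; ++s) {
        if (*s >= 'p' && *s <= 'w') {
            /* step 6: lower 4 bits of the size character, e.g. 'q' -> 1 */
            r->in_size = (uint8_t)((r->in_size & 0xF0u) | ((unsigned)*s & 0x0Fu));
            return s + 1;
        }
    }
    return NULL;                                 /* no Input Size found */
}
```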
Design: Conversion Specifiers
03/30/2016 at 16:40 • 0 comments
Like printf, reprint has conversion specifiers to control data formatting. A conversion specifier breaks down into the following parts:
1. Field Header: Indicates the start of a conversion specifier. The \f header starts a formatted specifier, while the \b header starts a binary output specifier.
2. Packing Directive: Specify whether source data is tightly packed or packed as a C struct.
3. Output Control: Various parameters for controlling output.
4. Input Specification: Various parameters for interpreting the input. At minimum there is a size specification, which terminates the conversion specifier.
The exact characters comprising these sections are shown in the ASCII table breakdown of the conversion specifier syntax.