-
improved encoder reading
11/12/2018 at 18:43 • 0 commentsThe software I'm modifying does an encoder read as follows:
int rot = ((digitalRead(BTN_EN1) == LOW) << BLEN_A) | ((digitalRead(BTN_EN2) == LOW) << BLEN_B); // potentiometer uses grey code. Pattern is 0 3 switch (rot) { /** logic to interpret the value goes here ** / }
This routine is ok, but it could be made a little better.
First I cut and paste the routine in a separate test project so that I can gather timings.
To collect timing I'm collecting 1000 iterations of the encoder reading.
void consume_input() { // grab the time prior to mucking with serial output unsigned long now = micros(); unsigned long delta_ms = now - delta_time_start; Serial.print("Turn: "); Serial.print(lcd_turn); Serial.print(" uS: "); Serial.println(delta_ms); lcd_turn = 0; delta_time_start = micros(); } static int g_loopctr = 0; void loop() { read_input_pot(); ++g_loopctr; if(g_loopctr>1000) { g_loopctr = 0; consume_input(); } }
I'm running this code on an Arduino Mega 2650 and using Arduino 1.8.5 (which is using avr-gcc/5.4.0-atmel3.6.1)
Starting with the existing implementation:
Turn: 0 uS: 12028 Turn: 0 uS: 12028 Turn: 1 uS: 12108 Turn: 2 uS: 12232 Turn: 1 uS: 12040 Turn: 1 uS: 12148 Turn: 3 uS: 12224
There's a fairly consistent time of 12028 uS per 1000 iteration and an additional cost of 50 - 100uS when the pin detection code runs
Measuring 1000 loop iterations with the pin reading code disabled
microseconds (uS): 3080
12028 - 3080 = 8948
Measuring 1000 loop iterations and calling read_input_pot() twice:Turn: 0 uS: 23108 Turn: 0 uS: 23108 Turn: 0 uS: 23108 Turn: 0 uS: 23108
23108 - 12028 = 11080
That's a bit higher than I expected, but within reason. It seems like the timing numbers being collected have some validity.
So, first easy win is to switch some variables over to 8 bit integers. There's basically 2 pins that are being used to read values, so there's no reason to use anything larger than 8 bits.
Turn: 0 uS: 11648 Turn: 1 uS: 11740 Turn: 2 uS: 11876 Turn: 1 uS: 11664
That's a small but measurable improvement.
Looking at the assembly code it's easy to see why this was a win
//rot is int switch (rot) { 4ba: 21 30 cpi r18, 0x01 ; 1 4bc: 31 05 cpc r19, r1 4be: 61 f1 breq .+88 ; 0x518 <__LOCK_REGION_LENGTH__+0x118> 4c0: 24 f4 brge .+8 ; 0x4ca <__LOCK_REGION_LENGTH__+0xca> 4c2: 21 15 cp r18, r1 4c4: 31 05 cpc r19, r1 4c6: 41 f0 breq .+16 ; 0x4d8 <__LOCK_REGION_LENGTH__+0xd8> 4c8: 2c c0 rjmp .+88 ; 0x522 <__LOCK_REGION_LENGTH__+0x122> 4ca: 22 30 cpi r18, 0x02 ; 2 4cc: 31 05 cpc r19, r1 4ce: c9 f0 breq .+50 ; 0x502 <__LOCK_REGION_LENGTH__+0x102> 4d0: 23 30 cpi r18, 0x03 ; 3 4d2: 31 05 cpc r19, r1 4d4: d9 f0 breq .+54 ; 0x50c <__LOCK_REGION_LENGTH__+0x10c> 4d6: 25 c0 rjmp .+74 ; 0x522 <__LOCK_REGION_LENGTH__+0x122>
// rot is uint8_t { switch (rot) { 4ae: 91 30 cpi r25, 0x01 ; 1 4b0: 31 f1 breq .+76 ; 0x4fe <__LOCK_REGION_LENGTH__+0xfe> 4b2: 28 f0 brcs .+10 ; 0x4be <__LOCK_REGION_LENGTH__+0xbe> 4b4: 92 30 cpi r25, 0x02 ; 2 4b6: c9 f0 breq .+50 ; 0x4ea <__LOCK_REGION_LENGTH__+0xea> 4b8: 93 30 cpi r25, 0x03 ; 3 4ba: e1 f0 breq .+56 ; 0x4f4 <__LOCK_REGION_LENGTH__+0xf4> 4bc: 24 c0 rjmp .+72 ; 0x506 <__LOCK_REGION_LENGTH__+0x106>
Changing to uint8_t made a significant improvement in the generated instructions, but only resulted in a small improvement to overall performance. This suggests that the majority of time is being spent in something that the code is calling. In this case the time is coming from the calls to digitalRead.
digitalRead works and is safe, but is not the most efficient way to read from a pin. In this case the pins being read are 31 and 33. On an Arduino Mega these pins are on port 3 (see https://github.com/hazcat/QueryPinconfig ). That data plus the bitmasks to read the values can be cached off.
#define BTN_EN1 31 //[RAMPS14-SMART-ADAPTER] #define BTN_EN2 33 //[RAMPS14-SMART-ADAPTER] // cache of the port for the pins volatile uint8_t * cacheport = nullptr; // bitmasks for the two values uint8_t bm_a = 0; uint8_t bm_b = 0; void setup() { /* ... */ bm_a = digitalPinToBitMask(BTN_EN1); uint8_t port1 = digitalPinToPort(BTN_EN1); bm_b = digitalPinToBitMask(BTN_EN2); uint8_t port2 = digitalPinToPort(BTN_EN2); } void read_input_pot_direct_read_from_pins() { // doing two reads. Grab the port value and then do the two reads uint8_t port = (*cacheport); uint8_t rot = ( (port&bm_a) == bm_a) ? B01 : 0; if((port&bm_b)==bm_b) { rot |= B10; }
Collecting timings:
Turn: 0 uS: 4596 Turn: 1 uS: 4596 Turn: 1 uS: 4588 Turn: 0 uS: 4588 Turn: 2 uS: 4592
That's quite a lot better.
The next thing I wanted to check was to see if there was any advantage to switching to a lookup table based approach as described here.
https://www.circuitsathome.com/mcu/reading-rotary-encoder-on-arduino/
The first attempt produced significantly worse results. The primary issue was that it's a massive optimization to detect that the encoder hasn't moved. The second issue was that the right shift operator produces some assembly that is suboptimal.
last_input = (last_input >> 2); a36: 90 e0 ldi r25, 0x00 ; 0 a38: 95 95 asr r25 a3a: 87 95 ror r24 a3c: 95 95 asr r25 a3e: 87 95 ror r24 a40: 80 93 3f 02 sts 0x023F, r24 ; 0x80023f <_ZL10last_input>
Switching over to dividing by 4 I get the two shift rights that I'd expect
last_input = (last_input /4); a36: 86 95 lsr r24 a38: 86 95 lsr r24 a3a: 80 93 3f 02 sts 0x023F, r24 ; 0x80023f <_ZL10last_input>
In the end I came up with the following
constexpr int8_t kDIR_A = -1; constexpr int8_t kDIR_B = 1; static const int8_t enc_states[] PROGMEM = { 0, kDIR_A, kDIR_B, 0, kDIR_B, 0, 0, kDIR_A, kDIR_A, 0, 0, kDIR_B, 0, kDIR_B, kDIR_A, 0 }; void read_input_pot() { uint8_t port = (*cacheport); uint8_t rot = ((port & bm_a) == bm_a) ? B01 : 0; if ((port & bm_b) == bm_b) { rot |= B10; } if(rot!= lcd_rot_old) { uint8_t index = (rot*4)+lcd_rot_old; lcd_turn += pgm_read_byte(&enc_states[index]); lcd_rot_old = rot; } }
The end product produces some pretty reasonable assembly and timings that are in line with the other routine.
Turn: 0 uS: 4592 Turn: -1 uS: 4596 Turn: 0 uS: 4600 Turn: -2 uS: 4592 Turn: 0 uS: 4596 Turn: 0 uS: 4584 Turn: -1 uS: 4588 Turn: 0 uS: 4600
if(rot!= lcd_rot_old) { a6a: e8 17 cp r30, r24 a6c: 79 f0 breq .+30 ; 0xa8c <main+0x204> a6e: 98 2f mov r25, r24 a70: 99 0f add r25, r25 uint8_t index = (rot*4)+lcd_rot_old; lcd_turn += pgm_read_byte(&enc_states[index]); a72: 99 0f add r25, r25 a74: e9 0f add r30, r25 a76: f0 e0 ldi r31, 0x00 ; 0 a78: ee 5d subi r30, 0xDE ; 222 a7a: fe 4f sbci r31, 0xFE ; 254 a7c: e4 91 lpm r30, Z a7e: 90 91 30 02 lds r25, 0x0230 ; 0x800230 <lcd_turn> a82: e9 0f add r30, r25 a84: e0 93 30 02 sts 0x0230, r30 ; 0x800230 <lcd_turn> a88: 80 93 2b 02 sts 0x022B, r24 ; 0x80022b <_ZL11lcd_rot_old> loop():
The next step in improving this routine is probably to read the pins off of an interrupt rather than polling for state change. But that'll have to wait until another time.
Code is here: