-
improvements to serial output of data
09/07/2019 at 05:50
One of the biggest delays in getting information out to the terminal is that serial output has to happen at different times from i2c and spi bus traffic. The spi bus is fast enough that this should not be an issue when outputting to an lcd display. I want terminal data to show up instantly, so this may involve a ram buffer that holds the data and sends it to the terminal almost immediately, as soon as the i2c bus is idle.
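A minimal sketch of that ram buffer, assuming the main loop knows when the i2c bus is idle. The names `term_put` and `term_flush`, the 128-byte size, and the `demo_write` stand-in for a serial write are all hypothetical, not from the project code. Text is queued instantly and only drained to the terminal while the i2c bus is not busy.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical ring buffer for terminal text. Size must be a power of two;
   one slot is kept empty, so it holds up to TERM_BUF_SIZE - 1 bytes. */
#define TERM_BUF_SIZE 128u

typedef struct {
    uint8_t data[TERM_BUF_SIZE];
    uint8_t head;  /* next write position */
    uint8_t tail;  /* next read position */
} term_buf_t;

/* Queue one byte. Returns false (byte dropped) if the buffer is full. */
static bool term_put(term_buf_t *b, uint8_t c) {
    uint8_t next = (uint8_t)((b->head + 1u) & (TERM_BUF_SIZE - 1u));
    if (next == b->tail) return false;
    b->data[b->head] = c;
    b->head = next;
    return true;
}

/* Call from the main loop. Sends nothing while the i2c bus is busy;
   otherwise drains up to max_bytes through write_fn. Returns bytes sent. */
static uint16_t term_flush(term_buf_t *b, bool i2c_busy,
                           void (*write_fn)(uint8_t), uint16_t max_bytes) {
    uint16_t sent = 0;
    if (i2c_busy) return 0;
    while (b->tail != b->head && sent < max_bytes) {
        write_fn(b->data[b->tail]);
        b->tail = (uint8_t)((b->tail + 1u) & (TERM_BUF_SIZE - 1u));
        sent++;
    }
    return sent;
}

/* PC stand-in for the serial write so the sketch can run off-board. */
static char demo_out[TERM_BUF_SIZE];
static uint16_t demo_len = 0;
static void demo_write(uint8_t c) {
    if (demo_len < (uint16_t)sizeof demo_out) demo_out[demo_len++] = (char)c;
}
```

On the board, `write_fn` would wrap the serial write and the busy flag would come from the i2c code. Capping `max_bytes` keeps each flush short, so a pending sensor read is never held up for long.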
There are two ways I may do this. 1) Safely burst the i2c bus. This can only be done on read operations, so some checks need to be built in. Do not do this with the current setup: the library has been throttle-limited to a max of 1mhz, though it might be able to burst to 2mhz in special situations. For now it is not a good idea; it will need status checks, plus verification that the higher speed is only used for read operations, to make sure it works safely. There is also the possibility of tuning the i2c frequency for optimum performance. When data cannot be sent or received quickly enough, the slave device can slow down the master (clock stretching). That works if everything is in sync; at high clock rates the behavior is unknown.
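One way to keep the burst idea safe is to choose the clock per transfer, so writes can never run above spec. This is a sketch under assumptions: the 2mhz burst value, the guard struct, and the rule that any failed status check disables bursting for good are all hypothetical, not part of the library.

```c
#include <stdint.h>
#include <stdbool.h>

#define I2C_SPEC_MAX_HZ 1000000UL  /* 1mhz limit used by the library */
#define I2C_BURST_HZ    2000000UL  /* hypothetical read-only burst speed, untested */

typedef enum { I2C_OP_READ, I2C_OP_WRITE } i2c_op_t;

typedef struct {
    bool burst_allowed;  /* cleared permanently after any failed check */
} i2c_speed_guard_t;

/* Writes always use the spec limit; only reads may burst, and only
   while no verification check has failed. */
static uint32_t i2c_clock_for(const i2c_speed_guard_t *g, i2c_op_t op) {
    if (op == I2C_OP_READ && g->burst_allowed) return I2C_BURST_HZ;
    return I2C_SPEC_MAX_HZ;
}

/* Report the outcome of a status/verification check after a burst read. */
static void i2c_report_check(i2c_speed_guard_t *g, bool ok) {
    if (!ok) g->burst_allowed = false;
}
```

On Arduino, the returned value could be passed to `Wire.setClock()` before each transfer. Keeping writes at the 1mhz spec matters because, as described further down, running past it corrupted the sensor's rom.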
2) A compressed buffer cache. Normally the terminal output is low detail: the ascii character generation uses only 6 different characters.
I have a way of fitting this data at 3 values per 1 byte of cache. This is only for output to the terminal; internally the data is much faster. It also only uses 256 bytes of cache. It can be packed this tightly because the full number information is not kept for the 1st and 3rd values, only a range, and the 2nd value will most likely be similar to the 1st and 3rd.
Here is how I'll most likely do it:
Each cache byte holds three values, laid out as 3 bits | 2 bits | 3 bits.
111|00|000 = 1st value (upper 3 bits), temperature range in deg C:
0 <26
1 >25 & <30
2 >29 & <31
3 >30 & <33
4 >32 & <36
5 >35
6&7 not used currently
000|00|111 = 3rd value (lower 3 bits), temperature range in deg C:
0 <26
1 >25 & <30
2 >29 & <31
3 >30 & <33
4 >32 & <36
5 >35
6&7 not used currently
000|11|000 = 2nd value (middle 2 bits):
00 lowest value
01 1/2 of 1st and 1/2 of 3rd value
10 1/4 of 3rd value
11 highest value
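The packing above can be sketched in C. Assumptions: the overlapping range edges (29.5 fits both ">25 & <30" and ">29 & <31") are resolved by taking the first row that matches, and the 2-bit code for the 2nd value is passed in as-is, since how it is chosen isn't spelled out here. Function names are hypothetical. Note that 256 cache bytes x 3 values = 768, exactly the 32x24 pixel count of the MLX90640, which is presumably why 256 bytes is enough.

```c
#include <stdint.h>

/* Map a temperature (deg C) to the 3-bit range code; first matching row wins. */
static uint8_t temp_range_code(float t) {
    if (t < 26.0f) return 0;
    if (t < 30.0f) return 1;
    if (t < 31.0f) return 2;
    if (t < 33.0f) return 3;
    if (t < 36.0f) return 4;
    return 5;  /* codes 6 and 7 are unused */
}

/* Layout: bits 7..5 = 1st value, bits 4..3 = 2nd value code, bits 2..0 = 3rd value. */
static uint8_t pack3(uint8_t first, uint8_t second2, uint8_t third) {
    return (uint8_t)(((first & 0x07u) << 5) | ((second2 & 0x03u) << 3) | (third & 0x07u));
}

static uint8_t unpack_first(uint8_t b)  { return (uint8_t)(b >> 5); }
static uint8_t unpack_second(uint8_t b) { return (uint8_t)((b >> 3) & 0x03u); }
static uint8_t unpack_third(uint8_t b)  { return (uint8_t)(b & 0x07u); }
```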
I'm just looking at options to speed up serial terminal output. This may not even be needed once the i2c data is cached: uncached, i2c reads run at about 1/8 of the speed they will have once cached.
-
getting faster math performance with inverse 1/2^x tabulation
09/07/2019 at 02:59
There are some areas in solving for calibration data where the result has to be divided by 2^x, such as ExtractKvPixelParametersRawPerPixel. Normally this isn't a big deal because the values were stored in ram. Now they are read for each cell, so I want to make it efficient.
The exponent x goes up to 48 in the code, but I wanted some wiggle room, so I went up to 64.
Also, the code normally divides by 2^x, so I wanted to turn that into a multiply: the returned numbers are 1/2^x. Other than the lookup, no math is done. This drastically speeds up calibration reads.
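So I made this switch(x) routine. It might even be faster as a table, but I don't know. Each case just returns a constant, so there is no math at run time; depending on the compiler, the switch itself becomes a jump table or a chain of compares, so timing both is the only way to know.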
#include <stdint.h>
float SimplePowFast2sInverse(uint8_t x){//returns 1/2^x so callers multiply instead of divide. will move this into PROGMEM later
//we need to go to 2^48 so at least 49 values, but we can go to 64. table will be generated from javascript and in project folder
switch(x){
case 0: return 1; break; // 1/2^0 = 1
case 1: return 0.5; break;
case 2: return 0.25; break;
case 3: return 0.125; break;
case 4: return 0.0625; break;
case 5: return 0.03125; break;
case 6: return 0.015625; break;
case 7: return 0.0078125; break;
case 8: return 0.00390625; break;
case 9: return 0.001953125; break;
case 10: return 0.0009765625; break;
case 11: return 0.00048828125; break;
case 12: return 0.000244140625; break;
case 13: return 0.0001220703125; break;
case 14: return 0.00006103515625; break;
case 15: return 0.000030517578125; break;
case 16: return 0.0000152587890625; break;
case 17: return 0.00000762939453125; break;
case 18: return 0.000003814697265625; break;
case 19: return 0.0000019073486328125; break;
case 20: return 9.5367431640625e-7; break;
case 21: return 4.76837158203125e-7; break;
case 22: return 2.384185791015625e-7; break;
case 23: return 1.1920928955078125e-7; break;
case 24: return 5.960464477539063e-8; break;
case 25: return 2.9802322387695312e-8; break;
case 26: return 1.4901161193847656e-8; break;
case 27: return 7.450580596923828e-9; break;
case 28: return 3.725290298461914e-9; break;
case 29: return 1.862645149230957e-9; break;
case 30: return 9.313225746154785e-10; break;
case 31: return 4.656612873077393e-10; break;
case 32: return 2.3283064365386963e-10; break;
case 33: return 1.1641532182693481e-10; break;
case 34: return 5.820766091346741e-11; break;
case 35: return 2.9103830456733704e-11; break;
case 36: return 1.4551915228366852e-11; break;
case 37: return 7.275957614183426e-12; break;
case 38: return 3.637978807091713e-12; break;
case 39: return 1.8189894035458565e-12; break;
case 40: return 9.094947017729282e-13; break;
case 41: return 4.547473508864641e-13; break;
case 42: return 2.2737367544323206e-13; break;
case 43: return 1.1368683772161603e-13; break;
case 44: return 5.684341886080802e-14; break;
case 45: return 2.842170943040401e-14; break;
case 46: return 1.4210854715202004e-14; break;
case 47: return 7.105427357601002e-15; break;
case 48: return 3.552713678800501e-15; break;
case 49: return 1.7763568394002505e-15; break;
case 50: return 8.881784197001252e-16; break;
case 51: return 4.440892098500626e-16; break;
case 52: return 2.220446049250313e-16; break;
case 53: return 1.1102230246251565e-16; break;
case 54: return 5.551115123125783e-17; break;
case 55: return 2.7755575615628914e-17; break;
case 56: return 1.3877787807814457e-17; break;
case 57: return 6.938893903907228e-18; break;
case 58: return 3.469446951953614e-18; break;
case 59: return 1.734723475976807e-18; break;
case 60: return 8.673617379884035e-19; break;
case 61: return 4.336808689942018e-19; break;
case 62: return 2.168404344971009e-19; break;
case 63: return 1.0842021724855044e-19; break;
default: return 0; // x > 63 is outside the table; 1/2^x is tiny there, so treat it as 0
}
}
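The log wonders whether a table would beat the switch. A different technique, not from the original code, avoids both: since 1/2^x is an exact power of two, its IEEE-754 single-precision bit pattern is just a biased exponent of 127 - x with a zero mantissa, so it can be built directly with no table, no division, and no ram. A sketch, assuming `float` is 32-bit IEEE-754 (true for avr-gcc and ARM gcc); the name `inv_pow2_bits` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* 1/2^x built from the float bit pattern: sign 0, mantissa 0,
   biased exponent 127 - x. Exact for x = 0..126 (results stay normal). */
static float inv_pow2_bits(uint8_t x) {
    if (x > 126) return 0.0f;                 /* below the smallest normal float */
    uint32_t bits = (uint32_t)(127u - x) << 23;
    float f;
    memcpy(&f, &bits, sizeof f);              /* well-defined type pun */
    return f;
}
```

It covers x up to 126, well past the 63 the switch handles, and every result is exact.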
This has more of an effect now that there is less waiting in while loops for data to return to ram.
The original code spent 98% more time in while loops. The current code spends less time in loops, so more time is left for math in general. Multiplication still costs time, but most of the division is being removed.
I just wanted to share one example of how the math can be improved.
I'll put the version of the code that has this online within 24 hours.
-
my uno board is in use in another project, so i used a 328p nano board
09/03/2019 at 15:56
I have verified that this project can work within the limitations of the 328p, so on uno and nano boards.
The image above shows the wiring to the nano board. The nano has a 3.3v pin as well as a gnd pin.
Here is the raw data output as I put my hand in front of the sensor and remove it. This is not image mode; it is the math-intensive calc output mode, with a lot of detail. Performance is limited by math and by bandwidth. It works with 982 bytes of ram, and I think stack overhead is about 100-200 bytes max. I'll be implementing some math tricks to greatly improve performance; however, the main issue is still the i2c data bottleneck, so optimization of the data bus is being worked on.
BTW this performance is very close to that of the teensy 2.0 and the arm-based teensy-lc.
In the video above, I move my hand about 6 inches in front of the sensor and away again a few times with an up-down chop motion, just to show the data updating in real time.
Anyway, the latest code that works on arduino will be uploaded tonight. It also includes resolution-doubler functions that rely on data caching. The cache uses 256 bytes of ram but speeds up performance by 4-5 times, thanks to fewer i2c memory calls and cached math results.
-
performance on arduino atmega is similar. bandwidth is the main limiter
09/03/2019 at 14:51
I just recorded a video of the arduino atmega's performance with the MLX90640. Its max working i2c frequency with the device is 800khz, whereas the teensy seems to be able to do 1mhz. There are ways to optimize the i2c bus with a small cache that will increase performance, and that work still needs to be done.
Here is the video from the atmega using ImageMode (not as detailed as Calc Mode).
-
on teensy-lc i have improved performance of doubleResolution output
09/03/2019 at 14:34
There is a bottleneck in i2c communication, and in order to subsample I would need to do at least 4 memory calls and 4x more complex math. I buffered the data, so it uses 256 more bytes of ram when using the doubleResolution function, but it now runs 4-5 times faster, as it also frees up the i2c bus for other parts of the process. This is for the teensy-LC, which now runs with about 3k of ram in use. On arduino it uses a lot less ram because values are stored 8 bits at a time instead of 32.
Here is video of the ascii output to the terminal as my hand counts to 5, both in image mode, which uses less math, and in calc mode, which has precise output.
The mode is selected in Z_MemManagment.h: look for Replace_detailed_calc_with_image_data and set it to false or true.
To get results, all you need to do is write something like this:
float temp = DoubleResolutionValue(x, y); // x,y are passed directly as coordinates, not as a memory location
The routine does the caching and all the interpolation, needing only an x,y value within the 64x48 output grid.
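For readers who want the idea behind a resolution doubler, here is a minimal sketch using bilinear interpolation. It assumes the 32x24 frame has already been cached in ram as floats (the log says the arduino version stores 8-bit values instead), and `double_res_value` is a hypothetical stand-in, not the project's `DoubleResolutionValue()`, whose caching isn't shown here.

```c
#include <stdint.h>

#define SRC_W 32
#define SRC_H 24

/* Bilinear interpolation of a cached 32x24 frame (row-major) sampled on a
   64x48 grid: x in 0..63, y in 0..47. Even coordinates land exactly on a
   source pixel; odd ones fall halfway between neighbours. Edges are clamped. */
static float double_res_value(const float *frame, uint8_t x, uint8_t y) {
    uint8_t x0 = x / 2, y0 = y / 2;                   /* source pixel at or before */
    uint8_t x1 = (x0 + 1 < SRC_W) ? x0 + 1 : x0;       /* clamp at right edge */
    uint8_t y1 = (y0 + 1 < SRC_H) ? y0 + 1 : y0;       /* clamp at bottom edge */
    float fx = (x & 1) ? 0.5f : 0.0f;                  /* fractional x position */
    float fy = (y & 1) ? 0.5f : 0.0f;                  /* fractional y position */

    float top    = frame[y0 * SRC_W + x0] * (1.0f - fx) + frame[y0 * SRC_W + x1] * fx;
    float bottom = frame[y1 * SRC_W + x0] * (1.0f - fx) + frame[y1 * SRC_W + x1] * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```

Each output value needs at most 4 cached pixels and no i2c traffic, which is the point of buffering the frame first.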
Anyway, the image mode is here,
and the detailed calc mode is here. The ascii image art doesn't do it justice.
-
i've resolved the issue with math on the arduino. it will be uploaded
09/02/2019 at 21:56
I was dealing with some strange issues with the switch to 64-bit math; some of the routines were still not behaving. Needless to say, I resolved the issue. I'll be posting code after smoothing out the workflow for the serial and i2c data buffers. The code works without using 64-bit math.
-
sensor works in image mode calibrated on atmega 1280 and on teensy LC
09/02/2019 at 00:18
Until now I have been working on getting accurate temperatures verified. This currently requires more math than the uno can handle without a 64-bit library. Short term I will include a 64-bit library, but the idea is to not need it. For example, I modified the image function to work with less memory as well.
The teensy works in this mode as well, without the 64-bit library: it is a 32-bit ARM processor, and its compiler treats double as a true 64-bit (double precision) float, whereas on the uno's avr compiler double is the same 32-bit type as float. As I stated earlier, by using only sqrt or only pow, the numbers will keep enough resolution to work on the uno. For now I will use the 64-bit math library, if it works, as a short-term solution. If anyone wants to know which libraries I'm testing for double-precision float use, they are here:
https://github.com/mmoller2k/Math64
https://github.com/mmoller2k/Float64
The image function does not need this level of resolution across both large and small numbers, and by default it works OK on the Arduino mega as long as the new memory-management method is in use. (The uno most likely works as well, but it has not been tested with this function rewrite yet.)
I just rewrote the image capture function (but have not uploaded the code yet) to work with the new memory management; this is all part of verifying the old way against the new way for similar operations. It uses 758 bytes at compile time, and there are no large pointer calls that load onto the stack either. Max usage should be under 1k as far as I can tell (I will probably never know for sure, but it will be close enough).
Here is the memory usage output:
Because the image function does not require an extreme range of math operations, it works on the atmega and more than likely on the uno at this point as well. I will update this log and add a link when I upload the code. Currently there is no buffer, so reading the sensor is slower; I'm going to implement the i2c buffer before I post the code.
[placeholder here for when link is available]
I also want to clean up the upsampling code so it can be used more generally. After this and the smart i2c cache for ram values of 64 words (128 bytes), the code should run fast enough and be cleaner. The efficiency of i2c reads peaks at around 64 bytes. Currently I read only 2 bytes at a time, which has a lot of overhead; that was for simplicity and to test that it works. With the buffer it will read about 8 times faster.
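A minimal sketch of a 64-word read-through cache, assuming the sensor's ram is read as 16-bit words at 16-bit addresses. On a miss it fetches the whole aligned 64-word block in one transfer instead of 2 bytes at a time. `bus_read_fn`, `cache_get`, and the `demo_bus_read` stand-in are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>

#define CACHE_WORDS 64u  /* 64 words = 128 bytes, as in the log */

/* Bus callback: read n consecutive words starting at addr in one transfer. */
typedef bool (*bus_read_fn)(uint16_t addr, uint16_t n, uint16_t *out);

typedef struct {
    uint16_t base;                /* address of words[0] */
    bool     valid;
    uint16_t words[CACHE_WORDS];
    bus_read_fn read;
} word_cache_t;

/* Return the word at addr, refilling with the aligned 64-word block on a miss. */
static bool cache_get(word_cache_t *c, uint16_t addr, uint16_t *value) {
    uint16_t block = (uint16_t)(addr & ~(CACHE_WORDS - 1u));
    if (!c->valid || block != c->base) {
        if (!c->read(block, CACHE_WORDS, c->words)) { c->valid = false; return false; }
        c->base = block;
        c->valid = true;
    }
    *value = c->words[addr - block];
    return true;
}

/* Call when the sensor has a new frame so stale words are not served. */
static void cache_invalidate(word_cache_t *c) { c->valid = false; }

/* PC stand-in for the real i2c read: each word equals its address; counts transfers. */
static unsigned demo_transfers = 0;
static bool demo_bus_read(uint16_t addr, uint16_t n, uint16_t *out) {
    for (uint16_t i = 0; i < n; i++) out[i] = (uint16_t)(addr + i);
    demo_transfers++;
    return true;
}
```

One caveat: the stock AVR Wire library buffers only 32 bytes per transfer, so on the uno a real `bus_read_fn` would likely need to split the 128-byte block into several reads.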
-
i've had 3 people ask me about starting with the melexis evaluation board. it depends.
08/31/2019 at 03:17
BAD AND GOOD.
This board is excellent for testing the sensor and verifying it works, and it allows rom dumps and videos of sensor reads on a computer. However, it has its own processor and a usb chip, with some sort of data bus that translates the i2c data from the chip into data to be processed and then sent through usb to the host. The issue is that the processor operates as master, and it is difficult to disable it. (This may be oversimplified, but the basics of the operation and the issues are stated.)
There are other possible issues as well. The chips on the board might have low voltage tolerances, even on the pins hooked up to i2c. The i2c spec voltage is 1.2v to 5.5v. The mlx90640 manual states that it can handle 5v i2c logic, but only on these pins, and this board is more than just the mlx chip.
This board is also about $500. Like I said, it is good as long as it is only for testing features, verifying rom, and basic testing of melexis devices. The issue is that it has a master i2c configuration and will not tolerate any other master device on the i2c bus. There are ways to bypass the host processor that handles the i2c, but at $500 I think it is better used for its purpose: checking and verifying parts or showing the capability of devices.
REASONS TO USE SPARKFUN OR ANOTHER BOARD THAT CONTAINS THE MLX90640 SENSOR: the built-in resistors, reliable operation at speeds within the device's spec, and working with both 3.3v and 5v i2c logic. The mlx device itself needs to receive 3.3v +/-0.3v max.
If the work you are doing requires a controller and custom software, you will want your own master device for i2c. The sparkfun examples and the code developed here assume they are the master. Also, using the sparkfun board or your own wiring gives you the freedom of a voltage range up to 5v on the i2c pins. The logic on these pins should mainly be a high-resistance input plus a pull-down transistor (an open-drain configuration).
-
if you need to replace the sensor, solder on a new one from digikey
08/31/2019 at 02:55
I had damaged a 110 deg x 75 deg sensor, which costs about $80 plus shipping. I did this while experimenting with the highest frequencies. The chip spec is 1mhz, and I ran above this while testing. Reading ram seemed OK, but somewhere between 1mhz and 4mhz, data corruption happened within the rom. After rebooting, the device didn't seem to work anymore. I decided it was better to get a narrower-angle lens so I could see more detail.
The same points as before apply about needing a rom dump.
Also, I decided to solder the new one onto the old one's board. It saves about $30-40, as digikey sells the sensor by itself for about $45, and shipping can be free inside the usa if you're willing to wait 5-7 days.
I could have put it closer to the board, but there are issues with solder wicking up toward the back of the device and shorting the wires to the metal can, which is grounded.
Also, I had done a lot of troubleshooting on the old sensor, soldering and removing it several times, and I damaged the island pad beneath vcc. So I needed room to run the soldering iron above, connect a wire through the hole to the top island and the bottom, and run a trace wire to the 3.3v vcc pin. It works OK, so I'm not doing any more hardware modifications.
It would be better to start off with a 55 deg x 35 deg sensor, and of course keep it at the max speed of 1mhz. Also, if not using the sparkfun board, be sure to use pull-up resistors of around 2.2k from the sda and scl lines to vcc, and look at the slew rate on an oscilloscope, because you will need to account for stray capacitance and possible inductance from the length of wire to the board. It is recommended anyway that i2c wires stay under 6 inches.
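To put numbers on the slew-rate advice, the i2c-bus specification (NXP UM10204) gives the rise time of a line with pull-up Rp and bus capacitance Cb as t_r = 0.8473 x Rp x Cb (measured from 30% to 70% of VDD), and limits it to 120 ns in Fast-mode Plus (1 MHz). A sketch of that arithmetic; the function names and capacitance values are examples only, and the real figure should come from the scope as the log says.

```c
/* Rise time of a pulled-up, open-drain i2c line, per the i2c-bus spec:
   t_r = 0.8473 * Rp * Cb. Units: ohm * pF = 1e-12 s = 1e-3 ns. */
static double i2c_rise_time_ns(double pullup_ohms, double bus_pf) {
    return 0.8473 * pullup_ohms * bus_pf * 1e-3;
}

/* Largest pull-up that still meets a rise-time limit
   (e.g. 120 ns for Fast-mode Plus at 1 MHz). */
static double i2c_max_pullup_ohms(double max_rise_ns, double bus_pf) {
    return max_rise_ns / (0.8473 * bus_pf * 1e-3);
}
```

With 2.2k and 50 pF the rise time is about 93 ns, inside the limit; doubling the capacitance with longer wires gives about 186 ns, which is over it. That is one reason short wires matter at 1mhz.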
New sensor soldered in place; the old sensor is put to the side for possible salvage later (currently it does not respond at its address). I learned an expensive lesson: don't push over 1mhz, and make sure wires are short. Data corruption during a write cycle can be deadly. The device verifies a write, but only at the address it received; if that address was corrupted, the data lands in the wrong place, and the rewrite may then go to the correct address this time. Any incorrect writes to rom are loaded into the device's registers at start-up. In my case the corrupt writes went to the chip's address and to the special registers. I have a backup of the rom data, but I do not yet know how to put the device in a special mode to restore it.
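Since a backup of the rom exists, one useful step before any restore is finding exactly which words changed. A minimal sketch, assuming both the fresh dump and the backup are arrays of 16-bit words read from the same start address; `rom_diff` is a hypothetical helper, and it only compares, it does not write anything to the device.

```c
#include <stdint.h>
#include <stddef.h>

/* Compare a fresh rom dump against a known-good backup. Both hold n
   16-bit words starting at base_addr. Returns the number of mismatches;
   the addresses of up to max_out of them are written to bad_addrs. */
static size_t rom_diff(const uint16_t *dump, const uint16_t *backup, size_t n,
                       uint16_t base_addr, uint16_t *bad_addrs, size_t max_out) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (dump[i] != backup[i]) {
            if (count < max_out) bad_addrs[count] = (uint16_t)(base_addr + i);
            count++;
        }
    }
    return count;
}
```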
-
some of the math can be modified or removed, or 64 bit float can be added.
08/26/2019 at 11:42
Here are the main math areas of the calculation of To (the thermal value in deg C). Some lines are left out; just the math-intensive areas are shown below. I feel that this much calculation is not needed, and the use of both pow and sqrt can be reduced to one or the other.
Granted, in the end this will most likely only need to be performed on 4, maybe 5 pixels; the rest will be thermal image data from get image data and won't need so much calculation.
Look at how some values go through pow(base, exponent) and then, at the end, sqrt(). I can't help but think that the math can be simplified and at least the pow removed. It might require slightly different thinking.
So first I'm going to switch to a 64-bit float math library (when I know it works, I will credit the author), and then I'll try to simplify the math so it doesn't require pow. The pows and sqrts create numbers that need so much resolution that the UNO cannot process them all correctly with 32-bit floats.
sub_calc_vdd = MLX90640_GetVdd();
sub_calc_ta = MLX90640_GetTa();
sub_calc_ta4 = SimplePow((sub_calc_ta + 273.15), (double)4);//prob remove x1x2x3x4
sub_calc_tr4 = SimplePow((tr + 273.15), (double)4);//prob remove x1x2x3x4
sub_calc_taTr = sub_calc_tr4 - (sub_calc_tr4-sub_calc_ta4)/emissivity;//redo because no pow
sub_calc_alphaCorrR[0] = 1 / (1 + ksTo[0] * 40);
sub_calc_alphaCorrR[1] = 1 ;
sub_calc_alphaCorrR[2] = (1 + ksTo[2] * ct[2]);
sub_calc_alphaCorrR[3] = sub_calc_alphaCorrR[2] * (1 + ksTo[3] * (ct[3] - ct[2]));
irData = irData - offset[pixelNumber]*(1 + kta[pixelNumber]*(ta - 25))*(1 + kv[pixelNumber]*(vdd - 3.3));
sub_calc_Sx = SimplePow((double)sub_calc_alphaCompensated, (double)3) * (sub_calc_irData + sub_calc_alphaCompensated * sub_calc_taTr);
sub_calc_Sx = Q_rsqrt(Q_rsqrt(sub_calc_Sx)) * ksTo[1];//fourth root: same double call as the To lines, not a single one
sub_calc_To = Q_rsqrt(Q_rsqrt(sub_calc_irData/(sub_calc_alphaCompensated * (1 - ksTo[1] * 273.15) + sub_calc_Sx) + sub_calc_taTr)) - 273.15;// remove outer sqrt's, and just have some numbers be sqrt. this avoids the pow to sqrt conversion
if(sub_calc_To < ct[1])
{
sub_calc_range = 0;
}
else if(sub_calc_To < ct[2])
{
sub_calc_range = 1;
}
else if(sub_calc_To < ct[3])
{
sub_calc_range = 2;
}
else
{
sub_calc_range = 3;
}
sub_calc_To =Q_rsqrt(Q_rsqrt(sub_calc_irData / (sub_calc_alphaCompensated * sub_calc_alphaCorrR[sub_calc_range] * (1 + ksTo[sub_calc_range] * (sub_calc_To - ct[sub_calc_range]))) + sub_calc_taTr)) - 273.15;
return sub_calc_To;//we return value to main loop rather than do each pixel (all together)