Project | Model S BMS hacking


Wiring modules together
03/02/2017 at 05:30 • 1 comment

I spent some more time tracing out the logic between the RF isolator and the microcontroller, to try to determine the actual function of the isolator's 4th channel and how the power-down circuits work.
Ch4 does appear to be a fault line. It is connected via a NAND gate to the FAULT_H pin on the BQ76 BMS IC, so it will drive the shared fault line (pins 6 and 10 of the main connector) low in the event of an over-temperature, over-voltage or under-voltage condition. A diode makes it a dominant fault line, i.e. a fault on any module forces a fault state. This should work even if a module's microcontroller stops responding (which would otherwise kill the daisychain UART communications). Weirdly, the other input of this NAND is derived from a second NAND of the SDO pin and the RF comms chip disable line. It's not clear to me what this achieves.
I also found the main connector (J1) part number by searching through Digi-Key. It looked like a Molex part, and that plus the pin count and spacing was enough to go on. The connector is Molex 15-97-5101, the pins are 39-00-0038 and the retaining key is 15-97-9101.
So now it's pretty clear how the modules should be wired up to a controller. Here is an example for three modules (it can be extended to any number, up to 62 from what I can tell).
All this wiring is isolated from the cells, so GND and +5V can safely be referenced to the 12V wiring in an automotive application.
A couple of the other team members, Collin and Tom, are working on an Arduino sketch to interface with the serial protocol based on my earlier findings. It should now be ready to test out and get basic functionality like voltage and current readings. https://github.com/collin80/TeslaBMS
Or if you have Python and an FTDI board and just want to mess around, try the Python script in the files section of this hackaday.io project.
bms settings
02/16/2017 at 08:41 • 2 comments
One of the reasons I wanted to reverse engineer rather than replace the microcontroller on the module BMS board is that Tesla have carefully selected voltage levels for the under/over-voltage thresholds and balancing levels. So it's a bit of a bummer to find that the modules themselves don't seem to have these values built in; it's handled higher up the control chain.
But the BQ76 does have built-in hardware over-voltage, under-voltage and over-temperature thresholds, which pull a shared fault line low.
There are also some other configuration values worth reading out. Conveniently, I have a bunch of boards that have stayed powered since they were last used, because they have a huge battery backup (the hundreds of 18650 cells) keeping the RAM powered!
So how do we read them out? First we need to find the address. Using my Python script, we can iterate through the possible addresses with something like:
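```
# sendData() and Read come from my TeslaBMS Python script. sendData appears to
# shift the device address left by one and add the R/W bit (0x0F goes out as 0x1e)
for address in range(0x3F):  # device addresses 0x00-0x3E; 0x3F is broadcast
    # ask this device for 0x4C (76) bytes starting at register 0x00
    sendData(Read, [address, 0x00, 0x4C])
```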
And eventually, at address=0x0F I get the result:
TX: 0x1e, 0x0, 0x4c,
RX: 0x1e, 0x00, 0x4c, 0x81, 0x2a, 0x25, 0x25, 0xa5, 0x25, 0xab, 0x25, 0xae, 0x25, 0xad, 0x25, 0xac, 0x25, 0xaa, 0x0a, 0x99, 0x0a, 0x93, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x3d, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8f, 0x00, 0x00, 0x00, 0x00, 0x10, 0x80, 0x31, 0x81, 0x08, 0x81, 0x66, 0xff, 0x15, 0x00, 0x00, 0x00, 0xad,
Labeling each register:
```
0: 81        #status register
1: 2a 25     #GPAI measurement data
3: 25 a5     #cell 1 voltage data
5: 25 ab     #cell 2 voltage data
7: 25 ae     #cell 3 voltage data
9: 25 ad     #cell 4 voltage data
b: 25 ac     #cell 5 voltage data
d: 25 aa     #cell 6 voltage data
f: a 99      #TS1 voltage data
11: a 93     #TS2 voltage data
13-1f rsvd   
20: 0        #alert status
21: 0        #fault status
22: 0        #OV fault state
23: 0        #UV fault state
24: 0        #parity result A
25: 0        #parity result B
26-2f rsvd
30: 3d       #ADC measurement control
31: 3        #I/O pin control
32: 0        #cell balancing control
33: 0        #cell balancing max on time
34: 0        #ADC conversion start
35-39: rsvd
3a: 0        #group 3 registers write access control
3b: 8f       #Address register
3c: 0        #reset control
3d: 0        #test mode selection
3e: rsvd
3f: 0        #EPROM programming enable
40: 10       #Function configuration
41: 80       #IO configuration
42: 31       #OV setpoint
43: 81       #OV time delay
44: 8        #UV setpoint
45: 81       #UV time delay
46: 66       #over temperature set point
47: ff       #over temperature time delay
48: 15       #user data 1
49: 0        #user data 2
4a: 0        #user data 3
4b: 0        #user data 4
```
So the values after 0x3F are identical to those read from the EPROM on a restart; they set the OV/UV/OT thresholds and time delays.
OV setpoint = 0x31: the datasheet gives 2V + 50mV * 49 = 4.45V
UV setpoint = 0x08: the datasheet gives 0.7V + 100mV * 8 = 1.50V
OT setpoint = 0x66: Table 2 says this corresponds to 1.578V, which is 65C for a 10k NTC such as the ERT-J1VG103FA
Interesting values: the cells will basically be on fire if they reach these voltages. The temperature value is much more reasonable.
Since we can't guarantee the register contents, best practice is to reset the BQ76 chips with a broadcast reset and then set them up from scratch. The EPROM values are loaded automatically, but a few registers need setting up after a reset: address control, ADC control, IO control, and the fault and alert registers.
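The setpoint arithmetic is easy to slip on, so here is a minimal Python check that applies the two datasheet formulas quoted above to the raw register values, exactly as done by hand. The function names are just illustrative.

```python
def ov_threshold(code: int) -> float:
    """Over-voltage threshold in volts: 2.0 V + 50 mV per count."""
    return 2.0 + 0.050 * code


def uv_threshold(code: int) -> float:
    """Under-voltage threshold in volts: 0.7 V + 100 mV per count."""
    return 0.7 + 0.100 * code


# Values read back from registers 0x42 (OV setpoint) and 0x44 (UV setpoint)
print(f"OV: {ov_threshold(0x31):.2f} V")  # OV: 4.45 V
print(f"UV: {uv_threshold(0x08):.2f} V")  # UV: 1.50 V
```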
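```
### ~~ Startup Code ~~ ###
# sendData() and Write come from my TeslaBMS Python script.
# Register addresses from the register dump above; 0x3F is the broadcast device address.
broadcast       = 0x3F
ALERT_STATUS    = 0x20
FAULT_STATUS    = 0x21
ADC_CONTROL     = 0x30
IO_CONTROL      = 0x31
ADDRESS_CONTROL = 0x3B
RESET_CONTROL   = 0x3C

#A5 is the magic value to reset the chips
sendData( Write, [broadcast, RESET_CONTROL, 0xA5])
#set address: do this (and the setup below) for each BMS on the daisychain, address = 1, 2, 3...
#after the reset every module sits at 0x00 and the first unaddressed one takes the packet
sendData( Write, [0x00, ADDRESS_CONTROL, address|0x80])
#configure ADC
sendData( Write, [address, ADC_CONTROL, 0x3D])
#configure IO
sendData( Write, [address, IO_CONTROL, 0x03])
#clear faults, need to write a 1 to the fault bit to be cleared, then a zero
sendData( Write, [address, ALERT_STATUS, 0x80])
sendData( Write, [address, ALERT_STATUS, 0x00])
sendData( Write, [address, FAULT_STATUS, 0x08])
sendData( Write, [address, FAULT_STATUS, 0x00])
```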
I've put a new version of the Python script in the files section, which also reads out and converts the ADC values to voltages.
https://cdn.hackaday.io/files/10098432032832/TeslaBMS_02.py
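As a rough idea of what that conversion involves, here is a sketch that decodes the six cell readings from the 0x0F register dump above. It assumes the cell ADC scaling is 6.25 V full scale over a 14-bit code (V = code × 6.25 / 16383); that constant is my assumption from the datasheet, not lifted from the script. GPAI and TS1/TS2 are left raw because their scaling depends on configuration.

```python
# Assumed cell ADC scaling: 6.25 V full scale over 16383 counts (14-bit)
CELL_LSB_V = 6.25 / 16383


def cell_voltages(regs: bytes) -> list[float]:
    """Convert registers 0x03..0x0E (six big-endian 16-bit cell readings) to volts."""
    return [
        int.from_bytes(regs[0x03 + 2 * i : 0x05 + 2 * i], "big") * CELL_LSB_V
        for i in range(6)
    ]


# First 0x13 data bytes of the reply from address 0x0F (status, GPAI, cells 1-6, TS1, TS2)
dump = bytes([0x81, 0x2a, 0x25, 0x25, 0xa5, 0x25, 0xab, 0x25, 0xae,
              0x25, 0xad, 0x25, 0xac, 0x25, 0xaa, 0x0a, 0x99, 0x0a, 0x93])

for n, v in enumerate(cell_voltages(dump), start=1):
    print(f"cell {n}: {v:.3f} V")
```

All six cells come out around 3.68 V, which is a plausible resting voltage for a pack that has sat for a while.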
UART protocol cracked!
02/15/2017 at 03:44 • 7 comments

As I found in the last post, the isolated UART bus daisychained through the Tesla modules runs at an odd rate, 612,500bps. This is problematic as it is not a standard PC baud rate. I had an FTDI board lying around, so I had a peek at its datasheet and discovered it has a highly configurable baud rate generator that can easily be set to within 3% of 612kbps. In addition, the Windows driver handles the configuration automatically.
To test this out I wrote a Python script. PySerial was fine with the non-standard rate, and the baud rate was confirmed with a logic analyser.
One of my collaborators, Tom, has been probing a module BMS hooked up to a master board. He found that a serial string is periodically transmitted and results in SPI bus activity (just the status register being read). However, we had trouble decoding the exact UART messages as he lacked decent analysis tools. I figured I would try repeating the test by sending the supposed data with my FTDI board anyway.
All I observed was each byte being repeated on the other side of the daisychain bus, as expected from analysing the UART ISR code. I started modifying the bytes and noticed that any string starting with 0x00 or 0x01 was repeated with that first byte changed to 0x80 or 0x81. At first I thought it was a read error, but it was consistent and verified with a logic analyser. Still no activity on the SPI bus, though.
Then a coding mistake led me to send long strings of 0x000000 etc., and running the logic analyser I saw the SPI bus come alive!
The UART responded with an additional 0x00. The SPI bus had repeated the 0x000000 message and added/read out a 0x00. It's as if the microcontroller is acting as a UART/SPI bridge, with something funny happening on the first byte.
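For the curious, here is roughly how an FTDI chip gets close to 612,500 bps. This sketch assumes an FT232R/FT232B-class device, whose baud generator (per FTDI's baud rate application note) divides a 3 MHz reference by n + k/8; the driver's exact rounding may differ, and the function name is hypothetical.

```python
from fractions import Fraction

BASE_CLOCK = 3_000_000  # assumed 3 MHz baud reference of FT232R/FT232B-class chips


def nearest_ftdi_baud(target: int) -> tuple[Fraction, float, float]:
    """Return (divisor, actual baud, relative error) for the closest n + k/8 divisor.

    Ignores the special-case divisors below 2, which don't matter at this rate.
    """
    ideal = Fraction(BASE_CLOCK, target)
    divisor = Fraction(round(ideal * 8), 8)  # quantise to eighths
    actual = float(BASE_CLOCK / divisor)
    error = (actual - target) / target
    return divisor, actual, error


divisor, actual, error = nearest_ftdi_baud(612_500)
print(f"divisor {float(divisor)}, actual {actual:.0f} bps, error {error:+.2%}")
# divisor 4.875, actual 615385 bps, error +0.47%
```

Half a percent is well inside the usual UART tolerance, which fits with PySerial and the FTDI driver simply accepting 612500 as the requested rate.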
Working on this theory, I constructed a valid SPI read message according to the datasheet: http://www.ti.com/lit/ds/symlink/bq76pl536a-q1.pdf

Let's try 0x00 00 01. This should read 1 byte from device 0, register 0x00.
And I got a reply! 0x61 0x35, where 0x35 is the CRC8 of 0x00000161. This was the same reply Tom recorded with his master/module BMS pair. Strangely, my UART command was very different from the one he seemed to record, so there may be more to this. But it does seem like the microcontroller provides (almost) direct SPI access.
Taking it to the extreme, let's read out all the registers by sending 0x00 0x00 0x4C, as there are 76 (0x4C) register bytes in total:
RX: 0x80, 0x00, 0x4C, 0x61, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x80, 0x31, 0x81, 0x08, 0x81, 0x66, 0xff, 0x15, 0x00, 0x00, 0x00, 0x7F
SWEET!
Side note: it looks like EPROM data is being loaded for the UV/OV points and timers too. This translates to an over-voltage threshold of 4.45V and an under-voltage threshold of 1.5V, which seems very low to me. I'm guessing the master BMS handles much of the protection by polling the cells. The GPAI ADC channel is also pointed at VBAT, so by default this channel measures the voltage of all 6 cells in the module.
So, taking a closer look at the first byte, as this clearly differs from the SPI protocol and contains some undocumented magic: my best guess is that bit 7 is used as a blocking bit to allow address assignment to each module. The middle 6 bits are the device address as per the datasheet, so the last bit must be the R/W bit (which is why both 0x00 and 0x01 have the bit-7 blocking bit set when they are repeated).
I confirmed this by writing to the address register, then reading it back using the new address:
TX: 0x01, 0x3b, 0x81, 0x8b,
RX: 0x81, 0x3b, 0x81, 0x8b,
TX: 0x02, 0x3b, 0x01,
RX: 0x02, 0x3b, 0x01, 0x81, 0xba,
We write 0x81 to the address control register (0x3B), where bit 7 indicates the address has been set (this clears an alert signal), so the new address is 0x01. To use the address we must shift it left by 1, giving 0x02, as bit 0 is the R/W bit. The module now responds to the new address! In addition, it no longer sets the 'blocking bit': the leading byte stays 0x02.
So if we now send a message to 0x00, this first module will forward it without modification:
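Under that interpretation, the first byte can be packed and unpacked as below. The bit layout (bit 7 = blocking bit, bits 6..1 = device address, bit 0 = R/W with 1 meaning write) is inferred from the captures in this post rather than from any documentation, and the helper names are hypothetical.

```python
from typing import NamedTuple

BROADCAST = 0x3F  # device address for broadcast packets


class Header(NamedTuple):
    claimed: bool  # bit 7: "blocking bit", set by the first unaddressed module
    address: int   # bits 6..1: device address
    write: bool    # bit 0: 1 = write, 0 = read


def encode_header(address: int, write: bool) -> int:
    """Build the first byte of a packet for a 6-bit device address."""
    if not 0 <= address <= BROADCAST:
        raise ValueError("device address must fit in 6 bits")
    return (address << 1) | int(write)


def decode_header(byte: int) -> Header:
    """Split a first byte seen on the bus into its three fields."""
    return Header(bool(byte & 0x80), (byte >> 1) & 0x3F, bool(byte & 0x01))


print(hex(encode_header(0x01, write=False)))  # 0x2, the read-back above
print(decode_header(0x81))  # Header(claimed=True, address=0, write=True)
```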
TX: 0x00, 0x00, 0x01,
RX: 0x00, 0x00, 0x01,
The second module in the daisychain will then act on the string and set the blocking bit, so any other modules in the daisychain will ignore it and simply forward it on until it gets back to the master board (or in my case the FTDI serial interface).
Some other things I noticed: the onboard microcontroller actually checks the CRC of any write packet and won't forward it over the SPI bus if it is wrong, which means we can't disable CRCs for SPI writes/replies on the BQ76 chip. The uC is clearly also recording the device address, so it can handle the daisychain protocol and ignore packets not addressed to it. Broadcast packets are also supported (device address 0x3F).
So now anyone with an FTDI board (many other USB serial interfaces may work too) can communicate with a string of 1 to (technically) 63 Tesla modules.
I've uploaded a Python script which sets the address of one module, then initiates an ADC conversion and reads out the result. I'll add more functionality and comments shortly. It can be found here:
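The forwarding behaviour described above can be modelled in a few lines. This is a simplified sketch of the observed behaviour only: the Module class is hypothetical, modules process the first byte in chain order, and broadcast packets and reply data are left out.

```python
class Module:
    """Hypothetical model of one module's handling of a packet's first byte."""

    def __init__(self, address: int = 0x00):
        self.address = address  # 0x00 = no address assigned yet

    def handle(self, header: int) -> tuple[int, bool]:
        """Return (byte to forward downstream, whether this module acted on it)."""
        if header & 0x80:  # already claimed upstream: just pass it on
            return header, False
        if ((header >> 1) & 0x3F) != self.address:  # not for us
            return header, False
        if self.address == 0x00:  # unaddressed module claims it with the blocking bit
            return header | 0x80, True
        return header, True  # addressed modules forward the byte unchanged


def send(chain: list[Module], header: int) -> tuple[int, list[int]]:
    """Push a first byte through the chain; return what reaches the master and who acted."""
    acted = []
    for position, module in enumerate(chain):
        header, hit = module.handle(header)
        if hit:
            acted.append(position)
    return header, acted


chain = [Module(0x01), Module(), Module()]  # first module already addressed as 0x01
print(send(chain, 0x00))  # (128, [1]): the second module claims it
print(send(chain, 0x02))  # (2, [0]): only the first module acts
```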
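Since the CRCs can't be turned off, the master has to generate them. The captures in this post are consistent with a plain CRC-8 using polynomial x^8 + x^2 + x + 1 (0x07), initial value 0, no reflection and no final XOR, computed over every byte of the packet: it reproduces the 0x8b on the address write and the 0xba on the read-back reply shown above. Treat this as a sketch checked only against those captures.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first, initial value 0x00, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


# Write to device 0: header 0x01, register 0x3B, value 0x81 (CRC byte follows on the wire)
print(hex(crc8(bytes([0x01, 0x3B, 0x81]))))        # 0x8b
# Reply from device 1: header, register, length, data (CRC byte follows)
print(hex(crc8(bytes([0x02, 0x3B, 0x01, 0x81]))))  # 0xba
```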
https://cdn.hackaday.io/files/10098432032832/TeslaBMS.py
Or TeslaBMS.py in the files section of this project.
Next up: Designing an embedded BMS master board with a CAN bus and contactor (big DC relay) drivers.
Addendum: If you have a battery module pulled from a Tesla pack, it will still have the address it was assigned when operating in the Tesla, so it won't respond at address 0x00. Either reset its address with a broadcast command or simply unplug the battery cable from the board. It's also worth noting that there is quite possibly a lot of extra functionality built into the microcontroller: the extracted code is vastly more complex than it should be for the above functionality. The function of the grey wire in the wiring harness is also unknown; it is likely a fault signal from the BQ76 BMS IC.
Firmware hacking
04/17/2016 at 20:22 • 2 comments
Here is a rough schematic of the isolated bus circuitry.
It looks like there is a shared "open collector" style bus on CH4 of the isolator, with D8 and (I assume) a pullup on the BMS controller board. But so far I've only ever measured this line being low, thus disabling the entire bus. I haven't worked out what U5 is yet, probably a dual logic gate. Maybe it combines error signals from the BMS IC.
Ch1 is used to bring the BMS out of low-power mode. A low-frequency oscillator periodically powers up U4 to check the state of CH1. The whole thing then stays powered while CH1 is driven low (otherwise CH1 is high impedance if the bus side of the isolator is unpowered; the circuitry must detect this and shut down). I can't really be bothered tracing out much more of the circuit: the layer of varnish on everything and the tracks disappearing into internal layers make it a nightmare.
Ch2 and 3 look like the main comms lines, connected to the UART of the uC. J1 is set up so that Tx of one channel is connected to Rx of the next, so the boards daisy chain together. The first BMS in the chain must get a "go" command from a controller board, so I just have to work out how to say "go", as discussed in my last post.
I thought I'd just check that Tesla had set the security bit to protect the firmware from being read. I already had a SiLabs JTAG/C2 programmer, so I wired up the 2-wire C2 interface.
And it turns out the firmware is totally readable! Tesla wanted to make my life a bit easier. Thanks Tesla! About 5KB of code space is used. I ran it through an 8051 disassembler, DASMx. The SiLabs C51 uCs are fully 8051 compatible, meaning binary opcodes and memory locations are the same, so DASMx can even decode the SFR addresses into mnemonics for me. It's supposed to be able to 'flatten' out the code, using all the calls and branches to make it more linear to read, but it would only process 3% of the code when I tried that option.
Here is the hex: https://cdn.hackaday.io/files/10098432032832/code.hex
And disassembled code: https://cdn.hackaday.io/files/10098432032832/code.lst
And the chip datasheet: https://www.silabs.com/Support Documents/TechnicalDocs/C8051F52x-F53x.pdf
There are a few thousand lines of assembly. I started by looking at the interrupts, as some of the code is most likely interrupt driven, especially the UART RX.
Code at 0x12A9: mov iec,#0B0H writes to the interrupt enable Special Function Register, enabling interrupts on Timer2 and UART0. The UART interrupt vector is 0x0023, which contains ljmp L0C0F, which in turn does another ljmp to L0DB9, which contains the code below!
```
0DB9                        L0DB9:
0DB9 : C0 E0        "  "        push    acc
0DBB : C0 F0        "  "        push    b
0DBD : C0 83        "  "        push    dph
0DBF : C0 82        "  "        push    dpl
0DC1 : C0 D0        "  "        push    psw
0DC3 : 75 D0 00    "u  "        mov    psw,#000H
0DC6 : C0 00        "  "        push    X0000
0DC8 : C0 01        "  "        push    X0001
0DCA : C0 02        "  "        push    X0002
0DCC : C0 03        "  "        push    X0003
0DCE : C0 04        "  "        push    X0004
0DD0 : C0 05        "  "        push    X0005
0DD2 : C0 06        "  "        push    X0006
0DD4 : C0 07        "  "        push    X0007
0DD6 : 30 99 05    "0  "        jnb    ti,L0DDE
0DD9 : 75 2B 00    "u+ "        mov    X002B,#000H
0DDC : C2 99        "  "        clr    ti
0DDE                        L0DDE:
0DDE : 30 98 30    "0 0"        jnb    ri,L0E11
0DE1 : 85 99 2A    "  *"        mov    X002A,sbuf
0DE4 : E5 2C        " ,"        mov    a,X002C
0DE6 : 60 21        "`!"        jz    L0E09
0DE8 : E5 2B        " +"        mov    a,X002B
0DEA : 60 05        "` "        jz    L0DF1
0DEC                        L0DEC:
0DEC : 30 99 FD    "0  "        jnb    ti,L0DEC
0DEF : C2 99        "  "        clr    ti
0DF1                        L0DF1:
0DF1 : E5 2D        " -"        mov    a,X002D
0DF3 : 60 0D        "` "        jz    L0E02
0DF5 : E5 2A        " *"        mov    a,X002A
0DF7 : 54 FE        "T "        anl    a,#0FEH
0DF9 : 70 07        "p "        jnz    L0E02
0DFB : E5 2A        " *"        mov    a,X002A
0DFD : 44 80        "D "        orl    a,#080H
0DFF : FF        " "            mov    r7,a
0E00 : 80 02        "  "        sjmp    L0E04
                ;
0E02                        L0E02:
0E02 : AF 2A        " *"        mov    r7,X002A
0E04                        L0E04:
0E04 : 8F 99        "  "        mov    sbuf,r7
0E06 : 75 2B 01    "u+ "        mov    X002B,#001H
0E09                        L0E09:
0E09 : 75 29 01    "u) "        mov    X0029,#001H
0E0C : C2 98        "  "        clr    ri
0E0E : 12 12 95    "   "        lcall    L1295
0E11                        L0E11:
0E11 : D0 07        "  "        pop    X0007
0E13 : D0 06        "  "        pop    X0006
0E15 : D0 05        "  "        pop    X0005
0E17 : D0 04        "  "        pop    X0004
0E19 : D0 03        "  "        pop    X0003
0E1B : D0 02        "  "        pop    X0002
0E1D : D0 01        "  "        pop    X0001
0E1F : D0 00        "  "        pop    X0000
0E21 : D0 D0        "  "        pop    psw
0E23 : D0 82        "  "        pop    dpl
0E25 : D0 83        "  "        pop    dph
0E27 : D0 F0        "  "        pop    b
0E29 : D0 E0        "  "        pop    acc
0E2B : 32           "2"         reti
```
This is clearly an interrupt service routine: you can tell by the push operations at the start (to back up working registers), the pops at the end, and the final "reti" (return from interrupt) instruction.
My analysis in comments:
```
	jnb	ti,L0DDE 		; branch if interrupt not caused by UART sending a byte.
	mov	X002B,#000H 	        ; clear 0x2B if byte had been sent already (0x2B is a 'byte sent flag' from later in this interrupt)
	clr	ti			; clear Tx interrupt flag
L0DDE:
	jnb	ri,L0E11		; branch if interrupt not caused by UART receiving a byte; exits the interrupt
	mov	X002A,sbuf		; copy serial buffer to 0x2A
	mov	a,X002C			; 0x2C set outside of interrupt.
	jz	L0E09			; jump if zero (0x2C == 0: don't forward the byte)
	mov	a,X002B
	jz	L0DF1			; skip the wait if no byte is in flight (0x2B was cleared above when TI was set)
L0DEC:
	jnb	ti,L0DEC		; wait for Tx complete. as we sent a byte last time.
	clr	ti			; clear Tx interrupt flag
L0DF1:
	mov	a,X002D			; set elsewhere
	jz	L0E02			
	mov	a,X002A			; move serial buffer to ACC
	anl	a,#0FEH			; bitwise AND, masking off bit0
	jnz	L0E02			; branch if not zero (any of bits 1-7 set): forward unmodified
	mov	a,X002A			; move serial buffer to ACC
	orl	a,#080H			; bitwise OR - set bit 7
	mov	r7,a			; move serial byte with bit7=1 to R7
	sjmp	L0E04
L0E02:
	mov	r7,X002A		; move the unmodified serial byte to R7 (0x2D == 0, or bits 1-7 not all zero)
L0E04:
	mov	sbuf,r7			; write R7 back to serial buffer for transmission..
	mov	X002B,#001H		; this makes the interrupt wait for transmission next time it runs, the 'byte sent flag'
L0E09:
	mov	X0029,#001H		; set 0x29
	clr	ri			; clear Rx interrupt flag
	lcall	L1295		        ; this call clears ram 0x0023, 24, 25, 26
L0E11:
```
So basically it will receive a byte, save it to 0x2A, then send it onwards IF 0x2C is nonzero. This supports the daisy chain hypothesis, as it would appear that (when enabled) the interrupt forwards packets. It also modifies bit7 of the packet under some circumstances: if 0x2D is set AND sbuf bits 1-7 are zero (bit0 is ignored by the &0xFE), then it will set bit7. This would lead to a behaviour where the first device to get such a packet would modify it and the rest would simply forward it. Definitely part of the daisy-chain control method. I'd imagine the packet would remain unmodified when a node is done transmitting. There is probably a 'Reset' packet to get the node transmitting again; bit0 is probably used for this, as it remains unchecked and unmodified when forwarding packets.
There is definitely some logic elsewhere in the program; time to search for code that looks at RAM 0x2A and sets 0x2D. Given the nature of compiled C code this is a ridiculous task, with branches and calls everywhere. I think an 8051 emulator might be the way to go here.
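To make the branch logic easier to follow, here is the receive path of the ISR transliterated into Python. The parameter names standing in for RAM 0x2C and 0x2D are hypothetical labels based on the analysis above; the Tx-wait handshake via 0x2B, the 0x29 flag and the call to L1295 are left out.

```python
from typing import Optional


def isr_receive(rx_byte: int, fwd_enable: bool, mark_enable: bool) -> Optional[int]:
    """Model of the RI path of the ISR at L0DB9: return the byte written to SBUF.

    fwd_enable  stands in for RAM 0x2C (zero -> jz L0E09, nothing is forwarded)
    mark_enable stands in for RAM 0x2D (zero -> jz L0E02, byte forwarded as-is)
    """
    if not fwd_enable:
        return None                  # byte is only stored in 0x2A
    if mark_enable and (rx_byte & 0xFE) == 0:
        return rx_byte | 0x80        # anl #0FEH gave zero, so orl #080H sets bit 7
    return rx_byte                   # mov r7,X002A: forwarded unmodified


for b in (0x00, 0x01, 0x02):
    print(hex(b), "->", hex(isr_receive(b, fwd_enable=True, mark_enable=True)))
```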
###########--CBF--###########
Another option is to try sending packets like 0x00 and 0x01 and see if anything happens. I need to know the baud rate for this. So, diving back into the datasheet, Timer1 is used to generate the UART clock:
UartBaudRate = 1/2 x T1_Overflow_Rate
T1_Overflow_Rate = T1CLK / ( 256 - TH1 )
Sysclk is configured with mov X00B2,#0C7H: internal oscillator enabled, SYSCLK derived from the internal oscillator divided by 1, i.e. 24.5MHz
T1mode is set with mov tmod,#020H: Mode 2, 8-bit counter/timer with auto-reload, as recommended for UART usage.
T1clk is configured with mov X008E,#018H to use sysclk with no divider
TH1 is set by mov th1,#0ECH, which is not in the table. WTF. Non-standard baud rate? OK then: ( 24500000 / ( 256 - 236 ) ) / 2 = 612,500 bps.
I guess the best way to test this out is with another Silabs uC. I'll probably try that in the next log.
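The same arithmetic as a quick check, using the two datasheet formulas quoted above and the register values from the disassembly (the function name is just for illustration):

```python
def uart_baud(sysclk: float, th1: int) -> float:
    """UART0 baud rate with Timer1 in mode 2 (8-bit auto-reload) clocked from SYSCLK."""
    t1_overflow_rate = sysclk / (256 - th1)
    return t1_overflow_rate / 2


SYSCLK = 24_500_000  # internal oscillator, divide-by-1
print(uart_baud(SYSCLK, 0xEC))  # 612500.0
```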
first steps
03/28/2016 at 20:12 • 0 comments

Did some probing of the BMS and tracing out of the isolated circuitry; I think I have figured out how the basic communication bus SHOULD work.
Looks like Tesla power down the uC and isolator chips to save energy when the car is turned off. They do this with a switch between the IC Vss/gnd and the TI BMS IC's Vss/gnd; it looks like U5 and U5 are the switches. U2 appears to be a voltage regulator (judging by the heat it produces when powered on).
They have an oscillator which enables the isolator IC periodically. If the bus side of the isolator is powered, pin 14 will be pulled low; this then enables the uC and isolator until the isolated bus is unpowered and pin 14 goes high impedance. When it powers up, the uC flashes the onboard LED, but it doesn't send out any data; in fact it pulls the data line low.
Pin 17 of the uC is connected to an output channel of the isolator and pin 16 to an input. These are UART TX/RX pins, so it's a safe bet that some serial protocol is used to enable the board and read out data, and it's not just a simple
Unfortunately it's going to take a long time to brute force the serial baud rate and the combination of bytes that makes data flow, so this is a dead end unless I can get my hands on a working Tesla with its pack ripped open to probe the bus. Anyone?
Otherwise plan B is to write my own code to run on the Silabs uC and interface with the BMS IC.
