Still adding enhancements like:
adadelta (a sketch of the update rule is just below),
minibatch (6 CPU iterations × 5 GPU experiences),
visualization, corrections, etc...
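Since "adadelta" above only names the optimizer, here is a minimal TypeScript sketch of the Adadelta update rule itself (Zeiler 2012) for readers who haven't seen it; this is not the project's actual code, and the parameter names (rho, eps, accGrad, accDelta) are my own:

function adadeltaStep(
  w: Float64Array,        // weights being trained
  grad: Float64Array,     // gradient of the loss w.r.t. w
  accGrad: Float64Array,  // running average of squared gradients, E[g^2]
  accDelta: Float64Array, // running average of squared updates, E[dx^2]
  rho = 0.95,
  eps = 1e-6
): void {
  for (let i = 0; i < w.length; i++) {
    accGrad[i] = rho * accGrad[i] + (1 - rho) * grad[i] * grad[i];
    // the step size adapts per weight from the two RMS values; no global learning rate
    const delta = -(Math.sqrt(accDelta[i] + eps) / Math.sqrt(accGrad[i] + eps)) * grad[i];
    accDelta[i] = rho * accDelta[i] + (1 - rho) * delta * delta;
    w[i] += delta;
  }
}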
I'm also using a temporal_window variable on the input layer to append the last inputs + last actions (as in ConvnetJS :) ), which now gives a better estimate of the reinforcement learning target value T (a code sketch of both versions follows the examples below).
before:
_N = 0
maxval = forward(state_N+1)
t = reward_N * maxval
forward(state_N)
backward(action_N, t)
state1_t0(+1) = 1 // state0action0 useful
state2_t0(-1) = -1 // state1action1 useful
and now...:
_N = 0
foreach temporal_windows:
    maxval = forward(state_N+1)
    t += reward_N * maxval
    _N++
_N = 0
forward(state_N)
backward(action_N, t)
state1_t0(+1) + state2_t1(-1) = 0 // state0action0 not useful (or take as -1?)
state2_t0(-1) + state3_t1(+1) = 0 // state1action1 not useful
Another example:
state1_t0(+1) + state2_t1(+1) = 2 // state0action0 very useful
state2_t0(-1) + state3_t1(-1) = -2 // state1action1 very useful
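To make the before/now difference concrete, here is a small TypeScript sketch of both target computations, following the formula exactly as written above (t = reward_N * maxval, accumulated over the temporal window). The QNet interface and the names states, rewards, actions are hypothetical stand-ins for the real experience buffer and network:

interface QNet {
  forward(state: number[]): number[];             // returns one Q-value per action
  backward(action: number, target: number): void; // trains the chosen action towards target
}

// "before": target from a single lookahead step
function targetBefore(net: QNet, states: number[][], rewards: number[],
                      actions: number[], n: number): number {
  const maxval = Math.max(...net.forward(states[n + 1]));
  const t = rewards[n] * maxval;   // as written in the log
  net.forward(states[n]);          // set activations for state_N
  net.backward(actions[n], t);     // train action_N towards t
  return t;
}

// "now": target accumulated over temporal_window steps, so +1 followed by -1
// cancels to 0 ("not useful") and +1 followed by +1 sums to 2 ("very useful")
function targetTemporalWindow(net: QNet, states: number[][], rewards: number[],
                              actions: number[], n: number, temporalWindow: number): number {
  let t = 0;
  for (let k = 0; k < temporalWindow; k++) {
    const maxval = Math.max(...net.forward(states[n + k + 1]));
    t += rewards[n + k] * maxval;
  }
  net.forward(states[n]);
  net.backward(actions[n], t);
  return t;
}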
--------------------------------------------------------------------------------------
Some day I want to try what happens if I connect a blank neural network used as a reward applicator (to avoid having to indicate any reward myself), with this reward net fed from some kind of long-term memory net.
Updating the reward net would be modeled with something like "neuron cell energy" variables, with threshold values that send backward signals to this applicator and somehow associate the current input with the long-term memory.
The input layer from the normal sensors would still drive the actions as usual, but would also feed long_memory > reward_applicator, giving a closed-loop system.
I will not be able to give a reward for walking towards the food, but if the long memory + reward net happens to help reach the food by chance, those neurons receive their energy and the association gets recorded. Otherwise... natural selection.
or something similar :D
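Just to pin the idea down a little, here is a purely speculative TypeScript sketch of the "cell energy" part of that loop; every name in it (EnergyNeuron, RewardApplicator, threshold) is hypothetical and none of this is implemented yet:

class EnergyNeuron {
  energy = 0;
  constructor(public threshold = 1.0) {}
  // deposit "cell energy" when the agent happens to reach the food;
  // once over the threshold, the neuron fires, standing in for the
  // backward signal that reinforces the currently associated input
  deposit(amount: number): boolean {
    this.energy += amount;
    if (this.energy >= this.threshold) {
      this.energy = 0;
      return true; // fire: send the reinforcement signal backwards
    }
    return false;
  }
}

class RewardApplicator {
  constructor(private neurons: EnergyNeuron[]) {}
  // closed loop: sensors -> long_memory -> reward_applicator -> reward for the acting net
  applyFromLongMemory(memoryActivation: number[]): number {
    let reward = 0;
    memoryActivation.forEach((a, i) => {
      if (this.neurons[i].deposit(a)) reward += 1; // fired neurons contribute reward
    });
    return reward; // no hand-written reward: it emerges by chance, or not... natural selection
  }
}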