Wire-Fitted Feedforward Neural Network
At the heart of Fido's learning algorithm, a feedforward neural network coupled with a least-squares interpolator is used to model the reward function. This lets Fido generalize between similar actions and between similar states, reducing the number of training iterations required.
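As a rough illustration of the interpolation step, the sketch below estimates the reward for an arbitrary action from a set of network-produced "wires" (action points paired with predicted rewards), in the spirit of wire fitting. The Wire struct, function names, and smoothing constants are assumptions for illustration, not Fido's actual API.

```cpp
// Sketch of wire-fitted interpolation (assumed form; constants are illustrative).
// The network outputs several "wires" for the current state: an action point and
// its predicted reward. The interpolator blends them into a smooth reward
// estimate for any query action, so nearby actions share information.
#include <algorithm>
#include <limits>
#include <vector>

struct Wire {
    std::vector<double> action;  // action point produced by the network
    double reward;               // predicted reward at that action
};

double interpolatedReward(const std::vector<Wire>& wires,
                          const std::vector<double>& queryAction,
                          double smoothing = 0.001,  // hypothetical smoothing constant
                          double epsilon = 1e-6) {   // avoids division by zero
    double maxReward = -std::numeric_limits<double>::infinity();
    for (const Wire& w : wires) maxReward = std::max(maxReward, w.reward);

    double weightedSum = 0.0, normalizer = 0.0;
    for (const Wire& w : wires) {
        double distSq = 0.0;
        for (std::size_t i = 0; i < queryAction.size(); ++i) {
            double d = queryAction[i] - w.action[i];
            distSq += d * d;
        }
        // Wires whose actions are close to the query (and whose rewards are high) dominate.
        double weight = 1.0 / (distSq + smoothing * (maxReward - w.reward) + epsilon);
        weightedSum += weight * w.reward;
        normalizer += weight;
    }
    return weightedSum / normalizer;
}
```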
Q-Learning Reinforcement Learning Algorithm
Fido uses a modified version of model-free Q-learning to shape its Q-function. Because Q-learning requires no model of the environment's dynamics, the same system can be applied universally across tasks.
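For reference, the unmodified Q-learning update that this builds on looks roughly like the following; the function name and the learning-rate and discount values are illustrative defaults, and Fido's modifications are not shown.

```cpp
// Minimal sketch of the standard one-step Q-learning update.
double qLearningUpdate(double oldValue,      // current estimate Q(s, a)
                       double reward,        // reward observed after taking a in s
                       double maxNextValue,  // max over candidate actions of Q(s', a')
                       double learningRate = 0.1,
                       double discount = 0.95) {
    // Move the old estimate toward the one-step bootstrapped return.
    return oldValue + learningRate * (reward + discount * maxNextValue - oldValue);
}
```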
Uncertainty Approximation System
Fido continuously calculates an uncertainty value: the deviation between its new model and its old model. This novel system allows Fido to detect when it is being retrained, enabling efficient hyperparameter optimization.
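A minimal sketch of one way such an uncertainty value could be computed, assuming the deviation is measured as the mean absolute difference between the old and new models' predictions on recently seen state-action pairs; the Model and Experience types are hypothetical stand-ins rather than Fido's API.

```cpp
// Sketch of an uncertainty estimate: the average disagreement between the old
// and new models on recent experiences. A spike in this value suggests the
// reward structure has changed, i.e. Fido is being retrained.
#include <cmath>
#include <functional>
#include <vector>

struct Experience {
    std::vector<double> state;
    std::vector<double> action;
};

using Model = std::function<double(const std::vector<double>& /*state*/,
                                   const std::vector<double>& /*action*/)>;

double uncertainty(const Model& oldModel, const Model& newModel,
                   const std::vector<Experience>& recent) {
    if (recent.empty()) return 0.0;
    double totalDeviation = 0.0;
    for (const Experience& e : recent) {
        totalDeviation += std::abs(newModel(e.state, e.action) -
                                   oldModel(e.state, e.action));
    }
    return totalDeviation / static_cast<double>(recent.size());
}
```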
Intelligent Probabilistic Action Selection Policy
Fido uses its uncertainty value to modulate exploration during action selection, exploring more when it detects it is being retrained. This not only allows Fido to learn faster and better, but also makes it highly retrainable.
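One way to realize this is a Boltzmann (softmax) selection policy whose temperature rises with uncertainty, sketched below. The temperature constants and function signature are assumptions for illustration, not Fido's actual policy.

```cpp
// Sketch of uncertainty-scaled probabilistic action selection. A higher
// uncertainty value raises the softmax temperature, flattening the action
// distribution and therefore increasing exploration.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

std::size_t selectAction(const std::vector<double>& rewardEstimates,
                         double uncertainty,
                         std::mt19937& rng,
                         double baseTemperature = 0.1,   // illustrative constants
                         double uncertaintyScale = 1.0) {
    double temperature = baseTemperature + uncertaintyScale * uncertainty;
    double best = *std::max_element(rewardEstimates.begin(), rewardEstimates.end());

    std::vector<double> weights;
    weights.reserve(rewardEstimates.size());
    for (double r : rewardEstimates) {
        weights.push_back(std::exp((r - best) / temperature));  // shifted for numerical stability
    }
    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
    return dist(rng);  // index of the chosen action
}
```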
Dynamically Optimized Experience Replay
In addition to training its model on new experiences, Fido also trains on past experiences to decrease training time. The number of past experiences sampled is kept inversely proportional to the derivative of the uncertainty value, since Fido should not train on experiences from an old task while it is being retrained.
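A small sketch of how the replay sample count could be tied to the uncertainty derivative under that reading; the maximum sample count and sensitivity constant are illustrative assumptions.

```cpp
// Sketch of uncertainty-aware experience replay sizing: the faster the
// uncertainty value is rising, the fewer stored experiences are replayed,
// on the assumption that old experiences describe a previous task.
#include <algorithm>
#include <cstddef>

std::size_t replaySampleCount(double uncertaintyDerivative,
                              std::size_t maxSamples = 50,   // illustrative cap
                              double sensitivity = 10.0) {   // illustrative constant
    // A rising uncertainty value (positive derivative) means the reward
    // structure is changing, so the replay batch shrinks toward zero.
    double scale = 1.0 / (1.0 + sensitivity * std::max(0.0, uncertaintyDerivative));
    return static_cast<std::size_t>(maxSamples * scale);
}
```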
Fluid Model Architecture
Fido dynamically grows and shrinks its neural network throughout operation to best fit the task at hand.
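A minimal sketch of one heuristic that behaves this way, assuming hidden neurons are added when training error stays high and removed when the error target is comfortably met; the thresholds and layer bounds are illustrative, not Fido's actual growth rule.

```cpp
// Sketch of a fluid-architecture heuristic: grow the hidden layer when the
// network keeps missing its error target, shrink it when the target is met
// with room to spare.
#include <cstddef>

std::size_t adjustHiddenNeurons(std::size_t currentNeurons,
                                double trainingError,
                                double growThreshold = 0.1,    // illustrative thresholds
                                double shrinkThreshold = 0.01,
                                std::size_t minNeurons = 2,
                                std::size_t maxNeurons = 64) {
    if (trainingError > growThreshold && currentNeurons < maxNeurons) {
        return currentNeurons + 1;  // model is underfitting: add capacity
    }
    if (trainingError < shrinkThreshold && currentNeurons > minNeurons) {
        return currentNeurons - 1;  // model has slack: remove capacity
    }
    return currentNeurons;          // architecture fits the task
}
```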
Robot Simulator
To test and prototype Fido, we created a versatile simulator. This allowed us to evaluate Fido's performance with various sensors and kinematics over a large number of trials, ensuring statistical significance.
Hardware Implementations
To test Fido's real-world functionality and practicality, we built three diverse robots. This confirmed Fido's ability to operate with a low power budget and under natural sensor noise.