Approach adopted towards solving the problem:
● The track : We chose to replicate the Monza circuit, one of the most famous Grand Prix circuits in the world, known for its long sweeping bends and its few tight corners followed by high-speed straights. This seemed like the ultimate ambitious challenge for our self-driving car, but as it turned out, the lack of guiding lanes along the bends, where the track was thicker, made it hard to determine exactly when to take a turn. This resulted in a poorly labelled dataset, causing issues with accuracy and on-track driving.
● The Car, and the subsequent problem with the hardware : We initially decided to implement the algorithm from Nvidia’s landmark paper to predict steering angles from images. It was extremely hard to find a remote-control car here in Delhi with proportional steering (a servo that controls the front steering, instead of a motor at the front of the car). As a result, we had to settle for hardware that could only execute a full left turn, a full right turn, or full ahead. This meant that at the turns we had to execute stepwise manoeuvres to control the angle of the car while driving it manually to train it. This again led to a lot of mislabelled images, which subsequently caused a substantial drop in accuracy.
● Dataset : A custom-created dataset containing approximately 600 images of the track captured during training rounds, with the associated steering direction as labels (FF, FR, FL, NN). A sample image and the associated image processing are provided in the proof-of-concept results. The dataset has been split 70:30 into training and test sets with the help of several scripts. (Please note that the file names are the labels, with a timestamp.) Issues that we’ve observed in the dataset :
1.) Bad labelling : Due to the inability to control the steering angle of the car, turns of varying angles have to be taken in steps. For example, a mild right turn is a conjunction of FF and FR. However, when collecting the data as images with the associated direction at certain timestamps, the system sometimes records FF in places where there is a clear right turn.
2.) Presence of other objects in the field of vision : Pillars and other features around the track are part of the images, along with, on certain occasions, drains running around the track. After image processing and edge detection these look very close to lane lines, and since they move around in the frame as the car moves, a region-of-interest algorithm isn’t effective.
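Since the file names carry the labels, the 70:30 split can be done per class so that both sets stay balanced. A minimal sketch of one such script (the `FF_<timestamp>.jpg` naming pattern and the per-label grouping are assumptions based on the description above, not the exact scripts used):

```python
import random

# The four drive commands used as label prefixes in the file names.
LABELS = ("FF", "FR", "FL", "NN")

def split_dataset(filenames, train_ratio=0.7, seed=42):
    """Group images by the label prefix in their file name, then split
    each label 70:30 so the classes stay balanced across both sets."""
    by_label = {label: [] for label in LABELS}
    for name in filenames:
        for label in LABELS:
            if name.startswith(label):
                by_label[label].append(name)
                break
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, test = [], []
    for files in by_label.values():
        rng.shuffle(files)
        cut = int(len(files) * train_ratio)
        train.extend(files[:cut])
        test.extend(files[cut:])
    return train, test
```

Splitting within each label (rather than over the whole list) avoids a skewed test set when one direction dominates the recording.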
● Neural Network : We’ve used a convolutional neural network (CNN) as our deep learning method. It is a class of deep neural networks most commonly used to analyse visual imagery. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing.
The design of ConvNets is inspired by nature: the connectivity of the neurons is arranged in such a way that it resembles an animal’s visual cortex.
Sequential Model in CNN ( Keras ) : It allows the layers of the network ( convolutional or other kinds ) to be easily stacked in order from input to output.
Convolutional Layer : Convolutional layers apply a convolution operation to the input, passing the result to the next layer. The convolution emulates the response of an individual neuron to visual stimuli. ( A convolution is a mathematical combination of two functions to create a new function; in CNNs it is performed with some kind of filter, or kernel, to create a feature map. )
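The filter-over-image operation described above can be sketched directly in NumPy. This is a naive "valid" (no padding) product-and-sum for illustration only; real CNN layers are heavily optimised:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image and take the elementwise
    product-sum at each position, producing a feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1   # output height ("valid" mode shrinks it)
    ow = image.shape[1] - kw + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

With a horizontal-difference kernel like `[[1, -1]]`, the resulting feature map responds exactly where the image brightness changes, which is why learned kernels end up acting as edge and texture detectors.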
Pooling : Pooling layers reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network.
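The most common variant, max pooling, keeps the largest value in each block; a 2×2 max pool halves each spatial dimension. A minimal NumPy sketch:

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling: keep the largest value in each
    size x size block, shrinking each spatial dimension by `size`."""
    h, w = x.shape
    h2, w2 = h // size, w // size
    # Reshape into (blocks, in-block rows, blocks, in-block cols),
    # then take the max inside each block.
    blocks = x[:h2 * size, :w2 * size].reshape(h2, size, w2, size)
    return blocks.max(axis=(1, 3))
```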
Flattening : We’ve had to use flattening so that we can use dense layers afterwards, sequentially. Dense function / Fully connected layer : It is a linear operation in which every input is connected to every output with a given weight.
Activation function : Usually a non-linear activation function such as Sigmoid or Softmax is used. Sequentially → One Convolutional Layer → Second Convolutional Layer → Pooling → Flattening → Two Dense Layers of size 128 and 4.
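The stack above maps directly onto a Keras `Sequential` model. A sketch, where the 64×64 single-channel input, the 32 filters, and the 3×3 kernels are our assumptions (the original hyperparameters are not stated here):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def build_model(input_shape=(64, 64, 1)):
    """Conv -> Conv -> Pool -> Flatten -> Dense(128) -> Dense(4)."""
    model = Sequential([
        Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        Conv2D(32, (3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Flatten(),
        Dense(128, activation="relu"),
        Dense(4, activation="softmax"),  # one neuron per class: FF, FL, FR, NN
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Softmax is shown on the last layer; swapping in `"sigmoid"` reproduces the other configuration tried below.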
Using Sigmoid as an activation function on the last layer : Loss : 0. Accuracy : 0. Using Softmax as an activation function on the last layer Loss : 0. Accuracy : 0.
Please note that the 4 output neurons correspond to our classes ( FF, FL, FR, NN ), allowing us to make a decision for a given image state.
Model Prediction : The model was saved in JSON format, with the weights stored in an H5py file. Once loaded, this model is used to predict a class for an image fed from the camera, deciding the next direction to drive in.
Case studies of a few problems we’ve faced and solved
Design - Architecture for Scalability : This is an innovative architecture to cluster multiple single-board Linux computers, allowing embedded autonomous systems to scale and to monitor different parameters. Multiple programs run on different cores of different single-board Linux computers, bound together by a local network and a network of memory-mapped IO. They work in parallel to monitor different parameters of the system and, on detecting an anomaly, can pass interrupts to other nodes making control decisions for the autonomous system.
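Given the loaded model, the prediction step that maps the four output neurons back to a drive command might look like the sketch below. The label order follows the note above ( FF, FL, FR, NN ); the batch handling and function name are our assumptions:

```python
import numpy as np

# Class order assumed to match the output neurons: FF, FL, FR, NN.
DIRECTIONS = ("FF", "FL", "FR", "NN")

def predict_direction(model, frame):
    """Classify one camera frame and return the drive command."""
    batch = frame[np.newaxis, ...]              # the model expects a batch axis
    probs = model.predict(batch, verbose=0)[0]  # four class scores
    return DIRECTIONS[int(np.argmax(probs))]
```

Taking the arg-max of the four scores works the same whether the last layer is sigmoid or softmax, since only the relative ordering matters for the decision.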
Data Collection : Multi-processing to get the most out of the system. The connection to the Blynk server that controls the car and the data-gathering program are designed to run continuously at the same time as parallel processes, with system resources allocated to them separately. This allows both processes to interact through one communication channel only: a hardware short on the GPIO pins ( and the memory space associated with it ). This method of shorting physical pins to interface different programs went on to become the backbone of our ability to interface multiple computational nodes with different preset programs running on them.
DriveEngine : Although our program currently uses a sequential flow of logic, multithreading approaches were tried to get over the time required to obtain a prediction and take the action. We discovered a bottleneck in the form of the Python Global Interpreter Lock: only one thread at a time is allowed control of system resources, to avoid a race condition in memory while storing objects. This significantly reduces the computational time that can be saved by multithreading. A workaround would be to use separate processes, each allocated its own system resources; however, the programs would then have to communicate by reading and writing files. Since we hypothesized that I/O utilization was already high and would be a bottleneck, we didn’t use this approach and instead execute sequentially.
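The data-collection setup above — two long-running programs as separate OS processes, each with its own interpreter and resources, sidestepping the GIL — can be sketched with Python’s `multiprocessing` module. The function names are illustrative, and the real programs talk over the GPIO short, which is not modelled here:

```python
from multiprocessing import Process

def run_parallel(control_loop, data_logger):
    """Run the car-control (Blynk) program and the data-gathering
    program as separate OS processes. Each process gets its own
    interpreter, so the GIL does not serialise them."""
    procs = [Process(target=control_loop), Process(target=data_logger)]
    for p in procs:
        p.start()
    for p in procs:
        p.join(timeout=5)  # don't block forever if a worker wedges
```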
Hence we are currently just clicking a picture, predicting the direction by passing it through our pre-trained model, and executing the driving scripts on that basis.
Haar-Cascade Classifier to detect traffic signs : Built only as a proof of concept, to detect traffic signs such as a stop sign or a traffic light; we’re not using it anywhere in the code base right now while running the car. Haar cascade classifiers are very effective for object detection. Initially, the algorithm needs a lot of positive images ( images of traffic signs ) and negative images ( images without traffic signs ) to train the classifier, from which features are then extracted. Although we attempted to train one ourselves, the program ran for a few hours and still didn’t stop. Hence we resorted to using pre-trained classifiers that had been used for other projects on GitHub. ( Source has been cited )
Sample Image Processing Results : Image at 0, labelled FF.
Top Left : Original image. Top Right : Single-channel B/W. Centre Left : HSV image. Centre Right : Adaptive Gaussian filtering with blur. Bottom Left : Hough lines drawn on the image for lane detection. Bottom Right : Canny edges to detect the edges of the road.
Note : The reason we aren’t sending post-processed images through the CNN is that with the addition of some blur ( from vibrations ), these edge-detection algorithms break down completely. CNNs provide good results with minimal preprocessing anyway.