Update
10/25/2015 at 07:08

To whomsoever it may concern,
The project turned out to be a bit tough, so I joined a PhD in Computer Vision. The method I would like to experiment with is bag-of-words, and I will try to detail the work as hypothesis, experiment, and observation. Also, rather than looking only at this particular application, I would like to switch to a more exploratory way of going about it, so that anyone wanting to implement a computer vision project has a theoretical template on which to base their application.
Hypothesis:
(Originally) Given a model of the breadboard, we can track it without using any markers. This is vague, and as far as tracking alone goes, it is already done quite well in state-of-the-art tracking applications, so I will modify it a bit.
(New, explorative?) Try many alternative pipelines to create a robust application using computer vision. (Still vague, but my kind of vague.)
Experiment & observations:
The application is broken down as follows:
Real Life --> Data --> Analysis --> Inference --> Output
A log so far for each step; forgive the lack of detail.
- Real life: We are using a single webcam. The field of view is < >*. The focus is < >*. We have a breadboard, which may be of many types and appearances. We will see clutter on a desk, and there may be occlusion due to hands, instruments, components, wires, etc.
- Data: We capture a 2-D projection of the scene, in BGR. As pre-processing, we have many images of a simplified instance, where only the breadboard is present, seen from different perspectives. Images of a hand holding the breadboard are also included, as an incremental step towards the actual use case. (A minimal capture sketch follows this list.)
- Analysis: (So far, only training.) The elements of both the data and the inferences need representations. Let the image be denoted x. We have transformations Ti which act on x; let the composition of these Ti be denoted f, so that Y = f(x). The result Y is a set of points that may be of interest (f is thus the SIFT detector). Let Z = g(Y), where Z is the set of descriptors, one for each point in Y (g is the SIFT descriptor extractor). The plot of all the descriptors in the training images (image to be posted) is visualized in 2 dimensions by reducing each 128-dimensional descriptor to a 2-dimensional point using PCA. (A sketch of this step is also given after this list.)
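To make the Data step concrete, here is a minimal capture sketch, assuming Python with OpenCV (cv2), a single webcam at device index 0, and a hypothetical training_images/ folder; the key bindings and file names are illustrative, not my actual capture script.

```python
# Minimal sketch: grab BGR frames from a single webcam and save training shots.
# Assumptions: OpenCV installed, webcam at index 0, hypothetical output folder.
import os
import cv2

out_dir = "training_images"   # hypothetical folder for the simplified-instance shots
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(0)     # single webcam; frames come back in BGR order
count = 0
while True:
    ok, frame = cap.read()    # frame is a 2-D BGR projection of the scene
    if not ok:
        break
    cv2.imshow("capture", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord('s'):       # save the current view of the breadboard
        cv2.imwrite(os.path.join(out_dir, "breadboard_%03d.png" % count), frame)
        count += 1
    elif key == ord('q'):     # quit
        break

cap.release()
cv2.destroyAllWindows()
```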
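And a rough sketch of the training-time analysis described above, with f as the SIFT detector, g as the SIFT descriptor extractor, and PCA reducing the 128-dimensional descriptors to 2-D for plotting. It assumes OpenCV with SIFT available (cv2.SIFT_create on recent builds, cv2.xfeatures2d.SIFT_create on older contrib builds), scikit-learn, and matplotlib; the glob pattern is hypothetical.

```python
# Sketch of the training analysis: Y = f(x) (SIFT keypoints), Z = g(Y) (descriptors),
# then PCA from 128-D to 2-D for visualization.
import glob
import cv2
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

sift = cv2.SIFT_create()      # on older builds: cv2.xfeatures2d.SIFT_create()

all_desc = []
for path in glob.glob("training_images/*.png"):          # hypothetical training set
    x = cv2.imread(path, cv2.IMREAD_GRAYSCALE)           # image x
    keypoints, descriptors = sift.detectAndCompute(x, None)  # Y = f(x), Z = g(Y)
    if descriptors is not None:
        all_desc.append(descriptors)

Z = np.vstack(all_desc)       # all 128-D SIFT descriptors from the training images

# Reduce the 128-D descriptors to 2-D points for plotting
pca = PCA(n_components=2)
Z2 = pca.fit_transform(Z)

plt.scatter(Z2[:, 0], Z2[:, 1], s=2)
plt.title("SIFT descriptors of training images, projected to 2-D by PCA")
plt.show()
```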
-----> Will post some theory, links, code, data and images soon...
* To be ascertained