Hello and thanks a lot to everyone who's signed up to follow and like the TextEye project so far! You really keep me going even in the face of severe problems.
I'm currently working mostly on improving the overall solution and finalizing the first proper version, but sadly I don't have much to show yet. That should change over the next few weeks while I work on managing my time better, so I can get more done in the little time I currently have.
So what's happening?
Decisions on first version hardware
Since I started the project, a lot of new hardware components have become available. Some of them are quite helpful for building a really small, easy-to-carry mobile text reader, even more so than I originally imagined.
For example, USB webcams work pretty well with the different versions of the Raspberry Pi, but to get the size down you need to take them apart and solder a shorter USB cable to them. Thanks to the recent updates to the Pi camera modules and the Pi Zero (the new version now comes with a camera connector), we can now use the new Pi Zero with a Pi camera module (normal or NoIR) and get a higher resolution than most USB webcams offer, all in a small package that's easy to fit.
Neither camera option has the kind of shake reduction you get in modern digital photo cameras, but both are very convenient.
In combination with the Pi camera module, I'll try to add a small NeoPixel ring with RGBW LEDs instead of the NeoPixel stick I use for prototyping. The ring fits around the camera lens, and its dedicated white LEDs provide better white light and more brightness when taking pictures. It might not be necessary for the NoIR camera module, but I haven't tested that yet (given that OCR generally works better with greyscale images, the NoIR module might be the better choice as long as it produces enough contrast in the images).
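To get a rough feel for whether a camera module "produces enough contrast", one simple check is to convert a frame to greyscale and measure its RMS contrast (the standard deviation of the pixel intensities). The sketch below is pure Python on a flat list of (r, g, b) tuples so it runs anywhere; the BT.601 luma weights are the same ones Pillow uses for `Image.convert("L")`, and the `MIN_CONTRAST` threshold is a made-up starting value, not something I've measured.

```python
def to_greyscale(rgb_pixels):
    """Convert (r, g, b) tuples (0-255) to 8-bit luma using ITU-R BT.601 weights."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]


def rms_contrast(grey_pixels):
    """RMS contrast: standard deviation of the intensities, scaled to 0..1."""
    n = len(grey_pixels)
    mean = sum(grey_pixels) / n
    variance = sum((p - mean) ** 2 for p in grey_pixels) / n
    return (variance ** 0.5) / 255.0


# Hypothetical cut-off; a real value would have to be found by testing.
MIN_CONTRAST = 0.15


def good_enough_for_ocr(rgb_pixels, threshold=MIN_CONTRAST):
    """True if the greyscale version of the frame has at least `threshold` contrast."""
    return rms_contrast(to_greyscale(rgb_pixels)) >= threshold
```

A frame made only of black text pixels and white paper pixels scores 0.5, while a washed-out frame where text and paper differ by just a few grey levels scores far lower and would be rejected.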
On the audio side, Adafruit recently released a nice, small mono amplifier breakout board (the MAX98357 I2S Class-D Mono Amp) that connects to a Raspberry Pi via I2S. I did manage to get the Pimoroni pHAT DAC board, which adds audio output to the Raspberry Pi Zero (and improves audio output on other Pi models), but it took quite a long wait (about 2 1/2 to 3 months) because it was out of stock and new boards took a while to be produced. From what I've seen of Adafruit, through a lot of their videos including the "factory tours" Ladyada has done on a few occasions, I'm confident that the new amplifier boards won't stay out of stock for that long. I also like the smaller form factor, so I've decided to use the new board with the Pi Zero in the first version of the TextEye.
So while I'm still prototyping with the Pi 2, the first "usable" version should use the new Pi Zero with the camera module, a NeoPixel ring, the Adafruit MAX98357 I2S Class-D Mono Amp, a mini metal speaker and a USB-rechargeable battery. I've already ordered the new hardware modules and should get them soon.
I thought the Pi Zero might be a problem, as it has been hard to get for the last few months. But it actually arrived today, only a week after I ordered it directly from Pimoroni in the UK (whose online store already warned that, because of Pi Zero related shenanigans, orders could only be handled with a 3 to 5 business day delay, and that you can currently get only one Pi Zero per customer).
As you can see, I went for the basic kit which included headers, USB adapters and a small case. I also added the camera cable which is needed in order to connect the Pi camera module (which I already have) to the new Pi Zero (the camera connection is a little smaller than on the other models).
Imaging research and insights
If you read through my earlier project logs, you know that while the different hardware and software components each work pretty well, using them for a mobile project like this takes them beyond the limits of what they were originally created for.
The central component of this project is the OCR software. Tesseract, like all currently available open source and commercial closed source OCR software, has been optimized for images of text created by flatbed or document scanners. It works really well for those, even when the scanned pages or documents also contain images along with the text.
With a mobile, handheld camera, though, the image quality is totally different, and the standard OCR algorithms can't cope with the blur, distortion etc. that are normal for this kind of image. Even when I used a digital photo camera with shake reduction, automatic white balance etc., the OCR program could not extract usable text from the images.
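Even without better OCR, it helps to reject the worst frames before running OCR on them at all. A common trick (not something the current TextEye setup does) is the variance of the Laplacian: sharp edges give strong Laplacian responses, and blur flattens them. Here is a minimal pure-Python sketch on a greyscale image given as a list of rows; with OpenCV a similar measure is usually written as `cv2.Laplacian(img, cv2.CV_64F).var()`. The threshold of 100 is a hypothetical value that would need tuning with real camera shots.

```python
def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian over the interior of a greyscale image.

    img is a list of rows of intensities. Sharp edges produce large Laplacian
    responses; blur smooths them out, so a low variance suggests a blurry frame.
    """
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_probably_blurry(img, threshold=100.0):
    # Hypothetical threshold; tune it on real shots from the camera module.
    return laplacian_variance(img) < threshold
```

A frame flagged as blurry could simply be retaken instead of being passed on to the OCR step.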
It was clear that the project needed a better solution to come close to the original idea of a mobile text reader. Otherwise I could only turn it into a non-mobile solution that just replaces a more expensive PC-based reader setup.
One idea I had was to move the actual image analysis and text recognition to a cloud service on the internet. This has already been done on smartphones, where apps such as Google Translate use the mobile internet connection and the integrated camera to translate text on signs in foreign languages. Google's Cloud Vision service, which can be used through the Cloud Vision API, might be able to provide the needed functionality.
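For reference, a text detection request to the Cloud Vision REST API is a JSON document with the base64-encoded image and a `TEXT_DETECTION` feature, sent to the `images:annotate` endpoint. The sketch below only uses the Python standard library; the API key is a placeholder, and the helper names are my own, not part of any Google library. I haven't run this against the live service, so treat it as an outline.

```python
import base64
import json
import urllib.request

# Cloud Vision v1 REST endpoint; the key is a placeholder you'd get from Google.
VISION_URL_TEMPLATE = "https://vision.googleapis.com/v1/images:annotate?key={api_key}"


def build_text_detection_request(image_bytes):
    """Build the JSON body asking for TEXT_DETECTION on one image."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def send_request(body, api_key, timeout=10):
    """POST the request body and return the decoded JSON response (needs network)."""
    req = urllib.request.Request(
        VISION_URL_TEMPLATE.format(api_key=api_key),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_text(response):
    """Return the full detected text (first textAnnotations entry), or "" if none."""
    annotations = response["responses"][0].get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""
```

Every photo has to be uploaded in full, which is exactly where the connection requirements come in.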
The downside is twofold, though: you need a mobile internet connection (preferably a relatively fast one), and from what I've seen so far there is no free online image analysis service right now, so you have to pay to use one. On top of that, depending on where you are, you might not have mobile internet access at all, whether it's WiFi or GSM/UMTS/LTE based.
Thankfully, text recognition in "natural" images has become a topic of interest for more and more scientists in the field of computer vision and imaging. "Natural" here means photos taken with digital cameras, smartphones etc. in everyday, real-world conditions, as opposed to images from flatbed and document scanners, where parameters like basic alignment, lighting and contrast are controlled much like in a laboratory. There are no standard, easy-to-use solutions yet that we could just download from GitHub or elsewhere, but several specialists are investigating the topic, and at least one usable method has already been found.
The main idea here is to combine advanced pattern recognition, image analysis and image enhancement techniques to turn the information inside the image into a searchable hierarchical structure, in which individual characters can be located reliably and then recognized as letters. While this final step is the same as with "normal" OCR, the image analysis that comes before it is more complicated.
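To make the idea of a hierarchy a bit more concrete, here's a deliberately naive sketch (not Francesco's method, and far simpler than anything that works on real photos): find connected blobs of foreground pixels in an already binarized image, treat each blob's bounding box as a candidate character, and group the boxes into text lines by vertical overlap. That gives a small image, lines, characters tree that an OCR step could then work through. The function names are hypothetical.

```python
from collections import deque


def find_components(binary):
    """Label 4-connected groups of foreground (1) pixels; return their bounding boxes.

    Each box is (top, left, bottom, right), inclusive.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                # Breadth-first flood fill from this unvisited foreground pixel.
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def group_into_lines(boxes):
    """Group character boxes into text lines by vertical overlap, left to right."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b[0], b[1])):
        for line in lines:
            ref = line[0]  # compare against the first box of each line
            if box[0] <= ref[2] and box[2] >= ref[0]:  # vertical ranges overlap
                line.append(box)
                break
        else:
            lines.append([box])
    return [sorted(line, key=lambda b: b[1]) for line in lines]
```

Real natural-image pipelines replace both steps with much more robust region detection and classification, but the output has the same shape: lines of located characters, ready for recognition.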
At the moment, the pattern recognition has to be trained with different images to make the text recognition more reliable, similar to the voice training that speech recognition software needs before it works really well for dictation. There are already some basic but usable training data sets, and there is some information about how to expand them with images of your own. This is an interesting application of machine learning.
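Here is what "training with images" boils down to in the simplest possible case, as a sketch: a k-nearest-neighbour classifier that stores labelled example glyphs and labels a new glyph by its closest examples. This is a stand-in for illustration only; real approaches use extracted features and stronger classifiers, and the tiny 3x3 glyphs used to test it are invented data.

```python
from collections import Counter


def hamming(a, b):
    """Number of differing pixels between two equally sized, flattened binary glyphs."""
    return sum(p != q for p, q in zip(a, b))


class GlyphClassifier:
    """Tiny k-nearest-neighbour classifier over flattened binary glyph images."""

    def __init__(self, k=1):
        self.k = k
        self.samples = []  # list of (glyph, label) pairs

    def train(self, glyph, label):
        # "Training" a k-NN model just means remembering labelled examples, so
        # expanding the data set with your own images is simply more calls here.
        self.samples.append((tuple(glyph), label))

    def predict(self, glyph):
        # Take the k stored examples with the fewest differing pixels and vote.
        nearest = sorted(self.samples, key=lambda s: hamming(s[0], glyph))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
```

With one clean example of "I" and one of "L", a slightly noisy "I" (one pixel wrong) is still classified as "I"; adding your own images simply means calling `train()` with more labelled glyphs.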
Francesco Pochetti, a highly skilled guy from Italy with an interest in machine learning and similar topics, wrote a really good article on the subject back in 2014, and even created some code to go along with it that can be downloaded from GitHub. The code is written in Python, which I've yet to learn, but it is easy enough to read, especially with the explanations from the article.
Other work I've seen in this area uses MATLAB or similar mathematical analysis software, which usually needs a graphical interface, adds another software layer when you run the image analysis code, and needs quite a bit of computing power. Francesco's approach therefore seems more usable for this project, and it also allows for later optimizations as well as a conversion to C/C++ (for a possible speed improvement).
Until I implement this, however, I'm keeping the current software combination for development and testing. Since I'm not integrating everything into one monolithic program, my current approach should let me switch out the OCR software relatively easily later on.
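A sketch of how that loose coupling could look in Python: each OCR engine is wrapped in a function that takes an image path and returns text, and the rest of the program only calls `recognize()`. The Tesseract wrapper calls the command line tool with `stdout` as the output base (accepted by recent Tesseract versions) and needs Python 3.7+ for `capture_output`. The registry and function names are hypothetical, not the actual TextEye code.

```python
import subprocess


def tesseract_command(image_path, lang="eng"):
    """Build the tesseract CLI call; 'stdout' as output base prints the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_tesseract(image_path):
    """Run tesseract on one image file and return the recognized text."""
    result = subprocess.run(
        tesseract_command(image_path),
        capture_output=True,  # Python 3.7+
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


# Hypothetical registry: every backend takes an image path and returns text.
OCR_BACKENDS = {"tesseract": run_tesseract}


def recognize(image_path, backend="tesseract"):
    """Single entry point the rest of the program uses, whatever the engine."""
    return OCR_BACKENDS[backend](image_path)
```

Swapping in a different recognizer later, such as a natural-image pipeline written in Python, would then only mean adding another entry to `OCR_BACKENDS`.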
That being said, it also remains to be seen how well the currently available Raspberry Pi models - especially the Pi Zero - can actually handle this kind of advanced image analysis. It's pretty heavy on the computing side, with lots of mathematical calculations and a good amount of data being involved.
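To put "a good amount of data" into numbers, here's a back-of-the-envelope calculation. It assumes full-resolution stills from the 8-megapixel v2 camera module (3280x2464) and a hypothetical detector that slides a 32x32 pixel window across the image in 4-pixel steps at a single scale; real methods differ, and they usually search several scales, which multiplies the work.

```python
def frame_bytes(width, height, channels):
    """Raw size of one uncompressed 8-bit frame."""
    return width * height * channels


def window_positions(width, height, win, stride):
    """How many win x win windows fit when sliding with the given stride."""
    return ((width - win) // stride + 1) * ((height - win) // stride + 1)


# Assumed: full-resolution still from the 8-megapixel v2 camera module.
W, H = 3280, 2464
rgb = frame_bytes(W, H, 3)    # 24,245,760 bytes, about 23 MiB
grey = frame_bytes(W, H, 1)   # 8,081,920 bytes, about 7.7 MiB

# Hypothetical detector: 32x32 window moved 4 pixels at a time, one scale only.
positions = window_positions(W, H, 32, 4)  # 495,117 windows
pixel_reads = positions * 32 * 32          # 506,999,808 pixel reads
```

That's roughly half a billion pixel reads for a single pass, before any actual math per window, on a Pi Zero with a single-core 1 GHz CPU and 512 MB of RAM.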
Stay tuned... :)