Google's Vision For Embedded Vision: Numerous Impressive Technology And Product Implementations

Google's various efforts in the embedded vision space have caught my attention many times in past news coverage. There has been, for example, a series of postings about the imperfect but still notable first stab at facial recognition in Android v4.0 "Ice Cream Sandwich," whose technology likely comes at least in part from the company's acquisition last year of PittPatt (Pittsburgh Pattern Recognition). And don't forget about the cloud-based image analysis capabilities built into the Google Goggles search engine application. My earlier writeup covered Goggles v1.6; v1.7 added continuous-scan mode along with optimization for text sources, while v1.8 made various continuous-scan and barcode recognition improvements. And, of course, there's also Google's autonomous vehicle project.

Those three projects (facial recognition, image-based search, and self-driving cars) don't, however, come close to encompassing the entirety of Google's embedded vision involvement. Take, for example, the company's book-scanning efforts, which are the subject of lawsuits by authors and publishers. Google has developed infrared-based technology that calculates and adjusts for the three-dimensional distortions of non-flat pages when scanning them and converting their text contents via optical character recognition.
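To make the idea of "adjusting for three-dimensional distortions" a bit more concrete, here is a minimal sketch of one simple way to flatten a curved page. It is not Google's method, whose details aren't covered here. It assumes the scanner has already measured a height profile across the page (for example, from a projected infrared pattern), and it treats the page as curving in one direction only. The function names `arc_length_positions` and `flatten_row`, and the `dx` sample-spacing parameter, are hypothetical names chosen for illustration.

```python
import math


def arc_length_positions(heights, dx=1.0):
    """Cumulative distance along the curved page surface at each sample column.

    heights: measured page height at each column (assumed already available).
    dx: horizontal spacing between columns, in the same units as heights.
    """
    positions = [0.0]
    for z0, z1 in zip(heights, heights[1:]):
        # Each step along the surface is the hypotenuse of (dx, height change).
        positions.append(positions[-1] + math.hypot(dx, z1 - z0))
    return positions


def flatten_row(pixels, heights, dx=1.0):
    """Resample one scanned pixel row onto evenly spaced flat-page positions."""
    s = arc_length_positions(heights, dx)
    n_out = int(round(s[-1] / dx)) + 1  # the flattened row is at least as long
    out = []
    j = 0
    for k in range(n_out):
        target = min(k * dx, s[-1])
        # Advance to the surface segment that contains the target position.
        while j < len(s) - 2 and s[j + 1] < target:
            j += 1
        t = (target - s[j]) / (s[j + 1] - s[j])
        # Linear interpolation between the two neighboring scanned pixels.
        out.append((1 - t) * pixels[j] + t * pixels[j + 1])
    return out
```

A flat page (all heights equal) passes through unchanged, while a bulging page yields a longer row, since text that looked compressed near the spine gets stretched back to its true spacing. A real system would also have to handle curvature in both directions and perspective effects, which this sketch ignores.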

Speaking of optical character recognition, a late-June writeup in ExtremeTech notes that for around three years now, the Google Docs service has supported auto-conversion of PDFs and text-containing images into editable document files. However, when the company finally rolled out the long-rumored Google Drive service at the recent Google I/O developer conference, it expanded the feature in fairly dramatic fashion, into a full-blown, cloud-based image recognition engine.

"It might seem totally sci-fi," ExtremeTech notes as an example, "but if you upload a picture of a pyramid to Drive, you can search for it and the system will identify it based on what Google knows about pyramids." And continuing, "It’s not too shocking that Google would bring this feature over to the iOS and Android versions of Drive, but it is extra handy on a platform where you don’t have advanced search tools to sift through your 5GB of data. The bulk of the computing is taking place on the server side so it’s not like your iPad needs to be able to recognize that pyramid, you just need to have an internet connection so that Google can lend some of its search magic to the files you’ve placed on Drive."

And then there's Google Translate, an app that started out as a language translation service for web pages and user-entered text, and later expanded its capabilities with voice-recognition input. As of v2.5, unveiled earlier this month, Google Translate will also translate text contained in an image; you swipe your finger across the image to highlight the areas to be translated.

Last but not least is the latest news on the Google Glass augmented reality personal display system…but there's sufficient information here to justify a standalone writeup. Stay tuned!
