The web may not be the largest thing running on the internet (these days it seems like Zoom is), but it was the most transformational, at least until mobile apps came along. You can follow the waves by developer interest: in the 2000s, everyone was learning HTML and making a website. In the 2010s, everyone was learning to develop mobile apps. In the 2020s, developers are going to be building Vision AI. And for good reason.
The web made its impact by digitizing manual, paper-based processes. Rather than receive a bank statement in the mail, you could view it on the web. Rather than mail in a check, you could pay on the web. Rather than fax in a trade authorization, you could validate it on the web.
This extended to internal enterprise processes, from product configuration to employee surveys, and to B2B processes, from catalog updates to credit reporting. All the information was now digital, thanks to the portal we call the web, and could be acted upon digitally. When mobile apps came along, the groundwork of digitized information was there to make that data available in the palm of our hands.
I believe the next big wave is Vision AI, and for the same reason: it offers the opportunity to digitize the next massive trove of information in the world, the information that is not on paper but can be seen through a camera. Cameras, you know, those miraculous devices that are now in every smartphone and, as a result, have become incredibly powerful and cheap.
Cameras and their related processing power are eclipsing other types of sensors, driven by useful cell phone app experiences and by digital culture ranging from Instagram to Pinterest to TikTok to facial recognition technology embedded into sporting events. Why use a magnetic proximity sensor that requires a magnet mounted on the door when you can point a tiny, inexpensive camera at the door and know whether it’s open? Why use a temperature sensor when a thermal camera can read the infrared light an object emits and determine its temperature? The latest cellphones are integrating LIDAR sensors into their camera systems, and I believe the camera sensing suite will become even more sophisticated. Combine this with emerging AI-powered computer vision technology, and together you have Vision AI.
Vision AI has the power to unlock the future of automation in a way not seen since the Web Revolution, when every form and phone call was turned into a website and we unlocked all the resulting searches, analytics, and automated processing that are now commonplace. Just as there are web boot camps, there will soon be computer vision boot camps to widen access to this new technology.
Anything you want to count, record, analyze, or store can be obtained by teaching Vision AI to look for it. And that’s just capturing the data, the way web forms did. After that comes everything we can do with that data: provide reports, comparisons, and analysis; make predictions; profile and advertise; learn and educate.
First come the tools, focused on particular use cases. Today we have uses like detecting manufacturing defects, assessing damage after weather events, gathering inputs for insurance underwriting, counting things for military plans, watching for gambling cheaters, alerting security to people carrying weapons onto the premises, and predicting sports plays based on players’ past behavior and “tells.” These uses have bred Vision AI platforms, image and video management tools, edge platforms, and, of course, all the AI algorithms and training tools that turn images and video into information.
Next come the uses we couldn’t think of before: the things that didn’t seem possible until we lived with this technology. On the web, shopping for flights and hotels seemed possible as an extension of the phone-based process we already used. But did we imagine that we would have a global, continuously updating, collaborative encyclopedia, or the ability to see what any friend paid for their house?
The real changes come when computers start measuring and counting things that are either too vast for humans to count – every dead oak tree in California – or too expensive for humans to count – every yeast cell in a culture – or too difficult for humans to perceive – the change in gait that suggests a medical condition.
During this decade, we will see boot camps teach hundreds of thousands of developers to use Vision AI tools, just as we taught millions to code for the web. After that, we will see the next level of data our world presents and be able to act on it.
So it shall go with Vision AI. The Jetsons future we’ve all been promised will make its way into existence.
(Disclosure: My firm, Shasta, launched a Camera Fund in 2017, which focuses on Vision AI, so I currently have a vested interest in eight Vision AI companies.)
Isaac Roth has created and sold multiple enterprise software companies. As a partner at Shasta Ventures, he invests in enterprise software companies and currently manages investments in CodeFresh, Scalyr, Beautiful.ai, Quartz.co, and a number of unannounced ventures. He founded the cloud platform Makara, sold it to Red Hat, and stayed on to create and scale Red Hat OpenShift. Later, Isaac incubated core Node.js maintainers at Shasta and joined them to form StrongLoop. StrongLoop was acquired by IBM, where Isaac became CTO of API Management and Hybrid Cloud Integration.