Therese Sullivan, editor of BuildingContext.me and ControlTrends’ eyes and ears in Silicon Valley, takes us on a data journey that begins at the ethereal origins of Artificial Intelligence (Dartmouth, 1956) and delivers us to metadata and Project Haystack, introducing some of the label makers in between.
One of the most re-watched episodes of the comedy series Seinfeld is ‘The Label Maker’, in which Elaine’s Christmas gift to a friend is re-gifted to Jerry before the Super Bowl. True, a thing for tagging other things was not a fun or romantic gift in 1995. And perhaps Bryan Cranston’s fictional dentist didn’t get it. But Julia Louis-Dreyfus’s Elaine was way out ahead in her thinking. Label making is important! If you are a data wrangler today, you should appreciate any gift that helps you tag things with metadata labels.
Frank Chen of Silicon Valley venture capital firm Andreessen Horowitz (a16z) presents a timeline in his AI and deep learning mini-course that happens to plot label-making’s journey from white elephant gift to tech’s newest cool thing. Released to all interested students in June 2016, it is a fantastic history lesson and primer on what is happening in artificial intelligence (AI) today. He writes:
“One person, in a literal garage, building a self-driving car.” That happened in 2015. Now to put that fact in context, compare this to 2004, when DARPA sponsored the very first driverless car Grand Challenge. Of the 20 entries they received then, the winning entry went 7.2 miles; in 2007, in the Urban Challenge, the winning entries went 60 miles under city-like constraints. Things are clearly progressing rapidly when it comes to machine intelligence. But how did we get here, after not one but multiple “A.I. winters”? What’s the breakthrough? And why is Silicon Valley buzzing about artificial intelligence again?
Chen answers his own question this way: more compute power, more data, better algorithms and more investment. The same AI entering cars is impacting buildings too; listen to Ken Sinclair discuss the surprising rate of innovation in his latest ControlTrends interview. Chen’s research colleague at a16z, Ben Evans, explores the topic of labeling in more depth in the blog post AI, Apple and Google. Here are some key excerpts:
So you can say to your phone ‘show me pictures of my dog at the beach’ and a speech recognition system turns the audio into text, natural language processing takes the text, works out that this is a photo query and hands it off to your photo app, and your photo app, which has used ML systems to tag your photos with ‘dog’ and ‘beach’, runs a database query and shows you the tagged images. Magic.
Try it without labels (‘unsupervised’ rather than ‘supervised’ learning). Today you would spend hours or weeks in data analysis tools looking for the right criteria to find these, and you’d need people doing that work – sorting and resorting an Excel table with a million rows and a thousand columns, metaphorically speaking.
The eye-catching speech interfaces or image recognition are just the most visible demos of the underlying techniques.
The important part is not that the computer can find them, but that the computer has worked out, itself, how to find them.
Did you catch that? The speech and image recognition technology may be superficial eye-candy compared to the feat of putting together the underlying knowledge graph. In other words, how you classify and label objects is at the core of how well your AI works. Knowledge graphs for the World Wide Web are the domain of semantic web researchers. Three leading professors in the field from the University of Zurich, Rensselaer Polytechnic Institute, and Stanford University collaborated on the September 2016 article, A New Look at the Semantic Web. Here are some key excerpts from this long-form editorial:
Bringing a new kind of semantics to the Web is becoming an important aspect of making Web data smarter and getting it to work for us. Achieving this objective will require research that provides more meaningful services and that relies less on logic-based approaches and more on evidence-based ones.
Crowdsourcing approaches allow us to capture semantics that may be less precise but more reflective of the collective wisdom.
We believe our fellow computer scientists can both benefit from the additional semantics and structure of the data available on the Web and contribute to building and using these structures, creating a virtuous circle.
Labeling, edge computing and artificial intelligence are three pieces of the same puzzle, and that puzzle seems to be coming together very fast right now. (Don’t miss the recent slideshow of another a16z thinker, Peter Levine, on how edge computing will soon eclipse the cloud.) The concepts and timing that Silicon Valley’s a16z thought leaders describe are as applicable to buildings as they are to cars, dogs and beaches. And the academics leading the semantic web conversation are saying that the mark-up languages and metadata schemas are coming from all corners of the web, not just ivory towers.
Frank Chen points out that the latest Google image recognition algorithms can chow down on the entire collection of videos on YouTube. But when they do that, they get a graph that skews in favor of cats doing funny things. That doesn’t reflect the real world. Whatever you want to call the undergirding ML labeling technology that does the classifying (knowledge graphs, metadata schemas, neural nets), the versions that work best reflect the collective wisdom and first-hand evidence of those with physical-world experience.
This brings us to Project Haystack, the open-source organization launched in 2011 and devoted to developing a standard mark-up language and a tagging schema for devices in commercial buildings. Given the core importance to AI of getting standardized labeling right the first time, it is no surprise that academia and big IT picked up on the Haystack schema when they launched the Brick schema. One way to look at it is that there is more industry, academic and government energy, focus and money being invested in label-making than ever before. What a gift! Seinfeld’s Elaine Benes would be such a supporter if she were here today. And even the dentist who became Walter White of Breaking Bad would not under-appreciate it. I hope more of those who hold the evidence and wisdom to contribute get involved. Siloing data was the way business was conducted in the last innovation cycle, but it won’t work going forward in the age of AI and machine learning (ML).
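For readers who want to see what tagging building data can look like in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not Project Haystack’s own software or API: the point names, identifiers and the helper functions `tagged` and `find_by_tags` are all hypothetical, and the short tags (such as discharge, air, temp and sensor) are only written in the style that Haystack uses for labeling points.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Point:
    """One labeled data point from a building (all values are made up)."""
    ident: str
    name: str
    tags: FrozenSet[str]


def tagged(ident: str, name: str, *tags: str) -> Point:
    """Hypothetical helper: build a Point from a list of short tag names."""
    return Point(ident, name, frozenset(tags))


# Illustrative records only; the tag vocabulary is written in a Haystack-like style.
POINTS: List[Point] = [
    tagged("p1", "AHU-1 discharge air temp", "point", "sensor", "discharge", "air", "temp"),
    tagged("p2", "AHU-1 return air temp", "point", "sensor", "return", "air", "temp"),
    tagged("p3", "AHU-2 discharge air temp", "point", "sensor", "discharge", "air", "temp"),
    tagged("p4", "Zone 3 temp setpoint", "point", "sp", "zone", "air", "temp"),
]


def find_by_tags(points: List[Point], *required: str) -> List[Point]:
    """Return the points that carry every one of the required tags."""
    wanted = frozenset(required)
    return [p for p in points if wanted <= p.tags]


if __name__ == "__main__":
    # Once the labels exist, "find every discharge air temperature sensor"
    # is a simple lookup instead of a hunt through inconsistent point names.
    for p in find_by_tags(POINTS, "discharge", "air", "temp", "sensor"):
        print(p.ident, p.name)
```

The design choice worth noticing is that the query never inspects the free-text names. It relies only on the shared labels, which is why agreeing on a common tagging vocabulary matters so much.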
Another reason to do your part: tomorrow, there may not be chief marketing officers and chief technology officers, but rather chief labelers of marketing things and chief labelers of technology things, etc. The labeling of training data for machine-learning algorithms is about to consume us all, or at least everyone who works with computers, mobile phones, and Internet-of-Things devices. So, best to get ahead of the game.