If you thought you were about to read a book about the evils of AI, these first few chapters are going to surprise you. With the right choices, augmentative intelligence will help us take a step toward realizing our better selves. I wouldn’t have spent the last 15 years of my life working and studying in this space if I hadn’t sipped at least a bit of that Kool-Aid. Despite all of that potential, though, the effects of artificial intelligence on our lives are far from certain and have little to do with the new technologies we develop. Machine learning, robotics, automation, and artificial intelligence are just tools, neither good nor bad. They are tools that can do amazing and terrible things. We must understand both and decide together what comes next.
In this section I’ll explore the amazing and the terrible sides of AI. This inevitably means a discussion of futuristic medicine and a robot apocalypse, but I can think of no more honest a tour through the power of these new technologies than sharing my own stories as a scientist and entrepreneur. I’m enormously proud of the work I’ve done, but even that work reveals the murky void between intention and reality.
I’ll share seven stories connecting the CIA, diabetes, Homeland, job recruiting, trust, and the marketplace of things. Each wears the hat of its own ridiculous aspirations yet remains clearly and unambiguously dangerous. Each should scare you, but each must still be met. In the end, this section will not be about the evils of artificial intelligence but about how we must make truly difficult choices in a massively hyperdimensional world.
The most obvious place to start this story is where I started mine: the Cognitive Science Department at the University of California, San Diego (UCSD). CogSci is a relatively new field. The very first university department was founded right there at UCSD during my freshman year in 1989. CogSci’s interdisciplinary approach to understanding people draws together neuroscience, psychology, machine learning, statistics, philosophy, and more. One lab might be studying the psycholinguistics of metaphor while another down the hall is building brain-computer interfaces. It was a rather amazing introduction to the power of breaking silos and mashing up ideas[1].
While CogSci at UCSD flourished, I did not. It didn’t take long before I’d effectively flunked out, and it took a whole decade for me to return and try again. That’s a story for another time, but suffice it to say that the 1990s were not a happy time for me[2]. And yet that last decade of the millennium[3] did leave me with one true gift: a purpose. I went back to UCSD in 1999 intent on making better people.
When I returned, I had a passion to understand people and a purpose to improve their lives. I expected to become a traditional “wet” neuroscientist, doing anything from recording the electrical activity of cells to tracing neural pathways with retroviruses. That quarter I took my first (and in fact only) programming course and everything changed. The professor was a hardcore stoner named John Batali. He rolled into class every day with bloodshot eyes, wearing the same pizza-stained t-shirt. I ended up taking three classes with him, including introduction to AI, which he taught as a self-defense class.
John was absolutely one of my favorite professors. At the end of that first quarter he brought me into his office and told me that I’d earned a perfect score in the class. John asked me to be a teaching assistant for the class the following year[4]. But more importantly, he let me know he’d recommended me to be a research assistant for the Machine Perception Lab (MPL). Everything about my professional life today started with that unsolicited recommendation twenty years ago.
MPL was run by Javier Movellan and Marni Bartlett as a part of a larger conglomerate of labs under the famous neuroscientist Terry Sejnowski[5]. It was there that I was first introduced to the power of machine learning: the ability of a computer program to go beyond a set of rules and learn from experience.
My entry into the world of AI and machine learning was through facial-expression-based lie detection for the CIA.
Our project at MPL was to take raw video (no audio, no context) and report second-by-second whether the person in the video was lying. More specifically, we had to automatically extract facial action codes, a system for coding facial expressions developed by Paul Ekman, the famous emotion researcher who inspired the TV show Lie to Me. FACS (the Facial Action Coding System) breaks the language of expressions down into independent muscle groups. For example, a contraction of the zygomatic major and orbicularis oculi muscles causes the upturned corners of the mouth and slight tightening of the eyes we read as a Duchenne (“real”) smile. If only the zygomatic major contracts and the smile “never reaches the eyes,” it is a non-Duchenne or “false” smile[6]. Happiness, sadness, disgust, anger, and all the other (frequently debated) emotions are represented as some co-articulation of action units. Our goal was to read these combinations from the raw video using machine learning.
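For the technically curious, here is a toy sketch in Python of just the very last step, and emphatically not our lab’s code: once a frame’s action units have been detected, mapping them to a smile label. In FACS, AU12 (the lip corner puller) corresponds to the zygomatic major and AU6 (the cheek raiser) to the orbicularis oculi.

# Toy illustration: labeling a smile from detected FACS action units.
# AU12 = lip corner puller (zygomatic major); AU6 = cheek raiser
# (orbicularis oculi). Real systems score dozens of AUs with intensities.
def classify_smile(active_aus):
    """Label one frame's smile from its set of active action-unit numbers."""
    if 12 in active_aus and 6 in active_aus:
        return "Duchenne smile"        # mouth and eyes both engaged
    if 12 in active_aus:
        return "non-Duchenne smile"    # mouth only; it never reaches the eyes
    return "no smile"

print(classify_smile({6, 12}))   # Duchenne smile
print(classify_smile({12}))      # non-Duchenne smile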
It turns out that without context, no person is any good at this task: not judges, teachers, or cops. But at the time, the CIA believed that they could train agents to watch these videos frame by frame and identify the micro-expressions that indicated a lie. The process was excruciatingly slow and took hours of expert attention. Our lab’s task was to speed it up. Imagine the power of real-time lie detection on security cameras around the world. Scary, right? Well, that was the idea, and it was run as a competition between our lab and Takeo Kanade’s lab at Carnegie Mellon.
In 1999, this was incredibly challenging. The people in the video didn’t just talk; they moved their heads around, turning and looking left and right. They would touch their faces or adjust their hair, sometimes blocking the camera (occlusion in the language of machine vision). We had to somehow capture three-dimensional models of those faces, rotate them out of plane, make them all stare straight ahead, and reconstruct the missing parts.
Marni Bartlett had dedicated her doctoral thesis to discovering how to read those faces. She’d applied a particular type of machine learning, Independent Component Analysis (ICA), to understanding faces and their expressions. It worked amazingly well when the faces didn’t move around, but we still needed to figure out how to make it work when the faces warped, translated, rotated, and turned out of plane. It was a pain in the ass.
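If you want to see the shape of the idea, here is a minimal sketch using scikit-learn’s FastICA on placeholder data, assuming the faces have already been aligned and flattened into vectors; it is an illustration of the technique, not Marni’s actual pipeline.

import numpy as np
from sklearn.decomposition import FastICA

# Placeholder data: 200 aligned faces, each flattened from 64x64 pixels.
faces = np.random.rand(200, 64 * 64)

# Learn 20 statistically independent "face components."
ica = FastICA(n_components=20, random_state=0)
codes = ica.fit_transform(faces)    # per-face coefficients, shape (200, 20)
parts = ica.components_             # the components themselves, shape (20, 4096)

# Each row of `parts` can be reshaped to 64x64 and viewed as an image;
# the coefficients in `codes` then serve as features for an expression
# classifier. The hard part was getting well-aligned faces in the first place.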
I worked on two projects in this lab. My undergraduate honors thesis applied an algorithm called SNoW (Sparse Network of Winnows) to locate specific, largely invariant facial landmarks like the eyes and the philtrum[7]. This helped us fit a three-dimensional model of a head to the two-dimensional face image and wrap the image around the model, like Gollum or Avatar, but without the helpful dots that they paint on the actors’ faces. Then we could rotate the face back to look directly into the camera in every frame[8].
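A modern, simplified stand-in for that model-fitting step might look like the sketch below: given a few detected 2D landmarks and a generic 3D head model, OpenCV’s solvePnP recovers the head’s rotation and translation so the face can be turned back toward the camera. The coordinates here are illustrative placeholders, not our lab’s model, and the landmark detection itself (the SNoW part) is assumed to have already happened.

import numpy as np
import cv2

# Generic 3D head-model points in head-centered coordinates (placeholder values).
model_points = np.array([
    (0.0,      0.0,    0.0),    # nose tip
    (0.0,   -330.0,  -65.0),    # chin
    (-225.0,  170.0, -135.0),   # left eye, outer corner
    (225.0,   170.0, -135.0),   # right eye, outer corner
    (-150.0, -150.0, -125.0),   # left mouth corner
    (150.0,  -150.0, -125.0),   # right mouth corner
], dtype=np.float64)

# Matching 2D landmarks detected in one video frame (placeholder pixels).
image_points = np.array([
    (320.0, 240.0), (325.0, 345.0), (250.0, 190.0),
    (390.0, 190.0), (268.0, 300.0), (372.0, 300.0),
], dtype=np.float64)

# Simple pinhole-camera approximation for a 640x480 frame.
camera_matrix = np.array([[640.0,   0.0, 320.0],
                          [  0.0, 640.0, 240.0],
                          [  0.0,   0.0,   1.0]])

ok, rvec, tvec = cv2.solvePnP(model_points, image_points,
                              camera_matrix, np.zeros((4, 1)))

# rvec and tvec describe how the head is rotated and translated relative to
# the camera; undoing that rotation is the "make them all stare straight
# ahead" step that happens before reading the expression.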
My second project was a neural network that used Gabor filters to categorize the emotions in each frame. My junior-detective version of the expression recognition engine was a far cry from where Marni and Javier eventually took the technology, but it was still such a thrill to build systems that learned to recognize a face and what expression it was showing.
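For a rough flavor of what that engine did, here is a sketch using modern libraries (OpenCV and scikit-learn) with made-up data and parameters; it follows the same recipe, Gabor responses in and expression labels out, but it is not the code I actually wrote.

import numpy as np
import cv2
from sklearn.neural_network import MLPClassifier

def gabor_features(face):
    """Stack Gabor filter responses at four orientations into one feature vector."""
    feats = []
    for theta in np.arange(0, np.pi, np.pi / 4):
        # Arguments: kernel size, sigma, orientation, wavelength, aspect ratio, phase.
        kernel = cv2.getGaborKernel((15, 15), 4.0, theta, 8.0, 0.5, 0,
                                    ktype=cv2.CV_32F)
        response = cv2.filter2D(face.astype(np.float32), cv2.CV_32F, kernel)
        feats.append(response.ravel())
    return np.concatenate(feats)

# Placeholder data: 100 canonicalized 48x48 face crops with invented labels.
faces = np.random.rand(100, 48, 48)
labels = np.random.choice(["happy", "sad", "anger", "disgust"], size=100)

X = np.stack([gabor_features(f) for f in faces])
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X, labels)
print(clf.predict(X[:3]))   # per-frame expression guesses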
The challenge was brutal, but in the end we produced an incredibly cool demo: on one side, a person talking and moving naturally; on the other, a reconstructed version saying the same thing but staring straight ahead[9]. With these canonicalized faces, we then applied neural networks frame by frame to turn the face images into facial action codes. This is basically automatic speech recognition, but for the language of facial expressions.
It turned out that real-time expression recognition from natural videos was too difficult back then, and, cool demos aside, neither lab won. In the years since, however, the Machine Perception Lab spun off a startup from UCSD called Emotient, and with the advent of deep neural networks these systems finally began reaching their potential. Emotient sold expression recognition as a service. Want to know how your customers are feeling? What your visitors think of a new product design? In 2016, they streamed a real-time analysis of one of the Republican presidential debates. The bidding war between Facebook and Apple started the next day. Now Siri understands love[10].
Depending on your feelings about the CIA and Apple, you might find this story fascinating, terrifying, or both. At the time, we were rather ambivalent ourselves. I don’t have to think ill of the CIA to think that they alone should not control such a power. To make ourselves feel better about it, we began exploring educational applications: systems that track students’ attention. Our hope was to make educational technology adaptive to the attention of the student: if a kid isn’t engaged, we could hypothetically change the experience to draw them back in…or at least that was the plan. Again, this was in the early days of these technologies. Today, though, multiple companies offer such systems, and they are being used in classrooms in both the US and France. I want this in my kid’s school, but it’s like a bad Yakov Smirnoff joke come to life: “In the future, you don’t attend to class; your class attends to you.”
This is how I began working in AI: ubiquitous lie-detection for spooks, adaptive classrooms, and Aibo the robot dog[11]. But my purpose always remained focused on people. The power that I saw in theoretical neuroscience was in combining machine learning and natural intelligence to make better people. When I interviewed for grad school I talked about building cyborgs (I’d now call this neuroprosthetics, but let’s not fool ourselves, it’s cyborgs) and algorithms to translate between the brain and computers. I went on to study efficient coding theory at Carnegie Mellon University and design algorithms that can learn on their own to see and hear. We showed how those algorithms could create adaptive cochlear implants[12], and I scratched that mad scientist, cyborg-building itch. But that wasn’t the end of this story. Twice more I had the opportunity to do something truly good with what we learned at the Machine Perception Lab, and the hard choices simply got harder.
[1] Not everyone is so enamored. I overheard one student read the title of our eponymous “Cognitive Science Building” and mutter under his breath, “Pshhhh…that’s not a science.”
[2] In the same way the 1930s were not a happy time for Oklahoma farmers.
[3] To anyone who wants to debate whether the millennium ended with 1999 or 2000, first please go fuck yourself.
[4] I ended up taking seven courses a quarter at UCSD with the insane goal of completing my BS in a single year. The plan worked, but the teaching was decidedly subpar.
[5] Terry is something of a scientific industrialist, with labs and institutes spanning multiple departments and schools and reaching into the Salk Institute. I reconnected with him years later while speaking at the Techonomy conference. I was endlessly flattered when an attendee who walked over to talk to us asked Terry, “Isn’t she great?” There is no way Terry actually remembered some undergrad who spent a single year in one of his labs, but without hesitation he responded, “Oh yes. She was one of my best students.” Smooth bastard. (BTW, that attendee was David Bach, who shows up later in this book.)
[6] It turns out most of us rely largely on the mouth to read happiness. No wonder we’re so terrible at spotting liars.
[7] Go look it up you lazy bastard.
[8] It was spooky and cool as hell. Now it seems to be standard in every movie with a special effects budget.
[9] This is like the motion capture animation for Lord of the Rings or some of the posthumous acting in the Star Wars movies. Cool and scary.
[10] Or perhaps more accurately, now Siri understands the human weakness of love.
[11] Sony gave us one of their toy dogs in hopes we could train it to recognize its owners and their emotions. That way Aibo can pretend to feel shame, just like real dogs.
[12] This is another of the many tough questions posed by AI and neuroprosthetics. If we “cured” deafness, entire languages and cultures would end. It’s easy for me to dismiss these fears, but I’m not deaf.