News | January 29, 2020

Why asking an AI to explain itself can make things worse

News article by Will Douglas Heaven.
Published in MIT Technology Review.


Creating neural networks that are more transparent can lead us to over-trust them. The solution might be to change how they explain themselves.

Upol Ehsan once took a test ride in an Uber self-driving car. Instead of fretting about the empty driver’s seat, anxious passengers were encouraged to watch a “pacifier” screen that showed a car’s-eye view of the road: hazards picked out in orange and red, safe zones in cool blue.

For Ehsan, who studies the way humans interact with AI at the Georgia Institute of Technology in Atlanta, the intended message was clear: “Don’t get freaked out—this is why the car is doing what it’s doing.” But something about the alien-looking street scene highlighted the strangeness of the experience rather than reassured. It got Ehsan thinking: what if the self-driving car could really explain itself?

The success of deep learning is due to tinkering: the best neural networks are tweaked and adapted to make better ones, and practical results have outpaced theoretical understanding. As a result, the details of how a trained model works are typically unknown. We have come to think of these models as black boxes. [ . . . ]