SAIL Room - 111 Levin Building, 425 S. University Avenue
Department of Neuroscience
Comparison of object recognition behavior in humans, monkeys, and deep neural networks
At the core of human high-level vision is the ability to rapidly and accurately recognize objects in spite of identity-preserving image transformations. Models that strive to reproduce this invariant object recognition ability would need to accurately capture human behavior, including making the same patterns of errors over all images. Here, we applied this straightforward visual “Turing test” to the leading feedforward computational models of human vision (hierarchical convolutional neural networks, HCNNs), which are optimized for performing object recognition, and to a leading animal model (rhesus macaques) over the broad behavioral domain of basic-level object recognition. Using high-throughput data collection systems for both human and monkey psychophysics, we were able to collect ~1 million behavioral trials, estimating response patterns across hundreds of images. In line with previous work, macaques and HCNNs were highly consistent with humans in their pattern of object confusions. However, at the resolution of individual images, we found that all tested HCNNs produced classification responses significantly less consistent with human and monkey responses. This gap in behavioral consistency of object recognition at the image level could not be easily attributed to low-level statistics of the images or object viewing meta-parameters; nor was this gap rescued by primate-like retinal input sampling, alternative output decoders, or additional model training on images similar to those used in behavioral testing. Crucially, objects and images were not optimized to be adversarial to HCNNs. These results highlight a general failure of the current feedforward, supervision-trained HCNN model class to fully replicate the image-level behavioral patterns of primates. In the future, high-resolution behavioral metrics could serve as a strong constraint for discovering models that more precisely capture the neural mechanisms of primate object recognition.
A pizza lunch will be served.