A dataset of 133,588 human object recognition judgments as a function of viewing time for 4,771 images from ImageNet and ObjectNet. The human judgments allow us to calculate an image-level metric for recognition difficulty, based on the minimum exposure, in milliseconds, that humans needed to reliably classify an image correctly. This metric lets us explore dataset difficulty distributions and model performance as a function of recognition difficulty. Our results indicate that object recognition datasets are skewed toward easy examples, and they are a first step toward developing tools for shaping datasets as they are gathered, so that they fill in the missing class of hard examples. Read more in our upcoming publication!
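To make the idea of a minimum-exposure difficulty metric concrete, here is a minimal sketch of how such a value could be computed for a single image. It is an illustration under stated assumptions, not the procedure used to build the dataset: the function name `min_viewing_time`, the `(exposure_ms, correct)` record layout, the 0.5 accuracy threshold used as the meaning of "reliably", and the example exposure durations are all hypothetical.

```python
from collections import defaultdict


def min_viewing_time(judgments, threshold=0.5):
    """Return the shortest exposure (ms) at which the fraction of correct
    judgments reaches `threshold`, or None if no exposure reaches it.

    `judgments` is an iterable of (exposure_ms, correct) pairs for ONE image.
    Both the record layout and the threshold are assumptions for illustration.
    """
    # Group the correct/incorrect outcomes by exposure duration.
    by_exposure = defaultdict(list)
    for exposure_ms, correct in judgments:
        by_exposure[exposure_ms].append(bool(correct))

    # Scan exposures from shortest to longest and stop at the first one
    # whose accuracy meets the threshold.
    for exposure_ms in sorted(by_exposure):
        outcomes = by_exposure[exposure_ms]
        if sum(outcomes) / len(outcomes) >= threshold:
            return exposure_ms
    return None


# Hypothetical judgments for one image: (exposure in ms, answer was correct?)
example = [
    (17, False), (17, False),
    (50, True), (50, False),
    (150, True), (150, True),
]

if __name__ == "__main__":
    print(min_viewing_time(example))
    print(min_viewing_time(example, threshold=0.75))
```

Under this sketch, a larger returned value marks a harder image, and an image that never reaches the threshold returns `None`. A stricter definition could also require the threshold to hold at every longer exposure.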
This work was supported, in part, by the Center for Brains, Minds and Machines (CBMM), NSF STC award CCF-1231216, the MIT-IBM Brain-Inspired Multimedia Comprehension project, the Toyota Research Institute, and the SystemsThatLearn@CSAIL initiative.