Detection, Inspection, Return: An Object-Based Classification and Metric of Fixations in Complex Scenes
Abstract
Analyses of human gaze behaviour towards complex scenes typically aim to explain heatmaps or scan-paths. While heatmaps lack temporal information, scan-paths aim for a level of detail that is often impractical. We introduce a novel approach, based on the premise that most fixations target objects and do so in meaningfully different ways, depending on temporal context: Detection fixations (D) foveate an object for the first time; Inspection fixations (I) successively target object details; and Return fixations (R) revisit a previously fixated object after going elsewhere. To test the hypothesis that these classes capture distinct fixation profiles, we reanalysed a large dataset of scene fixations. We computed separate heatmaps for D, I, and R and found significantly higher inter-observer consistency within than between classes. Across fixations landing on different semantic features, the proportion of D, I, and R fixations varied consistently, and a semantic salience model trained to predict each type of fixation independently learned diverging distributions of feature weights. Further, we found a shift from D to I and R across viewing time, in line with previous findings on ambient and focal viewing modes. We tested and confirmed that the dynamics of this shift varied as a function of trial duration. Finally, we highlight the recent application of the D, I, R classification as a metric for gaze comparisons in the context of dynamic scenes, in which scan-path similarity metrics fail. We propose the D, I, and R classification as a computationally simple yet powerful tool to classify spatiotemporal aspects of scene fixations in an object-based and intuitive manner and provide well-documented code to implement it. Future research may explore potential functional differences between D, I, and R fixations.
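The D, I, R definitions in the abstract can be illustrated with a minimal Python sketch. This is not the authors' published code. It assumes each fixation has already been mapped to an object identifier (or `None` when it lands on no object), and the function name `classify_fixations` is hypothetical. It also assumes that a fixation off any object counts as "going elsewhere", so the next fixation on a previously seen object is labelled R; the paper may handle that case differently.

```python
from typing import Hashable, List, Optional, Sequence


def classify_fixations(
    object_ids: Sequence[Optional[Hashable]],
) -> List[Optional[str]]:
    """Label each fixation as "D", "I", "R", or None (no object fixated).

    object_ids holds one entry per fixation, in temporal order: the ID of
    the fixated object, or None if the fixation landed on no object.
    """
    seen = set()      # objects fixated at least once so far
    previous = None   # object targeted by the immediately preceding fixation
    labels: List[Optional[str]] = []

    for obj in object_ids:
        if obj is None:
            labels.append(None)   # background fixation: no class assigned
        elif obj == previous:
            labels.append("I")    # Inspection: stays on the same object
        elif obj in seen:
            labels.append("R")    # Return: seen before, gaze went elsewhere
        else:
            labels.append("D")    # Detection: first fixation on this object
        if obj is not None:
            seen.add(obj)
        # Assumption: a background fixation resets `previous`, so coming
        # back to the same object afterwards is treated as a Return.
        previous = obj

    return labels


# Toy sequence of fixated objects for one observer and one scene.
sequence = ["cup", "cup", "face", None, "cup", "face", "face"]
print(classify_fixations(sequence))
```

For the toy sequence, the first fixation on each object is a Detection, consecutive fixations on the same object are Inspections, and later revisits are Returns. Per-class counts or heatmaps could then be built by grouping fixations on these labels.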
02.02.2026 20:45