# Knowledge activation and perspective taking in visual scene categorisation

We understand our visual world fundamentally by categorising it. This project addresses two key challenges of visual categorisation: (1) how it uses different types of knowledge (taxonomic, i.e. object- and context-based, versus functional, i.e. action-based), and (2) the role of the viewing perspective, or frame of reference (allocentric, third-person, versus egocentric, first-person).

## Background

Our brain gathers and represents different types of information to generate predictions and orient behaviour in real-world visual scenes (e.g., cooking in kitchens). Predictions may arise from expectations grounded in semantic memory – or the schema [1,2] – of the scene. For decades, research on scene categorisation has considered exclusively the role of taxonomic knowledge within the schema (such as scene contexts, e.g., “kitchen”, and typical objects, e.g., “saucepans, spoons, …”). More recently, functional knowledge (about actions typically performed in a scene, e.g., “cooking, washing dishes, …”) has been proposed [3] as a fundamental principle driving scene categorisation. Taxonomic and functional strategies rely on ventral and dorsal neural pathways, respectively [4,5], which suggests that they are dissociated.

Spatial information can be encoded in two frames of reference [6,7]. An egocentric perspective (Fig. 1a), where the viewer’s head serves as the reference [8], offers a natural, dynamic view, with access to potential interactions with the environment. This is, therefore, the view of the agent, represented in dorsal brain regions related to motion, reaching and goal-directed actions [9-11]. An allocentric perspective (Fig. 1b), in which object relations are not defined by the viewer’s natural point of view, tends to be fixed (e.g., a CCTV camera view), with limited possibilities for interaction. This frame of reference is represented in ventral brain regions that carry information about objects and scene contexts [12]. Egocentric and allocentric perspectives should, therefore, promote the use of functional and taxonomic knowledge, respectively. Understanding whether and how these neural processing differences underlie behavioural differences is important to clarify how the categorisation process works.

## Method

The project includes six experiments, programmed in PsychoPy and hosted on the Pavlovia platform, which use 2D rendered images of 3D computer-generated scenes (e.g., kitchens, bedrooms, bathrooms, lounges) from the SUNCG and 3D-FRONT databases [13,14]. The experiments have a repeated-measures design and directly compare functional (action-based) versus taxonomic (specifically, object-based) categorisation under egocentric or allocentric frames of reference. They involve categorisation of egocentric and allocentric scene images, primed with action or object words (50% action and 50% object trials). Participants judge whether one of two actions or objects is typically associated with the scene (a typical action to perform in the scene, or a typical object found in the scene); the associations were evaluated in a pilot study. One third of the images are typically associated with one word, one third with the other word, and one third with neither. In Experiments 1, 3 and 5 (Fig. 2), the scene image is presented until response, and response times are analysed. In Experiments 2, 4 and 6 (Fig. 3), it is presented for 100 ms and then masked; these experiments also analyse response accuracy.
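The trial composition described above (50% action-word and 50% object-word primes, with thirds of image–word association within each prime type) can be sketched as a balanced trial-list generator. This is a minimal illustration only, not the actual PsychoPy experiment code; the total trial count and the dictionary fields are assumptions, since the exact numbers per experiment are not given here.

```python
import random

def build_trial_list(n_trials=120, seed=0):
    """Build a balanced trial list: half the trials use an action-word
    prime and half an object-word prime; within each prime type, one
    third of the scene images are typically associated with word A,
    one third with word B, and one third with neither word.
    (Illustrative counts and field names, not taken from the study.)"""
    rng = random.Random(seed)
    trials = []
    for prime in ("action", "object"):
        per_type = n_trials // 2
        third = per_type // 3
        for assoc, count in (("word_A", third),
                             ("word_B", third),
                             ("neither", per_type - 2 * third)):
            trials += [{"prime": prime, "association": assoc}] * count
    rng.shuffle(trials)  # randomise presentation order
    return trials
```

A generator like this makes the 50/50 and thirds constraints explicit and auditable before the list is handed to the presentation loop.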

Experiments 1-2 are baseline experiments investigating the contributions of taxonomic and functional knowledge to categorisation in egocentric versus allocentric frames of reference. They present images without any of the labelled associated objects (e.g., kettle), but with key items that afford the labelled associated action (e.g., stove, microwave, etc. for “cooking”) and that are important in identifying the scene as a typical exemplar of the corresponding category. Participants judge action or object association with the scene category.

Experiments 3-4 further examine action-based and object-based categorisation by disentangling knowledge activation about a given scene category from the presence, within the image, of specific visual information concerning the labelled object or the labelled action associated with that category. They thus compare images that contain the labelled associated object and the key items affording the labelled associated action with images that do not contain this visual information (Fig. 4). Participants judge association with the scene category.

Experiments 5-6 use the same images as Experiments 3-4 to investigate the influence of task instructions, and further examine reliance on the scene category versus the specific visual information. To this purpose, participants either judge action or object association with the scene category (as in all the other experiments) or judge whether one of the two labelled objects is present in, or one of the two labelled actions could be performed in, the specific scene image.

Hypotheses: We expect better performance for action-based than object-based categorisation with egocentric images, and the opposite pattern with allocentric images, with a greater effect when the image contains visual information directly related to the labelled object or action. Ethical approval: University of Keele (REC Number: PS-190107).

## Pre-Registration & Open Access

Pre-registration is available at https://aspredicted.org/jj3fb.pdf. Anonymised raw data and the analysis code will be made publicly available on the OSF repository upon publication. Findings will be disseminated through seminars, conferences and papers.

## Sample size, costs and recruitment criteria

Based on power calculations run in G*Power 3.1.3 [15], with an estimated effect size f = 0.14, α = 0.05, and a correlation among repeated measures of 0.03, we need to recruit a minimum of 225 participants in each experiment to achieve a power of 0.95. We therefore plan to recruit 1,485 participants in total (6 × 225, plus 10% expected replacements for poor data quality). Participants in each experiment complete one 30-minute session and will be compensated £3.75. Total cost: £7,796.25, including Prolific fees.
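A power calculation of this kind can be cross-checked in code. The sketch below approximates the noncentrality formula G*Power uses for repeated-measures (within-factors) ANOVA, λ = f² · n · m / (1 − ρ); the number of repeated measurements `m` and the effect's numerator degrees of freedom `df1` are illustrative assumptions here, since the exact cell structure of the design is not stated in the text, so the result will not necessarily reproduce the 225 reported above.

```python
from scipy.stats import f as f_dist, ncf

def rm_anova_power(n, f_effect=0.14, m=4, rho=0.03, alpha=0.05, df1=1):
    """Approximate power of a within-subjects F test with n participants.

    Uses the noncentrality parameter lambda = f^2 * n * m / (1 - rho),
    as in G*Power's 'ANOVA: repeated measures, within factors' option.
    m and df1 are assumed values for illustration only.
    """
    lam = f_effect ** 2 * n * m / (1 - rho)
    df2 = (n - 1) * df1                       # error degrees of freedom
    f_crit = f_dist.ppf(1 - alpha, df1, df2)  # critical F under H0
    return 1 - ncf.cdf(f_crit, df1, df2, lam) # P(F > f_crit) under H1
```

Scanning `n` upward until the returned power exceeds 0.95 yields a minimum sample size under these assumptions; exact agreement with G*Power depends on the design parameters entered there.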

Inclusion and exclusion criteria: native English speakers, aged 18-40, with normal or corrected-to-normal visual acuity, no language-related disorders, and no history of epilepsy or seizures. Participants can take part in only one experiment.

## Research Impact

This research will clarify the contributions of different types of knowledge to visual categorisation. In particular, it will enhance understanding of whether and how a specific perspective, which does or does not favour action representation, modulates the use of functional and taxonomic knowledge when categorising a scene. Moreover, it will inform future lab-based studies, which will aim to extend the results by using more precise timing and adding eye tracking to examine information gathering within the scene images. This is crucial for guiding future research design decisions when weighing online platforms against lab-based research.

## References

[1] Bar, M. (2004). Visual objects in context. Nature Reviews Neuroscience, 5(8), 617-629.

[2] Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14(2), 143-177.

[3] Greene, M. R., Baldassano, C., Esteva, A., Beck, D. M., & Fei-Fei, L. (2016). Visual scenes are categorized by function. Journal of Experimental Psychology: General, 145(1), 82-94.

[4] James, T. W., Humphrey, G. K., Gati, J. S., Menon, R. S., & Goodale, M. A. (2002). Differential effects of viewpoint on object-driven activation in dorsal and ventral streams. Neuron, 35(4), 793-801.

[5] Budisavljevic, S., Dell’Acqua, F., & Castiello, U. (2018). Cross-talk connections underlying dorsal and ventral stream integration during hand actions. Cortex, 103, 224-239.

[6] Burgess, N. (2006). Spatial memory: how egocentric and allocentric combine. Trends in Cognitive Sciences, 10(12), 551-557.

[7] Zaehle, T., Jordan, K., Wüstenberg, T., Baudewig, J., Dechent, P., & Mast, F. W. (2007). The neural basis of the egocentric and allocentric spatial frame of reference. Brain Research, 1137, 92-103.

[8] Landis, T. (2000). Disruption of space perception due to cortical lesions. Spatial Vision, 13(2-3), 179-191.

[9] Galati, G., Lobel, E., Vallar, G., Berthoz, A., Pizzamiglio, L., & Le Bihan, D. (2000). The neural basis of egocentric and allocentric coding of space in humans: a functional magnetic resonance study. Experimental Brain Research, 133(2), 156-164.

[10] Zaehle, T., Jordan, K., Wüstenberg, T., Baudewig, J., Dechent, P., & Mast, F. W. (2007). The neural basis of the egocentric and allocentric spatial frame of reference. Brain Research, 1137, 92-103.

[11] Ruotolo, F., Ruggiero, G., Raemaekers, M., Iachini, T., van der Ham, I. J. M., Fracasso, A., & Postma, A. (2019). Neural correlates of egocentric and allocentric frames of reference combined with metric and non-metric spatial relations. Neuroscience, 409, 235-252.

[12] Committeri, G., Galati, G., Paradis, A. L., Pizzamiglio, L., Berthoz, A., & Le Bihan, D. (2004). Reference frames for spatial cognition: different brain areas are involved in viewer-, object-, and landmark-centered judgments about object location. Journal of Cognitive Neuroscience, 16(9), 1517-1535.

[13] Fu, H., Cai, B., Gao, L., Zhang, L., Li, C., Xun, Z., … & Zhang, H. (2020). 3D-FRONT: 3D Furnished Rooms with layOuts and semaNTics. arXiv preprint arXiv:2011.09127.

[14] Song, S., Yu, F., Zeng, A., Chang, A. X., Savva, M., & Funkhouser, T. (2017). Semantic scene completion from a single depth image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1746-1754).

[15] Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191.

Authors:

Krystian Ciesielski 1, Andrew Webb 2, Sara Spotorno 1

1 Keele University
2 University of Glasgow