Why are some people better at recognising emotions?

There is vast variation in how accurately people identify emotions, but most of this research has focused on either emotional faces or voices in isolation; little has used audio-visual stimuli. As a result, we do not know how individual differences affect the integration of visual and auditory cues during emotion recognition. To fully understand emotion recognition in the real world, and how it varies between individuals, we need to examine these differences in the context of audio-visual stimuli.

Background and Justification

Exploring emotion recognition with multimodal stimuli is crucial because this is how we process, recognise, and experience emotions every day, and it gives us insight into how individuals are helped or hindered by individual differences. We need to establish how visual and auditory cues contribute to emotion recognition and how they are integrated. One way of addressing this is to compare emotion recognition when the cues are congruent (displaying the same emotion) and incongruent (displaying conflicting emotions). Previous studies have typically examined congruence using emotional faces and situational cues, finding that people are quicker and more accurate in emotionally congruent situations (Aguado et al., 2018; Righart & de Gelder, 2008; de Gelder et al., 2005), but there is little work on congruent and incongruent emotional faces and voices, and even less on how childhood trauma and psychopathy traits affect their processing. Emotions are multimodal in daily life, with emotion recognition combining facial and vocal expressions, yet the multisensory nature of emotion processing has barely been explored (Collignon et al., 2008; Wang et al., 2021).

Our key focus for this research is how individual differences (childhood trauma and psychopathy traits) affect the integration of visual and auditory information during multimodal emotion recognition. To address this, we will use congruent and incongruent audio-visual stimuli and ask participants to attend to one modality at a time, allowing us to examine how they process the overall and modality-specific emotional information and how individual differences affect performance. Participants will also rate the intensity of the expressed emotion: previous work found that congruence affects intensity ratings, with congruent stimuli eliciting higher ratings (Föcker et al., 2011), so intensity ratings offer further insight into the processing of congruent and incongruent emotions.

Existing integration work shows that visual and auditory emotional information influence each other's processing (Gerdes et al., 2014; Müller et al., 2012), but we do not know how much this varies between individuals. Different individuals may preferentially focus on one modality (faces or voices), or display a bias towards certain emotions regardless of the modality in which they are expressed. For example, some individuals may be more sensitive to cues of anger whether these are expressed visually or vocally.

Because the vocal emotion literature is scarce, the only hypotheses we can form are drawn from the facial emotion literature. In line with facial emotion models, we would expect childhood trauma to be associated with heightened sensitivity to threatening emotions (e.g., anger), and psychopathy to show a general deficit across all emotions, with particular deficits for sadness and fear. For childhood trauma, this might mean better recognition and/or higher intensity ratings in congruent trials, and more errors or atypical intensity ratings in incongruent trials where the unattended modality expresses the threatening emotion (e.g., anger). For psychopathy, it might mean more errors and/or atypical intensity ratings for sadness and fear in both the congruent and incongruent conditions.


Participants will complete an emotion recognition task in which they see audio-visual stimuli expressing either congruent (e.g., happy face, happy voice) or incongruent emotions (e.g., happy face, angry voice). They will be asked to attend to either the face or the voice and then to rate how intense the expressed emotion is on a 10-point Likert scale. We will also measure personality and alexithymia, as these are often linked to the key variables, and measuring them will allow us to control for this potential covariance. The individual differences questionnaires will be the Childhood Trauma Questionnaire Short Form (CTQ-SF), the Self-Report Psychopathy Scale Short Form (SRP-SF), the Mini-International Personality Item Pool (Mini-IPIP), and the Toronto Alexithymia Scale (TAS-20).
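As an illustration of the factorial trial structure described above (face emotion × voice emotion × attended modality), the design can be sketched as follows. The emotion set used here is hypothetical, chosen only for illustration; it is not the final stimulus list:

```python
from itertools import product

# Hypothetical emotion set; the actual stimulus database may differ.
EMOTIONS = ["happy", "angry", "sad", "fearful"]

def build_trials():
    """Cross face emotion x voice emotion x attended modality.

    Trials where the face and voice express the same emotion are
    congruent; all other pairings are incongruent.
    """
    trials = []
    for face, voice, attend in product(EMOTIONS, EMOTIONS, ["face", "voice"]):
        trials.append({
            "face": face,
            "voice": voice,
            "attend": attend,
            "congruent": face == voice,
        })
    return trials

trials = build_trials()
print(len(trials))  # 4 face x 4 voice x 2 attended modalities = 32 cells
print(sum(t["congruent"] for t in trials))  # 4 emotions x 2 modalities = 8
```

With four emotions, incongruent cells outnumber congruent ones three to one, which is worth bearing in mind when balancing trial counts per condition.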

The congruent and incongruent stimuli have already been generated from a pre-validated database of audio-visual recordings. Everything for this study is set up and ready to run once we have participant funding.

Sample size

The desired sample size is 400 participants. This number was chosen because a comparable childhood trauma and emotion recognition study used a sample of 360 participants (Tognin et al., 2020); as ours is an online study, we have the resources to reach and test more participants, hence the increase to 400. Individual differences research also requires larger samples and greater power, because individual-difference effects tend to be much smaller than main effects.
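To illustrate the sensitivity this sample gives for individual-difference correlations, a rough analytic power calculation can be done using the Fisher z approximation. The target effect size (r = .15) is an assumption chosen for illustration, not a registered value:

```python
import math
from statistics import NormalDist

def power_for_correlation(r, n, alpha=0.05):
    """Approximate two-tailed power to detect a correlation of size r
    with sample size n, via the Fisher z transformation."""
    nd = NormalDist()
    z = math.atanh(r)              # Fisher z of the target correlation
    se = 1 / math.sqrt(n - 3)      # standard error of Fisher z
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(z / se - z_crit)

print(round(power_for_correlation(0.15, 400), 2))  # ~0.85
```

Under this assumption, 400 participants gives roughly 85% power to detect a small correlation of r = .15 at α = .05, which is consistent with the rationale for a larger individual-differences sample.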

Study costs

We are asking for £4,000. This was calculated using the Prolific cost calculator: it covers payments to 400 participants at £7.50 per hour (£3,000) plus the 33% service fee (£1,000). This funding will allow us to produce reliable data with a better chance of replication.
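The budget arithmetic can be checked directly. Note the assumptions here: a one-hour session per participant, and the service fee computed as one third of participant payments, which is what the £1,000 figure implies:

```python
participants = 400
rate_per_hour = 7.50        # GBP per participant per hour
hours_each = 1              # assumed session length of one hour

payments = participants * rate_per_hour * hours_each  # participant payments
service_fee = payments / 3  # one third of payments, matching the stated fee
total = payments + service_fee

print(payments, service_fee, total)  # 3000.0 1000.0 4000.0
```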

Preregistration and open research

Aspects of this study have been preregistered on AsPredicted; the preregistration is available at https://aspredicted.org/gx2wk.pdf. The study's findings, materials, analysis code, and data will be made openly available via the Open Science Framework, making them accessible to all.


  • Aguado, L., Martínez-García, N., Solís-Olce, A., Dieguez-Risco, T., & Hinojosa, J. A. (2018). Effects of affective and emotional congruency on facial expression processing under different task demands. Acta Psychologica, 187, 66-76.
  • Collignon, O., Girard, S., Gosselin, F., Roy, S., Saint-Amour, D., Lassonde, M., & Lepore, F. (2008). Audio-visual integration of emotion expression. Brain Research, 1242, 126-135.
  • de Gelder, B., Meeren, H. K., Righart, R., & Van den Stock, J. (2005). Beyond the face: Exploring rapid influences of context on face processing. Perception ECVP Abstract, 34.
  • Föcker, J., Gondan, M., & Röder, B. (2011). Preattentive processing of audio-visual emotional signals. Acta Psychologica, 137(1), 36-47.
  • Gerdes, A., Wieser, M. J., & Alpers, G. W. (2014). Emotional pictures and sounds: A review of multimodal interactions of emotion cues in multiple domains. Frontiers in Psychology, 5, 1351.
  • Müller, V. I., Cieslik, E. C., Turetsky, B. I., & Eickhoff, S. B. (2012). Crossmodal interactions in audiovisual emotion processing. NeuroImage, 60(1), 553-561.
  • Righart, R., & de Gelder, B. (2008). Recognition of facial expressions is influenced by emotional scene gist. Cognitive, Affective, & Behavioral Neuroscience, 8(3), 264-272.
  • Tognin, S., Catalan, A., Modinos, G., Kempton, M. J., Bilbao, A., Nelson, B., … & Valmaggia, L. R. (2020). Emotion recognition and adverse childhood experiences in individuals at clinical high risk of psychosis. Schizophrenia Bulletin, 46(4), 823-833.
  • Wang, Z., Chen, M., Goerlich, K. S., Aleman, A., Xu, P., & Luo, Y. (2021). Deficient auditory emotion processing but intact emotional multisensory integration in alexithymia. Psychophysiology, e13806.