
Research Brief of Hickman et al.’s (2022) Automated Video Interview Personality Assessments…

Andrei Ion, Lauren Moran, & Maria Kraimer

This study examined the psychometric properties (reliability, validity, and generalizability) of automated video interview personality assessments (AVI-PAs) across different interview contexts. The authors focused on personality constructs because personality predicts job performance and is commonly assessed by current AVI vendors. The authors proposed a conceptual and operational model for understanding AVIs and assessing the construct validity of their scores, which can also be used to evaluate AVIs for measuring knowledge, skills, abilities, and other characteristics (KSAOs) beyond personality. Although the results of their study provided some validity evidence for using AVI personality assessments in the interview context, specifically for assessments trained with observer reports rather than self-reports of personality, the authors highlight that organizations should proceed cautiously given some mixed findings.

What Are AVIs?

Automated video interviews (AVIs) use machine learning algorithms to evaluate potential hires. Organizations employ AVIs for their time and cost savings. These algorithms analyze what interviewees say (verbal cues), how they say it (paraverbal cues), and how they act (nonverbal cues) during the interview. Because we cannot directly observe someone’s innate personality traits, such as the Big Five (conscientiousness, openness, agreeableness, emotional stability, and extraversion), AVIs measure the way traits manifest in subtle behaviors during conversations. Thus, based on the assumption that individuals express latent constructs, in this case personality traits, via relatively similar behavioral manifestations, AVIs represent a potentially valid and consistent method for evaluating candidates’ personalities, providing a promising tool for organizations aiming for a fairer hiring process.

Study Method

The authors collected four samples of mock video interviews, all assessing the interviewees’ Big Five personality traits based on self-reports and interviewer observations. The interviewee participants were MTurk workers in Sample 1 and undergraduate students in Samples 2 through 4. The research team conducted mock video interviews with participants in each sample (i.e., the “interviewees”). The interviewees responded to different interview questions across the samples and completed a self-report measure of personality. The interviewers also rated the interviewees’ personality based on their responses during the interviews. Machine learning models were trained to predict interviewees’ self- and interviewer-reported personality traits in the first three samples; models trained on Samples 1–3 were then applied to Sample 4 to assess interviewees’ personality. In other words, the machine learning models predicted the self- and interviewer-reported personality traits (the criteria) from the interviewees’ responses to the different interview questions (the predictors). To do this, the authors used R software (R Core Team, 2021) with its “caret” package (Kuhn, 2008), and nested k-fold cross-validation was used in these samples to assess validity evidence. Nested k-fold cross-validation is a machine learning technique for model evaluation and hyperparameter tuning. The researchers also explored the specific verbal, paraverbal, and nonverbal cues contributing to AVI personality assessments and examined the relationship between AVI personality assessment and the student participants’ academic outcomes (Samples 2–4).
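To make the evaluation scheme concrete, the sketch below illustrates nested k-fold cross-validation in plain Python: an outer loop of folds estimates out-of-sample performance, and an inner loop of folds, run only on each outer training set, selects a tuning parameter. The toy “shrinkage” model, the alpha parameter, and the data are hypothetical stand-ins; the authors’ actual models were built in R with the “caret” package and predicted personality ratings from interview-derived features.

```python
# Illustrative sketch of nested k-fold cross-validation (hypothetical
# toy model and data; NOT the authors' R/caret implementation).
import random

def k_folds(n, k):
    """Yield (train_idx, test_idx) index pairs for k roughly equal folds."""
    idx = list(range(n))
    random.Random(0).shuffle(idx)          # fixed seed for reproducibility
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def fit_predict(y_train, alpha):
    """Toy model: predict alpha times the mean of the training targets."""
    return alpha * sum(y_train) / len(y_train)

def mse(y_true, pred):
    """Mean squared error of a single constant prediction."""
    return sum((y - pred) ** 2 for y in y_true) / len(y_true)

def nested_cv(y, outer_k=5, inner_k=3, alphas=(0.5, 0.8, 1.0)):
    """Outer folds estimate performance; inner folds tune alpha."""
    outer_scores = []
    for train, test in k_folds(len(y), outer_k):
        y_tr = [y[i] for i in train]
        # Inner loop: choose alpha using only the outer training data,
        # so the outer test fold never influences tuning.
        best_alpha = min(
            alphas,
            key=lambda a: sum(
                mse([y_tr[i] for i in te],
                    fit_predict([y_tr[i] for i in tr], a))
                for tr, te in k_folds(len(y_tr), inner_k)
            ),
        )
        # Refit on the full outer training set, score on the held-out fold.
        pred = fit_predict(y_tr, best_alpha)
        outer_scores.append(mse([y[i] for i in test], pred))
    return sum(outer_scores) / len(outer_scores)

# Hypothetical ratings on a 1-5 scale, e.g., interviewer-rated extraversion.
avg_mse = nested_cv([3.2, 3.8, 4.1, 2.9, 3.5, 4.0, 3.3, 3.7, 3.9, 3.1,
                     3.6, 4.2, 2.8, 3.4, 3.8, 3.0, 3.9, 3.5, 3.2, 3.7])
print(round(avg_mse, 3))
```

The key design point is the strict separation: hyperparameters are tuned inside each outer training set, so the outer test folds give a less optimistic estimate of how the model would perform on new interviewees.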

The key findings of the study are:

  1. Reliability: Although these models show promise, there is variability in reliability across traits and methods. On average, self-report and interviewer-report models showed similar test–retest reliability. By trait, self-report models were most reliable for emotional stability, whereas interviewer-report models were most reliable for extraversion and conscientiousness.
  2. Validity: Overall, the AVI assessments of personality had better validity when trained on interviewer observations, rather than self-report assessments, of personality.
  • Convergent validity (the association between measures of similar constructs) varied based on the trait being measured and the model used. Among models trained on self-reports, the evidence was mixed; models trained on interviewer reports generally showed stronger convergence, indicating a significant association between AVI personality assessments and interviewer-rated personality traits.
  • Linking traits to academic outcomes: In the student samples, the AVI assessment of personality traits correlated with academic outcomes consistent with previous research findings on the relationship between personality and academic performance. For example, conscientiousness was positively correlated with high school GPA, SAT, and ACT scores. Further, the AVI scores often provided incremental validity beyond self- and interviewer-reported traits.
  3. Most models trained on interviewer reports favored verbal behavior (e.g., length of responses) over paraverbal (e.g., pitch, loudness) or nonverbal behavior (e.g., facial expressions and head pose). The models included theoretically relevant indicators (i.e., predictors regarded as informative for a specific underlying personality trait) across multiple samples, which explains why the models remained valid when applied to new interview questions. For example, the AVIs from Samples 1–3 judged interviewees as more extraverted when they spoke louder and faster and smiled more. The three AVIs judged interviewees as more conscientious when they used longer words and fewer assent words (e.g., “OK,” “yes”), and as more agreeable when they talked about helping people.

Contributions and Practical Implications

This study contributes to bridging the practice–research gap in our understanding of AVIs, bringing empirical scrutiny to an area that has seen widespread adoption by organizations without corresponding scientific evaluation, standards, or best practices. The results of this study indicated that AVI assessments of personality had stronger construct validity (the extent to which a measure accurately represents and assesses the theoretical trait or theoretical concept it is supposed to measure) and generalized to new interview questions when the machine learning models were trained with interviewer reports, rather than self-reports, of personality. This key finding has several direct implications for the world of HR, recruitment, and selection:

  • Although this study provides initial evidence that AVI-PAs can be valid for some traits, the evidence is mixed, so organizations should proceed cautiously with using AVI-PAs. The authors recommended that AVIs be developed to assess more visible traits, such as agreeableness and extraversion, because more relevant behavioral cues for those traits are available during interviews.
  • If the goal of using AVI-PAs is to overcome the limitations of using self-reported personality traits in selection, then AVI-PA machine learning models should be developed to predict interviewer reports, rather than self-reports, of personality.
  • Although standardizing interview questions might enhance psychometric properties, the flexibility of using various questions might still be justifiable. The study results suggested that the psychometric properties of AVI-PAs are relatively consistent (for some traits) when models trained on one set of questions are used to assess interviewees who were asked a different set of questions.

Limitations to Consider

The conclusions of this study should be considered in light of a few limitations. One is that the study did not examine the relationship between AVI-PA and work-relevant outcomes, such as job performance, a key criterion in personnel selection. It also needs to be recognized that AVI-PA may not be appropriate for every job role; HR managers would need to ensure that personality traits are relevant criteria for any given job based on job analysis. This study also did not address issues of potential bias. Future research is needed to investigate whether AVI-PA results in any adverse impact on underrepresented demographic groups to ensure the legality and ethicality of using AVI-PA. As organizations become more reliant on AVIs, they must ensure the tools used are not only reliable and valid but also fair, unbiased, and transparent. The current investigation provides a preliminary illustration of how AVIs can be developed and validated; however, this study relied primarily on student samples in mock interviews. Thus, additional research that includes actual interviewees being considered for job openings in organizations with established selection criteria is needed. 

For more details, please read the full article:

Hickman, L., Bosch, N., Ng, V., Saef, R., Tay, L., & Woo, S. E. (2022). Automated video interview personality assessments: Reliability, validity, and generalizability investigations. Journal of Applied Psychology, 107(8), 1323–1351.

Andrei Ion, Lauren Moran, and Maria Kraimer were members of the 2023–2024 Scientific Affairs Committee; ChatGPT was used to develop the preliminary draft.


