Enhancing Judgment: The Case for Human–Algorithm Collaboration

Jon C. Willford, Edison Electric Institute

Algorithms are increasingly used in assessment methods and technologies. The word algorithm is used colloquially in different ways, but in this context it generally refers to computer programs that utilize complex statistical models to combine tens, hundreds, or even thousands of variables to predict an outcome (e.g. job performance) or automate a process (e.g. eliminate unqualified applicants).

Although algorithms clearly have the potential to improve our decisions, industrial-organizational (I-O) psychologists and those in related fields have yet to fully consider how to optimize the collaboration between human decision makers and algorithmic decision aids. Viewing the human–algorithm relationship as a collaboration is fitting because better decision-making outcomes are possible when both are involved than when either makes decisions alone.

A Promising Example of Decision Making in Medicine

Have you ever wondered how accurate doctors and nurses actually are in predicting future patient outcomes? This is what a group of researchers wanted to find out in a recent study of medical decision making published in JAMA. It turns out they are pretty, pretty good, but their accuracy depends on things like their confidence and the outcome they are trying to predict. The researchers also wanted to know how doctors’ and nurses’ predictions compared to other common diagnostic tools, such as statistical models. So they did something cool. They created a model in which they included the doctors’ and nurses’ predictions alongside the other variables in the statistical model. What they found is that when these human predictions were included in the model, accuracy in predicting mortality outcomes was significantly better than the accuracy of the model alone (Detsky et al., 2017).
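To make the comparison concrete, here is a minimal sketch of how a model score alone can be compared against a score that also uses the clinician's prediction. It assumes discriminative accuracy is measured with the area under the ROC curve (AUC), and it combines the two sources with a simple unweighted sum, which is only a stand-in for entering the human prediction as a predictor in a fitted model. All names and numbers are hypothetical and invented for illustration; none of them come from the study.

```python
# Toy illustration only: every number below is invented, not study data.

def auc(outcomes, scores):
    """Share of (positive, negative) pairs in which the positive case
    has the higher score; tied pairs count as half."""
    positives = [s for y, s in zip(outcomes, scores) if y == 1]
    negatives = [s for y, s in zip(outcomes, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# 1 = patient died, 0 = patient survived (hypothetical outcomes)
died = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical predicted risk of death, in percent
model_risk = [90, 70, 40, 60, 50, 30, 20, 10]
clinician_risk = [80, 50, 70, 30, 20, 40, 10, 30]

# Assumption: an unweighted sum stands in for a model that includes
# the clinician's prediction as an additional predictor.
combined_risk = [m + c for m, c in zip(model_risk, clinician_risk)]

print("model alone:    ", auc(died, model_risk))
print("clinician alone:", auc(died, clinician_risk))
print("combined:       ", auc(died, combined_risk))
```

In this made-up data each source misranks different patients, so the combined score separates the two groups better than either source does alone. Real data will not behave this neatly, and a fitted model would weight the two sources rather than simply add them.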

A Cautionary Tale of Bias From Recruitment

Although algorithms have the ability to provide superior predictive power (Grove, Zald, Lebow, Snitz, & Nelson, 2000; Kuncel, Klieger, Connelly, & Ones, 2013), by themselves they can be problematic in many ways (e.g. bias and error). A prominent and recent example is an experimental algorithm created by engineers at Amazon to provide recruiters with recommendations for whom to recruit for technical jobs. The algorithm was found to discriminate against women because the data used to train the algorithm’s model relied heavily on data from a previous 10-year period in which these jobs were largely held by men (Weissmann, 2018). The development of the algorithm was promptly discontinued, but had recruiters blindly relied on it, Amazon would have been subject to a range of unintended consequences related to discrimination.

Toward a Future of Collaborative Algorithmic Decision Aids

When considering how to optimize the pairing of humans and algorithmic decision aids, we can benefit by considering two shortcomings from previous research investigating how people use algorithms to make judgments and decisions. The first is a lack of attention paid to the qualities of the person that might influence whether they are willing to incorporate algorithmic outputs into their decision-making processes, or whether they will use the outputs appropriately. These qualities include things like knowledge and skills (e.g. statistics, critical thinking), abilities (e.g. numeracy, inductive reasoning), or others like personality and overconfidence.

People are notorious for trusting their own judgment rather than relying on the outputs of algorithms. This has been replicated in research over several decades and has recently been referred to as algorithm aversion (Dietvorst, Simmons, & Massey, 2015). Sources of this tendency that stem from the people who interact with algorithms have not been sufficiently explored. Below are some research questions intended to help address this shortcoming:

Are there individual differences that predict the utilization, or alternatively the rejection, of algorithms?
What knowledge, skills, and abilities are needed to effectively interpret and incorporate the outputs of algorithmic aids into decision-making processes?
What training do people need to enhance decision making when collaborating with algorithms?
How are the dynamics of this collaboration different in shared decision-making contexts?

The other shortcoming is a lack of focus on the qualities of the algorithm that affect the likelihood that a person will use it effectively. For example, there are aspects of technology that may enable or inhibit people's interaction with algorithms. Questions to guide research and practice aimed at addressing this shortcoming include:

How are qualities of algorithms, such as complexity, type, or transparency, related to human use and prediction accuracy?
How can we work with others from areas such as computer science to enhance the interpretability of algorithms and the relationship between their inputs and outputs?
How can we draw from, or collaborate with, others from areas such as human factors and its subfields (e.g. UX and HCI) to design better interfaces that decision makers can use?

These considerations about how people interact with and use algorithms have direct implications for research, technology design, and technology use in assessment. Currently, this area is wide open for research and practice aimed at optimizing human–algorithm collaboration in assessment. I-O psychologists are already playing a prominent role in revolutionizing how algorithms are developed and used in modern assessment and selection methods. As decisions aided by technology become more complex, our expertise continues to be essential in ensuring that algorithms are used to enhance our judgments and decisions, rather than to unnecessarily replace them or riddle them with (human or algorithmic) error or bias.

Have questions or comments? Connect with the author on LinkedIn or email comms@siop.org.

2019 LEC

We hope to see you at the Leading Edge Consortium on October 25-26 where we will continue exploring topics related to the future of assessment. Seats are limited so please be sure to register before the 2019 LEC is sold out!

References

Detsky, M. E., Harhay, M. O., Bayard, D. F., Delman, A. M., Buehler, A. E., Kent, S. A., ... Halpern, S. D. (2017). Discriminative accuracy of physician and nurse predictions for survival and functional outcomes 6 months after an ICU admission. Journal of the American Medical Association, 317(21), 2187-2195.

Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114-126.

Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12(1), 19-30.

Kuncel, N. R., Klieger, D. M., Connelly, B. S., & Ones, D. S. (2013). Mechanical versus clinical data combination in selection and admissions decisions: A meta-analysis. Journal of Applied Psychology, 98(6), 1060-1072.

Weissmann, J. (2018). Amazon created a hiring tool using A.I. It immediately started discriminating against women. Slate. Retrieved from https://slate.com/business/2018/10/amazon-artificial-intelligence-hiring-discrimination-women.html

***The 2019 Leading Edge Consortium (LEC) team is delighted to present the LEC Blog Series. This year’s LEC is focused on bringing together a diverse group of thought leaders who will explore the technologically evolving state and disruption of the assessment space. We are very excited about the LEC and thought we would start the conversation early this year. This LEC Blog Series is designed to present insights and issues related to the future of assessment. We hope to see you in October!
