Four Interpretations of a Correlation Coefficient: Expectancies, Vector Angles, Scatter Plots, and Slopes
Jeffrey Cucina and Julia Berger
Note. The views expressed in this paper are those of the authors and do not necessarily reflect the views of U.S. Customs and Border Protection or the U.S. Federal Government.
Generally, in statistics, the relationship between two variables, x and y, is represented by a Pearson product-moment correlation coefficient (r). However, it can be challenging for nontechnical audiences to interpret the coefficient without extensive statistical knowledge. I-O practitioners often find themselves in situations where they have to deliver the results of correlational analyses to key stakeholders (e.g., company executives, board members, clients) in a nontechnical way. When presenting the results of a criterion-related validation study of a newly developed personnel selection test to senior leadership, I-O practitioners sometimes discuss either the magnitude of the correlation coefficient (r; which in the 16 years of the authors’ combined practical experience has rarely made an impact) or a coefficient of determination derived by squaring r. I-O academics face a similar situation when explaining the magnitudes of correlation coefficients to new students and individuals without statistical knowledge. In fact, the issue of communicating I-O findings to outside audiences has gained enough traction to merit the creation of a new column in TIP, Lost in Translation (Litano & Collmus, 2016). It is common practice to report r². Although widespread, this approach is misleading because it may limit the interpretability of the statistic (Ozer, 1985; Schmidt & Hunter, 2014). Thus, neither of the aforementioned methods is effective in answering the bottom-line question: Can the value of x predict the value of y?
Although several methods for depicting the x-y relationship exist and new methods are beginning to emerge (e.g., the icon array method; Zhang, 2016), practitioners often rely only on expectancy charts. An expectancy chart presents the probability that individuals with a specific range of test scores will be successful performers. For example, it might be stated that individuals who score in the top 10% on a test have a 32% chance of being in the top 10% on a performance measure (this occurs when both variables are normally distributed and have an intercorrelation of .50; Taylor & Russell, 1939). Expectancies have the advantage of readily providing information on the relationship between two variables without requiring sophisticated statistical training (beyond the comprehension of percentages). Currently, there are two main approaches for constructing expectancy charts. The first approach is to use Taylor-Russell tables to obtain expectancies (Taylor & Russell, 1939). The second approach is to use a raw dataset to compute expectancies. Both approaches have limitations (for a full discussion, please refer to Cucina, Berger, and Busciglio [2016, April]). Cucina et al. have proposed a novel methodology for computing expectancies called the bivariate-normal distribution approach (see Note).
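To make the idea of an expectancy concrete, the sketch below approximates one under the assumption that x and y are standardized and bivariate normal. It is not the Cucina et al. R syntax; it is a simple numerical integration written for illustration, and the function name `expectancy` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution (mean 0, SD 1)


def expectancy(r, x_low, x_high, y_top, steps=4000):
    """Approximate P(y is in its top `y_top` fraction | x lies in a percentile band).

    Assumes x and y are standard bivariate normal with correlation r (|r| < 1).
    x_low and x_high are cumulative proportions, e.g. 0.75 and 1.0 for the
    top 25% of scorers. All names here are hypothetical.
    """
    # Convert percentile bounds to z-scores; open tails are capped at 8 SD.
    z_low = Z.inv_cdf(x_low) if x_low > 0 else -8.0
    z_high = Z.inv_cdf(x_high) if x_high < 1 else 8.0
    y_cut = Z.inv_cdf(1 - y_top)   # z-score marking the top band of y
    resid_sd = sqrt(1 - r * r)     # SD of y around its conditional mean r * x

    # Midpoint-rule integration of P(y > y_cut | x) * density(x) over the band.
    width = (z_high - z_low) / steps
    joint = 0.0
    for i in range(steps):
        x = z_low + (i + 0.5) * width
        p_y_given_x = 1 - Z.cdf((y_cut - r * x) / resid_sd)
        joint += Z.pdf(x) * p_y_given_x * width

    return joint / (x_high - x_low)  # divide by P(x falls in the band)


# Top-10% scorers reaching the top 10% on the criterion when r = .50.
print(f"{expectancy(0.5, 0.90, 1.0, 0.10):.0%}")
```

Under these assumptions, the printed value should land close to the 32% figure quoted above. Changing the band (for example, `expectancy(0.5, 0.0, 0.25, 0.25)` for bottom-quartile scorers) gives the other cells of an expectancy chart.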
To complement the aforementioned methodology, we have assembled four alternative ways to visually depict correlations (see Figure 1). First, expectancy values are shown for different correlation magnitudes. For example, when the correlation between a predictor (e.g., a test score) and a criterion (e.g., job performance) is .5, we can expect that a top-scoring applicant (in the top 25%) has a higher probability (48%) of being an excellent performer (in the top 25%) than a lower scoring counterpart (in the bottom 25%), whose probability is only 7%. Second, the magnitudes of the correlations are depicted by the angles of the vectors representing the two variables, which decrease as the correlation increases and the individual differences for the variables become more similar. Vector angles are commonly used in physics, engineering, and aviation; however, the relationship between two variables can also be depicted as an angle. Third, the magnitudes are depicted using scatter plots for 100 randomly generated cases. As the scores on x and y start forming a straight line, the correlation becomes stronger and begins to approach 1. Finally, the magnitudes are depicted using the slopes of the regression line between the two variables.
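The vector-angle, scatter-plot, and slope depictions can each be reproduced with a few lines of code. The sketch below is illustrative only. It assumes both variables are standardized, in which case the angle between the two vectors is the arccosine of r and the regression slope is approximately r. It generates cases as y = r·x + √(1 − r²)·e with independent standard normal x and e. The function names are hypothetical and are not taken from the authors' materials.

```python
import math
import random


def vector_angle_degrees(r):
    """Angle between two variable vectors whose cosine equals the correlation r."""
    return math.degrees(math.acos(r))


def simulate_cases(r, n=100, seed=0):
    """Draw n (x, y) pairs from a standard bivariate normal with correlation r."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        e = rng.gauss(0, 1)  # independent noise
        xs.append(x)
        ys.append(r * x + math.sqrt(1 - r * r) * e)
    return xs, ys


def pearson_r(xs, ys):
    """Sample Pearson correlation from sums of squares and cross-products."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


for r in (0.1, 0.3, 0.5, 0.9):
    xs, ys = simulate_cases(r, n=100)
    print(r, round(vector_angle_degrees(r), 1),
          round(pearson_r(xs, ys), 2), round(ols_slope(xs, ys), 2))
```

With only 100 cases, the sample correlation and slope will wander around the target value of r, which is also why a scatter plot of 100 points looks noisier than the population value suggests. The `(xs, ys)` pairs can be passed to any plotting tool to draw the scatter plot itself.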
As can be seen in Figure 1, a correlation of .50, despite only accounting for 25% of the variance in y and only being at the halfway point on the positive side of the correlational scale, results in a near doubling of the number of superior performers and a fourfold decrease in the number of low performers. We suggest that pieces of this figure be presented to nontechnical audiences when explaining the results of a research study or teaching the concepts of correlational analyses to new students. For example, a finding that a measure of employee engagement correlates .30 with unit-level performance could be explained using the expectancy values of 38% and 14%, the scatter plot, the slope, or the vector angles. Reliability coefficients could also be interpreted using this chart by converting the reliability coefficient to its square root, the reliability index (i.e., the correlation of the observed score with the true score). Any psychological measure with a reliability of .80 or higher can be roughly interpreted using the correlation of .9. We hope that at least one of the graphics in Figure 1 will help others more fully understand the magnitude of correlation coefficients and assist with the translation of statistical analyses into business plans. We are unaware of any existing charts that attempt to combine this information.
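The two conversions mentioned in this paragraph are simple arithmetic. The short sketch below shows them, with hypothetical function names chosen only for illustration.

```python
from math import sqrt


def reliability_index(reliability):
    """Correlation of observed scores with true scores: square root of reliability."""
    return sqrt(reliability)


def variance_explained(r):
    """Coefficient of determination: the squared correlation."""
    return r * r


print(round(reliability_index(0.80), 2))  # 0.89, roughly the .9 column of the chart
print(variance_explained(0.50))           # 0.25, i.e., 25% of the variance in y
```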
Figure 1. The top portion of the chart shows the expectancies for the high scorers and the bottom portion shows the expectancies for the low scorers.
Note: Two copies of a full expectancy chart are included for explanatory purposes. The expectancies were obtained using Cucina et al.’s (2016) R syntax, which implements a formulaic approach for determining probabilities based on a bivariate-normal distribution.
Note
For the R script and the step-by-step instructions, please refer to Table 2 in the poster titled “69-4 Cucina, Berger, Busciglio,” available in the SIOP Document Library on www.siop.org.
References
Cucina, J. M., Berger, J. L., & Busciglio, H. H. (2016). Creating expectancy charts: A new approach. Poster presented at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, Anaheim, CA.
Litano, M. L., & Collmus, A. B. (2016). Lost in translation: Communicating the value of I-O. The Industrial-Organizational Psychologist, 54(1). Available online at: http://0-www-siop-org.library.alliant.edu/tip/july16/lost.aspx.
Ozer, D. J. (1985). Correlation and coefficient of determination. Psychological Bulletin, 97(2), 307-315.
Schmidt, F. L., & Hunter, J. E. (2014). Methods of meta-analysis: Correcting error and bias in research findings (3rd ed.). Thousand Oaks, CA: SAGE Publications Inc.
Taylor, H. C., & Russell, J. T. (1939). The relationship of validity coefficients to the practical effectiveness of tests in selection: Discussion and tables. Journal of Applied Psychology, 23(5), 565.
Zhang, D. C. (2016). Using icon array as a visual aid for communicating validity information (Unpublished doctoral dissertation). Bowling Green State University, Bowling Green, OH.