
Opening Up: Replication Crisis...or Opportunities and Challenges?

Christopher M. Castille, Nicholls State University

Welcome to your second class on open science in I-O psych! Today, I’ll discuss key questions: What are questionable research practices? What do replication failures in areas adjacent to I-O psychology mean for our field? What do we want open science to be about? I do not promise to offer full answers to these questions here. Still, my hope is that my sensemaking (a) taps into some dissatisfaction with the status quo and (b) leaves you wanting to make meaningful changes in the right direction—however small.

To get things started, I’d like to share details from an ongoing conversation with a fellow junior social scientist at my home institution concerning the replication crisis. At the beginning of the semester, our university, like most, held a convocation, giving the faculty a chance to catch up with one another after the long winter break. There, a conversation sprang up with a colleague in the College of Business Administration—an economist—about the replication crisis and the need for open science. He had recently picked up Susannah Cahalan’s The Great Pretender (Cahalan, 2019), which chronicles an investigation into a well-known study in psychology published in the eminent journal Science: David Rosenhan’s “On Being Sane in Insane Places.” In that study, eight psychologically healthy individuals (the now famous “pseudopatients”) feigned mental illness in order to be admitted to psychiatric facilities. Once admitted, the pseudopatients revealed that they were actually sane. Yet, according to Rosenhan, none were allowed to leave; instead, their claims of sanity were interpreted through the lens of mental illness, which was then used to justify their continued confinement. Rosenhan’s work was pivotal; it influenced the third edition of the Diagnostic and Statistical Manual of Mental Disorders. Cahalan’s book reveals details about Rosenhan’s methods, some of which might fall under the broad umbrella of “questionable research practices” (QRPs), that ultimately call into question his contribution.

What Are “Questionable Research Practices” (QRPs)?

QRPs are often subtle practices, such as excluding data or changing analytical strategies until desired results are achieved and then failing to report these decisions (John et al., 2012). QRPs can involve analyzing one’s data in a multitude of ways until a clean and coherent story emerges by conventional statistical criteria (e.g., a p value less than .05; indices indicating acceptable model–data fit in structural equation models). QRPs can also include dropping observations (e.g., outliers), dropping experimental conditions that did not work out, post hoc inclusion or exclusion of control variables, switching outcome measures, dropping unsupported hypotheses, and stopping data collection once a p value less than .05 has been attained. These practices are also commonly referred to as p-hacking, data dredging, exploiting researcher degrees of freedom, traversing the garden of forking paths, or going on fishing expeditions (Gelman & Loken, 2013; Wicherts et al., 2016). Essentially, exploratory data analysis—useful for hypothesis generation—is recast as confirmatory, violating a distinction that is fundamental to scientific advancement (Kerr, 1998).
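
To make the mechanics concrete, below is a small simulation I sketched in Python (a purely illustrative, hypothetical example; the variables and the three analytic “paths” are my own invention and are not drawn from any study cited here). It shows how simply trying a few arbitrary analytic variants on pure noise, and reporting whichever one clears p < .05, pushes the false-positive rate above the nominal 5%.

```python
# Illustrative sketch: how trying multiple analytic "paths" on noise inflates
# the false-positive rate. All variables and decision rules are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_studies, n, alpha = 5000, 50, 0.05
false_positives = 0

for _ in range(n_studies):
    x = rng.normal(size=n)
    y = rng.normal(size=n)  # no true relationship between x and y
    p_values = [stats.pearsonr(x, y)[1]]                 # path 1: analyze all data
    keep = np.abs(y) < 2                                 # path 2: drop "outliers" beyond 2 SD
    p_values.append(stats.pearsonr(x[keep], y[keep])[1])
    p_values.append(stats.pearsonr(x[:30], y[:30])[1])   # path 3: stop early at n = 30
    if min(p_values) < alpha:                            # report whichever analysis "worked"
        false_positives += 1

print(f"Nominal false-positive rate: {alpha:.2f}")
print(f"Observed false-positive rate: {false_positives / n_studies:.2f}")
```

Even a few mild, correlated choices push the rate noticeably above 5%; add more forks (control variables, outcome switches, subgroup analyses), and the inflation grows.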

Highlighting p-hacking in our own literature, O’Boyle and colleagues (2019) examined the prevalence of outcome reporting bias in moderated multiple regression (MMR), a popular analysis that typically involves small effect sizes and insufficient statistical power to detect them (e.g., Murphy & Russell, 2016). Examining the distribution of p values from published MMR analyses, they found a substantial spike in values just below the .05 threshold, a pattern suggesting p-hacking or more clearly fraudulent behavior (e.g., “rounding down” a p value). Focusing on what happens to unsupported hypotheses, O’Boyle et al. (2017) found that the ratio of supported to unsupported hypotheses more than doubled from dissertation to published journal article (0.82:1 versus 1.94:1) because (a) statistically nonsignificant hypotheses were dropped, (b) statistically significant hypotheses were added, and (c) data were altered. As another example, Crede and Harms (2019) examined QRPs in confirmatory factor analyses reported in the top-tier management literature (e.g., misreporting model–data fit statistics, reporting mathematically impossible findings) and estimated a high base rate of QRPs (>90%).

Though these figures paint a picture of rampant QRPs in the field, prevalence estimates vary. Some reports suggest QRPs are widespread (e.g., ~90% of scholars; see Banks et al., 2016; Bedeian et al., 2010; Crede & Harms, 2019), whereas other scholars provide evidence that QRPs are less prevalent (see Fiedler & Schwarz, 2015) and may be explained by relatively few authors engaging in a large number of QRPs (Brainard & You, 2018). Also, though it may be tempting to point to bad actors, the vast majority of individuals probably want science to produce credible insights. Unfortunately, novelty and clean narratives can be favored over transparency, leading to bad science being “naturally” selected by the system (see Smaldino & McElreath, 2016).

One aim of open science is to eliminate QRPs. Ideally, researchers generate and specify hypotheses, design a study to test those hypotheses, collect data, analyze and interpret the data, and then publish the results in a transparent manner. This is the hypothetico-deductive approach to scientific advancement. Scientific inquiry adds to our knowledge base when scientific norms privilege methods over results (see Merton, 1973). However, when findings are privileged over methods, QRPs emerge, distorting the scientific record (see Figure 1; adapted from Chambers, 2019).

Figure 1: Questionable research practices prevent the hypothetico-deductive model from working
Note: Though adapted with permission from Chambers (2019), it is worth noting that his materials (like much open science work) carry a CC0 license, allowing others to build on prior work without needing permission.


I think we can all agree that QRPs—however prevalent—harm the scientific record (Nelson et al., 2018) and ought to be eliminated. As noted by Aguinis et al. (2018), without transparent reporting of study findings, we cannot know whether what we are reading is true, which is one reason why the open science movement is so popular: It fundamentally represents a push to change the system into one that allows scholars to tell it like it is (Tell it like it is, 2020). When we lose access to strategies that have been tried and failed in the past, we cannot act on all relevant sources of information, because the literature is biased toward positive results, many of which are false (Nelson et al., 2018). What seems to be needed are tactics that allow authors to be transparent without harming their own careers.

How Much of Psychology Is True?

Returning to my conversation with my colleague, we discussed a topic of mutual interest, namely, our respective disciplines’ unique contributions to the social sciences. His reading of Cahalan’s The Great Pretender prompted him to ask me a question that has been asked time and time again, both in academia and in the public writ large: “Is anything in psychology true?”

The overall aim of my response was to keep the baby while throwing out the bathwater. I started with the bathwater and validated his observation. Yes, there are bad actors within any given scientific discipline who engage in outright fraud and deception. Examples in psychology also include Diederik Stapel (see Chambers, 2017) and perhaps even Hans Eysenck (Retraction Watch, 2020). Examples exist in adjacent fields as well; in strategic management, for instance, there is Ulrich Lichtenthaler (see Tourish, 2019). Also, although retractions across the sciences have grown over time (Fanelli, 2013), they appear to have leveled off more recently (Brainard & You, 2018). Less dramatic and more common, though, are cases of QRPs. Notable cases here include Amy Cuddy’s work on power posing (see Crede, 2019) and Fred Walumbwa’s work on authentic leadership (for a summary, see Tourish, 2019).

I also tried to put us both in similar boats. As social sciences, economics and psychology have similar issues that mirror broader trends. Positive results appear frequently across all of the sciences, but this confirmation bias is particularly pronounced in the social sciences, including economics, psychiatry, and business (see Fanelli, 2010, 2012). Indeed, Kepes and McDaniel (2015) pointed out that most hypotheses in the I-O psychology literature are confirmed. One interpretation of these findings is that we are reaching omniscience! More likely, it seems, is that negative results are consistently suppressed, a point long established in the literature (i.e., the file-drawer problem or publication bias; see Rosenthal, 1979). Also, though psychology has been charged with running statistically underpowered studies and publishing inflated effects (see Fraley & Vazire, 2014; Smaldino & McElreath, 2016; for an example from I-O psychology, see O’Boyle et al., 2017), so too has economics (Ioannidis et al., 2017). Though replicated effects in psychology are generally smaller than initial estimates (e.g., Camerer et al., 2018), the same appears to be true in economics (see Camerer et al., 2016; Chang & Li, 2015). In other words, replicability in the social sciences is far from guaranteed (see Camerer et al., 2018; Many Labs 1, Klein et al., 2014; Many Labs 2, Klein et al., 2018; Many Labs 3, Ebersole et al., 2016; Many Labs 4, Klein et al., 2019; Open Science Collaboration, 2015), with rates varying from 25% (Open Science Collaboration, 2015) to over 75% (Klein et al., 2014).

My colleague quickly suggested that economics is more replicable and reproducible because certain transparency-inducing practices had been normalized in the field (e.g., the use of mathematical/formal modeling, using data that are open and accessible to others). I agreed that these practices are most certainly helpful and are not uniform throughout psychology. Indeed, calls for researchers to use formal modeling are common throughout our history (see Meehl, 1978; Marewski & Olsson, 2009). Still, findings produced in the lab may not replicate in the field, and there is some evidence to suggest that I-O psychology looks pretty good compared to economics and management. In comparing effects detected in the lab to those detected in the field, Mitchell (2012) found that results from I-O psychology most reliably predicted field results. The same may not be true for many areas of economics (see Dubner, 2020). With regard to management, Pfeffer and Fong (2002) pointed out that less than one-third of the tools and ideas that companies pay management consultants for come out of academia and that those that do originate in universities are used less often and abandoned more frequently. Given valid and recurring complaints about the primacy of irrelevant theory in management scholarship (see Antonakis, 2017; Hambrick, 2007; Landis & Cortina, 2015; Tourish, 2019), I suspect that things have not gotten better since Pfeffer and Fong made their observation.

I then pointed out that some pockets within the psychological sciences are more replicable than others and shifted the conversation closer to my own research area: personality and individual differences at work. For instance, one recent preregistered, high-powered replication effort revealed that correlations linking personality traits (i.e., the Big Five) to consequential life outcomes were quite replicable (87% of linkages replicated, though the effects were also weaker than the initial estimates; Soto, 2019). Of course, this literature is not flawless; there is evidence of publication bias here as well. For instance, meta-analytic estimates of the validity of conscientiousness in predicting job performance appear to be inflated by roughly 30% (see Kepes & McDaniel, 2015). On the whole, however, the individual differences literature appears more replicable than other areas of psychology, such as social psychology.

Does it matter that some areas are more replicable than others? Consider the following thought experiment: If nearly 100% of studies were, indeed, replicated in any area (Gilbert et al., 2016, suggest this possibility), could we still point out problems with our literature? Could replicability be too high? Possibly. As Brian Nosek, who directs the Center for Open Science, put it: “Achieving 100% reproducibility on initial findings would mean that we are being too conservative and not pushing the envelope hard enough” (see Owens, 2018). For instance, in personality and individual differences research, the content validity of our measures is usually built in, so failures to find correlations linking (for instance) conscientiousness to rule-following behavior would be unusual. However, in social and organizational psychology, bigger and bolder ideas are considered, and these often come with many auxiliary hypotheses that make it difficult to render a claim falsifiable (see Landy et al., 2020; Świątkowski & Dompnier, 2017). Replication failures could be due to a variety of issues, such as a deficit of expertise. Indeed, in reexamining the Open Science Collaboration studies that estimated a rate of ~40% replicability, Gilbert et al. (2016) found that replicability improves to over 60% when the samples and procedures employed by the original authors are closely followed.

Does expertise guarantee replicability? A recent study from the open science community (i.e., Many Labs 4; Klein et al., 2019) sought to test experimentally whether expertise improved replicability in a specific theoretical domain: terror management theory (TMT; Greenberg et al., 1994). TMT posits that human beings have evolved unconscious defense mechanisms to cope with a unique awareness of the finite nature of existence and the inevitability as well as the unpredictability of death. A central claim is this: Making mortality salient (e.g., asking someone to reflect on their own death) promotes behavior aimed at defending one’s cultural worldview. The empirical support for a moderate to strong mortality salience effect (r = .35) may be considered conclusive, coming from 277 experiments (164 published articles; see Burke et al., 2010) conducted over the past few decades and involving thousands of participants. Notably, Burke et al.’s (2010) meta-analysis used various methods to check for publication bias (e.g., funnel plots, fail-safe N), as sketched below. Also notable is that organizational scholars have elaborated upon TMT, linking mortality salience to power-seeking behavior in organizations (Belmi & Pfeffer, 2016); workplace aggression, discrimination, and punishment (Stein & Cropanzano, 2011); and prosocial motivations at work (Grant & Wade-Benzoni, 2009).
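
For readers unfamiliar with fail-safe N, here is a minimal sketch of Rosenthal’s (1979) logic in Python. The p values below are made up for illustration (they are not Burke et al.’s data); the statistic estimates how many unpublished null-result studies would have to be sitting in file drawers to drag a combined result below significance.

```python
# Minimal sketch of Rosenthal's (1979) fail-safe N using Stouffer's combined z.
# The example p values are hypothetical, for illustration only.
import numpy as np
from scipy import stats

def fail_safe_n(p_values, alpha=0.05):
    """Number of additional null (z = 0) studies needed to make the
    Stouffer-combined result nonsignificant at the given one-tailed alpha."""
    z = stats.norm.isf(np.asarray(p_values))  # convert one-tailed p values to z scores
    z_crit = stats.norm.isf(alpha)            # ~1.645 for alpha = .05, one-tailed
    k = len(p_values)
    return (z.sum() ** 2) / (z_crit ** 2) - k

example_ps = [0.01, 0.03, 0.002, 0.04, 0.02, 0.005]  # six hypothetical significant studies
print(f"Fail-safe N = {fail_safe_n(example_ps):.0f}")
```

Note that a large fail-safe N is reassuring only to the extent that the published effects feeding into it are themselves unbiased.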

With Many Labs 4, Klein and colleagues (2019) sought to reexamine the mortality salience hypothesis using a variation of Greenberg et al.’s (1994) design to see whether expertise improved replicability. Twenty-one labs participated and were randomly assigned either to work with TMT experts who helped standardize study protocols (i.e., the “Author Advised” protocol), which would presumably enhance the quality of the signal detected across sites, or to create their own protocols based only on a review of the literature (i.e., “In House” protocols). The expert protocols contained subtle details believed to be essential for replicating the mortality salience effect (e.g., labs were told to use laid-back research assistants dressed casually to create a relaxed mindset). All analyses and code were preregistered.

Most interestingly, Klein et al. (2019) could not reproduce the mortality salience effect in either condition (expert advised or not), posing an important challenge to TMT. Of particular note, Klein et al. point out that because the mortality salience effect could not be detected, they could not examine whether expertise mattered. They suggest that a boundary condition—unknown by TMT experts—may exist that could produce the null effect and should be incorporated into the theory. Klein et al. are also careful to point out that their null finding does not overturn the whole of the TMT literature. However, if expert advice on central findings proves insufficient to ensure the replicability of a foundational theoretical claim, then there is a great deal of work to be done to ensure that claims are robust and, for our purposes, worth drawing on from the standpoint of organizational scholarship (see also Srivastava, Tullett, & Vazire, 2020). This study also raises the possibility that what many of us think of as conclusive scientific evidence, a meta-analysis of experiments, may be less informative than we would like it to be, particularly when the literature contains false positives. Indeed, Nelson et al. (2018), who chronicle what they see as psychology’s renaissance, raise concerns regarding such meta-analytic thinking, arguing that it can actually exacerbate false-positive psychology. Now, we should not generalize the findings from Many Labs 4 to the whole of any particular literature—including our own. Yet, prior to Many Labs 4, the idea of discounting conclusive scientific evidence from a meta-analysis of experiments would have seemed absurd to me. Now, I’m willing to at least entertain the notion.

What Do We Want the Open Science Movement to Be About?

So where do I stand with my colleague? Our broader conversation about open science is ongoing. There are many principles that make for a robust science (e.g., transparency, relevance, rigor, and replicability; see Grand et al., 2018), and these will pervade our conversation. I do not foresee us discussing how open science practices ought to be adopted out of obligation but rather focusing on simple yet effective tactics that, in some small part, strengthen those Mertonian norms we aspire to as scientists (Merton, 1973). In becoming more open sciences, we are not only playing a role in strengthening our respective fields but also boosting the credibility of the social sciences writ large.

I hope that you got something out of my story. Perhaps now you can see ways to have more meaningful conversations with peers about open science (e.g., building from common ground—we’re all figuring science out together), but perhaps you need more specific advice. Perhaps you’ve found errors in another’s work and are unsure how to proceed. You’d like to do the right thing but fear the reputational damage. You know that errors are an inevitable part of scientific progress but need more guidance on how to correct the record. If you need guidance and support, then please consider reaching out for help; that is one reason why this column was created. Also, consider taking a look at Dorothy Bishop’s work on navigating fallibility in science (Bishop, 2017). Her guidance is widely used and quite instructive.

Next Time on Opening Up

We move away from a discussion centered on putting people in similar boats to a discussion that emphasizes strategies for helping all boats to rise. Fred Oswald and I will share some simple-yet-effective strategies to promote transparency and reproducibility in our work. Furthermore, we’d like to celebrate positive developments in the field. What are the bright spots in our field where open-science practices are being adopted and leading to credible and meaningful insights? Who has found a helpful way to open up that has not received much attention? We’d like to celebrate those victories with future entries in Opening Up.

Addendum–March 6, 2020

After this article went to press, I learned more about the broader discussion of the Many Labs 4 study that should be shared. Specifically, Chatard et al. (2020) pointed out a significant deviation from the Many Labs 4 preregistration: Several labs included in Many Labs 4 did not attain the minimum sample size requirement (40 participants per condition) specified in the preregistration. Once the results from these labs were excluded, which reduced the number of labs from 21 to 13, Chatard et al. found support for terror management theory, particularly in the condition that used the expert-guided (“Author Advised”) protocols. However, Hilgard (2020) found a bug in the Many Labs 4 code that, once accounted for, called the Chatard et al. correction into question. Therefore, the original conclusions put forward by Klein et al. (2019) appear to hold.

Although the discussion regarding Many Labs 4 appears to have settled for now, there are notable observations I’d like to make for our purposes. First, it is remarkable how quickly scholars can collaborate to make sense of a finding when data, code, and analysis plans are made transparent. Such work is happening outside of the conventional peer-review process, on PsyArXiv and on social media platforms such as Twitter and Facebook. Second, the Chatard et al. (2020) critique raises an interesting point about the value of preregistration. They are correct to point out a meaningful, undisclosed deviation from the preregistration (i.e., retaining data that would have been excluded had the preregistered rules been followed). Scholars should disclose any deviations they make and explain why. However, it is less clear that the deviations Chatard et al. take issue with are necessarily problematic in this case, as Klein et al. (2019) used meta-analysis to account for sampling error. Had Klein et al. followed the sample size exclusion rule they established in the preregistration, they would have introduced the very file drawer problem that meta-analysis is meant to address. In other words, preregistration, though it encourages transparency, can introduce constraints that, although established in good faith, are later revealed to be unnecessary once a study is underway.
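
To illustrate why retaining the under-enrolled labs need not distort the pooled estimate, here is a minimal fixed-effect meta-analysis sketch with entirely made-up numbers (this is not the Many Labs 4 analysis, its data, or its model): inverse-variance weighting automatically gives small labs less influence rather than discarding them outright.

```python
# Illustrative fixed-effect meta-analysis with hypothetical per-lab results.
# Shows how inverse-variance weights down-weight small labs instead of dropping them.
import numpy as np

effects = np.array([0.10, -0.05, 0.30, 0.02, -0.12])  # hypothetical Cohen's d per lab
n_per_group = np.array([60, 55, 18, 70, 25])           # two labs fall short of 40 per condition

# Approximate sampling variance of d for a two-group design with equal group sizes
variances = 2 / n_per_group + effects**2 / (4 * n_per_group)
weights = 1 / variances

pooled_d = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))

print(f"Pooled d = {pooled_d:.3f} (SE = {pooled_se:.3f})")
print("Relative weights:", np.round(weights / weights.sum(), 2))
```

Under this weighting, the small labs contribute relatively little to the pooled estimate, so excluding them mostly discards information, which is the file drawer concern raised above.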

References

Aguinis, H., Ramani, R. S., & Alabduljader, N. (2018). What you see is what you get? Enhancing methodological transparency in management research. Academy of Management Annals, 12(1), 83–110. https://doi.org/10.5465/annals.2016.0011

Antonakis, J. (2017). On doing better science: From thrill of discovery to policy implications. Leadership Quarterly, 28(1), 5–21. https://doi.org/10.1016/j.leaqua.2017.01.006

Banks, G. C., Rogelberg, S. G., Woznyj, H. M., Landis, R. S., & Rupp, D. E. (2016). Editorial: Evidence on questionable research practices: The good, the bad, and the ugly. Journal of Business and Psychology, 31(3), 323–338. https://doi.org/10.1007/s10869-016-9456-7

Bedeian, A. G., Taylor, S. G., & Miller, A. N. (2010). Management science on the credibility bubble: Cardinal sins and various misdemeanors. Academy of Management Learning & Education, 9(4), 715–725.

Belmi, P., & Pfeffer, J. (2016). Power and death: Mortality salience increases power seeking while feeling powerful reduces death anxiety. Journal of Applied Psychology, 101(5), 702–720. https://doi.org/10.1037/apl0000076

Bishop, D. V. (2017). Fallibility in science: Responding to errors in the work of oneself and others [Preprint]. San Diego, CA: PeerJ Preprints. https://doi.org/10.7287/peerj.preprints.3486v1

Brainard, J., & You, J. (2018, October). What a massive database of retracted papers reveals about science publishing’s “death penalty.” Science. https://www.sciencemag.org/news/2018/10/what-massive-database-retracted-papers-reveals-about-science-publishing-s-death-penalty

Burke, B. L., Martens, A., & Faucher, E. H. (2010). Two decades of terror management theory: A meta-analysis of mortality salience research. Personality and Social Psychology Review, 14(2), 155–195. https://doi.org/10.1177/1088868309352321

Cahalan, S. (2019). The great pretender: The undercover mission that changed our understanding of madness. New York, NY: Grand Central Publishing.

Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Almenberg, J., Altmejd, A., Chan, T., Heikensten, E., Holzmeister, F., Imai, T., Isaksson, S., Nave, G., Pfeiffer, T., Razen, M., & Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351(6280), 1433–1436. https://doi.org/10.1126/science.aaf0918

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2, 637–644. https://doi.org/10.1038/s41562-018-0399-z

Chambers, C. (2017). The seven deadly sins of psychology: A manifesto for reforming the culture of scientific practice. Princeton, NJ: Princeton University Press.

Chambers, C. (2019). Registered reports: Concept and application [PhD workshop]. https://osf.io/z6xqr/

Chang, A. C., & Li, P. (2015). Is economics research replicable? Sixty published papers from thirteen journals say “usually not.” Finance and Economics Discussion Series, 2015(83), 1–26. https://doi.org/10.17016/FEDS.2015.083

Chatard, A., Hirschberger, G., & Pyszczynski, T. (2020). A word of caution about Many Labs 4: If you fail to follow your preregistered plan, you may fail to find a real effect [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/ejubn

Crede, M. (2019). A negative effect of a contractive pose is not evidence for the positive effect of an expansive pose: Comment on Cuddy, Schultz, and Fosse (2018) (SSRN Scholarly Paper ID 3198470). Social Science Research Network. https://papers.ssrn.com/abstract=3198470

Crede, M., & Harms, P. D. (2019). Questionable research practices when using confirmatory factor analysis. Journal of Managerial Psychology, 34(1), 18–30.

Dubner, S. (2020, February 12). Policymaking is not a science (yet) (Ep. 405). Freakonomics. http://freakonomics.com/podcast/scalability/

Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., Baranski, E., Bernstein, M. J., Bonfiglio, D. B. V., Boucher, L., Brown, E. R., Budiman, N. I., Cairo, A. H., Capaldi, C. A., Chartier, C. R., Chung, J. M., Cicero, D. C., Coleman, J. A., Conway, J. G., … Nosek, B. A. (2016). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82. https://doi.org/10.1016/j.jesp.2015.10.012

Fanelli, D. (2010). “Positive” results increase down the hierarchy of the sciences. PLOS ONE, 5(4), e10068. https://doi.org/10.1371/journal.pone.0010068

Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90, 891–904. https://doi.org/10.1007/s11192-011-0494-7

Fanelli, D. (2013). Why growing retractions are (mostly) a good sign. PLOS Medicine, 10(12), e1001563. https://doi.org/10.1371/journal.pmed.1001563

Fiedler, K., & Schwarz, N. (2015). Questionable research practices revisited. Social Psychological and Personality Science, 7(1), 45–52. https://doi.org/10.1177/1948550615612150

Fraley, R. C., & Vazire, S. (2014). The N-Pact factor: Evaluating the quality of empirical journals with respect to sample size and statistical power. PLOS ONE, 9(10), e109019. https://doi.org/10.1371/journal.pone.0109019

Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf

Gilbert, D., King, G., Pettigrew, S., & Wilson, T. (2016). Comment on “Estimating the reproducibility of psychological science.” Science, 351(6277), 1037a–1037b.

Grand, J. A., Rogelberg, S. G., Allen, T. D., Landis, R. S., Reynolds, D. H., Scott, J. C., Tonidandel, S., & Truxillo, D. M. (2018). A systems-based approach to fostering robust science in industrial-organizational psychology. Industrial and Organizational Psychology, 11(1), 4–42. https://doi.org/10.1017/iop.2017.55

Grant, A. M., & Wade-Benzoni, K. A. (2009). The hot and cool of death awareness at work: Mortality cues, aging, and self-protective and prosocial motivations. Academy of Management Review, 34(4), 600–622.

Greenberg, J., Pyszczynski, T., Solomon, S., Simon, L., & Breus, M. (1994). Role of consciousness and accessibility of death-related thoughts in mortality salience effects. Journal of Personality and Social Psychology, 67(4), 627–637.

Hambrick, D. (2007). The field of management’s devotion to theory: Too much of a good thing? Academy of Management Journal, 50(6), 1346–1352. https://doi.org/10.5465/amj.2007.281661194

Hilgard, J. (2020, February 7). Found a bug in Many Labs 4 exclusion rules 2 and 3 that excluded some datapoints from the summary stats while keeping them in the sample size. [Tweet]. Twitter. https://twitter.com/JoeHilgard/status/1225905555123863552

Ioannidis, J. P. A., Stanley, T. D., & Doucouliagos, H. (2017). The power of bias in economics research. Economic Journal, 127(605), F236–F265. https://doi.org/10.1111/ecoj.12461

John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532. https://doi.org/10.1177/0956797611430953

Kepes, S., & McDaniel, M. A. (2015). The validity of conscientiousness is overestimated in the prediction of job performance. PLOS ONE, 10(10), e0141468. https://doi.org/10.1371/journal.pone.0141468

Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4

Klein, R. A., Cook, C. L., Ebersole, C. R., Vitiello, C. A., Nosek, B. A., Chartier, C. R., Christopherson, C. D., Clay, S., Collisson, B., Crawford, J., Cromar, R., Dudley, D., Gardiner, G., Gosnell, C., Grahe, J. E., Hall, C., Joy-Gaba, J. A., Legg, A. M., Levitan, C., … Ratliff, K. A. (2019). Many Labs 4: Failure to replicate mortality salience effect with and without original author involvement [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/vef2c

Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, Š., Bernstein, M. J., Bocian, K., Brandt, M. J., Brooks, B., Brumbaugh, C. C., Cemalcilar, Z., Chandler, J., Cheong, W., Davis, W. E., Devos, T., Eisner, M., Frankowska, N., Furrow, D., Galliani, E. M., … Nosek, B. A. (2014). Investigating variation in replicability: A “many labs” replication project. Social Psychology, 45(3), 142–152. https://doi.org/10.1027/1864-9335/a000178

Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B., Alper, S., Aveyard, M., Axt, J. R., Babalola, M. T., Bahník, Š., Batra, R., Berkics, M., Bernstein, M. J., Berry, D. R., Bialobrzeska, O., Binan, E. D., Bocian, K., Brandt, M. J., Busching, R., … Nosek, B. A. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443–490. https://doi.org/10.1177/2515245918810225

Landis, R. S., & Cortina, J. M. (2015). Is ours a hard science (and do we care)? In C. E. Lance, & R. J. Vandenberg (Eds.), More statistical and methodological myths and urban legends (pp. 9–35). New York, NY: Routledge.

Landy, J. F., Jia, M. (Liam), Ding, I. L., Viganola, D., Tierney, W., Dreber, A., Johannesson, M., Pfeiffer, T., Ebersole, C. R., Gronau, Q. F., Ly, A., van den Bergh, D., Marsman, M., Derks, K., Wagenmakers, E.-J., Proctor, A., Bartels, D. M., Bauman, C. W., Brady, W. J., … The Crowdsourcing Hypothesis Tests Collaboration. (2020). Crowdsourcing hypothesis tests: Making transparent how design choices shape research results. Psychological Bulletin. https://doi.org/10.1037/bul0000220

Marewski, J. N., & Olsson, H. (2009). Beyond the null ritual: Formal modeling of psychological processes. Zeitschrift Für Psychologie/Journal of Psychology, 217(1), 49–60. https://doi.org/10.1027/0044-3409.217.1.49

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.

Merton, R. K. (Ed.). (1973). The sociology of science: Theoretical and empirical investigations. Chicago, IL: University of Chicago Press.

Mitchell, G. (2012). Revisiting truth or triviality: The external validity of research in the psychological laboratory. Perspectives on Psychological Science, 7(2), 109–117. https://doi.org/10.1177/1745691611432343

Murphy, K. R., & Russell, C. J. (2016). Mend it or end it: Redirecting the search for interactions in the organizational sciences. Organizational Research Methods, 20(4), 549–573. https://doi.org/10.1177/1094428115625322

Nelson, L. D., Simmons, J., & Simonsohn, U. (2018). Psychology’s renaissance. Annual Review of Psychology, 69(1), 511–534. https://doi.org/10.1146/annurev-psych-122216-011836

O’Boyle, E. H., Banks, G. C., Carter, K., Walter, S., & Yuan, Z. (2019). A 20-year review of outcome reporting bias in moderated multiple regression. Journal of Business and Psychology, 34, 19–37. https://doi.org/10.1007/s10869-018-9539-8

O’Boyle, E. H., Banks, G. C., & Gonzalez-Mulé, E. (2017). The Chrysalis effect: How ugly initial results metamorphosize into beautiful articles. Journal of Management, 43(2), 376–399. https://doi.org/10.1177/0149206314527133

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716–aac4716. https://doi.org/10.1126/science.aac4716

Owens, B. (2018). Replication failures in psychology not due to differences in study populations. Nature. https://doi.org/10.1038/d41586-018-07474-y

Pfeffer, J., & Fong, C. T. (2002). The end of business schools? Less success than meets the eye. Academy of Management Learning & Education, 1(1), 78–95. https://doi.org/10.5465/amle.2002.7373679

Retraction Watch. (2020, February). Journals retract 13 papers by Hans Eysenck, flag 61, some 60 years old. https://retractionwatch.com/2020/02/12/journals-retract-three-papers-by-hans-eysenck-flag-18-some-60-years-ol

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641. https://doi.org/10.1037/0033-2909.86.3.638

Smaldino, P. E., & McElreath, R. (2016). The natural selection of bad science. Royal Society Open Science, 3, 160384. https://doi.org/10.1098/rsos.160384

Srivastava, S., Tullett, A., & Vazire, S. (2020, January). The expertise of death [Audio podcast episode]. The Black Goat. https://www.stitcher.com/s?eid=66950641

Stein, J. H., & Cropanzano, R. (2011). Death awareness and organizational behavior. Journal of Organizational Behavior, 32(8), 1189–1193. https://doi.org/10.1002/job.715

Soto, C. J. (2019). How replicable are links between personality traits and consequential life outcomes? The life outcomes of personality replication project. Psychological Science, 30(5), 711–727. https://doi.org/10.1177/0956797619831612

Świątkowski, W., & Dompnier, B. (2017). Replicability crisis in social psychology: Looking at the past to find new pathways for the future. International Review of Social Psychology, 30(1), 111. https://doi.org/10.5334/irsp.66

Tell it like it is. (2020). Nature Human Behaviour, 4(1), 1–1. https://doi.org/10.1038/s41562-020-0818-9

Tourish, D. (2019). Management studies in crisis: Fraud, deception and meaningless research. Cambridge, UK: Cambridge University Press.

Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., & van Assen, M. A. L. M. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7, 1832. https://doi.org/10.3389/fpsyg.2016.01832
