Opening Up: How Do I Conduct Peer Review With Open Science in Mind?

Christopher M. Castille, Nicholls State University; Don Zhang and Rachel Williamson Smith, Louisiana State University

The meteoric rise of the open science movement within the social sciences has changed the way researchers conduct their research across a wide range of disciplines. A number of prominent journals have introduced new open-science-driven guidelines, which serve to enhance the reproducibility, replicability, and integrity of scientific publications. Before any paper is accepted for publication, however, it must undergo peer review: a long-standing feature of scientific practice (Peters & Ceci, 1982) that is widely believed to serve as quality control (Armstrong, 1997; Goldbeck-Wood, 1999; Horrobin, 1990). Scientific journals often consult subject matter experts to serve as reviewers for submitted papers; indeed, reviewers often act as “gatekeepers” for scientific publications. Although there is a considerable amount of guidance on how to conduct thorough and rigorous reviews, reviewers and editors now face a new challenge: conducting peer reviews with open science in mind.

In this post, we discuss the role of peer review in an open science era and how reviewers can actively contribute to the open science agenda. With this entry of “Opening Up,” we’ll highlight some interesting data regarding the peer review process, call attention to suggestions from leaders in the open science movement for improving peer review, and point out resources that you can use to become a stronger peer reviewer. We’ve also gathered and consolidated some recommendations and tools that have emerged since the open science movement took off. Hopefully, when you have finished reading this entry, you will have some new ideas to take with you as you review more scholarly work.

A Selective Review of Research on Peer Reviewing and the Peer Review Process

Peer review is expensive, prompting questions about its cost effectiveness as well as discussion of alternative forms of peer review (Nosek & Bar-Anan, 2012). One estimate placed the total annual value of volunteer peer review, in terms of time spent reviewing scholarly work, at more than $2.5 billion globally (Research Information Network, 2008). Furthermore, reviewers who spend considerable time providing high-quality reviews may place themselves at a disadvantage, as that time could otherwise be allocated toward advancing their own research agendas (see Macdonald & Kam, 2007; Tourish & Craig, 2018).

How well does peer review identify significant issues with a manuscript? There is plenty of evidence that even diligent reviewers miss key issues (e.g., misreported p values; see Cortina et al., 2017; Crede & Harms, 2019; Schroter et al., 2008; Wicherts et al., 2011). Among reviewers, inter-rater consistency is often low (Bornmann & Daniel, 2010; Marsh & Ball, 1989; Peters & Ceci, 1982; Petty et al., 1999), introducing a substantial amount of chance into the publication process (see Whitehurst, 1984). Of course, low inter-rater reliability can be a feature rather than a bug of the peer review process: Given the complexity of the phenomena we often study, a small number of experts is unlikely to have all of the information necessary to evaluate every component of an article (Nosek & Bar-Anan, 2012).
Associate editors may assign reviewers from multiple sides of an issue in order to get both perspectives, or bring in reviewers with complementary expertise (e.g., methods experts, content experts).[1] So although high inter-rater reliability may be nice, it is not always the goal. Informational asymmetry—authors knowing more about their work than reviewers do (see Bergh & Oswald, 2020)—further complicates reviewing. Such asymmetry is to be expected in a system where page-length and word-count requirements force authors to decide which features of a study to highlight (Aguinis et al., 2019).

Ultimately, peer review may not perfectly guard the scientific record. Unless errors in a manuscript are corrected in some form, such as through an addendum by the authors, a commentary on the original article, or (in the worst case) a retraction, they remain in the literature (Nosek & Bar-Anan, 2012). Such imperfections recently motivated a former editor of Social Psychological and Personality Science, Simine Vazire, to argue that the peer review process does not—indeed, cannot—serve the function for which it is intended (Vazire, 2020); there is simply too much for any set of reviewers to know in order to fairly evaluate a submitted manuscript, and reviewer time is a key constraint. She and others (e.g., Nosek & Bar-Anan, 2012) advocate an alternative ecosystem that leverages prepublication (via preprints) and postpublication peer review, an approach that can be broadly described as “open reviewing.” Open reviewing occurs as scholars post versions of their work to public repositories (e.g., the Open Science Framework, PsyArXiv) and request reviews or commentary.

Unlike the traditional peer review process, open reviewing allows reviews to be attributed to their reviewers, which helps reviewers gain a reputation for doing good work. Although such identification may strike some as a cause for backlash (see Zhang et al., 2020), early evidence suggests it has promising features: A randomized trial of blind versus open review found no difference in rated review quality or in the ultimate publication recommendations (van Rooyen et al., 1999). Open reviewing also addresses an incentive problem alluded to earlier: namely, that there are few incentives for doing high-quality reviews (Macdonald & Kam, 2007; Nosek & Bar-Anan, 2012; Tourish & Craig, 2018). Indeed, as noted by Tourish (2019), data analysis problems require diligent scrutiny by reviewers, and the opportunity cost of such diligence is high (see also Macdonald & Kam, 2007). In other words, reviewers are more likely to do “good enough” work: identify a few readily apparent issues and then proceed with a more superficial review (Köhler et al., 2020). Last, under an open-reviewing framework, other scholars may use published reviews as a potentially valuable resource, thereby further enhancing the scientific utility of the peer review process.

Reviewing a Manuscript Without Embodying Reviewer #2?

Whether the peer review process remains as it has traditionally been carried out or evolves into a process more aligned with open science, we believe there is merit to cultivating a robust reviewing skillset. Excellent reviewing helps authors highlight what makes their contribution valuable to the field at large (Köhler et al., 2020).
Examples include encouraging authors to consider, where possible, the replicability of their work or—if multiple studies are included—to combine data into a mega-analysis or meta-analysis to more rigorously test a claim (see Lakens & Etz, 2017; McShane & Böckenholt, 2017; Schimmack, 2012). Such efforts help the broader scholarly community identify and then leverage useful ideas. We wish to promote these constructive features of peer review.

As any experienced scholar knows, it is not possible to conduct an unflawed study; decisions must be made that trade certain strengths for others (McGrath, 1981). Some decisions, such as sampling a diverse set of organizations and occupations, can facilitate generalizing a claim to specific populations or across populations (Pedhazur & Schmelkin, 2013). A host of measurement practices can increase precision in control and measurement (see Clifton, 2020; Hancock & Mueller, 2011). Certain design decisions (e.g., choosing a cross-sectional as opposed to a longitudinal or temporally separated design) can facilitate or hinder theory testing and the ruling out of alternative explanations (Spector, 2019). Conducting a study in the lab or in the field can make it more or less realistic. Many of these decisions stand in opposition to one another.

As we have highlighted with our selective review, there is ample evidence that peer-reviewing efforts can be improved upon. A recent article titled “Dear Reviewer 2: Go F’ Yourself” captures a prevailing sentiment held by many scholars toward peer reviewers (Peterson, 2020). To quote directly from Peterson:

Anyone who has ever submitted a paper to a peer-reviewed outlet knows the reviewers can, occasionally, be unpleasant. While rejection always stings, the belief that a reviewer has either completely missed the point of the manuscript, been overtly hostile in his or her review, or simply held the author to an impossible standard is vexing. The source of this frustration has seemingly become personified in the identity of a single person—Reviewer 2. He (and it is always assumed to be a he) is the embodiment of all that we hate about other scholars. Reviewer 2 is dismissive of other people’s work, lazy, belligerent, and smug. (p. 1)

One source of such unpleasantness may be that reviewers rarely receive training in conducting quality peer reviews, or even developmental feedback on the reviews they write (see Köhler et al., 2020). Fortunately, SIOP, in partnership with the Consortium for the Advancement of Research Methods and Analysis (CARMA), has offered a set of online modules to facilitate reviewer training (http://carmarmep.org/siop-carma-reviewer-series/). The modules center on a proposed competency framework for reviewing (see Köhler et al., 2020). At the narrowest level, these competencies include reviewing with (a) integrity (e.g., acknowledging the limits of our expertise), (b) open-mindedness (e.g., doing outside research to better position ourselves for reviewing), (c) constructiveness (e.g., giving actionable advice), (d) thoroughness (e.g., reviewing all sections of a paper), (e) appropriate tone (e.g., being tactful), (f) clear writing (e.g., numbering specific comments), (g) appropriately leveraged expertise (e.g., assessing a paper’s contribution to the field), and (h) appropriate representation (e.g., representing the journal for which you are serving as a gatekeeper). At a higher level, these competencies reflect foundational knowledge, skill, and professionalism.
CARMA’s training modules help reviewers recognize counterproductive behaviors, such as encouraging authors to engage in questionable research practices (e.g., hypothesizing after the results are known, or asking authors to add hypotheses or drop unsupported hypotheses without a strong rationale). The modules can also help reviewers see how a well-tested null result can be useful, and how inconsistency in findings can arise for systematic (moderator) or random (statistical power) reasons; such inconsistencies can stimulate future research or indicate that a finding is more constrained than is recognized (see also Nosek & Errington, 2020). Reviewers will learn how to focus their efforts on helping authorship teams recognize those aspects of their study that contribute meaningfully to the field, and how to avoid pressuring authors to write a paper that they do not want to write. We strongly encourage anyone seeking to improve their peer review skillset to begin with the free online training provided by CARMA.

Additionally, in putting skills acquired from the CARMA reviewer training into practice, we encourage reviewers to routinely ask for feedback on the quality of their reviews and on how they might improve. Some journals score reviewers on the quality of their work, which helps editors promote quality reviewing within their journals; asking for these scores and for developmental feedback can be fruitful for improving one’s reviewing skillset.

To add to the collective work of Köhler et al. (2020), CARMA, and SIOP, we highlight some easy-to-adopt checklists that should help reviewers increase the quality of their reviewing efforts. For instance, Eby et al. (2020) provide a short methodological checklist that helps reviewers ensure that submitted work is rigorous, replicable, and transparent/open. Davis et al. (2018) offer a broader, more comprehensive checklist with advice for promoting robustness and transparency when reviewing psychology manuscripts that report quantitative empirical research. Their checklist is notable in that it contains advice unique to open science publication practices, such as reviewing registered reports or soliciting results-blind reviews. Such advice can be broadly applicable: There is value in reviewing any manuscript as if it were a registered report or results-blind submission; that is, review the literature review and method sections first, then the results and discussion, and do not allow the findings to sway your opinion too strongly (particularly if the methods are robust). Such a strategy may help a reviewer place more emphasis on a study’s methods than on its results, which is an overarching theme motivating the open science movement. Additionally, Aguinis et al. (2019) offer a series of checklists on best practices in data collection and preparation. These lists can help reviewers focus on issues regarding the type of research design, control variables, sampling procedures, missing-data management, outlier management, the use of corrections for statistical and methodological artifacts, and data transformations. Last, we call attention to the American Psychological Association’s Journal Article Reporting Standards (JARS), which also provide online checklists for both quantitative and qualitative research (https://apastyle.apa.org/jars).
Incorporating any of these checklists into the reviewing process can help scholars improve the quality of their peer-reviewing efforts. In addition to these checklists, we would like to highlight a few other resources that have proven helpful for common technical aspects of reviewing scholarly work in our field (see Table 1).[2] First is statcheck (see Nuijten et al., 2016), a useful tool for quickly scanning a manuscript and identifying inconsistencies among the reported test statistics, degrees of freedom, and p values for relatively simple statistical tests (e.g., t-tests). Next is the Granularity-Related Inconsistency of Means (GRIM) test, a simple and useful procedure for examining whether the means of Likert-type scales, which are common in our research, are consistent with the sample size and the number of items that make up the scale (see Brown & Heathers, 2017). A similar assessment can be made for standard deviations via the GRIMMER (Granularity-Related Inconsistency of Means Mapped to Error Repeats) test. For a more in-depth assessment (e.g., reconstructing samples based on reported statistics), Sample Parameter Reconstruction via Iterative TEchniques (SPRITE; see Heathers et al., 2018) can be used to build plausible data sets from basic summary information about a sample (e.g., the mean, the standard deviation, the sample size, and the lower and upper bounds of the range of item values). SPRITE complements GRIM and GRIMMER for detecting inaccuracies in published values. When studies involve categorical data, the DEscriptive BInary Test (DEBIT; see Heathers & Brown, 2019) can be useful. Although these tests are broadly applicable, much work in our field involves latent variable modeling with large samples, so other approaches (e.g., ensuring that a model’s reported degrees of freedom align with those implied by the proposed model) are often more useful (see Cortina et al., 2017). A brief, illustrative sketch of a few of these checks appears at the end of this section.

Table 1
A Set of Rather Basic Tools for Evaluating Statistical Claims

With regard to the checklists and tools we’ve highlighted here, we’d like to be clear that they should never be applied too rigidly, nor do we wish to imply that identifying more issues necessarily invalidates a claim (see also Aguinis et al., 2019). We are not advocating for reviewers to view their job as a policing effort. Rather, our focus is on developing a more transparent and open peer review process for all parties. The tactics we’ve identified are broad, and although we believe that more items addressed by authors make for better research, we do not believe that the absence of any particular item or set of items has veto power over a claim put forward in a manuscript. Rather, there may be alternative explanations (indeed, in many cases there are) that impinge on the phenomena in question. We should aim to have such validity threats reported honestly and transparently (Aguinis et al., 2020); they may even be framed as rival hypotheses for a future study (see Spector, 2019). Action editors will have to weigh in and decide whether to encourage particular debates. As reviewers, we can highlight the merits of publishing papers that do have certain validity threats, so long as those threats are reported honestly and transparently.
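To make the checks above concrete, here is a minimal sketch, in Python, of the arithmetic behind a few of the tools in Table 1: a statcheck-style recomputation of a t-test’s p value, a GRIM consistency check for means, a DEBIT-style check for binary data, and the degrees of freedom implied by a covariance structure model (Cortina et al., 2017). The function names and example values are our own and purely hypothetical; this is an illustration of the underlying logic, not a substitute for the statcheck, GRIM/GRIMMER, SPRITE, or DEBIT tools themselves.

```python
"""Minimal, illustrative consistency checks inspired by the tools in Table 1.
These are simplified sketches (our own function names and examples), not
replacements for statcheck, GRIM/GRIMMER, SPRITE, or DEBIT."""
from math import sqrt

from scipy import stats  # used only for the statcheck-style p-value check


def recompute_t_test_p(t_value: float, df: float, two_tailed: bool = True) -> float:
    """statcheck-style check: the p value implied by a reported t statistic
    and its degrees of freedom. Compare this with the p value in the paper."""
    p = stats.t.sf(abs(t_value), df)
    return 2 * p if two_tailed else p


def grim_consistent(reported_mean: float, n: int, n_items: int = 1,
                    decimals: int = 2) -> bool:
    """GRIM check: with n respondents averaging n_items integer-scored items,
    the mean can only take values k / (n * n_items) for some integer k. Test
    whether the reported (rounded) mean is attainable."""
    grain = n * n_items
    nearest_k = round(reported_mean * grain)
    # Compare the nearest attainable means at the reported precision;
    # string formatting sidesteps floating-point equality issues.
    target = f"{reported_mean:.{decimals}f}"
    return any(f"{k / grain:.{decimals}f}" == target
               for k in (nearest_k - 1, nearest_k, nearest_k + 1))


def debit_consistent(reported_mean: float, reported_sd: float, n: int,
                     decimals: int = 2) -> bool:
    """DEBIT-style check: for 0/1 (binary) data, the sample SD is fully
    determined by the mean and n. (The published DEBIT handles rounding of
    the reported mean more carefully than this sketch does.)"""
    k = round(reported_mean * n)  # implied count of 1s
    m = k / n
    implied_sd = sqrt(n / (n - 1) * m * (1 - m))
    return abs(round(implied_sd, decimals) - reported_sd) <= 10 ** -decimals


def implied_sem_df(n_observed_vars: int, n_free_params: int,
                   with_means: bool = False) -> int:
    """Degrees of freedom implied by a covariance structure model: unique
    (co)variances (plus means, if modeled) minus freely estimated parameters.
    Compare with the df reported for the model (see Cortina et al., 2017)."""
    p = n_observed_vars
    unique_moments = p * (p + 1) // 2 + (p if with_means else 0)
    return unique_moments - n_free_params


if __name__ == "__main__":
    # Hypothetical reported values, purely for illustration.
    print(round(recompute_t_test_p(2.10, 48), 3))  # compare with the reported p
    print(grim_consistent(3.47, n=28, n_items=1))  # False: a mean of 3.47 is impossible with N = 28
    print(debit_consistent(0.40, 0.49, n=120))     # is the reported SD possible for 0/1 data?
    print(implied_sem_df(12, 30))                  # 48; should match the model's reported df
```

Only the p-value recomputation requires SciPy; the remaining checks involve basic arithmetic that a reviewer could just as easily carry out with a calculator.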
Concluding Thoughts

One author of this manuscript (Chris) recalls a reviewer quoting Winston Churchill’s description of democracy to describe the peer review process: It is like “the worst form of Government except for all those other forms that have been tried from time to time.” Fringe and/or unusual ideas might be rejected, but they often find their way into the scholarly community; there are several instances of Nobel-Prize-winning ideas failing an initial peer review at a journal (e.g., Peter Higgs’s seminal work on the Higgs mechanism, Hans Krebs’s work on the Krebs cycle). Despite such imperfections, and regardless of whether the process changes, we as scholars can still take it upon ourselves to improve the quality of our work as reviewers. To that end, we hope that the literature we’ve highlighted prompts you to adopt a few simple yet effective tactics or to seek out the tips we’ve cited here. We hope that this work helps you improve your reviewing toolkit, thereby helping you to help others improve the quality of their contributions to the field.

Next Time on “Opening Up”...

We’re looking for more ideas from you. There are several that we are considering. For instance, Mike Morrison and Chris are considering examining the I-O psych Twitterverse to see what open science topics are making their way into our online discussions. We are also considering an article on advice that scholars within our field can offer to others about adopting open science practices. Perhaps you are a teacher who is incorporating open science, broadly construed, into your teaching; or maybe you are a practitioner who has found ways to put open science principles into practice. We’d like to hear from you. Please share your thoughts with Chris Castille (christopher.castille@nicholls.edu).

Notes

[1] Thanks go out to George Banks, who pointed this out to us in a friendly review.

[2] It is worth pointing out that journals such as The Leadership Quarterly and the Journal of Management also have methods checklists.

References

Aguinis, H., Banks, G. C., Rogelberg, S. G., & Cascio, W. F. (2020). Actionable recommendations for narrowing the science–practice gap in open science. Organizational Behavior and Human Decision Processes, 158, 27–35. https://doi.org/10.1016/j.obhdp.2020.02.007

Aguinis, H., Hill, N. S., & Bailey, J. R. (2019). Best practices in data collection and preparation: Recommendations for reviewers, editors, and authors. Organizational Research Methods. https://doi.org/10.1177/1094428119836485

Anaya, J. (2016). The GRIMMER test: A method for testing the validity of reported measures of variability. PeerJ Preprints. https://doi.org/10.7287/peerj.preprints.2400v1

Armstrong, J. S. (1997). Peer review for journals: Evidence on quality control, fairness, and innovation. Science and Engineering Ethics, 3(1), 63–84.

Bergh, D. D., & Oswald, F. L. (2020). Fostering robust, reliable, and replicable research at the Journal of Management. Journal of Management, 46, 1302–1306. https://doi.org/10.1177/0149206320917729

Bornmann, L., & Daniel, H. D. (2010). Reliability of reviewers’ ratings when using public peer review: A case study. Learned Publishing, 23(2), 124–131.

Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363–369. https://doi.org/10.1177/1948550616673876
Clifton, J. D. W. (2020). Managing validity versus reliability trade-offs in scale-building decisions. Psychological Methods, 25(3), 259–270. https://doi.org/10.1037/met0000236

Cortina, J. M., Green, J. P., Keeler, K. R., & Vandenberg, R. J. (2017). Degrees of freedom in SEM: Are we testing the models that we claim to test? Organizational Research Methods, 20(3), 350–378. https://doi.org/10.1177/1094428116676345

Crede, M., & Harms, P. (2019). Questionable research practices when using confirmatory factor analysis. Journal of Managerial Psychology, 34, 18–30. https://doi.org/10.1108/JMP-06-2018-0272

Davis, W. E., Giner-Sorolla, R., Lindsay, D. S., Lougheed, J. P., Makel, M. C., Meier, M. E., Sun, J., Vaughn, L. A., & Zelenski, J. M. (2018). Peer-review guidelines promoting replicability and transparency in psychological science. Advances in Methods and Practices in Psychological Science, 1(4), 556–573. https://doi.org/10.1177/2515245918806489

Eby, L. T., Shockley, K. M., Bauer, T. N., Edwards, B., Homan, A. C., Johnson, R., Lang, J. W. B., Morris, S. B., & Oswald, F. L. (2020). Methodological checklists for improving research quality and reporting consistency. Industrial and Organizational Psychology, 13(1), 76–83. https://doi.org/10.1017/iop.2020.14

Goldbeck-Wood, S. (1999). Evidence on peer review—Scientific quality control or smokescreen? British Medical Journal, 318(7175), 44–45.

Hancock, G. R., & Mueller, R. O. (2011). The reliability paradox in assessing structural relations within covariance structure models. Educational and Psychological Measurement, 71(2), 306–324. https://doi.org/10.1177/0013164410384856

Heathers, J., Anaya, J., van der Zee, T., & Brown, N. J. L. (2018). Recovering data from summary statistics: Sample parameter reconstruction via iterative techniques (SPRITE). PeerJ Preprints. https://doi.org/10.7287/peerj.preprints.26968v1

Heathers, J. A. J., & Brown, N. J. L. (2019). DEBIT: A simple consistency test for binary data. OSF. https://osf.io/pm825/

Horrobin, D. F. (1990). The philosophical basis of peer review and the suppression of innovation. Journal of the American Medical Association, 263(10), 1438–1441.

Köhler, T., González-Morales, M. G., Banks, G. C., O’Boyle, E. H., Allen, J. A., Sinha, R., Woo, S. E., & Gulick, L. M. V. (2020). Supporting robust, rigorous, and reliable reviewing as the cornerstone of our profession: Introducing a competency framework for peer review. Industrial and Organizational Psychology: Perspectives on Science and Practice, 13(1), 1–27. https://doi.org/10.1017/iop.2019.121

Lakens, D., & Etz, A. J. (2017). Too true to be bad: When sets of studies with significant and nonsignificant findings are probably true. Social Psychological and Personality Science, 8(8), 875–881. https://doi.org/10.1177/1948550617693058

Macdonald, S., & Kam, J. (2007). Ring a ring o’ roses: Quality journals and gamesmanship in management studies. Journal of Management Studies, 44(4), 640–655. https://doi.org/10.1111/j.1467-6486.2007.00704.x

Marsh, H. W., & Ball, S. (1989). The peer review process used to evaluate manuscripts submitted to academic journals: Interjudgmental reliability. Journal of Experimental Education, 57(2), 151–169. https://doi.org/10.1080/00220973.1989.10806503

McGrath, J. E. (1981). Dilemmatics: The study of research choices and dilemmas. American Behavioral Scientist, 25(2), 179–210.

McShane, B., & Böckenholt, U. (2017). Single paper meta-analysis: Benefits for study summary, theory-testing, and replicability. Journal of Consumer Research, 43(6), 1048–1063.
Nosek, B. A., & Bar-Anan, Y. (2012). Scientific utopia: I. Opening scientific communication. Psychological Inquiry, 23(3), 217–243. https://doi.org/10.1080/1047840X.2012.692215

Nosek, B. A., & Errington, T. M. (2020). What is replication? PLOS Biology, 18(3), e3000691. https://doi.org/10.1371/journal.pbio.3000691

Nuijten, M. B. (2018). Research on research: A meta-scientific study of problems and solutions in psychological science [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/qtk7e

Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48(4), 1205–1226. https://doi.org/10.3758/s13428-015-0664-2

Pedhazur, E. J., & Schmelkin, L. P. (2013). Measurement, design, and analysis: An integrated approach. New York, NY: Psychology Press.

Peters, D. P., & Ceci, S. J. (1982). Peer-review research: Objections and obligations. Behavioral and Brain Sciences, 5(2), 246–255.

Peterson, D. A. M. (2020). Dear reviewer 2: Go f’ yourself. Social Science Quarterly, 101, 1648–1652. https://doi.org/10.1111/ssqu.12824

Petty, R. E., Fleming, M. A., & Fabrigar, L. R. (1999). The review process at PSPB: Correlates of inter-reviewer agreement and manuscript acceptance. Personality and Social Psychology Bulletin, 25(2), 188–203.

Pickett, J. T. (2020). The Stewart retractions: A quantitative and qualitative analysis. Econ Journal Watch: Scholarly Comments on Academic Economics, 17(1), 152–190.

Research Information Network. (2008). Activities, costs and funding flows in the scholarly communications system in the UK. Retrieved from https://www.researchgate.net/publication/291195121_Heading_for_the_Open_Road_Costs_and_Benefits_of_Transitions_in_Scholarly_Communications/fulltext/5ab570f2aca2722b97cacef5/Heading-for-the-Open-Road-Costs-and-Benefits-of-Transitions-in-Scholarly-Communications.pdf

Rigdon, E. E. (1994). Calculating degrees of freedom for a structural equation model. Structural Equation Modeling: A Multidisciplinary Journal, 1(3), 274–278. https://doi.org/10.1080/10705519409539979

Schimmack, U. (2012). The ironic effect of significant results on the credibility of multiple-study articles. Psychological Methods, 17, 551–566. https://doi.org/10.1037/a0029487

Schroter, S., Black, N., Evans, S., Godlee, F., Osorio, L., & Smith, R. (2008). What errors do peer reviewers detect, and does training improve their ability to detect them? Journal of the Royal Society of Medicine, 101, 507–514.

Spector, P. E. (2019). Do not cross me: Optimizing the use of cross-sectional designs. Journal of Business & Psychology, 34, 125–137. https://doi.org/10.1007/s10869-018-09613-8

Tourish, D. (2019). Management studies in crisis: Fraud, deception and meaningless research. Cambridge, UK: Cambridge University Press.

Tourish, D., & Craig, R. (2018). Research misconduct in business and management studies: Causes, consequences, and possible remedies. Journal of Management Inquiry, 1–14.

van Rooyen, S., Godlee, F., Evans, S., Black, N., & Smith, R. (1999). Effect of open peer review on quality of reviews and on reviewers’ recommendations: A randomised trial. British Medical Journal, 318, 23–27. https://doi.org/10.1136/bmj.318.7175.23

Vazire, S. (2020, June 25). Peer-reviewed scientific journals don’t really do their job. Wired. https://www.wired.com/story/peer-reviewed-scientific-journals-dont-really-do-their-job/
Whitehurst, G. J. (1984). Interrater agreement for journal manuscript reviews. American Psychologist, 39, 22–28. https://doi.org/10.1037/0003-066X.39.1.22

Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLOS ONE, 6(11), e26828. https://doi.org/10.1371/journal.pone.0026828

Zhang, D. C., Smith, R. W., & Lobo, S. (2020). Should you sign your reviews? Open peer review and review quality. Industrial and Organizational Psychology, 13(1), 45–47.