[This is a guest post by Neil Levy (Oxford; Macquarie)--ES]
One thousand scientific papers are retracted every single year. Is peer review broken? I don’t think it is. It’s still serving a useful function (at least in science). The retractions are evidence that peer review needs supplementation. It’s getting that supplementation, but the supplementation needs to be more systematic than it currently is.
Right now, philosophers might be aware of three ongoing scandals (or what some people might regard as scandals) involving peer review. The first concerns the drug hydroxychloroquine, touted by President Trump as a treatment for COVID-19. The Lancet published a paper that appeared to show that it increased risk rather than lowering it. The paper and a related NEJM paper have now been retracted, after the data on which they were based were found to be unreliable. Meanwhile, a paper in Psychological Science, a very prominent psychology journal, has been, um, raising eyebrows due to its claim that without religion, human groups with low IQ tend to engage in much higher rates of homicide, and that sub-Saharan Africans have average IQs at least one standard deviation below the mean for developed nations. Finally, in philosophy there is a brouhaha over the publication of Robin Dembroff’s reply to Alex Byrne in Philosophical Studies.
All these papers were peer reviewed. Holly Lawford-Smith has cast doubt on this claim with regard to Dembroff’s paper: it was accepted “mere months” after Byrne’s paper appeared, which would mean Dembroff wrote the paper and revised it in the light of comments very quickly. That would make their experience of peer review an unusually happy one, but I’ve gone from paper idea to acceptance that quickly on occasion. We now know that Philosophical Studies used only one reviewer, which speeds up the process.[*] You might worry that the use of a single reviewer risks arbitrariness and therefore casts doubt on the quality of accepted papers; that worry, however, applies equally to both papers.
It’s too early to say how things will shake out with these papers, but I wouldn’t be at all surprised if, when things die down, all and only the ones I regard as unreliable are retracted. We might think that’s the system working as it should: occasionally bad papers get through peer review, but their errors are picked up and all is well.
In fact, while the Psych Science paper may end up being retracted, it won’t be purely because it’s bad science. On Twitter, defenders of the paper portray calls for retraction as “woke demands”. In one sense, they’re right: the paper’s flaws are not much worse, in kind or degree, than those of lots of other work published in psychology and in many other areas too. If it is retracted, it won’t be simply because it’s bad science; it will be because it’s bad science about a politically sensitive topic. Lots of other work goes uncorrected and unretracted.
Does that mean that peer review is failing? The problem in science is too many false positives (in philosophy, the main problem is the opposite: too many perfectly good papers are rejected). I don’t think we should expect peer review to fix that problem. Here’s the problem (spoiler alert): science is hard. Peer reviewers often lack the time or the incentive to look very carefully at the data and the methods. Papers are produced by multiple authors who often have different areas of expertise; any two peer reviewers won’t be able to match that, so they should be expected to miss things. They won’t always have the statistical expertise to detect problems in the analysis. They won’t always have expertise in the methods by which the data were gathered. They won’t always have expertise in the theoretical waters the paper treads in. They can’t be expected to, because there are always multiple areas of expertise relevant to a paper; more than it is practicable to cover with reviewers.
That doesn’t mean peer review is a waste of time. We should think of peer review as nothing more than a certification that a paper appears to meet a certain minimum standard: it doesn’t seem to be the work of cranks or the incompetent; it is an effort at serious science by people who seem to have used appropriate methods to gather data, which they have analysed in a way that makes prima facie sense. If we look at peer review in that kind of way, we’ll have lower expectations, and we won’t be disappointed.
If we think of peer review as minimal certification, we won’t be tempted to think that the fact that a paper has passed peer review is evidence that it is really reliable. Lots of unreliable work is more or less competently produced. For example, Cailin O’Connor and James Weatherall have shown how competent scientists can deliberately or inadvertently produce unreliable work through competent procedures, by taking advantage of the normal distribution of results and the use of small sample sizes: small studies are noisy, so chance alone regularly produces striking results, and selective reporting of those results does the rest. If peer review is a certification of competence, this kind of work will often sail through (given prevailing standards), producing unreliable peer-reviewed science. But we do want to be able to rely on science: we want to know which work should be given significant weight (rather than simply accepted uncritically). We can’t depend on peer review to certify reliability, and we can’t depend on unreliable work being retracted.
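To make the mechanism concrete, here is a minimal simulation sketch. It is not O’Connor and Weatherall’s own model (they work with formal models of scientific communities); the setup, numbers, and function names are illustrative assumptions of mine. With no real effect at all, an interested party who splits a fixed data budget into many small studies, and reports only the ones that come out “significant” in the desired direction, will almost always have something to publish:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def chance_of_a_publishable_result(budget=1000, n_per_study=10,
                                   alpha=0.05, trials=2000):
    """There is no real effect (true mean 0). Split a fixed budget of
    observations into studies of size n_per_study and 'publish' only
    studies that come out significant AND in the desired direction.
    Returns the fraction of trials yielding at least one such study."""
    n_studies = budget // n_per_study
    hits = 0
    for _ in range(trials):
        # Each row is one small study drawn from a standard normal.
        data = rng.normal(0.0, 1.0, size=(n_studies, n_per_study))
        res = stats.ttest_1samp(data, popmean=0.0, axis=1)
        publishable = (res.pvalue < alpha) & (res.statistic > 0)
        if publishable.any():
            hits += 1
    return hits / trials

for n in (10, 50, 1000):
    print(f"{1000 // n:>3} studies of n={n:<4}: "
          f"{chance_of_a_publishable_result(n_per_study=n):.0%} of trials "
          f"yield a 'significant' positive result")
```

Every study in the simulation is individually competent: the data are genuine and the test is standard. The unreliability comes entirely from small samples plus selective reporting, which is precisely the sort of thing a per-paper competence check cannot catch.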
Here there is a role for a more formal post-publication review process. The Twitter wars and blog discussions that lead to retraction, when it happens, are post-publication review. They reflect the work of hundreds of people (with a truly diverse range of expertise) poring over a paper. Politically sensitive work tends to get more of this sort of attention, but it’s by no means limited to such work. The result, at the moment, is that a small community of scholars comes to have a good idea which papers in their field are unreliable, but the rest of us may have no idea (and philosophers may then go on to cite these unreliable papers; I’ve certainly done it myself).
Eric Schliesser himself has suggested combining peer review with post-publication review. His suggestion is that after an initial stage of peer review, both the submission and the reports should be posted publicly and opened to comments from those who are suitably qualified. That might be worth doing (its benefits would have to be weighed against costs, in terms of slowing down the process and the danger that many papers would simply never be scrutinized), but it wouldn’t address the issue I’m concerned with. While it might increase the overall quality of publications, it wouldn’t allow us to distinguish work that is unreliable but good enough for publication from work that should be given significant epistemic weight.
A more formal post-publication review process might be built around a paper archive, perhaps along the lines suggested by Marcus Arvan (most scientific papers are already posted as preprints). Those with the right credentials (what credentials? That’s a hard question) would be able to post discussion and also to rate papers, using, say, a star system or a pass/fail system. The large mass of papers that are insufficiently reliable to be cited as genuine evidence for a claim, but not bad enough or sensitive enough to be retracted, would remain visible but, with their low ratings, would be seen for what they are.
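As a very rough sketch of what the rating layer of such an archive might look like: everything here is hypothetical (Arvan’s proposal doesn’t specify an implementation, and neither do I), but the idea is that credentialed reviewers attach star ratings to a paper, and the aggregate is displayed alongside it:

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class PaperRecord:
    """A preprint-archive entry with credentialed post-publication ratings."""
    paper_id: str
    title: str
    ratings: dict[str, int] = field(default_factory=dict)  # reviewer_id -> 1..5 stars

    def rate(self, reviewer_id: str, stars: int, credentialed: set[str]) -> None:
        if reviewer_id not in credentialed:
            raise PermissionError("only credentialed reviewers may rate")
        if not 1 <= stars <= 5:
            raise ValueError("rating must be 1-5 stars")
        # One rating per reviewer; re-rating overwrites the old score.
        self.ratings[reviewer_id] = stars

    def summary(self) -> str:
        if not self.ratings:
            return "unrated"
        return f"{mean(self.ratings.values()):.1f} stars ({len(self.ratings)} ratings)"

# Usage: a competent-but-unreliable paper stays visible, with a low score.
credentialed = {"rev-001", "rev-002"}
paper = PaperRecord("2020.0001", "A Competent but Unreliable Study")
paper.rate("rev-001", 2, credentialed)
paper.rate("rev-002", 1, credentialed)
print(paper.summary())  # -> "1.5 stars (2 ratings)"
```

The hard question I flagged (what credentials?) is simply buried in the `credentialed` set; the sketch inherits that problem rather than solving it.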
Could a system like this work for philosophy papers? I’m sceptical. I don’t think our methods are uncontroversial enough for us to converge on assessments on these kinds of grounds. Philosophy is too diverse in its approaches and too dependent on argument: premises can always be contested, and we can reasonably disagree about whether adjustments are epicycles or justified. Any paper bad enough to generate sufficient agreement in post-publication review really shouldn’t have got through peer review at all. No solutions for us, I’m afraid. Still, since philosophers rely increasingly heavily on scientific research, we have a stake in making science more reliable.
*UPDATED: Looking again at the various statements by Cohen and the present editors of Philosophical Studies, reported at Daily Nous, it seems Byrne was evaluated by one referee, and Dembroff by at least two.--Eric Schliesser
Peer review serves a useful function not only in the natural sciences but also in the social sciences and the humanities. But peer review is not perfect. There are type I errors and there are type II errors: sometimes an excellent paper is rejected for the wrong reasons, and sometimes a not-so-good paper is accepted for reasons that are hard to fathom. Even so, if there were no peer review, there would be no better way. Post-publication review sometimes takes the form of comments from other scholars. And this is not only true of articles in academic journals: books published by university presses are also often peer reviewed, and errors occur there too, especially in books on "popular" subjects that seem to be going with the tide.
Posted by: Johannes (Hans) Bakker | 06/17/2020 at 07:56 PM
The issue I worry about with regard to post-publication review is that it would reinforce groupthink.
Levy talks about how hundreds of people pore over papers on Twitter, but that seems to happen (correct me if I'm wrong) mainly when a paper challenges some established wisdom or sacred cow. Moreover, owing to the nature of Twitter, you can't be sure that the people who comment on a paper have actually read it, let alone read it with care. And because people's online comments are publicly available, you can get the impression that lots of people have independently come to the same conclusion about a paper when in fact all that has happened is that one or two people have come up with criticisms and everybody else assumes they're right and simply repeats them.
Here's another facet of the groupthink worry. Imagine the following is true: the vast majority of papers, if scrutinized closely by lots of people deeply invested in their being false, would be shown to rest on at least quite questionable assumptions. (Take, for example, every philosophy paper ever.) However, if you scrutinize only a few and leave the rest untouched, those few will appear unusually defective while the rest, owing perhaps to a contrast effect, will appear robust. Especially if the ones closely examined are examined because they challenge conventional academic wisdom, while the ones left unexamined don't, you will be left with the impression that those who don't toe the line are simply worse scholars than those who do.
Posted by: Robert A Gressis | 06/18/2020 at 06:17 PM
The Psychological Science article has now been retracted:
https://retractionwatch.com/2020/06/18/authors-of-article-on-iq-religiosity-and-crime-retract-it-to-do-a-level-of-vetting-we-should-have-done-before-submitting/
https://statmodeling.stat.columbia.edu/2020/06/22/retraction-of-racial-essentialist-article-that-appeared-in-psychological-science/
Posted by: Nicolas Delon | 06/22/2020 at 05:10 PM