Viewed in this way, the data in Table 1 are a fountain of puzzlements. Consider row 2. Given a choice between offering (10, 0) and (8, 2), every Proposer offers (8, 2), but 8.9% are rejected! Responders at the rate of 8.9% do not like this outcome. Why? Well, it is said to reflect very strong fairness attitudes. Again, when the alternative is (2, 8), 73% offer (8, 2) which is rejected by 26.7%. And when only (8, 2) can be offered, 18% are rejected. But how is it fair to punish the proposer who does his best to be other-regarding when the equal-split outcome is unavailable, as imposed by the experimenter? Surely these responses are messages for the experimenter as much as for the proposer subject, where the responder is expressing dissatisfaction with the circumstances of the game.--Vernon Smith (ESI working paper, 2018) "Causal versus Consequential Motives in Mental Models of Agent Social and Economic Action: Experiments, and the Neoclassical Diversion in Economics," p. 32.
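To see the puzzle in miniature, here is a back-of-the-envelope reconstruction in Python. The treatment labels are mine, and only the percentages come from the passage above (Table 1 itself is not reproduced here):

```python
# Reconstruction of the (8, 2) rejection figures Smith quotes; treatment
# labels are mine, the percentages come from the quoted passage.
treatments = {
    "alternative was (10, 0)": {"offered_8_2": 1.00, "rejected": 0.089},
    "alternative was (2, 8)":  {"offered_8_2": 0.73, "rejected": 0.267},
    "no alternative":          {"offered_8_2": 1.00, "rejected": 0.180},
}

for label, t in treatments.items():
    # The realized split on the table is identical -- (8, 2) -- in every
    # treatment, yet rejection rates differ with the *unchosen* alternative.
    expected_payoff = 8 * (1 - t["rejected"])
    print(f"{label:<24} P(reject) = {t['rejected']:.1%}, "
          f"E[Proposer payoff | offered (8, 2)] = {expected_payoff:.2f}")
```

A purely outcome-based responder would treat the identical (8, 2) split identically in all three rows; the varying rejection rates suggest responders are reacting to the menu the proposer faced--and, as Smith suggests, to the experimenter who imposed it.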
Some regular readers of these digressions have expressed surprise to me that, given my repeated criticism (inspired by Deirdre McCloskey) of the cult of statistical significance (I started blogging about it in 2010) and my focus on the role of incentives in the sciences, I was not more vocal and active during the public discussions of the replication crisis (in social psychology)--my friend David M. Levy had frequently told me about his attempts to get economists to pay attention to this issue decades before (see this 1993 paper with Feigenbaum). In fact, much as I like to say 'told you so,' and despite feeling some all-too-human annoyance at those fellow philosophers of science (you know the type, who think I am not rigorous enough to be a philosopher of science) who had previously seemed rather complacent about such matters and were now hogging all the attention with new super-duper-technical fixes, I had lost interest in the topic (also, the young kids like Liam Kofi Bright and Remco Heesen are doing swell work, so who needs me?). Getting robust science off the ground that can produce high-quality evidence (see George Smith's "closing the loop" for what I have in mind) is difficult enough under the best of circumstances.
But during the last few weeks I have been listening a lot to Vernon Smith (Nobel, economics 2002) describe his new project in Humanomics (which is also the title of the book he co-authored with Bart Wilson).* Now humanomics is an attempt to develop an ambitious new paradigm for economics--one that treats utility theory as, perhaps, at most a special case (in the paper I quote, Vernon Smith calls it the 'diversion').** The new paradigm is grounded not just in experiments, but also in a framework derived from a very critical re-reading of Adam Smith's other book, The Theory of Moral Sentiments. Some other time I will return to this new paradigm in the making. (To be sure, Vernon Smith does not use the language of 'paradigms,' and that may, in fact, be very wise.) But here I want to add a thought about the replication crisis.
My impression is that most of the discussion of the replication crisis has focused on mistaken journal practices (no interest in failures of replication, or in any replications), bad incentives (high rewards for spectacular new research, few rewards for careful work, etc.), bad ethics (some researchers engaged in spectacular fraud), the abuse of statistical significance (which became an arbitrary cut-off line that allowed researchers to massage their findings),+ and the increasing tendency to let the news cycle impose its values on scientific publication.
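On the statistical-significance point, a toy simulation of my own (not drawn from any of the authors discussed here) shows the mechanism: a study that measures twenty true-null outcomes and reports whichever clears p < 0.05 will 'find' something roughly 64% of the time.

```python
import math
import random
import statistics

random.seed(0)

def two_sided_p(sample):
    """Two-sided z-test p-value against mean 0, known unit variance."""
    z = statistics.mean(sample) * math.sqrt(len(sample))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def study(n_outcomes=20, n_subjects=30):
    """One 'study': the smallest p-value across n_outcomes true-null measurements."""
    return min(
        two_sided_p([random.gauss(0, 1) for _ in range(n_subjects)])
        for _ in range(n_outcomes)
    )

trials = 2000
hits = sum(study() < 0.05 for _ in range(trials))
print(f"All-null studies with at least one p < 0.05: {hits / trials:.0%}")
# Analytically: 1 - 0.95**20 is about 0.64
```

The cut-off is arbitrary in precisely the sense of the footnote below: nothing ties 0.05 to the theory being tested.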
To the best of my knowledge, little has been said about research design and the way that experiment and theory are supposed to interact. That's not so easy to discuss, of course, because it requires detailed knowledge of the way in which evidence works in particular experiments and research practices. So, one reason I always enjoy reading, and talking with, Vernon Smith is that he is unusually reflective about his own practice and welcomes criticism (one of my first publications was a criticism of his methodological musings in light of my ideas on the relationship between theory and evidential practices).
Okay, let me turn to the thought I want to derive from the comment in the working paper quoted above. One key problem in experimental social science is that the experimental subject may be sending a signal to the experimenter. This is a distinct problem from the well-studied issue (in animal research) that an experimenter may be prompting the scientific subject without realizing it. I do not mean to suggest that only humans can send signals to the experimenter in an experiment; clearly animals can do it, too. (A few weeks ago I heard a talk by Sarah Brosnan, who also described the phenomenon in an experiment with her primates.)
So, one problem that may contribute to replication troubles is that, either in the original experiment or in the attempted replication, subjects respond differently to the experimental set-up in their willingness to send a signal to the experimenter. In their book, Smith and Wilson note of one such case that "This seems to be a 'kill the messenger response', not a 'punish the Responder' reaction." (p. 131) Because few experimentalists also model their own role in the experiment--they tend to see themselves as outsiders looking on, who can manipulate/observe their subjects--these signals are rarely properly measured. And this can lead to their being misinterpreted as noise (or outliers) or, worse, as evidence for some finding or other within and across different treatments (or attempted replications) of the same experiment. Anyway, I am not claiming this is a big part of the replication crisis, but until it is systematically investigated, we don't know how small it is.
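To make the worry concrete, here is a minimal sketch with invented numbers (nothing below is estimated from any actual experiment): rejections come either from a stable fairness motive or from a context-dependent willingness to message the experimenter. Two labs with identical fairness preferences can then produce rejection rates far enough apart to read as a failed replication.

```python
import random

random.seed(42)

def observed_rejection_rate(n_subjects, p_fairness, p_message):
    """Share of responders who reject: either out of fairness (stable across
    labs) or to signal displeasure at the game's design to the experimenter
    (context-dependent)."""
    rejections = sum(
        (random.random() < p_fairness) or (random.random() < p_message)
        for _ in range(n_subjects)
    )
    return rejections / n_subjects

# Identical fairness motive (10%) in both labs; only the willingness to
# 'message the experimenter' differs (15% vs. 2%). All numbers are invented.
original    = observed_rejection_rate(200, p_fairness=0.10, p_message=0.15)
replication = observed_rejection_rate(200, p_fairness=0.10, p_message=0.02)
print(f"original lab:    {original:.1%} rejections")
print(f"replication lab: {replication:.1%} rejections")
```

A naive comparison would score this as a failed replication, even though the fairness effect the experiment was designed to test is identical across the two labs; the gap is entirely the unmeasured signal to the experimenter.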
That's all I wanted to say in this post. But let me close with another relevant thought. In their book, Smith and Wilson write in a footnote that "working papers are often better than the published version." (Humanomics, p. 122, note 14) They are expressing the familiar thought that referees don't always improve the final product. This is one reason why we should not fetishize refereed publication--something that often happens in tenure and promotion decisions and in grant applications, and so it undoubtedly contributed to the replication crisis. Anonymous peer review is also treated as the scientific gold standard in the culture wars/public debates over science. Refereed publications are, like other procedures, just one important element in science, but not the most important one.
*Full disclosure: I am a visiting researcher in their department at Chapman at the moment, and I have blurbed the book (although when I blurbed the book I had no relationship with the department).
**I write 'perhaps' because Vernon Smith has become more radical since I became familiar with the project. I quote from footnote 13: previously "I had argued that the neoclassical diversion to Max-U equilibrium economics had lost the dynamic price-specialization-discovery process prominent in WN and classical economics, then rediscovered in experiments. Never-the-less—I thought—Max-U survived nicely in specifying static equilibria in market supply and demand, which in turn justified the induced utility value methodology for implementing Max-U neoclassical and game-theoretic models in the laboratory. Where it failed—so I thought— was in trust and other small group games. Inoua (2018) corrects my limited thought perspective in a way that includes neoclassical Max-U as a special, and empirically falsified, special case. The classical model simply begins with the observational distinction between “use” value and market price value, where use value corresponds precisely to modern notions of willingness-to-pay, or reservation prices, but with no required commitment to the individual rationality of these self-imposed limits by imperfectly informed, error-prone, individuals, a prominent theme in TMS and WN. I had been answering (in the negative) the question: “Is utility theory a theory of everything?” In substance, Inoua rightfully asks: “Is utility theory a theory of anything?”"
+Arbitrary because disconnected from considerations tied to the theories that were being tested or advanced.