A reader writes in regarding Allen Wood's new post at the APA blog, "Advice for Applying for Academic Jobs in Philosophy: Indiana University (Part 3)":
Allen Wood says…
"I myself do not care where an article is published. Someone I know at a major research university pays no attention to whether publications have been “peer reviewed” because, as he puts it, “I think I can judge papers better than most of the referees they get for these journals.” That may sound arrogant, but I confess I feel exactly the same way. I even think that caring about whether papers have been peer reviewed is an admission that you are incompetent to judge writing samples yourself, hence incompetent to be conducting the search at all."
I’d love some discussion of this claim on the cocoon…[Wood] thinks he is specially qualified to judge the quality of every published paper. Throw out the peer review system and replace it with Allen Wood.
Indeed, I think Wood's claims do warrant discussion, not primarily because they are his views on the matter, but rather because of the manner in which his claims are tacitly embodied in prevailing selection and hiring methods within academia, and within academic philosophy specifically. Allow me to explain.
We philosophers tend to conceive of ourselves as fairly rational people–as people who care about things like the quality of evidence and arguments. However, or so I will argue in this post, we systematically violate these ideals when it comes to hiring practices. Although I am not an expert in the science of hiring, my spouse is a PhD candidate at a top-5 program in that field (industrial-organizational psychology), and so I have come to hear from her a lot about the science of selection (i.e. hiring). And what I have heard from her, and read myself about the science of selection, systematically contradicts prevailing hiring practices in our field, in academia more broadly, and, more deeply still, claims that are commonly made by people involved in hiring (see e.g. our recent series, Notes from Search Committee Members, as well as the APA Blog's similar series).
In what follows, I will explain what I have learned about the science of selection/hiring, and argue that this science requires us to radically rethink our hiring practices and preconceptions. Although I will mention things I have heard from my spouse, the thoughts I offer in what follows are my own–and any mistakes are mine, not hers. I put forth my thoughts not as an expert in the science of selection, but merely as a learned person who has looked into these matters, and as someone who is concerned about systematic discrepancies I see between common practice and the science, at least as I understand it. This post, then, is not intended to be "God's Honest Truth" about the science of hiring. Rather, it is intended to get people in our discipline to think about, and to research, the science of hiring much more carefully than we have to date–so that we can better separate fact from fiction. Finally, as always, I am more than happy to have my mistakes pointed out, and to reconsider my arguments and position. All that I ask in return is that discussants not simply deny the science, claiming (for example) that it "seems obviously wrong"–for, as I mention below, scientists who study selection/hiring have invested a great deal of time and energy arguing that our best science disproves a lot of "commonsense." Quite a lot of scientific findings contradict common sense–and moreover, or so I will argue, there turns out to be a very commonsensical story to tell about precisely why many commonsense views about selection/hiring are (probably) false.
Here, to begin with, is a very rough outline of common/prevailing hiring methods in academia, and academic philosophy specifically:
- We collect a variety of materials from job candidates, without any clear, validated empirical measure of which materials predict future career success (e.g. publishing, tenure, etc.).
- Search committee members read that material, making their own subjective judgments about different candidates, on the basis of things each committee member thinks "matters" (e.g. some care about grad school prestige, others care about teaching reviews, etc.)–again, without any clear, validated empirical measure of which elements of a candidate's dossier actually predict career success.
- Committee members get together and discuss their subjective judgments of candidate quality.
- After some process of deliberation, search committee members select a handful or a dozen candidates for an initial 30-60 minute interview–unguided, again, by any clear, validated empirical measure of what predicts future career success.
- During interviews, individual search committee members develop their own subjective judgments of candidate quality–again without any clear, empirically validated measure of which interview qualities predict future career success.
- The committee meets, sharing their subjective judgments, and arriving at a "short list" of candidates to invite for an on-campus interview.
- On-campus interviews are held, and search committees judge which candidate(s) to extend job-offers to, once again without any clear, empirically validated picture of which element(s) of on-campus interviews reliably predict future success.
Here, then, is the long and short of standard practice: search committee members make up the process as they go along, according to their best judgment, at each stage, of what "matters" in a candidate/hire–all the while, at every stage, not appealing to much, if any, empirically validated science.
In my experience, there are a couple of reasons why search committees work this way. First, search committee members tend to think they have a "candidate divining rod"–a good, reliable ability to determine which candidates are better than others. Indeed, one often hears remarks to this effect, viz. "So-and-so's writing sample/job talk, etc., was so impressive", or, "So-and-so is obviously brilliant." Second, search committee members seem to think there is no better way to select candidates–that (obviously!) philosophy is such a specialized enterprise that we can, should, and must defer to our own judgments of dossier materials (e.g. CV, research statement, writing sample, etc.) and candidate performance (e.g. interview/job-talk/teaching demo performances).
Alas, here's the thing: decades of research in Industrial-Organizational Psychology pretty unequivocally reveal this entire picture to be false.
Here, for instance, is the abstract of one of the most famous, and corroborated, studies in the science of selection–a 1996 meta-analysis of 136 studies entitled, "Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical–statistical controversy":
Given a data set about an individual or a group (e.g., interviewer ratings, life history or demographic facts, test results, self-descriptions), there are two modes of data combination for a predictive or diagnostic purpose. The clinical method relies on human judgment that is based on informal contemplation and, sometimes, discussion with others (e.g., case conferences). The mechanical method involves a formal, algorithmic, objective procedure (e.g., equation) to reach the decision. Empirical comparisons of the accuracy of the two methods (136 studies over a wide range of predictands) show that the mechanical method is almost invariably equal to or superior to the clinical method: Common antiactuarial arguments are rebutted, possible causes of widespread resistance to the comparative research are offered, and policy implications of the statistical method's superiority are discussed. (my underline)
That's right: in just about every area studied to date, and for just about anything you want to predict (likelihood of future job success, etc.), algorithmic methods (i.e. "resume counting") statistically tend to perform as well as or better than more subjective, "clinical" methods involving "the human element" (discussion, interviews, etc.).
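To make the contrast concrete, here is a minimal sketch, in Python, of what a "mechanical" combination rule looks like. To be clear, this is my own illustration, not anything from the study: the predictor names and weights are invented, and in any real application they would have to be empirically validated (e.g. estimated by regression against outcome data). The point is simply that every candidate gets scored by the same explicit rule, so nothing like an interviewer's mood or first impression can creep into the ranking.

```python
# A minimal, hypothetical sketch of "mechanical" (algorithmic) prediction:
# candidate data are combined by a fixed, explicit rule rather than by
# holistic impressions. The predictors and weights below are invented for
# illustration only; real weights would be estimated from validation data.

CANDIDATE_WEIGHTS = {
    "peer_reviewed_publications": 0.5,  # count of peer-reviewed articles
    "mean_journal_rank_score": 0.3,     # e.g. from a journal-ranking survey
    "years_teaching_experience": 0.2,
}

def mechanical_score(candidate: dict) -> float:
    """Combine a candidate's measured attributes with fixed weights.

    Because every candidate is scored by the same rule, the ranking
    cannot shift with an interviewer's mood or gut feeling.
    """
    return sum(weight * candidate.get(attr, 0.0)
               for attr, weight in CANDIDATE_WEIGHTS.items())

candidates = {
    "A": {"peer_reviewed_publications": 4,
          "mean_journal_rank_score": 7.0,
          "years_teaching_experience": 2},
    "B": {"peer_reviewed_publications": 1,
          "mean_journal_rank_score": 9.0,
          "years_teaching_experience": 5},
}

# Rank candidates by the fixed rule -- the "equation" the abstract mentions.
for name, data in sorted(candidates.items(),
                         key=lambda kv: mechanical_score(kv[1]),
                         reverse=True):
    print(name, round(mechanical_score(data), 2))
```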
Why is this? According to my spouse, people who work in her field recognize that these are shocking findings that "defy commonsense." But, or so she tells me, it is a consensus view of the evidence in her discipline–and the difficulty of convincing laypeople of it is one of the greatest frustrations people in her field face. Although an increasing number of government agencies (the NSA, CIA, etc.) and Fortune 500 companies are finally beginning to get on board with these findings (adopting more algorithmic hiring processes), by and large people just don't want to believe that algorithmic methods predict outcomes better than "the human element." How, after all, could we all be so wrong? Don't "human factors" matter when it comes to hiring? Well, as it turns out, the closer one looks at the empirical reality, the easier it is to see why, in fact, no, human factors are a liability: they are far more biased than algorithmic processes, which track job-relevant skills and qualifications far more reliably than "the human element". Allow me to briefly explain.
Consider, to begin with, the simple issue of predicting career success in philosophy. What is career success? At a research university, it is very roughly this: publishing influential papers and books with top journals and book presses–places where one's work will be read, discussed, cited, etc. At teaching-centered universities, career success is, somewhat differently, roughly this: publishing enough research to get tenure, being a great teacher, having students love you and sign up for your classes and the major, and doing great "service" work at your university (viz. organizing conferences, serving on committees, etc.).

Now consider something Rousseau discussed long ago in Of the Social Contract: the idea that distributed cognition in a large system (i.e. lots and lots of people) will outperform individual judgers, by negating the biases and predilections (viz. the "private will") of individual judgers. Think, in particular, about the peer-review publication system we have in philosophy. It is a vast plurality of referees and editors, all of whom (in general) are experts at evaluating papers in different philosophical areas. It is (A) a system of massively distributed expertise, one that is (B) explicitly designed to determine which work is best, rewarding good/great work with publication. Now think, against this background, about what a past record of publishing success means (i.e. publishing peer-reviewed articles in journals like Mind, Phil Studies, etc.): it means a person has (1) demonstrated an ability to succeed in a distributed system of expertise designed to determine which work is best, and (2) done so in the very system in which that person will have to succeed in the future to obtain career success. If you wanted a measure of how likely someone is to be a successful researcher in the future, you would be hard-pressed to find a better one than this kind of distributed system–for, again, that is what the entire system is designed to do: determine, through vast, distributed expertise, who the best researchers are.

Yet instead of actually deferring to that system–"counting publications" on a person's CV (and, perhaps, weighting publications according to journal ranking), which is what the aforementioned "algorithmic" method of selection would advise (a method which, again, has been found to be systematically superior to the alternatives)–search committee members (and entire committees), by their own admission, don't do that: they instead appeal to "their own judgment" of who the best researcher is. But if your aim is to predict future success, this is just bizarre. You're not going to be the one to judge whether a candidate publishes in the future; referees and journal editors–whose views about "good philosophy" might be totally different from yours–will be the ones to decide that. By putting your "own considered judgment" of a candidate's promise as a researcher ahead of the collective judgments of the very system that person will have to succeed in to be successful, you are, in effect, putting a less reliable predictor of success ahead of a more reliable one: the system of distributed expertise/gatekeepers the candidate must succeed in to be successful.
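For concreteness, here is a minimal sketch of what "counting publications, weighted by journal ranking" might look like as an explicit rule. Again, this is my own hypothetical illustration–the tier assignments and weights are invented, and any real version would need a validated journal ranking–but it shows how such a rule simply defers to verdicts the peer-review system has already rendered.

```python
# A hypothetical sketch of "counting publications, weighted by journal
# ranking." The tier weights below are invented for illustration; a real
# version would need a validated ranking (and agreement on how to build one).

JOURNAL_TIER_WEIGHTS = {
    "Mind": 3.0,
    "Philosophical Studies": 2.0,
    # Journals not listed fall back to a default weight of 1.0.
}

def publication_score(publications: list[str]) -> float:
    """Sum tier weights over a candidate's peer-reviewed publications.

    Each publication is a verdict already rendered by expert referees,
    so the score defers entirely to the distributed peer-review system.
    """
    return sum(JOURNAL_TIER_WEIGHTS.get(journal, 1.0)
               for journal in publications)

print(publication_score(["Mind", "Philosophical Studies", "Ratio"]))  # -> 6.0
```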
Not only that: systems of anonymized peer-review–unlike search committee members' individual or collective judgments–are systematically designed to mitigate bias. Although anonymized review is far from perfect, there are at least a number of systems/processes in place to help ensure that a person's work is judged on the basis of its quality, as opposed to, say, tacit/implicit prestige biases, etc. In contrast to peer-review–which, again, is not only designed to mitigate bias but is also the distributed network of expertise one must succeed in to have a successful career–I/O psychology has shown not only that judgments about the quality of a candidate's resume tend to be biased by the candidate's perceived race/gender; it has also shown that perceived candidate quality in interviews is influenced by:
- attractiveness
- weight
- height
- gender
- race
- speech style
- voice-timbre
- personality traits (very few of which, by the way, are reliably related to job performance).
Further, judgments of interview performance have been found to reliably fail to track interview behaviors, such as lying, that confound interviewer judgments of candidate quality (viz. people lie/deceive in interviews, and studies show that interviewers are unable to discern who the liars are). Not only that. Here's a fun new finding [thanks to Feminist Philosophers for drawing my attention to this], not on hiring per se, but on a related phenomenon–the phenomenon of "being quick on one's feet"–that many of us know to be prized in professional philosophy:
A behaviour that’s linked to higher perceptions of charisma.
People who are mentally quick on their feet are seen as more charismatic by friends, a new study finds. Speed is of the essence, though, the researchers found, while IQ and mental agility were not as vital as they expected.
Professor William von Hippel, who led the research, said:
“Our findings show that social intelligence is more than just knowing the right thing to do.
Social intelligence also requires an ability to execute, and the quickness of our mind is an important component of that ability.”

Professor Hippel was fascinated by why some people exude more charisma than others.
He said:
“… When we looked at charismatic leaders, musicians, and other public figures, one thing that stood out is that they are quick on their feet.”
The study included 417 people who were rated on their charisma by friends.
They also took tests of personality and intelligence.
Each was then asked 30 questions which are common knowledge, such as: “Name a precious gem.”
People who were quicker to come up with easy answers like this were perceived as more charismatic by their friends, the results showed.
This was even true when people’s personality and intelligence were taken into account.

Professor Hippel said:
“Although we expected mental speed to predict charisma, we thought that it would be less important than IQ.
Instead, we found that how smart people were was less important than how quick they were. So knowing the right answer to a tough question appears to be less important than being able to consider a large number of social responses in a brief window of time.”
Being mentally agile also allows people to consider different social responses on the spot.
This enables charismatic people to rule out inappropriate actions as well as pick out potentially witty responses.

The study was published in the journal Psychological Science (von Hippel et al., 2015; my bold-type).
Finally, consider some recent findings presented in Psychology Today:
- People with thin lips and wrinkles around the eyes tend to be regarded as distinguished, intelligent, and determined.
- People with "baby-faces" tend to be judged weak, naive, and submissive. (This despite the fact that "There is no good evidence that trait inferences from facial appearances are accurate", p. 88).
- One's impression of an individual's personal characteristics differs by camera angle.
- Talking to someone at a wobbly table can lead one to judge that person to be less reliable.
- Meeting someone in a dirty environment can lead one to judge the person's moral character negatively.
- People holding a cup of hot coffee tend to judge other people as "warmer."
- A single piece of negative information tends to undo positive first impressions, and tends to require something "heroic" to overcome.
In other words, here's the long and short of it. Decades of scientific research increasingly show that introducing "the human element" into the selection process does not improve the ability of hiring committees to make sound predictive decisions. They show the opposite: that the "human element" systematically introduces unchecked biases into the selection process that have little, no, or negative predictive value. To put it more simply still: as much as we all like to think that we have an "awesome candidate detector", empirical science shows this to be false across a wide range of occupations and predictands. If you want to know who is most likely to succeed in an occupation, the best thing to do is not to appeal to your own judgment or the judgment of "people you trust." The best thing to do is to trust the distributed expertise of the entire system around you. The best, least biased selection process is algorithmic: it really is "counting publications."
Now, I know what you're thinking: philosophy is unique. Unlike other lines of business, philosophy is an academic discipline in which one must deploy one's expertise to determine who is a promising philosopher. There are, however, two problems with this line of thought. First, people in every occupation think that. People in business think they have a "great entrepreneur detector", and that one must utilize one's expertise to select a great CEO. People in tech think they have a "tech genius detector" that must be utilized to select the best young programmer. But here's the thing: although people in every domain tend to think this way, the empirical research provides strong inductive grounds for thinking everyone is wrong. Second, your own judgment may miss critical things: if you want to hire a programmer, a candidate may be a genius, but turn out to be "unable to finish anything" (or worse, to have personal habits that keep their genius from consistently manifesting itself on the job). And indeed, I've seen this sort of thing play out in academia on a number of occasions.
I guess I'll stop here. It is often said, of the philosophy job-market, that "it's not about counting publications." It is then often added that the job-market "is a crap-shoot", too much about "prestige", not meritocratic enough, etc. But could this all really just be reflective of the fact that, when it comes to hiring, we have it all wrong? As I understand the science, it seems to suggest the opposite: that hiring should be about counting publications, and for roughly the reasons that the reader at the outset of this post implied. Who, offhand, is going to be a better judge of the quality of work in, say, philosophy of physics–a search committee member who specializes in Kant, or an entire peer-review system of people who specialize in the philosophy of physics? Well again, given that it's the latter group who eventually decide (in the peer-review system) which papers are published in that field, it would seem to be them.
To sum up: prevailing methods of selecting/hiring academic philosophers presuppose that individual search committee members, and small search committees–who may or may not have expertise in a candidate's subfield–have better predictive abilities than vast distributed networks of experts in that field. But the more one thinks about it, the more extraordinary this idea appears. Individual search committee members have explicit and implicit biases, strange whims, and may not even be experts in the areas in which they are judging candidates. The entire system of anonymized peer-review is designed (imperfectly, of course) to counteract these biases and whims, and to ensure that experts evaluate the quality of a person's work. And decades of empirical science suggest that, for these very reasons, algorithmically counting past successes is more predictive than subjective judgments–again, across a wide variety of predictands. So, is it time that we–as a discipline, and as individuals–rethink our preconceptions about hiring? I think so. What do you think?