A recent study (Rothwell, P.M. and Martyn, C.N. Reproducibility of peer review in clinical neuroscience: Is agreement between reviewers any greater than would be expected by chance alone? Brain 2000 123:1964-1969) measured the level of agreement between reviewers of manuscripts submitted for publication in a scientific journal. These reviewers are usually professors in universities with extensive expertise in the subject of the reviewed manuscript.
The editor of the journal asked the professors two questions: 1. should the manuscript be accepted, revised, or rejected, and 2. is the priority for publication low, medium, or high. Every manuscript was evaluated by two professors. The study was repeated with manuscripts submitted to two journals. In journal A the study compared the evaluations of 179 manuscripts and in journal B the evaluations of 116 manuscripts. The agreement between the professors was calculated using the kappa (κ) statistic, which measures how far the observed agreement exceeds the agreement expected by chance alone.
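To make the agreement measure concrete, here is a minimal sketch of the unweighted Cohen's kappa for two reviewers, computed as (observed agreement - chance agreement) / (1 - chance agreement). The function name cohen_kappa and the six recommendations below are hypothetical illustrations, not data from the study, and the text above does not say which variant of the statistic the authors used.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)

    # Observed agreement: share of items that received the same label twice.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: probability of a match if each rater assigned labels
    # independently, at the rates seen in that rater's own ratings.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_chance == 1.0:
        # Both raters used one and the same label throughout: kappa is undefined.
        return float("nan")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical recommendations from two reviewers on six manuscripts.
reviewer_1 = ["accept", "revise", "reject", "accept", "revise", "reject"]
reviewer_2 = ["revise", "reject", "accept", "accept", "reject", "revise"]

print(f"kappa = {cohen_kappa(reviewer_1, reviewer_2):.2f}")
```

A kappa of 1 means perfect agreement, a value near 0 means the reviewers agree about as often as chance alone would produce, and a negative value means they agree less often than chance.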
The results showed no agreement between the reviewers on either the recommendation or the priority for publication. In fact, the level of agreement was no greater than that which would be produced by flipping a coin. Moreover, when a larger number of independent reviewers evaluated the same manuscript, the results were the same: no agreement. As the authors of the study write, "if peer review is an attempt to measure the overall quality of research in terms of originality, the appropriateness of the methods used, analysis of data, and justification of the conclusions, then a complete lack of reproducibility is a problem.
These specific assessments should be relatively objective and hence reproducible." The assessments should be reproducible, but they are not. When one professor said "accept for publication," the other said "reject"; when one reviewer said "high priority for publication," the other said "low priority."
Points to consider:
1. The first stage of most decisions is gathering data. For instance, prior to making a marketing decision, researchers conduct focus groups, perform in-depth interviews, or use open-ended questions in surveys to ask customers for their opinions. Before hiring a new employee, human resource managers interview candidates to gather information about their backgrounds and proficiencies. Prior to selecting a college, applicants (and their parents) collect articles from the internet, interview peers, and question teachers about their target colleges. Before making an investment, investors collect data about their target companies. In all these cases, and many others, the information is captured in the form of words. In this study the professors analyzed the words in the manuscripts. In light of the professors' failed analysis, how confident should decision makers be in their ability to correctly analyze these words, and hence, make the right decision?
2. In this study, t
3. The criteria in this study were whether the research reported in the manuscript is original, uses appropriate methods, correctly analyzes the data, and properly justifies the conclusions. As the authors of the study say, these criteria are regarded as relatively objective. Unlike this study, the great majority of qualitative studies involve subjective criteria such as tastes, morals, values, or preferences. If the professors failed to consistently apply objective criteria when evaluating the manuscripts, how can less trained professionals and laymen be trusted to consistently apply subjective criteria when evaluating qualitative data?
4. In this study, pairs of professors assigned different values to the same manuscript. Who is right? After all, this is science, and both cannot be right. Now, if such great experts failed to convince us that they can process a qualitative dataset correctly, or at least consistently, how can we trust professionals or laymen when they say that they can?
Mike T. Davis, SCI, Rochester, NY. We are the inventors of Computer Intuition, a psycholinguistics-based program that analyzes the language that people use to describe themselves and their environment. When clients hire our services, they send us their qualitative data. We input the data into the computer, which calculates the psychological intensity, or psytensity, of every idea found in the text. We then isolate the ideas with the highest psytensities and document them in a report that also includes our "Do this, do that" recommendations. Within a week of receiving the data, we present the results to the client. SCI's clients include many Fortune 500 companies, such as Apple Computer, Sears, Allergan Pharmaceuticals, Chrysler, Citibank, IBM, Motorola, Eastman Kodak, Hewlett-Packard, Anheuser-Busch, Xerox, and Frontier Communications. We also serve many smaller companies that have come to realize that Computer Intuition is the only tool for a correct analysis of text.