Data Flaws

Consumers are often inundated with media reports of studies that promote certain foods and disparage others, frustrating them with credible-sounding but contradictory advice. The culprit behind these contradictory bits of wisdom is often a flawed method of data interpretation dubbed "data dredging" by its critics.

Data dredging is "the process of trawling through data in the hope of identifying patterns...a sufficiently exhaustive search will certainly throw up patterns of some kind: by definition, data that are not simply uniform have differences which can be interpreted as patterns. The trouble is that many of these 'patterns' will simply be a product of random fluctuations and will not represent any underlying structure," according to D.J. Hand in The American Statistician.1 Although the correlations may be supported by statistical analysis, specious conclusions may be drawn, then publicized, thereby misleading the consumer.
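
Hand's point is easy to demonstrate with a short simulation. The sketch below is purely hypothetical (it is not drawn from any of the studies discussed here): it generates a random "outcome," tests 100 equally random "exposure" factors against it, and counts how many clear the conventional p < 0.05 bar by chance alone.

```python
# A hypothetical illustration of Hand's point (not data from any study
# discussed in this article): dredge pure noise and "patterns" appear anyway.
# Requires numpy and scipy.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_subjects, n_factors = 200, 100

outcome = rng.normal(size=n_subjects)               # random "health outcome"
factors = rng.normal(size=(n_factors, n_subjects))  # 100 random "exposures"

false_hits = []
for i, factor in enumerate(factors):
    r, p = pearsonr(factor, outcome)                # test every factor
    if p < 0.05:                                    # conventional cutoff
        false_hits.append((i, r, p))

# With 100 independent tests at p < 0.05, roughly 5 spurious "findings"
# are expected even though every variable is pure noise.
print(f"{len(false_hits)} of {n_factors} factors look 'significant'")
for i, r, p in false_hits:
    print(f"  factor {i:3d}: r = {r:+.2f}, p = {p:.3f}")
```

Rerun with different random seeds, the count hovers around five; every one of those "significant" correlations is exactly the kind of random fluctuation Hand describes.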

An interesting example of data dredging, according to Richard Sloan, a Columbia University psychologist, is the conclusion of researchers who studied the effects of prayer on the longevity of heart surgery patients.2 Sloan pointed out that the researchers initially planned to analyze five variables as a group; however, upon looking at the data, they decided to consider the variables independently. Only one of the five variables could be correlated with increased longevity of patients after heart surgery, and Sloan considered that single association insignificant; after all, the other four variables did not corroborate the researchers' assertion that prayer increased the longevity of patients. "You can take any data set, and if you cut it apart in a number of ways, you'll find something eventually. But you can't just go in and pick out one finding that's significant and say 'Aha, here it is,' when the other findings are not significant," Sloan explained in an article in the New York Times.2

Recommendations to avoid eating hot dogs, as well as claims about certain prescription drugs, have also been associated with data dredging.

One of these studies considered a litany of factors, such as environmental chemicals, magnetic fields, past medical history, use of hair dryers, and "dietary intake of certain food items," to determine if there was a link between these factors and leukemia. Foods included breakfast meat, oranges, grapefruits, apple juice, hot dogs, and hamburgers. In Science Without Sense,3 Cato Institute adjunct scholar Steven Milloy argues that "the researchers had no idea what they were looking for" when beginning the project; thus, their conclusions are the result of data dredging. They cast a wide net and were bound to find some random correlation.

The researchers established a correlation between the consumption of hot dogs (more than twelve per month) and leukemia, and justified this correlation by explaining that processed meats contain nitrites. Leukemia in rats and mice has been linked to nitrite precursors. But, as Milloy notes, the scientists did not find a correlation between leukemia and other nitrite-containing processed meats, such as ham, bacon, and sausage. The assertion of a causal relationship between nitrites and leukemia was not supported.

The pharmaceutical industry is not immune to charges of data dredging. In 2000, after a three-year, double-blind study of over 2,500 HIV-positive individuals, the Immune Response Co., together with the University of California at San Francisco (UCSF), concluded its tests of Remune, a drug intended to inhibit the progression of AIDS and/or decrease mortality. The analysis of a subset of patients who seemingly benefited from Remune, but only during certain weeks of the study, caused a controversy. Although an independent safety monitoring board concluded that the patients did not show any benefit from receiving the drug, Immune Response "claimed that an analysis of a subset of people who underwent more frequent blood tests indicates that Remune reduced the amount of HIV in their blood, the 'viral load'," according to an Immune Response press release cited in the November 10, 2000 issue of Science. AIDS researcher Dr. James Kahn of UCSF countered that "the company thinks data dredging makes sense." He added, "There are no differences at certain interim time points...One cannot pick and choose data points to suit one's needs."

How can scientists avoid data dredging? George Davey Smith, professor of clinical epidemiology at the University of Bristol, suggests lowering the P value from the currently accepted publication standard, under which one in 20 associations may be spurious, to a stricter standard of one in 100, thereby eliminating many "false positive" findings.4 In addition, it is vital that people be wary of conclusions based on correlations alone. One must inquire about the data as well as the correlations developed from those data. When we hear nutritional recommendations about a specific wonder food, we must ask: What unique characteristic does this food possess that similar foods do not? Furthermore, confirmation of the results through subsequent studies is necessary. Science demands the replication of results.
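
To see why tightening the threshold matters, here is a rough back-of-the-envelope sketch (hypothetical numbers, not taken from Davey Smith's paper) of how the expected number of false positives among truly null associations scales with the cutoff:

```python
# Hypothetical back-of-the-envelope figures (not from reference 4):
# among associations with no real effect, the expected number of
# false positives scales directly with the significance threshold.
n_null_associations = 1000              # assumed: 1,000 truly null tests

for alpha in (0.05, 0.01):              # 1-in-20 vs. 1-in-100 standard
    expected = alpha * n_null_associations
    print(f"alpha = {alpha:.2f}: ~{expected:.0f} spurious 'findings' "
          f"out of {n_null_associations} null associations")
```

Moving from 0.05 to 0.01 cuts the expected haul of spurious "findings" by a factor of five, though it does not eliminate them; replication remains the real safeguard.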


References

1 Hand, D.J. "Data Mining: Statistics and More?" The American Statistician, May 1998, Vol. 52, No. 2.

2 Duenwald, Mary. "Religion's Power to Heal Dissected." New York Times, May 9, 2002. Online at www.iht.com/cgi-bin/generic.cgi?template=articleprint.tmplh&articleId=57192

3 Milloy, Steven. Science Without Sense: The Risky Business of Public Health Research, 1995. Chapter 5: "Mining for Statistical Associations." Online at www.junkscience.com/news/sws/sws-chapter5.html

4 Smith, George Davey, and Shah Ebrahim. "Data Dredging, Bias, or Confounding." British Medical Journal, December 21, 2002. Online at http://bmj.com/cgi/content/full/325/7378/1437?etoc

Responses:

July 29, 2003

It's clear that John Tukey's approach to data analysis might be confused with "data dredging": look at the numbers and see what comes up. It's also clear (at least to me!) that setting an arbitrary level (whether .05 or .01) as the "appropriate" level for eliminating correlations as spurious is...arbitrary.

Any correlation may be worth further study. If it doesn't work, discard it. If it does, use it. Whether it's .05 or .01, you'll get both false positives and false negatives. It's then worth spending some time determining whether the positives or negatives are false; the amount of time spent will vary with researchers, with grant money, and with political agendas.

We're all better off if we lobby for clear information on what data were collected (specifying at the very least who collected it, on what sample, with what change observed, when, and where, with units of measurement specified, and what the observer intended to measure) than we are if we decide a priori that some measurements are spurious because they don't reach a certain threshold of statistical relevance.

A data curmudgeon,

Tom Whitmore