Somewhere along the way, critical reasoning and a healthy dose of skepticism were supplanted by the tacit acceptance, as fact, of press releases and publications generated by academic institutions, those “perfectly” credentialed and arbitrarily deemed scientifically “pure.” Doing so actually undermines the scientific method and the potential for advancement, discovery and innovation. It can also place the public in harm’s way.
Yet, given today’s competition, it is no surprise that corner-cutting has evolved alongside a mastery of getting published, complete with statistical tricks for those in the know to optimize their chances. The latest example, from Harvard, will be discussed here. Since publishers are enabling these behaviors, arming the media and the public with tools to separate the wheat from the chaff is essential.
The tangible adverse effects of a recent junk study published in the prestigious journal HEART, taken seriously because it was wrapped in a Harvard bow, include the diversion of resources away from important work, the devaluing of legitimate science, and the underscoring of funding practices that prioritize the pressure to publish at our society’s ultimate expense.
And when the study itself supports chocolate as possibly beneficial to heart health, that is a winning formula to pique not only most people’s curiosity, but certainly most media outlets’. For full disclosure, I am indifferent to chocolate; if I never had another piece, it would still be a good life. Heart disease and arrhythmias like atrial fibrillation (AF), on the other hand, along with their prevention, inspire me to a much greater extent. Hence, I am compelled to review new research associating the two, published by a team led by the Harvard T.H. Chan School of Public Health (HSPH).
The authors performed a “large population-based prospective cohort study...based on 55,502 participants aged 50-64 years” and set out “to evaluate the association between chocolate intake and incident clinically apparent atrial fibrillation or flutter (AF)”; it turns out, this was among many other questions asked.
Yes, when people with compelling credentials say eat some chocolate because maybe there might be a possible positive health benefit, we are game.
To be clear, for an otherwise healthy person, incorporating some chocolate into their diet is fine, but this significantly flawed study provides little meaningful evidence of any health benefit either way. Other studies might; this one does not. Even the authors concede: “Accumulating evidence indicates that moderate chocolate intake may be inversely associated with AF risk, although residual confounding cannot be ruled out.” And it does not appear that exhaustive efforts were taken to minimize such confounding. There is a systemic and intentional flaw in the analysis methods used in this paper.
When we hear that a study has over 50,000 participants, we jump to associate that with highly valuable results. In some cases that is true, but we automatically assume size confers validity, and it often does not. The usual arbiter of validity is the p-value: if it falls below a certain threshold, many of us amateur statisticians assign the result more meaning and jump to classifying the study as legitimate.
The reality is that p-hacking is often employed to manipulate the meaning of a p-value: ask a great many questions of a data set, then report only those answers with a p-value below 0.05. The more you test, the greater the chance of a spuriously low p-value, so the likelihood of false positives increases. The authors note that participants completed a 192-item Food Frequency Questionnaire (FFQ); how many health conditions were available for analysis is not obvious, but it is likely in the tens.
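The multiple-testing problem can be illustrated with a short simulation. This is a hypothetical sketch, not the study’s actual data or methods: we test 192 pure-noise “foods” against a single outcome, and a handful clear p < 0.05 by chance alone.

```python
import math
import random

def two_sample_p(a, b):
    """Approximate two-sided p-value from a two-sample z-test (normal approximation)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    z = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)
n_tests, n = 192, 200  # 192 "foods", 200 subjects per group, all pure noise
false_positives = 0
for _ in range(n_tests):
    exposed = [random.gauss(0, 1) for _ in range(n)]
    control = [random.gauss(0, 1) for _ in range(n)]
    if two_sample_p(exposed, control) < 0.05:
        false_positives += 1

# With no real effects anywhere, we still expect roughly 5% of 192 tests,
# i.e. around ten "statistically significant" findings, purely by chance.
print(false_positives)
```

Report only the “significant” tests and suppress the rest, and pure noise reads like a discovery.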
So now we can begin to understand how many questions are in play. Chocolate is but one food among many. Atrial fibrillation is but one health outcome among many, which is not at all obvious from the abstract.
Traditionally, a researcher comes up with a hypothesis before embarking on the research. In addition to p-hacking, it appears this work used HARKing as well: Hypothesizing After the Results are Known. Making up a plausible explanation after obtaining your findings is doing it backwards, and it provides no sound basis for determining cause and effect.
According to ACSH Scientific Advisor Dr. S. Stanley Young, “researchers at HSPH know about p-hacking and HARKing, but they do essentially nothing to guard against finding false positive results.” He notes that “prospective” makes a study sound better than “retrospective,” but it is “still junk if there are very many questions at issue.” He suggests a number of unanswered questions remain: “Was the age range adjusted to get statistical significance? Did they look for the same effect in younger or older individuals? Could they replicate their findings?”
According to Dr. Young, “The paper is a rather typical data dredge. HSPH researchers know better. It is a winning formula.” Due to the difficulty of getting published today, researchers try many things to bring that p-value down and increase their chances. “There are simple ways to curtail p-hacking. You have to register the study and how it will be analyzed BEFORE it is started. Drug companies do this before they start a randomized controlled trial (RCT)—the FDA is watching. HSPH did not do anything to correct for multiple testing. They used multiple testing to get a publication.”
He further elucidates that with multiple testing and multiple modeling (192 foods and multiple outcomes), “there is about a 99.9% chance of getting one or more p-values less than 0.05. Good odds for the researcher; poor odds for the reader. As currently analyzed FFQ studies are a statistical fraud. Editors would never let a RCT researcher get away with this behavior. Editors should consider protecting their readers from very likely false positive results.” (1)
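Dr. Young’s 99.9% figure is easy to check. Under the null hypothesis, each test has a 5% chance of producing p < 0.05, so across k tests the chance of at least one spurious hit is 1 − 0.95^k. (This assumes the tests are independent, which real FFQ items are not, but the qualitative point stands.)

```python
# Family-wise chance of at least one p < 0.05 among k independent null tests.
# At k = 192 (the FFQ item count), a spurious "finding" is a near certainty.
for k in (20, 100, 192):
    print(f"{k} tests: {1 - 0.95 ** k:.4%} chance of at least one p < 0.05")
```

At 192 tests the probability exceeds 99.99%, consistent with the quoted figure.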
Okay, but it is about chocolate so is it really that big of a deal?
Yes. For one, these practices are now commonplace. Imagine the relevance when it comes to a study of a more aggressive intervention that carries greater risks. Also consider the ramifications of a diabetic believing extra chocolate could stave off arrhythmias while actually adding sugar. How about the endless litany of useless or potentially harmful products that will boast this Harvard study in their marketing claims without splitting hairs? Or patients opting out of effective therapies because of such information.
In medical practice, deviating from a comprehensive, thorough examination and investigation of personal and family history, or acting on bad data, could yield a missed or aberrant diagnosis.
How do we and the media facilitate greater scientific integrity?
Don’t swoon over institutions
Never just read a press release; when possible, always review the primary source and draw your own conclusions
Be skeptical: a healthy dose is good for us, as it protects us from harm and challenges others to aspire to the highest standard.
Subject anyone to due diligence, even a Nobel Laureate. In the real world, I would challenge you to show me a person, let alone an institution, who is right one hundred percent of the time.
Avoid “all this or all that” snap judgments. For example, there are plenty of exceptional researchers in both academia and corporations. Bad actors abound in all fields and environments. Most have the best of intentions, to which Dr. Young adds: “The road to hell is paved with good intentions. Here the intentions are not even good. The intention is to get a publication and funding.”
Corporations aren’t all evil, nor are universities all saints. Most products come from industry; a lot of junk science comes from universities, even Harvard.
In the end, when it comes to chocolate...
Who among us wouldn’t want to hear that eating a universally desirable treat like chocolate semi-regularly could stave off an irregular heartbeat that is often the culprit in strokes and other serious health conditions? Compliance, in general, with medications and therapies is tough enough for many of us, but a feel-good prescription like that would certainly ease the burden.
The problem is that most things in life and in health aren’t that simple. Taking a pill or eating a “magic” food is much more attainable than effectively changing bad behaviors, which in the long run tend to be in our best interest. There are no guarantees in life, but for most of us, eating a well-balanced diet most of the time, with consistent exercise, good sleep, low stress, a sense of purpose and sufficient social connection, can prove challenging. We tend to live in extremes when moderation would best serve our overall health.
1. Dr. S. Stan Young’s answer to this question: How could causation have been better determined?
These things would have helped:
a. Place in a public location an analysis protocol.
b. Split the data into two parts. Look for a relationship in one part and confirm or not in the other part.
c. Show the reader ALL the p-values that are computed. Give the number of questions at issue.
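Dr. Young’s suggestion (b), split-sample confirmation, can be sketched with hypothetical pure-noise data (the names and numbers are illustrative, not from the study): “discover” associations in one half of the sample, then see how many survive in the held-out half.

```python
import math
import random

def z_p(a, b):
    """Two-sided p-value from a two-sample z-test (normal approximation)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    z = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(0)
n_foods, n = 192, 400
# Hypothetical pure-noise "FFQ": food scores carry no real signal about the outcome
cases    = [[random.gauss(0, 1) for _ in range(n_foods)] for _ in range(n)]
controls = [[random.gauss(0, 1) for _ in range(n_foods)] for _ in range(n)]

# Step 1: "discover" associations in the first half of the sample
half = n // 2
hits = [f for f in range(n_foods)
        if z_p([r[f] for r in cases[:half]],
               [r[f] for r in controls[:half]]) < 0.05]

# Step 2: test ONLY those hits in the untouched second half
confirmed = [f for f in hits
             if z_p([r[f] for r in cases[half:]],
                    [r[f] for r in controls[half:]]) < 0.05]

print(len(hits), "discovered;", len(confirmed), "confirmed")
```

In pure noise, a handful of associations are typically “discovered,” yet few or none replicate in the held-out half, which is exactly why confirmation on independent data guards against false positives.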
2. Disclosure: I went to Yale, which my colleagues seemed to think should be included. Go figure.
3. For another example of poor study design, see Cannabis and Strokes: Linked Or Not?