Improving the Peer-Review Process & Preventing Data Chicanery

Peer-reviewed research is the gold standard for science. We rely on that system to weed out the discoveries from the detritus. However, growing concerns over how the peer-review system operates are forcing the academic community to take a long, hard look at the process and ask, “How can we improve this?”

Much has been made of the peer review system over the years. It is described as elitist, an echo of the Ivory Tower, and at the same time as the thing that protects us from cranks and bad science. Publishing peer-reviewed articles in scholarly journals remains the flagship of research dissemination. There are many ways to potentially improve the process, but I want to talk about two that aim to improve transparency and faith in collected data: making data available to reviewers and having data reviewed by a statistician.

Making Data Available

To submit to a peer-reviewed journal, authors need to provide:

  • A manuscript with an abstract, tables, figures, and references formatted to the journal’s specification
  • A list of author affiliations
  • A statement concerning conflicts of interest
  • A statement identifying the roles and responsibilities of each author in developing the manuscript

What is rarely required is the raw data or final dataset underlying the study, the very thing that generates those tables, figures, and statistics.

Collecting data for research is a significant hurdle - it’s hard to do. That is why grants are often awarded over years and require considerable documentation around data collection methods. It’s why a Ph.D. dissertation takes so long to complete. Unsurprisingly, the news is alight with academic misconduct allegations surrounding data generation and use. Retraction Watch has reported on misconduct from Stanford, Weill Cornell Medicine, and the University of Alabama, among others.

What if there were a way to hold data under the microscope before a study is published? One option is to require authors to submit their final raw datasets along with their manuscripts. While this isn’t a perfect solution, it balances transparency and due diligence without being overly burdensome for authors during the peer review process.

Transparency - A manuscript's methods and results sections, if done well, should in theory provide enough information for other researchers to replicate the study. There should be no guessing about how data were collected or analyzed. However, most print journals have word limits for original research papers, which can limit the amount of detail in these sections. If final datasets are submitted and made available to reviewers, the reviewers should be able to reproduce the reported findings and numbers themselves.
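
To make that concrete, here is a minimal sketch of the kind of check a reviewer could run against a submitted dataset: recompute a summary statistic and compare it with the number in the manuscript. The function name, the column names, the tolerance, and the tiny sample dataset are all my own assumptions for illustration, not part of any journal's workflow.

```python
import csv
import io
import statistics


def check_reported_mean(csv_text, column, reported_mean, tolerance=0.05):
    """Recompute a column mean from CSV text and compare it with the reported value.

    Returns the recomputed mean and whether it falls within `tolerance`
    of the value stated in the manuscript. The tolerance is an assumed
    allowance for rounding, not a standard.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip empty cells rather than guessing how the authors handled missing data.
    values = [float(row[column]) for row in reader if row[column] != ""]
    recomputed = statistics.mean(values)
    return recomputed, abs(recomputed - reported_mean) <= tolerance


# Hypothetical three-row dataset standing in for a submitted CSV file.
sample = "participant,score\n1,4.0\n2,5.0\n3,6.0\n"
recomputed, matches = check_reported_mean(sample, "score", reported_mean=5.0)
print(recomputed, matches)
```

A mismatch here would not prove misconduct. It would simply tell the reviewer where to ask the authors a question.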

Due Diligence - Peer review is to be conducted with a critical eye. How grand are the claims? Does the evidence support those claims? What are the limitations of the study? Did the authors consider all the feasible factors? If not, do they explain why? This is to say that it is a reviewer’s job to scrutinize every aspect of a submitted manuscript. Reviewing the data should be a part of that process.

Not Overly Burdensome - Submitting a study for peer review always takes longer than you think it will or should. There are always hidden hurdles to jump before hitting the coveted “Submit” button. Requiring submission of the final raw dataset would be yet one more component for researchers to prepare. Realistically, this would require data formatting and de-identification. Statistical programs make data exporting quick and easy, so data can be exported into a CSV file that is not tied to a specific program. While many, if not most, projects conduct final analyses with de-identified data, not all do. When data remain identified, there are basic steps that can be incorporated to protect patient privacy.
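
As a rough illustration of those basic steps, the sketch below drops direct identifiers, replaces the record ID with a salted hash, and writes the result out as CSV. The column names (`name`, `date_of_birth`, `patient_id`), the salt, and the 12-character code length are hypothetical choices of mine. This is closer to pseudonymization than to full de-identification, and it is not a substitute for whatever an institution's privacy rules actually require.

```python
import csv
import hashlib
import io

# Hypothetical names of columns that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "date_of_birth"}


def deidentify(rows, id_column, salt):
    """Drop direct identifiers and replace the ID column with a salted hash."""
    cleaned = []
    for row in rows:
        out = {key: value for key, value in row.items() if key not in DIRECT_IDENTIFIERS}
        digest = hashlib.sha256((salt + row[id_column]).encode("utf-8")).hexdigest()
        # Keep a short prefix so the same person maps to the same code across rows.
        out[id_column] = digest[:12]
        cleaned.append(out)
    return cleaned


def to_csv(rows):
    """Serialize a list of dict rows to CSV text that any program can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical record standing in for an identified study dataset.
records = [
    {"patient_id": "A1", "name": "Jane Doe", "date_of_birth": "1980-01-01", "score": "4"},
]
shareable = deidentify(records, id_column="patient_id", salt="keep-this-secret")
print(to_csv(shareable))
```

The salt stays with the research team, so reviewers can follow a participant across rows without being able to recover the original ID.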

Review by Statistician

Many data misconduct stories are uncovered by data sleuths, who are most frequently not employees of journals but concerned citizens with the skill set necessary to understand the nuances of data. If authors of quantitative papers submitted their final raw dataset and a statistician reviewed the analyses, we wouldn’t need so many data sleuths acting as unofficial oversight.

As of 2020, very few journals put papers through a statistical review by a statistician. Instead, they rely on the subject-matter expertise of reviewers who may have some statistical training but are far from statisticians. Review by a statistician could prevent many statistical bad habits, like p-hacking, underpowered studies, and selective use of data, among others.
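
To see why a statistician would care about p-hacking, consider what happens when an author runs many tests and reports only the significant one. The sketch below computes the chance of at least one false positive across several tests, along with the Bonferroni-adjusted threshold a reviewer might ask about. It assumes the tests are independent, which real analyses often are not, and the function names are mine.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests at level alpha."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance threshold that keeps the overall error rate near alpha."""
    return alpha / num_tests


for k in (1, 5, 20):
    rate = familywise_error_rate(k)
    threshold = bonferroni_threshold(k)
    print(f"{k} tests: false-positive chance {rate:.3f}, adjusted threshold {threshold:.4f}")
```

With 20 independent tests at the usual 0.05 level, the chance of at least one spurious "finding" is roughly 64%, which is why the number of analyses actually run matters as much as the one that gets reported.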

Limitations

There are potential limitations and considerations. First, peer review is unpaid. Reviewers volunteer their time, which makes requiring review by a statistician more challenging. Consider this along with the number of statisticians, just under 31,000 in the US, and you may have a data Everest to climb. Second, not all datasets can be de-identified or shared. Healthcare data, in particular, are subject to federal protections that can make them challenging to share. Third, journals would need servers to handle data and protect privacy, which would require even more money.

These considerations are not easy to incorporate into a new system, but they aren’t impossible. What I’m proposing isn’t a perfect solution, and peer review will never be an ideal system, and that’s okay. Making tweaks and even major changes in how research is produced and reviewed brings us closer to the ideal.

We should not continue to rely on the assumed innate goodness of people. Instead, we should assume the best but check for the worst. These proposed changes would help catch more of the bad actors before their work is published than the current system does. They would increase transparency in the research process for the people who need it most: peer reviewers and editors. In science, we rely on what the data tell us. We should do the same for our peer review system.