How Many People Are Infected by COVID-19? Stanford Starts to Look For the Denominator

To understand how severe and lethal COVID-19 really is, we need to know how many people have been infected, which, in this equation, is the "denominator." An early study from Stanford of Santa Clara County suggests we may be underestimating how many cases there already are, and so misjudging COVID-19’s infectivity and eventual mortality.

Stanford researchers tested 3,330 people in Santa Clara County, California, which at the time of testing had the largest number of confirmed cases in the state. They collected finger-stick blood samples and looked for COVID-19 antibodies, evidence that an individual’s immune system had been exposed to the virus. They recruited participants using targeted Facebook ads over 24 hours, seeking a representative sample based on gender, age, and ethnicity. With a population of 1,943,411 and 1,091 confirmed cases, the incidence would be about 0.05%; take that as the baseline.

  • Fifty of the 3,330 participants were antibody positive, meaning they had been infected by COVID-19, for an incidence of 1.5%, roughly 30-fold higher than the baseline.
  • The sample skewed toward women and “non-Hispanic whites,” with fewer children under four and fewer adults over 65 than the county as a whole. When adjusted for Santa Clara’s actual population, the incidence rose to 2.81%, 50-fold higher than the baseline.
  • “The most important implication of these findings is that the number of infections is much greater than the reported number of cases.” 
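The arithmetic behind these figures is easy to check. The short sketch below uses only the numbers quoted above; the variable names and the helper `fold_increase` are made up for illustration and are not from the study.

```python
# Figures quoted in the article.
population = 1_943_411     # Santa Clara County residents
confirmed_cases = 1_091    # confirmed cases at the time of testing
sample_size = 3_330        # study participants
antibody_positive = 50     # participants who tested positive
adjusted_rate = 0.0281     # population-adjusted incidence reported by the study

baseline_rate = confirmed_cases / population
raw_rate = antibody_positive / sample_size


def fold_increase(rate, baseline):
    """How many times larger `rate` is than `baseline`."""
    return rate / baseline


print(f"baseline incidence:  {baseline_rate:.3%}")
print(f"raw sample incidence: {raw_rate:.2%} "
      f"({fold_increase(raw_rate, baseline_rate):.0f}-fold)")
print(f"adjusted incidence:   {adjusted_rate:.2%} "
      f"({fold_increase(adjusted_rate, baseline_rate):.0f}-fold)")
print(f"implied infections:   {adjusted_rate * population:,.0f}")
```

With the unrounded baseline of about 0.056%, the raw sample works out to roughly 27-fold rather than 30-fold; the 30 comes from rounding the baseline down to 0.05%. The adjusted figure still lands at about 50-fold, which would imply on the order of 54,600 infections in the county.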

One can be drawn to the conclusion that the disease is more widespread, and not as lethal, as we feel it might be. But the study comes with significant caveats, freely acknowledged by its authors, that shed more light on the problem.

The Sample

The first concern is about the representativeness of the sample. You can see from the figure in the study that it is not representative. More importantly, did the sampling method, using Facebook ads, “enrich” the sample with COVID-19 cases? To participate, you needed to have seen the Facebook ad, have a car, be well enough to drive, and be concerned about your exposure. As one thoughtful critic pointed out, the research allowed exposed individuals to get tested when tests were scarce, and they could have recruited friends with the same concerns for the study. The bias in participation cannot be measured, but it indicates that answering the question requires representative samples. What is representative will vary by locale and the population at risk. I wrote about this in describing New York’s disparity in testing within hard-hit minority communities.

The Testing Instrument: False Positives and Negatives

The test for antibodies, seropositivity, is new, and its sensitivity and specificity are not well calibrated. The researchers checked the test against known positive and negative serum from their own patients (positives confirmed by the more accurate nucleic acid amplification test, the swab in the nose; negatives collected before the COVID-19 outbreak) and also made use of the sensitivity and specificity studies done by the manufacturer on similar clinically confirmed COVID-19 patients and controls. While the specificity, which governs false positives, was high and very similar for both test conditions, the false-negative rate, which reflects sensitivity, was much higher in the local serum than in the data from the manufacturer. The researchers used blended false-positive and false-negative rates in their calculations.

  • With the manufacturer’s lower false-negative rate, the calculated incidence fell to 2.49%.
  • With the higher false-negative rate seen in the local serum, the calculated incidence rose to 4.16%.
  • Their blended value gave the 50-fold increase being reported.
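To see why the estimate moves with the test characteristics, here is a minimal sketch of one standard correction, the Rogan-Gladen estimator. It is offered as an illustration, not as the study's exact procedure. The sensitivity and specificity pairs below are hypothetical values picked for the example, apart from the 99.5% specificity, and the function and scenario names are my own.

```python
def rogan_gladen(apparent_rate, sensitivity, specificity):
    """Correct an observed positive rate for test error (Rogan-Gladen estimator)."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("the test must perform better than chance")
    corrected = (apparent_rate + specificity - 1.0) / denominator
    # Clamp to a valid proportion; a negative value means the observed
    # positives could all be explained by false positives.
    return min(max(corrected, 0.0), 1.0)


apparent = 0.0281  # population-adjusted rate quoted in the article

# Hypothetical (sensitivity, specificity) pairs, for illustration only.
scenarios = {
    "fewer false negatives": (0.90, 0.995),
    "more false negatives": (0.70, 0.995),
    "more false positives": (0.90, 0.985),
}

for label, (sensitivity, specificity) in scenarios.items():
    estimate = rogan_gladen(apparent, sensitivity, specificity)
    print(f"{label}: {estimate:.2%}")
```

Under this estimator, a lower sensitivity (more false negatives) pushes the corrected figure up, because more true infections are assumed to have been missed. A lower specificity (more false positives) pulls it down, because more of the observed positives are assumed to be spurious.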

In that thoughtful “peer-review,” it is pointed out that the false-positive rate may be higher, depending on how it is calculated. As the false-positive rate increases, given the sample size, the conclusion may become lost in statistical uncertainty. In other words, of the 50 positive tests, anywhere from 16 to as many as 40 might be false positives.
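A rough way to see where a range like that comes from is to multiply the sample size by the false-positive rate. The sketch below assumes, as a worst case, that nearly everyone tested was truly uninfected. Only the 99.5% specificity and the sample size come from the article; the other specificity values and the function name are hypothetical.

```python
def expected_false_positives(sample_size, specificity, true_prevalence=0.0):
    """Expected number of false positives among `sample_size` people tested."""
    uninfected = sample_size * (1.0 - true_prevalence)
    return uninfected * (1.0 - specificity)


sample_size = 3_330       # participants, from the article
observed_positives = 50   # positive tests, from the article

# 0.995 is the specificity quoted in the article; the others are hypothetical.
for specificity in (0.995, 0.990, 0.988):
    false_positives = expected_false_positives(sample_size, specificity)
    print(f"specificity {specificity:.1%}: about {false_positives:.1f} "
          f"of the {observed_positives} positives could be false")
```

At 99.5% specificity this gives about 17 expected false positives in a sample of 3,330, close to the lower end of the range above. A specificity of about 98.8%, barely one percentage point lower, would already account for 40 of the 50.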

Without a good handle on the number of patients infected, we have no real information on how infectious or lethal COVID-19 is; more importantly, it becomes far more challenging to begin social mingling safely. There is good evidence that merely looking at hospitalizations and deaths is biased towards COVID-19 being a very bad actor. To unwind social distancing and its economic effects, we need better information. What the study shows is that testing needs to be local; the incidence of COVID-19 in Montana is different than in Queens. Testing also needs to be representative of what we know about risk, looking at groups by age and co-morbidities.

Finally, testing requires a very reliable test; calibrating the test to reduce false positives is critical. A specificity of 99.5% is pretty good and, given the time urgency, unlikely to improve, so the usefulness of the test in populations with a low true incidence may be limited.
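The limitation at low incidence can be made concrete with the positive predictive value, the share of positive results that are true positives. This is a minimal sketch using the 99.5% specificity mentioned above; the 80% sensitivity and the prevalence values are assumptions chosen for illustration, not figures from the study.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive test results that are true positives."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


specificity = 0.995   # quoted in the article
sensitivity = 0.80    # hypothetical value, for illustration only

# Hypothetical true prevalence levels, from very low to moderate.
for prevalence in (0.005, 0.01, 0.03, 0.10):
    ppv = positive_predictive_value(prevalence, sensitivity, specificity)
    print(f"true prevalence {prevalence:.1%}: "
          f"{ppv:.0%} of positive results are real")
```

Under these assumptions, when only 1% of people are truly infected, nearly four in ten positive results would be false, even with a 99.5% specific test. At 10% true prevalence the same test would be right about positives roughly 95% of the time.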


Source: COVID-19 Antibody Seroprevalence in Santa Clara County, California

Medium, Peer Review of “COVID-19 Antibody Seroprevalence in Santa Clara County, California”