The context surrounding this study of testosterone’s impact on elite athletes is essential to understanding its underlying hypothesis: that “excess” endogenous testosterone confers a competitive advantage. The study is a direct outgrowth of Dutee Chand’s suspension, which was overturned because, while “possessing high levels of testosterone and thereby increasing lean body mass creates a competitive advantage,” there was no evidence of endogenous testosterone’s impact. That context also explains the apparent conflict of interest of the authors: one is the head of the International Association of Athletics Federations (IAAF) Health and Science Department; the other is a consultant, a member of the workgroup on hyperandrogenic females and transgender issues, and was a witness in the Chand case.
While exogenous testosterone and androgens are among the most commonly used doping substances among female athletes, accounting for 55% of the 116 females serving a doping ban in 2016, the issue at hand is endogenous production, that is, elevated levels of testosterone and androgens made naturally by these athletes because of their innate metabolomics, characterized as hyperandrogenic “disorders.”
Elite athletes competing in 21 female and 22 male events at the 2011 and 2013 IAAF World Championships in track and field form the basis of the study. The researchers used 1332 observations of women and 795 of men, each based on completion of an event as a solo performer with a complete blood panel. Blood was collected after the athlete had been on site for at least 24 hours and at least 2 hours after intense exercise. The best performance during the competition served as the performance benchmark.
Among the female elite athletes, there were 44 observations in which free testosterone was greater than 3.08 nmol/L, placing them in the 99th percentile among elite female athletes. It should be noted that of the 18 female athletes identified, half were found to be doping, and for an additional six it was unclear whether the free testosterone levels were endogenous or exogenous. The performance data for all 18 were used, raising the question of whether these data accurately reflect the performance of women like Chand or Caster Semenya. These clearly elevated levels were identified in two areas, running and throwing. Of the 21 track and field events, a statistical difference in times or distances was noted in the 400 and 800m, the 400m hurdles, pole vault, and hammer throw when comparing those athletes in the highest quartile to those in the lowest. This represented a 2.73%, 2.78%, and 1.78% faster time in running events and 2.94% and 4.53% greater distances for the vault and hammer respectively. When compared to all of the elite athletes, arguably a better measure of their “advantage,” the improvements were generally half as much.
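The “half as much” comparison can be roughly checked from the figures quoted in this article. The sketch below is only an illustration: it averages the five highest-versus-lowest-quartile differences given above and the five all-athlete differences listed in the notes at the end, then takes their ratio. Averaging percentages across different events is a crude simplification, and the variable names are made up for this example.

```python
# Percentage differences as quoted in this article (not recomputed from raw data).
# Highest vs. lowest free-testosterone quartile, five events:
quartile_gaps = [2.73, 2.78, 1.78, 2.94, 4.53]
# The same five events compared against all elite athletes (from the notes):
all_athlete_gaps = [2.04, 1.06, 0.99, 0.66, 2.19]


def mean(values):
    """Plain arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


mean_quartile = mean(quartile_gaps)
mean_all = mean(all_athlete_gaps)
ratio = mean_all / mean_quartile

print(f"quartile comparison: {mean_quartile:.2f}%")
print(f"all-athlete comparison: {mean_all:.2f}%")
print(f"ratio: {ratio:.2f}")
```

On these quoted numbers the ratio comes out a little under one half, which is consistent with the description above. It says nothing about whether the underlying differences are causal.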
Among the male elite athletes, 101, or about 13%, had significantly decreased levels of free testosterone (<0.23 nmol/L), and they participated in all events except for sprints and combined events. There was “no significant difference in performance” noted in the men.
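The counts quoted so far can be cross-checked against each other. This is a minimal sketch using only numbers that appear in this article (1332 and 795 observations, the 2127 in the paper’s title, and the 101 men with low free testosterone). It assumes the 13% is a share of the 795 male observations, and the variable names are illustrative.

```python
# Counts as quoted in the article.
women_obs = 1332
men_obs = 795
low_testosterone_men = 101

# The two groups should add up to the total in the paper's title.
total_obs = women_obs + men_obs

# Share of the male observations with low free testosterone
# (assumption: the 13% is relative to the 795 male observations).
low_share = low_testosterone_men / men_obs

print(total_obs)
print(f"{low_share:.1%}")
```

The total matches the 2127 observations named in the source, and the share rounds to the “about 13%” stated above.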
“Our study design cannot provide evidence for causality between androgen level and performance, but can indicate associations….” The researchers first considered the effect of testosterone on hemoglobin production, which improves performance by increasing energy production. They found about a 0.6 g/100 mL difference in hemoglobin levels between the highest and lowest free testosterone quartiles, but this difference in hemoglobin did not affect speed in the three running events they considered. Consequently, they concluded that free testosterone “concentration is a stronger determinant of performance in the 400 m, 400 m hurdles and 800 m than Hb [Hemoglobin] concentration.” They also felt that androgens might influence lean body mass as well as “mental drive and aggressiveness,” none of which they quantify, which is reasonable given the ambiguity of such a quantification. What is not reasonable is omitting other readily measured qualities that enhance performance, most importantly body height and mass. There is significant evidence, based upon mechanics, for the advantages of both of these factors.
In discussing the pole vault and hammer throw, the narrative of an enhancing effect of testosterone breaks down. First, there is no ready or plausible explanation for the greater vaulting height beyond greater strength, yet greater strength did not produce a difference in other events like the shot put or javelin. To explain an advantage in the hammer throw, the authors pointed to the rotation required before the toss and its associated “mental rotation task,” which they indicate is enhanced under androgenic influence. A deeper dive into that particular statement shows that the mental rotation task was to identify whether a visualized object had been rotated, not whether the participants recognized their own spatial orientation after being rotated. The latter is more closely aligned with the hammer throw than recognizing whether the hammer has rotated in your hand. Finally, they concluded that the lack of difference in male performance, when stratified by free testosterone, suggested that women had more “to gain in muscle mass and strength.”
To summarize: some of the data reflected exogenous, not endogenous, free testosterone; a difference in outcomes was found in only some of the events, so a plausible underlying physiologic mechanism is not clear; and physical differences that might account for the outcomes were not taken into account. Do you believe that this is sufficient evidence that free testosterone confers a competitive advantage requiring chemical “correction” to level the playing field?
COMMENTS FROM THE AUTHOR:
I think one of the main controversies coming out of the discussion surrounding Caster Semenya is the definition of gender. If you look historically, you can see the progression in our thinking.
Initially, gender was defined phenotypically, by one's anatomy. Either you were male or female, and there was a small group of others labeled hermaphrodites. Fast forward several centuries, to 1923, less than a century ago, when the work of Mendel led to the discovery of our "sex" chromosomes, the X and Y of life; gender could now be defined genetically, XX or XY. Of course, that in turn led to the discovery of other genetic versions, like XXY. In reality, the determination of male and female doesn't require the entire Y chromosome, just one gene, SRY. And SRY has been found in individuals we would consider women.
The IAAF has moved the definition once again, away from chromosomes and genes; gender is now to be based on metabolic activity. It is not that these athletes are not phenotypically or genetically what we have considered female, but that our measurements can now detect differences in metabolic behavior.
Of course, interacting with all of these science-based definitions are cultural ones that we have confused and conflated with one another. From my view, defining gender based on overall metabolism, the metabolome, is too fine a distinction and serves us poorly in clinical care. Altering the definition of gender in this way means that we have to test for gender in doing observational and randomized studies; we cannot simply decide by self-reported phenotype.
Finally, who said that athletics, especially at the elite level, occurs on a level playing field? As I wrote in the review of the IAAF's study, no one accounted for height. If we are going to flatten the field for metabolism, then certainly height and muscle mass, the real target of those testosterone concerns, confer an unfair advantage too. Are we going to have the short stretched or the tall shortened in a new version of the Procrustean Bed? Whatever advantage, if any, these athletes have, it was randomly assigned, and they should not be penalized. They certainly should not have to take drugs to change their metabolism to meet the definitions of the IAAF, a group of 27 elected members where “there is guaranteed a minimum of six female members.” Is 22% equitable representation? And by the way, when the IAAF talks about its board in this way, do they mean the board's genetics or their metabolism?
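The 22% figure follows directly from the two numbers in the quoted rule. A one-line check, with made-up variable names:

```python
# Numbers from the quoted IAAF rule: 27 elected members, at least 6 female.
elected_members = 27
minimum_female_members = 6

minimum_female_share = minimum_female_members / elected_members
print(f"{minimum_female_share:.0%}")
```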
The International Association of Athletics Federations (IAAF) is the governing body for global athletics. The disclaimer itself is purposely specific, claiming “no other relevant financial involvement with any organization or entity with a financial interest with the subject matter or materials discussed in the manuscript.” (Italics added) They do have a significant “eminence-based” investment in the manuscript’s outcome.
Metabolomics is the study of the chemicals of our metabolism, their concentrations, effects, and interactions.
 The study provides a wealth of metabolic data. Testosterone is found bound to two proteins in the blood, albumin and sex hormone-binding globulin (SHBG), but it is the free testosterone that is felt to be biologically active.
The improvement when considered for all the elite athletes was 2.04% and 1.06% for the 400 and 800m, 0.99% for the 400m hurdles, 0.66% for the vault, and 2.19% for the hammer throw.
Source: Serum androgen levels and their relation to performance in track and field: mass spectrometry results from 2127 observations in male and female elite athletes. British Journal of Sports Medicine DOI: 10.1136/bjsports-2017-097792