Before the 1920s, statistical methods weren't standardized in scientific experimentation. How many trials to run, and how many samples to take per trial, were decided by little more than the experimenter's intuition.
That all changed with the ideas of Sir Ronald Fisher, a British statistician and biologist who is often called the father of modern statistics. In his 1935 book, "The Design of Experiments", Fisher included an experiment that was the first of its kind and blew open the idea that randomization is a necessary component of experimentation if its data are to be analyzed statistically.
The now famous "Lady Tasting Tea" experiment is said to be based on a true story in which a woman, thought to be Dr. Muriel Bristol, claimed that she could tell whether the milk or the tea had been added to her cup first.
The question Fisher set out to answer was this: how many cups of tea would the lady have to taste, and how should they be prepared, to separate a genuine ability to tell the difference from lucky guessing?
How many cups would it take?
Thankfully, Ronald Fisher did the hard work for us and came up with an experiment that paved the way for future studies and, in essence, helped establish 0.05 as the conventional p-value threshold, where it still stands today.
In the experiment, "the lady," as she is called (remember, it was the 1930s), was told before tasting how the experiment would be set up. She would be given eight cups to taste: four with milk added first and four with tea added first. The order in which the cups were served was randomized by writing the numbers 1 through 8 on small slips of paper and drawing them out of a hat, an old-fashioned random number generator. She would taste all eight cups, which let her compare the two types, and after tasting them she would identify the four cups with tea added first.
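The hat-drawing step is, in effect, a random shuffle of the eight cups. Below is a minimal Python sketch of that procedure, assuming Python's standard `random` module stands in for the slips of paper; the function name `randomized_cup_order` and its `seed` parameter are hypothetical, added only for illustration.

```python
import random

def randomized_cup_order(seed=None):
    """Return a random serving order of four milk-first (M) and four tea-first (T) cups.

    Hypothetical helper: shuffling the list stands in for drawing slips from a hat.
    """
    rng = random.Random(seed)
    cups = ["M"] * 4 + ["T"] * 4  # four cups of each preparation
    rng.shuffle(cups)             # every ordering of the eight cups is equally likely
    return "".join(cups)

print(randomized_cup_order())  # prints one random order, different on each run
```

Passing a fixed `seed` makes the order reproducible, which is handy for checking the sketch, whereas a real experiment would want a fresh, unpredictable draw each time.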
Why eight cups?
With four milk-first and four tea-first cups, there are 70 possible arrangements, because there are 70 ways to choose which four of the eight cups get milk first. For example, the cups could be tasted (with M meaning milk first and T meaning tea first) in the order MMMMTTTT or MTMMMTTT or MMTMMTTT or MMMTMTTT and so on. We could make 66 more arrangements just like those four.
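The count of 70 can be checked by brute force. This Python sketch (an illustration, not part of Fisher's original work) lists every way to choose which four of the eight positions hold milk-first cups and compares the total with the binomial coefficient "8 choose 4":

```python
from itertools import combinations
from math import comb

# Each arrangement is fixed by choosing which 4 of the 8 positions are milk-first (M);
# the remaining 4 positions are tea-first (T).
arrangements = [
    "".join("M" if i in milk_positions else "T" for i in range(8))
    for milk_positions in combinations(range(8), 4)
]

print(len(arrangements))  # 70
print(comb(8, 4))         # 70, the binomial coefficient "8 choose 4"
```

Each arrangement corresponds to exactly one choice of four positions, which is why the two counts agree.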
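With 70 possible arrangements, there is exactly a one in 70 chance that the lady will guess all eight correctly by luck alone, about 1.4%. Because she knows there are exactly four of each, mistakes come in pairs: if she calls one milk-first cup tea-first, she must also call one tea-first cup milk-first. There are 16 ways (4 × 4) for her to get exactly three of the four tea-first cups right, a 16/70 or roughly 23% chance. Fisher decided that a 23% chance is far too high to distinguish lucky guessing from actually knowing. An outcome that happens by chance that often cannot count as evidence of a real ability, and that is the essence, in its most basic form, of statistical significance.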
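The probabilities above follow the hypergeometric distribution. A short Python sketch, assuming a lady who is purely guessing and who names four cups as tea-first, computes the exact chance of each possible score and compares two of them with the 0.05 threshold; the helper `prob_exactly` is a hypothetical name:

```python
from fractions import Fraction
from math import comb

TOTAL = comb(8, 4)  # 70 equally likely ways to name four cups as tea-first
ALPHA = 0.05        # conventional significance threshold

def prob_exactly(k):
    """Chance that a pure guesser names exactly k of the 4 tea-first cups correctly.

    Hypergeometric: choose k of the 4 tea-first cups and 4 - k of the 4 milk-first cups.
    """
    return Fraction(comb(4, k) * comb(4, 4 - k), TOTAL)

for k in range(4, -1, -1):
    ways = comb(4, k) * comb(4, 4 - k)
    print(f"{k} of 4 right: {ways:2d}/70 = {float(prob_exactly(k)):.1%}")

# Chance of doing at least as well as three right (three or four correct).
p_three_or_better = prob_exactly(3) + prob_exactly(4)

print(float(prob_exactly(4)) < ALPHA)    # True: a perfect score is significant at 0.05
print(float(p_three_or_better) < ALPHA)  # False: three right is not
```

Run as written, it prints 1/70 (1.4%) for four right, 16/70 (22.9%) for three right, 36/70 (51.4%) for two, and the mirror-image values for one and zero. The chance of scoring three or better is 17/70, about 24.3%, well above 0.05, while a perfect score at 1/70 falls below it.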
A scientific experiment needs to be able to distinguish a real effect from a random coincidence. The lady who thought she could tell when the milk was added to her tea started a revolution in how scientists separate real findings from chance.
At the end of the day, it is said that the lady got all eight cups correct. Could she have been guessing? Of course. But with only a 1.4% chance of guessing all eight cups correctly, we have to fall on the side of science this time and conclude that she probably could tell the difference.
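The same 1-in-70 figure can be sanity-checked by simulation. This Python sketch, assuming a guesser who picks four cups uniformly at random after a randomized serving order, estimates how often pure guessing produces a perfect score; `simulate_guesser` and its parameters are hypothetical names chosen for illustration:

```python
import random

def simulate_guesser(trials=100_000, seed=0):
    """Estimate how often pure guessing identifies all four tea-first cups."""
    rng = random.Random(seed)
    cups = ["M"] * 4 + ["T"] * 4
    perfect = 0
    for _ in range(trials):
        rng.shuffle(cups)                                    # randomized serving order
        truth = {i for i, c in enumerate(cups) if c == "T"}  # positions of tea-first cups
        guess = set(rng.sample(range(8), 4))                 # she names four cups at random
        if guess == truth:
            perfect += 1
    return perfect / trials

print(f"Estimated chance of a perfect score by guessing: {simulate_guesser():.2%}")
```

With 100,000 simulated tastings the estimate should land close to 1/70, or about 1.4%; the exact figure printed depends on the seed.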