Let me tell you a story about William Sealy Gosset. William was a Chemistry and Math grad from Oxford University in the class of 1899 (they were partying like it was 1899 back then). After graduating, he took a job with the brewery of Arthur Guinness and Son, where he worked as a mathematician, trying to find the best yields of barley.
But this is where he ran into problems.
One of the most important assumptions in (most) statistical tests is that you have a large enough sample size to draw inferences from your data. You can’t say much if you only have 1 data point. 3? Maybe. 5? Possibly. Ideally, we want at least 20-30 observations, if not more. It’s why when a goalie in hockey, or a batter in baseball, has a great game, you chalk it up to being a fluke, rather than indicative of their skill. Small samples are much more likely to be affected by chance, and thus may not accurately reflect the underlying phenomenon you’re trying to measure. Gosset, on the other hand, couldn’t create 30+ batches of Guinness in order to do the statistics on them. He had a much smaller sample size, and thus “normal” statistical methods wouldn’t work.
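If you want to see the small-sample problem for yourself, here is a rough simulation sketch. It draws samples from a made-up normal population, builds the usual "mean plus or minus 1.96 standard errors" interval from each one, and counts how often that interval actually contains the true mean. The function name `z_interval_coverage`, the population mean and standard deviation, the number of trials, and the seed are all assumptions chosen for illustration, not anything from Gosset's brewery data.

```python
import random
import statistics


def z_interval_coverage(n, trials=5000, mu=10.0, sigma=2.0, seed=1899):
    """Fraction of trials where mean +/- 1.96 * s / sqrt(n) contains mu.

    mu, sigma, trials and seed are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from a normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        # Standard error estimated from the sample itself.
        std_err = statistics.stdev(sample) / n ** 0.5
        # Does the "large sample" 95% interval cover the true mean?
        if abs(mean - mu) <= 1.96 * std_err:
            hits += 1
    return hits / trials


for n in (5, 30):
    print(f"n = {n:2d}: coverage = {z_interval_coverage(n):.3f}")
```

With 5 observations, the interval that is supposed to be right 95% of the time should come up noticeably short, while with 30 observations it should land much closer to the advertised rate. That gap is the kind of error Gosset set out to quantify.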
Gosset wouldn’t take this for an answer. He started writing up his thoughts and examining the error associated with his estimates. However, he hit two obstacles. His mentor, Karl Pearson, of Pearson Product Moment Correlation Coefficient fame, while supportive, didn’t really appreciate how important the findings were. In addition, Guinness had very strict policies on what its employees could publish, as the company was worried about competitors discovering its trade secrets. So Gosset did what any normal mathematician would.
He published under a pseudonym. In a startlingly rebellious gesture, Gosset published his work in Biometrika under the title “The Probable Error of a Mean.” (See, statisticians can be badasses too). The name he used? Student. His paper for the Guinness company became one of the most important statistical discoveries of the day, and Student’s t-distribution is now an essential part of any introductory statistics course.