“Data science” is hot right now. The number of undergraduate degrees in statistics has tripled in the past decade, and as a statistics professor, I can tell you that it isn’t because freshmen love statistics.
Way back in 2009, economist Hal Varian of Google dubbed statistician the “next sexy job.” Since then, statistician, data scientist and actuary have topped various “best jobs” lists. Not to mention the enthusiastic press coverage of industry applications: Machine learning! Big data! AI! Deep learning!
But is studying statistics actually good advice? I’m going to voice an unpopular opinion for the sake of starting a conversation. Stats is indeed useful, but not in the way that the popular media – and all those online data science degree programs – seem to suggest.
While all the press tends to go to the sensationalist applications – computers that watch cat videos, anyone? – the data science boom reflects a broad increase in demand for data literacy, as a baseline requirement for modern jobs.
The “big data era” doesn’t just mean large amounts of data; it also means increased ease and ability to collect data of all types, in all walks of life. Although the big five tech companies – Google, Apple, Amazon, Facebook and Microsoft – represent about 10 percent of the U.S. market cap and dominate the public imagination, they employ only one-half of one percent of all employees.
Therefore, to be a true revolution, data science will need to infiltrate non-tech industries. And it is. The U.S. has seen its impact on political campaigns. I myself have consulted in the medical devices sector. A few years back, Walmart held a data analysis competition as a recruiting tool. The need for people who can dig into the data and parse it is everywhere.
In a speech at the National Academy of Sciences in 2015, Steven “Freakonomics” Levitt related his insights about the need for data-savvy workers, based on his experience as a sought-after consultant in fields ranging from the airline industry to fast food. He concluded that the next-generation super-employee is someone with a bit of business sense, a bit of computing know-how and a bit of statistics under his or her belt.
Data is increasingly being called on to inform all our decisions. But this broad utility means that it isn’t sexy. The sexy jobs – working on self-driving cars or Go-playing computers – are going to require more than an undergrad major in statistics or a week-long bootcamp on prediction using Python. In fact, I was once told by an industry colleague that the term “data scientist” was coined to placate Ph.D. physicists who were tasked with running linear regressions all day long.
So, the way I see it, there will be egghead types off at the edge of the field, and there will be some folks doing the necessary drudge work, and there will be a lot of people in between, looking carefully at the data and trying to glean useful insights. But – and this is the big point – everyone had better know how to make basic graphs and poke around a database.
So where do I sign up?
Five years ago there was no such thing as a data science degree, and now the list runs for pages and pages. And that’s not counting the traditional statistics programs, or programs in related subjects like computer science or operations research. LinkedIn’s sidebar strongly feels I should consider an online master’s degree in data analytics, from several different places.
The proliferation of these programs speaks to the inadequacy of many people’s undergraduate educations in terms of statistics and data competency. Although stats majors have tripled, there were only 3,000 last year, compared to 370,000 business degrees and 117,000 psych degrees. More of these students should certainly give statistics (or one of the newer data science degrees) a hard look, given that a bachelor’s degree is borderline compulsory these days.
But I worry that the premise behind the appeal of these degrees – especially at the master’s level – is the idea that the technology alone can solve problems. Nothing could be further from the truth. Statistics is a tool for understanding data, but it cannot by itself understand anything. Probably the biggest mistake people make when applying statistical or machine learning methods is not recognizing that the data being analyzed is insufficient to answer the relevant question. A degree that teaches you only about the hottest predictive analytics technology, like deep learning, is a bit like learning how to drive without knowing the first thing about how to navigate.
Setting realistic expectations for the added value of a statistics education is important to me because I’m a true believer. I feel that more people should learn statistics and how to analyze data because it is a powerful way to understand modern life. In addition to boosting one’s job prospects, a statistics education can teach you when to ignore your doctor’s bad advice, help you understand important financial ideas and, in general, help you be wrong less often. These real virtues are undermined by big data hype.
So yes, lots more folks are studying statistics at the college level than in the past and, absolutely, even more people should be. But I think focusing on the surge in data science specialists is misinterpreting the nature of the demand. Everyone should have more of these skills, even if “data scientist” isn’t their job title.