I’ve been thinking deeply about the need for more people facile with extracting meaning from data - “data scientists,” for lack of a better term. My friend and colleague Drew Conway developed a useful model for thinking about the attributes of a data scientist. He essentially views this individual as sitting at the intersection of three spheres of a Venn diagram - Hacker/coder, Math/stats and Domain Experience. The data scientist, therefore, has the coding tools and analytical rigor that, when applied to a specific domain, can yield valuable insights.
I love Drew’s framework because it gets to the single biggest thing I’ve seen lacking in many brilliant Ph.D holders who have “data scientist” written all over them yet fail to translate knowledge into production code: applied knowledge. In fact, Drew is a very accomplished data scientist in his own right, yet he achieved this standing with only an undergraduate degree (and not from Stanford, Caltech or MIT, mind you). My point is this: the ivory tower notion of a data scientist is total bullshit. Can a Ph.D play the game? Sure. But does one need a Ph.D to be a successful data scientist? No way.
I can easily see a cross-disciplinary undergraduate degree in Data Science conferred by the schools of engineering, information sciences or business. It would be a mix of classroom, lab and field work, covering the fundamentals of coding, CS and user experience; mathematics and statistics; and marketing and strategy. For those wishing to delve more deeply into one of these areas, an optional fifth-year Master’s degree could be offered. And yes, there will be those for whom a Ph.D is the goal, either because of a desire to enter academia or to perform original research. This is perfectly awesome as well. But becoming a skilled data scientist focused on application rather than theory does not, in my experience, substantially benefit from a Ph.D. In fact, it may do the opposite.
The data scientist v2.0 will be out in the world applying their skills to real-world problems, not toiling away in a lab, in solitude. They will get better and better by having more of these real-world experiences from which to hone their hypotheses and glean their insights. And yes, collaborating with a larger pool of data scientists on better techniques for achieving these insights will help as well. Perhaps the Academy will not appreciate my perspective on this matter. But I am not of the Academy. I respect and value the Academy but believe there is much to be learned - that must be learned - on the outside. And this will be part of the fiber of our next generation of data scientists: of the people, by the people, for the people.