Data science has gone from a newly coined term in 2007 to being one of the most sought-after disciplines in the professional world. But what does a data scientist really do? And how can you break into the field? Here’s what you need to know if you’re looking to get the skills to become a data scientist.
What Do Data Scientists Do?
Data scientists combine statistics, computer science, and data analysis to bring order to the massive amounts of unruly data that are now collected by thousands of companies. It’s well-known that your Facebook account contains valuable information, and that Google wants to know absolutely everything about you. But now even local start-ups collect data that they hope can be mined and turned into useful strategies for growing their businesses.
The data that companies collect is often very messy: it’s incomplete, disorganized, incoherently labeled, and often just plain wrong. But there’s a lot of valuable information there, and data scientists are the ones who generate insights that the business side of the company can put into action.
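To make “messy” a little more concrete, here is a minimal Python sketch of the kind of cleanup described above. The records, the field names, and the 0 to 120 age range are all made up for illustration; real data and real cleaning rules will look different.

```python
# Hypothetical raw records, invented purely for illustration.
raw_records = [
    {"Name": " Alice ", "age": "34", "City": "boston"},
    {"name": "Bob", "Age": "", "city": "Boston"},        # incomplete: age is missing
    {"NAME": "Carol", "age": "-5", "city": "BOSTON "},   # plain wrong: negative age
]


def clean_record(record):
    """Return a tidied copy of one record, or None if it is unusable."""
    # Incoherent labels: lowercase every key and trim stray whitespace.
    tidy = {key.strip().lower(): value.strip() for key, value in record.items()}

    # Incomplete or wrong values: keep the row only if age is a sane integer.
    try:
        age = int(tidy.get("age", ""))
    except ValueError:
        return None
    if not 0 <= age <= 120:  # assumed plausibility range
        return None

    return {"name": tidy["name"], "age": age, "city": tidy["city"].title()}


# Drop the records that could not be repaired.
cleaned = [
    tidy for tidy in (clean_record(rec) for rec in raw_records) if tidy is not None
]
print(cleaned)
```

In this toy example only Alice’s record survives. Whether to drop a bad row, as this sketch does, or to try to repair it is a judgment call that depends on the data and the question being asked.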
Many descriptions of data science emphasize the importance of discovery in the field; data scientists might not know what they’re looking for as they go through terabytes of data, but they’ll know when they see something interesting (this need for intuition and discovery is one of the reasons why this is a job that can’t be done well by robots). They also need to be good at presenting this information to others, as managers and executives usually aren’t as well-versed in the language of data analysis as data scientists are.
In short, data scientists analyze massive amounts of data and turn them into actionable strategies. Make no mistake: this is not an easy job. But it’s hugely valuable to companies, and always will be, which is why data scientists can expect to have secure jobs into the future. And they get paid well for these skills: a data scientist can easily make over $90,000 per year.
What Skills Do Data Scientists Need?
As “data science” is a quickly changing and often ill-defined field, the range of skills that you’ll find among data scientists is impressively wide. Most have some training in statistics, data analysis, and mathematics. Almost all have programming experience, especially with languages and tools such as Python, R, SQL, and Hadoop that are used for data storage, statistics, and machine learning. Because it’s especially popular in data analysis, learning Python is a good place to start.
Knowing other data analysis programs, like MATLAB, SAS, and Minitab, can also be quite useful.
The ability to communicate clearly with people who don’t understand machine learning, statistics, or data analysis is also very important. If you find something ground-breaking but can’t explain it to anyone, it’s not going to be of any use. Clear communication is a soft skill that is required of any technology worker these days.
Experience in multiple fields, both within and outside the area that you’re working in, is an asset if you are an aspiring data scientist. Being able to think creatively and address problems from multiple angles is hugely useful when working in data science, as new problems often require innovation and ad-hoc solutions.
Learning the Skills for Data Science
Because data scientists need to be able to work with a variety of tools that come from different fields, as distinct as application development and probability theory, the path to joining the profession isn’t a clear one. Many data scientists start out as computer scientists or statisticians and gain the necessary skills while on the job. Others come from completely different backgrounds that give them the experience they need to solve problems in creative ways.
“Data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.”
– Mike Loukides, VP, O’Reilly Media.
However, data-science-specific training is becoming more available by the day. Although the places for degree-level training are few in number and extremely competitive, they’re worth looking into. Having a head start on the skills that you’ll develop in these programs will increase your chances of getting into a program, and of landing a job, even without a degree in computer or data science.
The resources listed below will help you start racking up the skills that you need to be a data scientist. Some are free online college courses, and some are more professional-development-type resources. All of them are free, unless noted. At the end of the list, I’ve included some certification, immersive, and degree programs, in case you’re wondering where you can get some serious training in data science — there are more out there, but these should give you an idea of what’s available.
- Python (Google)
- Computing for Data Analysis (Coursera)
- Data Analysis with R (Coursera)
- Data Mining with R (Big Data University)
- Hadoop Fundamentals I (Big Data University)
Statistics and Data Analysis
- Probability and Statistical Reasoning (Carnegie Mellon University; free for independent learners, $25 for academic students)
- Introduction to Applied Statistics (Online Courses)
- Data Analysis (Coursera)
- Machine Learning (Stanford University via Coursera)
Data Science Certifications
- Data Science (Johns Hopkins University via Coursera; free without certificate, $475 with certificate)
- Data Analysis Nanodegree (Udacity; $200/month, 9–12 months)
Data Science Immersive Program
Data Science Degree Programs
- Professional Master of Information and Data Science at UC Berkeley
- MS in Data Science at NYU
- MS in Data Science at the University of St. Thomas
- Online MS in Data Science at the University of Wisconsin
- MS in Analytics at North Carolina State University
- MS in Analytics at Northwestern University
The list above should give you plenty to get started with. Once you’ve worked your way through the free resources, you can start looking at some field-specific things, like biostatistics, healthcare data analytics, or data analysis for security — there are a lot of resources that you can use without going back to school for a degree.
You can find courses on these topics in places like Coursera, Udacity, and even on YouTube. Going on to more advanced programming resources is also a good idea. There are tons of things out there for you to learn; you’ll just have to take some time to find the ones that are most applicable to you.
Here’s an inspiring one-minute video from Adobe on the life of a data scientist.
Do you want to be one? If you have any good resources to share for aspiring data scientists, please share them in the comments so that others interested in the field can take advantage of them!