It's an exciting time for data science. The field is new, but growing quickly. There's huge demand for data scientists: average compensation in SF is well north of 100 thousand dollars a year. Where there's money, there are also people trying to earn it. The data science skills gap means that many people are learning, or trying to learn, data science.
The first step to learning data science is usually asking "How do I learn data science?" The response to this question tends to be a long list of courses to take and books to read, starting with linear algebra or statistics. I went through this myself a few years ago when I was learning. I had no programming background, but knew that I wanted to work with data.
I can't fully explain how immensely unmotivating it is to be given a huge list of resources without any context. It's akin to a teacher handing you a stack of textbooks and saying "read all of these." I struggled with this approach when I was in school. If I had started learning data science this way, I never would have kept going.
Some people learn best with a list of books, but I learn best by building and trying things. I learn when I'm motivated, and when I know why I'm learning something. Best of all, when you learn this way, you come out with immediately useful skills. From my conversations with new learners over the years, I know many share these views.
That's why I don't think your first goal should be to learn linear algebra or statistics. If you want to learn data science, your first goal should be to learn to love data. Interested in finding out how? Read on to see how to actually learn data science.
An example of the visualizations you can make with data science (via The Economist)
Nobody ever talks about motivation in learning. Data science is a broad and fuzzy field, which makes it hard to learn. Really hard. Without motivation, you'll end up stopping halfway through and believing you can't do it, when the fault isn't with you; it's with the teaching.
You need something that will motivate you to keep learning, even when it's midnight, formulas are starting to look blurry, and you're wondering if this will be the night that neural networks finally make sense.
You need something that will make you find the linkages between statistics, linear algebra, and neural networks. Something that will prevent you from struggling with the "what do I learn next?" question.
My entry point to data science was predicting the stock market, although I didn't know it at the time. Some of the first programs I coded to predict the stock market involved almost no statistics. But I knew they weren't performing well, so I worked day and night to make them better.
I was obsessed with improving the performance of my programs. I was obsessed with the stock market. I was learning to love data. And because I was learning to love data, I was motivated to learn anything I needed to make my programs better.
Not everyone is obsessed with predicting the stock market, I know. But it's important to find that thing that makes you want to learn.
It can be figuring out new and interesting things about your city, mapping all the devices on the internet, finding the real positions NBA players play, mapping refugees by year, or anything else. The great thing about data science is that there are infinite interesting things to work on; it's all about asking questions and finding a way to get answers.
Take control of your learning by tailoring it to what you want to do, not the other way around.
A map of all the devices on the internet
Learning about neural networks, image recognition, and other cutting-edge techniques is important. But most data science doesn't involve any of it:
What all of this means is that the best way to learn is to work on projects. By working on projects, you gain skills that are immediately applicable and useful. You also have a nice way to build a portfolio.
One technique to start projects is to find a dataset you like. Answer an interesting question about it. Rinse and repeat.
Here are some good places to find datasets to get you started:
Another technique (and my technique) was to find a deep problem, predicting the stock market, that could be broken down into small steps. I first connected to the Yahoo Finance API and pulled down daily price data. I then created some indicators, like average price over the past few days, and used them to predict the future (no real algorithms here, just technical analysis). This didn't work so well, so I learned some statistics, and then used linear regression. Then I connected to another API, scraped minute-by-minute data, and stored it in a SQL database. And so on, until the algorithm worked well.
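To make two of those steps concrete, here is a minimal sketch of an "average price over the past few days" indicator feeding a simple linear regression. It is not my original code: the prices are made-up numbers, and the names moving_average, fit_line, and WINDOW are hypothetical, chosen only for illustration. It uses plain Python so it runs without any extra libraries.

```python
# Hypothetical daily closing prices (made-up numbers, for illustration only).
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]


def moving_average(values, window):
    """Average of the trailing `window` values, one result per full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


WINDOW = 3  # assumed look-back length, in days
indicator = moving_average(prices, WINDOW)

# Pair each day's indicator with the NEXT day's closing price.
features = indicator[:-1]
targets = prices[WINDOW:]

slope, intercept = fit_line(features, targets)

# Estimate the price for the day after the last one in the series.
next_day_estimate = slope * indicator[-1] + intercept
print(slope, intercept, next_day_estimate)
```

The toy series is perfectly linear, so the fit is trivially good. Real price data is nothing like this, which is exactly the kind of thing you only discover by trying it.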
The great thing about this is that I had context for my learning. I didn't just learn SQL syntax; I used it to store price data, and thus learned 10x as much as I would have by just studying syntax. Learning without application isn't going to be retained very well, and won't prepare you to do actual data science work.
This guy's trying to predict the stock market, but needs some data science, apparently (via DailyMail)
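As a rough illustration of what "using SQL to store price data" can look like, here is a small sketch. It assumes SQLite through Python's built-in sqlite3 module, which is not necessarily the database I used; the table layout, the "XYZ" symbol, and the prices are all made up for the example.

```python
import sqlite3

# Hypothetical minute-by-minute rows: (symbol, timestamp, price). Values are made up.
rows = [
    ("XYZ", "2015-01-02 09:30", 10.0),
    ("XYZ", "2015-01-02 09:31", 10.5),
    ("XYZ", "2015-01-02 09:32", 11.0),
]

# An in-memory database keeps the sketch self-contained; pass a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices ("
    "symbol TEXT, ts TEXT, price REAL, "
    "PRIMARY KEY (symbol, ts))"
)

# Parameterized inserts avoid building SQL strings by hand.
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
conn.commit()

# Ask a question of the stored data: the average price for one symbol.
avg_price = conn.execute(
    "SELECT AVG(price) FROM prices WHERE symbol = ?", ("XYZ",)
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
conn.close()

print(row_count, avg_price)
```

Even a toy like this forces you to decide on a schema, a primary key, and a query, which is where the learning happens.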
Data scientists constantly need to present the results of their analysis to others. Skill at doing this can be the difference between an okay and a great data scientist.
Part of communicating insights is understanding the topic and theory well. Another part is understanding how to clearly organize your results. The final piece is being able to explain your analysis clearly.
It's hard to get good at communicating complex concepts effectively, but here are some things you should try:
It's amazing how much you can learn from working with others. In data science, teamwork can also be very important in a job setting.
Some ideas here:
Are you completely comfortable with the project you're working on? Was the last time you used a new concept a week ago? It's time to work on something more difficult. Data science is a steep mountain to climb, and if you stop climbing, it's easy to never make it.
If you find yourself getting too comfortable, here are some ideas:
This is less a roadmap of exactly what to do than it is a rough set of guidelines to follow as you learn data science. If you do all of these things well, you'll find that you're naturally developing data science expertise.
I generally dislike the "here's a big list of stuff" approach, because it makes it extremely hard to figure out what to do next. I've seen a lot of people give up learning when confronted with a giant list of textbooks and MOOCs.
I personally believe that anyone can learn data science if they approach it with the right frame of mind.
I'm also the founder of Dataquest, a site that helps you learn data science in your browser. It encapsulates a lot of the ideas discussed in this post to create a better learning experience. You learn by analyzing interesting datasets like CIA documents and NBA player stats. You also complete projects and build a portfolio. It's not a problem if you don't know how to code; we teach you Python. We teach Python because it's the most beginner-friendly language, is used in a lot of production data science work, and can be used for a variety of applications.
As I worked on projects, I found these resources helpful. Remember, resources on their own aren't useful; find a context for them:
This post is adapted from my Quora answer on how to become a data scientist.