The number one question from people who attend our webinars & in-person events has now been asked too many times not to be turned into a blog post: what are the four skills necessary to become a data scientist?
In no particular order of importance, they are: programming, statistics, communication, & domain expertise.
Statistics is the true basis for both what comes before the data, in data analysis, & what comes after the data, in machine learning. As a data scientist, you are expected to understand what the metrics you print mean for your data, such as the p-value in a hypothesis test, & how to judge when an RMSE is too large for your business question. At the same time, a data scientist is not expected to know how to create new machine learning algorithms (though it helps). The expectation is closer to knowing when a certain algorithm is the best fit for the situation & which parameter is best to tweak if necessary. This also includes knowing that certain areas within statistics bring more value to some industries than others, like time series analysis vs. geospatial analysis.
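To make the two metrics mentioned above concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the function names `rmse` and `permutation_p_value`, the sample numbers, & the `tolerance` threshold are all hypothetical, & the permutation test is just one simple way to obtain a p-value, not a recommended default.

```python
import math
import random
from statistics import mean


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(mean(squared_errors))


def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the estimated share of random relabelings whose mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical data: actual vs. predicted values in business units.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 195.0, 265.0]
tolerance = 20.0  # assumed: the largest typical error the business accepts

error = rmse(actual, predicted)
print(f"RMSE = {error:.2f}, acceptable = {error <= tolerance}")

# Hypothetical measurements for two groups, e.g. an A/B test.
control = [12.1, 11.8, 12.4, 12.0, 11.9]
variant = [12.9, 13.1, 12.7, 13.0, 12.8]
print(f"p-value = {permutation_p_value(control, variant):.4f}")
```

The point of the sketch is the interpretation step: an RMSE only becomes "too large" once it is compared against a tolerance that comes from the business question, & a p-value only says how surprising the observed difference would be if the group labels did not matter.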
By far the most overlooked skill, not just in data science but across the job community, is communication. Communication for data science is communicating your results to those who will make decisions based on them: is it your manager or is it the C-suite? Communication for data science is asking the right questions of whoever maintained the data: was it your in-house data engineer or was the data outsourced? Communication for data science is speaking to those who can give you more insight into your data than a simple Google search or physical book: is it normal for a certain stock price to jump so suddenly, & what was the reason for the jump? Communication is key at every stage of a data science project: before the beginning, the beginning, before the middle, the middle, the end, & even after it ends. Do not overlook communication as a data science skill.
theDevMasters considers domain expertise to be two to three years of experience in an industry, or enough time to know exactly what the data contains & what questions to ask about it as soon as the industry is brought up. For example, without knowing the facts behind the financial industry, how would we know to check whether someone’s credit score is a reasonable number for their credit history or their credit balance? Or, without experience with certain sections of the lungs, how would we know which area to investigate first for potential tumors? Simply stated, data itself has a genre, & if you know how to read that genre, you know what to expect in the story. This skill is what truly makes a data scientist great in their chosen domain: the prior knowledge to understand all aspects of the problem before it arises.