What are the four skills I need to be a Data Scientist?

The number one question from people who attend our webinars & in-person events has now been asked too many times not to be turned into a blog post: what are the four skills necessary to become a data scientist?

In no particular order of importance, they are: programming, statistics, communication, & domain expertise.


Programming

Computer programming revolutionized the way we interpret results; it is what makes those results not only more accessible to us, but also faster & more accurate. The tools chosen by data scientists include languages such as Python, R, C, C++, C#, Scala, Java, JavaScript, & SQL; data platforms such as Spark, Hadoop, & NoSQL databases; cloud computing platforms such as Amazon Web Services (AWS), Microsoft Azure, & Google Cloud Platform; & various software for analysis & reporting, like SPSS, EViews, MATLAB, Alteryx, SAS, Tableau, etc. Which one should you focus on is usually the next question. Programming is a tool for a very results-oriented job environment: find one language for your processes, one software to automate them if possible, one language for your analytical visuals, one framework for data clusters, one platform for parallel computation, one software to display your results & reports, one software for your own organizational purposes, & so on. The list goes on, but programming is a tool for doing your machine learning & any other processes in service of what matters most: results.



Statistics

Statistics is the foundation of both ends of the work: data analysis, which comes first, & machine learning, which comes after. As a data scientist, you are expected to understand what the metrics printed for your data mean, such as the p-value in a hypothesis test, & how to judge whether an RMSE (root mean squared error) is too large for your business question. At the same time, a data scientist is not expected to know how to create new machine learning algorithms (though it helps), but rather how to identify which algorithm is best suited to the situation & which parameter to tune if necessary. This also includes knowing which areas within statistics bring more value to some industries than to others, like time series analysis vs geospatial analysis.
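To make the two metrics named above concrete, here is a minimal sketch using only the Python standard library. The function names `rmse` & `permutation_p_value`, the sample numbers, & the choice of a permutation test (one of several ways to obtain a p-value) are assumptions made for illustration, not a prescribed method.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Estimates how often a random relabeling of the pooled data produces a
    mean difference at least as large as the one actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # Small tolerance guards against floating-point rounding in the sums.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical model output: every prediction is off by exactly 1 unit.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
print("RMSE:", rmse(actual, predicted))  # 1.0

# Hypothetical measurements from two groups, e.g. two page designs.
group_a = [12.1, 11.8, 12.6, 12.3, 12.9]
group_b = [10.2, 10.9, 10.5, 11.1, 10.4]
print("p-value:", permutation_p_value(group_a, group_b))
```

RMSE is expressed in the same units as the target, so whether 1.0 is "too large" depends entirely on the business question: an error of 1 matters more when predicting interest rates than when predicting daily website visits. A small p-value says only that a difference this large would be rare if the two groups were really the same, not that the difference is large enough to act on.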


Communication

By far the most overlooked skill, not just in data science but in the job market overall, is communication. Communication for data science is presenting your results to those who will make decisions based on them: is it your manager or the C-suite? Communication for data science is asking the right questions of whoever maintained the data: was it your in-house data engineer, or was the data outsourced? Communication for data science is speaking to those who can give you more insight into your data than a simple Google search or a physical book: is it correct for a certain stock price to jump so suddenly, & what was the reason for the jump? Communication is key in every phase of a data science project: before it begins, at the beginning, in the middle, at the end, & even after it ends. Do not overlook communication as a data science skill.

Domain Expertise

theDevMasters considers this to be two to three years of experience in an industry, or enough time to know exactly what the data contains & what questions to ask about the data as soon as the industry is brought up. For example, without knowing the facts behind the financial industry, why would we check whether someone's credit score is a reasonable number for their credit history or their credit balance? Or, without experience with certain sections of the lungs, how would we know which area we should investigate first for potential tumors? Simply stated, data itself has a genre, & if you know how to read that genre, you know what to expect in the story. This skill is what truly makes a data scientist great in their chosen domain: the prior knowledge to understand all aspects of the problem before it arises.
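As a rough illustration of the credit score example, the sketch below turns domain knowledge into sanity checks on the data. Everything in it is assumed for the example: the `CreditRecord` fields, the FICO-style score range of 300 to 850, & the two illustrative thresholds. A real rule set would come from someone who knows the lender's data.

```python
from dataclasses import dataclass

# Assumption: FICO-style scores, which range from 300 to 850.
SCORE_MIN, SCORE_MAX = 300, 850
# Illustrative thresholds only; a domain expert would set the real ones.
HIGH_SCORE = 800
HIGH_UTILIZATION = 0.9


@dataclass
class CreditRecord:
    customer_id: str
    credit_score: int
    balance: float
    credit_limit: float


def find_suspect_records(records):
    """Return (customer_id, reason) pairs for values worth questioning."""
    issues = []
    for record in records:
        if not SCORE_MIN <= record.credit_score <= SCORE_MAX:
            issues.append((record.customer_id, "score outside expected range"))
            continue
        if record.credit_limit <= 0:
            issues.append((record.customer_id, "non-positive credit limit"))
            continue
        utilization = record.balance / record.credit_limit
        if record.credit_score >= HIGH_SCORE and utilization > HIGH_UTILIZATION:
            issues.append((record.customer_id, "high score with near-maxed balance"))
    return issues


records = [
    CreditRecord("A", 720, 1_200.0, 5_000.0),   # plausible
    CreditRecord("B", 9_999, 300.0, 2_000.0),   # likely a placeholder or entry error
    CreditRecord("C", 830, 9_800.0, 10_000.0),  # unusual combination, worth a question
]
for customer_id, reason in find_suspect_records(records):
    print(customer_id, reason)
```

None of these checks can be derived from the numbers alone. Someone without experience in the industry would have no reason to treat a score of 9,999 as anything other than a very good score.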


