What is Data Science?
Data science is an interdisciplinary field concerned with the processes and systems used to extract knowledge or insights from data in various forms, whether structured or unstructured. It continues the work of earlier data analysis fields such as statistics, data mining, and predictive analytics.
Data science uses data preparation, statistics, predictive modeling, and machine learning to investigate problems in domains such as agriculture, marketing optimization and analytics, fraud detection, risk management, and public policy. It emphasizes general methods, such as machine learning, that apply without change across multiple domains. Effective machine learning is, in turn, a product of good data science.
What Does A Data Scientist Do?
Data scientists use their data and analytical skills to find and interpret rich data sources; manage large amounts of data despite hardware, software, and bandwidth constraints; merge data sources; ensure consistency of datasets; create visualizations to aid in understanding data; build mathematical models using the data; and present and communicate their insights and findings. They are often expected to produce answers in days rather than months, to work by exploratory analysis and rapid iteration, and to present results with dashboards (displays of current values) rather than the papers and reports that statisticians normally produce.
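As an illustration of two of the tasks above, merging data sources and checking the merged dataset for consistency, here is a minimal sketch in plain Python. The `customers` and `orders` records, their field names, and the helper functions are hypothetical and exist only for this example; real projects typically use a data library and much larger inputs.

```python
from statistics import mean

# Hypothetical data sources: customer records and their orders.
customers = [
    {"customer_id": 1, "region": "north"},
    {"customer_id": 2, "region": "south"},
    {"customer_id": 3, "region": "north"},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 200.0},
    {"customer_id": 4, "amount": 50.0},  # no matching customer record
]


def merge_sources(customers, orders):
    """Join orders to customers on customer_id.

    Returns the merged rows plus any orders whose customer is unknown,
    so inconsistencies are reported instead of silently dropped.
    """
    by_id = {c["customer_id"]: c for c in customers}
    merged, orphans = [], []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        if customer is None:
            orphans.append(order)
        else:
            merged.append({**order, "region": customer["region"]})
    return merged, orphans


def average_by_region(rows):
    """Compute the average order amount for each region."""
    groups = {}
    for row in rows:
        groups.setdefault(row["region"], []).append(row["amount"])
    return {region: mean(amounts) for region, amounts in groups.items()}


merged, orphans = merge_sources(customers, orders)
summary = average_by_region(merged)
print("Average order amount by region:", summary)
print("Orders with no matching customer:", orphans)
```

Returning the unmatched orders separately is one possible design choice: it keeps the consistency problem visible so it can be investigated before any model is built on the merged data.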
What Skills Are Needed?
A data scientist should have a good background in math and statistics. A data scientist’s most basic, universal skill, however, is the ability to write code.
A data scientist must be able to tell stakeholders a story based on the results obtained from experimenting and testing. Those results come from code the data scientist wrote, which means data scientists build their own programming tools for experimenting and testing. In that sense they are genuinely scientists. They should also be extroverts, since they present their findings to others.
A data scientist needs to be able to code in Java, a C-family language (C, C++, C#), or a similar language. A person who can code in these languages can learn Python fairly quickly. Python is one of the most popular programming languages for data science because it has libraries supporting math, statistical analysis, and plotting. Mathematicians and analysts use it to prototype new functionality. R is a good programming language for graphing and plotting.
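To give a sense of how Python supports quick statistical prototyping, the sketch below uses only the standard library `statistics` module to summarize a small sample and fit a straight line by ordinary least squares. The data values and the `fit_line` helper are made up for illustration; in practice, third-party libraries are commonly used for this kind of work.

```python
from statistics import mean, median, stdev

# Hypothetical measurements: x is an input, y is the observed response.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Descriptive statistics for the response variable.
print("mean:", mean(y), "median:", median(y), "stdev:", stdev(y))

# A simple predictive model: fit the line, then predict a new value.
slope, intercept = fit_line(x, y)
prediction = slope * 6 + intercept
print("slope:", slope, "intercept:", intercept, "prediction at x=6:", prediction)
```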
Some of the best and brightest data scientists are PhDs in esoteric fields like ecology and systems biology. George Roumeliotis, the head of a data science team at Intuit in Silicon Valley, holds a doctorate in astrophysics. A little less surprisingly, many of the data scientists working in the business today were formally trained in computer science, math, or economics. They can emerge from any field that has a strong data and computational focus.
So the essential skills of a data scientist are:
1. The ability to write code in Java, a C-family language, or Python; R is an option too.
2. A degree in a scientific field, computer science, math, economics, or a related field.
3. Strong math and statistics skills.
4. A gregarious personality with good communication skills.