About Data Science

“Data science is the science of generating knowledge or insights from data to solve business or research problems”

Data science and conventional decision making

Data science is a data-driven approach to solving problems. Compare this with conventional decision making, which often relies to a great extent on human experience and instinct. This is not to say that we should ignore human experience in data science, but rather that we should combine it with mathematical models and data to train machines to find solutions.

Data Science as a profession

Data science as a profession has become increasingly important over the last decade. The term came into wide professional use around 2008, and the designation of data scientist came into use at about the same time.

We could therefore say that data science is a fairly new profession. However, let’s bear in mind that it combines skills and knowledge from the long-standing disciplines of computing, mathematics and social sciences. Data science as a profession has grown dramatically in a short period of time and has proved its tremendous usefulness across business and society in general.

Data Science versus Data Analytics

Before data science emerged, there were already statistical analysts, data analysts and business analysts, for instance. So what’s the difference between these and a data scientist?  

Essentially, data analytics is a subset of data science and data science is much broader in scope. Data analytics is more specific, where trends or patterns are observed based on pre-defined problem statements, and inferences are then drawn based on what has happened in the past.   

Data science, on the other hand, looks at finding solutions to entire problems. It is an end-to-end process and it may involve discovering a new algorithm or a new model. There is a lot of discovery in data science, and this is one thing that makes it substantially different from data analytics. We might come up with a new machine learning model, train it on data, and apply it to predict real-life outcomes in the future.

Where is Data Science used?

The first thing we should note is that data is everywhere. Basically, you can name an industry and there will be data. Also, there’s no doubt that the world’s most successful companies have been using data science to create competitive advantage.

Thanks to the internet, cloud computing and modern telecommunications, the problems of collecting and storing data on a massive scale have for the most part been solved. Hence the term, Big Data. 

The challenge now lies with organising it, analysing it and drawing inferences to find solutions, and this is where data science comes in. 

Banking and Finance

In terms of business functions, risk management is a particularly mature field. This could be financial risk, risk in the insurance domain, or risk related to fraudulent activities.

These are areas where data science is already used substantially. Common applications include identifying credit card fraud and predicting which customers are likely to default on a loan, so that risk is minimised. Applying data science here has been very successful in comparison with more traditional approaches to decision-making and, overall, data science has worked very well in finance globally. Most leading banks, insurance providers and other financial institutions have data science solutions in place. Fintech, that is, the application of technology to financial services, is a rapidly growing field and data science is at the heart of it.
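As a rough illustration of the loan default use case, the sketch below fits a very small logistic regression by gradient descent in plain Python. The training data, the two features (debt-to-income ratio and number of missed payments) and the function names are all invented for illustration. A real credit risk model would use far more data, more variables and a well-tested library.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_default_probability(weights, bias, x):
    """Return the model's estimated probability that an applicant defaults."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical, made-up history: [debt_to_income, missed_payments]
applicants = [
    [0.10, 0], [0.20, 0], [0.15, 1], [0.30, 0],
    [0.60, 3], [0.70, 2], [0.55, 4], [0.80, 5],
]
defaulted = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 means the customer defaulted

weights, bias = train_logistic(applicants, defaulted)
low_risk = predict_default_probability(weights, bias, [0.12, 0])
high_risk = predict_default_probability(weights, bias, [0.75, 4])
print(f"low-risk applicant:  {low_risk:.2f}")
print(f"high-risk applicant: {high_risk:.2f}")
```

The point is the workflow rather than the numbers: past outcomes are used to train a model, and the model then scores new applicants so that a decision maker can act before a loss occurs.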

Marketing

Another function where data science has been very successful is marketing. Areas including customer acquisition, identifying which customers are likely to leave, cross-selling and upselling all use data science to build models.

To give an example, cross-selling was traditionally very subjective. Someone would decide which customer segment to sell to, based on categories such as age group, gender or geographic location. You would then find that segment and cold call to sell your products.   

This traditional approach had limited success, and there are now much more complex models. Data on customer behaviour for the last, say, one or two years can now be captured easily.  

Data science then combines many such variables in a machine learning model, so that the right customer is targeted at the right time. As a result, this kind of model has increased the probability of success dramatically.

Taking this a little further, we now have something called a recommendation engine, which has become an integral part of marketing. It looks at the transaction behaviour of a specific customer and then builds a model that enables you to make them very targeted and personalised offers.
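To make the idea of a recommendation engine more concrete, here is a minimal sketch based on co-occurrence counting: products that are often held together by other customers are suggested to a customer who holds only one of them. The customers, products and function names are hypothetical, and production recommendation engines typically use much richer behavioural data and more sophisticated models.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of products is held by the same customer."""
    pair_counts = {}
    for basket in baskets.values():
        for a, b in combinations(sorted(basket), 2):
            pair_counts.setdefault(a, Counter())[b] += 1
            pair_counts.setdefault(b, Counter())[a] += 1
    return pair_counts


def recommend(customer, baskets, pair_counts, top_n=2):
    """Suggest products the customer lacks, ranked by co-occurrence score."""
    owned = baskets[customer]
    scores = Counter()
    for product in owned:
        for other, count in pair_counts.get(product, {}).items():
            if other not in owned:
                scores[other] += count
    return [product for product, _ in scores.most_common(top_n)]


# Hypothetical transaction history: customer -> products already held
purchases = {
    "alice": {"current account", "credit card"},
    "bob": {"current account", "credit card", "travel insurance"},
    "carol": {"current account", "travel insurance", "savings account"},
    "dave": {"credit card", "travel insurance"},
}

pair_counts = build_cooccurrence(purchases)
print(recommend("alice", purchases, pair_counts))
```

In this toy data, travel insurance is frequently held alongside both of Alice's existing products, so it is ranked first as a cross-selling offer for her.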

Human Resources

Human resources is another area where data science is being increasingly used. Traditionally, HR carried out activities such as candidate screening, job interviews, exit interviews and so on. Now there’s a focus on data to understand employee feedback and sentiment. HR is fast becoming data-driven, which wasn't the case even just 5 years ago.  

For example, HR can use data science to match the characteristics of a candidate with the characteristics of past employees to predict if they are likely to succeed in an organisation.

Sports

Then there is sports analytics. In the last few years there has been tremendous growth in this area. Most of the world’s top sports leagues, whether that’s football, baseball, cricket or rugby, for example, use data science. Data is available at the player level, the match level and the season level.    

Even in-match strategies are now developed using data science. Coaches and analysts sit with computers throughout a game and adapt their team’s strategy in real time with models and the data fed to them.  

Data science famously contributed to Germany’s victory at the 2014 football World Cup, when individual and team data were analysed and used to adapt the team’s training practices and match tactics, enabling a huge improvement in performance.

Education

Next there’s e-learning, another new area. For example, universities and schools are using learning analytics to understand when students might need early intervention or extra support, and to identify those at risk of dropping out or failing.

Pharmaceutical Industry

A very important and topical domain where data science is used is the pharmaceutical sector. Even though data science is a relatively new term, big data and statistical analysis have been used in the pharma industry since at least the 1990s.

Biostatistics has been an integral part of clinical and pharma research, and because it is governed by regulators it is a mandatory part of the research process. Every clinical trial report must have a lot of biostatistics, hypothesis testing and statistical models in it.    

Data science was critical in the research for Covid vaccines, where there has been a huge amount of statistical analysis. Even the sample size of patients and volunteers used for research is determined using data science.
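As an illustration of how a sample size can be worked out, the sketch below uses a standard textbook approximation for comparing two proportions, for example the infection rate in a placebo group against the rate in a vaccinated group. The rates, the significance level and the power used here are made-up assumptions, and real clinical trials use more specialised designs agreed with regulators.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect a difference
    between two proportions with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # value for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical rates: 10% of the control group affected versus 5% of the treated group
n = sample_size_per_group(0.10, 0.05)
print(f"participants needed per group: {n}")
```

The smaller the difference you want to detect, the more participants are needed, which is why this calculation is done before a study begins.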

Skills and Knowledge needed by Data Scientists

Data science is a very exciting field to be involved in, so what skills and knowledge are actually required to be a data scientist?

Maths and Statistics

The number one skill is definitely statistics and mathematics. There is no substitute for this. You don't need to have done theoretical statistics or mathematical studies, but you at least need to learn the basic concepts and practical aspects of these two subjects. 

Data science isn’t just coding or programming, and if you don't have maths or statistical knowledge, you won't know which algorithms to use in a given situation.

Programming

The number two skill is programming. This is required because of the volume and variety of data we deal with, and it is very difficult to develop models and find solutions without programming skills. 

The R and Python programming languages are the most commonly used, and people often ask which one they should choose to learn. R tends to be used when a project uses a lot of statistics, and Python is preferred when a lot of data management is needed.   

Generally speaking, however, a well-rounded data scientist will be competent in both languages and use each as appropriate.

Business/Domain Knowledge

Thirdly, you will need domain or business knowledge, because you need to structure your data science solutions while bearing in mind the specifics of a particular domain.  

What factors do we need to consider when we're dealing with the banking sector as opposed to the human resources function, for instance? Obviously, the features or factors are going to be different, and initially a machine would not understand that.   

So you need to train a machine using human domain knowledge, so that from then on the machine takes the specific domain factors into account while it learns from the data given to it.

Communication Skills

Finally, there are communication skills. This is the ability to present findings to decision makers. You may be using very advanced data science, but you have to explain it in a very simple manner to the end user. Otherwise, a model might not get implemented. The skills of presenting data visually and explaining the research and findings through clear written and verbal communication are therefore hugely important.

There are many examples where good but complicated models have not been implemented because their usefulness to the organisation was not communicated effectively.