About Data Science
“Data science is the science of generating knowledge or insights from data to solve business or research problems”
Data Science versus Data Analytics
Where is Data Science used?
Banking and Finance
In terms of business functions, risk management is one of the most mature areas, whether that is financial risk, insurance risk or risk related to fraudulent activities.
These are areas where data science is already used substantially. Common applications include detecting credit card fraud and predicting which customers are likely to default on a loan, so that risk can be minimised. Applying data science here has been very successful compared with more traditional approaches to decision-making, and overall it has worked very well in finance globally. Most leading banks, insurance providers and other financial institutions have data science solutions in place. Fintech, the application of technology to financial services, is a rapidly growing field, and data science is at its heart.
Marketing
Another function where data science has been very successful is marketing. Customer acquisition, identifying which customers are likely to leave (churn), cross-selling and upselling all rely on data science models.
To give an example, cross-selling was traditionally very subjective. Someone would decide which customer segment to sell to, based on categories such as age group, gender or geographic location. You would then find that segment and cold call to sell your products.
This traditional approach had limited success, and it has largely given way to much more sophisticated models, because data on customer behaviour over the last one or two years can now be captured easily.
Data science then combines many such behavioural variables in a machine learning model, so that the right customer is targeted at the right time. As a result, this kind of model can increase the probability of a successful sale dramatically.
Taking this a little further, we now have recommendation engines, which have become an integral part of marketing. A recommendation engine looks at the transaction behaviour of a specific customer and builds a model that enables you to make that customer very targeted, personalised offers.
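As a rough illustration of the idea, the sketch below uses simple item co-occurrence: products that other customers often hold alongside the ones a given customer already owns are suggested to that customer. The customer names, product names and the functions `build_cooccurrence` and `recommend` are hypothetical, and real recommendation engines typically work with far more data and more sophisticated models; this is a minimal sketch, not a production method.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase histories: customer -> set of products held.
purchases = {
    "alice": {"current_account", "credit_card", "travel_insurance"},
    "bob": {"current_account", "credit_card", "savings_account"},
    "carol": {"current_account", "savings_account"},
    "dave": {"credit_card", "travel_insurance"},
}

def build_cooccurrence(histories):
    """Count how often each pair of products is held by the same customer."""
    co = defaultdict(Counter)
    for items in histories.values():
        for a, b in combinations(sorted(items), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(customer, histories, top_n=2):
    """Score products the customer does not yet own by how often they
    co-occur with products the customer already owns."""
    co = build_cooccurrence(histories)
    owned = histories[customer]
    scores = Counter()
    for item in owned:
        for other, count in co[item].items():
            if other not in owned:
                scores[other] += count
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

print(recommend("carol", purchases))  # ['credit_card', 'travel_insurance']
```

For `carol`, who holds a current account and a savings account, the credit card ranks first because, in this toy data, it co-occurs most often with the products she already has.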
Human Resources
Human resources is another area where data science is increasingly used. Traditionally, HR carried out activities such as candidate screening, job interviews and exit interviews. Now there is a focus on using data to understand employee feedback and sentiment. HR is fast becoming data-driven, which was not the case even five years ago.
For example, HR can use data science to match the characteristics of a candidate with the characteristics of past employees to predict if they are likely to succeed in an organisation.
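One simple way to express "matching a candidate with past employees" is a k-nearest-neighbour vote: find the past employees whose profiles are closest to the candidate's and check whether most of them succeeded. The sketch below assumes hypothetical features (experience, test score, interview score) already scaled to the 0 to 1 range, and a hypothetical function `predict_success`; it illustrates the matching idea only and is not a recommended hiring model.

```python
import math

# Hypothetical past employees: (feature vector, succeeded in the organisation?)
# Features are assumed pre-scaled to 0..1: [experience, test_score, interview_score]
past_employees = [
    ([0.9, 0.8, 0.7], True),
    ([0.8, 0.9, 0.8], True),
    ([0.2, 0.3, 0.4], False),
    ([0.3, 0.2, 0.5], False),
    ([0.7, 0.6, 0.9], True),
]

def predict_success(candidate, history, k=3):
    """Majority vote among the k past employees most similar to the candidate
    (similarity measured as Euclidean distance between feature vectors)."""
    by_distance = sorted(history, key=lambda record: math.dist(candidate, record[0]))
    votes = [succeeded for _, succeeded in by_distance[:k]]
    return sum(votes) > k / 2

print(predict_success([0.85, 0.85, 0.75], past_employees))  # True
```

Choosing an odd `k` avoids tied votes; in practice the features, the distance measure and `k` would all need careful validation, and checking such a model for bias is essential.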
Sports
Then there is sports analytics, which has grown tremendously in the last few years. Most of the world’s top sports leagues, whether in football, baseball, cricket or rugby, use data science. Data is available at the player level, the match level and the season level.
Even in-match strategies are now developed using data science. Coaches and analysts sit at computers throughout a game and adapt their team’s strategy in real time, using models fed with live match data.
Data science famously contributed to Germany’s victory at the 2014 football World Cup: individual and team data were analysed and used to adapt the team’s training practices and match tactics, enabling a huge improvement in performance.
Education
Next there is e-learning, another relatively new area. For example, universities and schools are using learning analytics to identify students who might need early intervention or extra support, and to flag those likely to drop out or fail.
Pharmaceutical Industry
A very important and topical domain where data science is used is the pharmaceutical sector. Even though data science is a relatively new term, big data and statistical analysis have been used in the pharma industry since at least the 1990s.
Biostatistics has long been an integral part of clinical and pharma research, and because this research is governed by regulators, biostatistics is a mandatory part of the process. Every clinical trial report must include extensive biostatistical analysis, hypothesis testing and statistical modelling.
Data science was critical in the research into Covid-19 vaccines, which involved a huge amount of statistical analysis. Even the number of patients and volunteers needed for a study (the sample size) is determined using data science.
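A sample size calculation usually starts from the size of the effect you want to detect, the significance level and the desired statistical power. As a simplified illustration, and explicitly not the design used in any particular Covid-19 vaccine trial (those are typically more complex), the sketch below applies the standard normal-approximation formula for comparing two proportions. The function name `sample_size_per_group` and the example rates are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Participants needed in EACH of two groups to detect a difference between
    proportions p1 and p2 with a two-sided test (normal approximation):

        n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)                 # round up: you cannot recruit part of a person

# Hypothetical example: outcome rate of 50% in one group versus 40% in the other
print(sample_size_per_group(0.5, 0.4))  # 385
```

Because the difference appears squared in the denominator, halving the effect you want to detect roughly quadruples the number of participants required (a 20-point difference needs 91 per group under the same settings).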
Skills and Knowledge needed by Data Scientists
Maths and Statistics
The number one skill is definitely statistics and mathematics. There is no substitute for this. You don't need to have formally studied theoretical statistics or mathematics, but you do need to learn the basic concepts and practical aspects of both subjects.
Data science isn’t just coding or programming, and if you don't have maths or statistical knowledge, you won't know which algorithms to use in a given situation.
Programming
The number two skill is programming. It is required because of the volume and variety of data we deal with; without programming skills, it is very difficult to develop models and find solutions.
The R and Python programming languages are the most commonly used, and people often ask which one they should choose to learn. R tends to be used when a project uses a lot of statistics, and Python is preferred when a lot of data management is needed.
Generally speaking, however, a well-rounded data scientist will be competent in both languages and use each as appropriate.
Business/Domain Knowledge
Thirdly, you will need domain or business knowledge, because data science solutions have to be structured around the specifics of a particular domain.
What factors do we need to consider when we're dealing with the banking sector as opposed to the human resources function, for instance? Obviously, the features or factors are going to be different, and initially a machine would not understand that.
So human domain knowledge has to guide how the machine is trained, for example in deciding which features to include, so that it takes the relevant domain factors into account as it learns from the data given to it.
Communication Skills
Finally, there are communication skills: the ability to present findings to decision makers. You may be using very advanced data science, but you have to explain it very simply to the end user; otherwise, a model might not get implemented. The skills of presenting data visually and explaining the research and findings through clear written and verbal communication are therefore hugely important.
There are many examples where good but complicated models have not been implemented because their value to the organisation was not communicated effectively.