Introduction to Data Analytics

Learning Objectives

After this unit, students should be able to

  • describe the concept of knowledge discovery.
  • describe the DIKW data model.
  • identify an analytics problem.
  • identify the type of an analytics problem.

Data analytics is the use of data, statistical analyses, quantitative methods, and mathematical models to help administrators gain improved insights and make better decisions [1]. Business studies tend to emphasise domain knowledge when analysing data. Often, this domain knowledge is not intrinsic to the data and requires moderation by domain experts. Computer scientists tend to favour the more parsimonious approach of Knowledge Discovery, which is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data [2].

Both approaches share a common goal: uncovering hidden patterns in data to facilitate improved decision-making. This process mirrors the progression of an analyst from the base of the DIKW pyramid (shown in the following figure [3]) to its apex.

  • Data is a collection of raw, unprocessed evidence.
  • Information is organised, refined data put into the context of the desired use case.
  • Knowledge is the result of the knowledge discovery process.
  • Wisdom is the ability to make data-driven decisions.

[Figure: the DIKW pyramid]

Example. Imagine a scenario where a grocery store engages us to offer data-driven assistance for decision-making. We begin by examining the database of customer receipts from the past six months. We extract essential information from the receipts, such as the product names. Employing techniques like frequent pattern mining, we identify items that are frequently purchased together. Guided by these insights, the store can then make informed decisions regarding product placement or bundled promotions.
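The idea of finding items that are frequently purchased together can be illustrated with a minimal sketch. The receipts below are made up for illustration, and the function name `frequent_pairs` and the `min_support` threshold are assumptions of this sketch, not part of the course material. It counts only pairs of items, which is the simplest instance of frequent pattern mining; it is not a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical receipts: each receipt is the set of product names on it.
receipts = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
    {"milk", "bread"},
]


def frequent_pairs(receipts, min_support):
    """Return the item pairs that appear together on at least min_support receipts."""
    counts = Counter()
    for items in receipts:
        # Sort the items so that each pair is counted under one canonical ordering.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(receipts, min_support=3)
print(pairs)
```

With these made-up receipts, bread and butter are the only pair that reaches the threshold, which is the kind of insight the store could use for product placement or a bundled promotion.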

Mathematical models [4] help us climb the pyramid from data to wisdom. Every step can be tied to a building block in the analytics pipeline presented in the Overview. For instance, the step from data to information is part of data preprocessing, whereas the step from information to knowledge is made possible by the use of mathematical models. For the majority of the course, we stop at this level of the pyramid. In a business setting, we often want to maximise profit or minimise operating costs based on the knowledge discovered from the data. Thus, the step from knowledge to wisdom requires constraints that may not be intrinsic to the data.

Danger

Knowledge discovery without sound mathematical analysis can go wrong and produce absurd results. For instance:

  • Inferring an impact of ice cream sales at a beach on shark attacks (or sunburns) from datasets in which the two merely rise and fall together.
  • Finding patterns that do not generalise to unseen data.
  • Finding patterns in a dirty dataset.

We will learn in later lessons how each of these hazards affects the analysis.
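The first hazard can be made concrete with a small synthetic sketch. All numbers below are invented, and the assumption that both quantities depend linearly on temperature is made purely for illustration. The Pearson correlation is computed by hand so that the example needs only the standard library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical daily temperatures at the beach (degrees Celsius).
temperature = [18, 21, 24, 27, 30, 33]

# Assumption: both quantities are driven by temperature, not by each other.
ice_cream_sales = [20 + 5 * t for t in temperature]
shark_attacks = [0.1 * t - 1 for t in temperature]

r = pearson(ice_cream_sales, shark_attacks)
print(f"correlation between ice cream sales and shark attacks: {r:.3f}")
```

By construction the correlation is very close to 1, even though neither quantity causes the other. An analysis that ignores the shared driver (temperature) would draw an absurd conclusion from this pattern.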

Which problems are not analytics problems?

Not all data-driven problems can be classified as analytics problems. Consider the following problems:

  • Searching for the contact number of a person in a catalogue.
  • Finding the flight information for SQ26.
  • Computing the income tax for a specified income.

The knowledge in each of the listed problems can be obtained either by a lookup in the appropriate database or by the application of established rules. Although these are interesting problems, we do not consider them analytics problems.
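A short sketch shows why such problems involve no discovery: the answer is either already stored or follows mechanically from fixed rules. The directory entries and the tax brackets below are hypothetical and do not correspond to any real directory or tax system.

```python
# Hypothetical contact catalogue: the answer is obtained by a plain lookup.
directory = {"Alice Tan": "555-0101", "Bob Lee": "555-0102"}


def contact_number(name):
    """Return the stored contact number, or None if the name is not listed."""
    return directory.get(name)


# Hypothetical progressive tax brackets as (upper limit, marginal rate).
BRACKETS = [(20_000, 0.00), (40_000, 0.05), (float("inf"), 0.10)]


def income_tax(income):
    """Apply the fixed bracket rules; nothing is learned from data."""
    tax = 0.0
    lower = 0
    for upper, rate in BRACKETS:
        if income > lower:
            # Tax only the part of the income that falls inside this bracket.
            tax += (min(income, upper) - lower) * rate
        lower = upper
    return tax


print(contact_number("Alice Tan"))
print(income_tax(50_000))
```

Neither function uncovers a previously unknown pattern: the first reads a stored value and the second evaluates a rule that was known in advance.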

Types of Analytics

Business analytics solutions, i.e. data analytics applied to business problems, typically fall under one of four categories. They are as follows:

Descriptive Analytics

Descriptive analytics elucidates past occurrences, providing businesses with insights into their historical data through scientific analysis. This primarily involves generating reports, visualizations, and interactive dashboards to comprehend their standing. While descriptive analytics clarifies what transpired in the past, it doesn't delve into the reasons why those events occurred.
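As a minimal sketch of such a report, the snippet below summarises how sales are distributed across items. The sales records and the function name `sales_distribution` are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical sales records as (item, quantity sold).
sales = [("bread", 3), ("milk", 2), ("bread", 1), ("cereal", 2), ("milk", 2)]


def sales_distribution(sales):
    """Return each item's share of the total quantity sold."""
    totals = Counter()
    for item, quantity in sales:
        totals[item] += quantity
    grand_total = sum(totals.values())
    return {item: quantity / grand_total for item, quantity in totals.items()}


distribution = sales_distribution(sales)
for item, share in sorted(distribution.items()):
    print(f"{item:<8} {share:.0%}")
```

The report states what happened (which items sold and in what proportion) but says nothing about why.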

Diagnostic Analytics

Diagnostic analytics, also referred to as root cause analysis, delves into the reasons behind past events. Often necessitating the expertise of domain specialists, diagnostic analytics assists businesses in refining strategies and steering clear of past errors. However, its major limitation lies in scalability. Conducting detailed analyses on large-scale data poses a significant challenge.

Predictive Analytics

Predictive analytics utilizes historical data alongside machine learning models to forecast future results. Effective implementation demands a thorough comprehension of the prediction task in order to employ suitable statistical and machine learning techniques. The accuracy of predictions hinges significantly on the quality of the historical data. However, a notable drawback is that the analysis is restricted to the available data and often disregards human, socio-economic, and environmental influences.
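A deliberately simple sketch of forecasting from historical data is a least-squares trend line. The monthly sales figures are invented, and a straight line stands in for the statistical or machine learning model; a real prediction task would call for a model chosen to suit the problem.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: sales in months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
monthly_sales = [100, 110, 125, 130, 145, 150]

slope, intercept = fit_line(months, monthly_sales)

# Forecast for month 7 by extrapolating the fitted trend.
forecast = slope * 7 + intercept
print(f"forecast for month 7: {forecast:.1f}")
```

The forecast relies entirely on the six recorded values. Anything not captured in them, such as a holiday or a change in prices, is invisible to the model, which is the drawback noted above.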

Prescriptive Analytics

Prescriptive analytics gathers insights from the other three kinds of analytics and turns them into actionable recommendations. It is a complex process that involves multiple variables and constraints in order to suggest an appropriate course of action. While akin to diagnostic analytics in answering why, prescriptive analytics not only leverages past data but also integrates results from predictive analytics and domain expertise.

An example from retail analytics that covers all four types is given as follows:

  • Descriptive Analytics. What is the distribution of sales of the various items in a store?
  • Diagnostic Analytics. Why were the sales of ice cream low in December?
  • Predictive Analytics. Using a customer's purchasing history to make personalised recommendations.
  • Prescriptive Analytics. Discovering the features needed to maximise profit through personalised recommendations.

In this course, our focus will be on descriptive and predictive analytics.


  1. Evans, James R. "Business Analytics: Methods, Models, and Decisions."

  2. Frawley, William J., Gregory Piatetsky-Shapiro, and Christopher J. Matheus. "Knowledge discovery in databases: An overview." AI Magazine 13.3 (1992): 57-70.

  3. The figure is taken from the lecture slides of CS5228. 

  4. In this course, we use the umbrella term mathematical models to refer to statistical analyses, machine learning models as well as optimisation models.