Towards personalised medicine: subtyping patients using their genomic data

17 December 2020

Vaibhav Rajan

Assistant Professor

Information Systems and Analytics

Most pundits gazing into the crystal ball will likely shout two words in their prediction of healthcare’s future: precision medicine. There is growing recognition that tailoring treatments to an individual’s lifestyle, genes, and environmental factors can yield much improved outcomes.

The majority of therapy options today adopt a one-size-fits-all approach, but this is hardly optimal given the vast differences that exist from patient to patient, even among those with the same disease. These differences can significantly impact how the disease develops and progresses in each patient, as well as how responsive they are to various drugs. Measuring these factors and using them to slot patients into different sub-populations can have a significant impact on their treatment outcomes.

“When we have a particular person’s data, such as their molecular or genetic information, we may be able to see what subtype or group he or she belongs to. Then we can potentially make things more personalised,” says Vaibhav Rajan, an assistant professor from NUS Computing, who studies healthcare informatics. Doctors can use this information to determine treatment strategies. “They may know that for this group, these drugs work well and for a different group, another set of drugs work better,” he says.

Today’s breast cancer treatments are perhaps the closest we have come to realising the goal of precision medicine. There are four to five well-accepted breast cancer subtypes. Patients have their genes analysed to determine which subtype they have and doctors devise a treatment strategy based on the results. A woman who has triple-negative breast cancer, for example, is likely to benefit from chemotherapy. In contrast, a woman with luminal A breast cancer, a different subtype, is likely to be put on hormone therapy in addition to chemotherapy.

Clustering with copulas

Determining the subgroups of a particular disease typically begins with studying the biomedical data of patients. The discovered groups are called subtypes. Patient subtyping is especially useful when it comes to tackling diseases like cancer. Not only does a multitude of factors affect how and where a tumour grows, but cancer itself is ever evolving. “It’s not a static thing,” says Rajan. “The cancer tissue can also have different clusters that can evolve as time goes by.”

To carry out the kind of patient subtyping Rajan describes, researchers use a technique called clustering. This involves the use of computer algorithms to statistically analyse large amounts of biomedical data and identify patterns within them — in other words, clustering patients into subtypes based on the characteristics they share.

There are many different kinds of clustering algorithms. Model-based clustering algorithms assume an underlying statistical model (e.g. Gaussian mixture distribution) for the data and then attempt to infer model parameters such as the distribution mean from the data. These algorithms are commonly used because of the interpretability they offer. For instance, after fitting the model, one can “see” how different clinical variables are correlated within each cluster. This allows us to understand the subtypes in greater detail and characterise them based on the statistics of the clinical variables.
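As a rough illustration of model-based clustering, the sketch below fits a two-component Gaussian mixture with a plain expectation-maximisation loop and then reads off each cluster's mean and within-cluster correlation. The data are synthetic, and the function name `fit_gmm`, the two made-up variables, and the initialisation scheme are assumptions chosen for this example; it is not the algorithm described in the paper.

```python
import numpy as np

def fit_gmm(X, k, n_iter=50):
    """Fit a k-component, full-covariance Gaussian mixture to X with plain EM."""
    n, d = X.shape
    # Farthest-first initialisation (an assumption of this sketch): start from
    # the first point, then repeatedly add the point farthest from chosen means.
    means = [X[0]]
    for _ in range(k - 1):
        dist = np.min([np.sum((X - m) ** 2, axis=1) for m in means], axis=0)
        means.append(X[np.argmax(dist)])
    means = np.array(means)
    covs = np.array([np.eye(d) * X.var(axis=0).mean() for _ in range(k)])
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        log_p = np.empty((n, k))
        for j in range(k):
            diff = X - means[j]
            maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(covs[j]), diff)
            _, logdet = np.linalg.slogdet(covs[j])
            log_p[:, j] = np.log(weights[j]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate mixing weights, means and covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return weights, means, covs, resp.argmax(axis=1)

def correlation(cov):
    """Correlation between the two variables implied by a 2x2 covariance."""
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# Synthetic "patients": two groups whose variables are correlated differently.
rng = np.random.default_rng(0)
group_a = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=300)
group_b = rng.multivariate_normal([8, 8], [[1.0, -0.6], [-0.6, 1.0]], size=300)
X = np.vstack([group_a, group_b])
true_labels = np.repeat([0, 1], 300)

weights, means, covs, labels = fit_gmm(X, k=2)
for j in range(2):
    print(f"cluster {j}: mean={means[j].round(2)}, corr={correlation(covs[j]):.2f}")
```

The fitted covariance of each component is what makes the clusters interpretable: one can inspect how the variables move together inside each discovered group, not just which group a patient falls into.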

However, most standard model-based clustering algorithms make the simplifying assumption that the variables in question must have the same type of data distribution — in other words, that they must all be Gaussian, exponential, and so on. “But this may restrict their modeling flexibility and deteriorates their clustering performance,” says Rajan.

To overcome this limitation, Rajan and his group have been exploring statistical tools called copulas. The word copula means “tie” or “link” in Latin, and copulas are used to describe the dependence between random variables. Using copulas allows greater flexibility in modeling the data because it enables distinct assumptions to be made about the distributions of different clinical variables.
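To make the idea of separating dependence from the individual distributions concrete, here is a minimal sketch of a Gaussian copula joining two variables with different marginals. The variable names `biomarker` and `age`, the exponential and uniform marginals, and the correlation of 0.7 are illustrative assumptions, not values from the study.

```python
import numpy as np
from statistics import NormalDist

def ranks(x):
    """Rank of each element (0 = smallest); assumes continuous data without ties."""
    return np.argsort(np.argsort(x))

def spearman(a, b):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    return np.corrcoef(ranks(a), ranks(b))[0, 1]

rng = np.random.default_rng(0)
rho = 0.7  # correlation of the underlying Gaussian (assumed for illustration)

# Step 1: draw correlated standard normal pairs.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=5000)

# Step 2: push each column through the normal CDF to get Uniform(0, 1) values.
# The joint distribution of these uniforms is the Gaussian copula.
u = np.vectorize(NormalDist().cdf)(z)

# Step 3: apply a different inverse CDF to each column.
biomarker = -np.log1p(-u[:, 0]) * 2.0  # exponential marginal with mean 2
age = 20.0 + 60.0 * u[:, 1]            # uniform marginal on [20, 80]

print("rank correlation of the latent normals:", round(spearman(z[:, 0], z[:, 1]), 3))
print("rank correlation after changing marginals:", round(spearman(biomarker, age), 3))
```

Because the marginal transforms are monotone, the rank correlation is unchanged by them: the dependence structure lives in the copula, while each variable is free to follow its own distribution.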

In addition, copulas enable us to model complex, non-linear correlations in the data that many simpler models cannot capture. For instance, consider two variables that are highly correlated at their lower values but uncorrelated at their higher values. Copulas can be effectively used in such cases to model the strength and type of correlations. Thus, copulas are highly flexible statistical tools that are useful for modeling complex clinical and genomic data.
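The kind of asymmetric dependence described above can be sketched with a Clayton copula, a standard copula family whose dependence is concentrated in the lower tail. This choice is an assumption made purely for illustration: the article does not mention the Clayton family, and HD-GMCM is built on Gaussian mixture copulas. The parameter `theta=4.0` and the 10% tail threshold are likewise arbitrary.

```python
import numpy as np

def sample_clayton(n, theta, rng):
    """Draw n pairs (u, v) from a Clayton copula using the conditional method."""
    u = 1.0 - rng.random(n)  # uniform on (0, 1], avoids division by zero
    w = 1.0 - rng.random(n)
    # Invert the conditional distribution of v given u.
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    return u, v

rng = np.random.default_rng(0)
u, v = sample_clayton(20000, theta=4.0, rng=rng)

q = 0.1
# How often is v in its lowest 10% when u is in its lowest 10%?
lower = np.mean(v[u < q] < q)
# How often is v in its highest 10% when u is in its highest 10%?
upper = np.mean(v[u > 1 - q] > 1 - q)

print(f"P(v low  | u low)  = {lower:.2f}")
print(f"P(v high | u high) = {upper:.2f}")
```

Under independence both probabilities would be close to 0.1. In this sketch the two variables move together far more strongly at low values than at high values, which a single correlation coefficient would not reveal.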

However, copula-based mixture models could not be used with modern high-dimensional data, which can contain up to thousands of clinical variables. This is due to technical limitations that adversely affect their accuracy and scalability.

To overcome this problem, Rajan, together with his PhD student Siva Rajesh Kasa and collaborator Sakyajit Bhattacharya from the TCS Innovations Labs in India, developed a new inference algorithm for a copula-based clustering model that can be used effectively to find subtypes in high-dimensional clinical data. In 2019, they published a paper in the journal Bioinformatics announcing their new technique: HD-GMCM, or high-dimensional Gaussian mixture copula model.

“We have a specific way of finding intrinsic patterns in the data and we are able to do it at high dimensions, which is important for a lot of biomedical datasets, especially in cancer,” says Rajan. “Our model is set up in such a way that it reduces the number of parameters to be inferred, without adversely affecting the clustering accuracy.”

“To our knowledge, nobody has ever used copulas for patient subtyping before,” he says.

Helping real patients

To test how robust HD-GMCM is, Rajan and the team applied it to a number of real gene-expression datasets, as well as in a simulation study. They also used it in a case study to group lung cancer patients into subtypes, and examined the survival rates of those in each cluster, along with the pair-wise correlations among survival rate, smoking and age.

In all instances, the new method outperformed state-of-the-art clustering methods. Not only did it lead to better clustering, but it also identified potential subtypes that were clinically meaningful. “For the case study, there was a statistically significant difference in the survival probability across the two clusters that the algorithm discovered,” says Rajan. “This implies that the clusters HD-GMCM found has potential clinical significance.”

His team is now working to improve the algorithm further. They are also working closely with oncologists in Singapore to develop more personalised treatment strategies.

“What I would like to have, and I’m trying to set it up in my group, is the entire research pipeline,” says Rajan. “From the development of new models and algorithms to innovative, practically useful healthcare applications, and finally, actual implementations that may be used in hospitals.”

Paper: Gaussian mixture copulas for high-dimensional clustering and dependency-based subtyping