Lost in masses of clinical data? Help is here

28 January 2020


Vaibhav Rajan

Assistant Professor

Information Systems and Analytics


The intensive care unit where Dr. Jean-Daniel Chiche works in Paris is what you would expect from an ICU. Amidst an atmosphere of respectful quiet and hushed tones lie patients in isolated rooms, often tethered to a bewildering array of tubes, wires, monitors and machines.

Alongside a team of doctors and nurses, the blinking, sometimes beeping, apparatus monitor critically ill patients round the clock. Every day, they gather between 7,000 and 8,000 data points from each patient, said Dr. Chiche at a health conference in June.

ICU data (radiology scans, nursing notes, medication orders, lab investigations and other measurements) form a treasure trove of information. Physicians can use this information to predict possible complications, improve patient outcomes and deliver better care, while researchers can use it to gain a deeper understanding of diseases and develop new treatments.

But the mass of information gathered can be overwhelming. “The central question is how do we take all this data and learn a clinically meaningful and succinct computational representation of the patient, which can then be used for exploratory or predictive analytics?” asks Vaibhav Rajan, an assistant professor from the NUS School of Computing who studies machine learning and algorithm design for healthcare applications.

Rajan runs the Clinical Data Analytics Lab at NUS, where he and his students are working to take disparate sources of data — genomic, physiological and social — and use these to model health information of individuals. But using such heterogeneous data is fraught with obstacles.

The first challenge is alluded to in the term itself. Heterogeneous data can involve numerous sources that come in varying formats. An ICU patient’s vital signs may be measured continuously or periodically in the form of numbers and waveforms. Nurses scribble on charts, taking note of a patient’s appearance and condition (e.g. How alert is he? How much did he eat for lunch? How much pain is he in?). Then there are lab tests, radiology images, drug prescriptions and so on.

“Each of these data sources is telling us something about the patient, but there may be correlated information as well as errors of various kinds during data processing,” says Rajan. For example, an inflammation visible in a CT scan might be related to elevated counts of a certain chemical as revealed in the blood test. Clinical notes may be full of inconsistently used medical abbreviations that can be difficult to process automatically and may introduce errors while learning patient representations.
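Abbreviation handling is one concrete step in preparing notes for automatic processing. The Python sketch below is only an illustration under stated assumptions, not the lab's pipeline: the lookup tables are hypothetical and tiny, and the function simply expands unambiguous abbreviations while flagging ones such as “MS”, which can stand for multiple sclerosis, mitral stenosis or morphine sulfate, rather than guessing.

```python
import re

# Hypothetical lookup table for illustration only; a real system would rely on a
# curated clinical abbreviation resource and on context to pick the right sense.
ABBREVIATIONS = {
    "pt": "patient",
    "sob": "shortness of breath",
    "htn": "hypertension",
    "bp": "blood pressure",
}

# Abbreviations with several plausible expansions are flagged, not expanded.
AMBIGUOUS = {
    "ms": ["multiple sclerosis", "mitral stenosis", "morphine sulfate"],
}

TOKEN = re.compile(r"[A-Za-z]+")  # whole alphabetic runs, so "pt" never matches inside "ptosis"


def normalise_note(note: str) -> tuple[str, list[str]]:
    """Expand known abbreviations and collect ambiguous ones for human review."""
    flagged: list[str] = []

    def replace(match: re.Match) -> str:
        word = match.group(0)
        key = word.lower()
        if key in ABBREVIATIONS:
            return ABBREVIATIONS[key]
        if key in AMBIGUOUS:
            flagged.append(word)
        return word  # unknown or ambiguous tokens are left untouched

    return TOKEN.sub(replace, note), flagged


text, flagged = normalise_note("Pt c/o SOB, hx of HTN and MS")
print(text)     # patient c/o shortness of breath, hx of hypertension and MS
print(flagged)  # ['MS']
```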

“If we take all these different data sources and naively combine them, then the information about each patient may be too much for predictive algorithms,” he says. This is especially true for image and text data, such as MRI scans and clinical notes, which contribute a very large number of features per patient. It’s a problem researchers call “the curse of dimensionality”.

“Moreover, without careful processing and removal of errors, subsequent analysis may give misleading results. So, we have to be able to combine them in a smart way so that errors and correlations are suitably recognised and handled to obtain the concise representations with which effective predictive models can be developed,” he says.
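One symptom of the curse of dimensionality can be seen with random numbers alone: as the number of features grows, the distances from a point to all other points become nearly equal, so a “nearest” neighbour is barely nearer than the farthest one. This minimal sketch uses uniformly random synthetic points, not clinical data, and measures the relative contrast (max - min) / min of those distances.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(max - min) / min of distances from one random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

The printed contrast shrinks as the dimension rises, which is one reason concise, low-dimensional representations make life easier for downstream predictive models.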

Integrating multiple sources of clinical data: predicting ICU complications

Rajan’s previous projects have demonstrated the value of such integration for predicting unforeseen ICU complications.

For example, his team invented a novel binary classification method that uses multiple vital signs to identify ICU patients at risk of an Acute Hypotensive Episode (AHE), in which blood pressure drops suddenly and stays low for a sustained period of time. When tested on data from more than 4,500 patients, the method outperformed existing ones, identifying patients at risk of AHE two hours before onset with a specificity of 95 percent and a sensitivity of close to 80 percent.
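Sensitivity is the share of at-risk patients the classifier catches; specificity is the share of stable patients it correctly leaves alone, which in an ICU governs how many false alarms staff must handle. The sketch below shows the arithmetic from a confusion matrix; the counts in the example are invented for illustration and are not from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # at-risk patients correctly flagged
    fn: int  # at-risk patients missed
    tn: int  # stable patients correctly not flagged
    fp: int  # stable patients flagged (false alarms)

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def confusion_counts(y_true: list[bool], y_pred: list[bool]) -> ConfusionCounts:
    """Tally outcomes, where True means 'will experience an AHE'."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t and p),
        fn=sum(1 for t, p in pairs if t and not p),
        tn=sum(1 for t, p in pairs if not t and not p),
        fp=sum(1 for t, p in pairs if not t and p),
    )


# Hypothetical counts: 100 patients who go on to have an AHE, 1,000 who do not.
example = ConfusionCounts(tp=80, fn=20, tn=950, fp=50)
print(example.sensitivity, example.specificity)  # 0.8 0.95
```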

In another study, his team developed techniques to effectively preprocess clinical notes and combine them with other numerical clinical information. When applied to more than 700 patient records, this method extracted discriminative information from the notes, allowing physicians to identify patients at risk of postoperative acute respiratory failure as much as days in advance, with an overall accuracy of more than 80 percent.
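A generic baseline for combining notes with numbers turns each note into a bag-of-words TF-IDF vector and appends standardised numeric measurements. This is a hypothetical illustration, not the team's published preprocessing; the function names and sample values are made up.

```python
import math
from collections import Counter


def tfidf_vectors(notes):
    """Bag-of-words TF-IDF vectors over a shared, sorted vocabulary."""
    tokenised = [note.lower().split() for note in notes]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    n_docs = len(notes)
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        total = len(toks) or 1
        # Terms that appear in every note get weight log(1) = 0.
        vectors.append([
            counts[term] / total * math.log(n_docs / doc_freq[term])
            for term in vocab
        ])
    return vocab, vectors


def combine_features(text_vectors, numeric_rows):
    """Append z-scored numeric measurements to each patient's text vector."""
    n = len(numeric_rows)
    columns = list(zip(*numeric_rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0  # guard constant columns
            for col, m in zip(columns, means)]
    return [
        list(text_vec) + [(v - m) / s for v, m, s in zip(row, means, stds)]
        for text_vec, row in zip(text_vectors, numeric_rows)
    ]


# Hypothetical notes and [temperature in °C, SpO2 in %] for two patients.
notes = ["patient drowsy on oxygen", "patient alert and eating"]
numeric = [[38.2, 92.0], [36.9, 98.0]]
vocab, text_vecs = tfidf_vectors(notes)
features = combine_features(text_vecs, numeric)
print(vocab)
print([round(x, 3) for x in features[0]])
```

Standardising the numeric columns keeps a large-valued measurement such as SpO2 from swamping the much smaller text weights.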

Figuring it out with factorisation: from critical care to developing cancer drugs

While these previous studies had developed methods to integrate specific clinical data sources, Rajan and his team at the Clinical Data Analytics Lab are now developing more general factorisation-based techniques to integrate arbitrary collections of heterogeneous data for learning patient representations. These can be used in a wide variety of clinical applications.

Collective Matrix Factorisation (CMF) is one such technique. It takes heterogeneous data in the form of matrices, each of which captures pairwise relations between two types of entities (such as patients and genes), and analyses the relationships among them by factorising the matrices jointly to obtain low-dimensional representations. “Doing this gives you a concise representation of the entity you’re interested in, which makes it easier for predictive algorithms to handle,” says Rajan.
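To make the idea concrete, here is a minimal CMF sketch in Python with NumPy, assuming two fully observed matrices that share the patient dimension: X (patients × genes) and Y (patients × lab tests). The shared patient factor U couples the two factorisations, so each patient's representation reflects both data sources. Alternating least squares is just one simple way to fit it; the function name, rank and regularisation values are illustrative choices, and real clinical matrices usually have missing entries that this sketch ignores.

```python
import numpy as np


def collective_mf(X, Y, rank=3, reg=0.1, n_iters=100, seed=0):
    """Jointly factorise X ≈ U @ V.T and Y ≈ U @ W.T with a shared factor U.

    Alternating least squares: with two factors fixed, the third has a
    closed-form ridge-regression solution. Assumes no missing entries.
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(X.shape[0], rank))  # patients
    V = rng.normal(scale=0.1, size=(X.shape[1], rank))  # genes
    W = rng.normal(scale=0.1, size=(Y.shape[1], rank))  # lab tests
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        # The patient factor is fitted to both matrices at once; this coupling
        # is what makes the factorisation "collective".
        A = V.T @ V + W.T @ W + ridge
        U = np.linalg.solve(A, (X @ V + Y @ W).T).T
        # Gene and lab-test factors each depend only on their own matrix.
        B = U.T @ U + ridge
        V = np.linalg.solve(B, (X.T @ U).T).T
        W = np.linalg.solve(B, (Y.T @ U).T).T
    return U, V, W


# Synthetic data with a genuine shared low-rank structure (not real data).
rng = np.random.default_rng(1)
true_U = rng.normal(size=(40, 3))
X = true_U @ rng.normal(size=(3, 25))  # 40 patients x 25 genes
Y = true_U @ rng.normal(size=(3, 8))   # 40 patients x 8 lab tests
U, V, W = collective_mf(X, Y)
print("relative error on X:", np.linalg.norm(X - U @ V.T) / np.linalg.norm(X))
```

The rows of U are the concise patient representations; a downstream classifier would consume those three numbers per patient instead of all 33 raw columns.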

However, classical CMF represents each matrix as a product of low-dimensional factors, which captures only relatively simple (essentially linear) relationships, whereas clinical and genomic data may exhibit rather complex correlations. To model such complexities, Rajan’s team has developed a neural version of CMF, called Deep Collective Matrix Factorisation (DCMF), that leverages the strength of deep learning within the CMF framework. His group was the first in the world to develop such a deep-learning architecture for CMF.
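The neural idea can be sketched as a forward pass: instead of free factor matrices, each entity type gets a small nonlinear encoder whose output is its representation, and every matrix is reconstructed from inner products of the representations involved. This is a simplified illustration loosely in the spirit of DCMF, with hypothetical function names and layer sizes, not the published architecture. Training, which would learn the encoder weights by minimising the reconstruction loss, is omitted.

```python
import numpy as np


def init_encoder(rng, in_dim, hidden, rank):
    """Random weights for a hypothetical two-layer encoder (training omitted)."""
    return (
        rng.normal(scale=in_dim ** -0.5, size=(in_dim, hidden)),
        rng.normal(scale=hidden ** -0.5, size=(hidden, rank)),
    )


def encode(weights, inputs):
    """Nonlinear map from an entity's raw rows to a rank-k representation."""
    W1, W2 = weights
    return np.tanh(inputs @ W1) @ W2


def dcmf_forward(X, Y, params):
    """Entity representations, then reconstructions of both matrices.

    X: patients x genes, Y: patients x lab tests. Each entity is encoded from
    every matrix it appears in, so the patient encoder sees [X | Y].
    """
    U = encode(params["patient"], np.hstack([X, Y]))
    V = encode(params["gene"], X.T)
    W = encode(params["lab"], Y.T)
    return U @ V.T, U @ W.T


def reconstruction_loss(X, Y, params):
    """The quantity a trainer would minimise over all encoder weights."""
    X_hat, Y_hat = dcmf_forward(X, Y, params)
    return np.mean((X - X_hat) ** 2) + np.mean((Y - Y_hat) ** 2)


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 25))  # synthetic stand-in for patients x genes
Y = rng.normal(size=(40, 8))   # synthetic stand-in for patients x lab tests
hidden, rank = 16, 3
params = {
    "patient": init_encoder(rng, X.shape[1] + Y.shape[1], hidden, rank),
    "gene": init_encoder(rng, X.shape[0], hidden, rank),
    "lab": init_encoder(rng, Y.shape[0], hidden, rank),
}
X_hat, Y_hat = dcmf_forward(X, Y, params)
print(X_hat.shape, Y_hat.shape, reconstruction_loss(X, Y, params))
```

Compared with the linear factors of classical CMF, the tanh layer lets each representation be a nonlinear function of the raw data, which is where the extra modelling power comes from.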

Numerous biological datasets describe relationships between entities of the same type, such as similarity between genes in terms of their functions, or similarity between patients in terms of their disease progression. Recognising that CMF cannot be directly used with such data, Rajan’s team developed methods to transform the data without distorting their information content, making them amenable to analysis with CMF or DCMF.
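One standard way to turn a symmetric similarity matrix into an ordinary entity-by-feature matrix without losing information applies when the matrix is positive semidefinite, as similarity matrices built from inner products are: take the eigendecomposition S = Q Λ Q^T and use F = Q Λ^(1/2), so that F F^T reproduces S exactly. The sketch below is a generic illustration of that linear-algebra fact, not necessarily the transformation Rajan's team developed; the function name and data are hypothetical.

```python
import numpy as np


def similarity_to_features(S, tol=1e-10):
    """Return F such that F @ F.T reproduces a symmetric PSD similarity matrix S.

    With the eigendecomposition S = Q diag(lam) Q^T, F = Q diag(sqrt(lam)).
    Each row of F is then an ordinary feature vector for one entity.
    """
    S = np.asarray(S, dtype=float)
    if not np.allclose(S, S.T):
        raise ValueError("similarity matrix must be symmetric")
    lam, Q = np.linalg.eigh(S)
    if lam.min() < -tol * max(1.0, abs(lam).max()):
        raise ValueError("similarity matrix must be positive semidefinite")
    # Directions with (numerically) zero eigenvalue carry no information.
    keep = lam > tol * max(1.0, lam.max())
    return Q[:, keep] * np.sqrt(lam[keep])  # scales each kept column


# Hypothetical gene similarities: cosine similarity of random function profiles.
rng = np.random.default_rng(0)
profiles = rng.normal(size=(6, 4))
profiles /= np.linalg.norm(profiles, axis=1, keepdims=True)
S = profiles @ profiles.T
F = similarity_to_features(S)
print(F.shape, np.allclose(F @ F.T, S))
```

F can then be treated like any other genes × features matrix and passed to CMF alongside the rest of the data.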

With these methods, his team has improved our ability to effectively integrate heterogeneous sources of clinical data to obtain useful representations. The advantage of these methods over the previous best methods has been demonstrated empirically in predicting potential drug targets for cancer treatment and in studying how certain genes are associated with various diseases.

Many research questions still remain open. For instance, clinical data can be at different temporal resolutions, with measurement frequency ranging from a few times during a hospital episode (such as for lab investigations) to continuous recordings like ECG. Effectively integrating such data remains a challenge. Another uphill task is finding complex dependency patterns across multiple sources of information that may yield insights on novel clinical associations. Rajan and his team are currently working on these and other related problems.
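A naive baseline shows why mixed temporal resolutions are awkward. The hypothetical sketch below puts a dense heart-rate series (as might be derived from an ECG) and a handful of lab results onto a common hourly grid, averaging the dense signal within each hour and carrying the latest lab value forward. Both choices are compromises: the within-hour dynamics are lost, and carried-forward lab values may be stale.

```python
from collections import defaultdict


def hourly_grid(hr_samples, lab_results, n_hours):
    """Put a dense signal and sparse lab values onto one hourly time axis.

    hr_samples: (minute, beats_per_minute) pairs, e.g. derived from an ECG.
    lab_results: (minute, value) pairs, e.g. a few blood-test results.
    Returns one (mean_hr, latest_lab) row per hour; None where nothing is known.
    """
    hr_bins = defaultdict(list)
    for minute, bpm in hr_samples:
        hr_bins[minute // 60].append(bpm)

    labs = sorted(lab_results)
    rows, latest, i = [], None, 0
    for hour in range(n_hours):
        end = (hour + 1) * 60
        # Carry the most recent lab value forward until a newer one arrives.
        while i < len(labs) and labs[i][0] < end:
            latest = labs[i][1]
            i += 1
        bpms = hr_bins.get(hour)  # .get does not insert empty bins
        mean_hr = sum(bpms) / len(bpms) if bpms else None
        rows.append((mean_hr, latest))
    return rows


hr = [(minute, 80 + minute // 60) for minute in range(0, 180, 5)]  # 3 hours, every 5 min
labs = [(30, 1.1), (150, 1.4)]  # two hypothetical lab values
for row in hourly_grid(hr, labs, n_hours=4):
    print(row)
```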

All over the world, in hospitals, labs as well as in our smartphones, we are collecting large amounts of data that can inform us about our health. This presents an unprecedented opportunity to study and gain a deeper understanding of diseases, develop new treatments and improve healthcare ecosystems. Rajan and his team aspire to develop effective computational techniques that can seamlessly integrate such multiple heterogeneous sources of information to sieve out the most useful elements required for a clinical application.

He says: “We are drowning in a deluge of data and we believe that our ability to use and make sense of this data for clinical applications will crucially depend on such algorithms.”

Paper:
A dual boundary classifier for predicting acute hypotensive episodes in critical care
