Gregory Kang Ruey Lau

I am a PhD student in the School of Computing at NUS, advised by Bryan Kian Hsiang Low and supported by the AI Singapore-CNRS@Create Descartes Joint PhD Scholarship.

My research adopts a data-centric approach to tackling critical bottlenecks in the practical deployment of AI systems. I anchor my work around a basic question: what is the impact of each data point on model behavior? By developing principled methods in algorithmic data selection and data provenance, I aim to lay the data-centric foundations for autonomous AI systems capable of driving the next generation of scientific discovery.

Previously, I completed my Bachelor of Science in Physics and in Economics at MIT, where I worked with Wolfgang Ketterle, Eric Hudson, and Dave Donaldson. I also obtained my Master of Finance at MIT Sloan and Master of Business Administration at Quantic. Before starting my PhD, I was a policymaker in the Singapore government, leading efforts in diverse areas such as data strategy, labour market policy, industry development, and social policy. I also spent some time as an entrepreneur, working on tech start-ups focused on education and career development.

Here is my CV. Please reach out if you are interested in collaborating!

news

Jun 2, 2026	My co-first authored paper, Watershed: A Unified Benchmark for End-to-End Data Provenance Evaluation, got accepted to the ICML 2026 AI4Good Workshop.
May 26, 2026	My co-first authored paper, TIGER: Bridging the Multimodal Reasoning-Access Gap via Modality Counterfactuals, got accepted to the ICML 2026 Foundations of Deep Generative Models Workshop (FoGen).
May 26, 2026	The paper Rethinking Bayesian Optimization for Co-Optimizing LLM Training Configurations which I co-authored has been accepted to the ICML 2026 Decision-making From Offline Datasets to Online Adaptation (DEMO) Workshop as an oral paper.
Jan 26, 2026	My co-first authored paper WaterDrum: Watermark-based Data-centric Unlearning Metric got accepted to ICLR 2026.
Jan 26, 2026	The paper DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks which I co-authored has been accepted to ICLR 2026.
Dec 25, 2025	My co-first authored paper README: Rapid Equation Discovery with Multimodal Encoders got accepted to the NeurIPS2025-AI4Science workshop.
Sep 20, 2025	The position paper Position Paper: Uncover Scaling Laws for Large Language Models via Inverse Problems which I co-authored is accepted to Findings of EMNLP 2025.
Sep 20, 2025	My co-first authored paper, Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks, got accepted to EMNLP 2025.
Sep 6, 2025	I am visiting the University of Washington from Sep-Dec 2025.
Jul 30, 2025	I received the NUS School of Computing Research Achievement Award, which is awarded to PhD students who have achieved outstanding research performance over the past academic year.
Jul 9, 2025	My co-first authored paper, README: Rapid Equation Discovery with Multimodal Encoders, got accepted to the ICML 2025 AI4Math Workshop.
Jul 1, 2025	My co-first authored paper, Uncertainty Quantification for MLLM, got accepted to the ICML 2025 Workshop on Reliable and Responsible Foundation Models (R2-FM’25).
Jun 11, 2025	My co-first authored paper, WaterDrum: Watermarking for Data-centric Unlearning Metric, got accepted to the ICML 2025 Workshop on Machine Unlearning for Generative AI (MUGen’25).
Jun 6, 2025	I am visiting the University of Oxford Department of Statistics from Jun-Aug 2025.
Apr 9, 2025	My co-first authored paper, PIED: Physics-Informed Experimental Design For Inverse Problems, got accepted to the AI4X 2025 conference for oral presentation.
Mar 6, 2025	The paper DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks which I co-authored has been accepted to the ICLR 2025 Workshop on Data Problems for Foundation Models (DATA-FM).
Mar 5, 2025	My co-first authored paper, Uncertainty Quantification for MLLMs, got accepted to the ICLR 2025 Quantify Uncertainty and Hallucination in Foundation Models (QUESTION) workshop.
Jan 21, 2025	My co-first authored paper PIED: Physics-Informed Experimental Design for Inverse Problems got accepted to ICLR 2025.
Oct 18, 2024	I received the EMNLP 2024 D&I Award.
Oct 9, 2024	My co-first authored paper, Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks, got accepted to the NeurIPS MINT 2024 workshop.
Sep 20, 2024	My co-first authored paper, Waterfall: Framework for Robust and Scalable Text Watermarking, got accepted to EMNLP 2024.
Sep 20, 2024	The position paper Data-centric AI in the Age of Large Language Models which I co-authored is accepted to Findings of EMNLP 2024.
Aug 5, 2024	I received the NUS School of Computing Research Achievement Award, which is awarded to PhD students who have achieved outstanding research performance over the past academic year.
Jul 26, 2024	PINNACLE was awarded the Best Paper award (out of 225 submissions) at the ICML2024 AI4Science workshop.
Jul 3, 2024	My co-first authored paper, Waterfall: Framework for Robust and Scalable Text Watermarking, got accepted to the ICML2024-FM-Wild workshop.
Jun 27, 2024	I was one of the 3 CS PhD students selected for the NUS School of Computing Teaching Fellowship Scheme award, which is given to those with excellent performance as a tutor.
Jun 19, 2024	My co-first authored paper, Protecting Text IP in the Era of LLMs with Robust and Scalable Watermarking, got accepted to the ICML2024-GenLaw workshop.
Jun 17, 2024	Two of my co-first authored papers got accepted to the ICML2024-AI4Science workshop: PINNACLE: PINN Adaptive ColLocation and Experimental points selection (oral) and PIED: Physics-Informed Experimental Design For Inverse Problems.
Jan 15, 2024	My co-first authored paper PINNACLE: PINN Adaptive ColLocation and Experimental points selection got accepted to ICLR 2024 for spotlight presentation.
Dec 22, 2023	I passed my PhD Qualifying Examinations.

selected works

NeurIPS

Quantum Bayesian Optimization

Zhongxiang Dai*, Gregory Kang Ruey Lau*, Arun Verma, Yao Shu, Bryan Kian Hsiang Low, and Patrick Jaillet

In Advances in Neural Information Processing Systems 2023, 2023

Abs arXiv Code Poster

Kernelized bandits, also known as Bayesian optimization (BO), has been a prevalent method for optimizing complicated black-box reward functions. Various BO algorithms have been theoretically shown to enjoy upper bounds on their cumulative regret which are sub-linear in the number T of iterations, and a regret lower bound of Ω(√T) has been derived which represents the unavoidable regrets for any classical BO algorithm. Recent works on quantum bandits have shown that with the aid of quantum computing, it is possible to achieve tighter regret upper bounds better than their corresponding classical lower bounds. However, these works are restricted to either multi-armed or linear bandits, and are hence not able to solve sophisticated real-world problems with non-linear reward functions. To this end, we introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm. To the best of our knowledge, our Q-GP-UCB is the first BO algorithm able to achieve a regret upper bound of O(polylog T), which is significantly smaller than its regret lower bound of Ω(√T) in the classical setting. Moreover, thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm from the previous work. We use simulations, as well as an experiment using a real quantum computer, to verify that the theoretical quantum speedup achieved by our Q-GP-UCB is also potentially relevant in practice.
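As a reading aid for the regret rates quoted in the abstract above, the standard definition of cumulative regret can be written out as follows. The symbols f (the reward function), x^* (its maximizer), x_t (the query at iteration t), and R_T are notational assumptions for this sketch and may differ from the paper's own notation.

```latex
% Cumulative regret after T iterations, where x^* maximizes the reward function f
R_T = \sum_{t=1}^{T} \bigl( f(x^*) - f(x_t) \bigr)

% Sub-linear regret means the average regret vanishes as T grows
\lim_{T \to \infty} \frac{R_T}{T} = 0

% Rates quoted in the abstract: classical lower bound vs. Q-GP-UCB upper bound
R_T^{\mathrm{classical}} = \Omega\bigl(\sqrt{T}\bigr), \qquad
R_T^{\mathrm{Q\text{-}GP\text{-}UCB}} = O(\mathrm{polylog}\, T)
```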
ICLR (Spotlight), ICML Workshop (Best Paper)

PINNACLE: PINN Adaptive ColLocation and Experimental points selection

Gregory Kang Ruey Lau*, Apivich Hemachandra*, See-Kiong Ng, and Bryan Kian Hsiang Low

In 12th International Conference on Learning Representations (ICLR 2024), 2024

Abs arXiv PDF Code Poster

Physics-Informed Neural Networks (PINNs), which incorporate PDEs as soft constraints, train with a composite loss function that contains multiple training point types: different types of collocation points chosen during training to enforce each PDE and initial/boundary conditions, and experimental points which are usually costly to obtain via experiments or simulations. Training PINNs using this loss function is challenging as it typically requires selecting large numbers of points of different types, each with different training dynamics. Unlike past works that focused on the selection of either collocation or experimental points, this work introduces PINN Adaptive ColLocation and Experimental points selection (PINNACLE), the first algorithm that jointly optimizes the selection of all training point types, while automatically adjusting the proportion of collocation point types as training progresses. PINNACLE uses information on the interaction among training point types, which had not been considered before, based on an analysis of PINN training dynamics via the Neural Tangent Kernel (NTK). We theoretically show that the criterion used by PINNACLE is related to the PINN generalization error, and empirically demonstrate that PINNACLE is able to outperform existing point selection methods for forward, inverse, and transfer learning problems.
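For readers unfamiliar with the composite loss mentioned in the abstract above, a generic PINN loss with the three kinds of training points can be sketched as below. This is a textbook-style form under assumed notation, not necessarily the formulation used in the paper: u_\theta is the network, \mathcal{N} and \mathcal{B} are the PDE and initial/boundary operators with right-hand sides f and g, (x_k, y_k) are experimental observations, and the weights \lambda are hypothetical.

```latex
\mathcal{L}(\theta) =
  % PDE residual on collocation points in the domain interior
  \frac{\lambda_r}{N_r} \sum_{i=1}^{N_r} \bigl| \mathcal{N}[u_\theta](x_i) - f(x_i) \bigr|^2
  % initial/boundary condition residual on collocation points
  + \frac{\lambda_b}{N_b} \sum_{j=1}^{N_b} \bigl| \mathcal{B}[u_\theta](x_j) - g(x_j) \bigr|^2
  % data-fitting term on (costly) experimental points
  + \frac{\lambda_e}{N_e} \sum_{k=1}^{N_e} \bigl| u_\theta(x_k) - y_k \bigr|^2
```

Choosing how many points N_r, N_b, and N_e to use, and where to place them, is the selection problem the abstract refers to.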
ICLR, AI4X (Oral)

PIED: Physics-Informed Experimental Design For Inverse Problems

Apivich Hemachandra*, Gregory Kang Ruey Lau*, See-Kiong Ng, and Bryan Kian Hsiang Low

In 13th International Conference on Learning Representations (ICLR 2025), 2025

Abs arXiv

In many science and engineering settings, system dynamics are characterized by governing partial differential equations (PDEs), and a major challenge is to solve inverse problems (IPs) where unknown PDE parameters are inferred based on observational data gathered under limited budget. Due to the high costs of setting up and running experiments, experimental design (ED) is often done with the help of PDE simulations to optimize for the most informative design parameters (e.g., sensor placements) to solve such IPs, prior to actual data collection. This process of optimizing design parameters is especially critical when the budget and other practical constraints make it infeasible to adjust the design parameters between trials during the experiments. However, existing ED methods tend to require sequential and frequent design parameter adjustments between trials. Furthermore, they also have significant computational bottlenecks due to the need for complex numerical simulations for PDEs, and do not exploit the advantages provided by physics-informed neural networks (PINNs) in solving IPs for PDE-governed systems, such as their meshless solutions, differentiability, and amortized training. This work presents Physics-Informed Experimental Design (PIED), the first ED framework that makes use of PINNs in a fully differentiable architecture to perform continuous optimization of design parameters for IPs for one-shot deployments. PIED overcomes existing methods’ computational bottlenecks through parallelized computation and meta-learning of PINN parameter initialization, and proposes novel methods to effectively take into account PINN training dynamics in optimizing the ED parameters. Through experiments based on noisy simulated data and even real-world experimental data, we empirically show that given limited observation budget, PIED significantly outperforms existing ED methods in solving IPs, including for challenging settings where the inverse parameters are unknown functions rather than just finite-dimensional.
EMNLP

Waterfall: Framework for Robust and Scalable Text Watermarking of Original Text

Gregory Kang Ruey Lau*, Niu Xinyuan*, Hieu Dao, Chen Jiangwei, Foo Chuan Sheng, and Bryan Kian Hsiang Low

In 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), 2024

Abs arXiv Code

Protecting intellectual property (IP) of text such as articles and code is increasingly important, especially as sophisticated attacks become possible, such as paraphrasing by large language models (LLMs) or even unauthorized training of LLMs on copyrighted text to infringe such IP. However, existing text watermarking methods are not robust enough against such attacks nor scalable to millions of users for practical implementation. In this paper, we propose Waterfall, the first training-free framework for robust and scalable text watermarking applicable across multiple text types (e.g., articles, code) and languages supportable by LLMs, for general text and LLM data provenance. Waterfall comprises several key innovations, such as being the first to use LLMs as paraphrasers for watermarking along with a novel combination of techniques that are surprisingly effective in achieving robust verifiability and scalability. We empirically demonstrate that Waterfall achieves significantly better scalability, robust verifiability, and computational efficiency compared to SOTA article-text watermarking methods, and also show how it could be directly applied to the watermarking of code.
EMNLP

Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks

Gregory Kang Ruey Lau*, Wenyang Hu*, Liu Diwen, Chen Jizhuo, See-Kiong Ng, and Bryan Kian Hsiang Low

In 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), 2025

Abs arXiv PDF

Large Language Models (LLMs), particularly smaller variants, still struggle with complex reasoning tasks. While inference-time prompting can guide reasoning, existing methods often rely on sequential queries. Ensemble approaches offer a promising path to performance gains, especially given recent batch inference speed-ups. This work introduces DIPPER, a novel, training-free framework that transforms a single LLM into an effective inference-time ensemble. By feeding the model an optimized and diverse set of prompts in parallel, DIPPER elicits varied reasoning paths, leading to performance gains. We empirically demonstrate significant improvements on mathematical reasoning benchmarks, such as MATH, where a DIPPER ensemble of three Qwen2-MATH-1.5B instances (via parallel prompting of a single model) outperforms a larger Qwen2-MATH-7B model.
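The idea in the abstract above of turning one model into an ensemble through several prompts can be illustrated with a minimal sketch. It is not the DIPPER implementation: the majority-vote aggregation, the `ensemble_answer` helper, the example prompts, and the `toy_model` stand-in for an LLM call are all assumptions made for illustration, and the prompt optimization step described in the abstract is left out.

```python
from collections import Counter
from typing import Callable, Sequence


def ensemble_answer(
    generate: Callable[[str], str],
    prompts: Sequence[str],
    question: str,
) -> str:
    """Query one model once per prompt and return the most common answer."""
    if not prompts:
        raise ValueError("need at least one prompt")
    # A real system would send these as one batch; a loop keeps the sketch simple.
    answers = [generate(f"{p}\n\n{question}").strip() for p in prompts]
    # Majority vote (an assumed aggregation rule); ties go to the earliest answer.
    return Counter(answers).most_common(1)[0][0]


def toy_model(text: str) -> str:
    """Hypothetical stand-in for an LLM call: the answer depends on the prompt."""
    return "41" if "quickly" in text else "42"


# Hypothetical prompt set; selecting a diverse, effective set is not shown here.
prompts = [
    "Think step by step.",
    "Answer quickly.",
    "Check your work before answering.",
]
result = ensemble_answer(toy_model, prompts, "What is 6 * 7?")
print(result)
```

Here two of the three prompts lead the toy model to the same answer, so the vote overrides the single dissenting response.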
ICLR

WaterDrum: Watermarking for Data-centric Unlearning Metric

Xinyang Lu*, Xinyuan Niu*, Gregory Kang Ruey Lau*, Bui Thi Cam Nhung, Rachael Hwee Ling Sim, John Russell Himawan, Fanyu Wen, Chuan-Sheng Foo, and Bryan Kian Hsiang Low

In 14th International Conference on Learning Representations (ICLR 2026), 2026

Abs arXiv

Large language model (LLM) unlearning is critical in real-world applications where it is necessary to efficiently remove the influence of private, copyrighted, or harmful data from some users. Existing utility-centric unlearning metrics (based on model utility) may fail to accurately evaluate the extent of unlearning in realistic settings such as when the forget and retain sets have semantically similar content and/or retraining the model from scratch on the retain set is impractical. This paper presents the first data-centric unlearning metric for LLMs called WaterDrum that exploits robust text watermarking to overcome these limitations. We introduce new benchmark datasets (with different levels of data similarity) for LLM unlearning that can be used to rigorously evaluate unlearning algorithms via WaterDrum.
ICLR

DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks

Zhiliang Chen, Gregory Kang Ruey Lau, Foo Chuan Sheng, and Bryan Kian Hsiang Low

In 14th International Conference on Learning Representations (ICLR 2026), 2026

Abs PDF

The performance of an LLM depends heavily on the relevance of its training data to the downstream evaluation task. However, in practice, we do not have fine-grained knowledge of the data in the evaluation task (e.g., conversations between an LLM and a user are end-to-end encrypted). Hence, it is unclear what data is relevant for fine-tuning the LLM. Instead, we can only deploy the LLM on the unseen task to gather multiple rounds of coarse, noisy feedback on how well the model performs (e.g., user ratings). Our paper presents DUET, a novel global-to-local algorithm that optimizes training data mixtures by interleaving data selection with Bayesian optimization to exploit coarse and noisy feedback from a downstream evaluation task. DUET is flexible enough to incorporate different data selection methods, each with different performance-compute tradeoffs. By analyzing DUET’s cumulative regret, we theoretically show that DUET converges to the optimal training data mixture even without any fine-grained data information from an unseen task. Finally, our experiments across a variety of language tasks demonstrate that DUET attains substantial performance improvement over existing data selection and mixing methods in the unseen-task setting. Our anonymized code can be found at https://github.com/pmsdapfmbf/DUET.
EMNLP Findings

Data-Centric AI in the Age of Large Language Models

Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, and 7 more authors

In Findings of the Association for Computational Linguistics (EMNLP 2024), 2024

Abs arXiv PDF

This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization. In each scenario, we underscore the importance of data, highlight promising research directions, and articulate the potential impacts on the research community and, where applicable, the society as a whole. For instance, we advocate for a suite of data-centric benchmarks tailored to the scale and complexity of data for LLMs. These benchmarks can be used to develop new data curation methods and document research efforts and results, which can help promote openness and transparency in AI and LLM research.
EMNLP Findings

Position Paper: Uncover Scaling Laws for Large Language Models via Inverse Problems

Arun Verma, Zhaoxuan Wu, Zijian Zhou, Xiaoqiang Lin, Zhiliang Chen, Rachael Hwee Ling Sim, Rui Qiao, Jingtan Wang, Bui Thi Cam Nhung, Xinyuan Niu, Wenyang Hu, Gregory Kang Ruey Lau, and 6 more authors

In Findings of the Association for Computational Linguistics (EMNLP 2025), 2025

Abs PDF

Large Language Models (LLMs) are large-scale pretrained models that have achieved remarkable success across diverse domains. These successes have been driven by unprecedented complexity and scale in both data and computations. However, due to the high costs of training such models, brute-force trial-and-error approaches to improve LLMs are not feasible. Inspired by the success of inverse problems in uncovering fundamental scientific laws, this position paper advocates that inverse problems can also be used to efficiently uncover scaling laws that guide the building of LLMs to achieve a desirable performance with significantly better cost-effectiveness.
ICML Workshop

Protecting Text IP in the Era of LLMs with Robust and Scalable Watermarking

Gregory Kang Ruey Lau*, Niu Xinyuan*, Hieu Dao, Chen Jiangwei, Foo Chuan Sheng, and Bryan Kian Hsiang Low

In ICML2024 Workshop on Generative AI and Law, 2024

Abs

In this paper, we propose the first training-free framework for robust and scalable text watermarking applicable across multiple text types (e.g., articles, code) and languages, for general as well as LLM text training data provenance. We highlight perspectives on text IP protection, such as using LLMs to enable better IP protection rather than viewing them as just sources of IP infringement, not relying on just major LLM providers, and the benefits of having a general framework that can be easily adapted to defend against new threats.
ICLR Workshop

Uncertainty Quantification for MLLM

Gregory Kang Ruey Lau*, Hieu Dao*, and Bryan Kian Hsiang Low

In ICLR 2025 Quantify Uncertainty and Hallucination in Foundation Models (QUESTION) Workshop, 2025

Abs arXiv

Despite their capabilities, Multimodal Large Language Models (MLLMs) may produce plausible but erroneous outputs, hindering reliable deployment. Accurate uncertainty metrics could enable escalation of unreliable queries to human experts or larger models for improved performance. However, existing uncertainty metrics have practical constraints, such as being designed only for specific modalities, reliant on external tools, or computationally expensive. We introduce UMPIRE, a training-free uncertainty quantification framework for MLLMs that works efficiently across various input and output modalities without external tools, relying only on the models’ own internal modality features. UMPIRE computes the incoherence-adjusted semantic volume of sampled MLLM responses for a given task instance, effectively capturing both the global semantic diversity of samples and the local incoherence of responses based on internal model confidence. We propose uncertainty desiderata for MLLMs and provide theoretical analysis motivating UMPIRE’s design. Extensive experiments show that UMPIRE consistently outperforms baseline metrics in error detection and uncertainty calibration across image, audio, and video-text benchmarks, including adversarial and out-of-distribution settings. We also demonstrate UMPIRE’s generalization to non-text output tasks, including image and audio generation.
ICML Workshop

README: Rapid Equation Discovery Using Multimodal Encoders

Gregory Kang Ruey Lau*, Yue Ran Kang*, Zi-Yu Khoo, Apivich Hemachandra, Ruth Wan Theng Chew, and Bryan Kian Hsiang Low

In ICML 2025 AI4Math Workshop, 2025

Abs PDF

Discovering scientific laws or interpretable symbolic equations from data rapidly is important in many settings, such as decision-making in time-sensitive high-stakes scenarios or applications involving interactive or iterative experimentation such as in scientific or machine learning workflows. However, existing methods, generally known as symbolic regression (SR), typically require long computational time to achieve good performance and have to run from scratch for each dataset. Recent methods that use pre-trained SR foundation models for faster inference also suffer from performance limitations and require large training datasets. In this work, we propose README, a framework for rapid equation discovery that can generate performant, interpretable equations from limited, noisy data in just a few seconds, and requires significantly less training data compared to past SR foundation model approaches. We achieve this by being the first to (1) work with image representations of datasets to efficiently capture their key properties, (2) combine the capabilities of open-sourced pre-trained text and image encoders to produce an informative SR embedding space, and (3) develop a novel Grey Wolf Optimizer with Bayesian Optimization (GWOBO) algorithm to rapidly optimize for the best symbolic expression within seconds. We empirically show that README outperforms benchmarks on a wide range of realistic datasets, including real experimental data from various domains and noisy video-extracted dynamics.
ICML Workshop

TIGER: Bridging the Multimodal Reasoning-Access Gap via Modality Counterfactuals

Gregory Kang Ruey Lau*, Nguyen Huynh Minh*, and Bryan Kian Hsiang Low

In ICML 2026 Foundations of Deep Generative Models Workshop (FoGen), 2026

Abs

While Multimodal Large Language Models (MLLMs) exhibit strong reasoning on text inputs, they often fail on semantically equivalent visual inputs. By rendering text problems as images, we isolate this failure and identify a reasoning-access gap: models correctly perceive visual content but fail to route that content into the latent reasoning mechanisms used for text-based tasks. To address this, we propose TIGER (Text-to-Image Gap-targeted Training for Enhanced Reasoning). TIGER automatically transforms text-only corpora into multimodal training data by mining modality counterfactuals, instances where a model succeeds on text but fails on the equivalent image, providing targeted supervision without manually curated datasets. Implemented via image-conditioned Group Relative Policy Optimization (GRPO), TIGER consistently narrows the modality gap and improves visual reasoning on benchmarks like MathVerse and EMMA. We further show that even RLVR-based models exhibit modality-dependent reasoning gaps, and that TIGER effectively reduces them. Furthermore, activation analyses reveal that TIGER helps visual representations better engage reasoning-relevant subspaces within the language backbone. Our results emphasize that robust multimodal reasoning requires reliable visual access to existing reasoning machinery, moving beyond better perception.
ICML Workshop

Watershed: A Unified Benchmark for End-to-End Data Provenance Evaluation

John Russell Himawan*, Gregory Kang Ruey Lau*, and Bryan Kian Hsiang Low

In ICML 2026 AI4Good Workshop, 2026

Abs PDF

Data provenance aims to determine whether and how a data source has influenced a downstream LLM. Despite growing interest in data provenance research, current methods tend to specialize in specific settings and suffer from fragmented evaluation standards. To address this, we introduce WATERSHED, a unified benchmark and toolkit for end-to-end provenance evaluation. WATERSHED structures data provenance into stage-wise tests spanning data preparation, LLM training, black-box auditing, and downstream applications such as membership audit, multi-owner source attribution, and unlearning verification. We evaluate existing provenance methods such as watermarking and membership inference attacks on WATERSHED, across a wide range of datasets, model families, and attacks. Our results confirm that methods vary in effectiveness across different stages and tasks. By providing a unified framework and exposing these failure modes, WATERSHED establishes a rigorous basis for evaluating data provenance methods.