Gregory Kang Ruey Lau

Department of Computer Science, National University of Singapore


I am a PhD student in the School of Computing at NUS, advised by Bryan Kian Hsiang Low and supported by the AI Singapore-CNRS@Create Descartes Joint PhD Scholarship.

My research adopts a data-centric approach to tackling critical bottlenecks in the practical deployment of AI systems. I anchor my work around a basic question: what is the impact of each data point on model behavior? By developing principled methods in algorithmic data selection and data provenance, I aim to lay the data-centric foundations for autonomous AI systems capable of driving the next generation of scientific discovery.

Previously, I completed my Bachelor of Science in Physics and in Economics at MIT, where I worked with Wolfgang Ketterle, Eric Hudson and Dave Donaldson. I also obtained my Master of Finance at MIT Sloan and Master of Business Administration at Quantic. Before starting my PhD, I was a policymaker in the Singapore government, leading efforts in diverse areas such as data strategy, labour market policy, industry development, and social policy. I also spent some time as an entrepreneur, working on tech start-ups focused on education and career development.

Here is my CV. Please reach out if you are interested in collaborating!

news

Jun 2, 2026 My co-first authored paper, Watershed: A Unified Benchmark for End-to-End Data Provenance Evaluation, got accepted to the ICML 2026 AI4Good Workshop.
May 26, 2026 My co-first authored paper, TIGER: Bridging the Multimodal Reasoning-Access Gap via Modality Counterfactuals, got accepted to the ICML 2026 Foundations of Deep Generative Models Workshop (FoGen).
May 26, 2026 The paper Rethinking Bayesian Optimization for Co-Optimizing LLM Training Configurations which I co-authored has been accepted to the ICML 2026 Decision-making From Offline Datasets to Online Adaptation (DEMO) Workshop as an oral paper.
Jan 26, 2026 My co-first authored paper WaterDrum: Watermark-based Data-centric Unlearning Metric got accepted to ICLR 2026.
Jan 26, 2026 The paper DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks which I co-authored has been accepted to ICLR 2026.
Dec 25, 2025 My co-first authored paper README: Rapid Equation Discovery with Multimodal Encoders got accepted to the NeurIPS2025-AI4Science workshop.
Sep 20, 2025 The position paper Uncover Scaling Laws for Large Language Models via Inverse Problems, which I co-authored, has been accepted to Findings of EMNLP 2025.
Sep 20, 2025 My co-first authored paper, Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks, got accepted to EMNLP 2025.
Sep 6, 2025 I am visiting the University of Washington from Sep-Dec 2025.
Jul 30, 2025 I received the NUS School of Computing Research Achievement Award, which is awarded to PhD students who have achieved outstanding research performance over the past academic year.
Jul 9, 2025 My co-first authored paper, README: Rapid Equation Discovery with Multimodal Encoders, got accepted to the ICML 2025 AI4Math Workshop.
Jul 1, 2025 My co-first authored paper, Uncertainty Quantification for MLLM, got accepted to the ICML 2025 Workshop on Reliable and Responsible Foundation Models (R2-FM’25).
Jun 11, 2025 My co-first authored paper, WaterDrum: Watermarking for Data-centric Unlearning Metric, got accepted to the ICML 2025 Workshop on Machine Unlearning for Generative AI (MUGen’25).
Jun 6, 2025 I am visiting the University of Oxford Department of Statistics from Jun-Aug 2025.
Apr 9, 2025 My co-first authored paper, PIED: Physics-Informed Experimental Design For Inverse Problems, got accepted to the AI4X 2025 conference for oral presentation.
Mar 6, 2025 The paper DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks which I co-authored has been accepted to the ICLR 2025 Workshop on Data Problems for Foundation Models (DATA-FM).
Mar 5, 2025 My co-first authored paper, Uncertainty Quantification for MLLMs, got accepted to the ICLR 2025 Quantify Uncertainty and Hallucination in Foundation Models (QUESTION) workshop.
Jan 21, 2025 My co-first authored paper PIED: Physics-Informed Experimental Design for Inverse Problems got accepted to ICLR 2025.
Oct 18, 2024 I received the EMNLP 2024 D&I Award.
Oct 9, 2024 My co-first authored paper, Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks, got accepted to the NeurIPS MINT 2024 workshop.
Sep 20, 2024 My co-first authored paper, Waterfall: Framework for Robust and Scalable Text Watermarking, got accepted to EMNLP 2024.
Sep 20, 2024 The position paper Data-centric AI in the Age of Large Language Models, which I co-authored, has been accepted to Findings of EMNLP 2024.
Aug 5, 2024 I received the NUS School of Computing Research Achievement Award, which is awarded to PhD students who have achieved outstanding research performance over the past academic year.
Jul 26, 2024 PINNACLE was awarded the Best Paper award (out of 225 submissions) at the ICML2024 AI4Science workshop.
Jul 3, 2024 My co-first authored paper, Waterfall: Framework for Robust and Scalable Text Watermarking, got accepted to the ICML2024-FM-Wild workshop.
Jun 27, 2024 I was one of the 3 CS PhD students selected for the NUS School of Computing Teaching Fellowship Scheme award, which is given to those with excellent performance as a tutor.
Jun 19, 2024 My co-first authored paper, Protecting Text IP in the Era of LLMs with Robust and Scalable Watermarking, got accepted to the ICML2024-GenLaw workshop.
Jun 17, 2024 Two of my co-first authored papers got accepted to the ICML2024-AI4Science workshop: PINNACLE: PINN Adaptive ColLocation and Experimental points selection (oral) and PIED: Physics-Informed Experimental Design For Inverse Problems.
Jan 15, 2024 My co-first authored paper PINNACLE: PINN Adaptive ColLocation and Experimental points selection got accepted to ICLR 2024 for spotlight presentation.
Dec 22, 2023 I passed my PhD Qualifying Examinations.

selected works

  1. NeurIPS
    Quantum Bayesian Optimization
    Zhongxiang Dai*, Gregory Kang Ruey Lau*, Arun Verma, Yao Shu, Bryan Kian Hsiang Low, and Patrick Jaillet
    In Advances in Neural Information Processing Systems 2023, 2023
  2. ICLR (Spotlight), ICML Workshop (Best Paper)
    PINNACLE: PINN Adaptive ColLocation and Experimental points selection
    Gregory Kang Ruey Lau*, Apivich Hemachandra*, See-Kiong Ng, and Bryan Kian Hsiang Low
    In 12th International Conference on Learning Representations (ICLR 2024), 2024
  3. ICLR, AI4X (Oral)
    PIED: Physics-Informed Experimental Design For Inverse Problems
    Apivich Hemachandra*, Gregory Kang Ruey Lau*, See-Kiong Ng, and Bryan Kian Hsiang Low
    In 13th International Conference on Learning Representations (ICLR 2025), 2025
  4. EMNLP
    Waterfall: Framework for Robust and Scalable Text Watermarking of Original Text
    Gregory Kang Ruey Lau*, Niu Xinyuan*, Hieu Dao, Chen Jiangwei, Foo Chuan Sheng, and Bryan Kian Hsiang Low
    In 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), 2024
  5. EMNLP
    Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks
    Gregory Kang Ruey Lau*, Wenyang Hu*, Liu Diwen, Chen Jizhuo, See-Kiong Ng, and Bryan Kian Hsiang Low
    In 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), 2025
  6. ICLR
    WaterDrum: Watermarking for Data-centric Unlearning Metric
    Xinyang Lu*, Xinyuan Niu*, Gregory Kang Ruey Lau*, Bui Thi Cam Nhung, Rachael Hwee Ling Sim, John Russell Himawan, Fanyu Wen, Chuan-Sheng Foo, and Bryan Kian Hsiang Low
    In 14th International Conference on Learning Representations (ICLR 2026), 2026
  7. ICLR
    DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks
    Zhiliang Chen, Gregory Kang Ruey Lau, Foo Chuan Sheng, and Bryan Kian Hsiang Low
    In 14th International Conference on Learning Representations (ICLR 2026), 2026
  8. EMNLP Findings
    Data-Centric AI in the Age of Large Language Models
    Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, and 7 more authors
    In Findings of the Association for Computational Linguistics (EMNLP 2024), 2024
  9. EMNLP Findings
    Position Paper: Uncover Scaling Laws for Large Language Models via Inverse Problems
    Arun Verma, Zhaoxuan Wu, Zijian Zhou, Xiaoqiang Lin, Zhiliang Chen, Rachael Hwee Ling Sim, Rui Qiao, Jingtan Wang, Bui Thi Cam Nhung, Xinyuan Niu, Wenyang Hu, Gregory Kang Ruey Lau, and 6 more authors
    In Findings of the Association for Computational Linguistics (EMNLP 2025), 2025
  10. ICML Workshop
    Protecting Text IP in the Era of LLMs with Robust and Scalable Watermarking
    Gregory Kang Ruey Lau*, Niu Xinyuan*, Hieu Dao, Chen Jiangwei, Foo Chuan Sheng, and Bryan Kian Hsiang Low
    In ICML2024 Workshop on Generative AI and Law, 2024
  11. ICLR Workshop
    Uncertainty Quantification for MLLM
    Gregory Kang Ruey Lau*, Hieu Dao*, and Bryan Kian Hsiang Low
    In ICLR 2025 Quantify Uncertainty and Hallucination in Foundation Models (QUESTION) Workshop, 2025
  12. ICML Workshop
    README: Rapid Equation Discovery Using Multimodal Encoders
    Gregory Kang Ruey Lau*, Yue Ran Kang*, Zi-Yu Khoo, Apivich Hemachandra, Ruth Wan Theng Chew, and Bryan Kian Hsiang Low
    In ICML 2025 AI4Math Workshop, 2025
  13. ICML Workshop
    TIGER: Bridging the Multimodal Reasoning-Access Gap via Modality Counterfactuals
    Gregory Kang Ruey Lau*, Nguyen Huynh Minh*, and Bryan Kian Hsiang Low
    In ICML 2026 Foundations of Deep Generative Models Workshop (FoGen), 2026
  14. ICML Workshop
    Watershed: A Unified Benchmark for End-to-End Data Provenance Evaluation
    John Russel Himawan*, Gregory Kang Ruey Lau*, and Bryan Kian Hsiang Low
    In ICML 2026 AI4Good Workshop, 2026