NUS Presidential Young Professor Yang You Receives Google Research Award to Build Foundations for Next-Generation AI
NUS Presidential Young Professor Yang You from NUS Computing's Department of Computer Science has been selected for the Google 2026 Awards for Machine Learning Research and Education with TPUs. His project will develop the systems infrastructure for diffusion-based large language models – a promising new approach that could make AI dramatically faster and more accessible.
Every major AI chatbot today – ChatGPT, Claude, Gemini – generates text one word (more precisely, one token) at a time, like writing a sentence from left to right. It works, but it is slow: the longer the response, the longer you wait. And much of the powerful hardware running these models sits idle between words.
Diffusion-based large language models (dLLMs) take a different approach. Instead of producing text sequentially, they refine all words in parallel – similar to how AI image generators like Stable Diffusion create pictures by gradually sharpening an entire canvas at once. Recent models such as LLaDA, Mercury, and Google's own Gemini Diffusion have reported generation speeds five to ten times faster than comparable sequential models, while matching them in quality on standard benchmarks. They also appear to ease long-standing weaknesses of sequential models, such as the so-called "reversal curse," in which a model that has learned "A is B" fails to infer that "B is A."
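To make the contrast concrete, the toy sketch below counts how many model calls each style of decoding needs to produce the same six-word sentence. It is an illustration under stated assumptions, not how LLaDA, Mercury, or Gemini Diffusion actually work: `toy_model` is a hypothetical stand-in that simply reads the answer from `target`, and the confidence rule and the `tokens_per_step` parameter are invented for the example.

```python
MASK = "_"


def toy_model(seq, target):
    """Hypothetical stand-in for a denoising network (an assumption).

    Returns a (token, confidence) pair for every position. The token is
    read straight from `target`; confidence grows with the number of
    already-revealed neighbours.
    """
    preds = []
    for i, tok in enumerate(target):
        neighbours = [seq[j] for j in (i - 1, i + 1) if 0 <= j < len(seq)]
        confidence = 1 + sum(n != MASK for n in neighbours)
        preds.append((tok, confidence))
    return preds


def autoregressive_decode(target):
    """Sequential decoding: each model call yields exactly one new token."""
    seq, calls = [], 0
    for tok in target:
        seq.append(tok)
        calls += 1
    return seq, calls


def diffusion_decode(target, tokens_per_step):
    """Parallel refinement: start fully masked, commit several tokens per call."""
    seq, calls = [MASK] * len(target), 0
    while MASK in seq:
        preds = toy_model(seq, target)
        calls += 1
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        # Commit the most confident masked positions; ties go to the lower index.
        masked.sort(key=lambda i: (-preds[i][1], i))
        for i in masked[:tokens_per_step]:
            seq[i] = preds[i][0]
    return seq, calls


sentence = "the cat sat on the mat".split()
print(autoregressive_decode(sentence)[1])                 # 6 model calls
print(diffusion_decode(sentence, tokens_per_step=3)[1])   # 2 model calls
```

In this toy setting, sequential decoding needs six calls while parallel refinement needs two. Real speed-ups depend on how many refinement steps a model needs and on how expensive each step is.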
"If diffusion LLMs become the next dominant paradigm, they could fundamentally reshape what is possible with generative AI," said Prof You. "We are talking about truly real-time conversational agents, dramatically lower serving costs, and stronger reasoning."
But the entire infrastructure powering today's AI – the optimisers, attention mechanisms, parallelism strategies, and inference engines – has been built over the past decade specifically for the sequential approach. Almost none of it works for diffusion-based models.
"There is no diffusion-native optimiser. There is no equivalent of FlashAttention for bidirectional attention. KV-caching – a key technique for speeding up today's models – is fundamentally incompatible," Prof You explained. "Without dedicated systems foundations, the promise of dLLMs cannot be fully realised on modern hardware."
His project, Pioneering the Systems Foundations for Diffusion-Based Large Language Models on TPUs, will build precisely that foundation – leveraging JAX, Pallas, MaxText, and vLLM to create the core tools and frameworks needed to train and run dLLMs efficiently on Google's Tensor Processing Units (TPUs). All resulting tools will be released as open source so the wider community can benefit.
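Prof You's remark about KV-caching can be illustrated with a small experiment. The sketch below is a simplified, single-head attention written in plain Python, with the assumption that queries, keys, and values are all just the raw token vectors (real models apply learned projections). The helper `first_rows_unchanged` is a hypothetical name introduced here. It checks whether the outputs for earlier positions stay the same after one more token is appended, which is the property that lets a cache built on those outputs be reused.

```python
import math


def attention(queries, keys, values, causal):
    """Single-head softmax attention over lists of equal-length vectors."""
    outputs = []
    for i, q in enumerate(queries):
        # Causal: position i sees positions 0..i only.
        # Bidirectional: position i sees every position.
        visible = range(i + 1) if causal else range(len(keys))
        scores = [sum(a * b for a, b in zip(q, keys[j])) for j in visible]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * values[j][d] for w, j in zip(weights, visible)) / total
            for d in range(len(values[0]))
        ])
    return outputs


def first_rows_unchanged(tokens, new_token, causal):
    """True if appending `new_token` leaves the earlier outputs untouched."""
    before = attention(tokens, tokens, tokens, causal)
    longer = tokens + [new_token]
    after = attention(longer, longer, longer, causal)
    return all(
        math.isclose(a, b, abs_tol=1e-12)
        for row_before, row_after in zip(before, after)
        for a, b in zip(row_before, row_after)
    )


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_token = [2.0, -1.0]
print(first_rows_unchanged(tokens, new_token, causal=True))   # True
print(first_rows_unchanged(tokens, new_token, causal=False))  # False
```

With a causal mask, earlier positions never look at later ones, so results computed for them can be stored and reused as the sequence grows. With bidirectional attention, every position sees the whole sequence, so a change anywhere alters every output, and anything cached from those outputs goes stale.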
The team's first multimodal diffusion LLM work, DiffuSpeech: Silent Thought, Spoken Answer via Unified Speech-Text Diffusion, has already been released. It combines speech and text generation within a single diffusion framework: https://arxiv.org/abs/2601.22889
Three further works are expected over the coming months:
- AsyncLane – an asynchronous inference framework for accelerating block-diffusion LLM inference, designed to map naturally onto TPU pod architectures and exploit their massive parallel compute
- ECUpcycle – an upcycling pipeline that converts a dense autoregressive LLM into a diffusion Mixture-of-Experts LLM with expert-choice routing
- OrScale – an optimiser using Orthogonalised Optimisation with Layer-wise Trust Ratio Scaling, tailored for diffusion LLM training and engineered to fit TPU hardware
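Since none of these three works has been released yet, their internals are not public. As background for the ECUpcycle item only, the sketch below shows the general idea of expert-choice routing as it is commonly described: instead of each token picking its favourite experts, each expert picks a fixed number of tokens. The function name `expert_choice_route`, the `capacity` parameter, and the score values are assumptions made for illustration and do not come from the project.

```python
def expert_choice_route(scores, capacity):
    """Generic expert-choice routing sketch (not the ECUpcycle pipeline).

    scores[t][e] is the affinity of token t for expert e.
    Each expert selects its `capacity` highest-scoring tokens.
    Returns one sorted list of token indices per expert.
    """
    num_tokens = len(scores)
    num_experts = len(scores[0])
    assignment = []
    for e in range(num_experts):
        # Rank tokens by affinity for this expert; ties go to the lower index.
        ranked = sorted(range(num_tokens), key=lambda t: (-scores[t][e], t))
        assignment.append(sorted(ranked[:capacity]))
    return assignment


# Made-up affinities: 4 tokens (rows) by 2 experts (columns).
scores = [
    [0.9, 0.8],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.1, 0.1],
]
print(expert_choice_route(scores, capacity=2))  # [[0, 1], [0, 2]]
```

Every expert ends up with exactly `capacity` tokens, so the load is balanced by construction. The trade-off visible in the example is that a token can be chosen by several experts (token 0) or by none (token 3).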
Prof You sees the most compelling near-term applications in areas currently bottlenecked by sequential generation: low-latency interactive coding assistants and chatbots; cost-efficient large-scale serving that makes capable LLMs more accessible to researchers, students, and smaller organisations; and reasoning tasks that benefit from bidirectional context.
For more information, visit Prof You's research group page at: https://ai.comp.nus.edu.sg/.
