From Frustrated Commands to Cooperative Partners:
Rethinking AI Through Intent Inference
Have you ever found yourself repeating a command to a virtual assistant, tweaking your phrasing endlessly, only to give up and complete the task yourself? Whether it’s telling your phone to play a song or getting help from a chatbot, interacting with AI systems often feels like shouting into a void. The problem isn’t just technical; it’s conceptual. Most current AI systems aren’t built to cooperate; they’re built to follow instructions. That’s a huge difference.
A new wave of research by NUS Presidential Young Professor (PYP) Tan Zhi Xuan, one of our newest faculty members at NUS Computing, is flipping the script. Instead of focusing on systems that blindly execute orders, her work tackles a much more ambitious question: What would it take to build AI that can truly understand our intentions and act as collaborative partners? The implications stretch far beyond virtual assistants, reaching factory robots, self-driving cars, web agents, and more. This is the frontier of cooperative intelligence.
Why Cooperation, Not Just Computation, Matters
Humans are naturally cooperative. From a young age, we learn to read each other’s actions, understand goals, and offer help without being explicitly told what to do. A famous psychology study showed that even 18-month-old toddlers can infer someone’s goal (say, putting books into a cabinet) and act to help, such as by opening the cabinet doors. No words are exchanged. Just observation, understanding, and action.
Now compare that to your average AI assistant. It doesn’t infer. It executes, and often quite poorly if your command is even slightly ambiguous. As AI systems are deployed in more critical contexts, such as assisting drivers, helping workers on assembly lines, or managing online transactions, the stakes of this kind of misunderstanding grow substantially. An AI agent that misinterprets your intent might be merely annoying, or it might be catastrophically harmful.
That’s where Dr. Tan’s research steps in, with a framework that draws on insights from cognitive science, planning theory, and probabilistic reasoning to scale cooperative intelligence in machines.
The First Breakthrough: Inverse Planning with SIPS
The first major innovation is a system called Sequential Inverse Plan Search, or SIPS for short. It tackles a fundamental problem in AI: How do you infer a person’s goal from their actions, especially when you don’t know the goal ahead of time and the actions might not be perfectly efficient?
SIPS flips traditional AI planning on its head. Instead of mapping a path to a known goal, it observes actions and works backward to infer which goal most plausibly produced them. Imagine watching someone walk across a room, pick up a red key, and unlock a door. A traditional AI might see unrelated steps. SIPS sees the pattern and infers the probable objective: perhaps retrieving a hidden item beyond the door.
What makes SIPS powerful is its combination of bounded rationality and particle filtering. It models humans as planning only a few steps ahead, not as flawless strategists. And it uses an approximation technique called sequential Monte Carlo (SMC) to keep a manageable number of goal hypotheses, updating their probabilities in real time as new actions are observed. Like a detective refining theories as clues emerge, SIPS homes in on the right goal quickly, processing over 30 actions per second – often faster than real-time human decision-making.
In tasks like key-and-door puzzles or block stacking (where a person might be spelling a word like “INK” or “PINK”), SIPS doesn’t just match human accuracy in goal inference; it often exceeds it, all while running 12 to 85 times faster than exact Bayesian approaches.
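To give a feel for what this looks like computationally, here is a deliberately minimal sketch in Python. It is not the actual SIPS implementation: the one-dimensional world, the goal names, and the parameters are all invented for illustration. It only shows the underlying recipe described above: model the person as a noisy, boundedly rational actor, score each observed action under every candidate goal, and use a particle filter (a form of sequential Monte Carlo) to keep the posterior over goals updated as actions stream in.

```python
import math
import random
from collections import Counter

# Hypothetical 1-D corridor world: the agent starts at cell 0 and may be
# heading toward one of several candidate goal cells. All names and numbers
# here are illustrative, not taken from the SIPS research.
GOALS = {"red_key": 3, "blue_key": -4, "exit_door": 7}
ACTIONS = {"left": -1, "right": +1, "stay": 0}
BETA = 2.0          # rationality: higher = closer to a perfect planner
N_PARTICLES = 200   # goal hypotheses tracked by the particle filter

def action_likelihood(pos, action, goal_cell):
    """Boltzmann (noisy-rational) model: actions that reduce the distance
    to the hypothesized goal are exponentially more probable."""
    scores = {a: -abs((pos + d) - goal_cell) for a, d in ACTIONS.items()}
    z = sum(math.exp(BETA * s) for s in scores.values())
    return math.exp(BETA * scores[action]) / z

def infer_goals(observed_actions):
    # Start with particles drawn uniformly over the goal hypotheses.
    particles = [random.choice(list(GOALS)) for _ in range(N_PARTICLES)]
    weights = [1.0] * N_PARTICLES
    pos = 0
    for action in observed_actions:
        # Reweight each particle by how well its goal explains the action.
        weights = [w * action_likelihood(pos, action, GOALS[g])
                   for w, g in zip(weights, particles)]
        pos += ACTIONS[action]
        # Resample in proportion to weight (the sequential Monte Carlo step).
        particles = random.choices(particles, weights=weights, k=N_PARTICLES)
        weights = [1.0] * N_PARTICLES
        posterior = Counter(particles)
        print(f"after '{action}':",
              {g: round(c / N_PARTICLES, 2) for g, c in posterior.items()})

if __name__ == "__main__":
    infer_goals(["right", "right", "right"])
```

In this toy example, three moves to the right are enough for the posterior to shift sharply toward the goals that lie in that direction, which is, in very simplified form, how the real system narrows down hypotheses as it watches a person act.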
Adding Language: The Power of CLIPS
But humans don’t just act; they speak. And language, while powerful, is notoriously ambiguous. “Can you get the forks and knives?” could mean one fork or three, depending on context. To address this, Dr. Tan’s second key innovation is Cooperative Language-guided Inverse Plan Search, or CLIPS for short.
CLIPS builds on SIPS by incorporating natural language cues, using a language model to estimate the probability of a given utterance under each hypothesized goal. If someone says “Can you pass me the red key?”, CLIPS strengthens the hypotheses involving goals that require the red key, merging language and action to disambiguate intent more precisely.
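As a rough illustration of that update, the sketch below shows the Bayes rule at the heart of this kind of language-guided inference. The goal names and probabilities are invented for this example, and the hand-written utterance likelihood is a stand-in for what CLIPS would obtain by asking a language model how likely the utterance is under each goal.

```python
# Minimal, illustrative Bayes update in the spirit of CLIPS: combine a prior
# over goals (e.g., from action-based inference) with the probability of the
# observed utterance under each goal. All numbers below are invented.

prior = {                      # P(goal), e.g., from observed actions so far
    "fetch_red_key": 0.4,
    "fetch_blue_key": 0.4,
    "open_exit_door": 0.2,
}

utterance_likelihood = {       # stand-in for P("Can you pass me the red key?" | goal)
    "fetch_red_key": 0.70,     # a person with this goal would plausibly say it
    "fetch_blue_key": 0.05,
    "open_exit_door": 0.15,    # the red key might still be needed for the door
}

def posterior_given_utterance(prior, likelihood):
    """Bayes' rule: P(goal | utterance) is proportional to P(utterance | goal) * P(goal)."""
    unnormalized = {g: prior[g] * likelihood[g] for g in prior}
    z = sum(unnormalized.values())
    return {g: p / z for g, p in unnormalized.items()}

print(posterior_given_utterance(prior, utterance_likelihood))
# -> roughly {'fetch_red_key': 0.85, 'fetch_blue_key': 0.06, 'open_exit_door': 0.09}
```

The key point is that language does not replace the action-based inference; it reweights it, so an ambiguous request still leaves some probability on the alternatives.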
This hybrid approach, combining structured probabilistic reasoning with large language model (LLM) capabilities, offers a striking contrast to relying on LLMs alone. In head-to-head tests with GPT-4V, a state-of-the-art multimodal LLM, CLIPS was dramatically more accurate and reliable. Where GPT-4V hallucinated objects that didn’t exist or made confident but incorrect guesses, CLIPS maintained nearly 97% accuracy with a much smaller model. It not only inferred goals correctly but acted on them safely and robustly.
Safety Through Uncertainty Awareness
A standout feature of this research is its sensitivity to uncertainty – a quality most current AI systems lack. For instance, if a user vaguely says “Open the red door” and multiple red doors exist, GPT-4V might pick one arbitrarily, potentially triggering harmful side effects. CLIPS, on the other hand, will take a conservative action, for example by choosing the door that keeps future options open, or by asking for clarification. That kind of caution isn’t hesitation; it’s safety by design.
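One simple way to operationalize that caution, sketched below with an invented confidence threshold and door names, is to commit to an action only when the posterior over goals is sufficiently concentrated, and to fall back to a clarifying question otherwise. The actual research reasons about this more carefully (for instance, by preferring actions that keep future options open), but the thresholded rule conveys the basic idea.

```python
# Illustrative decision rule for acting under goal uncertainty: act only when
# confident, otherwise ask. The threshold and door names are assumptions for
# this sketch, not values from the research.

CONFIDENCE_THRESHOLD = 0.9

def choose_action(goal_posterior):
    best_goal, best_prob = max(goal_posterior.items(), key=lambda kv: kv[1])
    if best_prob >= CONFIDENCE_THRESHOLD:
        return f"open {best_goal}"
    # Not confident enough: fall back to a conservative, clarifying move.
    candidates = ", ".join(sorted(goal_posterior))
    return f"ask: 'Which door do you mean: {candidates}?'"

print(choose_action({"red_door_hallway": 0.55, "red_door_basement": 0.45}))
# -> ask: 'Which door do you mean: red_door_basement, red_door_hallway?'
print(choose_action({"red_door_hallway": 0.97, "red_door_basement": 0.03}))
# -> open red_door_hallway
```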
This probabilistic awareness is critical in high-stakes applications. Whether navigating a warehouse, assisting in surgery, or interpreting financial commands, an AI that’s uncertain, and knows it, is often more trustworthy than one that barrels ahead with misplaced confidence.
Real-World Applications: From Games to Excel
These breakthroughs aren’t just theoretical. They point to clear, impactful use cases:
- Productivity tools: Imagine an AI assistant in Microsoft Excel that doesn’t just execute commands but infers your intentions based on prior actions. Instead of clicking through menus or repeating yourself, the assistant could say, “Do you want me to apply the same style to these totals?” and get it right.
- Smart NPCs in games: Non-player characters (NPCs) that can infer your strategy and respond cooperatively, not just follow rigid scripts. Think Skyrim companions that genuinely help rather than stand in your way.
- Web agents: AI tools that safely shop, schedule, or navigate websites on your behalf – because they understand your intent, not just the literal clicks or phrases you use.
- Robotic collaboration: On factory floors or in hospitals, robots that anticipate human needs based on gestures, partial commands, or past routines. Less friction, more fluid teamwork.
Toward a Cooperative Society: AI That Learns Social Norms
Dr. Tan’s work doesn’t stop at individual interactions. It scales upward, asking a bold question: What would it take for AI not just to cooperate with individuals, but to align with whole societies?
This opens the door to Bayesian norm learning. Picture a robot in a community that learns, by observing others, that overconsuming shared resources or polluting a river isn’t acceptable. Without being explicitly told, the robot updates its behavior to conform to cooperative norms. Simulations showed this learning to be six to seven orders of magnitude faster than older methods – an astonishing leap that could make socially intelligent AI a reality.
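As a cartoon of what such norm learning could look like, here is a small Bayesian sketch in which a newcomer watches how much of a shared resource others take each day and updates its beliefs over which consumption norm, if any, the community follows. The candidate norms, compliance rate, and observations are all invented for illustration; the actual research models norms and their enforcement far more richly.

```python
# Toy Bayesian norm learner: infer which consumption norm (if any) a community
# follows, purely from observed behavior. All values are illustrative.

CANDIDATE_NORMS = {"no_norm": None, "limit_2": 2, "limit_4": 4}

def likelihood(amount_taken, limit, compliance=0.95, max_amount=8):
    """P(observed amount | norm). Under a norm, compliant amounts (<= limit)
    are far more probable; with no norm, all amounts are equally likely."""
    if limit is None:
        return 1.0 / (max_amount + 1)
    if amount_taken <= limit:
        return compliance / (limit + 1)
    return (1 - compliance) / (max_amount - limit)

def update_beliefs(observations):
    beliefs = {n: 1.0 / len(CANDIDATE_NORMS) for n in CANDIDATE_NORMS}
    for amount in observations:
        # Bayesian update: reweight each candidate norm by the observation.
        beliefs = {n: p * likelihood(amount, CANDIDATE_NORMS[n])
                   for n, p in beliefs.items()}
        z = sum(beliefs.values())
        beliefs = {n: p / z for n, p in beliefs.items()}
    return beliefs

# Everyone observed takes two or fewer units per day.
print(update_beliefs([1, 2, 0, 2, 1, 2]))
```

After a handful of observations in which nobody takes more than two units, the belief mass concentrates on the stricter norm, without anyone ever stating the rule explicitly.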
As AI becomes more embedded in daily life, learning not just what to do but how to behave will be essential. Whether in self-driving cars yielding appropriately in traffic, or financial agents acting ethically in markets, norm-aware AI offers a path forward that blends intelligence with responsibility.
The Vision Ahead
Dr. Tan’s research proposes a fundamental shift, from command-following machines to cooperative partners. It does so not by adding more data or inflating model size but by building smarter architectures grounded in how humans think, act, and collaborate.
By combining goal inference, language grounding, probabilistic reasoning, and norm learning, these systems represent a blueprint for safer, more capable AI. They’re not perfect yet, but they point toward an ecosystem where AI doesn’t just work for us, but with us.
And that future feels not only more effective, but more human.
Final Thought: Rethinking the Foundations of Human-AI Interaction
What if, instead of having to master the quirks of AI tools, the AI learned to adapt to us?
This is the fundamental promise of Dr. Tan Zhi Xuan’s research, and it could radically reshape the way we think about human-AI collaboration in the years ahead. Today’s dominant models of artificial intelligence, especially large language models (LLMs), are powerful but brittle. They are trained on massive datasets, generate confident-sounding outputs, and perform impressively in many tasks. But they often lack one crucial ingredient: a true understanding of intent. They don’t reason about goals, they don’t ask when they’re unsure, and they don’t consider whether their actions could be premature or harmful.
What Dr. Tan’s work offers is a blueprint for a different kind of AI: one that is not just intelligent, but cooperative, cautious, and context-aware. And that distinction matters deeply as AI moves from the lab into high-stakes, real-world environments.
We can envision a few areas where these new approaches could shape the future.
- AI That Truly Collaborates, Not Just Obeys
Most current AI assistants are command-executors. They interpret inputs literally and do what they’re told, sometimes to a fault. But real collaboration requires more than obedience. It requires inference, initiative, and a sense of what the user is really trying to do, even if the request is incomplete, ambiguous, or evolving.
The intent inference methods developed here (SIPS and CLIPS) give AI systems the ability to look past surface forms and into underlying goals. In the future, this could power tools that anticipate your needs and fill in the blanks in cooperative, safe ways. Imagine an AI designer that notices you’re aligning objects symmetrically and offers to mirror the rest. Or a scheduling assistant that sees your afternoon filling up and proactively blocks time for lunch, understanding your routine without being told.
This isn’t science fiction. It’s a next step, enabled by better inference, smarter planning, and architectures that are more aligned with human behavior.
- Safer AI Through Uncertainty Awareness
A standout feature of these approaches is that the AI doesn’t just guess your goal; it estimates its confidence in that guess. This makes it vastly more reliable in uncertain or risky situations.
In domains like healthcare, finance, or logistics, where mistakes have real consequences, an AI that’s willing to say “I’m not sure yet” and act conservatively could be the difference between success and disaster. For example, in robotic surgery, an assistant system using CLIPS could hold back on an action if it’s not confident it understood the surgeon’s command, preventing a costly or dangerous move.
This kind of principled uncertainty modeling stands in contrast to black-box systems that generate an answer no matter what, even if it’s based on hallucinated data. As generative AI becomes more embedded in critical systems, safety through introspection will become not just beneficial, but essential.
- AI That Learns Norms and Earns Trust
The final layer of Dr. Tan’s research points to something even more ambitious: AI that doesn’t just serve individuals, but participates meaningfully in shared social contexts.
By observing how humans behave, cooperate, and enforce norms, AI systems could learn what’s appropriate in different situations, without needing every rule explicitly programmed. That opens the door to AI systems that don’t just optimize for efficiency or reward, but for social compatibility.
In the long term, this kind of norm-sensitive intelligence could be a pillar of trustworthy AI governance. Imagine autonomous agents that know how to respect community standards, share resources fairly, or mediate conflicts, all because they’ve learned how we expect others to behave.
This also suggests a profound shift in how we measure AI’s success – not just by task performance, but by how well it aligns with human values, as expressed in lived practice. It’s not hard to see this being critical for AI adoption in areas like eldercare, education, public services, or diplomacy.
- A More Human-Compatible AI Future
Ultimately, this research helps move us away from brittle, monolithic models and toward AI systems that adapt intelligently to the people they serve.
It’s a vision of AI that meets us halfway – systems that infer our intentions, express when they’re unsure, align with our norms, and collaborate fluidly without constant micro-management. That’s the kind of AI that people will actually want to use, not because it dazzles with novelty, but because it feels like a capable, trustworthy partner.
As policymakers, designers, and technologists wrestle with questions about AI ethics, safety, and long-term impact, these kinds of cooperative architectures offer a concrete path forward. They suggest that the future of AI may not be about building ever-larger models, but about designing systems that understand how to live and work with us.
If we take that path seriously, we may find that the real revolution in AI isn’t in how much data it consumes or how fast it runs, but in how well it cooperates, reasons, and aligns with the world we want to build. In that future, AI won’t just execute our instructions. It will understand our goals, share our values, and help us achieve them, together.