Jiajun Wu is an Assistant Professor of Computer Science at Stanford University, working on computer vision, machine learning, and computational cognitive science. Before joining Stanford, he was a Visiting Faculty Researcher at Google Research. He received his PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. Wu's research has been recognized through the AFOSR Young Investigator Research Program (YIP), the ACM Doctoral Dissertation Award Honorable Mention, the AAAI/ACM SIGAI Doctoral Dissertation Award, the MIT George M. Sprowls PhD Thesis Award in Artificial Intelligence and Decision-Making, the 2020 Samsung AI Researcher of the Year, the IROS Best Paper Award on Cognitive Robotics, and faculty research awards from JPMC, Samsung, Amazon, and Meta.
Understanding the Visual World Through Naturally Supervised Code
The visual world has its inherent structure: scenes are made of multiple identical objects; different objects may have the same color or material, with a regular layout; each object can be symmetric and have repetitive parts. How can we infer, represent, and use such structure from raw data, without hampering the expressiveness of neural networks? In this talk, I will demonstrate that such structure, or code, can be learned from natural supervision. Here, natural supervision can be from pixels, where neuro-symbolic methods automatically discover repetitive parts and objects for scene synthesis. It can also be from objects, where humans during fabrication introduce priors that can be leveraged by machines to infer regular intrinsics such as texture and material. When solving these problems, structured representations and neural nets play complementary roles: it is more data-efficient to learn with structured representations, and they generalize better to new scenarios with robustly captured high-level information; neural nets effectively extract complex, low-level features from cluttered and noisy visual data.