Registration is closed. How to get there!
COM1 Seminar Room 1 (COM1 02-06)


We are running the Singapore Vision Day as part of the NUS School of Computing 25th anniversary celebration.

Singapore has several strong pockets of computer vision researchers spread across the island in both academia and industry. We wish to bring everyone together with the following motivation:

  • Research: Foster exchange of research ideas
  • Community: Community building, bringing together everyone who works in computer vision
  • Image: A stronger community or ensemble movement helps to cultivate Singapore's image as an AI hub for Southeast Asia
  • Recruitment: Students working in computer vision get exposure to companies that could be their future employers
  • Industrial Development: Raise awareness with companies interested in putting computer vision into their R&D plans






Day 1: 24 May 2023

Registration and Coffee 8.30am - 9.00am
Welcome and Opening: Opening Speech by Prof. Kian Lee Tan (Dean of School of Computing, NUS) 9.00am - 9.15am
Keynote Talk 1: Understanding the Visual World Through Naturally Supervised Code (Jiajun Wu; Stanford Uni) 9.15am - 10.15am
Guest Talk: AI As An Artistic Collaborator (Wyn-Lyn Tan; Visual Artist) 10.15am - 11.00am
Lightning Session 1:
Robby Tan (NUS); Ngai-Man Cheung (SUTD); Mengmi Zhang (NTU/ASTAR); Shengfeng HE (SMU); Guosheng Lin (NTU); Gim Hee Lee (NUS)
11.00am - 11.45am
Lunch and Poster Session 1 11.45am - 1.30pm
Keynote Talk 2: Towards 3D Representation Learning at Scale (Vincent Sitzmann; MIT) 1.30pm - 2.30pm
Lightning Session 2:
Angela Yao (NUS); Malika Meghjani (SUTD); Tat-Jen Cham (NTU); Chong Hwa Ngo (SMU); Mohan Kankanhalli (NUS); Na Zhao (SUTD)
2.30pm - 3.15pm
Teabreak + Poster Session 2 3.15pm - 3.45pm
Keynote Talk 3: Data driven simulation for AV (Or Litany; Nvidia, Incoming Technion) 3.45pm - 4.45pm
Lightning Session 3:
Ziwei Liu (NTU); Xinchao Wang (NUS); Peng Song (SUTD); Jun Liu (SUTD); Daniel Lin (SMU); Mike Shou (NUS)
4.45pm - 5.30pm

Day 2: 25 May 2023

Coffee 8.30am - 9.00am
Keynote Talk 4: Towards View-consistent and Photorealistic 3D Generation (Niloy Mitra; UCL) 9.00am - 10.00am
Huawei Talk: Technical Challenges from Huawei Central Media Technology Institute (Jia Cai; Huawei) 10.00am - 10.30am
Coffee + Poster Session 3 10.30am - 11.00am
Keynote Talk 5: The Rise of Neural Priors (Simon Lucey; Uni of Adelaide) 11.00am - 12.00pm

Poster Presentation Schedule

Poster Board Size: 1000mm x 1000mm (A1 size fits either way)
Please present your poster according to the poster session

Session 1: 11.45am - 1.30pm
Poster Board ID Name Institution
1 Jonathan Burton Barr NTU and A*STAR
2 Cheng Chen NTU
3 Mengqi Guo NUS
4 Alvin Heng NUS
5 Wenmiao Hu NUS
6 Hengguan Huang NUS
7 Yeying Jing NUS
8 Yi Li NUS
9 Xiao Liu A*STAR
10 Weng Fei Low NUS
11 QRiddhi Mandal G H Raisoni College of Engineering
12 Sanjay Saha NUS
13 Burak Satar NTU
14 Terence Sim NUS
15 Qiuxia Lin NUS
16 Conghui Hu NUS
17 Chen Li NUS
18 Zhiqi Shen NUS
Session 2: 3.15pm - 3.45pm
Poster Board ID Name Institution
1 Sutthiphong Srigrarom Temasek Lab NUS
2 Peng Wang The University of Hong Kong / NTU
3 Shida Wang NUS
4 Xin Wang SUTD
5 Jiadong Wang NUS
6 Xun Xu I2R, A*STAR
7 Ziwei Yu NUS
8 Yifan Zhang NUS
9 Ao Zhang NUS
10 Shihao Zhang NUS
11 Yuyang Zhao NUS
12 Zhedong Zheng NUS
13 Guodong Ding NUS
14 Dibyadip Chatterjee NUS
15 Zhuo Chen NUS
16 Anju Gopinath Ohio State
17 Heyuan Li NUS
18 Mikhail Kennerley Mohamed NUS

Keynote Speakers

Jiajun Wu is an Assistant Professor of Computer Science at Stanford University, working on computer vision, machine learning, and computational cognitive science. Before joining Stanford, he was a Visiting Faculty Researcher at Google Research. He received his PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. Wu's research has been recognized through the AFOSR Young Investigator Research Program (YIP), the ACM Doctoral Dissertation Award Honorable Mention, the AAAI/ACM SIGAI Doctoral Dissertation Award, the MIT George M. Sprowls PhD Thesis Award in Artificial Intelligence and Decision-Making, the 2020 Samsung AI Researcher of the Year, the IROS Best Paper Award on Cognitive Robotics, and faculty research awards from JPMC, Samsung, Amazon, and Meta.

Understanding the Visual World Through Naturally Supervised Code

Abstract: The visual world has its inherent structure: scenes are made of multiple identical objects; different objects may have the same color or material, with a regular layout; each object can be symmetric and have repetitive parts. How can we infer, represent, and use such structure from raw data, without hampering the expressiveness of neural networks? In this talk, I will demonstrate that such structure, or code, can be learned from natural supervision. Here, natural supervision can be from pixels, where neuro-symbolic methods automatically discover repetitive parts and objects for scene synthesis. It can also be from objects, where humans during fabrication introduce priors that can be leveraged by machines to infer regular intrinsics such as texture and material. When solving these problems, structured representations and neural nets play complementary roles: it is more data-efficient to learn with structured representations, and they generalize better to new scenarios with robustly captured high-level information; neural nets effectively extract complex, low-level features from cluttered and noisy visual data.

Vincent Sitzmann is an Assistant Professor at MIT EECS, where he leads the Scene Representation Group. Previously, he completed his Ph.D. at Stanford University and a postdoc at MIT CSAIL. His research interest lies in neural scene representations: the way neural networks learn to represent information about our world. His goal is to allow independent agents to reason about our world given visual observations, such as inferring a complete model of a scene, with information on geometry, material, lighting, etc., from only a few observations, a task that is simple for humans but currently impossible for AI.

Towards 3D Representation Learning at Scale

Abstract: Given only a single picture, people are capable of inferring a mental representation that encodes rich information about the underlying 3D scene. We acquire this skill not through massive labeled datasets of 3D scenes, but through self-supervised observation and interaction. Building machines that can infer similarly rich neural scene representations is critical if they are to one day parallel people’s ability to understand, navigate, and interact with their surroundings. In my talk, I will discuss how this motivates a 3D approach to self-supervised learning for vision. I will then present recent advances of my research group towards enabling us to train self-supervised scene representation learning methods at scale, on uncurated video without pre-computed camera poses. I will further present recent advances towards modeling of uncertainty in 3D scenes, as well as progress on endowing neural scene representations with more semantic, high-level information.

Or Litany is a senior research scientist at NVIDIA and an incoming assistant professor at the Technion. Before that, he was a postdoc at Stanford University, working under Prof. Leonidas Guibas, and at FAIR, hosted by Prof. Jitendra Malik. He received his PhD from Tel-Aviv University, where he was advised by Prof. Alex Bronstein. He received his B.Sc. in Physics and Mathematics from the Hebrew University under the auspices of “Talpiot”. His research interests include deep learning for 3D vision and geometry, and learning with reduced supervision.

Data driven simulation for AV

Abstract: Simulation is a critical tool for ensuring the safety of autonomous driving. However, traditional simulation methods can be labor-intensive and struggle to scale. In this talk, I will discuss an innovative neural simulation approach that learns to simulate driving scenarios from data. Specifically, I will focus on my latest research in three key areas: scene reconstruction in both appearance and geometry, motion generation of humans and vehicles, and LiDAR synthesis.

Niloy Mitra is a Professor of Geometry Processing in the Department of Computer Science, University College London (UCL). He received his MS and PhD in Electrical Engineering from Stanford University under the guidance of Leonidas Guibas and Marc Levoy, and was a postdoctoral scholar with Helmut Pottmann at Technical University Vienna. His research interests include shape analysis, computational design and fabrication, and geometry processing. For details, please visit the SmartGeometryProcessing page. Niloy received the 2013 ACM Siggraph Significant New Researcher Award for "his outstanding work in discovery and use of structure and function in 3D objects" (UCL press release), the BCS Roger Needham award (BCS press release) in 2015, and the Eurographics Outstanding Technical Contributions Award in 2019. He received the ERC Starting Grant on SmartGeometry in 2013. His work has twice been featured as a research highlight in the Communications of the ACM and was twice selected by ACM Siggraph/Siggraph Asia (both in 2017) for press release as a research highlight. Niloy was elected a Eurographics Fellow in 2021. Besides research, Niloy is an active DIYer and loves reading, bouldering, and cooking.

Towards View-consistent and Photorealistic 3D Generation

Abstract: A long-standing dream has been to develop a scalable 3D content creation workflow. In this talk, I will describe our recent steps towards this goal. Diffusion models have recently emerged as the best approach for generative modeling for 2D images. Part of their success is the possibility of training them on millions, if not billions, of images with a stable learning objective. However, extending these models to 3D remains difficult for two reasons. First, finding a large quantity of 3D training data is much more challenging than for 2D images, and in practice, one has access to only tens of thousands of 3D training samples. Second, while extending the models to operate on 3D rather than 2D grids is conceptually simple, the associated cubic growth in memory and compute complexity makes this infeasible. We address the first challenge by introducing a new diffusion setup that can be trained end-to-end, with only posed 2D images for supervision; and the second challenge by proposing an image formation model that decouples model memory from spatial memory. In this talk, I will also describe results using synthetic and real data and discuss how we can extend these models to produce high-quality photorealistic outputs. I will also discuss the relative merits of diffusion models compared to GAN-based counterparts.

Simon Lucey, Ph.D., is the Director of the Australian Institute for Machine Learning (AIML) and a professor in the School of Computer Science (SCS) at the University of Adelaide. Prior to this, he was an associate research professor at Carnegie Mellon University's Robotics Institute (RI) in Pittsburgh, USA, where he spent over 10 years as an academic. He was also Principal Research Scientist at the autonomous vehicle company Argo AI from 2017 to 2022. He has received various career awards, including an Australian Research Council Future Fellowship (2009-2013). Simon’s research interests span computer vision, machine learning, and robotics. He enjoys drawing inspiration from AI researchers of the past to attempt to unlock the computational and mathematical models that underlie the processes of visual perception.

The Rise of Neural Priors

Abstract: The performance of an AI is nearly always associated with the amount of data you have at your disposal. Self-supervised machine learning can help – mitigating tedious human supervision – but the need for massive training datasets in modern AI seems unquenchable. Sometimes it is not the amount of data, but the mismatch of statistics between the train and test sets – commonly referred to as bias – that limits the utility of an AI. In this talk I will explore a new direction based on the concept of a “neural prior” that relies on no training dataset whatsoever. A neural prior speaks to the remarkable ability of neural networks to both memorise training examples and generalise to unseen test examples. Though never explicitly enforced, the chosen architecture of a neural network applies an implicit neural prior to regularise its predictions. It is this property we will leverage for problems that historically suffer from a paucity of training data or out-of-distribution bias. We will demonstrate the practical application of neural priors to augmented reality, autonomous driving and noisy signal recovery – with many of these outputs already being taken up in industry.

Guest Speaker

Wyn-Lyn Tan is a visual artist working with painting. Her practice centres around her observations of the natural world and phenomena, and is driven by a visual language shaped through rhythm and intuition. For nearly two decades, she has worked in a range of media, from canvas to plexiglass, metal and wood. Her continuous push to discover space, light and new dimensions in painting has led to installations, sculptural objects, and, most recently, generative AI videos. Seen as an extension of her painting practice, the surreal generative AI videos she creates are the result of machine training on her own physical paintings. Wyn-Lyn has been the recipient of the Kunstnerstipend scholarship (2017) and Statens utstillingsstipend grant (2017), as well as artist residencies with VerticalCrypto Art (2022), the Inside-Out Art Museum, Beijing (2014) and The Arctic Circle Residency (2011). Wyn-Lyn has exhibited widely, including in Singapore, New York, Norway and Art Basel Hong Kong. Her work can be found in the permanent collection of the Singapore Art Museum, as well as numerous other public and private collections. She is represented by FOST Gallery (Singapore) and Sapar Contemporary (NYC).

AI As An Artistic Collaborator

Abstract: Artist Wyn-Lyn Tan talks about her journey from traditional art-making to working with technology, as she explores the use of AI in her artistic practice.

Huawei Talk

Cai Jia is a technical expert in autonomous driving perception at Huawei. His major research interests include visual perception of general obstacles, end-to-end autonomous driving, and 3D reconstruction (high geometrical accuracy).

Technical Challenges from Huawei Central Media Technology Institute

Abstract: First, a brief introduction to Huawei Central Media Technology Institute, 2012 Labs, will be given. Second, five technical challenges from Huawei Central Media Technology Institute will be released: Real-Time Rendering with Ray-Tracing Effects for Dynamic CG Scenes, Real-Time Parallel Solver for High-Precision Physics-based Simulation, Lightweight AI Image Encoding and Decoding Technology, High-Compression-Ratio Lossless Codec Algorithm for Audio Transmission over Bluetooth Channels, and 4D Occupancy Grid Map (OGM) Perception Technology. We invite academia to join our Challenge and stand a chance to win a prize and funding support for a research project.



Angela Yao
National University of Singapore


Gim Hee Lee
National University of Singapore


Thanks to ScanNet Indoor Scene Understanding Challenge for the webpage format.