Invited Speakers and Abstracts
Speaker 1: Paul Rosin, Cardiff University
Bio: Paul L. Rosin is a Professor at the School of Computer Science & Informatics, Cardiff University. He has worked on many aspects of computer vision over the last 40 years, covering both fundamental algorithms in areas such as low-level image processing, performance evaluation, shape analysis, facial analysis, medical image analysis, surveillance, 3D mesh processing, cellular automata and non-photorealistic rendering, and multidisciplinary collaborations such as determining the effectiveness of surgery from facial morphology and temporal dynamics, the perception of trustworthiness from smiles, segmentation of 3D OCT scans of retinas, interpreting lava flows, identification of desmids and otoliths, analysing the effects of alcohol on crowd dynamics and violence, and digitally unrolling fragile parchments from 3D X-ray scans.
Title: Analysing Scrolls (and Films)
Abstract:
Historical parchment scrolls are fragile, and consequently many of them cannot be unrolled, so their contents have remained hidden for centuries. I will describe several stages in the development of our method to perform a "virtual unrolling" of such documents from their X-ray tomographic scans. A critical element is the segmentation of the images in order to separate the parchment from the background, which is complicated by both holes in the parchment and the fusing of adjacent rolled layers. Since this causes standard segmentation algorithms to produce errors, we have devised several new algorithms to cope with such data. Once segmentation is achieved, the parchment is virtually flattened, and the ink density is recovered to produce a reconstruction of the parchment surface. An example of our results is shown on the previously unseen fifteenth-century Bressingham scroll. In addition, results will be shown from a modified version of the algorithm, which has been applied to recover frames from X-ray scans of film.
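For readers curious how the stages named in the abstract fit together, the sketch below outlines a schematic segmentation-flattening-ink-recovery pipeline. It is only an illustrative assumption, not the speaker's implementation: the function names, the plain intensity threshold, and the crude spiral parameterisation are stand-ins for the more sophisticated algorithms the talk describes.

```python
# Illustrative sketch of a scroll "virtual unrolling" pipeline (assumed
# structure, not the speaker's method): segment parchment voxels in a CT
# volume, trace a rolled layer, flatten it, and sample ink density.
import numpy as np

def segment_parchment(volume: np.ndarray, threshold: float) -> np.ndarray:
    """Binary mask of parchment voxels; a real method must also split fused
    adjacent layers and bridge holes, which a plain threshold cannot do."""
    return volume > threshold

def trace_layer(mask: np.ndarray) -> np.ndarray:
    """Placeholder: return (N, 3) voxel coordinates of one rolled layer,
    e.g. by following the spiral cross-section slice by slice."""
    return np.argwhere(mask)

def flatten_layer(layer_xyz: np.ndarray) -> np.ndarray:
    """Placeholder parameterisation: map 3D layer points to a 2D (u, v) sheet."""
    u = np.linalg.norm(layer_xyz[:, :2], axis=1)   # crude radial arc proxy
    v = layer_xyz[:, 2]                            # position along the scroll axis
    return np.stack([u, v], axis=1)

def recover_ink(volume: np.ndarray, layer_xyz: np.ndarray) -> np.ndarray:
    """Sample X-ray density at each layer point; denser ink reads differently."""
    x, y, z = layer_xyz.T
    return volume[x, y, z]

def virtually_unroll(volume: np.ndarray, threshold: float = 0.5):
    mask = segment_parchment(volume, threshold)
    layer = trace_layer(mask)
    uv = flatten_layer(layer)
    ink = recover_ink(volume, layer)
    return uv, ink  # scatter ink values onto the (u, v) sheet to form the image
```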
Speaker 2: Yu-Kun Lai, Cardiff University
Bio: Yu-Kun Lai is a Professor in the School of Computer Science & Informatics, Cardiff University, UK, where he was the Director of Research from 2019 to 2022. He is currently the Cardiff lead for the EPSRC AI Hub in Generative Models, where he co-leads the Multimodal Models working group. He received his bachelor's and PhD degrees from Tsinghua University in 2003 and 2008, respectively. His PhD thesis received the National Excellent Doctoral Thesis award of China. He has published over 100 research papers in top venues in Computer Graphics, Computer Vision and related areas. He is an associate editor for the international journals IEEE Transactions on Visualization and Computer Graphics, Computers & Graphics and The Visual Computer.
Title: Deep Generative Models for Controllable Visual Content Generation and Editing
Abstract:
In recent years, deep generative models have significantly reduced the effort needed for visual content creation, requiring only minimal user input such as a text prompt. However, there are often ambiguities, and automatically generated visual content may not follow users' intentions. While it is possible to re-generate the content, e.g. with modified text prompts, this cannot ensure that unedited regions are fully retained, nor can it achieve detailed control. In this talk, I will present some of our recent works that aim to provide intuitive, detailed control for visual content generation and editing, including image colorization and 3D content generation and editing, where intuitive user controls such as scribbles and sketches are used to ensure fine-grained control while maintaining quality and efficiency. I will also discuss some challenges for future research.
Speaker 3: Fanglue Zhang, Victoria University of Wellington
Bio: Fanglue Zhang is a Senior Lecturer at Victoria University of Wellington, New Zealand. He received his Ph.D. in Computer Science from Tsinghua University in 2015. Since 2009, Dr. Zhang has focused on research in computer graphics and intelligent image/video editing methods. He has proposed numerous innovative approaches in the structured representation, analysis, and synthesis of images and videos, as well as in perception-based visual media analysis and editing. He has published over 100 papers in international conferences and journals in the fields of computer graphics and artificial intelligence, including more than 30 papers in top-tier publications such as IEEE TPAMI, ACM SIGGRAPH/SIGGRAPH Asia, ACM TOG, IEEE TVCG, IEEE TIP, and AAAI. Dr. Zhang has received the Victoria University of Wellington Early-Career Research Excellence Award (2019) and the Royal Society of New Zealand Fast-Start Marsden Grant (2020). He has served as the Program Chair of Pacific Graphics 2020 and 2021, and as the Program Chair of CVM 2024. He is currently a member of the IEEE Central New Zealand Section and serves on the editorial boards of several international journals in computer graphics.
Title: Gaze-Driven 360-degree Scene Analysis and Enhancement
Abstract:
Panoramic images and videos can present 360-degree real-world scenes, providing users with a highly immersive experience. Compared to traditional virtual reality (VR) scenes generated through complex modeling and rendering, panoramic images and videos are directly captured from the real world, offering a more intuitive and comprehensive representation of the scene. Visual perception in panoramic environments plays a crucial role in the quality of the user experience, and understanding and analyzing users' visual perception is one of the core challenges in this field. This talk focuses on a key perceptual feature of 360-degree images and videos, the user scanpath, and explores how deep learning techniques can be applied to predict scanpaths and to enhance image quality based on such gaze trajectories.
Speaker 4: Tianjia Shao, Zhejiang University
Bio: Tianjia Shao is a ZJU100 Young Professor in the State Key Laboratory of CAD&CG, Zhejiang University. Previously, he was a Lecturer in the School of Computing, University of Leeds, UK. He received his PhD in Computer Science from the Institute for Advanced Study, Tsinghua University, under the guidance of Prof. Baining Guo, and his B.S. from the Department of Automation, Tsinghua University. During his PhD, he spent one year as a visiting researcher working with Prof. Niloy Mitra at University College London and four years as a research intern at Microsoft Research Asia. After that, he was an Assistant Researcher in the State Key Laboratory of CAD&CG, Zhejiang University.
Title: When Gaussian Meets Surfel: Ultra-fast High-fidelity Radiance Field Rendering
Abstract:
We introduce Gaussian-enhanced Surfels (GESs), a bi-scale representation for radiance field rendering, wherein a set of 2D opaque surfels with view-dependent colors represents the coarse-scale geometry and appearance of scenes, and a few 3D Gaussians surrounding the surfels supplement fine-scale appearance details. The entirely sorting-free rendering of GESs not only achieves very fast rendering rates, but also produces view-consistent images, successfully avoiding popping artifacts under view changes. Experimental results show that GESs advance the state of the art as a compelling representation for ultra-fast high-fidelity radiance field rendering.
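The container sketch below mirrors the bi-scale split described in the abstract (opaque 2D surfels for coarse geometry and appearance, nearby 3D Gaussians for fine detail). The field names, the spherical-harmonic color assumption, and the two-pass rendering comment are illustrative guesses, not the actual GES data layout or renderer.

```python
# Hypothetical sketch of a bi-scale "surfels + Gaussians" container; field
# names and the SH-color assumption are illustrative, not the GES code.
from dataclasses import dataclass
import numpy as np

@dataclass
class OpaqueSurfels:
    centers: np.ndarray      # (N, 3) surfel positions (coarse-scale geometry)
    normals: np.ndarray      # (N, 3) orientations of the 2D disks
    radii: np.ndarray        # (N,)   disk sizes
    sh_coeffs: np.ndarray    # (N, K, 3) view-dependent color coefficients

@dataclass
class DetailGaussians:
    means: np.ndarray        # (M, 3) positions near the surfels
    scales: np.ndarray       # (M, 3) anisotropic extents
    rotations: np.ndarray    # (M, 4) quaternions
    opacities: np.ndarray    # (M,)
    sh_coeffs: np.ndarray    # (M, K, 3) fine-scale appearance details

def render(surfels: OpaqueSurfels, gaussians: DetailGaussians, camera) -> np.ndarray:
    """Schematic two-pass idea: opaque surfels fix depth and base color per
    pixel, then the few Gaussians add detail against that depth buffer,
    which is what makes a global back-to-front sort unnecessary."""
    raise NotImplementedError("rasterization details omitted in this sketch")
```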
Speaker 5: Zhibin Niu, Tianjin University
Bio: Zhibin Niu is an associate professor with the College of Intelligence and Computing, Tianjin University. He received his Ph.D. in Computer Science from Cardiff University, his M.S. from Shanghai Jiao Tong University, and his B.S. from Tianjin University. Before joining Tianjin University, he was a Marie Skłodowska-Curie Fellow supported by the European Commission, with visiting experience at UC Berkeley (USA) and Johannes Kepler University (Austria). His research includes work on feature recognition methods that significantly improve computational efficiency and have been endorsed by Airbus and Rolls-Royce. His current research interests include intelligent visual analytics, data mining, and their applications.
Title: BMap: Preserving Spatial Structure and Temporal Coherence in Dynamic High-Dimensional Data Visualization
Abstract:
In many scientific and engineering fields, we face the challenge of visualizing high-dimensional data that evolves over time. Traditional dimensionality reduction techniques, such as t-SNE or UMAP, are powerful for static datasets, but when applied frame by frame to dynamic data, they often cause layouts to shift unpredictably. This disrupts the user’s mental map and makes it difficult to track changes reliably. To address this, we introduce BMap, a new framework designed specifically for dynamic data visualization. The key idea is to augment existing nonlinear dimensionality reduction methods with multi-scale spatial constraints and efficient interpolation mechanisms. By combining features from clustering and spatial pyramid techniques with weighted constraints based on frame-to-frame differences, BMap keeps stable points steady while highlighting meaningful changes. In addition, an incremental projection strategy allows new data to be integrated smoothly without high computational cost. Through extensive experiments, including both quantitative metrics and qualitative evaluations, we show that BMap achieves better spatial quality, stronger temporal stability, and higher efficiency compared to state-of-the-art methods. Overall, BMap provides a balanced and versatile solution for visualizing dynamic high-dimensional data.
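To make the frame-to-frame anchoring idea from the abstract concrete, the sketch below shows one simple way such a weighted constraint could behave: points whose high-dimensional features barely changed are pulled back toward their previous layout positions, while points that changed a lot are free to move. The exponential weighting and blending scheme are assumptions for illustration, not the BMap formulation.

```python
# Illustrative per-frame stabilization for a dynamic 2D embedding (assumed
# scheme, not BMap itself): blend new positions with previous ones, weighted
# by how much each point actually drifted in the high-dimensional space.
import numpy as np

def stabilize_frame(prev_embed: np.ndarray,   # (N, 2) layout at frame t-1
                    new_embed: np.ndarray,    # (N, 2) fresh projection at frame t
                    prev_hd: np.ndarray,      # (N, D) high-dim data at t-1
                    new_hd: np.ndarray,       # (N, D) high-dim data at t
                    strength: float = 1.0) -> np.ndarray:
    change = np.linalg.norm(new_hd - prev_hd, axis=1)      # per-point drift
    scale = change.mean() + 1e-8
    anchor = np.exp(-strength * change / scale)            # near 1 = keep old position
    return anchor[:, None] * prev_embed + (1.0 - anchor[:, None]) * new_embed
```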
Speaker 6: Shu-Yu Chen, Institute of Computing Technology, Chinese Academy of Sciences
Bio: Shu-Yu Chen is an Associate Professor in the Institute of Computing Technology, Chinese Academy of Sciences. She received the PhD degree in computer science and technology from the University of Chinese Academy of Sciences. Her research interests include computer graphics, computer vision and deep learning. She was selected for the Shi Qingyun Female Scientist Award of the Chinese Society of Image and Graphics (QingYing Group) and the CCF CAD&CG Outstanding Graphics Open Software Award.
Title: Feature-Disentangled Facial Animation Generation
Abstract:
High-quality facial synthesis and modeling are crucial areas within computer graphics. With the rapid advancement of digital and intelligent technologies, there is a growing demand from industry for realistic digital humans. Traditional production methods are highly specialized, costly, and time-consuming. Furthermore, existing techniques are often limited by low reconstruction efficiency, rigid expressions, and a lack of facial detail, which restricts their application. This presentation will introduce a novel method for 3D face reconstruction and motion generation based on Neural Radiance Fields (NeRF). The core of our approach is disentangling coupled facial features into geometry, motion, and shading in order to enable the synthesis of highly realistic and controllable 3D faces.
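As a rough illustration of what "disentangled" conditioning can look like in a NeRF-style model, the sketch below feeds separate geometry, motion, and shading codes alongside the query point into a small MLP. The architecture, code dimensions, and class name are assumptions made for this example only and do not describe the speaker's model.

```python
# Minimal sketch (assumed architecture, not the speaker's model) of a NeRF-style
# MLP conditioned on separate geometry / motion / shading latent codes.
import torch
import torch.nn as nn

class DisentangledFaceNeRF(nn.Module):
    def __init__(self, code_dim: int = 64, hidden: int = 128):
        super().__init__()
        in_dim = 3 + 3 * code_dim            # query point + three latent codes
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),            # (density, r, g, b)
        )

    def forward(self, points, geometry_code, motion_code, shading_code):
        # points: (N, 3); each code: (code_dim,) shared across all query points
        n = points.shape[0]
        codes = torch.cat([geometry_code, motion_code, shading_code], dim=-1)
        x = torch.cat([points, codes.expand(n, -1)], dim=-1)
        out = self.mlp(x)
        density, rgb = out[:, :1], torch.sigmoid(out[:, 1:])
        return density, rgb
```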
Speaker 7: Xiao-Lei Li, Tsinghua University
Bio: Xiao-Lei Li is a Ph.D. candidate in the CSCG Lab at Tsinghua University in Beijing, China, where he conducts research in 3D Vision, AIGC, and Computer Graphics under the supervision of Prof. Shi-Min Hu. Prior to this, he completed his Master's degree in the ITML Lab at Tsinghua University, advised by Prof. Shu-Tao Xia. He received his bachelor's degree from the Department of Physics at Jilin University. He spent time as a research intern in the Visual Computing Group at Microsoft Research Asia, where he worked with Dr. Xin Tong and Dr. Jiaolong Yang. He was also a visiting student in the IIIS Department at Tsinghua University, under the supervision of Prof. Li Yi.
Title: From Objects to Scenes: Structured and Controllable 3D Generation with Diffusion Models
Abstract:
Recent progress in 3D generation has opened the door to automating traditionally labor-intensive asset creation, but practical deployment requires more than producing isolated shapes: it demands controllability, editability, and scalability from individual objects to complex scenes. In this talk, I will present two complementary directions toward this goal. First, we study the native latent space of 3D diffusion models and reveal its decomposable, low-rank structure. Building on this understanding, we propose RELATE3D, which enables precise part-level editing and local enhancement of generated geometry through a refocusing adapter and part-to-latent correspondence-guided training. Second, we tackle the challenge of scalable 3D scene generation by proposing DIScene, a method that distills knowledge from pretrained 2D diffusion models into a structured scene-graph representation. Each node models an object with geometry, appearance, and text alignment, while edges capture object interactions, enabling flexible input modalities, consistent style, and interaction-aware optimization. Together, these works illustrate a path toward structured, controllable, and semantically aligned 3D generation, from fine-grained object editing to full-scene synthesis, providing a foundation for more efficient and expressive 3D content creation workflows.
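To picture the scene-graph representation the abstract describes (object nodes carrying geometry, appearance, and text alignment; edges capturing interactions), here is a bare-bones data-structure sketch. The field names, relation vocabulary, and example scene are hypothetical illustrations, not the DIScene implementation.

```python
# Hypothetical scene-graph container for structured 3D scene generation;
# names and fields are assumptions for illustration, not DIScene's code.
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    name: str                # e.g. "table"
    prompt: str              # text the object should stay aligned with
    geometry_latent: list    # placeholder for the object's shape latent
    appearance_latent: list  # placeholder for texture / material latent

@dataclass
class RelationEdge:
    subject: str             # node name
    target: str              # node name
    relation: str            # e.g. "on_top_of", "next_to"

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)   # name -> ObjectNode
    edges: list = field(default_factory=list)   # RelationEdge list

    def add_object(self, node: ObjectNode):
        self.nodes[node.name] = node

    def relate(self, subject: str, relation: str, target: str):
        self.edges.append(RelationEdge(subject, target, relation))

# Example: a cup resting on a table; each node would be optimized against its
# own prompt while the edge drives interaction-aware placement.
scene = SceneGraph()
scene.add_object(ObjectNode("table", "a rustic wooden table", [], []))
scene.add_object(ObjectNode("cup", "a blue ceramic cup", [], []))
scene.relate("cup", "on_top_of", "table")
```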