Research Projects

  • Panoramic vision data analysis, processing and VR interaction, Key Program, National Natural Science Foundation, PI: Song-Hai Zhang, Project number: 6213000127, 2022-2026.
  • Deep learning algorithm and framework for computational visual media, Key International Joint Research Program, National Natural Science Foundation, PI: Shi-Min Hu, Project number: 62220106003, 2023-2027.
  • Narrative Visual Content Creation and Immersive Interaction of Panoramic Video, International Cooperation and Exchange Programs (NSFC-ISF), National Natural Science Foundation, PI: Song-Hai Zhang, Project number: 62361146854, 2024-2026.
  • Deep Learning Framework and Large Model Application Verification for Complex Heterogeneous Computing Systems, Major Program, National Natural Science Foundation, PI: Shi-Min Hu, Project number: 62495060, 2025-2029.


    2025





    Implicit Bonded Discrete Element Method with Manifold Optimization
    ACM Transactions on Graphics, 2025, Vol. 43.    
    Jia-Ming Lu, Geng-Chen Cao, Chenfeng Li, Shi-Min Hu

    This paper proposes a novel simulation approach that combines implicit integration with the Bonded Discrete Element Method (BDEM) to achieve faster, more stable and more accurate fracture simulation. The new method leverages the efficiency of implicit schemes in dynamic simulation and the versatility of BDEM in fracture modelling. Specifically, an optimization-based integrator for BDEM is introduced and combined with a manifold optimization approach to accelerate the solution process of the quaternion-constrained system. Our comparative experiments indicate that our method offers better scale consistency and more realistic collision effects than FEM and MPM fragmentation approaches. Additionally, our method achieves a computational speedup of 2.1 to 9.8 times over explicit BDEM methods.



    2024





    Tuning Vision-Language Models With Multiple Prototypes Clustering
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, Vol. 46, No. 12, 11186-11199.    
    Meng-Hao Guo, Yi Zhang, Tai-Jiang Mu, Sharon X. Huang, Shi-Min Hu





    DIScene: Object Decoupling and Interaction Modeling for Complex Scene Generation
    ACM SIGGRAPH Asia 2024 Conference Papers, 2024, Article No. 101, 1-12.
    Xiao-Lei Li, Haodong Li, Hao-Xiang Chen, Tai-Jiang Mu, and Shi-Min Hu

    This paper reconsiders how to distill knowledge from pretrained 2D diffusion models to guide 3D asset generation, in particular to generate complex 3D scenes: it should accept varied inputs, i.e., texts or images, to allow for flexible expression of requirement; objects in the scene should be style-consistent and decoupled with clearly modeled interactions, benefiting downstream tasks. We propose DIScene, a novel method for this task. It represents the entire 3D scene with a learnable structured scene graph: each node explicitly models an object with its appearance, textual description, transformation, geometry as a mesh attached with surface-aligned Gaussians; the graph's edges model object interactions. With this new representation, objects are optimized in the canonical space and interactions between objects are optimized by object-aware rendering to avoid wrong back-propagation. Extensive experiments demonstrate the significant utility and superiority of our approach and that DIScene can greatly facilitate 3D content creation tasks.




    FragmentDiff: A Diffusion Model for Fractured Object Assembly
    ACM SIGGRAPH Asia 2024 Conference Papers, 2024, Article No. 58, 1-12.
    Qun-Ce Xu, Hao-Xiang Chen, Jiacheng Hua, Xiaohua Zhan, Yong-Liang Yang, Tai-Jiang Mu

    Fractured object reassembly is a challenging problem in computer vision and graphics with applications in industrial manufacturing and archaeology. Traditional methods based on shape descriptors and geometric registration often struggle with ambiguous features, resulting in lower accuracy. Recent data-driven methods are inherently affected by the representation and learning ability of the trained models. To address this, we propose a novel approach inspired by diffusion models and transformers. Our method applies diffusion denoising via a transformer to predict the pose parameter of each fragment, taking advantage of their global feature correlation and pose prior learning abilities. We evaluate our approach on a fractured object dataset and demonstrate superior performance compared to state-of-the-art methods. Our method offers a promising solution for accurate and robust fractured object reassembly, advancing the field in complex shape analysis and assembly tasks.




    EVSplitting: An Efficient and Visually Consistent Splitting Algorithm for 3D Gaussian Splatting
    ACM SIGGRAPH Asia 2024 Conference Papers, 2024, Article No. 35, 1-11.
    Qi-Yuan Feng, Geng-Chen Cao, Hao-Xiang Chen, Qun-Ce Xu, Tai-Jiang Mu, Ralph Martin, Shi-Min Hu

    This paper presents EVSplitting, an efficient and visually consistent splitting algorithm for 3D Gaussian Splatting (3DGS). It is designed to make operating on 3DGS as easy and effective as operating on other explicit 3D representations, so that it is ready for industrial production. This goal poses three challenges: 1) the huge number and complex attributes of 3D Gaussians make it hard to operate on 3DGS explicitly in a real-time and learning-free manner; 2) the visual effect of 3DGS is very difficult to maintain during explicit operations; and 3) the anisotropy of Gaussians always leads to blur and artifacts. As far as we know, no prior work addresses these challenges well. In this work, we introduce a direct and efficient 3DGS splitting algorithm to solve them. Specifically, we formulate 3DGS splitting as two minimization problems that aim to ensure visual consistency and to reduce Gaussian overflow across the boundary (the splitting plane), respectively. First, we impose conservation of the zero-, first- and second-order moments of the weighted Gaussian distribution to guarantee visual consistency. Second, we reduce the boundary overflow with a special constraint on these conservation conditions. With these conservation conditions and constraints, we derive a closed-form solution for the 3DGS splitting problem. This yields an easy-to-implement, plug-and-play, efficient and fundamental tool, benefiting various downstream applications of 3DGS.
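    The moment-conservation idea in the abstract above can be illustrated in one dimension. The sketch below is not the paper's closed-form 3D solution and does not include its boundary-overflow constraint; it only assumes a weighted 1D Gaussian cut at a point t, and replaces each side by a Gaussian that matches the mass, mean and variance of the corresponding truncated piece. The function names and the 1D setting are illustrative assumptions.

```python
import math


def pdf(a):
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def cdf(a):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def split_gaussian_1d(weight, mu, sigma, t):
    """Split a weighted 1D Gaussian at x = t (hypothetical helper).

    Returns two (weight, mean, variance) triples whose zero-, first- and
    second-order moments add up to those of the original Gaussian.
    """
    a = (t - mu) / sigma
    p_left = cdf(a)
    p_right = 1.0 - p_left
    lam_left = pdf(a) / p_left      # ratio used by truncated-normal moments
    lam_right = pdf(a) / p_right
    left = (
        weight * p_left,
        mu - sigma * lam_left,
        sigma ** 2 * (1.0 - a * lam_left - lam_left ** 2),
    )
    right = (
        weight * p_right,
        mu + sigma * lam_right,
        sigma ** 2 * (1.0 + a * lam_right - lam_right ** 2),
    )
    return left, right


def raw_moments(weight, mean, var):
    """Zero-, first- and second-order raw moments of a weighted Gaussian."""
    return (weight, weight * mean, weight * (var + mean * mean))


left, right = split_gaussian_1d(weight=0.8, mu=1.0, sigma=2.0, t=1.5)
total = [l + r for l, r in zip(raw_moments(*left), raw_moments(*right))]
original = raw_moments(0.8, 1.0, 4.0)
```

    Here `total` and `original` agree up to floating-point error, which is the 1D analogue of the conservation conditions the abstract describes.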




    CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization
    ACM Transactions on Graphics, 2024, Vol. 43, No. 4, article number: 84, 1-13, ACM SIGGRAPH.    
    Hao-Yang Peng, Jia-Peng Zhang, Meng-Hao Guo, Yan-Pei Cao, Shi-Min Hu

    In the field of digital content creation, generating high-quality 3D characters from single images is challenging, especially given the complexities of various body poses and the issues of self-occlusion and pose ambiguity. In this paper, we present CharacterGen, a framework developed to efficiently generate 3D characters. CharacterGen introduces a streamlined generation pipeline along with an image-conditioned multi-view diffusion model. This model effectively calibrates input poses to a canonical form while retaining key attributes of the input image, thereby addressing the challenges posed by diverse poses. A transformer-based, generalizable sparse-view reconstruction model is the other core component of our approach, facilitating the creation of detailed 3D models from multi-view images. We also adopt a texture-back-projection strategy to produce high-quality texture maps. Additionally, we have curated a dataset of anime characters, rendered in multiple poses and views, to train and evaluate our model. Our approach has been thoroughly evaluated through quantitative and qualitative experiments, showing its proficiency in generating 3D characters with high-quality shapes and textures, ready for downstream applications such as rigging and animation.




    LC-NeRF: Local Controllable Face Generation in Neural Radiance Field
    IEEE Transactions on Visualization and Computer Graphics, 2024, Vol. 30, No. 8, 5437-5448.    
    Wen-Yang Zhou, Lu Yuan, Shu-Yu Chen, Lin Gao, Shi-Min Hu

    3D face generation has achieved high visual quality and 3D consistency thanks to the development of neural radiance fields (NeRF). However, these methods model the whole face as a neural radiance field, which limits the controllability of the local regions. In other words, previous methods struggle to independently control local regions, such as the mouth, nose, and hair. To improve local controllability in NeRF-based face generation, we propose LC-NeRF, which is composed of a Local Region Generators Module (LRGM) and a Spatial-Aware Fusion Module (SAFM), allowing for geometry and texture control of local facial regions. The LRGM models different facial regions as independent neural radiance fields and the SAFM is responsible for merging multiple independent neural radiance fields into a complete representation. Finally, LC-NeRF enables the modification of the latent code associated with each individual generator, thereby allowing precise control over the corresponding local region. Qualitative and quantitative evaluations show that our method provides better local controllability than state-of-the-art 3D-aware face generation methods. A perception study reveals that our method outperforms existing state-of-the-art methods in terms of image quality, face consistency, and editing effects. Furthermore, our method exhibits favorable performance in downstream tasks, including real image editing and text-driven facial image editing.




    SceneDirector: Interactive Scene Synthesis by Simultaneously Editing Multiple Objects in Real-Time
    IEEE Transactions on Visualization and Computer Graphics, 2024, Vol. 30, No. 8, 4558-4569.
    Shao-Kui Zhang, Hou Tam, Yike Li, Ke-Xin Ren, Hongbo Fu, Song-Hai Zhang





    Mesh Neural Networks Based on Dual Graph Pyramids
    IEEE Transactions on Visualization and Computer Graphics, 2024, Vol. 30, No. 7, 4211-4224.    
    Xiang-Li Li, Zheng-Ning Liu, Tuo Chen, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu

    Deep neural networks (DNNs) have been widely used for mesh processing in recent years. However, current DNNs cannot process arbitrary meshes efficiently. On the one hand, most DNNs expect 2-manifold, watertight meshes, but many meshes, whether manually designed or automatically generated, may have gaps, non-manifold geometry, or other defects. On the other hand, the irregular structure of meshes also brings challenges to building hierarchical structures and aggregating local geometric information, both of which are critical for DNNs. In this paper, we present DGNet, an efficient, effective and generic deep neural mesh processing network based on dual graph pyramids; it can handle arbitrary meshes. First, we construct dual graph pyramids for meshes to guide feature propagation between hierarchical levels for both downsampling and upsampling. Second, we propose a novel convolution to aggregate local features on the proposed hierarchical graphs. By utilizing both geodesic neighbors and Euclidean neighbors, the network enables feature aggregation both within local surface patches and between isolated mesh components. Experimental results demonstrate that DGNet can be applied to both shape analysis and large-scale scene understanding. Furthermore, it achieves superior performance on various benchmarks, including ShapeNetCore, HumanBody, ScanNet and Matterport3D. Code and models will be available at https://github.com/li-xl/DGNet.




    Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, 16912-16922.    
    Zi-Kai Xiao, Guo-Ye Yang, Xue Yang, Tai-Jiang Mu, Junchi Yan, Shi-Min Hu

    Considerable efforts have been devoted to Oriented Object Detection (OOD). However, one lasting issue regarding the discontinuity in Oriented Bounding Box (OBB) representation remains unresolved, which is an inherent bottleneck for extant OOD methods. This paper endeavors to completely solve this issue in a theoretically guaranteed manner and puts an end to the ad-hoc efforts in this direction. Prior studies typically can only address one of the two cases of discontinuity: rotation and aspect ratio, and often inadvertently introduce decoding discontinuity, e.g. Decoding Incompleteness (DI) and Decoding Ambiguity (DA) as discussed in literature. Specifically, we propose a novel representation method called Continuous OBB (COBB), which can be readily integrated into existing detectors, e.g. Faster-RCNN, as a plugin. It can theoretically ensure continuity in bounding box regression, which, to our best knowledge, has not been achieved in literature for rectangle-based object representation. For fairness and transparency of experiments, we have developed a modularized benchmark based on the open-source deep learning framework Jittor's detection toolbox JDet for OOD evaluation. On the popular DOTA dataset, by integrating Faster-RCNN as the same baseline model, our new method outperforms the peer method Gliding Vertex by 1.13% mAP50 (relative improvement 1.54%), and 2.46% mAP75 (relative improvement 5.91%), without any tricks.




    Wonder3D: Single Image to 3D Using Cross-Domain Diffusion
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, 9970-9980.    
    Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin, Yuan Liu, Zhiyang Dou, Lingjie Liu, Yuexin Ma, Song-Hai Zhang, Marc Habermann, Christian Theobalt, Wenping Wang

    In this work, we introduce Wonder3D, a novel method for efficiently generating high-fidelity textured meshes from single-view images. Recent methods based on Score Distillation Sampling (SDS) have shown the potential to recover 3D geometry from 2D diffusion priors, but they typically suffer from time-consuming per-shape optimization and inconsistent geometry. In contrast, certain works directly produce 3D information via fast network inferences, but their results are often of low quality and lack geometric details. To holistically improve the quality, consistency, and efficiency of single-view reconstruction tasks, we propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. To ensure the consistency of generation, we employ a multi-view cross-domain attention mechanism that facilitates information exchange across views and modalities. Lastly, we introduce a geometry-aware normal fusion algorithm that extracts high-quality surfaces from the multi-view 2D representations in only 2 to 3 minutes. Our extensive evaluations demonstrate that our method achieves high-quality reconstruction results, robust generalization, and good efficiency compared to prior works.




    BiRD: Using Bidirectional Rotation Gain Differences to Redirect Users during Back-and-forth Head Turns in Walking
    IEEE Transactions on Visualization and Computer Graphics, Vol. 30, No. 4, 1916-1926.    
    Sen-Zhe Xu, Fiona Xiao Yu Chen, Ran Gong, Fang-Lue Zhang, Song-Hai Zhang

    Redirected walking (RDW) facilitates user navigation within expansive virtual spaces despite the constraints of limited physical spaces. It employs discrepancies between human visual-proprioceptive sensations, known as gains, to enable the remapping of virtual and physical environments. In this paper, we explore how to apply rotation gain while the user is walking. We propose to apply a rotation gain to let the user rotate by a different angle when reciprocating from a previous head rotation, to achieve the aim of steering the user to a desired direction. To apply the gains imperceptibly based on such a Bidirectional Rotation gain Difference (BiRD), we conduct both measurement and verification experiments on the detection thresholds of the rotation gain for reciprocating head rotations during walking. Unlike previous rotation gains which are measured when users are turning around in place (standing or sitting), BiRD is measured during users' walking. Our study offers a critical assessment of the acceptable range of rotational mapping differences for different rotational orientations across the user's walking experience, contributing to an effective tool for redirecting users in virtual environments.




    Spatial Contraction Based on Velocity Variation for Natural Walking in Virtual Reality
    IEEE Transactions on Visualization and Computer Graphics, Vol. 30, No. 5, 2444-2453.    
    Sen-Zhe Xu, Kui Huang, Cheng-Wei Fan, Song-Hai Zhang

    Virtual Reality (VR) offers an immersive 3D digital environment, but enabling natural walking sensations without the constraints of physical space remains a technological challenge. Previous VR locomotion methods, including game controller, teleportation, treadmills, walking-in-place, and redirected walking (RDW), have made strides towards overcoming this challenge. However, these methods also face limitations such as possible unnaturalness, additional hardware requirements, or motion sickness risks. This paper introduces "Spatial Contraction (SC)", an innovative VR locomotion method inspired by the phenomenon of Lorentz contraction in Special Relativity. Similar to the Lorentz contraction, our SC contracts the virtual space along the user's velocity direction in response to velocity variation. The virtual space contracts more when the user's speed is high, whereas minimal or no contraction happens at low speeds. We provide a virtual space transformation method for spatial contraction and optimize the user experience in smoothness and stability. Through SC, VR users can effectively traverse a longer virtual distance with a shorter physical walking distance. Different from locomotion gains, the spatial contraction effect is observable by the user and aligns with their intentions, so there is no inconsistency between the user's proprioception and visual perception. SC is a general locomotion method that has no special requirements for VR scenes. The experimental results of our live user studies in various virtual scenarios demonstrate that SC has a significant effect in reducing both the number of resets and the physical walking distance users need to cover. Furthermore, experiments have also demonstrated that SC has the potential for integration with existing locomotion techniques such as RDW.
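    A minimal sketch of a Lorentz-style contraction along the walking direction, to make the idea in the abstract above concrete. The abstract does not give the paper's actual mapping from velocity to contraction, so the factor sqrt(1 - (v/c)^2), the parameter `c_virtual` (a hypothetical speed-of-light analogue), the clamp at 0.999 and the 2D ground-plane setting are all assumptions made for illustration.

```python
import math


def contraction_factor(speed, c_virtual):
    """Lorentz-style length factor; close to 1 at low speed, smaller at high speed."""
    ratio = min(speed / c_virtual, 0.999)  # clamp (assumption) to avoid collapsing space
    return math.sqrt(1.0 - ratio * ratio)


def contract_point(point, user_pos, velocity, c_virtual):
    """Contract a 2D virtual point toward the user along the velocity direction."""
    speed = math.hypot(velocity[0], velocity[1])
    if speed == 0.0:
        return point  # no motion, no contraction
    dx, dy = velocity[0] / speed, velocity[1] / speed
    rx, ry = point[0] - user_pos[0], point[1] - user_pos[1]
    along = rx * dx + ry * dy                    # component along the velocity
    px, py = rx - along * dx, ry - along * dy    # perpendicular component, kept as-is
    k = contraction_factor(speed, c_virtual)
    return (user_pos[0] + px + k * along * dx,
            user_pos[1] + py + k * along * dy)


# A target 10 m ahead and 3 m to the side, user walking at 1.2 m/s along x.
moved = contract_point((10.0, 3.0), (0.0, 0.0), (1.2, 0.0), c_virtual=2.0)
still = contract_point((10.0, 3.0), (0.0, 0.0), (0.0, 0.0), c_virtual=2.0)
```

    With these illustrative numbers the factor is 0.8, so the target appears about 8 m ahead while its sideways offset is unchanged; a standing user sees the scene uncontracted.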




    Multi-User Redirected Walking in Separate Physical Spaces for Online VR Scenarios
    IEEE Transactions on Visualization and Computer Graphics, Vol. 30, No. 4, 1916-1926.    
    Sen-Zhe Xu, Jia-Hong Liu, Miao Wang, Fang-Lue Zhang, Song-Hai Zhang

    With the recent rise of the Metaverse, online multiplayer VR applications are becoming increasingly prevalent worldwide. However, as multiple users are located in different physical environments, different reset frequencies and timings can lead to serious fairness issues for online collaborative/competitive VR applications. We propose a novel multi-user RDW method that is able to significantly reduce the overall reset number and give users a better immersive experience by providing a fair exploration. Our key idea is to first find out the "bottleneck" user that may cause all users to be reset and estimate the time to reset given the users' next targets, and then redirect all the users to favorable poses during that maximized bottleneck time to ensure the subsequent resets can be postponed as much as possible. More particularly, we develop methods to estimate the time of possibly encountering obstacles and the reachable area for a specific pose to enable the prediction of the next reset caused by any user. Our experiments and user study found that our method outperforms existing RDW methods in online VR applications.



    Other publications in 2024

    1. Tai-Jiang Mu, Ming-Yuan Shen, Yu-Kun Lai, Shi-Min Hu, Learning Virtual View Selection for 3D Scene Semantic Segmentation, IEEE Transactions on Image Processing, 2024, Vol. 33, 4159-4172.   
    2. Guo-Ye Yang, George Kiyohiro Nakayama, Zi-Kai Xiao, Tai-Jiang Mu, Xiaolei Huang, Shi-Min Hu, Semantic-Aware Transformation-Invariant RoI Align, AAAI 2024: 6486-6493.   
    3. Yi Zhang, Meng-Hao Guo, Miao Wang, Shi-Min Hu, Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, 3270-3280.    
    4. Xin Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Song-Hai Zhang, Xiaojuan Qi, Text-to-3D with Classifier Score Distillation, ICLR 2024: 6486-6493.   



    2023





    Visual attention network
    Computational Visual Media, 2023, Vol. 9, No. 4, 733-752.    
    Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng & Shi-Min Hu

    While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglects their 2D structures; (2) the quadratic complexity is too expensive for high-resolution images; (3) it only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear attention named large kernel attention (LKA) to enable self-adaptive and long-range correlations in self-attention while avoiding its shortcomings. Furthermore, we present a neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple, VAN achieves results comparable to similarly sized convolutional neural networks (CNNs) and vision transformers (ViTs) in various tasks, including image classification, object detection, semantic segmentation, panoptic segmentation, pose estimation, etc. For example, VAN-B6 achieves 87.8% accuracy on the ImageNet benchmark, and sets new state-of-the-art performance (58.2 PQ) for panoptic segmentation. Besides, VAN-B2 surpasses Swin-T by 4 mIoU (50.1 vs. 46.1) for semantic segmentation on the ADE20K benchmark and by 2.6 AP (48.8 vs. 46.2) for object detection on the COCO dataset. It provides a novel method and a simple yet strong baseline for the community. The code is available at https://github.com/Visual-Attention-Network.




    StructNeRF: Neural Radiance Fields for Indoor Scenes With Structural Hints
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, Vol. 45, No. 12, 15694-15705.    
    Zheng Chen, Chen Wang, Yuan-Chen Guo, Song-Hai Zhang

    Neural Radiance Fields (NeRF) achieve photo-realistic view synthesis with densely captured input images. However, the geometry of NeRF is extremely under-constrained given sparse views, resulting in significant degradation of novel view synthesis quality. Inspired by self-supervised depth estimation methods, we propose StructNeRF, a solution to novel view synthesis for indoor scenes with sparse inputs. StructNeRF leverages the structural hints naturally embedded in multi-view inputs to handle the unconstrained geometry issue in NeRF. Specifically, it tackles the texture and non-texture regions respectively: a patch-based multi-view consistent photometric loss is proposed to constrain the geometry of textured regions; for non-textured ones, we explicitly restrict them to be 3D consistent planes. Through the dense self-supervised depth constraints, our method improves both the geometry and the view synthesis performance of NeRF without any additional training on external data. Extensive experiments on several real-world datasets demonstrate that StructNeRF shows superior or comparable performance compared to state-of-the-art methods (e.g. NeRF, DSNeRF, RegNeRF, Dense Depth Priors, MonoSDF, etc.) for indoor scenes with sparse inputs both quantitatively and qualitatively.




    Recursive-NeRF: An Efficient and Dynamically Growing NeRF
    IEEE Transactions on Visualization and Computer Graphics, 2023, Vol. 29, No. 12, 5124-5136.    
    Guo-Wei Yang, Wen-Yang Zhou, Hao-Yang Peng, Dun Liang, Tai-Jiang Mu, Shi-Min Hu





    DiffFacto: Controllable Part-Based 3D Point Cloud Generation with Cross Diffusion
    IEEE/CVF International Conference on Computer Vision, 2023, 14211-14221.    
    George Kiyohiro Nakayama, Mikaela Angelina Uy, Jiahui Huang, Shi-Min Hu, Ke Li, Leonidas Guibas

    While the community of 3D point cloud generation has witnessed a big growth in recent years, the field still lacks an effective way to enable intuitive user control in the generation process, hence limiting the general utility of such methods. Since an intuitive way of decomposing a shape is through its parts, we propose to tackle the task of controllable part-based point cloud generation. We introduce DiffFacto, a novel probabilistic generative model that learns the distribution of shapes with part-level control. We propose a factorization that models independent part style and part configuration distributions, and present a novel cross diffusion network that enables us to generate coherent and plausible shapes under our proposed factorization. Experiments show that our method is able to generate novel shapes with multiple axes of control. It achieves state-of-the-art part-level generation quality and generates plausible and coherent shapes while enabling various downstream editing applications such as shape interpolation, mixing, and transformation editing. Please visit our project webpage at https://difffacto.github.io/




    Beyond Self-Attention: External Attention Using Two Linear Layers for Visual Tasks
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, Vol. 45, No. 5, 5436-5447.    
    Meng-Hao Guo, Zheng-Ning Liu, Tai-Jiang Mu, Shi-Min Hu

    Attention mechanisms, especially self-attention, have played an increasingly important role in deep feature representation for visual tasks. Self-attention updates the feature at each position by computing a weighted sum of features using pair-wise affinities across all positions to capture the long-range dependency within a single sample. However, self-attention has quadratic complexity and ignores potential correlation between different samples. This article proposes a novel attention mechanism which we call external attention, based on two external, small, learnable, shared memories, which can be implemented easily by simply using two cascaded linear layers and two normalization layers; it conveniently replaces self-attention in existing popular architectures. External attention has linear complexity and implicitly considers the correlations between all data samples. We further incorporate the multi-head mechanism into external attention to provide an all-MLP architecture, external attention MLP (EAMLP), for image classification. Extensive experiments on image classification, object detection, semantic segmentation, instance segmentation, image generation, and point cloud analysis reveal that our method provides results comparable or superior to the self-attention mechanism and some of its variants, with much lower computational and memory costs.
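    A minimal pure-Python sketch of the mechanism described in the abstract above: two small shared memories applied as two linear maps, with two normalization steps in between. The abstract does not specify the normalization; the choice here (softmax over positions followed by L1 normalization over memory slots), the random memories standing in for learned ones, and all names and sizes are assumptions for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def double_normalize(attn):
    """Softmax over positions for each memory slot, then L1-normalize each position."""
    n, s = len(attn), len(attn[0])
    out = [[0.0] * s for _ in range(n)]
    for j in range(s):
        col = [attn[i][j] for i in range(n)]
        top = max(col)                      # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in col]
        z = sum(exps)
        for i in range(n):
            out[i][j] = exps[i] / z
    for i in range(n):
        z = sum(out[i])
        out[i] = [v / z for v in out[i]]
    return out


def external_attention(features, mem_k, mem_v):
    """features: N x d; mem_k and mem_v: S x d shared memories. Returns N x d."""
    attn = matmul(features, transpose(mem_k))   # first linear layer, N x S
    attn = double_normalize(attn)
    return matmul(attn, mem_v)                  # second linear layer, N x d


random.seed(0)
N, D, S = 6, 4, 3   # positions, feature width, memory slots (illustrative sizes)
features = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
mem_k = [[random.gauss(0, 1) for _ in range(D)] for _ in range(S)]
mem_v = [[random.gauss(0, 1) for _ in range(D)] for _ in range(S)]
out = external_attention(features, mem_k, mem_v)
```

    Each position only interacts with the S memory slots, so the cost grows as N x S x d rather than N x N x d, which is where the linear complexity claimed in the abstract comes from.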




    Adaptive Optimization Algorithm for Resetting Techniques in Obstacle-Ridden Environments
    IEEE Transactions on Visualization and Computer Graphics, 2023, Vol. 29, No. 4, 1977-1991.    
    Song-Hai Zhang, Chia-Hao Chen, Fu Zheng, Yong-Liang Yang, Shi-Min Hu

    Redirected Walking (RDW) algorithms aim to impose several types of gains on users immersed in Virtual Reality and distort their walking paths in the real world, thus enabling them to explore a larger space. Since collision with physical boundaries is inevitable, a reset strategy needs to be provided to allow users to reset when they hit the boundary. However, most reset strategies are based on simple heuristics by choosing a seemingly suitable solution, which may not perform well in practice. In this article, we propose a novel optimization-based reset algorithm adaptive to different RDW algorithms. Inspired by the approach of finite element analysis, our algorithm splits the boundary of the physical world by a set of endpoints. Each endpoint is assigned a reset vector to represent the optimized reset direction when hitting the boundary. The reset vectors on the edge will be determined by the interpolation between two neighbouring endpoints. We conduct simulation-based experiments for three RDW algorithms with commonly used reset algorithms to compare with. The results demonstrate that the proposed algorithm significantly reduces the number of resets.
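    The abstract above says that the reset vector on a boundary edge is obtained by interpolating between the vectors at its two endpoints. A minimal sketch of that lookup, assuming linear interpolation followed by renormalization (the abstract does not name the interpolation scheme) and using hypothetical function and argument names:

```python
import math


def interpolate_reset_vector(p0, v0, p1, v1, hit):
    """Reset direction at a hit point on the edge p0-p1.

    p0, p1: edge endpoints; v0, v1: reset vectors stored at those endpoints;
    hit: the point where the user reached the boundary.
    """
    ex, ey = p1[0] - p0[0], p1[1] - p0[1]
    length_sq = ex * ex + ey * ey
    # Parameter of the hit point along the edge, clamped to [0, 1].
    t = ((hit[0] - p0[0]) * ex + (hit[1] - p0[1]) * ey) / length_sq
    t = max(0.0, min(1.0, t))
    vx = (1.0 - t) * v0[0] + t * v1[0]
    vy = (1.0 - t) * v0[1] + t * v1[1]
    norm = math.hypot(vx, vy)
    if norm == 0.0:
        raise ValueError("endpoint reset vectors cancel out at this point")
    return (vx / norm, vy / norm)


# An edge from (0, 0) to (4, 0) with endpoint reset vectors (0, 1) and (1, 0).
start = interpolate_reset_vector((0, 0), (0, 1), (4, 0), (1, 0), (0, 0))
middle = interpolate_reset_vector((0, 0), (0, 1), (4, 0), (1, 0), (2, 0))
end = interpolate_reset_vector((0, 0), (0, 1), (4, 0), (1, 0), (4, 0))
```

    In this sketch only the endpoint vectors would need to be optimized; every other point on the boundary derives its reset direction from them.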




    Real-Time Globally Consistent 3D Reconstruction With Semantic Priors
    IEEE Transactions on Visualization and Computer Graphics, 2023, Vol. 29, No. 4, 1977-1991.    
    Shi-Sheng Huang, Haoxiang Chen, Jiahui Huang, Hongbo Fu, Shi-Min Hu

    Maintaining global consistency continues to be critical for online 3D indoor scene reconstruction. However, it is still challenging to generate satisfactory 3D reconstruction in terms of global consistency for previous approaches using purely geometric analysis, even with bundle adjustment or loop closure techniques. In this article, we propose a novel real-time 3D reconstruction approach which effectively integrates both semantic and geometric cues. The key challenge is how to map this indicative information, i.e., semantic priors, into a metric space as measurable information, thus enabling more accurate semantic fusion leveraging both the geometric and semantic cues. To this end, we introduce a semantic space with a continuous metric function measuring the distance between discrete semantic observations. Within the semantic space, we present an accurate frame-to-model semantic tracker for camera pose estimation, and semantic pose graph equipped with semantic links between submaps for globally consistent 3D scene reconstruction. With extensive evaluation on public synthetic and real-world 3D indoor scene RGB-D datasets, we show that our approach outperforms the previous approaches for 3D scene reconstruction both quantitatively and qualitatively, especially in terms of global consistency.




    Multiway Non-Rigid Point Cloud Registration via Learned Functional Map Synchronization
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, Vol. 45, No. 2, 2038-2053.    
    Jiahui Huang, Tolga Birdal, Zan Gojcic, Leonidas J. Guibas, Shi-Min Hu

    We present SyNoRiM, a novel way to jointly register multiple non-rigid shapes by synchronizing the maps that relate learned functions defined on the point clouds. Even though the ability to process non-rigid shapes is critical in various applications ranging from computer animation to 3D digitization, the literature still lacks a robust and flexible framework to match and align a collection of real, noisy scans observed under occlusions. Given a set of such point clouds, our method first computes the pairwise correspondences parameterized via functional maps. We simultaneously learn potentially non-orthogonal basis functions to effectively regularize the deformations, while handling the occlusions in an elegant way. To maximally benefit from the multi-way information provided by the inferred pairwise deformation fields, we synchronize the pairwise functional maps into a cycle-consistent whole thanks to our novel and principled optimization formulation. We demonstrate via extensive experiments that our method achieves a state-of-the-art performance in registration accuracy, while being flexible and efficient as we handle both non-rigid and multi-body cases in a unified framework and avoid the costly optimization over point-wise permutations by the use of basis function maps.



    2022





    A Neural Galerkin Solver for Accurate Surface Reconstruction
    ACM Transactions on Graphics, 2022, Vol. 41, No. 6, article no. 229.    
    Jiahui Huang, Hao-Xiang Chen, Shi-Min Hu





    Context-Consistent Generation of Indoor Virtual Environments based on Geometry Constraints
    IEEE Transactions on Visualization and Computer Graphics, 2022, Vol. 28, No. 12, 3986-3999.   
    Yu He, Yingtian Liu, Yihan Jin, Song-Hai Zhang, Yu-Kun Lai, Shi-Min Hu

    In this paper, we propose a system that can automatically generate immersive and interactive virtual reality (VR) scenes by taking real-world geometric constraints into account. Our system can not only help users avoid real-world obstacles in virtual reality experiences, but also provide context-consistent contents to preserve their sense of presence. To do so, our system first identifies the positions and bounding boxes of scene objects as well as a set of interactive planes from 3D scans. Then context-compatible virtual objects that have similar geometric properties to the real ones can be automatically selected and placed into the virtual scene, based on learned object association relations and layout patterns from large amounts of indoor scene configurations. We regard virtual object replacement as a combinatorial optimization problem, considering both geometric and contextual consistency constraints. Quantitative and qualitative results show that our system can generate plausible interactive virtual scenes that highly resemble real environments, and have the ability to keep the sense of presence for users in their VR experiences.




    SegNeXt: rethinking convolutional attention design for semantic segmentation
    The 36th International Conference on Neural Information Processing Systems, 2022, article No. 84, 1140-1156.   
    Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu

    We present SegNeXt, a simple convolutional network architecture for semantic segmentation. Recent transformer-based models have dominated the field of semantic segmentation due to the efficiency of self-attention in encoding spatial information. In this paper, we show that convolutional attention is a more efficient and effective way to encode contextual information than the self-attention mechanism in transformers. By re-examining the characteristics owned by successful segmentation models, we discover several key components leading to the performance improvement of segmentation models. This motivates us to design a novel convolutional attention network that uses cheap convolutional operations. Without bells and whistles, our SegNeXt significantly improves the performance of previous state-of-the-art methods on popular benchmarks, including ADE20K, Cityscapes, COCO-Stuff, Pascal VOC, Pascal Context, and iSAID. Notably, SegNeXt outperforms EfficientNet-L2 w/ NAS-FPN and achieves 90.6% mIoU on the Pascal VOC 2012 test leaderboard using only 1/10 parameters of it. On average, SegNeXt achieves about 2.0% mIoU improvements compared to the state-of-the-art methods on the ADE20K datasets with the same or fewer computations.




    NeRF-SR: High Quality Neural Radiance Fields using Supersampling
    Proceedings of the 30th ACM International Conference on Multimedia, 2022, 6445-6454.   
    Chen Wang, Xian Wu, Yuan-Chen Guo, Song-Hai Zhang, Yu-Wing Tai, Shi-Min Hu

    We present NeRF-SR, a solution for high-resolution (HR) novel view synthesis with mostly low-resolution (LR) inputs. Our method is built upon Neural Radiance Fields (NeRF) that predicts per-point density and color with a multi-layer perceptron. While producing images at arbitrary scales, NeRF struggles with resolutions that go beyond observed images. Our key insight is that NeRF benefits from 3D consistency, which means an observed pixel absorbs information from nearby views. We first exploit it by a super-sampling strategy that shoots multiple rays at each image pixel, which further enforces multi-view constraint at a sub-pixel level. Then, we show that NeRF-SR can further boost the performance of super-sampling by a refinement network that leverages the estimated depth at hand to hallucinate details from related patches on only one HR reference image. Experiment results demonstrate that NeRF-SR generates high-quality results for novel view synthesis at HR on both synthetic and real-world datasets without any external information. Project page: https://cwchenwang.github.io/NeRF-SR
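    The super-sampling idea in this abstract can be sketched as follows: each low-resolution pixel is split into s x s sub-pixels, one ray is cast through each, and the predictions are combined. This is a minimal sketch, not the authors' code. `render_fn` is a hypothetical stand-in for casting a ray through image coordinates (x, y) and returning a colour, and averaging the sub-pixel colours to compare against the observed pixel is an assumption made for illustration.

```python
import numpy as np

def subpixel_centers(px, py, s):
    """Centres of the s x s sub-pixels inside the unit pixel at (px, py)."""
    offsets = (np.arange(s) + 0.5) / s
    xs, ys = np.meshgrid(px + offsets, py + offsets)
    return np.stack([xs.ravel(), ys.ravel()], axis=-1)   # (s*s, 2)

def supersampled_pixel(render_fn, px, py, s):
    """Average the colours predicted for all sub-pixel rays of one pixel."""
    colors = np.array([render_fn(x, y) for x, y in subpixel_centers(px, py, s)])
    return colors.mean(axis=0)

# Toy stand-in renderer: the colour simply encodes the ray's image coordinates.
toy_render = lambda x, y: np.array([x, y, 0.0])
pixel = supersampled_pixel(toy_render, 3, 5, 2)
```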




    Attention mechanisms in computer vision: A survey
    Computational Visual Media, 2022, Vol. 8, No. 3, 331-368.   
    Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng & Shi-Min Hu

    Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multimodal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is dedicated to collecting related work. We also suggest future directions for attention mechanism research.




    Fast 3D Indoor Scene Synthesis by Learning Spatial Relation Priors of Objects
    IEEE Transactions on Visualization and Computer Graphics, 2022, Vol. 28, No. 9, 3082-3092.   
    Song-Hai Zhang, Shao-Kui Zhang, Wei-Yu Xie, Cheng-Yang Luo, Yongliang Yang, Hongbo Fu

    We present a framework for fast synthesizing indoor scenes, given a room geometry and a list of objects with learnt priors. Unlike existing data-driven solutions, which often learn priors by co-occurrence analysis and statistical model fitting, our method measures the strengths of spatial relations by tests for complete spatial randomness (CSR), and learns discrete priors based on samples with the ability to accurately represent exact layout patterns. With the learnt priors, our method achieves both acceleration and plausibility by partitioning the input objects into disjoint groups, followed by layout optimization using position-based dynamics (PBD) based on the Hausdorff metric. Experiments show that our framework is capable of measuring more reasonable relations among objects and simultaneously generating varied arrangements in seconds compared with the state-of-the-art works.




    Subdivision-Based Mesh Convolution Networks
    ACM Transactions on Graphics, 2022, Vol. 41, No. 3, article no. 25.    
    Shi-Min Hu, Zheng-Ning Liu, Meng-Hao Guo, Jun-Xiong Cai, Jiahui Huang, Tai-Jiang Mu, Ralph R. Martin

    Convolutional neural networks (CNNs) have made great breakthroughs in 2D computer vision. However, the irregular structure of meshes makes it hard to harness the potential of CNNs directly on them. A subdivision surface provides a hierarchical multi-resolution structure, in which each face in a closed 2-manifold triangle mesh is exactly adjacent to three faces. Motivated by these two observations, this paper presents SubdivNet, an innovative and versatile CNN framework for 3D triangle meshes with Loop subdivision sequence connectivity. Making an analogy between mesh faces and pixels in a 2D image allows us to present a mesh convolution operator to aggregate local features from nearby faces. By exploiting face neighborhoods, this convolution can support standard 2D convolutional network concepts, e.g. variable kernel size, stride, and dilation. Based on the multi-resolution hierarchy, we make use of pooling layers which uniformly merge four faces into one and an upsampling method which splits one face into four. Thereby, many popular 2D CNN architectures can be easily adapted to process 3D meshes. Meshes with arbitrary connectivity can be remeshed to have Loop subdivision sequence connectivity via self-parameterization, making SubdivNet a general approach. Extensive evaluation and various applications demonstrate SubdivNet's effectiveness and efficiency.



    2021





    Fast and accurate spherical harmonics products
    ACM Transactions on Graphics, 2021, Vol. 40, No. 6, article no. 280.    
    Hanggao Xin, Zhiqian Zhou, Di An, Ling-Qi Yan, Kun Xu, Shi-Min Hu, Shing-Tung Yau

    Spherical Harmonics (SH) have been proven as a powerful tool for rendering, especially in real-time applications such as Precomputed Radiance Transfer (PRT). Spherical harmonics are orthonormal basis functions and are efficient in computing dot products. However, computations of triple product and multiple product operations are often the bottlenecks that prevent moderately high-frequency use of spherical harmonics. Specifically, state-of-the-art methods for accurate SH triple products of order n have a time complexity of O(n^5), which is a heavy burden for most real-time applications. Even worse, a brute-force way to compute k-multiple products would take O(n^(2k)) time. In this paper, we propose a fast and accurate method for spherical harmonics triple products with the time complexity of only O(n^3), and further extend it for computing k-multiple products with the time complexity of O(kn^3 + k^2 n^2 log(kn)). Our key insight is to conduct the triple and multiple products in the Fourier space, in which the multiplications can be performed much more efficiently. To our knowledge, our method is theoretically the fastest for accurate spherical harmonics triple and multiple products. And in practice, we demonstrate the efficiency of our method in rendering applications including mid-frequency relighting and shadow fields.
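    To give some intuition for why products become cheap in a Fourier representation, here is a one-dimensional analogue only, not the paper's spherical harmonics algorithm. For two band-limited functions on the circle, the Fourier coefficients of their product are the convolution of their coefficient vectors. The function names `eval_series` and `product_coeffs` are hypothetical.

```python
import numpy as np

def eval_series(coeffs, theta):
    """Evaluate sum_k c_k * exp(i*k*theta) for k = -n..n, with len(coeffs) = 2n + 1."""
    n = (len(coeffs) - 1) // 2
    k = np.arange(-n, n + 1)
    return np.sum(coeffs * np.exp(1j * k * theta))

def product_coeffs(a, b):
    """Coefficients of the pointwise product of two Fourier series.
    Frequencies add under multiplication, so coefficients convolve."""
    return np.convolve(a, b)

rng = np.random.default_rng(1)
a = rng.normal(size=5) + 1j * rng.normal(size=5)   # band limit n = 2
b = rng.normal(size=5) + 1j * rng.normal(size=5)
c = product_coeffs(a, b)                            # band limit 2n = 4
err = abs(eval_series(c, 0.7) - eval_series(a, 0.7) * eval_series(b, 0.7))
```

    `np.convolve` evaluates the convolution directly; for long coefficient vectors the same convolution can be computed with FFTs.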




    DI-Fusion: Online Implicit 3D Reconstruction with Deep Priors
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, 8932-8941.    
    Jiahui Huang, Shi-Sheng Huang, Haoxuan Song, Shi-Min Hu

    Previous online 3D dense reconstruction methods struggle to achieve the balance between memory storage and surface quality, largely due to the usage of stagnant underlying geometry representation, such as TSDF (truncated signed distance functions) or surfels, without any knowledge of the scene priors. In this paper, we present DI-Fusion (Deep Implicit Fusion), based on a novel 3D representation, i.e. Probabilistic Local Implicit Voxels (PLIVoxs), for online 3D reconstruction with a commodity RGB-D camera. Our PLIVox encodes scene priors considering both the local geometry and uncertainty parameterized by a deep neural network. With such deep priors, we are able to perform online implicit 3D reconstruction achieving state-of-the-art camera trajectory estimation accuracy and mapping quality, while achieving better storage efficiency compared with previous online 3D reconstruction approaches. Our implementation is available at https://www.github.com/huangjh-pub/di-fusion.




    MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, 7108-7118.    
    Jiahui Huang, He Wang, Tolga Birdal, Minhyuk Sung, Federica Arrigoni, Shi-Min Hu, Leonidas Guibas

    We present MultiBodySync, a novel, end-to-end trainable multi-body motion segmentation and rigid registration framework for multiple input 3D point clouds. The two non-trivial challenges posed by this multi-scan multibody setting that we investigate are: (i) guaranteeing correspondence and segmentation consistency across multiple input point clouds capturing different spatial arrangements of bodies or body parts; and (ii) obtaining robust motion-based rigid body segmentation applicable to novel object categories. We propose an approach to address these issues that incorporates spectral synchronization into an iterative deep declarative network, so as to simultaneously recover consistent correspondences as well as motion segmentation. At the same time, by explicitly disentangling the correspondence and motion segmentation estimation modules, we achieve strong generalizability across different object categories. Our extensive evaluations demonstrate that our method is effective on various datasets ranging from rigid parts in articulated objects to individually moving objects in a 3D scene, be it single-view or full point clouds. Code at https://github.com/huangjh-pub/multibody-sync.




    Sketch2Model: View-Aware 3D Modeling from Single Free-Hand Sketches
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, 6012-6021.    
    Song-Hai Zhang, Yuan-Chen Guo, Qing-Wen Gu

    We investigate the problem of generating 3D meshes from single free-hand sketches, aiming at fast 3D modeling for novice users. It can be regarded as a single-view reconstruction problem, but with unique challenges, brought by the variation and conciseness of sketches. Ambiguities in poorly-drawn sketches could make it hard to determine how the sketched object is posed. In this paper, we address the importance of viewpoint specification for overcoming such ambiguities, and propose a novel view-aware generation approach. By explicitly conditioning the generation process on a given viewpoint, our method can generate plausible shapes automatically with predicted viewpoints, or with specified viewpoints to help users better express their intentions. Extensive evaluations on various datasets demonstrate the effectiveness of our view-aware design in solving sketch ambiguities and improving reconstruction quality.




    ChoreoMaster: Choreography-Oriented Music-Driven Dance Synthesis
    ACM Transactions on Graphics, 2021, Vol. 40, No. 4, article no. 145, pages 1-13.   
    Kang Chen, Zhipeng Tan, Jin Lei, Song-Hai Zhang, Yuan-Chen Guo, Weidong Zhang, Shi-Min Hu

    Despite strong demand in the game and film industry, automatically synthesizing high-quality dance motions remains a challenging task. In this paper, we present ChoreoMaster, a production-ready music-driven dance motion synthesis system. Given a piece of music, ChoreoMaster can automatically generate a high-quality dance motion sequence to accompany the input music in terms of style, rhythm and structure. To achieve this goal, we introduce a novel choreography-oriented choreomusical embedding framework, which successfully constructs a unified choreomusical embedding space for both style and rhythm relationships between music and dance phrases. The learned choreomusical embedding is then incorporated into a novel choreography-oriented graph-based motion synthesis framework, which can robustly and efficiently generate high-quality dance motions following various choreographic rules. Moreover, as a production-ready system, ChoreoMaster is sufficiently controllable and comprehensive for users to produce desired results. Experimental results demonstrate that dance motions generated by ChoreoMaster are accepted by professional artists.




    MoCap-Solver: A Neural Solver for Optical Motion Capture Data
    ACM Transactions on Graphics, 2021, Vol. 40, No. 4, article no. 84, pages 1-11.   
    Kang Chen, Yupan Wang, Song-Hai Zhang, Sen-Zhe Xu, Weidong Zhang, Shi-Min Hu

    In a conventional optical motion capture (MoCap) workflow, two processes are needed to turn captured raw marker sequences into correct skeletal animation sequences. Firstly, various tracking errors present in the markers must be fixed (cleaning or refining). Secondly, an agent skeletal mesh must be prepared for the actor/actress, and used to determine skeleton information from the markers (re-targeting or solving). The whole process, normally referred to as solving MoCap data, is extremely time-consuming, labor-intensive, and usually the most costly part of animation production. Hence, there is a great demand for automated tools in industry. In this work, we present MoCap-Solver, a production-ready neural solver for optical MoCap data. It can directly produce skeleton sequences and clean marker sequences from raw MoCap markers, without any tedious manual operations. To achieve this goal, our key idea is to make use of neural encoders concerning three key intrinsic components: the template skeleton, marker configuration and motion, and to learn to predict these latent vectors from imperfect marker sequences containing noise and errors. By decoding these components from latent vectors, sequences of clean markers and skeletons can be directly recovered. Moreover, we also provide a novel normalization strategy based on learning a pose-dependent marker reliability function, which greatly improves system robustness. Experimental results demonstrate that our algorithm consistently outperforms the state-of-the-art on both synthetic and real-world datasets.




    MageAdd: Real-Time Interaction Simulation for Scene Synthesis
    Proceedings of the 29th ACM International Conference on Multimedia, October 2021, 965-973.    
    Shao-Kui Zhang, Yi-Xiao Li, Yu He, Yong-Liang Yang, Song-Hai Zhang

    While recent research on computational 3D scene synthesis has achieved impressive results, automatically synthesized scenes do not guarantee satisfaction of end users. On the other hand, manual scene modelling can always ensure high quality, but requires a cumbersome trial-and-error process. In this paper, we bridge the above gap by presenting a data-driven 3D scene synthesis framework that can intelligently infer objects to the scene by incorporating and simulating user preferences with minimum input. While the cursor is moved and clicked in the scene, our framework automatically selects and transforms suitable objects into scenes in real time. This is based on priors learnt from the dataset for placing different types of objects, and updated according to the current scene context. Through extensive experiments we demonstrate that our framework outperforms the state-of-the-art on result aesthetics, and enables effective and efficient user interactions.




    Supervoxel Convolution for Online 3D Semantic Segmentation
    ACM Transactions on Graphics, 2021, Vol. 40, No. 3, article No. 34, pages 1-15.   
    Shi-Sheng Huang, Ze-Yu Ma, Tai-Jiang Mu, Hongbo Fu, Shi-Min Hu

    Online 3D semantic segmentation, which aims to perform real-time 3D scene reconstruction along with semantic segmentation, is an important but challenging topic. A key challenge is to strike a balance between efficiency and segmentation accuracy. There are very few deep learning based solutions to this problem, since the commonly used deep representations based on volumetric-grids or points do not provide efficient 3D representation and organization structure for online segmentation. Observing that on-surface supervoxels, i.e., clusters of on-surface voxels, provide a compact representation of 3D surfaces and bring efficient connectivity structure via supervoxel clustering, we explore a supervoxel-based deep learning solution for this task. To this end, we contribute a novel convolution operation (SVConv) directly on supervoxels. SVConv can efficiently fuse the multi-view 2D features and 3D features projected on supervoxels during the online 3D reconstruction, and leads to an effective supervoxel-based convolutional neural network, termed as Supervoxel-CNN, enabling 2D-3D joint learning for 3D semantic prediction. With the Supervoxel-CNN, we propose a clustering-then-prediction online 3D semantic segmentation approach. The extensive evaluations on the public 3D indoor scene datasets show that our approach significantly outperforms the existing online semantic segmentation systems in terms of efficiency or accuracy.




    PCT: Point cloud transformer
    Computational Visual Media, 2021, Vol. 7, No. 2, 187-199.   
    Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R. Martin & Shi-Min Hu

    The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on Transformer, which achieves huge success in natural language processing and displays great potential in image processing. It is inherently permutation invariant for processing a sequence of points, making it well-suited for point cloud learning. To better capture local context within the point cloud, we enhance input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that the PCT achieves the state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.
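    Farthest point sampling, which this abstract names as one of the tools used to enhance the input embedding, can be sketched in a few lines. This is the standard greedy algorithm in NumPy, not the PCT code; the function name and the `start` parameter are chosen here for illustration.

```python
import numpy as np

def farthest_point_sampling(points, k, start=0):
    """Greedy farthest point sampling. points: (N, D); returns k indices."""
    n = points.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")
    selected = np.empty(k, dtype=np.int64)
    selected[0] = start
    # Squared distance from each point to its nearest already-selected point.
    min_dist = np.full(n, np.inf)
    for i in range(1, k):
        diff = points - points[selected[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        # Pick the point that is farthest from everything selected so far.
        selected[i] = np.argmax(min_dist)
    return selected

cloud = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
idx = farthest_point_sampling(cloud, 3)
```

    Each iteration costs O(N), so selecting k points costs O(kN) in total.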




    Prominent Structures for Video Analysis and Editing
    IEEE Transactions on Visualization and Computer Graphics, 2021, Vol. 27, No. 7, 3305-3317.   
    Miao Wang, Xiao-Nan Fang, Guo-Wei Yang, Ariel Shamir, Shi-Min Hu

    We present prominent structures in video, a representation of visually strong, spatially sparse and temporally stable structural units, for use in video analysis and editing. With a novel quality measurement of prominent structures in video, we develop a general framework for prominent structure computation, and an efficient hierarchical structure alignment algorithm between a pair of videos. The prominent structural unit map is proposed to encode both binary prominence guidances and numerical strength and geometry details for each video frame. Even though the detailed appearance of videos could be visually different, the proposed alignment algorithm can find candidate matched prominent structure sub-volumes. Prominent structures in video support a wide range of video analysis and editing applications including graphic match-cut between successive videos, instant cut editing, finding transition portals from a video collection, structure-aware video re-ranking, visualizing human action differences, etc.




    High-quality Textured 3D Shape Reconstruction with Cascaded Fully Convolutional Networks
    IEEE Transactions on Visualization and Computer Graphics, 2021, Vol. 27, No. 1, 83-97.   
    Zheng-Ning Liu, Yan-Pei Cao, Zheng-Fei Kuang, Leif Kobbelt, Shi-Min Hu

    We present a learning-based approach to reconstructing high-resolution three-dimensional (3D) shapes with detailed geometry and high-fidelity textures. Albeit extensively studied, algorithms for 3D reconstruction from multi-view depth-and-color (RGB-D) scans are still prone to measurement noise and occlusions; limited scanning or capturing angles also often lead to incomplete reconstructions. Propelled by recent advances in 3D deep learning techniques, in this paper, we introduce a novel computation and memory efficient cascaded 3D convolutional network architecture, which learns to reconstruct implicit surface representations as well as the corresponding color information from noisy and imperfect RGB-D maps. The proposed 3D neural network performs reconstruction in a progressive and coarse-to-fine manner, achieving unprecedented output resolution and fidelity. Meanwhile, an algorithm for end-to-end training of the proposed cascaded structure is developed. We further introduce Human10, a newly created dataset containing both detailed and textured full body reconstructions as well as corresponding raw RGB-D scans of 10 subjects. Qualitative and quantitative experimental results on both synthetic and real-world datasets demonstrate that the presented approach outperforms existing state-of-the-art work regarding visual quality and accuracy of reconstructed models.




    Other publications in 2021

    1. Haoxuan Song, Jiahui Huang, Yan-Pei Cao, Tai-Jiang Mu, HDR-Net-Fusion: Real-time 3D dynamic scene reconstruction with a hierarchical deep reinforcement network, Computational Visual Media, 2021, Vol. 7, No. 4, 419-435.   
    2. Xian Wu, Chen Li, Shi-Min Hu, Yu-Wing Tai, Hierarchical Generation of Human Pose With Part-Based Layer Representation, IEEE Transactions on Image Processing, 2021, Vol. 30, 7856-7866.   
    3. Meng-Hao Guo, Zheng-Ning Liu, Tai-Jiang Mu, Dun Liang, Ralph R. Martin, Shi-Min Hu, Can attention enable MLPs to catch up with CNNs? Computational Visual Media, 2021, Vol. 7, No. 3, 283-288.   
    4. Shaokui Zhang, Wei-Yu Xie, Song-Hai Zhang, Geometry-Based Layout Generation with Hyper-Relations Among Objects, Graphical Models, 2021, Vol. 116, article no. 101104.  
    5. Hanchao Liu, Tai-Jiang Mu, Xiaolei Huang, Detecting human-object interaction with multi-level pairwise feature network, Computational Visual Media, 2021, Vol. 7, No. 2, 229-239.   
    6. Wen-Yang Zhou, Guo-Wei Yang, Shi-Min Hu, Jittor-GAN: A fast-training generative adversarial network model zoo based on Jittor, Computational Visual Media, 2021, Vol. 7, No. 1, 153-157.   
    7. Jiahui Huang, Sheng Yang, Zishuo Zhao, Yu-Kun Lai, Shi-Min Hu, ClusterSLAM: A SLAM backend for simultaneous rigid body clustering and motion estimation, Computational Visual Media, 2021, Vol. 7, No. 1, 87-101 (Extended version of ICCV 2019 paper).   
    8. Junxiong Cai, Tai-Jiang Mu, Yu-Kun Lai, Shi-Min Hu, LinkNet: 2D-3D linked multi-modal network for online semantic segmentation of RGB-D videos, Computers & Graphics, 2021, Vol. 98, 37-47.    



    2020





    Jittor: a novel deep learning framework with meta-operators and unified graph execution
    Science China Information Sciences, 2020, Vol. 63, Article No. 222103, 1-21.    
    Shi-Min Hu, Dun Liang, Guo-Ye Yang, Guo-Wei Yang & Wen-Yang Zhou

    This paper introduces Jittor, a fully just-in-time (JIT) compiled deep learning framework. With JIT compilation, we can achieve higher performance while making systems highly customizable. Jittor provides classes of Numpy-like operators, which we call meta-operators. A deep learning model built upon these meta-operators is compiled into high-performance CPU or GPU code in real-time. To manage meta-operators, Jittor uses a highly optimized way of executing computation graphs, which we call unified graph execution. This approach is as easy to use as dynamic graph execution yet has the efficiency of static graph execution. It also provides other improvements, including operator fusion, cross iteration fusion, and unified memory.




    A Moving Least Square Reproducing Kernel Particle Method for Unified Multiphase Continuum Simulation
    ACM Transactions on Graphics, 2020, Vol. 39, No. 6, Article No. 150, ACM SIGGRAPH ASIA 2020.   
    Xiao-Song Chen, Chen-Feng Li, Geng-Chen Cao, Yun-Tao Jiang and Shi-Min Hu

    In physically-based animation, pure particle methods are popular due to their simple data structure, easy implementation, and convenient parallelization. As a pure particle-based method and using Galerkin discretization, the Moving Least Square Reproducing Kernel Method (MLSRK) was developed in engineering computation as a general numerical tool for solving PDEs. The basic idea of Moving Least Square (MLS) has also been used in computer graphics to estimate deformation gradient for deformable solids. Based on these previous studies, we propose a multiphase MLSRK framework that animates complex and coupled fluids and solids in a unified manner. Specifically, we use the Cauchy momentum equation and phase field model to uniformly capture the momentum balance and phase evolution/interaction in a multiphase system, and systematically formulate the MLSRK discretization to support general multiphase constitutive models. A series of animation examples are presented to demonstrate the performance of our new multiphase MLSRK framework, including hyperelastic, elastoplastic, viscous, fracturing and multiphase coupling behaviours.




    HeteroFusion: Dense Scene Reconstruction Integrating Multi-sensors
    IEEE Transactions on Visualization and Computer Graphics, 2020, Vol. 26, No. 11, 3217-3230.   
    Sheng Yang, Beichen Li, Minghua Liu, Yu-Kun Lai, Leif Kobbelt, Shi-Min Hu

    We present a novel approach to integrate data from multiple sensor types for dense 3D reconstruction of indoor scenes in real time. Existing algorithms are mainly based on a single RGBD camera and thus require continuous scanning of areas with sufficient geometric features. Otherwise, tracking may fail due to unreliable frame registration. Inspired by the fact that the fusion of multiple sensors can combine their strengths towards a more robust and accurate self-localization, we incorporate multiple types of sensors which are prevalent in modern robot systems, including a 2D range sensor, an inertial measurement unit (IMU), and wheel encoders. We fuse their measurements to reinforce the tracking process and to eventually obtain better 3D reconstructions. Specifically, we develop a 2D truncated signed distance field (TSDF) volume representation for the integration and ray-casting of laser frames, leading to a unified cost function in the pose estimation stage. For validation of the estimated poses in the loop-closure optimization process, we train a classifier for the features extracted from heterogeneous sensors during the registration process. To evaluate our method on challenging use case scenarios, we assembled a scanning platform prototype to acquire real-world scans. We further simulated synthetic scans based on high-fidelity synthetic scenes for quantitative evaluation. Extensive experimental evaluation on these two types of scans demonstrates that our system is capable of robustly acquiring dense 3D reconstructions and outperforms state-of-the-art RGBD and LiDAR systems.
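
    As a rough illustration of the 2D TSDF idea mentioned in the abstract above, the sketch below fuses a single laser range reading into a sparse 2D grid. It is a generic TSDF update under stated assumptions, not the authors' implementation: the function name integrate_ray, the dictionary-based grid, the cell size, the truncation distance and the unit-weight running average are all hypothetical choices made for this example.

```python
import math


def integrate_ray(tsdf, weights, origin, angle, measured_range,
                  cell_size=0.1, truncation=0.3):
    """Fuse one 2D range reading into a sparse TSDF grid.

    tsdf and weights are dicts keyed by integer cell index (ix, iy).
    All parameter names and defaults are assumptions for illustration.
    """
    dx, dy = math.cos(angle), math.sin(angle)
    # Only sample inside the truncation band around the measured hit point.
    start = max(0.0, measured_range - truncation)
    steps = int(round((measured_range + truncation - start) / cell_size))
    for k in range(steps + 1):
        d = start + k * cell_size
        cell = (int(math.floor((origin[0] + d * dx) / cell_size)),
                int(math.floor((origin[1] + d * dy) / cell_size)))
        # Signed distance to the surface along the ray, normalised to [-1, 1]:
        # positive in front of the surface, negative behind it.
        sdf = max(-1.0, min(1.0, (measured_range - d) / truncation))
        # Running average with unit weight per observation.
        w = weights.get(cell, 0.0)
        tsdf[cell] = (tsdf.get(cell, 0.0) * w + sdf) / (w + 1.0)
        weights[cell] = w + 1.0


# Example: a sensor at (0.05, 0.05) looking along +x sees a wall 1 m away.
tsdf, weights = {}, {}
integrate_ray(tsdf, weights, origin=(0.05, 0.05), angle=0.0, measured_range=1.0)
```

    The zero crossing of the stored values marks the estimated surface, which is what a ray-casting step would search for when rendering a synthetic laser frame from the grid.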




    Noise-Resilient Reconstruction of Panoramas and 3D Scenes Using Robot-Mounted Unsynchronized Commodity RGB-D Cameras
    ACM Transactions on Graphics, 2020, Vol. 39, No. 5, Article 152.
    Sheng Yang, Beichen Li, Yan-Pei Cao, Hongbo Fu, Yu-Kun Lai, Leif Kobbelt and Shi-Min Hu

    We present prominent structures in video, a representation of visually strong, spatially sparse and temporally stable structural units, for use in video analysis and editing. With a novel quality measurement of prominent structures in video, we develop a general framework for prominent structure computation, and an efficient hierarchical structure alignment algorithm between a pair of videos. The prominent structural unit map is proposed to encode both binary prominence guidances and numerical strength and geometry details for each video frame. Even though the detailed appearance of videos could be visually different, the proposed alignment algorithm can find candidate matched prominent structure sub-volumes. Prominent structures in video support a wide range of video analysis and editing applications including graphic match-cut between successive videos, instant cut editing, finding transition portals from a video collection, structure-aware video re-ranking, visualizing human action differences, etc.




    Semantic Labeling and Instance Segmentation of 3D Point Clouds using Patch Context Analysis and Multiscale Processing
    IEEE Transactions on Visualization and Computer Graphics, 2020, Vol. 26, No. 07, 2485-2498.   
    Shi-Min Hu, Jun-Xiong Cai, Yu-Kun Lai

    We present a novel algorithm for semantic segmentation and labeling of 3D point clouds of indoor scenes, where objects in point clouds can have significant variations and complex configurations. Effective segmentation methods decomposing point clouds into semantically meaningful pieces are highly desirable for object recognition, scene understanding, scene modeling, etc. However, existing segmentation methods based on low-level geometry tend to either under-segment or over-segment point clouds. Our method takes a fundamentally different approach, where semantic segmentation is achieved along with labeling. To cope with substantial shape variation for objects in the same category, we first segment point clouds into surface patches and use unsupervised clustering to group patches in the training set into clusters, providing an intermediate representation for effectively learning patch relationships. During testing, we propose a novel patch segmentation and classification framework with multiscale processing, where the local segmentation level is automatically determined by exploiting the learned cluster based contextual information. Our method thus produces robust patch segmentation and semantic labeling results, avoiding parameter sensitivity. We further learn object-cluster relationships from the training set, and produce semantically meaningful object level segmentation. Our method outperforms state-of-the-art methods on several representative point cloud datasets, including S3DIS, SceneNN, Cornell RGB-D and ETH.





    ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, 2168-2177.   
    Jiahui Huang, Sheng Yang, Tai-Jiang Mu and Shi-Min Hu

    We present ClusterVO, a stereo Visual Odometry which simultaneously clusters and estimates the motion of both ego and surrounding rigid clusters/objects. Unlike previous solutions relying on batch input or imposing priors on scene structure or dynamic object models, ClusterVO is online, general and thus can be used in various scenarios including indoor scene understanding and autonomous driving. At the core of our system lies a multi-level probabilistic association mechanism and a heterogeneous Conditional Random Field (CRF) clustering approach combining semantic, spatial and motion information to jointly infer cluster segmentations online for every frame. The poses of camera and dynamic objects are instantly solved through a sliding-window optimization. Our system is evaluated on the Oxford Multimotion and KITTI datasets both quantitatively and qualitatively, reaching comparable results to state-of-the-art solutions on both odometry and dynamic trajectory recovery.





    Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, 8217-8225.   
    Ran Yi, Yong-Jin Liu, Yu-Kun Lai, Paul L. Rosin

    Portrait drawing is a common form of art with high abstraction and expressiveness. Due to its unique characteristics, existing methods achieve decent results only with paired training data, which is costly and time-consuming to obtain. In this paper, we address the problem of automatic transfer from face photos to portrait drawings with unpaired training data. We observe that due to the significant imbalance of information richness between photos and drawings, existing unpaired transfer methods such as CycleGAN tend to embed invisible reconstruction information indiscriminately in the whole drawings, leading to important facial features partially missing in drawings. To address this problem, we propose a novel asymmetric cycle mapping that enforces the reconstruction information to be visible (by a truncation loss) and only embedded in selective facial regions (by a relaxed forward cycle-consistency loss). Along with localized discriminators for the eyes, nose and lips, our method well preserves all important facial features in the generated portrait drawings. By introducing a style classifier and taking the style vector into account, our method can learn to generate portrait drawings in multiple styles using a single network. Extensive experiments show that our model outperforms state-of-the-art methods.





    Towards Better Generalization: Joint Depth-Pose Learning without PoseNet
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, 9151-9161.   
    Wang Zhao, Shaohui Liu, Yezhi Shu, Yong-Jin Liu

    In this work, we tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning. Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples, which makes the learning problem harder, resulting in degraded performance and limited generalization in indoor environments and long-sequence visual odometry application. To address this issue, we propose a novel system that explicitly disentangles scale from the network estimation. Instead of relying on PoseNet architecture, our method recovers relative pose by directly solving fundamental matrix from dense optical flow correspondence and makes use of a two-view triangulation module to recover an up-to-scale 3D structure. Then, we align the scale of the depth prediction with the triangulated point cloud and use the transformed depth map for depth error computation and dense reprojection check. Our whole system can be jointly trained end-to-end. Extensive experiments show that our system not only reaches state-of-the-art performance on KITTI depth and flow estimation, but also significantly improves the generalization ability of existing self-supervised depth-pose learning methods under a variety of challenging scenarios, and achieves state-of-the-art results among self-supervised learning-based methods on KITTI Odometry and NYUv2 dataset. Furthermore, we present some interesting findings on the limitation of PoseNet-based relative pose estimation methods in terms of generalization ability. Code is available at https://github.com/B1ueber2y/TrianFlow.
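
    The scale-alignment step described in the abstract above can be pictured with a small sketch. The abstract does not say how the alignment is computed, so the median-of-ratios estimator used here is an assumption chosen for its robustness to outliers, and the function name align_depth_scale and its parameters are hypothetical; see the linked repository for the actual method.

```python
from statistics import median


def align_depth_scale(predicted_depths, triangulated_depths, min_depth=1e-6):
    """Rescale predicted depths to match sparse triangulated depths.

    Both inputs are depths at the same pixels. The median ratio is an
    assumed estimator for this illustration, not the paper's confirmed choice.
    """
    ratios = [t / p
              for p, t in zip(predicted_depths, triangulated_depths)
              if p > min_depth and t > min_depth]
    if not ratios:
        raise ValueError("no valid depth pairs to align")
    scale = median(ratios)
    return scale, [scale * p for p in predicted_depths]


# Example: the network predicts depths at twice the triangulated scale,
# and the last triangulated point is an outlier.
predicted = [1.0, 2.0, 4.0, 8.0]
triangulated = [0.5, 1.0, 2.0, 40.0]
scale, aligned = align_depth_scale(predicted, triangulated)
```

    After alignment, the rescaled depth map and the triangulated points share one scale, so a depth error between them is meaningful even though neither is metric.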





    A Metric for Video Blending Quality Assessment
    IEEE Transactions on Image Processing, 2020, Vol. 29, 3014-3022.   
    Zhe Zhu, Hantao Liu, Jiaming Lu and Shi-Min Hu

    We propose an objective approach to assess the quality of video blending. Blending is a fundamental operation in video editing, which can smooth the intensity changes of relevant regions. However, blending also generates artefacts such as bleeding and ghosting. To assess the quality of the blended videos, our approach considers the illuminance consistency as a positive aspect while regarding the artefacts as a negative aspect. Temporal coherence between frames is also considered. We evaluate our metric on a video blending dataset where the results of subjective evaluation are available. Experimental results validate the effectiveness of our proposed metric, and show that this metric gives superior performance over existing video quality metrics.





    Deep Portrait Image Completion and Extrapolation
    IEEE Transactions on Image Processing, 2020, Vol. 29, 2344-2355.   
    Xian Wu, Rui-Long Li, Fang-Lue Zhang, Jian-Cheng Liu, Jue Wang, Ariel Shamir and Shi-Min Hu

    General image completion and extrapolation methods often fail on portrait images where parts of the human body need to be recovered - a task that requires accurate human body structure and appearance synthesis. We present a two-stage deep learning framework for tackling this problem. In the first stage, given a portrait image with an incomplete human body, we extract a complete, coherent human body structure through a human parsing network, which focuses on structure recovery inside the unknown region with the help of full-body pose estimation. In the second stage, we use an image completion network to fill the unknown region, guided by the structure map recovered in the first stage. For realistic synthesis the completion network is trained with both perceptual loss and conditional adversarial loss. We further propose a face refinement network to improve the fidelity of the synthesized face region. We evaluate our method on publicly-available portrait image datasets, and show that it outperforms other state-of-the-art general image completion methods. Our method enables new portrait image editing applications such as occlusion removal and portrait extrapolation. We further show that the proposed general learning framework can be applied to other types of images, e.g. animal images.





    Poisson Vector Graphics (PVG)
    IEEE Transactions on Visualization and Computer Graphics, 2020, Vol. 26, No. 2, 1361-1371.
    Fei Hou, Qian Sun, Zheng Fang, Yong-Jin Liu, Shi-Min Hu, Hong Qin, Aimin Hao, and Ying He

    This paper presents Poisson vector graphics (PVG), an extension of the popular diffusion curves (DC), for generating smooth-shaded images. Armed with two new types of primitives, called Poisson curves (PCs) and Poisson regions (PRs), PVG can easily produce photorealistic effects such as specular highlights, core shadows, translucency and halos. Within the PVG framework, the users specify color as the Dirichlet boundary condition of diffusion curves and control tone by offsetting the Laplacian of colors, where both controls are simply done by mouse click and slider dragging. PVG distinguishes itself from other diffusion-based vector graphics by three unique features: 1) explicit separation of colors and tones, which follows the basic drawing principle and eases editing; 2) native support of seamless cloning in the sense that PCs and PRs can automatically fit into the target background; and 3) allowed intersecting primitives (except for DC-DC intersection) so that users can create layers. Through extensive experiments and a preliminary user study, we demonstrate that PVG is a simple yet powerful authoring tool that can produce photo-realistic vector graphics from scratch.





    Temporally Coherent Video Harmonization Using Adversarial Networks
    IEEE Transactions on Image Processing, 2020, Vol. 29, 214-224.   
    Hao-Zhi Huang, Sen-Zhe Xu, Jun-Xiong Cai, Wei Liu, and Shi-Min Hu

    Compositing is one of the most important editing operations for images and videos. The process of improving the realism of composite results is often called harmonization. Previous approaches for harmonization mainly focus on images. In this paper, we take one step further to attack the problem of video harmonization. Specifically, we train a convolutional neural network in an adversarial way, exploiting a pixel-wise disharmony discriminator to achieve more realistic harmonized results and introducing a temporal loss to increase temporal consistency between consecutive harmonized frames. Thanks to the pixel-wise disharmony discriminator, we are also able to relieve the need of input foreground masks. Since existing video datasets which have ground-truth foreground masks and optical flows are not sufficiently large, we propose a simple yet efficient method to build up a synthetic dataset supporting supervised training of the proposed adversarial network. The experiments show that training on our synthetic dataset generalizes well to the real-world composite dataset. In addition, our method successfully incorporates temporal consistency during training and achieves more harmonious visual results than previous methods.




    Other publications in 2020

    1. Ding-Nan Zou, Song-Hai Zhang, Tai-Jiang Mu & Min Zhang, A new dataset of dog breed images and a benchmark for fine-grained classification, Computational Visual Media, 2021, Vol. 7, No. 4, 477-487.
    2. Xin Wen, Miao Wang, Christian Richardt, Ze-Yin Chen, Shi-Min Hu, Photorealistic Audio-driven Video Portraits, IEEE Transactions on Visualization and Computer Graphics, 2020, Vol. 26, No. 12, 3457-3466.   
    3. Yuntao Jiang, Chen-Feng Li, Shujie Deng, Shi-Min Hu, A Divergence-free Mixture Model for Multiphase Fluids, Computer Graphics Forum, 2020, Vol. 39, No. 8, 69-77.   
    4. Shi-Sheng Huang, Ze-Yu Ma, Tai-Jiang Mu, Hongbo Fu, Shi-Min Hu, Lidar-Monocular Visual Odometry using Point and Line Features, IEEE ICRA, 2020, 1092-1097.  
    5. Minghua Liu, Lu Sheng, Sheng Yang, Jing Shao, Shi-Min Hu, Morphing and Sampling Network for Dense Point Cloud Completion, AAAI, 2020, 11596-11603.   
    6. Xian Wu, Xiao-Nan Fang, Tao Chen & Fang-Lue Zhang, JMNet: A joint matting network for automatic human matting, Computational Visual Media, 2020, Vol. 6, No. 2, 215-224.   
    7. Song-Hai Zhang, Zheng-Ping Zhou, Bin Liu, Xi Dong, and Peter Hall, What and where: A context-based recommendation system for object insertion, Computational Visual Media, 2020, Vol. 6, No. 1, 79-93.




    2019





    Write-A-Video: Computational Video Montage from Themed Text
    ACM Transactions on Graphics, 2019, Vol. 38, No. 6, Article 177.
    Miao Wang, Guo-Wei Yang, Shi-Min Hu, Shing-Tung Yau, Ariel Shamir

    We present Write-A-Video, a tool for the creation of video montage using mostly text editing. Given an input themed text and a related video repository either from online websites or personal albums, the tool allows novice users to generate a video montage much more easily than current video editing tools. The resulting video illustrates the given narrative, provides diverse visual content, and follows cinematographic guidelines. The process involves three simple steps: (1) the user provides input, mostly in the form of editing the text, (2) the tool automatically searches for semantically matching candidate shots from the video repository, and (3) an optimization method assembles the video montage. Visual-semantic matching between segmented text and shots is performed by cascaded keyword matching and visual-semantic embedding, which have better accuracy than alternative solutions. The video assembly is formulated as a hybrid optimization problem over a graph of shots, considering temporal constraints, cinematography metrics such as camera movement and tone, and user-specified cinematography idioms. Using our system, users without video editing experience are able to generate appealing videos.




    ClusterSLAM: A SLAM Backend for Simultaneous Rigid Body Clustering and Motion Estimation
    IEEE ICCV, 2019, 5875-5884.   
    Jiahui Huang, Sheng Yang, Zishuo Zhao, Yu-Kun Lai, Shi-Min Hu

    We present a practical backend for stereo visual SLAM which can simultaneously discover individual rigid bodies and compute their motions in dynamic environments. While recent factor graph based state optimization algorithms have shown their ability to robustly solve SLAM problems by treating dynamic objects as outliers, the dynamic motions are rarely considered. In this paper, we exploit the consensus of 3D motions among the landmarks extracted from the same rigid body for clustering and estimating static and dynamic objects in a unified manner. Specifically, our algorithm builds a noise-aware motion affinity matrix upon landmarks, and uses agglomerative clustering for distinguishing those rigid bodies. Accompanied by a decoupled factor graph optimization for revising their shape and trajectory, we obtain an iterative scheme to update both cluster assignments and motion estimation reciprocally. Evaluations on both synthetic scenes and KITTI demonstrate the capability of our approach, and further experiments considering online efficiency also show the effectiveness of our method for simultaneous tracking of egomotion and multiple objects.




    Two-Layer QR Codes
    IEEE Transactions on Image Processing, 2019, Vol. 28, No. 9, 4413-4428.
    Tailing Yuan, Yili Wang, Kun Xu, Ralph R. Martin, Shi-Min Hu

    A quick-response code (QR code) is a two-dimensional code akin to a barcode which encodes a message of limited length. In this paper, we present a variant of QR code, a two-layer QR code. Its two-layer structure can display two alternative messages when scanned from two different directions. We propose a method to generate such two-layer QR codes encoding two given messages in a few seconds. We also demonstrate the robustness of our method on both synthetic and fabricated examples. All source code will be made publicly available.




    Deep inverse rendering for high-resolution SVBRDF estimation from an arbitrary number of images
    ACM Transactions on Graphics, 2019, Vol. 38, No. 4, Article No. 134 (ACM SIGGRAPH 2019).
    Duan Gao, Xiao Li, Yue Dong, Pieter Peers, Kun Xu, Xin Tong

    In this paper we present a unified deep inverse rendering framework for estimating the spatially-varying appearance properties of a planar exemplar from an arbitrary number of input photographs, ranging from just a single photograph to many photographs. The precision of the estimated appearance scales from plausible when the input photographs fail to capture all the reflectance information, to accurate for large input sets. A key distinguishing feature of our framework is that it directly optimizes for the appearance parameters in a latent embedded space of spatially-varying appearance, such that no handcrafted heuristics are needed to regularize the optimization. This latent embedding is learned through a fully convolutional auto-encoder that has been designed to regularize the optimization. Our framework not only supports an arbitrary number of input photographs, but also works at high resolution. We demonstrate and evaluate our deep inverse rendering solution on a wide variety of publicly available datasets.




    Deep Online Video Stabilization With Multi-Grid Warping Transformation Learning
    IEEE Transactions on Image Processing, 2019, Vol. 28, No. 5, 2283-2292.   
    Miao Wang, Guo-Ye Yang, Jin-Kun Lin, Song-Hai Zhang, Ariel Shamir, Shao-Ping Lu, Shi-Min Hu

    Video stabilization techniques are essential for most hand-held captured videos due to high-frequency shakes. Several 2D-, 2.5D-, and 3D-based stabilization techniques have been presented previously, but to the best of our knowledge, no solutions based on deep neural networks had been proposed to date. The main reason for this omission is the shortage of training data as well as the challenge of modeling the problem using neural networks. In this paper, we present a video stabilization technique using a convolutional neural network. Previous works usually propose an off-line algorithm that smooths a holistic camera path based on feature matching. Instead, we focus on low-latency, real-time camera path smoothing that does not explicitly represent the camera path and does not use future frames. Our neural network model, called StabNet, learns a set of mesh-grid transformations progressively for each input frame from the previous set of stabilized camera frames and creates stable corresponding latent camera paths implicitly. To train the network, we collect a dataset of synchronized steady and unsteady video pairs via specially designed hand-held hardware. Experimental results show that our proposed online method performs comparably to the traditional off-line video stabilization methods without using future frames while running about 10 times faster. More importantly, our proposed StabNet is able to handle low-quality videos, such as night-scene videos, watermarked videos, blurry videos, and noisy videos, where the existing methods fail in feature extraction or matching.




    S4Net: Single Stage Salient-Instance Segmentation
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.   
    Ruochen Fan, Ming-Ming Cheng, Qibin Hou, Tai-Jiang Mu, Jingdong Wang, Shi-Min Hu

    In this paper, we consider an interesting problem: salient instance segmentation. Other than producing bounding boxes, our network also outputs high-quality instance-level segments. Taking into account the category-independent property of each target, we design a single stage salient instance segmentation framework, with a novel segmentation branch. Our new branch regards not only local context inside each detection window but also its surrounding context, enabling us to distinguish the instances in the same scope even with obstruction. Our network is end-to-end trainable and runs at a fast speed (40 fps when processing an image with resolution 320 x 320). We evaluate our approach on a publicly available benchmark and show that it outperforms other alternative solutions. We also provide a thorough analysis of the design choices to help readers better understand the functions of each part of our network. The source code can be found at https://github.com/RuochenFan/S4Net.




    APDrawingGAN: Generating Artistic Portrait Drawings from Face Photos with Hierarchical GANs
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
    Ran Yi, Yong-Jin Liu, Yu-Kun Lai, Paul L. Rosin

    Significant progress has been made with image stylization using deep learning, especially with generative adversarial networks (GANs). However, existing methods fail to produce high quality artistic portrait drawings. Such drawings have a highly abstract style, containing a sparse set of continuous graphical elements such as lines, and so small artifacts are more exposed than for painting styles. Moreover, artists tend to use different strategies to draw different facial features and the lines drawn are only loosely related to obvious image features. To address these challenges, we propose APDrawingGAN, a novel GAN based architecture that builds upon hierarchical generators and discriminators combining both a global network (for images as a whole) and local networks (for individual facial regions). This allows dedicated drawing strategies to be learned for different facial features. To train APDrawingGAN, we construct an artistic drawing dataset containing high-resolution portrait photos and corresponding professional artistic drawings.




    Pose2Seg: Detection Free Human Instance Segmentation
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.   
    Song-Hai Zhang, Ruilong Li, Xin Dong, Paul Rosin, Zixi Cai, Xi Han, Dingcheng Yang, Haozhi Huang and Shi-Min Hu

    In this paper, we present a brand new pose-based instance segmentation framework for humans which separates instances based on human pose, rather than proposal region detection. We demonstrate that our pose-based framework can achieve better accuracy than the state-of-the-art detection-based approach on the human instance segmentation problem, and can moreover better handle occlusion. Furthermore, there are few public datasets containing many heavily occluded humans along with comprehensive annotations, which makes this a challenging problem seldom noticed by researchers. Therefore, in this paper we introduce a new benchmark "Occluded Human (OCHuman)", which focuses on occluded humans with comprehensive annotations including bounding-box, human pose and instance masks. This dataset contains 8110 detailed annotated human instances within 4731 images. With an average 0.67 MaxIoU for each person, OCHuman is the most complex and challenging dataset related to human instance segmentation. Through this dataset, we want to emphasize occlusion as a challenging problem for researchers to study.
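
    To make the MaxIoU figure quoted above more concrete, the sketch below computes, for each person in one image, the largest intersection-over-union between that person's bounding box and any other person's box. Reading MaxIoU this way is an assumption for this example (the abstract does not define it), and the function names box_iou and max_iou_per_person are hypothetical.

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def max_iou_per_person(boxes):
    """For each box, the highest IoU with any other box in the same image."""
    scores = []
    for i, box in enumerate(boxes):
        others = [box_iou(box, other)
                  for j, other in enumerate(boxes) if j != i]
        scores.append(max(others, default=0.0))
    return scores


# Example: two overlapping people and one isolated person.
boxes = [(0, 0, 2, 2), (1, 0, 3, 2), (10, 10, 12, 12)]
scores = max_iou_per_person(boxes)
```

    Under this reading, a high per-person score means the person is heavily overlapped by someone else, which is the situation where detection-based instance separation tends to struggle.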




    Example-Guided Style-Consistent Image Synthesis from Semantic Labeling
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.   
    Miao Wang, Guo-Ye Yang, Ruilong Li, Run-Ze Liang, Song-Hai Zhang, Peter M. Hall, Shi-Min Hu

    Example-guided image synthesis aims to synthesize an image from a semantic label map and an exemplary image indicating style. We use the term "style" in this problem to refer to implicit characteristics of images, for example: in portraits "style" includes gender, racial identity, age, hairstyle; in full body pictures it includes clothing; in street scenes it refers to weather and time of day and such like. A semantic label map in these cases indicates facial expression, full body pose, or scene segmentation. We propose a solution to the example-guided image synthesis problem using conditional generative adversarial networks with style consistency. Our key contributions are (i) a novel style consistency discriminator to determine whether a pair of images are consistent in style; (ii) an adaptive semantic consistency loss; and (iii) a training data sampling strategy, for synthesizing style-consistent results to the exemplar. We demonstrate the efficiency of our method on face, dance and street view synthesis tasks.




    Probabilistic Projective Association and Semantic Guided Relocalization for Dense Reconstruction
    International Conference on Robotics and Automation (ICRA), 2019.   
    Sheng Yang, Zheng-Fei Kuang, Yan-Pei Cao, Yu-Kun Lai, and Shi-Min Hu

    We present a real-time dense mapping system which uses the predicted 2D semantic labels for optimizing the geometric quality of reconstruction. With a combination of Convolutional Neural Networks (CNNs) for 2D labeling and a Simultaneous Localization and Mapping (SLAM) system for camera trajectory estimation, recent approaches have succeeded in incrementally fusing and labeling 3D scenes. However, the geometric quality of the reconstruction can be further improved by incorporating such semantic prediction results, which is not sufficiently exploited by existing methods. In this paper, we propose to use semantic information to improve two crucial modules in the reconstruction pipeline, namely tracking and loop detection, for obtaining mutual benefits in geometric reconstruction and semantic recognition. Specifically for tracking, we use a novel probabilistic projective association approach to efficiently pick out candidate correspondences, where the confidence of these correspondences is quantified concerning similarities on all available short-term invariant features. For the loop detection, we incorporate these semantic labels into the original encoding through Randomized Ferns to generate a more comprehensive representation for retrieving candidate loop frames.




    LineUp: Computing Chain-Based Physical Transformation
    ACM Transactions on Graphics, 2019, Vol. 38, No. 1, Article No. 11.
    Minjing Yu, Zipeng Ye, Yong-Jin Liu, Ying He, Charlie Wang

    In this article, we introduce a novel method that can generate a sequence of physical transformations between 3D models with different shape and topology. Feasible transformations are realized on a chain structure with connected components that are 3D printed. Collision-free motions are computed to transform between different configurations of the 3D printed chain structure. To realize the transformation between different 3D models, we first voxelize these input models into a similar number of voxels. The challenging part of our approach is to generate a simple path, as a chain configuration, to connect most voxels. A layer-based algorithm is developed with theoretical guarantee of the existence and the path length. We find that collision-free motion sequence can always be generated when using a straight line as the intermediate configuration of transformation. The effectiveness of our method is demonstrated by both the simulation and the experimental tests taken on 3D printed chains.



    Other publications in 2019

    1. Yili Wang, Yifan Liu, Kun Xu, An Improved Geometric Approach for Palette-based Image Decomposition and Recoloring, Computer Graphics Forum, 2019, Vol. 38, No. 7, 11-22 (PG 2019).
    2. Xiao-Nan Fang, Miao Wang, Ariel Shamir, Shi-Min Hu, Learning Explicit Smoothing Kernels for Joint Image Filtering, Computer Graphics Forum, 2019, Vol. 38, No. 7, 181-190 (PG 2019).   
    3. Jiaming Lu, Xiao-Song Chen, Xiao Yan, Chen-Feng Li, Ming Lin, Shi-Min Hu, A Rigging-Skinning Scheme to Control Fluid Simulation, Computer Graphics Forum, 2019, Vol. 38, No. 7, 501-512 (PG 2019).   
    4. Junxiong Cai, Tai-Jiang Mu, Yu-Kun Lai, Shi-Min Hu, Deep point-based scene labeling with depth mapping and geometric patch feature encoding, Graphical Models, 2019, Vol. 104, 101033.   
    5. Bing Xu, Junfei Zhang, Rui Wang, Kun Xu, Yong-Liang Yang, Chuan Li, Rui Tang, Adversarial Monte Carlo denoising with conditioned auxiliary feature modulation, ACM Transactions on Graphics, 2019, Vol. 38, No. 6, Article No. 224.
    6. Yifan Liu, Kun Xu, Ling-Qi Yan, Adaptive BRDF-Aware Multiple Importance Sampling of Many Lights, Computer Graphics Forum, 2019, Vol. 38, No. 4, 123-133 (EGSR 2019).   
    7. Tai-Ling Yuan, Zhe Zhu, Kun Xu, Cheng-Jun Li, Tai-Jiang Mu, Shi-Min Hu, A Large Chinese Text Dataset in the Wild, Journal of Computer Science and Technology, 2019, Vol. 34, No. 3, 509-521.   
    8. Qian Fu, Ying He, Fei Hou, Juyong Zhang, Anxiang Zeng, Yong-Jin Liu, Vectorization Based Color Transfer for Portrait Images, Computer-Aided Design, 2019, Vol. 115, 111-121.  
    9. Zipeng Ye, Yong-Jin Liu, Jianmin Zheng, Kai Hormann, Ying He, DE-Path: A Differential-Evolution-Based Method for Computing Energy-Minimizing Paths on Surfaces, Computer-Aided Design, 2019, Vol. 114, 73-81.  
    10. Chenming Wu, Chengkai Dai, Xiaoxi Gong, Yong-Jin Liu, et al., Energy Efficient Coverage Path Planning for General Terrain Surfaces, IEEE Robotics and Automation Letters, 2019, Vol. 4, No. 3, 2584-2591.  
    11. Chenming Wu, Rui Zeng, Jia Pan, Charlie C. L. Wang, Yong-Jin Liu, Plant Phenotyping by Deep-Learning-Based Planner for Multi-Robots, IEEE Robotics and Automation Letters, 2019, Vol. 4, No. 4, 3113-3120.
    12. Zipeng Ye, Minjing Yu, Yong-Jin Liu, NP-completeness of optimal planning problem for modular robots, Autonomous Robots, 2019, Vol. 43, No. 8, 2261-2270.  
    13. Shuyang Zhang, Runze Liang, and Miao Wang, ShadowGAN: Shadow synthesis for virtual objects with conditional adversarial networks, Computational Visual Media, 2019, Vol. 5, No. 1, 105-115.   
    14. Ruochen Fan, Xuanrun Wang, Qibin Hou, Hanchao Liu, and Tai-Jiang Mu, SpinNet: Spinning convolutional network for lane boundary detection, Computational Visual Media, 2019, Vol. 5, No. 4, 417-428.



    2018





    BiggerSelfie: Selfie Video Expansion with Hand-held Camera
    IEEE Transactions on Image Processing, 2018, Vol. 27, No. 12, 5854-5865.   
    Miao Wang, Ariel Shamir, Guo-Ye Yang, Jin-Kun Lin, Guo-Wei Yang, Shao-Ping Lu and Shi-Min Hu

    Selfie photography with hand-held cameras is becoming a popular media type. Although convenient and flexible, it suffers from low camera motion stability, a small field of view and limited background content. These limitations can annoy users, especially when touring a place of interest and taking selfie videos. In this paper, we present a novel method to create what we call a BiggerSelfie that deals with these shortcomings. Using a video of the environment that has partial content overlap with the selfie video, we stitch plausible frames selected from the environment video to the original selfie frames, and stabilize the composed video content with a portrait-preserving constraint. Using the proposed method, one can easily obtain a stable selfie video with expanded background content by merely capturing some background shots. We show various results and several evaluations to demonstrate the applicability of our method.




    Delaunay Mesh Simplification with Differential Evolution
    ACM Transactions on Graphics, 2018, Vol. 37, No. 6, Article No. 263.
    Ran Yi, Yong-Jin Liu, Ying He

    Delaunay meshes (DM) are a special type of manifold triangle meshes where the local Delaunay condition holds everywhere, and they find important applications in digital geometry processing. This paper addresses the general DM simplification problem: given an arbitrary manifold triangle mesh M with n vertices and the user-specified resolution m (< n), compute a Delaunay mesh M* with m vertices that has the least Hausdorff distance to M. To solve the problem, we abstract the simplification process using a 2D Cartesian grid model, in which each grid point corresponds to triangle meshes with a certain number of vertices and a simplification process is a monotonic path on the grid. We develop a novel differential-evolution-based method to compute a low-cost path, which leads to a high-quality Delaunay mesh. Extensive evaluation shows that our method consistently outperforms the existing methods in terms of approximation error. In particular, our method is highly effective for small-scale CAD models and man-made objects with sharp features but fewer details. Moreover, our method is fully automatic and can preserve sharp features well and deal with models with multiple components, whereas the existing methods often fail.




    Real-time High-accuracy 3D Reconstruction with Consumer RGB-D Cameras
    ACM Transactions on Graphics, 2018, Vol. 37, No. 5, Article No. 171.
    Yan-Pei Cao, Leif Kobbelt, Shi-Min Hu

    We present an integrated approach for reconstructing high-fidelity 3D models using consumer RGB-D cameras. RGB-D registration and reconstruction algorithms are prone to errors from scanning noise, making it hard to perform 3D reconstruction accurately. The key idea of our method is to assign a probabilistic uncertainty model to each depth measurement, which then guides the scan alignment and depth fusion. This allows us to effectively handle inherent noise and distortion in depth maps while keeping the overall scan registration procedure under the iterative closest point (ICP) framework for simplicity and efficiency. We further introduce a local-to-global, submap-based, and uncertainty-aware global pose optimization scheme to improve scalability and guarantee global model consistency. Finally, we have implemented the proposed algorithm on the GPU, achieving real-time 3D scanning frame rates and updating the reconstructed model on-the-fly. Experimental results on simulated and real-world data demonstrate that the proposed method outperforms state-of-the-art systems in terms of the accuracy of both recovered camera trajectories and reconstructed models.
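    To make the idea of uncertainty-guided depth fusion concrete, here is a minimal Python sketch of inverse-variance weighting, a generic textbook rule for combining noisy measurements. It assumes independent Gaussian noise with a known variance per sample; the function name `fuse_depths` and the sample values are hypothetical, and the paper's actual uncertainty model is not reproduced here.

```python
def fuse_depths(depths, variances):
    """Inverse-variance weighted fusion of depth samples for one surface point.

    Assumption (not from the paper): each sample has independent Gaussian
    noise with the given variance.
    """
    if len(depths) != len(variances) or not depths:
        raise ValueError("need equally many depths and variances, at least one")
    # More uncertain samples (larger variance) receive smaller weights.
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * d for w, d in zip(weights, depths)) / total
    # The fused estimate is never more uncertain than the best single sample.
    return fused, 1.0 / total


# Hypothetical example: a precise sample at 1.00 m and a noisier one at 1.06 m.
fused, var = fuse_depths([1.00, 1.06], [0.0001, 0.0009])
```

    In this example the fused depth stays close to the more reliable sample, which is the qualitative behaviour one would expect from any uncertainty-aware fusion rule.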




    PhotoRecomposer: Interactive Photo Recomposition by Cropping (Spotlight paper)  
    IEEE Transactions on Visualization and Computer Graphics, 2018, Vol. 24, No. 10, 2728-2742.   
    Yuan Liang, Xiting Wang, Song-Hai Zhang, Shi-Min Hu and Shixia Liu

    We present a visual analysis method for interactively recomposing a large number of photos based on example photos with high-quality composition. The recomposition method is formulated as a matching problem between photos. The key to this formulation is a new metric for accurately measuring the composition distance between photos. We have also developed an earth-mover-distance-based online metric learning algorithm to support the interactive adjustment of the composition distance based on user preferences. To better convey the compositions of a large number of example photos, we have developed a multi-level, example photo layout method to balance multiple factors such as compactness, aspect ratio, composition distance, stability, and overlaps. By introducing an EulerSmooth-based straightening method, the composition of each photo is clearly displayed. The effectiveness and usefulness of the method have been demonstrated by the experimental results, user study, and case studies.




    Learning to Reconstruct High-quality 3D Shapes with Cascaded Fully Convolutional Networks
    Proceedings of the European Conference on Computer Vision (ECCV), 2018, 616-633.   
    Yan-Pei Cao, Zheng-Ning Liu, Zheng-Fei Kuang, Leif Kobbelt, Shi-Min Hu

    We present a data-driven approach to reconstructing high-resolution and detailed volumetric representations of 3D shapes. Although well studied, algorithms for volumetric fusion from multi-view depth scans are still prone to scanning noise and occlusions, making it hard to obtain high-fidelity 3D reconstructions. In this paper, inspired by recent advances in efficient 3D deep learning techniques, we introduce a novel cascaded 3D convolutional network architecture, which learns to reconstruct implicit surface representations from noisy and incomplete depth maps in a progressive, coarse-to-fine manner. To this end, we also develop an algorithm for end-to-end training of the proposed cascaded structure. Qualitative and quantitative experimental results on both simulated and real-world datasets demonstrate that the presented approach outperforms existing state-of-the-art work in terms of quality and fidelity of reconstructed models.




    Associating Inter-Image Salient Instances for Weakly Supervised Semantic Segmentation
    Proceedings of the European Conference on Computer Vision (ECCV), 2018, 367-383.   
    Ruochen Fan, Qibin Hou, Ming-Ming Cheng, Gang Yu, Ralph R. Martin, and Shi-Min Hu

    Effectively bridging between image-level keyword annotations and corresponding image pixels is one of the main challenges in weakly supervised semantic segmentation. In this paper, we use an instance-level salient object detector to automatically generate salient instances (candidate objects) for training images. Using similarity features extracted from each salient instance in the whole training set, we build a similarity graph, then use a graph partitioning algorithm to separate it into multiple subgraphs, each of which is associated with a single keyword (tag). Our graph-partitioning-based clustering algorithm allows us to consider the relationships between all salient instances in the training set as well as the information within them. We further show that with the help of attention information, our clustering algorithm is able to correct certain wrong assignments, leading to more accurate results. The proposed framework is general, and any state-of-the-art fully-supervised network structure can be incorporated to learn the segmentation network. When working with DeepLab for semantic segmentation, our method outperforms state-of-the-art weakly supervised alternatives by a large margin, achieving 65.6% mIoU on the PASCAL VOC 2012 dataset. We also combine our method with Mask R-CNN for instance segmentation, and demonstrate for the first time the ability of weakly supervised instance segmentation using only keyword annotations.




    Detecting and Removing Visual Distractors for Video Aesthetic Enhancement
    IEEE Transactions on Multimedia, 2018, Vol. 20, No. 8, 1987-1999.
    Fang-Lue Zhang, Xian Wu, Rui-Long Li, Jue Wang, Zhao-Heng Zheng and Shi-Min Hu

    Personal videos often contain visual distractors, which are objects that are accidentally captured that can distract viewers from focusing on the main subjects. We propose a method to automatically detect and localize these distractors through learning from a manually labeled dataset. To achieve spatially and temporally coherent detection, we propose extracting features at the Temporal-Superpixel (TSP) level using a traditional SVM-based learning framework. We also experiment with end-to-end learning using Convolutional Neural Networks (CNNs), which achieves slightly higher performance than other methods. The classification result is further refined in a post-processing step based on graph-cut optimization. Experimental results show that our method achieves an accuracy of 81% and a recall of 86%. We demonstrate several ways of removing the detected distractors to improve the video quality, including video hole filling; video frame replacement; and camera path re-planning. The user study results show that our method can significantly improve the aesthetic quality of videos.




    Real-time High-fidelity Surface Flow Simulation
    IEEE Transactions on Visualization and Computer Graphics, 2018, Vol. 24, No. 8, 2411-2423.   
    Bo Ren, Tailing Yuan, Chenfeng Li, Kun Xu, and Shi-Min Hu

    Surface flow phenomena, such as rain water flowing down a tree trunk and progressive water front in a shower room, are common in real life. However, compared with the 3D spatial fluid flow, these surface flow problems have been much less studied in the graphics community. To tackle this research gap, we present an efficient, robust and high-fidelity simulation approach based on the shallow-water equations. Specifically, the standard shallow-water flow model is extended to general triangle meshes with a feature-based bottom friction model, and a series of coherent mathematical formulations are derived to represent the full range of physical effects that are important for real-world surface flow phenomena. In addition, by achieving compatibility with existing 3D fluid simulators and by supporting physically realistic interactions with multiple fluids and solid surfaces, the new model is flexible and readily extensible for coupled phenomena. A wide range of simulation examples are presented to demonstrate the performance of the new approach.




    A Comparative Study of Algorithms for Realtime Panoramic Video Blending
    IEEE Transactions on Image Processing, 2018, Vol. 27, No. 6, 2952-2965.   
    Zhe Zhu, Jiaming Lu, Minxuan Wang, Songhai Zhang, Ralph R. Martin, Hantao Liu, and Shi-Min Hu

    Unlike image blending algorithms, video blending algorithms have been little studied. In this paper, we investigate 6 popular blending algorithms: feather blending, multi-band blending, modified Poisson blending, mean value coordinate blending, multi-spline blending and convolution pyramid blending. We consider their application to blending realtime panoramic videos, a key problem in various virtual reality tasks. To evaluate the performances and suitabilities of the 6 algorithms for this problem, we have created a video benchmark with several videos captured under various conditions. We analyze the time and memory needed by the above 6 algorithms, for both CPU and GPU implementations (where readily parallelizable). The visual quality provided by these algorithms is also evaluated both objectively and subjectively. The video benchmark and algorithm implementations are publicly available.
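    Of the six algorithms listed above, feather blending is the simplest: pixels in the overlap are a weighted average whose weights ramp from one image to the other. The Python sketch below shows this on a single row of grey values with a linear ramp. It is only an illustration under that assumption; the helper name `feather_blend_row` and the test values are hypothetical and are not taken from the benchmark implementation.

```python
def feather_blend_row(left, right, overlap):
    """Blend two rows whose last/first `overlap` pixels cover the same scene.

    A linear ramp is assumed for the weights; real implementations often
    derive weights from the distance to each image's border.
    """
    if not 0 < overlap <= min(len(left), len(right)):
        raise ValueError("overlap must be in 1..min(len(left), len(right))")
    out = list(left[:-overlap])  # pixels seen only by the left image
    start = len(left) - overlap
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of the right image, rising with i
        out.append((1 - w) * left[start + i] + w * right[i])
    out.extend(right[overlap:])  # pixels seen only by the right image
    return out


# Hypothetical rows with a brightness mismatch between the two cameras.
blended = feather_blend_row([10, 10, 10, 10], [20, 20, 20, 20], overlap=3)
```

    The brightness step between the two rows is spread evenly over the overlap instead of appearing as a visible seam.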




    CartoonGAN: Generative Adversarial Networks for Photo Cartoonization
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, 9465-9474.   
    Yang Chen, Yu-Kun Lai, Yong-Jin Liu

    In this paper, we propose CartoonGAN, a generative adversarial network (GAN) framework for cartoon stylization. Our method takes unpaired photos and cartoon images for training, which is easy to use. Two novel losses suitable for cartoonization are proposed: (1) a semantic content loss, which is formulated as a sparse regularization in the high-level feature maps of the VGG network to cope with substantial style variation between photos and cartoons, and (2) an edge-promoting adversarial loss for preserving clear edges. We further introduce an initialization phase, to improve the convergence of the network to the target manifold. Our method is also much more efficient to train than existing methods. Experimental results show that our method is able to generate high-quality cartoon images from real-world photos (i.e., following specific artists' styles and with clear edges and smooth shading) and outperforms state-of-the-art methods.




    Content-Sensitive Supervoxels via Uniform Tessellations on Video Manifolds
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, 646-655.   
    Ran Yi, Yong-Jin Liu, Yu-Kun Lai

    In this paper, we propose content-sensitive supervoxels (CSS), which are regularly-shaped 3D primitive volumes that possess the following characteristic: they are typically larger and longer in content-sparse regions (i.e., with homogeneous appearance and motion), and smaller and shorter in content-dense regions (i.e., with high variation of appearance and/or motion). To compute CSS, we map a video $\xi$ to a 3-dimensional manifold M embedded in $R^6$, whose volume elements give a good measure of the content density in $\xi$. We propose an efficient Lloyd-like method with a splitting-merging scheme to compute a uniform tessellation on M, which induces the CSS in $\xi$. Theoretically our method has a good competitive ratio O(1). We also present a simple extension of CSS to stream CSS for processing long videos that cannot be loaded into main memory at once. We evaluate CSS, stream CSS and seven representative supervoxel methods on four video datasets. The results show that our method outperforms existing supervoxel methods.




    Hyper-lapse from Multiple Spatially-overlapping Videos
    IEEE Transactions on Image Processing, 2018, Vol. 27, No. 4, 1735-1747.
    Miao Wang, Jun-Bang Liang, Song-Hai Zhang, Shao-Ping Lu, Ariel Shamir and Shi-Min Hu

    Hyper-lapse video with high speed-up rate is an efficient way to overview long videos such as a human activity in first-person view. Existing hyper-lapse video creation methods produce a fast-forward video effect using only one video source. In this work, we present a novel hyper-lapse video creation approach based on multiple spatially-overlapping videos. We assume the videos share a common view or location, and find transition points where jumps from one video to another may occur. We represent the collection of videos using a hyper-lapse transition graph; the edges between nodes represent possible hyper-lapse frame transitions. To create a hyper-lapse video, a shortest path search is performed on this digraph to optimize frame sampling and assembly simultaneously. Finally, we render the hyper-lapse results using video stabilization and appearance smoothing techniques on the selected frames. Our technique can synthesize novel virtual hyper-lapse routes which may not exist originally. We show various application results on both indoor and outdoor video collections with static scenes, moving objects, and crowds.
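    The shortest-path step described above can be sketched with Dijkstra's algorithm on a small transition digraph. In the Python example below, nodes are (video, frame) pairs and edge weights stand in for transition costs; the graph, the costs, and the helper name `shortest_frame_path` are hypothetical, and the cost terms used in the paper are not reproduced.

```python
import heapq


def shortest_frame_path(graph, source, target):
    """Dijkstra's algorithm on a digraph {node: [(neighbor, cost), ...]}.

    Costs must be non-negative. Returns (path, total_cost), or
    (None, inf) if the target cannot be reached.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == target:
            break
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]


# Hypothetical toy graph: two videos "A" and "B" that overlap spatially.
toy_graph = {
    ("A", 0): [(("A", 8), 1.0), (("A", 4), 2.5)],
    ("A", 4): [(("A", 8), 2.5)],
    ("A", 8): [(("A", 16), 1.0), (("B", 10), 1.5)],
    ("A", 16): [(("A", 24), 4.0)],
    ("B", 10): [(("B", 18), 1.0)],
    ("B", 18): [(("A", 24), 1.2)],
}
path, cost = shortest_frame_path(toy_graph, ("A", 0), ("A", 24))
```

    In this toy graph the cheapest route leaves video A, passes through two frames of video B, and returns to A, which mirrors the idea of a virtual route that does not exist in any single source video.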




    Intrinsic Manifold SLIC: A Simple and Efficient Method for Computing Content-Sensitive Superpixels
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, Vol. 40, No. 3, 653-666.
    Yong-Jin Liu, Minjing Yu, Bing-Jun Li, and Ying He

    Superpixels are perceptually meaningful atomic regions that can effectively capture image features. Among various methods for computing uniform superpixels, simple linear iterative clustering (SLIC) is popular due to its simplicity and high performance. In this paper, we extend SLIC to compute content-sensitive superpixels, i.e., small superpixels in content-dense regions with high intensity or colour variation and large superpixels in content-sparse regions. Rather than using the conventional SLIC method that clusters pixels in $R^5$, we map the input image I to a 2-dimensional manifold $M \subset R^5$, whose area elements are a good measure of the content density in I. We propose a simple method, called intrinsic manifold SLIC (IMSLIC), for computing a geodesic centroidal Voronoi tessellation (GCVT), a uniform tessellation, on M, which induces the content-sensitive superpixels in I. In contrast to the existing algorithms, IMSLIC characterizes the content sensitivity by measuring areas of Voronoi cells on M. Using a simple and fast approximation to a closed-form solution, the method can compute the GCVT at a very low cost and guarantees that all Voronoi cells are simply connected. We thoroughly evaluate IMSLIC and compare it with eleven representative methods on the BSDS500 dataset and seven representative methods on the NYUV2 dataset. Computational results show that IMSLIC outperforms existing methods in terms of commonly used quality measures pertaining to superpixels such as compactness, adherence to boundaries, and achievable segmentation accuracy. We also evaluate IMSLIC and seven representative methods in an image contour closure application, and the results on two datasets, WHD and WSD, show that IMSLIC achieves the best foreground segmentation performance.
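    The role of the area element as a content-density measure can be illustrated with a short Python sketch. It assumes the unscaled map $\Phi(u, v) = (u, v, c_1, c_2, c_3)$ from pixel position to position plus three colour channels, approximates the tangent vectors by forward differences, and evaluates the standard surface area element $\sqrt{|\Phi_u|^2 |\Phi_v|^2 - (\Phi_u \cdot \Phi_v)^2}$. The function name `area_element`, the omission of any scaling factors, and the toy images are assumptions for illustration only.

```python
import math


def area_element(image, u, v):
    """Area element of Phi(u, v) = (u, v, c1, c2, c3) at pixel (u, v).

    `image[v][u]` is a 3-tuple of colour values. Tangent vectors are
    approximated by forward differences (an assumption of this sketch).
    """
    c = image[v][u]
    c_right = image[v][u + 1]
    c_down = image[v + 1][u]
    phi_u = (1.0, 0.0) + tuple(b - a for a, b in zip(c, c_right))
    phi_v = (0.0, 1.0) + tuple(b - a for a, b in zip(c, c_down))
    uu = sum(x * x for x in phi_u)
    vv = sum(x * x for x in phi_v)
    uv = sum(x * y for x, y in zip(phi_u, phi_v))
    # Gram determinant of the two tangent vectors gives the squared area.
    return math.sqrt(uu * vv - uv * uv)


# Hypothetical 2x2 images: constant colour versus a horizontal colour step.
flat = [[(5.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
        [(5.0, 0.0, 0.0), (5.0, 0.0, 0.0)]]
edge = [[(5.0, 0.0, 0.0), (8.0, 0.0, 0.0)],
        [(5.0, 0.0, 0.0), (8.0, 0.0, 0.0)]]
flat_area = area_element(flat, 0, 0)
edge_area = area_element(edge, 0, 0)
```

    A pixel in a constant region has unit area on the manifold, while a pixel next to a colour step covers more area, so a tessellation that is uniform on the manifold places smaller cells there in the image.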




    Controllable Dendritic Crystal Simulation Using Orientation Field
    Computer Graphics Forum, Vol. 37, No. 2, 485-495, 2018 (Eurographics 2018).
    Bo Ren, Jiahui Huang, Ming C. Lin, and Shi-Min Hu

    Real-world dendritic growths show charming structures through their exquisite balance between symmetry and randomness in crystal formation. Beyond the variety found in natural crystals, a richer visual appearance can be obtained by artificially controlling the growing directions and shapes of the crystals. In this paper, by introducing one extra dimension of freedom, i.e. the orientation field, into the simulation, we propose an efficient algorithm for dendritic crystal simulation that is able to reproduce arbitrary symmetry patterns with different levels of asymmetry breaking effect on general grids or meshes, including spreading on curved surfaces and growth in 3D. Flexible artistic control is also enabled in a unified manner by exploiting and guiding the orientation field in the visual simulation. We show the effectiveness of our approach by various demonstrations of simulation results.




    Computational Design of Transforming Pop-up Books
    ACM Transactions on Graphics, Vol. 37, No. 1, 2018, Article No. 8.
    Nan Xiao, Zhe Zhu, Ralph Martin, Kun Xu, Jia-Ming Lu and Shi-Min Hu

    We present the first computational tool to help ordinary users create transforming pop-up books. In each transforming pop-up, when the user pulls a tab, an initial flat 2D pattern, i.e. a 2D shape with a superimposed picture, such as an airplane, turns into a new 2D pattern, such as a robot, standing up from the page. Given the two 2D patterns, our approach automatically computes a 3D pop-up mechanism that transforms one pattern into the other; it also outputs a design blueprint, allowing the user to easily make the final model. We also present a theoretical analysis of basic transformation mechanisms; combining these basic mechanisms allows more flexibility of final designs. Using our approach, inexperienced users can create models in a short time; previously, even experienced artists often took weeks to manually create them. We demonstrate our method on a variety of real world examples.



    Other publications in 2018

    1. Xiao Yan, Cheng-Feng Li, Xiao-Song Chen, Shi-Min Hu, MPM simulation of interacting fluids and solids, Computer Graphics Forum, 2018, Vol. 37, No. 8, 183-193.   
    2. Yu Fang, Yuanming Hu, Shi-Min Hu, Chenfanfu Jiang, A Temporally Adaptive Material Point Method with Regional Time Stepping, Computer Graphics Forum, 2018, Vol. 37, No. 8, 195-204.   
    3. Sen-Zhe Xu, Jun Hu, Miao Wang, Tai-Jiang Mu, Shi-Min Hu, Deep Video Stabilization Using Adversarial Networks, Computer Graphics Forum, 2018, Vol. 37, No. 7, 267-276.   
    4. Yuan Liang, Fei Xu, Song-Hai Zhang, Yu-Kun Lai, and Taijiang Mu, Knowledge graph construction with structure and parameter learning for indoor scene design, Computational Visual Media, 2018, Vol. 4, No. 2, 123-137.
    5. Yifan Lu, Jiaming Lu, Songhai Zhang, and Peter Hall, Traffic signal detection and classification in street views using an attention model, Computational Visual Media, 2018, Vol. 4, No. 3, 253-266. (2018 Honorable Mention Award)    
    6. Jiahui Huang, Jun Gao, Vignesh Ganapathi-Subramanian, Hao Su, Yin Liu, Chengcheng Tang, Leonidas J. Guibas, DeepPrimitive: Image decomposition by layered primitive detection, Computational Visual Media, 2018, Vol. 4, No. 4, 385-397.   



    2017





    A Unified Particle System Framework for Multi-Phase, Multi-Material Visual Simulations
    ACM Transactions on Graphics, Vol. 36, No. 6. ACM SIGGRAPH ASIA 2017, Article No. 224.
    Tao Yang, Jian Chang, Ming C. Lin, Ralph R. Martin, Jian J. Zhang, Shi-Min Hu

    We introduce a unified particle framework which integrates the phase-field method with multi-material simulation to allow modeling of both liquids and solids, as well as phase transitions between them. A simple elastoplastic model is used to capture the behavior of various kinds of solids, including deformable bodies, granular materials, and cohesive soils. States of matter or phases, particularly liquids and solids, are modeled using the nonconservative Allen-Cahn equation. In contrast, materials, made of different substances, are advected by the conservative Cahn-Hilliard equation. The distributions of phases and materials are represented by a phase variable and a concentration variable, respectively, allowing us to represent commonly observed fluid-solid interactions. Our multi-phase, multi-material system is governed by a unified Helmholtz free energy density. This framework provides the first method in computer graphics capable of modeling a continuous interface between phases. It is versatile and can be readily used in many scenarios that are challenging to simulate. Examples are provided to demonstrate the capabilities and effectiveness of this approach.




    An Optimization Approach for Localization Refinement of Candidate Traffic Signs
    IEEE Transactions on Intelligent Transportation Systems, 2017, Vol. 18, No. 11, 3006-3016.
    Zhe Zhu, Jiaming Lu, Ralph R. Martin, and Shi-Min Hu

    We propose a localization refinement approach for candidate traffic signs. Previous traffic sign localization approaches, which place a bounding rectangle around the sign, do not always give a compact bounding box, making the subsequent classification task more difficult. We formulate localization as a segmentation problem, and incorporate prior knowledge concerning color and shape of traffic signs. To evaluate the effectiveness of our approach, we use it as an intermediate step between a standard traffic sign localizer and a classifier. Our experiments use the well-known German Traffic Sign Detection Benchmark (GTSDB) as well as our new Chinese Traffic Sign Detection Benchmark. This newly created benchmark is publicly available, and goes beyond previous benchmark data sets: it has over 5000 high-resolution images containing more than 14 000 traffic signs taken in realistic driving conditions. Experimental results show that our localization approach significantly improves bounding boxes when compared with a standard localizer, thereby allowing a standard traffic sign classifier to generate more accurate classification results.




    Pairwise Force SPH Model for Real-Time Multi-Interaction Applications
    IEEE Transactions on Visualization and Computer Graphics, 2017, Vol. 23, No. 10, 2235-2247.
    Tao Yang, Ralph R. Martin, Ming C. Lin, Jian Chang, and Shi-Min Hu

    In this paper, we present a novel pairwise-force smoothed particle hydrodynamics (PF-SPH) model to allow modeling of various interactions at interfaces in real time. Realistic capture of interactions at interfaces is a challenging problem for SPH-based simulations, especially for scenarios involving multiple interactions at different interfaces. Our PF-SPH model can readily handle multiple kinds of interactions simultaneously in a single simulation; its basis is to use a larger support radius than that used in standard SPH. We adopt a novel anisotropic filtering term to further improve the performance of interaction forces. The proposed model is stable; furthermore, it avoids the particle clustering problem which commonly occurs at the free surface. We show how our model can be used to capture various interactions. We also consider the close connection between droplets and bubbles, and show how to animate bubbles rising in liquid as well as bubbles in air. Our method is versatile, physically plausible and easy-to-implement. Examples are provided to demonstrate the capabilities and effectiveness of our approach.




    Extracting Sharp Features from RGB-D Images
    Computer Graphics Forum, 2017, Vol. 35, No. 8, 138-174.
    Yan-Pei Cao, Tao Ju, Jie Xu and Shi-Min Hu

    Sharp edges are important shape features and their extraction has been extensively studied both on point clouds and surfaces. We consider the problem of extracting sharp edges from a sparse set of colour-and-depth (RGB-D) images. The noise-ridden depth measurements are challenging for existing feature extraction methods that work solely in the geometric domain (e.g. points or meshes). By utilizing both colour and depth information, we propose a novel feature extraction method that produces much cleaner and more coherent feature lines. We make two technical contributions. First, we show that intensity edges can augment the depth map to improve normal estimation and feature localization from a single RGB-D image. Second, we designed a novel algorithm for consolidating feature points obtained from multiple RGB-D images. By utilizing normals and ridge/valley types associated with the feature points, our algorithm is effective in suppressing noise without smearing nearby features.




    Saliency-aware Real-time Volumetric Fusion for Object Reconstruction
    Computer Graphics Forum, 2017, Vol. 35, No. 7, 167-174. Pacific Graphics 2017.
    Sheng Yang, Kang Chen, Minghua Liu, Hongbo Fu and Shi-Min Hu

    We present a real-time approach for acquiring 3D objects with high fidelity using hand-held consumer-level RGB-D scanning devices. Existing real-time reconstruction methods typically do not take the point of interest into account, and thus might fail to produce clean reconstruction results of desired objects due to distracting objects or backgrounds. In addition, any changes in background during scanning, which can often occur in real scenarios, can easily break up the whole reconstruction process. To address these issues, we incorporate visual saliency into a traditional real-time volumetric fusion pipeline. Salient regions detected from RGB-D frames suggest user-intended objects, and by understanding user intentions our approach can put more emphasis on important targets, and meanwhile, eliminate disturbance of non-important objects. Experimental results on real-world scans demonstrate that our system is capable of effectively acquiring geometric information of salient objects in cluttered real-world scenes, even if the backgrounds are changing.




    Learning to Rank Retargeted Images
    IEEE CVPR, 2017: 4743-4751.    
    Yang Chen, Yong-Jin Liu, Yu-Kun Lai

    Image retargeting techniques that adjust images into different sizes have attracted much attention recently. Existing objective quality assessment (OQA) methods output an absolute score for each retargeted image and use these scores to compare different results. Observing that it is challenging even for human subjects to give consistent scores for retargeting results of different source images, in this paper we propose a learning-based OQA method that predicts the ranking of a set of retargeted images with the same source image. We show that this more manageable task helps achieve more consistent prediction to human preference and is sufficient for most application scenarios. To compute the ranking, we propose a simple yet efficient machine learning framework that uses a General Regression Neural Network (GRNN) to model a combination of seven elaborate OQA metrics. We then propose a simple scheme to transform the relative scores output from GRNN into a global ranking. We train our GRNN model using human preference data collected in the elaborate RetargetMe benchmark and evaluate our method based on the subjective study in RetargetMe.




    PlenoPatch: Patch-based Plenoptic Image Manipulation
    IEEE Transactions on Visualization and Computer Graphics, 2017, Vol.23, No. 5, 1561-1573.   
    Fang-Lue Zhang, Jue Wang, Eli Shechtman, Zi-Ye Zhou, Jia-Xin Shi, and Shi-Min Hu

    Patch-based image synthesis methods have been successfully applied for various editing tasks on still images, videos and stereo pairs. In this work we extend patch-based synthesis to plenoptic images captured by consumer-level lenslet-based devices for interactive, efficient light field editing. In our method the light field is represented as a set of images captured from different viewpoints. We decompose the central view into different depth layers, and present it to the user for specifying the editing goals. Given an editing task, our method performs patch-based image synthesis on all affected layers of the central view, and then propagates the edits to all other views. Interaction is done through a conventional 2D image editing user interface that is familiar to novice users. Our method correctly handles object boundary occlusion with semi-transparency, and thus can generate more realistic results than previous methods. We demonstrate compelling results on a wide range of applications such as hole-filling, object reshuffling and resizing, changing object depth, light field upscaling and parallax magnification.




    View suggestion for interactive segmentation of indoor scenes
    Computational Visual Media, 2017, Vol. 3, No. 2, 131-146.   
    Sheng Yang, Jie Xu, Kang Chen, Hongbo Fu

    Point cloud segmentation is a fundamental problem. Due to the complexity of real-world scenes and the limitations of 3D scanners, interactive segmentation is currently the only way to cope with all kinds of point clouds. However, interactively segmenting complex and large-scale scenes is very time-consuming. In this paper, we present a novel interactive system for segmenting point cloud scenes. Our system automatically suggests a series of camera views, in which users can conveniently specify segmentation guidance. In this way, users may focus on specifying segmentation hints instead of manually searching for desirable views of unsegmented objects, thus significantly reducing user effort. To achieve this, we introduce a novel view preference model, which is based on a set of dedicated view attributes, with weights learned from a user study. We also introduce support relations for both graph-cut-based segmentation and finding similar objects. Our experiments show that our segmentation technique helps users quickly segment various types of scenes, outperforming alternative methods.




    Constructing Intrinsic Delaunay Triangulations from the Dual of Geodesic Voronoi Diagrams
    ACM Transactions on Graphics, 2017, Vol. 36, No. 2, 15:1-15:15.   
    Yong-Jin Liu, Dian Fan, Chunxu Xu, Ying He

    Intrinsic Delaunay triangulation (IDT) naturally generalizes Delaunay triangulation from $\mathbb{R}^2$ to curved surfaces. Due to its many favorable properties, the IDT whose vertex set includes all mesh vertices is of particular interest in polygonal mesh processing. To date, the only way to construct such an IDT is the edge-flipping algorithm, which iteratively flips non-Delaunay edges to become locally Delaunay. Although this algorithm is conceptually simple and guaranteed to terminate in a finite number of steps, it has no known time complexity and may also produce triangulations containing faces with only two edges. This article develops a new method to obtain proper IDTs on manifold triangle meshes. We first compute a geodesic Voronoi diagram (GVD) by taking all mesh vertices as generators and then find its dual graph. The sufficient condition for the dual graph to be a proper triangulation is that all Voronoi cells satisfy the so-called closed ball property. To guarantee the closed ball property everywhere, a certain sampling criterion is required. For Voronoi cells that violate the closed ball property, we fix them by computing topologically safe regions, in which auxiliary sites can be added without changing the topology of the Voronoi diagram beyond them. Given a mesh with $n$ vertices, we prove that by adding at most $O(n)$ auxiliary sites, the computed GVD satisfies the closed ball property, and hence its dual graph is a proper IDT. Our method has a theoretical worst-case time complexity of $O(n^2 + tn \log n)$, where $t$ is the number of obtuse angles in the mesh. Computational results show that it empirically runs in linear time on real-world models.
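
    For context on the edge-flipping baseline mentioned above: an interior edge is locally Delaunay when the two angles opposite it, one in each adjacent triangle, sum to at most $\pi$. The sketch below tests that standard condition from intrinsic edge lengths only. The function names and the tolerance are illustrative assumptions, and the paper's own GVD-based construction is not shown.

```python
import math


def opposite_angle(e, a, b):
    """Angle opposite side e in a triangle with side lengths e, a, b (law of cosines)."""
    cos_val = (a * a + b * b - e * e) / (2.0 * a * b)
    # Clamp to guard against round-off slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_val)))


def is_locally_delaunay(e, a1, b1, a2, b2, tol=1e-12):
    """Check the local Delaunay condition for an edge of length e.

    (a1, b1) and (a2, b2) are the other two side lengths of the two
    triangles sharing the edge. The edge is locally Delaunay when the
    two opposite angles sum to at most pi.
    """
    total = opposite_angle(e, a1, b1) + opposite_angle(e, a2, b2)
    return total <= math.pi + tol


# Two equilateral triangles glued along a unit edge: 60 + 60 degrees.
equilateral_ok = is_locally_delaunay(1.0, 1.0, 1.0, 1.0, 1.0)
# Two flat triangles glued along their long edge: both opposite angles are obtuse.
flat_ok = is_locally_delaunay(2.0, 1.1, 1.1, 1.1, 1.1)
```

    An edge-flipping pass would flip every edge for which this test fails and repeat until none remain.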




    A survey of the state-of-the-art in patch-based synthesis
    Computational Visual Media, 2017, Vol. 3, No. 1, 3-20.   
    Connelly Barnes and Fang-Lue Zhang

    This paper surveys the state-of-the-art of research in patch-based synthesis. Patch-based methods synthesize output images by copying small regions from exemplar imagery. This line of research originated from an area called "texture synthesis", which focused on creating regular or semi-regular textures from small exemplars. However, more recently, much research has focused on synthesis of larger and more diverse imagery, such as photos, photo collections, videos, and light fields. Additionally, recent research has focused on customizing the synthesis process for particular problem domains, such as synthesizing artistic or decorative brushes, synthesis of rich materials, and synthesis for 3D fabrication. This report investigates recent papers that follow these themes, with a particular emphasis on papers published since 2009, when the last survey in this area was published. This survey can serve as a tutorial for readers who are not yet familiar with these topics, as well as provide comparisons between these papers, and highlight some open problems in this area.



    Other publications in 2017

    1. Zhao-Heng Zheng, Hao-Tian Zhang, Fang-Lue Zhang, Tai-Jiang Mu, Image-based clothes changing system, Computational Visual Media, 2017, Vol. 3, No. 4, 337-347.   
    2. Han-Chao Liu, Fang-Lue Zhang, David Marshall, Luping Shi, Shi-Min Hu, High-speed video generation with an event camera, The Visual Computer, 2017, Vol. 33, No. 6-8, 749-759.   
    3. Ruochen Fan, Fang-Lue Zhang, Min Zhang, Ralph R. Martin, Robust tracking-by-detection using a selection and completion mechanism, Computational Visual Media, 2017, Vol. 3, No. 3, 285-294.   
    4. Bin Liu, Kun Xu and Ralph Martin, Static Scene Illumination Estimation from Video with Applications, Journal of Computer Science and Technology, 2017, Vol. 32, No. 3, 430-442.   
    5. Haozhi Huang, Xiaonan Fang, Yufei Ye, Songhai Zhang and Paul L. Rosin, Practical automatic background substitution for live video, Computational Visual Media, 2017, Vol. 3, No. 3, 273-284.   



    2016





    Extracting 3D Objects from Photographs Using 3-Sweep
    Communications of the ACM, 2016, Vol. 59, No. 12, 121-129. (Invited highlight paper based on an earlier paper in ACM TOG 2013)
    Tao Chen, Zhe Zhu, Shi-Min Hu, Daniel Cohen-Or, and Ariel Shamir

    We introduce an interactive technique to extract and manipulate simple 3D shapes in a single photograph. Such extraction requires an understanding of the shape's components, their projections, and their relationships. These cognitive tasks are simple for humans, but particularly difficult for automatic algorithms. Thus, our approach combines the cognitive abilities of humans with the computational accuracy of the machine to create a simple modeling tool. In our interface, the human draws three strokes over the photograph to generate a 3D component that snaps to the outline of the shape. Each stroke defines one dimension of the component. Such human assistance implicitly segments a complex object into its components, and positions them in space. The computer reshapes the component to fit the image of the object in the photograph as well as to satisfy various inferred geometric constraints between components imposed by a global 3D structure. We show that this intelligent interactive modeling tool provides the means to create editable 3D parts quickly. Once the 3D object has been extracted, it can be quickly edited and placed back into photos or 3D scenes, permitting object-driven photo editing tasks which are impossible to perform in image-space.




    Robust Background Identification for Dynamic Video Editing
    ACM Transactions on Graphics, Vol. 35, No. 6. ACM SIGGRAPH ASIA 2016, Article No. 197.   
    Fang-Lue Zhang, Xian Wu, Hao-Tian Zhang, Jue Wang, Shi-Min Hu

    Extracting background features for estimating the camera path is a key step in many video editing and enhancement applications. Existing approaches often fail on highly dynamic videos that are shot by moving cameras and contain severe foreground occlusion. Based on existing theories, we present a new, practical method that can reliably identify background features in complex video, leading to accurate camera path estimation and background layering. Our approach contains a local motion analysis step and a global optimization step. We first divide the input video into overlapping temporal windows, and extract local motion clusters in each window. We form a directed graph from these local clusters, and identify background ones by finding a minimal path through the graph using optimization. We show that our method significantly outperforms other alternatives, and can be directly used to improve common video editing applications such as stabilization, compositing and background reconstruction.




    Manifold Differential Evolution (MDE): A Global Optimization Method for Geodesic Centroidal Voronoi Tessellations on Meshes
    ACM Transactions on Graphics, Vol. 35, No. 6. ACM SIGGRAPH ASIA 2016, Article No. 243.   
    Yong-Jin Liu, Chun-Xu Xu, Ran Yi, Dian Fan, Ying He

    Computing centroidal Voronoi tessellations (CVT) has many applications in computer graphics. The existing methods, such as the Lloyd algorithm and the quasi-Newton solver, are efficient and easy to implement; however, they compute only locally optimal solutions due to the highly non-linear nature of the CVT energy. This paper presents a novel method, called manifold differential evolution (MDE), for computing globally optimal geodesic CVT energy on triangle meshes. Formulating the mutation operator using discrete geodesics, MDE naturally extends the powerful differential evolution framework from Euclidean spaces to manifold domains. Under mild assumptions, we show that MDE has a provable probabilistic convergence to the global optimum. Experiments on a wide range of 3D models show that MDE consistently outperforms the existing methods by producing results with lower energy. Thanks to its intrinsic and global nature, MDE is insensitive to initialization and mesh tessellation. Moreover, it is able to handle multiply-connected Voronoi cells, which are challenging to the existing geodesic CVT methods.
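
    As background for the framework that MDE extends, the sketch below is a plain Euclidean differential evolution loop (the classic rand/1/bin variant) minimizing a toy function. It is not the MDE algorithm: the mutation here uses ordinary vector arithmetic, whereas the abstract says MDE formulates mutation with discrete geodesics. All names and parameter values are illustrative assumptions.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimize f over a box using Euclidean DE (rand/1/bin)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < CR:
                    # Euclidean mutation: base vector plus scaled difference.
                    v = pop[r1][d] + F * (pop[r2][d] - pop[r3][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)
                else:
                    v = pop[i][d]
                trial.append(v)
            c = f(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if c <= cost[i]:
                pop[i], cost[i] = trial, c
    best = min(range(pop_size), key=lambda k: cost[k])
    return pop[best], cost[best]


def sphere(x):
    """Toy objective with its global minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

    The difference vector `pop[r2] - pop[r3]` is the step that has no direct meaning on a curved surface, which is why a manifold version needs a geodesic replacement for it.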




    A Robust Divide and Conquer Algorithm for Progressive Medial Axes of Planar Shapes
    IEEE Transactions on Visualization and Computer Graphics, 2016, Vol. 22, No. 12, 2522-2536.
    Yong-Jin Liu, Cheng-Chi Yu, Min-Jing Yu, Kai Tang, and Deok-Soo Kim

    The medial axis is an important shape representation that finds a wide range of applications in shape analysis. For large-scale shapes of high resolution, a progressive medial axis representation that starts with the lowest resolution and gradually adds more details is desired. In this paper, we propose a fast and robust geometric algorithm that computes progressive medial axes of a large-scale planar shape. The key ingredient of our method is a novel structural analysis of merging medial axes of two planar shapes along a shared boundary. Our method is robust by separating the analysis of topological structure from numerical computation. Our method is also fast and we show that the time complexity of merging two medial axes is $O(n \log n_v)$, where $n$ is the total number of boundary generators and $n_v$ is strictly smaller than $n$ and behaves as a small constant in all our experiments. Experiments on large-scale polygonal data and comparison with state-of-the-art methods show the efficiency and effectiveness of the proposed method.




    HFS: Hierarchical Feature Selection for Efficient Image Segmentation
    European Conference on Computer Vision (ECCV), 2016, 867-882.   
    Ming-Ming Cheng, Yun Liu, Qibin Hou, Jiawang Bian, Philip Torr, Shi-Min Hu, and Zhuowen Tu

    In this paper, we propose a real-time system, Hierarchical Feature Selection (HFS), that performs image segmentation at a speed of 50 frames-per-second. We make an attempt to improve the performance of previous image segmentation systems by focusing on two aspects: (1) a careful system implementation on modern GPUs for efficient feature computation; and (2) an effective hierarchical feature selection and fusion strategy with learning. Compared with classic segmentation algorithms, our system demonstrates its particular advantage in speed, with comparable results in segmentation quality. Adopting HFS in applications like salient object detection and object proposal generation results in a significant performance boost. Our proposed HFS system (to be open-sourced) can be used in a variety of computer vision tasks that are built on top of image segmentation and superpixel extraction.




    Appearance Harmonization for Single Image Shadow Removal
    Computer Graphics Forum, Vol. 35, No. 7, 189-197, PG 2016.
    Liqian Ma, Jue Wang, Eli Shechtman, Kalyan Sunkavalli and Shi-Min Hu

    Shadow removal is a challenging problem and previous approaches often produce de-shadowed regions that are visually inconsistent with the rest of the image. We propose an automatic shadow region harmonization approach that makes the appearance of a de-shadowed region (produced using any previous technique) compatible with the rest of the image. We use a shadow-guided patch-based image synthesis approach that reconstructs the shadow region using patches sampled from non-shadowed regions. This result is then refined based on the reconstruction confidence to handle unique textures. Qualitative comparisons over a wide range of images, and a quantitative evaluation on a benchmark dataset show that our technique significantly improves upon the state-of-the-art.




    Multiphase SPH Simulation for Interactive Fluids and Solids
    ACM Transactions on Graphics, Vol. 35, No. 4. ACM SIGGRAPH 2016   
    Xiao Yan, Yun-Tao Jiang, Chen-Feng Li, Ralph R. Martin, and Shi-Min Hu

    This work extends existing multiphase-fluid SPH frameworks to cover solid phases, including deformable bodies and granular materials. In our extended multiphase SPH framework, the distribution and shapes of all phases, both fluids and solids, are uniformly represented by their volume fraction functions. The dynamics of the multiphase system is governed by conservation of mass and momentum within different phases. The behavior of individual phases and the interactions between them are represented by corresponding constitutive laws, which are functions of the volume fraction fields and the velocity fields. Our generalized multiphase SPH framework does not require separate equations for specific phases or tedious interface tracking. As the distribution, shape and motion of each phase is represented and resolved in the same way, the proposed approach is robust, efficient and easy to implement. Various simulation results are presented to demonstrate the capabilities of our new multiphase SPH framework, including deformable bodies, granular materials, interaction between multiple fluids and deformable solids, flow in porous media, and dissolution of deformable solids.




    Versatile Interactions at Interfaces for SPH-Based Simulations
    Eurographics/ACM SIGGRAPH Symposium on Computer Animation, 2016
    Tao Yang, Ming C. Lin, Ralph R. Martin, Jian Chang, and Shi-Min Hu

    The realistic capture of various interactions at interfaces is a challenging problem for SPH-based simulation. Previous works have mainly considered a single type of interaction, while real-world phenomena typically exhibit multiple interactions at different interfaces. For instance, when cracking an egg, there are simultaneous interactions between air, egg white, egg yolk, and the shell. To conveniently handle all interactions simultaneously in a single simulation, a versatile approach is critical. In this paper, we present a new approach to the surface tension model based on pairwise interaction forces; its basis is to use a larger number of neighboring particles. Our model is stable, conserves momentum, and furthermore, prevents the particle clustering problem which commonly occurs at the free surface. It can be applied to simultaneous interactions at multiple interfaces (e.g. fluid-solid and fluid-fluid). Our method is versatile, physically plausible and easy to implement. We also consider the close connection between droplets and bubbles, and show how to animate bubbles in air as droplets, with the help of a new surface particle detection method. Examples are provided to demonstrate the capabilities and effectiveness of our approach.




    Traffic-Sign Detection and Classification in the Wild
    IEEE CVPR, 2016, 2110-2118.
    Zhe Zhu, Dun Liang, Song-Hai Zhang, Xiaolei Huang, Baoli Li and Shi-Min Hu

    Although promising results have been achieved in the areas of traffic-sign detection and classification, few works have provided simultaneous solutions to these two tasks for realistic real-world images. We make two contributions to this problem. Firstly, we have created a large traffic-sign benchmark from 100000 Tencent Street View panoramas, going beyond previous benchmarks. It provides 100000 images containing 30000 traffic-sign instances. These images cover large variations in illuminance and weather conditions. Each traffic-sign in the benchmark is annotated with a class label, its bounding box and pixel mask. We call this benchmark Tsinghua-Tencent 100K. Secondly, we demonstrate how a robust end-to-end convolutional neural network (CNN) can simultaneously detect and classify traffic-signs. Most previous CNN image processing solutions target objects that occupy a large proportion of an image, and such networks do not work well for target objects occupying only a small fraction of an image like the traffic-signs here. Experimental results show the robustness of our network and its superiority to alternatives. The benchmark, source code and the CNN model introduced in this paper are publicly available.




    Manifold SLIC: A Fast Method to Compute Content-Sensitive Superpixels
    IEEE CVPR, 2016. 2110-2118   
    Yong-Jin Liu, Cheng-Chi Yu, Min-Jing Yu, Ying He

    Superpixels are perceptually meaningful atomic regions that can effectively capture image features. Among various methods for computing uniform superpixels, simple linear iterative clustering (SLIC) is popular due to its simplicity and high performance. In this paper, we extend SLIC to compute content-sensitive superpixels, i.e., small superpixels in content-dense regions (e.g., with high intensity or color variation) and large superpixels in content-sparse regions. Rather than the conventional SLIC method that clusters pixels in $\mathbb{R}^5$, we map the image $I$ to a 2-dimensional manifold $M \subset \mathbb{R}^5$, whose area elements are a good measure of the content density in $I$. We propose an efficient method to compute restricted centroidal Voronoi tessellation (RCVT), a uniform tessellation, on $M$, which induces the content-sensitive superpixels in $I$. Unlike other algorithms that characterize content-sensitivity by geodesic distances, manifold SLIC tackles the problem by measuring areas of Voronoi cells on $M$, which can be computed at a very low cost. As a result, it runs 10 times faster than the state-of-the-art content-sensitive superpixel algorithm. We evaluate manifold SLIC and seven representative methods on the BSDS500 benchmark and observe that our method outperforms the existing methods.
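
    To illustrate why area on the embedded manifold can measure content density, the sketch below lifts each pixel to a point (u, v, c1, c2, c3) in five dimensions and approximates the area of one pixel cell by two triangles. The exact embedding, the `scale` factor and the function names are assumptions made for illustration; the paper's own mapping and RCVT computation are not reproduced here.

```python
import math


def embed(u, v, color, scale=1.0):
    """Lift pixel (u, v) with a 3-channel color to a point in R^5 (assumed embedding)."""
    return (float(u), float(v)) + tuple(scale * c for c in color)


def triangle_area(p, q, r):
    """Area of triangle pqr in any dimension, via the Gram determinant."""
    a = [qi - pi for pi, qi in zip(p, q)]
    b = [ri - pi for pi, ri in zip(p, r)]
    aa = sum(x * x for x in a)
    bb = sum(x * x for x in b)
    ab = sum(x * y for x, y in zip(a, b))
    return 0.5 * math.sqrt(max(aa * bb - ab * ab, 0.0))


def cell_area(image, u, v, scale=1.0):
    """Approximate manifold area of the unit cell with top-left corner (u, v).

    image[row][col] holds a 3-tuple of color values.
    """
    p00 = embed(u, v, image[v][u], scale)
    p10 = embed(u + 1, v, image[v][u + 1], scale)
    p01 = embed(u, v + 1, image[v + 1][u], scale)
    p11 = embed(u + 1, v + 1, image[v + 1][u + 1], scale)
    return triangle_area(p00, p10, p11) + triangle_area(p00, p11, p01)


# A constant patch and a patch containing a sharp vertical edge.
flat = [[(50, 0, 0), (50, 0, 0)], [(50, 0, 0), (50, 0, 0)]]
edge = [[(0, 0, 0), (100, 0, 0)], [(0, 0, 0), (100, 0, 0)]]
flat_area = cell_area(flat, 0, 0)
edge_area = cell_area(edge, 0, 0)
```

    A constant patch keeps unit area, while the patch with strong color variation is stretched, so a tessellation that is uniform on the manifold yields smaller cells in content-dense image regions.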




    Faithful Completion of Images of Scenic Landmarks using Internet Images
    IEEE Transactions on Visualization and Computer Graphics, 2016, Vol. 22, No. 8, 1945-1958.
    Zhe Zhu, Hao-Zhi Huang, Zhi-Peng Tan, Kun Xu, and Shi-Min Hu

    Previous works on image completion typically aim to produce visually plausible results rather than factually correct ones. In this paper, we propose an approach to faithfully complete the missing regions of an image. We assume that the input image is taken at a well-known landmark, so similar images taken at the same location can be easily found on the Internet. We first download thousands of images from the Internet using a text label provided by the user. Next, we apply two-step filtering to reduce them to a small set of candidate images for use as source images for completion. For each candidate image, a co-matching algorithm is used to find correspondences of both points and lines between the candidate image and the input image. These are used to find an optimal warp relating the two images. A completion result is obtained by blending the warped candidate image into the missing region of the input image. The completion results are ranked according to a combination score, which considers both warping and blending energy, and the highest ranked ones are shown to the user. Experiments and results demonstrate that our method can faithfully complete images.




    Support Substructures: Support-Induced Part-Level Structural Representation
    IEEE Transactions on Visualization and Computer Graphics, 2016, Vol. 22, No. 8, 2024-2036.
    Shi-Sheng Huang, Hongbo Fu, Lingyu Wei, Shi-Min Hu

    In this work we explore a support-induced structural organization of object parts. We introduce the concept of support substructures, which are special subsets of object parts with support and stability. A bottom-up approach is proposed to identify such substructures in a support relation graph. We apply the derived high-level substructures to part-based shape reshuffling between models, resulting in nontrivial functionally plausible model variations that are difficult to achieve with symmetry-induced substructures by the state-of-the-art methods. We also show how to automatically or interactively turn a single input model into new functionally plausible shapes by structure rearrangement and synthesis, enabled by support substructures. To the best of our knowledge, no single existing method has been designed for all these applications.




    Efficient, Edge-Aware, Combined Color Quantization and Dithering
    IEEE Transactions on Image Processing, 2016, Vol. 25, No. 3, 1152-1162.
    Hao-Zhi Huang, Kun Xu, Ralph R. Martin, Fei-Yue Huang, and Shi-Min Hu

    In this paper we present a novel algorithm to simultaneously accomplish color quantization and dithering of images. This is achieved by minimizing a perception-based cost function which considers pixel-wise differences between filtered versions of the quantized image and the input image. We use edge-aware filters in defining the cost function to avoid mixing colors on opposite sides of an edge. The importance of each pixel is weighted according to its saliency. To rapidly minimize the cost function, we use a modified multi-scale iterative conditional mode (ICM) algorithm which updates one pixel at a time while keeping other pixels unchanged. As ICM is a local method, careful initialization is required to prevent termination at a local minimum far from the global one. To address this problem, we initialize ICM with a palette generated by a modified median-cut method. Compared to previous approaches, our method not only produces high-quality results with fewer visual artifacts but also requires significantly less computational effort.




    Comfort-driven disparity adjustment for stereoscopic video
    Computational Visual Media, 2016, Vol. 2, No. 1, 3-17
    Miao Wang, Xi-Jin Zhang, Jun-Bang Liang, Song-Hai Zhang, and Ralph R. Martin

    Pixel disparity, the offset of corresponding pixels between left and right views, is a crucial parameter in stereoscopic three-dimensional (S3D) video, as it determines the depth perceived by the human visual system (HVS). Unsuitable pixel disparity distribution throughout an S3D video may lead to visual discomfort. We present a unified and extensible stereoscopic video disparity adjustment framework which improves the viewing experience for an S3D video by keeping the perceived 3D appearance as unchanged as possible while minimizing discomfort. We first analyse disparity and motion attributes of S3D video in general, then derive a wide-ranging visual discomfort metric from existing perceptual comfort models. An objective function based on this metric is used as the basis of a hierarchical optimisation method to find a disparity mapping function for each input video frame. Warping-based disparity manipulation is then applied to the input video to generate the output video, using the desired disparity mappings as constraints. Our comfort metric takes into account disparity range, motion, and stereoscopic window violation; the framework could easily be extended to use further visual comfort models. We demonstrate the power of our approach using both animated cartoons and real S3D videos.



    2015





    3D indoor scene modeling from RGB-D data: a survey
    Computational Visual Media, Vol. 1, No. 4, 267-278   
    Kang Chen, Yu-Kun Lai, Shi-Min Hu

    3D scene modeling has long been a fundamental problem in computer graphics and computer vision. With the popularity of consumer-level RGB-D cameras, there is a growing interest in digitizing real-world indoor 3D scenes. However, modeling indoor 3D scenes remains a challenging problem because of the complex structure of interior objects and poor quality of RGB-D data acquired by consumer-level sensors. Various methods have been proposed to tackle these challenges. In this survey, we provide an overview of recent advances in indoor scene modeling techniques, as well as public datasets and code libraries which can facilitate experiments and evaluation.




    Simultaneous Camera Path Optimization and Distraction Removal for Improving Amateur Video
    IEEE Transactions on Image Processing, 2015, Vol. 24, No. 12, 5982-5994.
    Fang-Lue Zhang, Jue Wang, Han Zhao, Ralph R. Martin, Shi-Min Hu

    A major difference between amateur and professional video lies in the quality of camera paths. Previous work on video stabilization has considered how to improve amateur video by smoothing the camera path. In this paper, we show that additional changes to the camera path can further improve video aesthetics. Our new optimization method achieves multiple simultaneous goals: (i) stabilizing video content over short time scales, (ii) ensuring simple and consistent camera paths over longer time scales, and (iii) improving scene composition by automatically removing distractions, a common occurrence in amateur video. Our approach uses an L1 camera path optimization framework, extended to handle multiple constraints. Two passes of optimization are used to address both low-level and high-level constraints on the camera path. Experimental and user study results show that our approach outputs video which is perceptually better than the input, or the results of using stabilization only.




    Magic Decorator: Automatic Material Suggestion for Indoor Digital Scenes
    ACM Transactions on Graphics, Vol. 34, No. 6, Article No. 232, SIGGRAPH ASIA 2015.   
    Kang Chen, Kun Xu, Yizhou Yu, Tian-Yi Wang, Shi-Min Hu

    Assigning textures and materials within 3D scenes is a tedious and labor-intensive task. In this paper, we present Magic Decorator, a system that automatically generates material suggestions for 3D indoor scenes. To achieve this goal, we introduce local material rules, which describe typical material patterns for a small group of objects or parts, and global aesthetic rules, which account for the harmony among the entire set of colors in a specific scene. Both rules are obtained from collections of indoor scene images. We cast the problem of material suggestion as a combinatorial optimization considering both local material and global aesthetic rules. We have tested our system on various complex indoor scenes. A user study indicates that our system can automatically and efficiently produce a series of visually plausible material suggestions which are comparable to those produced by artists.




    Fast Multiple-fluid Simulation Using Helmholtz Free Energy
    ACM Transactions on Graphics, Vol. 34, No. 6, Article No. 201, SIGGRAPH ASIA 2015.   
    Tao Yang, Jian Chang, Bo Ren, Ming C. Lin, Jian Jun Zhang, and Shi-Min Hu

    Multiple-fluid interaction is an interesting and common visual phenomenon we often observe. In this paper we present an energy-based Lagrangian method that expands the capability of existing multiple-fluid methods to handle various phenomena, including extraction, partial dissolution, etc. Based on our user-adjusted Helmholtz free energy functions, the simulated fluid evolves from high-energy states to low-energy states, allowing flexible capture of various mixing and unmixing processes. We also extend the original Cahn-Hilliard equation to enable simulation of complex fluid-fluid interaction and rich visual phenomena such as motion-related mixing and position-based patterns. Our approach is easy to integrate with existing state-of-the-art smoothed particle hydrodynamics (SPH) solvers and can be further implemented on top of the position based dynamics (PBD) method, improving the stability and incompressibility of the fluid during Lagrangian simulation under large time steps. Performance analysis shows that our method is at least 4 times faster than the state-of-the-art multiple-fluid method. Examples are provided to demonstrate the new capability and effectiveness of our approach.




    Efficient Construction and Simplification of Delaunay Meshes
    ACM Transactions on Graphics, Vol. 34, No. 6, Article No. 174, SIGGRAPH ASIA 2015.
    Yong-Jin Liu, Chun-Xu Xu, Dian Fan, Ying He

    Delaunay meshes (DM) are a special type of triangle mesh where the local Delaunay condition holds everywhere. We present an efficient algorithm to convert an arbitrary manifold triangle mesh M into a Delaunay mesh. We show that the constructed DM has O(Kn) vertices, where n is the number of vertices in M and K is a model-dependent constant. We also develop a novel algorithm to simplify Delaunay meshes, allowing a smooth choice of detail levels. Our methods are conceptually simple, theoretically sound and easy to implement. The DM construction algorithm also scales well due to its O(nK log K) time complexity. Delaunay meshes have many favorable geometric and numerical properties. For example, a DM has exactly the same geometry as the input mesh, and it can be encoded by any mesh data structure. Moreover, the empty geodesic circumcircle property implies that the commonly used cotangent Laplace-Beltrami operator has non-negative weights. Therefore, existing digital geometry processing algorithms can benefit from the numerical stability of DMs without any code changes. We observe that DMs can improve the accuracy of the heat method for computing geodesic distances. Also, popular parameterization techniques, such as discrete harmonic mapping, produce more stable results on the DMs than on the input meshes.




    Active Exploration of Large 3D Model Repositories
    IEEE Transactions on Visualization and Computer Graphics, Vol. 21, No. 12, 1390-1402.
    Lin Gao, Yan-Pei Cao, Yu-Kun Lai, Hao-Zhi Huang, Leif Kobbelt, Shi-Min Hu

    With broader availability of large-scale 3D model repositories, the need for efficient and effective exploration becomes more and more urgent. Existing model retrieval techniques do not scale well with the size of the database since often a large number of very similar objects are returned for a query, and the possibilities to refine the search are quite limited. We propose an interactive approach where the user feeds an active learning procedure by labeling either entire models or parts of them as "like" or "dislike" such that the system can automatically update an active set of recommended models. To provide an intuitive user interface, candidate models are presented based on their estimated relevance for the current query. From the methodological point of view, our main contribution is to exploit not only the similarity between a query and the database models but also the similarities among the database models themselves. We achieve this by an offline pre-processing stage, where global and local shape descriptors are computed for each model and a sparse distance metric is derived that can be evaluated efficiently even for very large databases. We demonstrate the effectiveness of our method by interactively exploring a repository containing over 100K models.




    Anisotropic density estimation for photon mapping
    Computational Visual Media, Vol. 1, No. 3, 221-228   
    Fu-Jun Luan, Li-Fan Wu, Kun Xu

    Photon mapping is a widely used technique for global illumination rendering. In the density estimation step of photon mapping, the indirect radiance at a shading point is estimated through a filtering process using nearby stored photons; an isotropic filtering kernel is usually used. However, using an isotropic kernel is not always the optimal choice, especially for cases when eye paths intersect with surfaces with anisotropic BRDFs. In this paper, we propose an anisotropic filtering kernel for density estimation to handle such anisotropic eye paths. The anisotropic filtering kernel is derived from the recently introduced anisotropic spherical Gaussian representation of BRDFs. Compared to conventional photon mapping, our method is able to reduce rendering errors with negligible additional cost when rendering scenes containing anisotropic BRDFs.




    Semi-Continuity of Skeletons in 2-Manifold and Discrete Voronoi Approximation
    IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 37, No. 9, 1938 - 1944.   
    Yong-Jin Liu

    The skeleton of a 2D shape is an important geometric structure in pattern analysis and computer vision. In this paper we study the skeleton of a 2D shape in a 2-manifold $\mathcal{M}$, based on a geodesic metric. We present a formal definition of the skeleton $S(\Omega)$ for a shape $\Omega$ in $\mathcal{M}$ and show several properties that make $S(\Omega)$ distinct from its Euclidean counterpart in $\mathbb{R}^2$. We further prove that for a shape sequence $\{\Omega_i\}$ that converges to a shape $\Omega$ in $\mathcal{M}$, the mapping $\Omega\rightarrow\overline{S}(\Omega)$ is lower semi-continuous. A direct application of this result is that we can use a set $P$ of sample points to approximate the boundary of a 2D shape $\Omega$ in $\mathcal{M}$, and the Voronoi diagram of $P$ inside $\Omega\subset\mathcal{M}$ gives a good approximation to the skeleton $S(\Omega)$. Examples of skeleton computation in topography and brain morphometry are illustrated.




    A simple approach for bubble modelling from multiphase fluid simulation
    Computational Visual Media, Vol. 1, No. 2, 171-181   
    Bo Ren, Yuntao Jiang, Chenfeng Li, Ming C. Lin

    This article presents a novel and flexible bubble modelling technique for multi-fluid simulations using a volume fraction representation. By combining the volume fraction data obtained from a primary multi-fluid simulation with simple and efficient secondary bubble simulation, a range of real-world bubble phenomena are captured with a high degree of physical realism, including large bubble deformation, sub-cell bubble motion, bubble stacking over the liquid surface, bubble volume change, dissolving of bubbles, etc. Without any change in the primary multi-fluid simulator, our bubble modelling approach is applicable to any multi-fluid simulator based on the volume fraction representation.




    PatchTable: Efficient Patch Queries for Large Datasets and Applications
    ACM Transactions on Graphics, Vol. 34, No. 4, Article No. 97, SIGGRAPH 2015.   
    Connelly Barnes, Fang-Lue Zhang, Liming Lou, Xian Wu, Shi-Min Hu

    This paper presents a data structure that reduces approximate nearest neighbor query times for image patches in large datasets. Previous work in texture synthesis has demonstrated real-time synthesis from small exemplar textures. However, high performance has proved elusive for modern patch-based optimization techniques which frequently use many exemplar images in the tens of megapixels or above. Our new algorithm, PatchTable, offloads as much of the computation as possible to a pre-computation stage that takes modest time, so patch queries can be as efficient as possible. There are three key insights behind our algorithm: (1) a lookup table similar to locality sensitive hashing can be precomputed, and used to seed sufficiently good initial patch correspondences during querying, (2) missing entries in the table can be filled during precomputation with our fast Voronoi transform, and (3) the initially seeded correspondences can be improved with a precomputed k-nearest neighbors mapping. We show experimentally that this accelerates the patch query operation by up to 9× over k-coherence, up to 12× over TreeCANN, and up to 200× over PatchMatch. Our fast algorithm allows us to explore efficient and practical imaging and computational photography applications. We show results for artistic video stylization, light field super-resolution, and multi-image editing.




    Panorama completion for street views
    Computational Visual Media, Vol. 1, No. 1, 49-57   
    Zhe Zhu, Ralph R. Martin, Shi-Min Hu

    This paper considers panorama images used for street views. Their viewing angle of 360 degree causes pixels at the top and bottom to appear stretched and warped. Although current image completion algorithms work well, they cannot be directly used in the presence of such distortions found in panoramas of street views. We thus propose a novel approach to complete such 360 degree panoramas using optimization-based projection to deal with distortions. Experimental results show that our approach is efficient and provides an improvement over standard image completion algorithms.




    Fast Wavefront Propagation (FWP) for Computing Exact Geodesic Distances on Meshes
    IEEE Transactions on Visualization and Computer Graphics, 2015, Vol. 21, No. 7, 822-834.
    Chunxu Xu, Tuanfeng Y. Wang, Yong-Jin Liu, Ligang Liu, Ying He

    Computing geodesic distances on triangle meshes is a fundamental problem in computational geometry and computer graphics. To date, two notable classes of algorithms, the Mitchell-Mount-Papadimitriou (MMP) algorithm and the Chen-Han (CH) algorithm, have been proposed. Although these algorithms can compute exact geodesic distances if numerical computation is exact, they are computationally expensive, which diminishes their usefulness for large-scale models and/or time-critical applications. In this paper, we propose the fast wavefront propagation (FWP) framework for improving the performance of both the MMP and CH algorithms. Unlike the original algorithms that propagate only a single window (a data structure that locally encodes geodesic information) at each iteration, our method organizes windows with a bucket data structure so that it can process a large number of windows simultaneously without compromising wavefront quality. Thanks to its macro nature, the FWP method is less sensitive to mesh triangulation than the MMP and CH algorithms. We evaluate our FWP-based MMP and CH algorithms on a wide range of large-scale real-world models. Computational results show that our method can improve the speed by a factor of 3-10.




    A Response Time Model for Abrupt Changes in Binocular Disparity
    The Visual Computer, 2015, Vol. 31, No. 5, 675-687.
    Tai-Jiang Mu, Jia-Jia Sun, Ralph Martin, Shi-Min Hu

    We propose a novel depth perception model to determine the time taken by the human visual system (HVS) to adapt to an abrupt change in stereoscopic disparity, such as can occur in a scene cut. A series of carefully designed perceptual experiments on successive disparity contrast were used to build our model. Factors such as disparity, changes in disparity, and the spatial frequency of luminance contrast were taken into account. We further give a computational method to predict the response time during scene cuts in stereoscopic cinematography, which has been validated in user studies. We also consider various applications of our model.




    Global Contrast based Salient Region Detection
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, Vol. 37, No. 3, 569 - 582.
    Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Philip H. S. Torr, and Shi-Min Hu
    (Earlier version was presented in IEEE CVPR 2011)

    Automatic estimation of salient object regions across images, without any prior assumption or knowledge of the contents of the corresponding scenes, enhances many computer vision and computer graphics applications. We introduce a regional contrast based salient object detection algorithm, which simultaneously evaluates global contrast differences and spatial weighted coherence scores. The proposed algorithm is simple, efficient, naturally multi-scale, and produces full-resolution, high-quality saliency maps. These saliency maps are further used to initialize a novel iterative version of GrabCut, namely SaliencyCut, for high quality unsupervised salient object segmentation. We extensively evaluated our algorithm using traditional salient object detection datasets, as well as a more challenging Internet image dataset. Our experimental results demonstrate that our algorithm consistently outperforms 15 existing salient object detection and segmentation methods, yielding higher precision and better recall rates. We also show that our algorithm can be used to efficiently extract salient object masks from Internet images, enabling effective sketch-based image retrieval (SBIR) via simple shape comparisons. Despite such noisy internet images, where the saliency regions are ambiguous, our saliency guided image retrieval achieves a superior retrieval rate compared with state-of-the-art SBIR methods, and additionally provides important target object region information.
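
    The idea of scoring a pixel by its global contrast against the rest of the image can be sketched as follows. This is a deliberately simplified, grayscale, histogram-based illustration: the function name and the normalization are assumptions, and it omits the region segmentation, color space handling and spatial weighting that the abstract describes.

```python
def global_contrast_saliency(gray):
    """Score each pixel by its intensity contrast to all other pixels.

    gray is a list of rows of integer intensities. Returns rows of
    floats normalized to [0, 1]. Illustrative sketch only.
    """
    pixels = [v for row in gray for v in row]
    counts = {}
    for v in pixels:
        counts[v] = counts.get(v, 0) + 1
    total = len(pixels)
    # Contrast of a level: frequency-weighted distance to every level present.
    contrast = {
        level: sum(c * abs(level - other) for other, c in counts.items()) / total
        for level in counts
    }
    lo, hi = min(contrast.values()), max(contrast.values())
    span = hi - lo
    return [
        [(contrast[v] - lo) / span if span > 0 else 0.0 for v in row]
        for row in gray
    ]

image = [[10] * 4 for _ in range(4)]
image[0][0] = 200  # a single bright pixel on a uniform background
saliency = global_contrast_saliency(image)
```

    Working on the histogram rather than on pixel pairs keeps the cost proportional to the number of distinct intensity levels squared, not the number of pixels squared.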



    2014





    BiggerPicture: Data-Driven Image Extrapolation Using Graph Matching
    ACM Transactions on Graphics, 2014, Vol. 33, No. 6, Article No. 173 (ACM SIGGRAPH ASIA 2014).   
    Miao Wang, Yu-Kun Lai, Yuan Liang, Ralph R. Martin, Shi-Min Hu

    Filling a small hole in an image with plausible content is well studied. Extrapolating an image to give a distinctly larger one is much more challenging: a significant amount of additional content is needed which matches the original image, especially near its boundaries. We propose a data-driven approach to this problem. Given a source image, and the amount and direction(s) in which it is to be extrapolated, our system determines visually consistent content for the extrapolated regions using library images. As well as considering low-level matching, we achieve consistency at a higher level by using graph proxies for regions of source and library images. Treating images as graphs allows us to find candidates for image extrapolation in a feasible time. Consistency of subgraphs in source and library images is used to find good candidates for the additional content; these are then further filtered. Region boundary curves are aligned to ensure consistency where image parts are joined using a photomontage method. We demonstrate the power of our method in image editing applications.




    Improving Visual Quality of View Transitions in Automultiscopic Displays
    ACM Transactions on Graphics, 2014, Vol. 33, No. 6, Article No. 192 (ACM SIGGRAPH ASIA 2014).
    Song-Pei Du, Piotr Didyk, Fredo Durand, Shi-Min Hu, Wojciech Matusik

    Automultiscopic screens present different images depending on the viewing direction. This enables glasses-free 3D and provides a motion parallax effect. However, due to the limited angular resolution of such displays, they suffer from hot-spotting, i.e., image quality is highly affected by the viewing position. In this paper, we analyze light fields produced by lenticular and parallax-barrier displays, and show that, unlike in the real world, the light fields produced by such screens have a repetitive structure. This induces visual artifacts in the form of view discontinuities, depth reversals, and excessive disparities when the viewing position is not optimal. Although the problem has always been considered inherent to the technology, we demonstrate that light fields reproduced on automultiscopic displays have enough degrees of freedom to improve the visual quality. We propose a new technique that modifies light fields using global and local shears followed by stitching to improve their continuity when displayed on a screen. We show that this enhances visual quality significantly, which is demonstrated in a series of user experiments with an automultiscopic display as well as lenticular prints.




    Automatic Semantic Modeling of Indoor Scenes from Low-quality RGB-D Data using Contextual Information
    ACM Transactions on Graphics, 2014, Vol. 33, No. 6, Article No. 208 (ACM SIGGRAPH ASIA 2014).
    Kang Chen, Yu-Kun Lai, Yu-Xin Wu, Ralph Martin, Shi-Min Hu

    We present a novel solution to automatic semantic modeling of indoor scenes from a sparse set of low-quality RGB-D images. Such data presents challenges due to noise, low resolution, occlusion and missing depth information. We exploit the knowledge in a scene database containing 100s of indoor scenes with over 10,000 manually segmented and labeled mesh models of objects. In seconds, we output a visually plausible 3D scene, adapting these models and their parts to fit the input scans. Contextual relationships learned from the database are used to constrain reconstruction, ensuring semantic compatibility between both object models and parts. Small objects and objects with incomplete depth information which are difficult to recover reliably are processed with a two-stage approach. Major objects are recognized first, providing a known scene structure. 2D contour-based model retrieval is then used to recover smaller objects. Evaluations using our own data and two public datasets show that our approach can model typical real-world indoor scenes efficiently and robustly.




    Multiple-fluid SPH Simulation Using a Mixture Model
    ACM Transactions on Graphics, 2014, Vol. 33, No. 5, Article No. 171.
    Bo Ren, Chen-Feng Li, Xiao Yan, Ming C. Lin, Javier Bonet, and Shi-Min Hu

    This paper presents a versatile and robust SPH simulation approach for multiple-fluid flows. The spatial distribution of different phases or components is modeled using the volume fraction representation, the dynamics of multiple-fluid flows is captured by using an improved mixture model, and a stable and accurate SPH formulation is rigorously derived to resolve the complex transport and transformation processes encountered in multiple-fluid flows. The new approach can capture a wide range of real-world multiple-fluid phenomena, including mixing/unmixing of miscible and immiscible fluids, diffusion effects and chemical reactions. Moreover, the new multiple-fluid SPH scheme can be readily integrated into existing state-of-the-art SPH simulators, and the multiple-fluid simulation is easy to set up. Various examples are presented to demonstrate the effectiveness of our approach.




    Interactive Image-Guided Modeling of Extruded Shapes
    Computer Graphics Forum, 2014, Vol. 33, No. 7, 101-110 (Pacific Graphics 2014).   
    Yan-Pei Cao, Tao Ju, Zhao Fu, Shi-Min Hu
    (This paper is one of the two Best student papers in Pacific Graphics 2014)

    A recent trend in interactive modeling of 3D shapes from a single image is designing minimal interfaces, and accompanying algorithms, for modeling a specific class of objects. Expanding upon the range of shapes that existing minimal interfaces can model, we present an interactive image-guided tool for modeling shapes made up of extruded parts. An extruded part is represented by extruding a closed planar curve, called base, in the direction orthogonal to the base. To model each extruded part, the user only needs to sketch the projected base shape in the image. The main technical contribution is a novel optimization-based approach for recovering the 3D normal of the base of an extruded object by exploring both geometric regularity of the sketched curve and image contents. We developed a convenient interface for modeling multi-part shapes and a method for optimizing the relative placement of the parts. Our tool is validated using synthetic data and tested on real-world images.




    Learning Natural Colors for Image Recoloring
    Computer Graphics Forum, 2014, Vol. 33, No. 7, 299-308 (Pacific Graphics 2014).   
    Hao-Zhi Huang, Song-Hai Zhang, Ralph R. Martin, Shi-Min Hu

    We present a data-driven method for automatically recoloring a photo to enhance its appearance or change a viewer's emotional response to it. A compact representation called a RegionNet summarizes color and geometric features of image regions, and geometric relationships between them. Correlations between color property distributions and geometric features of regions are learned from a database of well-colored photos. A probabilistic factor graph model is used to summarize distributions of color properties and generate an overall probability distribution for color suggestions. Given a new input image, we can generate multiple recolored results which, unlike previous automatic results, are both natural and artistic, and compatible with their spatial arrangements.




    Polyline-sourced geodesic Voronoi diagrams on triangle meshes
    Computer Graphics Forum, 2014, Vol. 33, No. 7, 161-170 (Pacific Graphics 2014).   
    Chunxu Xu, Yong-Jin Liu, Qian Sun, Jinyan Li and Ying He

    This paper studies the Voronoi diagrams on 2-manifold meshes based on geodesic metric (a.k.a. geodesic Voronoi diagrams or GVDs), which have polyline generators. We show that our general setting leads to situations more complicated than conventional 2D Euclidean Voronoi diagrams as well as point-source based GVDs, since a typical bisector contains line segments, hyperbolic segments and parabolic segments. To tackle this challenge, we introduce a new concept, called local Voronoi diagram (LVD), which is a combination of additively weighted Voronoi diagram and line-segment Voronoi diagram on a mesh triangle. We show that when restricting on a single mesh triangle, the GVD is a subset of the LVD and only two types of mesh triangles can contain GVD edges. Based on these results, we propose an efficient algorithm for constructing the GVD with polyline generators. Our algorithm runs in O(nN log N) time and takes O(nN) space on an n-face mesh with m generators, where N = max{m, n}. Computational results on real-world models demonstrate the efficiency of our algorithm.




    Parametric meta-filter modeling from a single example pair
    The Visual Computer, 2014, Vol. 30, No. 6-8, 673-684.
    Shi-Sheng Huang, Guo-Xin Zhang, Yu-Kun Lai, Johannes Kopf, Daniel Cohen-Or, Shi-Min Hu

    We present a method for learning a meta-filter from an example pair comprising an original image A and its filtered version A' using an unknown image filter. A meta-filter is a parametric model, consisting of a spatially varying linear combination of simple basis filters. We introduce a technique for learning the parameters of the meta-filter f such that it approximates the effects of the unknown filter, i.e., f(A) approximates A'. The meta-filter can be transferred to novel input images, and its parametric representation enables intuitive tuning of its parameters to achieve controlled variations. We show that our technique successfully learns and models meta-filters that approximate a large variety of common image filters with high accuracy both visually and quantitatively.




    SalientShape: group saliency in image collections
    The Visual Computer, 2014, Vol. 30, No. 4, 443-453.
    Ming-Ming Cheng, Niloy J. Mitra, Xiaolei Huang, Shi-Min Hu

    Efficiently identifying salient objects in large image collections is essential for many applications including image retrieval, surveillance, image annotation, and object recognition. We propose a simple, fast, and effective algorithm for locating and segmenting salient objects by analysing image collections. As a key novelty, we introduce group saliency to achieve superior unsupervised salient object segmentation by extracting salient objects (in collections of pre-filtered images) that maximize between-image similarities and within-image distinctness. To evaluate our method, we construct a large benchmark dataset consisting of 15 K images across multiple categories with 6000+ pixel-accurate ground truth annotations for salient object regions where applicable. In all our tests, group saliency consistently outperforms state-of-the-art single-image saliency algorithms, resulting in both higher precision and better recall. Our algorithm successfully handles image collections, of an order larger than any existing benchmark datasets, consisting of diverse and heterogeneous images from various internet sources.




    A practical algorithm for rendering interreflections with all-frequency BRDFs
    ACM Transactions on Graphics, 2014, Vol. 33, No. 1, Article No. 10.
    Kun Xu, Yan-Pei Cao, Li-Qian Ma, Zhao Dong, Rui Wang, Shi-Min Hu

    Algorithms for rendering interreflection (or indirect illumination) effects often make assumptions about the frequency range of the materials' reflectance properties. For example, methods based on Virtual Point Lights (VPLs) perform well for diffuse and semi-glossy materials but not so for highly glossy or specular materials; the situation is reversed for methods based on ray tracing. In this article, we present a practical algorithm for rendering interreflection effects with all-frequency BRDFs. Our method builds upon a spherical Gaussian representation of the BRDF, based on which a novel mathematical development of the interreflection equation is made. This allows us to efficiently compute one-bounce interreflection from a triangle to a shading point, by using an analytic formula combined with a piecewise linear approximation. We show through evaluation that this method is accurate for a wide range of BRDFs. We further introduce a hierarchical integration method to handle complex scenes (i.e., many triangles) with bounded errors. Finally, we have implemented the present algorithm on the GPU, achieving rendering performance ranging from near interactive to a few seconds per frame for various scenes with different complexity.




    A Sketch-Based Approach for Interactive Organization of Video Clips
    ACM Transactions on Multimedia Computing, Communications, and Applications, 2014, Vol. 11, No. 1, Article No. 2.
    Yong-Jin Liu, Cui-Xia Ma, Qiufang Fu, Xiaolan Fu, Sheng-Feng Qin, and Lexing Xie

    With the rapid growth of video resources, techniques for efficient organization of video clips are becoming appealing in the multimedia domain. In this article, a sketch-based approach is proposed to intuitively organize video clips by: (1) enhancing their narrations using sketch annotations and (2) structurizing the organization process by gesture-based free-form sketching on touch devices. There are two main contributions of this work. The first is a sketch graph, a novel representation for the narrative structure of video clips to facilitate content organization. The second is a method to perform context-aware sketch recommendation scalable to large video collections, enabling common users to easily organize sketch annotations. A prototype system integrating the proposed approach was evaluated on the basis of five different aspects concerning its performance and usability. Two sketch searching experiments showed that the proposed context-aware sketch recommendation outperforms, in terms of accuracy and scalability, two state-of-the-art sketch searching methods. Moreover, a user study showed that the sketch graph is consistently preferred over traditional representations such as keywords and keyframes. The second user study showed that the proposed approach is applicable in those scenarios where the video annotator and organizer were the same person. The third user study showed that, for video content organization, using sketch graph users took on average 1/3 less time than using a mass-market tool MovieMaker and took on average 1/4 less time than using a state-of-the-art sketch alternative. These results demonstrated that the proposed sketch graph approach is a promising video organization tool.



    Other publications in 2014

    1. Bin Liu, Ralph Martin, Ji-Wu Huang, Shi-Min Hu, Structure Aware Visual Cryptography, Computer Graphics Forum, 2014, Vol. 33, No. 7, 141-150 (Pacific Graphics 2014).   
    2. Cheng-Chi Yu, Yong-Jin Liu, Tianfu Wu, Kai-Yun Li, Xiaolan Fu, A global energy optimization framework for 2.1D sketch extraction from monocular images, Graphical Models, 2014, Vol. 76, No. 5, 507-521.
    3. Tai-Jiang Mu, Ju-Hong Wang, Song-Pei Du, Shi-Min Hu, Stereoscopic image completion and depth recovery, The Visual Computer, 2014, Vol. 30, No. 6-8, 833-843.
    4. Long Zeng, Yong-Jin Liu, Jin Wang, Dong-Liang Zhang, Ming-Fai Yuen, Sketch2Jewelry: Semantic feature modeling for sketch-based jewelry design, Computers & Graphics, 2014, Vol. 38, No. 1, 69-77 (Presented in CAD/Graphics 2013).



    2013





    Recovering a Semantic Editing History from a Before-and-After Image Pair
    ACM Transactions on Graphics, Vol. 32, No. 6, Article No. 194, 2013 (SIGGRAPH ASIA 2013).
    Shi-Min Hu, Kun Xu, Li-Qian Ma, Bin Liu, Bi-Ye Jiang and Jue Wang

    We study the problem of inverse image editing, which recovers a semantically meaningful editing history from a source image and an edited copy. Our approach supports a wide range of commonly used editing operations such as cropping, object insertion and removal, linear and non-linear color transformations, and spatially varying adjustment brushes. Given an input image pair, we first apply a dense correspondence method between them to match edited image regions with their sources. For each edited region, we determine geometric and semantic appearance operations that have been applied. Finally, we compute an optimal editing path from the region-level editing operations, based on predefined semantic constraints. The recovered history can be used in various applications such as image re-editing, edit transfer, and image revision control.




    PatchNet: A Patch-based Image Representation for Interactive Library-driven Image Editing
    ACM Transactions on Graphics, Vol. 32, No. 6, Article No. 196, 2013 (SIGGRAPH ASIA 2013).
    Shi-Min Hu, Fang-Lue Zhang, Miao Wang, Ralph R. Martin, Jue Wang

    We introduce PatchNets, a compact, hierarchical representation describing structural and appearance characteristics of image regions, for use in image editing. In a PatchNet, an image region with coherent appearance is summarized by a graph node, associated with a single representative patch, while geometric relationships between different regions are encoded by labelled graph edges giving contextual information. The hierarchical structure of a PatchNet allows a coarse-to-fine description of the image. We show how this PatchNet representation can be used as a basis for interactive, library-driven, image editing. The user draws rough sketches to quickly specify editing constraints for the target image. The system then automatically queries an image library to find semantically compatible candidate regions to meet the editing goal. Contextual image matching is performed using the PatchNet representation, allowing suitable regions to be found and applied in a few seconds, even from a library containing thousands of images.




    3-Sweep: Extracting Editable Objects from a Single Photo
    ACM Transactions on Graphics, Vol. 32, No. 6, Article No. 195, 2013 (SIGGRAPH ASIA 2013).
    Tao Chen, Zhe Zhu, Ariel Shamir, Shi-Min Hu, Daniel Cohen-Or

    We introduce an interactive technique for manipulating simple 3D shapes based on extracting them from a single photograph. Such extraction requires understanding of the components of the shape, their projections, and relations. These simple cognitive tasks for humans are particularly difficult for automatic algorithms. Thus, our approach combines the cognitive abilities of humans with the computational accuracy of the machine to solve this problem. Our technique provides the user the means to quickly create editable 3D parts: human assistance implicitly segments a complex object into its components, and positions them in space. In our interface, three strokes are used to generate a 3D component that snaps to the shape's outline in the photograph, where each stroke defines one dimension of the component. The computer reshapes the component to fit the image of the object in the photograph as well as to satisfy various inferred geometric constraints imposed by its global 3D structure. We show that with this intelligent interactive modeling tool, the daunting task of object extraction is made simple. Once the 3D object has been extracted, it can be quickly edited and placed back into photos or 3D scenes, permitting object-driven photo editing tasks which are impossible to perform in image-space.




    Anisotropic Spherical Gaussians
    ACM Transactions on Graphics, Vol. 32, No. 6, Article No. 209, 2013 (SIGGRAPH ASIA 2013).
    Kun Xu, Wei-Lun Sun, Zhao Dong, Dan-Yong Zhao, Run-Dong Wu, Shi-Min Hu

    We present a novel anisotropic Spherical Gaussian (ASG) function, built upon the Bingham distribution [Bingham 1974], which is much more effective and efficient in representing anisotropic spherical functions than Spherical Gaussians (SGs). In addition to retaining many desired properties of SGs, ASGs are also rotationally invariant and capable of representing all-frequency signals. To further strengthen the properties of ASGs, we have derived approximate closed-form solutions for their integral, product and convolution operators, whose errors are nearly negligible, as validated by quantitative analysis. Supported by all these operators, ASGs can be adapted in existing SG-based applications to enhance their scalability in handling anisotropic effects. To demonstrate the accuracy and efficiency of ASGs in practice, we have applied ASGs in two important SG-based rendering applications and the experimental results clearly reveal the merits of ASGs.
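
    To make the notion of an anisotropic lobe concrete, the sketch below evaluates a function of the form c * max(v.z, 0) * exp(-lam * (v.x)^2 - mu * (v.y)^2) for a unit direction v and an orthonormal frame (x, y, z). The abstract does not state the formula, so this form, the function name asg and the parameter names lam, mu and c are assumptions made for illustration; consult the paper for the authoritative definition.

```python
import math

def asg(v, x, y, z, lam, mu, c=1.0):
    """Evaluate an assumed anisotropic spherical Gaussian at unit direction v.

    (x, y, z) is an orthonormal frame with z the lobe axis; lam and mu are
    the bandwidths along x and y; c is the amplitude. Hypothetical sketch.
    """
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    smooth = max(dot(v, z), 0.0)  # zero on the hemisphere facing away from z
    return c * smooth * math.exp(-lam * dot(v, x) ** 2 - mu * dot(v, y) ** 2)

X, Y, Z = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
t = 0.3  # tilt angle in radians
toward_x = asg((math.sin(t), 0.0, math.cos(t)), X, Y, Z, lam=50.0, mu=5.0)
toward_y = asg((0.0, math.sin(t), math.cos(t)), X, Y, Z, lam=50.0, mu=5.0)
```

    With lam larger than mu, the lobe falls off faster when tilting toward x than toward y, which is the direction-dependent behaviour an isotropic Spherical Gaussian cannot express with a single bandwidth.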




    A Metric of Visual Comfort for Stereoscopic Motion
    ACM Transactions on Graphics, Vol. 32, No. 6, Article No. 222, 2013 (SIGGRAPH ASIA 2013).
    Song-Pei Du, Belen Masia, Shi-Min Hu and Diego Gutierrez

    We propose a novel metric of visual comfort for stereoscopic motion, based on a series of systematic perceptual experiments. We take into account disparity, motion in depth, motion on the screen plane, and the spatial frequency of luminance contrast. We further derive a comfort metric to predict the comfort of short stereoscopic videos. We validate it on both controlled scenes and real videos available on the internet, and show how all the factors we take into account, as well as their interactions, affect viewing comfort. Last, we propose various applications that can benefit from our comfort measurements and metric.




    Change Blindness Images (Spotlight paper)  
    IEEE Transactions on Visualization and Computer Graphics, Vol. 19, No. 11, 1808-1819, 2013.
    Li-Qian Ma, Kun Xu, Tien-Tsin Wong, Bi-Ye Jiang and Shi-Min Hu

    Change blindness refers to human inability to recognize large visual changes between images. In this paper, we present the first computational model of change blindness to quantify the degree of blindness between an image pair. It comprises a novel context-dependent saliency model and a measure of change, the former dependent on the site of the change, and the latter describing the amount of change. This saliency model in particular addresses the influence of background complexity, which plays an important role in the phenomenon of change blindness. Using the proposed computational model, we are able to synthesize changed images with desired degrees of blindness. User studies and comparisons to state-of-the-art saliency models demonstrate the effectiveness of our model.




    Flow Field Modulation
    IEEE Transactions on Visualization and Computer Graphics, Vol. 19, No. 10, 1708-1719, 2013.
    Bo Ren, Chen-Feng Li, Ming C. Lin, Theodore Kim, and Shi-Min Hu

    The nonlinear and non-stationary nature of Navier-Stokes equations produces fluid flows that can be noticeably different in appearance with subtle changes. In this paper we introduce a method that can analyze the intrinsic multiscale features of flow fields from a decomposition point of view, by using the Hilbert-Huang transform method on 3D fluid simulation. We show how this method can provide insights to flow styles and help modulate the fluid simulation with its internal physical information. We provide easy-to-implement algorithms that can be integrated with standard grid-based fluid simulation methods, and demonstrate how this approach can modulate the flow field and guide the simulation with different flow styles. The modulation is straightforward and relates directly to the flow's visual effect, with moderate computational overhead.




    Sketch2Scene: Sketch-based Co-retrieval and Co-placement of 3D Models
    ACM Transactions on Graphics, Vol. 32, No. 4, Article No. 123, SIGGRAPH 2013.
    Kun Xu, Kang Chen, Hongbo Fu, Wei-Lun Sun, Shi-Min Hu

    This work presents Sketch2Scene, a framework that automatically turns a freehand sketch depicting multiple scene objects into semantically valid, well-arranged scenes of 3D models. Unlike existing work on sketch-based search and composition of 3D models, which typically processes individual sketched objects one by one, our technique performs co-retrieval and co-placement of relevant 3D models by jointly processing the sketched objects. This is enabled by summarizing functional and spatial relationships among models in a large collection of 3D scenes as structural groups. Our technique greatly reduces the amount of user intervention needed for sketch-based modeling of 3D scenes and fits well into the traditional production pipeline involving concept design followed by 3D modeling. A pilot study indicates that the 3D scenes automatically synthesized by our technique in seconds are comparable to those manually created by an artist in hours in terms of visual aesthetics.




    Cubic Mean Value Coordinates
    ACM Transactions on Graphics, Vol. 32, No. 4, Article No. 126, SIGGRAPH 2013.
    Xian-Ying Li, Tao Ju and Shi-Min Hu

    We present a new method for interpolating both boundary values and gradients over a 2D polygonal domain. Despite various previous efforts, it remains challenging to define a closed-form interpolant that produces natural-looking functions while allowing flexible control of boundary constraints. Our method builds on an existing transfinite interpolant over a continuous domain, which in turn extends the classical mean value interpolant. We re-derive the interpolant from the mean value property of biharmonic functions, and prove that the interpolant indeed matches the gradient constraints when the boundary is piecewise linear. We then give closed-form formulae (as generalized barycentric coordinates) for boundary constraints represented as polynomials up to degree 3 (for values) and 1 (for normal derivatives) over each polygon edge. We demonstrate the flexibility and efficiency of our coordinates in two novel applications: smooth image deformation using curved cage networks and adaptive simplification of gradient meshes.




    Qualitative Organization of Collections of Shapes via Quartet Analysis
    ACM Transactions on Graphics, Vol. 32, No. 4, Article No. 71, SIGGRAPH 2013.
    Shi-Sheng Huang, Ariel Shamir, Chao-Hui Shen, Hao Zhang, Alla Sheffer, Shi-Min Hu, Daniel Cohen-Or

    We present a method for organizing a heterogeneous collection of 3D shapes for overview and exploration. Instead of relying on quantitative distances, which may become unreliable between dissimilar shapes, we introduce a qualitative analysis which utilizes multiple distance measures but only in cases where the measures can be reliably compared. Our analysis is based on the notion of quartets, each defined by two pairs of shapes, where the shapes in each pair are close to each other, but far apart from the shapes in the other pair. Combining the information from many quartets computed across a shape collection using several distance measures, we create a hierarchical structure that we call the categorization tree of the shape collection. This tree satisfies the topological (qualitative) constraints imposed by the quartets, creating an effective organization of the shapes. We present categorization trees computed on various collections of shapes and compare them to ground truth data from human categorization. We further introduce the concept of a degree of separation chart for every shape in the collection and show its effectiveness for interactive shape exploration.




    Manipulating Perspective in Stereoscopic Images
    IEEE Transactions on Visualization and Computer Graphics, 2013, Vol. 19, No. 8, 1288-1297.   
    Song-Pei Du, Shi-Min Hu and Ralph R Martin

    Stereoscopic ("3D") devices and content relying on stereopsis are now widely available. However, traditional image editing techniques cannot be directly used to edit stereoscopic media, as extra constraints are needed to ensure consistent changes are made to both left and right images. This paper addresses the problem of manipulating perspective in stereoscopic pairs. We note that a straightforward approach based on depth recovery is unsatisfactory. Instead, our method relies on feature correspondences between stereoscopic image pairs. Given a new, user-specified perspective, we determine correspondence constraints under this perspective, and optimize a 2D warp for each image which preserves straight lines and guarantees proper stereopsis relative to the new camera. Experiments demonstrate that our method generates new views with suitable stereoscopic output which correspond well to expected projections, for a wide range of specified perspectives. Various advanced camera effects, such as dolly zoom and wide angle effects, can also be readily generated for stereoscopic image pairs using our method.




    Aesthetic Image Enhancement by Dependence-Aware Object Re-Composition
    IEEE Transactions on Multimedia, Vol. 15, No. 7, 1480-1490, 2013.  
    Fang-Lue Zhang, Miao Wang, Shi-Min Hu

    This paper proposes an image enhancement method to optimize photo composition, by rearranging foreground objects in the photo. To adjust objects' positions while keeping the original scene content, we first perform a novel structure dependence analysis on the image to obtain the dependencies between all background regions. To determine the optimal positions for foreground objects, we formulate an optimization problem based on widely used heuristics for aesthetically pleasing pictures. Semantic relations between foreground objects are also taken into account during optimization. The final output is produced by moving foreground objects, together with their dependent regions, to optimal positions. The results show that our approach can effectively optimize photos with single or multiple foreground objects without compromising the original photo content.




    Time-Line Editing of Objects in Video
    IEEE Transactions on Visualization and Computer Graphics, 2013, Vol. 19, No. 7, 1218-1227.
    Shao-Ping Lu, Song-Hai Zhang, Jin Wei, Shi-Min Hu and Ralph R Martin

    We present a video editing technique based on changing the time-lines of individual objects in video, which leaves them in their original places but puts them at different times. This allows the production of object-level slow motion effects, fast motion effects, or even time reversal. This is more flexible than simply applying such effects to whole frames, as new relationships between objects can be created. As we restrict object interactions to the same spatial locations as in the original video, our approach can produce high-quality results using only coarse matting of video objects. Coarse matting can be done efficiently using automatic video object segmentation, avoiding tedious manual matting. To design the output, the user interactively indicates the desired new life-spans of objects, and may also change the overall running time of the video. Our method rearranges the time-lines of objects in the video whilst applying appropriate object interaction constraints. We demonstrate that, while this editing technique is somewhat restrictive, it still allows many interesting results.




    Motion-Aware Gradient Domain Video Composition
    IEEE Transactions on Image Processing, 2013, Vol. 22, No. 7, 2532-2544.
    Tao Chen, Jun-Yan Zhu, Ariel Shamir, and Shi-Min Hu

    For images, gradient domain composition methods like Poisson blending offer practical solutions for uncertain object boundaries and differences in illumination conditions. However, adapting Poisson image blending to video faces new challenges due to the added temporal dimension. In video, the human eye is sensitive to small changes in blending boundaries across frames, and slight differences in motions of the source patch and target video. We present a novel video blending approach that tackles these problems by merging the gradient of source and target videos and optimizing a consistent blending boundary based on a user provided blending trimap for the source video. Our approach extends mean-value coordinates interpolation to support hybrid blending with a dynamic boundary while maintaining interactive performance. We also provide a user interface and source object positioning method that can efficiently deal with complex video sequences beyond the capabilities of alpha blending.




    Internet visual media processing: a survey with graphics and vision applications
    The Visual Computer, 2013, Vol. 29, No. 5, 393-405.
    Shi-Min Hu, Tao Chen, Kun Xu, Ming-Ming Cheng, Ralph R. Martin

    In recent years, the computer graphics and computer vision communities have devoted significant attention to research based on Internet visual media resources. The huge number of images and videos continually being uploaded by millions of people has stimulated a variety of visual media creation and editing applications, while also posing serious challenges of retrieval, organization, and utilization. This article surveys recent research on processing large collections of images and video, including work on analysis, manipulation, and synthesis. It discusses the problems involved, and suggests possible future directions in this emerging research area.




    Mixed-Domain Edge-Aware Image Manipulation
    IEEE Transactions on Image Processing, 2013, Vol. 22, No. 5, 1915-1925.
    Xian-Ying Li, Yan Gu, Shi-Min Hu, and Ralph R. Martin

    This paper gives a novel approach to edge-aware image manipulation. Our method processes a Gaussian pyramid from coarse to fine, and at each level, we apply a nonlinear filter bank to the neighborhood of each pixel. Outputs of these spatially-varying filters are merged using global optimization, and this optimization problem is solved using an explicit mixed-domain (real space and DCT transform space) solution, which is efficient, accurate, and easy to implement. We demonstrate applications of our method to a set of problems including detail and contrast manipulation, HDR compression, non-photorealistic rendering, and haze removal.




    PoseShop: A Human Image Database and Personalized Content Synthesis
    IEEE Transactions on Visualization and Computer Graphics, 2013, Vol. 19, No. 5, 824-837.
    Tao Chen, Ping Tan, Li-Qian Ma, Ming-Ming Cheng, Ariel Shamir and Shi-Min Hu

    We present a human image database collected from online images where human figures are segmented out of their background. The images are organized based on action semantics and clothes attributes, and indexed by the shape of their poses. The database is built by downloading, analyzing, and filtering over 3 million human images from the Internet and can be queried using either a silhouette sketch or a skeleton to find a given pose. We demonstrate the application of this database for multi-frame personalized content synthesis in the form of comic strips, where the main character is the user or his/her friends. We address the two challenges of such synthesis, namely personalization and consistency over a set of frames, by introducing head swapping and clothes swapping techniques. We also demonstrate an action correlation analysis application to show the usefulness of the database for vision applications.




    A Data-Driven Approach to Realistic Shape Morphing
    Computer Graphics Forum (Eurographics 2013), Vol. 32, No. 2, 449-457, 2013.
    Lin Gao, Yu-Kun Lai, Qixing Huang and Shi-Min Hu

    This paper proposes a novel data-driven approach for shape morphing. Given a database with various models belonging to the same category, we treat them as data samples in the plausible deformation space. These models are then clustered to form local shape spaces of plausible deformations. We use a simple metric to reasonably represent the closeness between pairs of models. Given source and target models, the morphing problem is cast as a global optimization problem of finding a minimal distance path within the local shape spaces connecting these models. Under the guidance of intermediate models in the path, an extended as-rigid-as-possible interpolation is used to produce the final morphing. By exploiting the knowledge of plausible models, our approach produces realistic morphing for challenging cases, as demonstrated by various examples in the paper.



    Efficient Synthesis of Gradient Solid Textures
    Graphical Models, Vol. 75, No. 3, 104-117, 2013   
    (An earlier version was presented at Computational Visual Media 2013, Beijing, and received the Best Paper Award)
    Guo-Xin Zhang, Yu-Kun Lai and Shi-Min Hu

    Solid textures require large storage and are computationally expensive to synthesize. In this paper, we propose a novel solid representation called gradient solids to compactly represent solid textures, including a tricubic interpolation scheme of colors and gradients for smooth variation and a region-based approach for representing sharp boundaries. We further propose a novel approach based on this representation to directly synthesize gradient solid textures from exemplars. Compared to existing methods, our approach avoids the expensive step of synthesizing the complete solid textures at voxel level and produces optimized solid textures using our representation. This avoids a significant amount of unnecessary computation and storage involved in voxel-level synthesis while producing solid textures with quality comparable to the state of the art. The algorithm is much faster than existing approaches for solid texture synthesis and makes it feasible to synthesize high-resolution solid textures in full. Our compact representation also supports efficient novel applications such as instant editing propagation on full solids.



    Semi-Regular Solid Texturing from 2D Image Exemplars
    IEEE Transactions on Visualization and Computer Graphics, 2013, Vol. 19, No. 3, 460-469.    
    Song-Pei Du, Shi-Min Hu and Ralph R. Martin

    Solid textures, comprising 3D particles embedded in a matrix in a regular or semi-regular pattern, are common in natural and man-made materials, such as brickwork, stone walls, plant cells in a leaf, etc. We present a novel technique for synthesizing such textures, starting from 2D image exemplars which provide cross-sections of the desired volume texture. The shapes and colors of typical particles embedded in the structure are estimated from their 2D cross-sections. Particle positions in the texture images are also used to guide spatial placement of the 3D particles during synthesis of the 3D texture. Our experiments demonstrate that our algorithm can produce higher-quality structures than previous approaches; they are both compatible with the input images, and have a plausible 3D nature.




    Poisson Coordinates
    IEEE Transactions on Visualization and Computer Graphics, 2013, Vol. 19, No. 2, 344-352.
    Xian-Ying Li and Shi-Min Hu

    Harmonic functions are the critical points of a Dirichlet energy functional, the linear projections of conformal maps. They play an important role in computer graphics, particularly for gradient-domain image processing and shape-preserving geometric computation. We propose Poisson coordinates, a novel transfinite interpolation scheme based on the Poisson integral formula, as a rapid way to estimate a harmonic function on a certain domain with desired boundary values. Poisson coordinates are an extension of the Mean Value coordinates (MVCs) which inherit their linear precision, smoothness, and kernel positivity. We give explicit formulae for Poisson coordinates in both continuous and 2D discrete forms. Superior to MVCs, Poisson coordinates are proved to be pseudoharmonic (i.e., they reproduce harmonic functions on n-dimensional balls). Our experimental results show that Poisson coordinates have lower Dirichlet energies than MVCs on a number of typical 2D domains (particularly convex domains). As well as presenting a formula, our approach provides useful insights for further studies on coordinates-based interpolation and fast estimation of harmonic functions.
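
    As background for the abstract above, the following is a minimal sketch of the classical 2D mean value coordinates (MVCs) that Poisson coordinates are said to extend. It is not the Poisson coordinates construction itself and not the authors' code; the function name `mean_value_coordinates` and the test polygon are illustrative assumptions. The sketch assumes a simple polygon given in order and a query point strictly inside it.

```python
import math


def mean_value_coordinates(polygon, x):
    """Classical mean value coordinates of point x for a 2D polygon.

    polygon: list of (px, py) vertices in order.
    x: (qx, qy) query point, assumed strictly inside the polygon.
    Returns one weight per vertex; the weights sum to 1.
    """
    n = len(polygon)
    # Vectors from the query point to each vertex, and their lengths.
    d = [(px - x[0], py - x[1]) for px, py in polygon]
    r = [math.hypot(dx, dy) for dx, dy in d]

    # tan(alpha_i / 2), where alpha_i is the signed angle at x
    # between vertex i and vertex i + 1.
    t = []
    for i in range(n):
        j = (i + 1) % n
        cross = d[i][0] * d[j][1] - d[i][1] * d[j][0]
        dot = d[i][0] * d[j][0] + d[i][1] * d[j][1]
        t.append(math.tan(math.atan2(cross, dot) / 2.0))

    # Unnormalised weight: (tan(alpha_{i-1}/2) + tan(alpha_i/2)) / r_i.
    w = [(t[i - 1] + t[i]) / r[i] for i in range(n)]
    total = sum(w)
    return [wi / total for wi in w]


def interpolate(polygon, values, x):
    """Interpolate per-vertex values at x using the coordinates above."""
    lam = mean_value_coordinates(polygon, x)
    return sum(l * v for l, v in zip(lam, values))


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    lam = mean_value_coordinates(square, (0.3, 0.6))
    print(lam)
    # Linear precision: interpolating the vertex x-coordinates
    # gives back the x-coordinate of the query point.
    print(interpolate(square, [p[0] for p in square], (0.3, 0.6)))
```

    The linear precision checked at the end is one of the properties the abstract says Poisson coordinates inherit from MVCs.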




    View-Dependent Multiscale Fluid Simulation
    IEEE Transactions on Visualization and Computer Graphics, 2013, Vol. 19, No. 2, 178-188.   
    Yue Gao, Chen-Feng Li, Bo Ren and Shi-Min Hu

    Fluid motions are highly nonlinear and non-stationary, with turbulence occurring and developing at different length and time scales. In real-life observations, the multiscale flow generates different visual impacts depending on the distance to the viewer. We propose a new fluid simulation framework that adaptively allocates computational resources according to human visual perception. First, a 3D empirical mode decomposition scheme is developed to obtain the velocity spectrum of the turbulent flow. Then, depending on the distance to the viewer, the fluid domain is divided into a sequence of nested simulation partitions. Finally, the multiscale fluid motions revealed in the velocity spectrum are distributed non-uniformly to these view-dependent partitions, and the mixed velocity fields defined on different partitions are solved separately using different grid sizes and time steps. The fluid flow is solved at different spatial-temporal resolutions, such that higher-frequency motions closer to the viewer are solved at higher resolutions and vice versa. The new simulator better utilizes the computing power, producing visually plausible results with realistic fine-scale details in a more efficient way. It is particularly suitable for large scenes with the viewer inside the fluid domain. Also, as high-frequency fluid motions are distinguished from low-frequency motions in the simulation, the numerical dissipation is effectively reduced.



    2012





    Structure Recovery by Part Assembly
    ACM Transactions on Graphics, Vol. 31, No. 6, Article No. 180, ACM SIGGRAPH ASIA 2012.    (data set is available)
    Chao-Hui Shen, Hongbo Fu, Kang Chen and Shi-Min Hu

    This work presents a technique that allows quick conversion of low-quality data acquired from consumer-level scanning devices to high-quality 3D models with labeled semantic parts, whose assembly is reasonably close to the underlying geometry. This is achieved by a novel structure recovery approach that is essentially local to global and bottom up, enabling the creation of new structures by assembling existing labeled parts with respect to the acquired data. We demonstrate that using only a small-scale shape repository, our part assembly approach is able to faithfully recover a variety of high-level structures from only a single-view scan of man-made objects acquired by the Kinect system, containing a highly noisy, incomplete 3D point cloud and a corresponding RGB image.




    An Optimization Approach for Extracting and Encoding Consistent Maps in a Shape Collection
    ACM Transactions on Graphics, Vol. 31, No. 6, Article No. 167, ACM SIGGRAPH ASIA 2012.   
    Qi-xing Huang, Guoxin Zhang, Lin Gao, Shi-Min Hu, Adrian Butscher and Leonidas Guibas

    We introduce a novel diffusion-based approach for computing high quality point-to-point maps among a collection of shapes so that several desirable properties are satisfied. The proposed approach takes as input a sparse set of initial maps between pairs of shapes (sufficient to connect the model graph) and implicitly builds a new set of pointwise maps between all pairs of shapes which aim to (1) align with the initial maps, (2) map neighboring points to neighboring points, and (3) provide cycle-consistency, so that map compositions along cycles approximate the identity map. Maps among subsets of the shapes that admit nearly perfect loop closure are highly redundant and can be compactly represented by maps from a single base shape to other shapes. Our algorithm extracts such a set of base shapes so that every other shape is "covered" by at least one of the base shapes.




    ImageAdmixture: Putting Together Dissimilar Objects from Groups
    IEEE Transactions on Visualization and Computer Graphics, 2012, Vol. 18, No. 11, 1849-1857.
    Fang-Lue Zhang, Ming-Ming Cheng, Jiaya Jia, Shi-Min Hu

    We present a semi-automatic image editing framework dedicated to replacing individual structured objects within groups. The major technical difficulty is separating elements with irregular spatial distribution, which prevents previous texture and image synthesis methods from easily producing visually compelling results. Our method uses object-level operations and finds grouped elements based on appearance similarity and curvilinear features. This framework enables a number of image editing applications, including natural image mixing, structure-preserving appearance transfer, and texture mixing.




    Fisheye Video Correction
    IEEE Transactions on Visualization and Computer Graphics, 2012, Vol. 18, No. 10, 1771-1783.
    Jin Wei, Chen-Feng Li, Shi-Min Hu, Ralph Martin, and Chiew-Lan Tai

    Various types of video are captured with fisheye lenses, particularly surveillance video, due to their ability to capture a wide field of view. However, distortion changes as objects in the scene move, making fisheye video difficult to interpret and uncomfortable to watch. Current still fisheye image correction methods are either limited to small angles of view, or are strongly content-dependent, and therefore not suitable for processing video streams. We present a novel scheme for fisheye video correction, which minimizes time-varying distortions and preserves salient content features in a coherent manner. Our optimization process is controlled by user annotation, and includes a comprehensive set of measures addressing different aspects of natural scene appearance. These terms are all formulated in quadratic form, leading to a quadratic programming problem which can be solved in a closed form using a sparse linear system. We illustrate our method with a range of examples, demonstrating coherent natural-looking video output in which the visual quality of individual frames is comparable to state-of-the-art methods for still fisheye photograph correction.




    Interactive Images: Proxy-based Scene Understanding for Smart Manipulation
    ACM Transactions on Graphics (ACM SIGGRAPH), 2012, Vol. 31, No. 4, Article No. 99.
    Youyi Zheng, Xiang Chen, Ming-Ming Cheng, Kun Zhou, Shi-Min Hu, Niloy J. Mitra

    Images are static and lack important depth information of underlying 3D scenes. We introduce interactive images in the context of man-made environments wherein objects are simple, regular, share various non-local relations (e.g., coplanarity, repetitions, etc.), and are often repeated. We present an interactive framework to create a partial scene reconstruction based on cuboid-proxies using minimal user interaction. This enables a range of intuitive image edits mimicking real-world behavior, which are otherwise difficult to achieve. Effectively, the user simply provides high-level semantic hints, while our system ensures plausible operations by conforming to the extracted non-local relations. We demonstrate our system on a range of real-world images and validate the plausibility of the results using a user study.



    Other publications in 2012

    1. Li-Qian Ma and Kun Xu, Efficient antialiased edit propagation for images and videos, Computers & Graphics, Vol. 36, No. 8, 1005-1012, 2012.
    2. Yong-Liang Yang and Chao-Hui Shen, Multi-Scale Salient Features for Analyzing 3D Shapes, Journal of Computer Science and Technology, Vol. 27, No. 6, 1092-1099, 2012.
    3. Long Zeng, Yong-Jin Liu, Sang-Hun Lee, Ming-Fai Yuen, Q-Complex: efficient non-manifold boundary representation with inclusion topology, Computer-Aided Design, Vol. 44, No. 11, 1115-1126, 2012.    
    4. Ling-Qi Yan, Yahan Zhou, Kun Xu and Rui Wang, Accurate Translucent Material Rendering under Spherical Gaussian Lights, Computer Graphics Forum, Vol. 31, No. 7, 2267-2276, 2012.
    5. Long Zeng, Yong-Jin Liu, Ming Chen, Ming-Fai Yuen, Least squares quasi-developable mesh approximation, Computer Aided Geometric Design, Vol. 29, No. 7, 565-578, 2012.    
    6. Chen Goldberg, Tao Chen, Fang-Lue Zhang, Ariel Shamir, Shi-Min Hu, Data-Driven Object Manipulation in Images, Computer Graphics Forum, Vol. 31, No. 2, 265-274, 2012 (Eurographics 2012).   
    7. Tao Chen, Aidong Lu and Shi-Min Hu, Visual storylines: Semantic visualization of movie sequence, Computers & Graphics, Vol. 36, No. 4, 241-249, 2012.
    8. Cui-Xia Ma, Yong-Jin Liu, Hong-An Wang, Dong-Xing Teng, Guo-Zhong Dai, Sketch-based Annotation and Visualization in Video Authoring, IEEE Transactions on Multimedia, Vol. 14, No. 4, 1153-1165, 2012.    
    9. Yong-Jin Liu, Yi-Fu Zheng, Lu Lv, Yu-Ming Xuan, Xiao-Lan Fu, 3D Model Retrieval based on Color+Geometry Signatures, The Visual Computer, Vol. 28, No. 1, 75-86, 2012.    



    2011





    Interactive Hair Rendering and Appearance Editing under Environment Lighting
    ACM Transactions on Graphics, Vol. 30, No. 6, ACM SIGGRAPH ASIA 2011.    
    Kun Xu, Li-Qian Ma, Bo Ren, Rui Wang, Shi-Min Hu

    We present an interactive algorithm for hair rendering and appearance editing under complex environment lighting represented as spherical radial basis functions (SRBFs). Our main contribution is to derive a compact 1D circular Gaussian representation that can accurately model the hair scattering function introduced by [Marschner et al. 2003]. The primary benefit of this representation is that it enables us to evaluate, at run-time, closed-form integrals of the scattering function with each SRBF light, resulting in efficient computation of both single and multiple scatterings. In contrast to previous work, our algorithm computes the rendering integrals entirely on the fly and does not depend on expensive precomputation. Thus we allow the user to dynamically change the hair scattering parameters, which can vary spatially. Analyses show that our 1D circular Gaussian representation is both accurate and concise. In addition, our algorithm incorporates the eccentricity of the hair. We implement our algorithm on the GPU, achieving interactive hair rendering and simultaneous appearance editing under complex environment maps for the first time.




    Adaptive Partitioning of Urban Facades
    ACM Transactions on Graphics, Vol. 30, No. 6, ACM SIGGRAPH ASIA 2011.
    Chao-Hui Shen, Shi-Sheng Huang, Hongbo Fu, Shi-Min Hu

    Automatically discovering high-level facade structures in unorganized 3D point clouds of urban scenes is crucial for applications like digitalization of real cities. However, this problem is challenging due to poor-quality input data, contaminated with severe missing areas, noise and outliers. This work introduces the concept of adaptive partitioning to automatically derive a flexible and hierarchical representation of 3D urban facades. Our key observation is that urban facades are largely governed by concatenated and/or interlaced grids. Hence, unlike previous automatic facade analysis works, which are typically restricted to globally rectilinear grids, we propose to automatically partition the facade in an adaptive manner, in which the splitting direction and the number and location of splitting planes are all adaptively determined. Such an adaptive partition operation is performed recursively to generate a hierarchical representation of the facade. We show that the concept of adaptive partitioning is also applicable to flexible and robust analysis of image facades. We evaluate our method on a dozen LiDAR scans of various complexity and styles, and the image facades from the eTRIMS database and the Ecole Centrale Paris database. A series of applications that benefit from our approach are also demonstrated.




    Online Video Stream Abstraction and Stylization
    IEEE Transactions on Multimedia, Vol. 13, No. 6, 1286-1294, 2011.
    Song-Hai Zhang, Xian-Ying Li, Shi-Min Hu, and Ralph R. Martin

    This paper gives an automatic method for online video stream abstraction, producing a temporally coherent output video stream in a style with large regions of constant color and highlighted bold edges. Our system includes two novel components. Firstly, to provide coherent and simplified output, we segment frames, and use optical flow to propagate segmentation information from frame to frame; an error control strategy is used to help ensure that the propagated information is reliable. Secondly, to achieve coherent and attractive coloring of the output, we use a color scheme replacement algorithm specifically designed for an online video stream. We demonstrate real-time performance for CIF videos, allowing our approach to be used for live communication and other related applications. Index Terms: abstraction, color scheme replacement, optical flow, segmentation, temporal coherence, video stream.




    A Geometric Study of V-style Pop-ups: Theories and Algorithms
    ACM Transactions on Graphics 2011, Vol. 30, No. 4, ACM SIGGRAPH 2011    
    Xian-Ying Li, Tao Ju, Yan Gu, Shi-Min Hu

    Pop-up books are a fascinating form of paper art with intriguing geometric properties. In this paper, we present a systematic study of a simple but common class of pop-ups consisting of patches falling into four parallel groups, which we call v-style pop-ups. We give sufficient conditions for a v-style paper structure to be pop-uppable. That is, it can be closed flat while maintaining the rigidity of the patches, the closing and opening do not need extra force besides holding two patches and are free of intersections, and the closed paper is contained within the page border. These conditions allow us to identify novel mechanisms for making pop-ups. Based on the theory and mechanisms, we developed an interactive tool for designing v-style pop-ups and an automated construction algorithm from a given geometry, both of which guarantee the pop-uppability of the results.




    Global Contrast based Salient Region Detection
    IEEE CVPR, pp. 409-416, 2011.
    Ming-Ming Cheng, Guo-Xin Zhang, Niloy J. Mitra, Xiaolei Huang, Shi-Min Hu

    Reliable estimation of visual saliency allows appropriate processing of images without prior knowledge of their content, and thus remains an important step in many computer vision tasks including image segmentation, object recognition, and adaptive compression. We propose a regional contrast based saliency extraction algorithm, which simultaneously evaluates global contrast differences and spatial coherence. The proposed algorithm is simple, efficient, and yields full resolution saliency maps. Our algorithm consistently outperformed existing saliency detection methods, yielding higher precision and better recall rates, when evaluated using one of the largest publicly available data sets. We also demonstrate how the extracted saliency map can be used to create high quality segmentation masks for subsequent image processing.
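
    To make the idea of global contrast concrete, here is a deliberately simplified sketch: each pixel's saliency is its intensity contrast to every other pixel in the image, accelerated with a histogram. It is an illustration only, not the regional contrast algorithm of the paper; it assumes a grey-level image stored as a nested list, ignores color and spatial coherence, and the function name `global_contrast_saliency` is hypothetical.

```python
def global_contrast_saliency(gray, levels=256):
    """Histogram-based global contrast saliency for a grey-level image.

    gray: 2D list of ints in [0, levels).
    Returns a 2D list of floats in [0, 1]; higher means more salient.
    Simplifying assumption: contrast is the absolute intensity
    difference, summed over all pixels of the image.
    """
    # Count how many pixels take each intensity value.
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1

    # Contrast of intensity v against the whole image:
    # sum over intensities u of (pixel count of u) * |v - u|.
    used = [u for u in range(levels) if hist[u]]
    contrast = {v: sum(hist[u] * abs(v - u) for u in used) for v in used}

    # Normalise to [0, 1]; a constant image maps to all zeros.
    lo, hi = min(contrast.values()), max(contrast.values())
    scale = (hi - lo) or 1.0
    return [[(contrast[v] - lo) / scale for v in row] for row in gray]


if __name__ == "__main__":
    image = [
        [10, 10, 10],
        [10, 200, 10],
        [10, 10, 10],
    ]
    for row in global_contrast_saliency(image):
        print(row)
```

    Because pixels with the same intensity share the same contrast value, the cost depends on the number of distinct intensities rather than on the square of the pixel count.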




    Construction of Iso-contours, Bisectors and Voronoi Diagrams on Triangulated Surfaces
    IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 8, 1502-1517, 2011  
    Yong-Jin Liu, Zhan-Qing Chen, Kai Tang

    In computer vision and machine perception research, three-dimensional objects are usually represented by 2-manifold triangular meshes M. In this paper, we present practical and efficient algorithms to construct iso-contours, bisectors and Voronoi diagrams of point sites on M, based on an exact geodesic metric. Compared to Euclidean metric spaces, the Voronoi diagrams on M exhibit many special properties that cause all existing Euclidean Voronoi algorithms to fail. To provide practical algorithms for constructing geodesic-metric-based Voronoi diagrams on M, this paper studies the analytic structure of iso-contours, bisectors and Voronoi diagrams on M. After a necessary preprocessing of model M, practical algorithms are proposed for quickly obtaining full information about iso-contours, bisectors and Voronoi diagrams on M. The complexity of the construction algorithms is also analyzed. Finally, three interesting applications (surface sampling and reconstruction, 3D skeleton extraction, and point pattern analysis) are presented that show the potential power of the proposed algorithms in pattern analysis.




    Image Retargeting Quality Assessment
    Computer Graphics Forum, 2011, Vol. 30, No. 2, Eurographics 2011
    Yong-Jin Liu, Xi Luo, Yu-Ming Xuan, Wen-Feng Chen, Xiao-Lan Fu

    Content-aware image retargeting is a technique that can flexibly display images with different aspect ratios and simultaneously preserve salient regions in images. Recently many image retargeting techniques have been proposed. To compare image quality by different retargeting methods fast and reliably, an objective metric simulating the human vision system (HVS) is presented in this paper. Different from traditional objective assessment methods that work in bottom-up manner (i.e., assembling pixel-level features in a local-to-global way), in this paper we propose to use a reverse order (top-down manner) that organizes image features from global to local viewpoints, leading to a new objective assessment metric for retargeted images. A scale-space matching method is designed to facilitate extraction of global geometric structures from retargeted images. By traversing the scale space from coarse to fine levels, local pixel correspondence is also established. The objective assessment metric is then based on both global geometric structures and local pixel correspondence. To evaluate color images, CIE Lab color space is utilized. Experimental results are obtained to measure the performance of objective assessments with the proposed metric. The results show good consistency between the proposed objective metric and subjective assessment by human observers.




    Connectedness of Random Walk Segmentation
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, Vol. 33, No. 1, 200-202.
    Ming-Ming Cheng, Guo-Xin Zhang

    Connectedness of random walk segmentation is examined, and novel properties are discovered, by considering electrical circuits equivalent to random walks. A theoretical analysis shows that earlier conclusions concerning connectedness of random walk segmentation results are incorrect, and counterexamples are demonstrated.



    Other publications in 2011

    1. Wen-Qi Zhang, Yong-Jin Liu, Approximating the Longest Paths in Grid Graphs, Theoretical Computer Science, 2011, Vol. 412, No. 39, 5340-5350.    
    2. Yong-Jin Liu, Kai Tang, Wen-Yong Gong, Tie-Ru Wu, Industrial Design using Interpolatory Discrete Developable Surfaces, Computer-Aided Design, 2011, Vol. 43, No. 9, 1089-1098.
    3. Guo-Xin Zhang, Song-Pei Du, Yu-Kun Lai, Tianyun Ni, Shi-Min Hu, Sketch Guided Solid Texturing, Graphical Models, 2011, Vol. 73, No. 3, 59-73.
    4. Cui-Xia Ma, Yong-Jin Liu, Hai-Yan Yang, Dong-Xing Teng, Hong-An Wang, Guo-Zhong Dai, KnitSketch: A Sketch Pad for Conceptual Design of 2D Garment Patterns, IEEE Transactions on Automation Science and Engineering, 2011, Vol. 8, No. 2,    
    5. Zhe Bian, Shi-Min Hu, Preserving detailed features in digital bas-relief making, Computer Aided Geometric Design, Vol. 28, No. 4, 245-256, 2011.    
    6. Yong-Jin Liu, Cui-Xia Ma, Dong-Liang Zhang, Easytoy: a plush toy design system using editable sketch curves, IEEE Computer Graphics & Applications, 2011, Vol. 31, No. 2,    
    7. Shao-Ping Lu and Song-Hai Zhang, Saliency-Based Fidelity Adaptation Preprocessing for Video Coding, Journal of Computer Science and Technology, 2011, Vol. 26, No. 1, 195-202    



    2010





    Instant Propagation of Sparse Edits on Images and Videos
    Computer Graphics Forum, Special issue of Pacific Graphics 2010, Vol. 29, No. 7, 2049-2054
    Yong Li, Tao Ju, Shi-Min Hu

    The ability to quickly and intuitively edit digital contents has become increasingly important in our everyday life. We propose a novel method for propagating a sparse set of user edits (e.g., changes in color, brightness, contrast, etc.) expressed as casual strokes to nearby regions in an image or video with similar appearances. Existing methods for edit propagation are typically based on optimization, whose computational cost can be prohibitive for large inputs. We re-formulate propagation as a function interpolation problem in a high-dimensional space, which we solve very efficiently using radial basis functions. While simple to implement, our method significantly improves the speed and space cost of existing methods, and provides instant feedback of propagation results even on large images and videos.
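
    The idea of treating edit propagation as function interpolation can be illustrated with a minimal sketch. It assumes plain Gaussian radial basis functions with an exact interpolation solve, which is not necessarily the formulation used in the paper; the names fit_rbf and propagate, the feature vectors and the sigma value are all hypothetical.

```python
import math

def gaussian(r2, sigma):
    # Gaussian kernel evaluated on a squared distance.
    return math.exp(-r2 / (2.0 * sigma * sigma))

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_rbf(centers, values, sigma=0.5):
    # One basis function per stroke sample; weights reproduce the user edits.
    A = [[gaussian(sqdist(ci, cj), sigma) for cj in centers] for ci in centers]
    return solve(A, values)

def propagate(x, centers, weights, sigma=0.5):
    # Edit strength at an arbitrary feature vector x (e.g. color plus position).
    return sum(w * gaussian(sqdist(x, c), sigma) for w, c in zip(weights, centers))

# Hypothetical stroke samples in a 3D feature space with their edit strengths.
centers = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 1.0, 0.0)]
values = [1.0, 0.0, 0.5]
weights = fit_rbf(centers, values)
```

    Once the weights are known, evaluating propagate for a pixel costs one kernel evaluation per basis function, independent of image size.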




    Popup: Automatic Paper Architectures from 3D Models
    ACM Transactions on Graphics 2010, Vol. 29, No. 4, ACM SIGGRAPH 2010  
    Xian-Ying Li, Chao-Hui Shen, Shi-Sheng Huang, Tao Ju, Shi-Min Hu

    Paper architectures are 3D paper buildings created by folding and cutting. The creation process of paper architecture is often labor intensive and highly skill-demanding, even with the aid of existing computer-aided design tools. We propose an automatic algorithm for generating paper architectures given a user-specified 3D model. The algorithm is grounded on geometric formulation of planar layout for paper architectures that can be popped-up in a rigid and stable manner, and sufficient conditions for a 3D surface to be popped up from such a planar layout. Based on these conditions, our algorithm computes a class of paper architectures containing two sets of parallel patches that approximate the input geometry while guaranteed to be physically realizable. The method is demonstrated on a number of architectural examples, and physically engineered results are presented.




    RepFinder: Finding Approximately Repeated Scene Elements for Image Editing
    ACM Transactions on Graphics 2010, Vol. 29, No. 4, ACM SIGGRAPH 2010    
    Ming-Ming Cheng, Fang-Lue Zhang, Niloy J. Mitra, Xiaolei Huang, Shi-Min Hu

    Repeated elements are ubiquitous and abundant in both manmade and natural scenes. Editing such images while preserving the repetitions and their relations is nontrivial due to overlap, missing parts, deformation between instances, illumination variation, etc. Manually enforcing such relations is laborious and error prone. We propose a novel framework where simple user input in the form of scribbles is used to guide detection and extraction of such repeated elements. Our detection process is based on a novel boundary band method, and robustly extracts the repetitions along with their mutual depth relations. We then use topological sorting to establish a partial depth ordering of overlapping repeated instances. Missing parts on occluded instances are completed using information from other instances. The extracted repeated instances can then be seamlessly edited and manipulated for a variety of high level tasks that are otherwise difficult to perform. We demonstrate the versatility of our framework on a large set of inputs of varying complexity, showing applications to image rearrangement, edit transfer, deformation propagation, and instance replacement.
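
    The topological sorting step mentioned above can be sketched as follows. This is only an illustration using Python's standard graphlib module (Python 3.9 or later); the function name depth_order and the occludes mapping are hypothetical and do not come from the paper.

```python
from graphlib import TopologicalSorter

def depth_order(occludes):
    """Return instances in back-to-front order.

    `occludes` maps each instance to the set of instances it lies in front of
    (pairwise depth relations, assumed to be already detected).
    """
    ts = TopologicalSorter()
    for front, behind_set in occludes.items():
        ts.add(front)
        for behind in behind_set:
            # 'behind' is registered as a predecessor, so it is emitted first.
            ts.add(front, behind)
    # Raises graphlib.CycleError if the depth relations are contradictory.
    return list(ts.static_order())

# Hypothetical relations: a overlaps b, b overlaps c, d overlaps nothing.
order = depth_order({"a": {"b"}, "b": {"c"}, "d": set()})
```

    Instances with no mutual relation (such as d here) may appear anywhere in the result, which is why the ordering is only partial.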




    Metric-Driven RoSy Fields Design
    IEEE Transactions on Visualization and Computer Graphics, 2010, Vol. 16, No. 1, 95-108.   
    Yu-Kun Lai, Miao Jin, Xuexiang Xie, Ying He, Jonathan Palacios, Eugene Zhang, Shi-Min Hu and Xianfeng David Gu

    This work introduces a rigorous and practical approach for automatic N-RoSy field design on arbitrary surfaces with user defined field topologies. The user has full control of the number, positions and indices of the singularities, the turning numbers of the loops, and is able to edit the field interactively. We formulate N-RoSy field construction as designing a Riemannian metric, such that the holonomy along any loop is compatible with the local symmetry of N-RoSy fields. We prove the compatibility condition using discrete parallel transport. The complexity of N-RoSy field design is caused by curvatures. In our work, we propose to simplify the Riemannian metric to make it flat almost everywhere. This approach greatly simplifies the process and improves the flexibility, so that it can design N-RoSy fields with a single singularity, as well as mixed-RoSy fields. To demonstrate the effectiveness of our approach, we apply our design system to pen-and-ink sketching and geometry remeshing.



    Other publications in 2010

    1. Yong-Jin Liu, Dong-Liang Zhang, Matthew Ming-Fai Yuen, A survey on CAD methods in garment design, Computers in Industry, 2010, Vol. 61, No. 6, 576-593    
    2. Yong-Jin Liu, Kam-Lung Lai, Gang Dai, Ming-Fai Yuen, A semantic feature model in concurrent engineering, IEEE Transactions on Automation Science and Engineering, 2010, Vol. 7, No. 3, 659-665    
    3. Yu-Ping Wang, Shi-Min Hu, Optimization approach for 3D model watermarking by linear binary programming, Computer Aided Geometric Design, 2010, Vol. 27, No. 5, 395-404    
    4. Yong-Jin Liu, Wen-Qi Zhang, Kai Tang, Some notes on maximal arc intersection of spherical polygons: its NP-hardness and approximation algorithms, The Visual Computer, 2010, Vol. 26, No. 4, 287-292    
    5. Chao-Hui Shen, Guo-Xin Zhang, Yu-Kun Lai, Shi-Min Hu, Harmonic Field Based Volume Model Construction from Triangle Soup, Journal of Computer Science and Technology, 2010, Vol. 25, No. 3, 562-571    
    6. Jin Wei and Yu Lou, Feature Preserving Mesh Simplification Using Feature Sensitive Metric, Journal of Computer Science and Technology, 2010, Vol. 25, No. 3, 595-605    
    7. Yu-Kun Lai, Leif Kobbelt and Shi-Min Hu, Feature aligned quad dominant remeshing using iterative local updates, Computer Aided Design, 2010, Vol. 42, No. 2, 109-117
    (An earlier version was presented at the ACM Symposium on Solid and Physical Modeling, June 2-4, 2008)




    2009





    Sketch2Photo: Internet Image Montage
    ACM Transactions on Graphics, Vol. 28, No. 5, Article No. 124, ACM SIGGRAPH ASIA 2009   
    Tao Chen, Ming-Ming Cheng, Ping Tan, Ariel Shamir, Shi-Min Hu
    The paper was selected as one of the top 10 most innovative and promising worldwide initiatives of 2009 by the Netexplorateur jury.

    We present a system that composes a realistic picture from a user provided sketch with text labels. The composed picture is generated by seamlessly stitching several photographs automatically searched from the Internet according to the sketch and its text labels. While online image search generates noisy results, our system can automatically select suitable photographs to generate a high quality composition. To achieve this, we first design a filtering scheme to exclude undesirable images from searched results. Then we propose a novel image blending algorithm for seamless image composition. Our blending algorithm returns a numeric score for each blending, which is used to optimize the combination of searched images. Several vivid results are generated in the experiments. We also perform a user study to demonstrate the advantages of our system.




    Efficient Affinity-based Edit Propagation using K-D Tree
    ACM Transactions on Graphics, Vol. 28, No. 5, Article No. 118, ACM SIGGRAPH ASIA 2009    
    Kun Xu, Yong Li, Tao Ju, Shi-Min Hu, Tian-Qiang Liu

    Image/video editing by strokes has become increasingly popular due to the ease of interaction. Propagating the user inputs to the rest of the image/video, however, is often time and memory consuming, especially for large data. We propose here an efficient scheme that allows affinity-based edit propagation to be computed on data containing tens of millions of pixels at interactive rate (in a matter of seconds). The key in our scheme is a novel means for approximately solving the optimization problem involved in edit propagation, using adaptive clustering in a high-dimensional affinity space. Our approximation significantly reduces the cost of existing affinity-based propagation methods while maintaining visual fidelity, and enables interactive stroke-based editing even on high resolution images and long video sequences using commodity computers.




    Simulating Gaseous Fluids with Low and High Speeds
    Computer Graphics Forum, Special issue of Pacific Graphics 2009, Vol. 28, No. 7, 1845-1852    
    Yue Gao, Chen-Feng Li, Shi-Min Hu, Brian A. Barsky

    Gaseous fluids may move slowly, as smoke does, or at high speed, such as occurs with explosions. High-speed gas flow is always accompanied by low-speed gas flow, which produces rich visual details in the fluid motion. Realistic visualization involves a complex dynamic flow field with both low and high speed fluid behavior. In computer graphics, algorithms to simulate gaseous fluids address either the low speed case or the high speed case, but no algorithm handles both efficiently. With the aim of providing visually pleasing results, we present a hybrid algorithm that efficiently captures the essential physics of both low- and high-speed gaseous fluids. We model the low speed gaseous fluids by a grid approach and use a particle approach for the high speed gaseous fluids. In addition, we propose a physically sound method to connect the particle model to the grid model. By exploiting complementary strengths and avoiding weaknesses of the grid and particle approaches, we produce some animation examples and analyze their computational performance to demonstrate the effectiveness of the new hybrid method.




    Edit Propagation on Bidirectional Texture Functions
    Computer Graphics Forum, Special issue of Pacific Graphics 2009, Vol. 28, No. 7, 1871-1877    
    Kun Xu, Jiaping Wang, Xin Tong, Shi-Min Hu, Baining Guo

    We propose an efficient method for editing bidirectional texture functions (BTFs) based on an edit propagation scheme. In our approach, users specify sparse edits on a certain slice of the BTF. An edit propagation scheme is then applied to propagate edits to the whole BTF data. The consistency of the BTF data is maintained by propagating similar edits to points with similar underlying geometry/reflectance. For this purpose, we propose to use view independent features including normals and reflectance features reconstructed from each view to guide the propagation process. We also propose an adaptive sampling scheme for speeding up the propagation process. Since our method does not need any accurate geometry and reflectance information, it allows users to edit complex BTFs with interactive feedback.




    A Shape-Preserving Approach to Image Resizing
    Computer Graphics Forum, Special issue of Pacific Graphics 2009, Vol. 28, No. 7, 1897-1906  
    Guo-Xin Zhang, Ming-Ming Cheng, Shi-Min Hu, Ralph R. Martin

    We present a novel image resizing method which attempts to ensure that important local regions undergo a geometric similarity transformation, and at the same time, to preserve image edge structure. To accomplish this, we define handles to describe both local regions and image edges, and assign a weight for each handle based on an importance map for the source image. Inspired by conformal energy, which is widely used in geometry processing, we construct a novel quadratic distortion energy to measure the shape distortion for each handle. The resizing result is obtained by minimizing the weighted sum of the quadratic distortion energies of all handles. Compared to previous methods, our method allows distortion to be diffused better in all directions, and important image edges are well-preserved. The method is efficient, and offers a closed form solution.




    Generalized Discrete Ricci Flow
    Computer Graphics Forum, Special issue of Pacific Graphics 2009, Vol. 28, No. 7, 2005-2014    
    Yong-Liang Yang, Ren Guo, Feng Luo, Shi-Min Hu, Xianfeng Gu

    Surface Ricci flow is a powerful tool to design Riemannian metrics by user defined curvatures. Discrete surface Ricci flow has been broadly applied for surface parameterization, shape analysis, and computational topology. Conventional discrete Ricci flow has limitations. For meshes with low quality triangulations, if high conformality is required, the flow may get stuck at a local optimum of the Ricci energy. If convergence to the global optimum is enforced, the conformality may be sacrificed. This work introduces a novel method to generalize the traditional discrete Ricci flow. The generalized Ricci flow is more flexible, more robust and conformal for meshes with low quality triangulations. The conventional method is based on circle packing, which requires the two circles on an edge to intersect each other at an acute angle. The generalized method allows the two circles either to intersect or to be separated from each other. This greatly improves the flexibility and robustness of the method. Furthermore, the generalized Ricci flow preserves the convexity of the Ricci energy, which ensures the uniqueness of the global optimum. Therefore the algorithm will not get stuck at a local optimum. Generalized discrete Ricci flow algorithms are explained in detail for triangle meshes with both Euclidean and hyperbolic background geometries. Their advantages are demonstrated by theoretical proofs and practical applications in graphics, especially surface parameterization.




    Automatic and Topology-Preserving Gradient Mesh Generation for Image Vectorization
    ACM Transactions on Graphics, Vol. 28, No. 3, article 85, ACM SIGGRAPH 2009    
    Yu-Kun Lai, Shi-Min Hu, Ralph R. Martin

    Gradient mesh vector graphics representation, used in commercial software, is a regular grid with specified position and color, and their gradients, at each grid point. Gradient meshes can compactly represent smoothly changing data, and are typically used for single objects. This paper advances the state of the art for gradient meshes in several significant ways. Firstly, we introduce a topology-preserving gradient mesh representation which allows an arbitrary number of holes. This is important, as objects in images often have holes, either due to occlusion, or their 3D structure. Secondly, our algorithm uses the concept of image manifolds, adapting surface parameterization and fitting techniques to generate the gradient mesh in a fully automatic manner. Existing gradient-mesh algorithms require manual interaction to guide grid construction, and to cut objects with holes into disk-like regions. Our new algorithm is empirically at least 10 times faster than previous approaches. Furthermore, image segmentation can be used with our new algorithm to provide automatic gradient mesh generation for a whole image. Finally, fitting errors can be simply controlled to balance quality with storage.




    Vectorizing Cartoon Animations
    IEEE Transactions on Visualization and Computer Graphics, 2009, Vol. 15, No. 4, May/June, 618-629  
    Song-hai Zhang, Tao Chen, Yi-Fei Zhang, Shi-Min Hu, Ralph R. Martin

    We present a system for vectorizing 2D raster format cartoon animations. The output animations are visually flicker free, smaller in file size, and easy to edit. We identify decorative lines separately from coloured regions. We use an accurate and semantically meaningful image decomposition algorithm which supports an arbitrary color model for each region. To ensure temporal coherence in the output cartoon, we reconstruct a universal background for all frames, and separately extract foreground regions. Simple user-assistance is required to complete the background. Each region and decorative line is vectorized and stored together with their motions from frame to frame.




    A new watermarking method for 3D model based on integral invariant
    IEEE Transactions on Visualization and Computer Graphics, 2009, Vol. 15, No. 2, March/April, 285-294   
    Yu-Ping Wang and Shi-Min Hu

    In this report, we propose a new semi-fragile watermarking algorithm for the authentication of 3D models based on integral invariants. To do so, we embed a watermark image by modifying the integral invariants of some of the vertices. Basically, we shift a vertex and its neighbors in order to change the integral invariants. To extract the watermark, we test all the vertices for the embedded information, and combine the results to recover the watermark image. The number of parts of the watermark image that can be recovered helps us make the authentication decision. Experimental tests show that this method is robust against rigid transforms and noise attacks, and useful for detecting purposeful attacks as distinct from transmission noise and geometric transformation noise. An additional contribution of this paper is a new algorithm for computing two kinds of integral invariants.



    Other publications in 2009

    1. Yong-Jin Liu, Yu-Kun Lai and Shi-Min Hu, Stripification of Free-Form Surfaces with Global Error Bounds for Developable Approximation, IEEE Transactions on Automation Science and Engineering, 2009, Vol. 6, No. 4, 700-709    
    2. Yu-Kun Lai, Shi-Min Hu, Ralph R. Martin and Paul L. Rosin, Rapid and Effective Segmentation of 3D Models using Random Walks, Computer Aided Geometric Design, 2009, Vol. 26, No. 6, 665-679.  
    (An earlier version was presented at the ACM Symposium on Solid and Physical Modeling, June 2-4, 2008)
    3. Song-hai Zhang, Tao Chen, Yi-Fei Zhang, Shi-Min Hu, Ralph R. Martin, Video-Based Running Water Animation in Chinese Painting Style, Science in China Series F: Information Sciences, 2009, Vol. 52, No. 2, 162-171    
    4. Zhe Bian, Shi-Min Hu and Ralph R Martin, Evaluation for Small Visual Difference Between Conforming Meshes on Strain Field, Journal of Computer Science and Technology, 2009, Vol. 24, No. 1, 65-75    
    (A preliminary version of this work was presented at GMP 2008.)

    2008





    Optimal Surface Parameterization Using Inverse Curvature Map
    IEEE Transactions on Visualization and Computer Graphics, 2008, Vol. 14, No. 5, September/October, 1054-1066.   
    Yong-Liang Yang, Junho Kim, Feng Luo, Shi-Min Hu, and Xianfeng Gu

    Mesh parameterization is a fundamental technique in computer graphics. The major goals during mesh parameterization are to minimize both the angle distortion and the area distortion. Angle distortion can be eliminated by use of conformal mapping, in principle. Our paper focuses on solving the problem of finding the best discrete conformal mapping that also minimizes area distortion. Major theoretical results and practical algorithms are presented for optimal parameterization based on the inverse curvature map. Comparisons are conducted with existing methods and using different energies. Novel parameterization applications are also introduced. The theoretical framework of the inverse curvature map can be applied to further study discrete conformal mappings.




    Shrinkability Maps for Content-Aware Video Resizing
    Computer Graphics Forum, Special issue of Pacific Graphics 2008, Vol. 27, No. 7, 1797-1804.
    Yi-Fei Zhang, Shi-Min Hu, Ralph R. Martin

    A novel method is given for content-aware video resizing, i.e. targeting video to a new resolution (which may involve aspect ratio change) from the original. We precompute a per-pixel cumulative shrinkability map which takes into account both the importance of each pixel and the need for continuity in the resized result. (If both x and y resizing are required, two separate shrinkability maps are used, otherwise one suffices.) A random walk model is used for efficient offline computation of the shrinkability maps. The latter are stored with the video to create a multi-sized video, which permits arbitrary-sized new versions of the video to be later very efficiently created in real-time, e.g. by a video-on-demand server supplying video streams to multiple devices with different resolutions. These shrinkability maps are highly compressible, so the resulting multi-sized videos are typically less than three times the size of the original compressed video. A scaling function operates on the multi-sized video, to give the new pixel locations in the result, giving a high-quality content-aware resized video.



    Shape Deformation using a Skeleton to Drive Simplex Transformations
    IEEE Transactions on Visualization and Computer Graphics, 2008, Vol. 14, No. 3, May/June, 693-706  
    Han-Bing Yan, Shi-Min Hu, Ralph R Martin, and Yong-Liang Yang
    (A preliminary version of this work was presented at CGI 2006.)

    This paper presents a skeleton-based method for deforming meshes (the skeleton need not be the medial axis). The significant difference from previous skeleton-based methods is that the latter use the skeleton to control movement of vertices whereas we use it to control the simplices defining the model. By doing so, errors that occur near joints in other methods can be spread over the whole mesh, via an optimization process, resulting in smooth transitions near joints of the skeleton. By controlling simplices, our method has the additional advantage that no vertex weights need be defined on the bones, which is a tedious requirement in previous skeleton-based methods. Furthermore, by incorporating the translation vector in our optimisation, unlike other methods, we do not need to fix an arbitrary vertex, and the deformed mesh moves with the deformed skeleton. Our method can also easily be used to control deformation by moving a few chosen line segments, rather than a skeleton.




    Spherical Piecewise Constant Basis Functions for All-Frequency Precomputed Radiance Transfer
    IEEE Transactions on Visualization and Computer Graphics, 2008, Vol. 14, No. 2, March/April, 454-467  
    Kun Xu, Yun-Tao Jia, Hongbo Fu, Shi-Min Hu and Chiew-Lan Tai

    This paper presents a novel basis function, called spherical piecewise constant basis function (SPCBF), for precomputed radiance transfer. SPCBFs have several desirable properties: rotatability, ability to represent all-frequency signals, and support for efficient multiple product. By partitioning the illumination sphere into a set of subregions, and associating each subregion with an SPCBF valued 1 inside the region and 0 elsewhere, we precompute the light coefficients using the resulting SPCBFs. At run-time, we approximate BRDF and visibility coefficients with the same set of SPCBFs through fast lookup of a summed-area table (SAT) and a visibility distance table (VDT), respectively. SPCBFs enable new effects such as object rotation in all-frequency rendering of dynamic scenes and on-the-fly BRDF editing under rotating environment lighting. With graphics hardware acceleration, our method achieves real-time frame rates.
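
    As background on the summed-area table mentioned above, the following minimal sketch shows how such a table returns the sum of a function over any axis-aligned rectangle in constant time. The 2D grid layout and the names build_sat and region_sum are assumptions made for illustration, not the paper's data structures.

```python
def build_sat(grid):
    """Build a summed-area table with one extra row and column of zeros."""
    h, w = len(grid), len(grid[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            # Inclusion-exclusion over the three already-computed neighbours.
            sat[y + 1][x + 1] = grid[y][x] + sat[y][x + 1] + sat[y + 1][x] - sat[y][x]
    return sat

def region_sum(sat, y0, x0, y1, x1):
    """Sum of grid[y][x] for y0 <= y < y1 and x0 <= x < x1, using four lookups."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]

# Hypothetical sampled function values on a 3x3 grid.
grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
sat = build_sat(grid)
```

    Because each query needs only four table lookups, the cost of integrating over a region does not depend on the region's size.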

    Video: Download video here (13.0MB).



    Other publications in 2008

    1. Yu-Kun Lai, Yong-Jin Liu, Yu Zang and Shi-Min Hu, Fairing Wireframes in Industrial Design, IEEE International Conference on Shape Modeling and Applications, June 4-6, 2008, 29-35.  
    2. Yong-Jin Liu, Matthew Ming-Fai Yuen, Geometry-optimized virtual human head and its applications, Computers & Graphics, 2008, Vol. 32, No. 6, 624-631    


    2007





    Editing The Topology of 3D Models by Sketching
    ACM Transactions on Graphics, Vol. 26, No. 3, Article 42, ACM SIGGRAPH 2007
    Tao Ju, Qian-Yi Zhou and Shi-Min Hu

    We present a method for modifying the topology of a 3D model with user control. The heart of our method is a guided topology editing algorithm. Given a source model and a user-provided target shape, the algorithm modifies the source so that the resulting model is topologically consistent with the target. Our algorithm permits removing or adding various topological features (e.g., handles, cavities and islands) in a common framework and ensures that each topological change is made by minimal modification to the source model. To create the target shape, we have also designed a convenient 2D sketching interface for drawing 3D line skeletons. As demonstrated in a suite of examples, the use of sketching allows more accurate removal of topological artifacts than previous methods, and enables creative designs with specific topological goals.

    Video: Download video here (31.8MB).

    Software: MendIT, a software tool based on this paper, is coming soon; please refer to the webpage: http://graphics.usc.edu/~qianyizh/software.html




    Real-time homogeneous translucent material editing
    EuroGraphics 2007, Computer Graphics Forum, Vol. 26, No. 3, 545–552.
    Kun Xu, Yue Gao, Yong Li, Tao Ju and Shi-Min Hu

    This paper presents a novel method for real-time homogeneous translucent material editing under fixed illumination. We consider the complete analytic BSSRDF model proposed by Jensen et al. [JMLH01], including both multiple scattering and single scattering. Our method allows the user to adjust the analytic parameters of BSSRDF and provides high-quality, real-time rendering feedback. Inspired by recently developed Precomputed Radiance Transfer (PRT) techniques, we approximate both the multiple scattering diffuse reflectance function and the single scattering exponential attenuation function in the analytic model using basis functions, so that re-computing the outgoing radiance at each vertex as parameters change reduces to simple dot products. In addition, using a non-uniform piecewise polynomial basis, we are able to achieve smaller approximation error than using bases adopted in previous PRT-based works, such as spherical harmonics and wavelets. Using hardware acceleration, we demonstrate that our system generates images comparable to [JMLH01] at real-time frame-rates.

    Video: Download video here (17.3MB).




    Topology Repair of Solid Models Using Skeletons
    IEEE Transactions on Visualization and Computer Graphics, 2007, Vol. 13, No. 4, 675-685.
    Qian-Yi Zhou, Tao Ju and Shi-Min Hu

    We present a method for repairing topological errors on solid models in the form of small surface handles, which often arise from surface reconstruction algorithms. We utilize a skeleton representation that offers a new mechanism for identifying and measuring handles. Our method presents two unique advantages over previous approaches. First, handle removal is guaranteed not to introduce invalid geometry or additional handles. Second, by using an adaptive grid structure, our method is capable of processing huge models efficiently at high resolutions.

    Slides: Download slides here (24.8MB). A poster for this paper is also available, download poster here (7.5MB).

    Software: TopoMender, a software tool based on this paper, is now available; please refer to the webpage: http://graphics.usc.edu/~qianyizh/software.html




    Robust Feature Classification and Editing
    IEEE Transactions on Visualization and Computer Graphics, 2007, Vol. 13, No. 1, January/February, 34-45.
    Yu-Kun Lai, Qian-Yi Zhou, Shi-Min Hu, Johannes Wallner and Helmut Pottmann

    Sharp edges, ridges, valleys and prongs are critical for the appearance and an accurate representation of a 3D model. In this paper, we propose a novel approach that deals with the global shape of features in a robust way. Based on a remeshing algorithm which delivers an isotropic mesh in a feature sensitive metric, features are recognized on multiple scales via integral invariants of local neighborhoods. Morphological and smoothing operations are then used for feature region extraction and classification into basic types such as ridges, valleys and prongs. The resulting representation of feature regions is further used for feature-specific editing operations.



    Other publications in 2007

    1. Jean-Baptiste Debard (Yang Fei), Romain Balp (Bai Luomin), Raphaelle Chaine, Dynamic Delaunay tetrahedralisation of a deforming surface, The Visual Computer, 2007, Vol. 23, No. 12, 975-986
    2. Yong-Jin Liu, Qian-Yi Zhou and Shi-Min Hu, Handling Degenerate Cases in Exact Geodesic Computation on Triangle Meshes, The Visual Computer, 2007, Vol. 23, No. 9-11, 661-668.
    3. Yong-Jin Liu, Kai Tang, Ajay Joneja, Modeling dynamic developable meshes by the Hamilton principle, Computer-Aided Design, 2007, Vol. 39, No. 9, 719-731.
    4. Han-Bing Yan, Shi-Min Hu, Ralph R Martin, 3D morphing using strain field interpolation, Journal of Computer Science and Technology, 2007, Vol. 22, No. 1, 147-155.


    2006




    Geometry and Convergence Analysis of Algorithms for Registration of 3D Shapes
    International Journal of Computer Vision, 2006, Vol. 67, No. 3, 277-296.
    Helmut Pottmann, Qi-Xing Huang, Yong-Liang Yang and Shi-Min Hu

    The computation of a rigid body transformation which optimally aligns a set of measurement points with a surface and related registration problems are studied from the viewpoint of geometry and optimization. We provide a convergence analysis for widely used registration algorithms such as ICP, using either closest points (Besl and McKay [2]) or tangent planes at closest points (Chen and Medioni [4]), and for a recently developed approach based on quadratic approximants of the squared distance function [24]. ICP based on closest points exhibits local linear convergence only. Its counterpart which minimizes squared distances to the tangent planes at closest points is a Gauss-Newton iteration; it achieves local quadratic convergence for a zero residual problem and, if enhanced by regularization and step size control, comes close to quadratic convergence in many realistic scenarios. Quadratically convergent algorithms are based on the approach in [24]. The theoretical results are supported by a number of experiments; there, we also compare the algorithms with respect to global convergence behavior, stability and running time.
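
    As a rough illustration of the closest-point variant of ICP mentioned in the abstract above, the following is a minimal 2D sketch in plain Python. It is not the authors' implementation: the function names (`best_rigid_2d`, `apply_rigid`, `icp_point_to_point`, `make_curve`), the brute-force nearest-neighbour search and the test curve are all assumptions made for this example. It records the RMS closest-point distance at every iteration so that the gradual decrease of the error can be inspected.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src points onto dst."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(q[0] for q in dst) / n
    cdy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - csx, py - csy          # centred source point
        bx, by = qx - cdx, qy - cdy          # centred target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)           # optimal rotation angle in closed form
    c, s = math.cos(theta), math.sin(theta)
    tx = cdx - (c * csx - s * csy)           # translation maps rotated source centroid to target centroid
    ty = cdy - (s * csx + c * csy)
    return theta, tx, ty

def apply_rigid(points, theta, tx, ty):
    """Rotate by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def closest_point(p, targets):
    """Brute-force nearest neighbour (fine for a toy example only)."""
    return min(targets, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def rms(points, matches):
    """Root-mean-square distance between paired points."""
    total = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for p, q in zip(points, matches))
    return math.sqrt(total / len(points))

def icp_point_to_point(source, target, iterations=30):
    """Closest-point ICP: match, solve for the rigid motion, apply, repeat."""
    current = list(source)
    errors = []
    for _ in range(iterations):
        matches = [closest_point(p, target) for p in current]
        errors.append(rms(current, matches))  # error before this iteration's update
        theta, tx, ty = best_rigid_2d(current, matches)
        current = apply_rigid(current, theta, tx, ty)
    return current, errors

def make_curve(n):
    """Hypothetical asymmetric closed curve used as the target shape."""
    pts = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        pts.append((2.0 * math.cos(t) + 0.3 * math.cos(3 * t),
                    math.sin(t) + 0.2 * math.sin(2 * t)))
    return pts

if __name__ == "__main__":
    target = make_curve(40)
    source = apply_rigid(target, 0.1, 0.05, -0.03)   # slightly misaligned copy
    _, errors = icp_point_to_point(source, target, iterations=10)
    for k, e in enumerate(errors):
        print(k, e)
```

    The recorded error sequence never increases, because both the matching step and the rigid-motion step can only reduce the summed squared distance. The tangent-plane variant and the approach based on quadratic approximants discussed in the abstract are not shown here.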




    Robust Principal Curvatures on Multiple Scales
    Proceedings of 4th Eurographics Symposium on Geometry Processing (2006). Eurographics Association, 223-226.
    Yong-Liang Yang, Yu-Kun Lai, Shi-Min Hu and Helmut Pottmann

    Geometry processing algorithms often require the robust extraction of curvature information. We propose to achieve this with principal component analysis (PCA) of local neighborhoods, defined via spherical kernels centered on the given surface $\Phi$. Intersection of a kernel ball $B_r$ or its boundary sphere $S_r$ with the volume bounded by $\Phi$ leads to the so-called ball and sphere neighborhoods. Information obtained by PCA of these neighborhoods turns out to be more robust than PCA of the patch neighborhood $B_r \cap \Phi$ previously used. The relation of the quantities computed by PCA with the principal curvatures of $\Phi$ is revealed by an asymptotic analysis as the kernel radius $r$ tends to zero. This also allows us to define principal curvatures "at scale $r$" in a way which is consistent with the classical setting. The advantages of the new approach are discussed in a comparison with results obtained by normal cycles and local fitting; whereas the former method somewhat lacks in robustness, the latter does not achieve a consistent behavior at features on coarse scales. As to applications, we address computing principal curves and feature extraction on multiple scales.



    Other publications in 2006

    1. Qi-Xing Huang, Simon Flory, Natasha Gelfand, Michael Hofer and Helmut Pottmann, Reassembling Fractured Objects by Geometric Matching, ACM Transactions on Graphics, Vol. 25, No. 3, 569-578, ACM SIGGRAPH 2006
    2. Yang Liu, Helmut Pottmann, Johannes Wallner, Yong-Liang Yang and Wenping Wang, Geometric Modeling with Conical Meshes and Developable Surfaces, ACM Transactions on Graphics, Vol. 25, No. 3, 681-689, ACM SIGGRAPH 2006
    3. Yu-Kun Lai, Shi-Min Hu and Helmut Pottmann, Surface Fitting Based on a Feature Sensitive Parameterization, Computer-Aided Design, 2006, Vol. 38, No. 7, 800-807.
    4. Li Jin, Donguk Kim, Lisen Mu, Deok-Soo Kim and Shi-Min Hu, A Sweepline Algorithm for Euclidean Voronoi Diagram of Circles, Computer-Aided Design, 2006, Vol. 38, No. 3, 260-278.
    5. Yu-Kun Lai, Shi-Min Hu and Ralph R. Martin, Surface Mosaics, The Visual Computer, 2006, Vol. 22, No. 9-10, 604-611 (Pacific Graphics 2006).
    6. Jiaping Wang, Kun Xu, Kun Zhou, Stephen Lin, Shi-Min Hu and Baining Guo, Spherical Harmonics Scaling, The Visual Computer, 2006, Vol. 22, No. 9-10, 713-720 (Pacific Graphics 2006).
    7. Yu-Kun Lai, Qian-Yi Zhou, Shi-Min Hu and Ralph R. Martin, Feature Sensitive Mesh Segmentation, ACM Symp. Solid and Physical Modeling, 7-16, 2006.
    8. Xiao-Hua Cai, Yun-Tao Jia, Xi Wang, Shi-Min Hu and Ralph R. Martin, Rendering Soft Shadows using Multilayered Shadow Fins, Computer Graphics Forum, 2006, Vol. 25, No. 1, 1-14.


    2005




    Video Completion using Tracking and Fragment Merging
    The Visual Computer, 2005, Vol. 21, No. 8-10, 601-610. (Pacific Graphics 2005)
    Yun-Tao Jia, Shi-Min Hu and Ralph R. Martin

    Video completion is the problem of automatically filling space-time holes in video sequences left by the removal of unwanted objects in a scene. We solve it using texture synthesis, filling a hole inwards using three steps iteratively: we select the most promising target pixel at the edge of the hole, we find the source fragment most similar to the known part of the target neighborhood, and we merge source and target fragments to complete the target neighborhood, reducing the size of the hole. Earlier methods were slow, due to searching the whole video data for source fragments or completing holes pixel by pixel; they also produced blurred results due to sampling and smoothing. For speed, we track moving objects, allowing us to use a much smaller search space when seeking source fragments; we also complete holes fragment by fragment instead of pixelwise. Fine details are maintained by use of a graph cut algorithm when merging source and target fragments. Further techniques ensure temporal consistency of hole filling over successive frames. Examples demonstrate the effectiveness of our method.



    Other publications in 2005

    1. Xu-Ping Zhu, Shi-Min Hu, Chiew-Lan Tai, and Ralph R. Martin, A Marching Method for Computing Intersection Curves of Two Subdivision Solids, in Mathematics of Surfaces XI, Eds. R. R. Martin, H. Bez, M. A. Sabin, 458-471, 2005.
    2. Johannes Wallner, Hans-Peter Schrocker, Shi-min Hu, Tolerance in geometric constraints solving, Reliable Computing, 2005, Vol. 11, No. 3, 235-251.
    3. Yu-Kun Lai, Shi-Min Hu, Xianfeng Gu, Ralph R. Martin, Geometric texture synthesis and transfer via geometry images, ACM Solid and Physical Modeling, MIT, USA, June 13-15, 2005, 15-26.
    4. Shi-Min Hu, Johannes Wallner, A second order algorithm for orthogonal projection onto curves and surfaces, Computer Aided Geometric Design, 2005, Vol. 22, No. 3, 251-260.
    5. Qi-Xing Huang, Shi-Min Hu and Ralph R. Martin, Fast degree elevation and knot insertion for B-spline curves, Computer Aided Geometric Design, 2005, Vol. 22, No. 2, 183-197.


    2004




    Generalized Displacement Maps
    Proceedings of Eurographics Symposium on Rendering, 2004
    Xi Wang, Xin Tong, Stephen Lin, Shimin Hu, Baining Guo and Heung-Yeung Shum

    In this paper, we introduce a real-time algorithm to render the rich visual effects of general non-height-field geometric details, known as mesostructure. Our method is based on a five-dimensional generalized displacement map (GDM) that represents the distance of solid mesostructure along any ray cast from any point within a volumetric sample. With this GDM information, we propose a technique that computes mesostructure visibility jointly in object space and texture space, which enables both control of texture distortion and efficient computation of texture coordinates and shadowing. GDM can be rendered with either local or global illumination as a per-pixel process in graphics hardware to achieve real-time rendering of general mesostructure.


    Other publications in 2004

    1. Han-Bing Yan, Shi-Min Hu, Ralph R. Martin, Morphing Based on Strain Field Interpolation, Journal of Computer Animation and Virtual Worlds (CAVW), 2004, Vol. 15, No. 3-4, 443-452.
    2. Shi-Min Hu, Johannes Wallner, Error Propagation through Geometric Transformations, Journal for Geometry and Graphics, 2004, Vol. 8, No. 2, 171-183.
    3. Shi-Min Hu, Chen-Feng Li, Hui Zhang, Actual Morphing: A physical-based approach for blending two 2D/3D shapes, ACM Symposium on Solid Modeling and Applications, Genova, Italy, June 9-11, 2004.


    2003




    View-Dependent Displacement Mapping
    ACM Transactions on Graphics, Vol. 22, No. 3, 334-339, ACM SIGGRAPH 2003
    Lifeng Wang, Xi Wang, Xin Tong, Steve Lin, Shimin Hu, Baining Guo and Heung-Yeung Shum

    Significant visual effects arise from surface mesostructure, such as fine-scale shadowing, occlusion and silhouettes. To efficiently render its detailed appearance, we introduce a technique called view-dependent displacement mapping (VDM) that models surface displacements along the viewing direction. Unlike traditional displacement mapping, VDM allows for efficient rendering of self-shadows, occlusions and silhouettes without increasing the complexity of the underlying surface mesh. VDM is based on per-pixel processing, and with hardware acceleration it can render mesostructure with rich visual appearance in real time.


    Other publications in 2003

    1. Xu-Ping Zhu, Shi-Min Hu and Ralph R. Martin, Skeleton-Based Seam Computation for Triangulated Surface Parameterization, In Proceedings of Mathematics of Surfaces X, Sept 2003, Leeds, UK; Lecture Notes in Computer Science, 2003.
    2. Chiew-Lan Tai, Shi-Min Hu and Qi-Xing Huang, Approximate merging of B-Spline curves via knot adjustment and constrained optimization, Computer Aided Design, 2003, Vol. 35, No. 10, 893-899.
    3. Xi Wang, Lifeng Wang, Ligang Liu, Shi-Min Hu and Baining Guo, Interactive Modeling of Tree Bark, In: Proceedings of Pacific Graphics 2003, IEEE CS Press, Oct 8-10, 2003
    4. Tao Wang, Yong Rui, Shi-Min Hu and Jia-guang Sun, Adaptive tree similarity learning for image retrieval, Multimedia Systems, 2003, Vol. 9, 131-143.


    2002


    1. Shi-Min Hu, Chiew-Lan Tai, Song-Hai Zhang, An Extension algorithm for B-spline curves by curve unclamping, Computer Aided Design, 2002, Vol. 34, No. 5, 415-419.
    2. Yan-Tao Li, Shi-Min Hu and Jia-Guang Sun, A Constructive Approach to Solving 3-D Geometric Constraint Systems Using Dependence Analysis, Computer Aided Design, 2002, Vol. 34, No. 2, 97-108.
    3. Liu Shi-Xia, Hu Shi-Min, Sun Jiaguang, Two accelerating techniques for 3D reconstruction, Journal of Computer Science and Technology, 17(3), 362-368, 2002.



    2001


    1. Shi-Xia Liu, Shi-Min Hu, Yu-Jian Chen, Jia-Guang Sun, Reconstruction of curved solids from engineering drawings, Computer Aided Design, 2001, Vol. 33, No. 14, 1059-1072.
    2. Shi-Min Hu, Youfu Li, Tao JU, Xiang Zhu, Modifying the shape of NURBS surfaces with geometric constraints, Computer Aided Design, 2001, Vol. 33, No. 12, 903-912.
    3. Shi-Min Hu, Conversion between triangular and rectangular Bezier patches, Computer Aided Geometric Design, 2001, Vol. 18, No. 7, 667-671. (In the special issue in memory of P. Bezier).
    4. Shi-Min Hu, Hui Zhang, Chiew-Lan Tai, Jia-Guang Sun, Direct Manipulation of FFD: Efficient Explicit Solutions and Decomposible Multiple Point Constraints, The Visual Computer, 2001, Vol. 17, No. 6, 370-379.
    5. Jun-Hai Yong, Shi-Min Hu, Jia-Guang Sun, Degree reduction of B-spline curves, Computer Aided Geometric Design, 2001, Vol. 18, No. 2, 117-127.
    6. Shi-Min Hu, Ruofeng Tong, Tao JU, Jia-Guang Sun, Approximate merging of a pair of Bezier curves, Computer Aided Design, Vol 33, No. 2, 125-136, 2001.
    7. Jun-Hai Yong, Shi-Min Hu, Jia-Guang Sun, CIM Algorithm for Approximating Three Dimensional Polygonal Curves, Journal of Computer Science and Technology, 16(6), 489-497, 2001.
    8. Tao Wang, Yong Rui and Shi-Min Hu, Optimal Adaptive Learning for Image Retrieval, Proceedings of IEEE Computer Vision and Pattern Recognition (CVPR 2001), I-1140 to 1147, Kauai, Hawaii, December 11-13, 2001.
    9. Yan-Tao Li, Shi-Min Hu, Jia-Guang Sun, On the numerical redundancies of geometric constraint systems, Proceedings of Pacific Graphics 2001, 118-123, IEEE Computer Society Press, 2001, Tokyo.
    10. Jian-Hua Wu, Shi-Min Hu, Chiew-Lan Tai and Jia-Guang Sun, An effective feature-preserving mesh simplification scheme based on face constriction, Proceedings of Pacific Graphics 2001, 12-21, IEEE Computer Society Press, 2001, Tokyo.