Kun Xu

I am an associate professor in the Department of Computer Science and Technology at Tsinghua University. I received my Ph.D. and bachelor's degrees from the Department of Computer Science and Technology, Tsinghua University, in 2009 and 2005, respectively.

My research interests include: real-time rendering, image/video editing, and 3D scene synthesis.

My contact info:

Email: xukun (at) tsinghua.edu.cn
Office: East Main Building 9-213, Tsinghua University, Beijing, P.R. China, 100084

Academic Services

  • Program Co-Chair, Pacific Graphics 2015
  • Program Committee Member, SIGGRAPH Asia Technical Papers 2015
  • Program Committee Member, SIGGRAPH Asia Technical Briefs & Posters 2014, 2015, 2016, 2017, 2018
  • Program Committee Member, Pacific Graphics 2013, 2014, 2016, 2017, 2018
  • Youth Associate Editor, Frontiers of Computer Science, 2015-2016

Honors and Awards

  • National Science Fund for Excellent Young Scholars by NSFC, 2018
  • Young Elite Scientists Sponsorship Program by CAST, 2016
  • The State Natural Science Award of China (Second Class Prize, 4th Achiever), 2015
  • CCF-Intel Young Faculty Researcher Program (YFRP), 2013
  • Outstanding Doctoral Dissertations of CCF, 2009
  • Microsoft Fellowship, 2008

Main Publications

Chinese Text in the Wild
Tai-Ling Yuan, Zhe Zhu, Kun Xu, Cheng-Jun Li, Shi-Min Hu
arXiv:1803.00085, 2018.

We introduce Chinese Text in the Wild, a very large dataset of Chinese text in street view images. While optical character recognition (OCR) in document images is well studied and many commercial tools are available, detection and recognition of text in natural images is still a challenging problem, especially for more complicated character sets such as Chinese text. Lack of training data has always been a problem, especially for deep learning methods which require massive training data. In this paper we provide details of a newly created dataset of Chinese text with about 1 million Chinese characters annotated by experts in over 30 thousand street view images. This is a challenging dataset with good diversity. It contains planar text, raised text, text in cities, text in rural areas, text under poor illumination, distant text, partially occluded text, etc. For each character in the dataset, the annotation includes its underlying character, its bounding box, and 6 attributes. The attributes indicate whether it has complex background, whether it is raised, whether it is handwritten or printed, etc. The large size and diversity of this dataset make it suitable for training robust neural networks for various tasks, particularly detection and recognition. We give baseline results using several state-of-the-art networks, including AlexNet, OverFeat, Google Inception and ResNet for character recognition, and YOLOv2 for character detection in images. Overall Google Inception has the best performance on recognition with 80.5% top-1 accuracy, while YOLOv2 achieves an mAP of 71.0% on detection. Dataset, source code and trained models will all be publicly available on the website.

[ project page ] [ paper ] [ dataset ]

Real-time High-fidelity Surface Flow Simulation
Bo Ren, Tailing Yuan, Chenfeng Li, Kun Xu, and Shi-Min Hu
IEEE Transactions on Visualization and Computer Graphics, 24(8), 2411-2423, 2018.

Surface flow phenomena, such as rain water flowing down a tree trunk and progressive water front in a shower room, are common in real life. However, compared with the 3D spatial fluid flow, these surface flow problems have been much less studied in the graphics community. To tackle this research gap, we present an efficient, robust and high-fidelity simulation approach based on the shallow-water equations. Specifically, the standard shallow-water flow model is extended to general triangle meshes with a feature-based bottom friction model, and a series of coherent mathematical formulations are derived to represent the full range of physical effects that are important for real-world surface flow phenomena. In addition, by achieving compatibility with existing 3D fluid simulators and by supporting physically realistic interactions with multiple fluids and solid surfaces, the new model is flexible and readily extensible for coupled phenomena. A wide range of simulation examples are presented to demonstrate the performance of the new approach.
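The height-field flow model at the heart of this approach can be sketched in one dimension. The Lax-Friedrichs scheme, flat bottom, periodic boundaries, and all parameter values below are illustrative choices for this sketch, not the paper's mesh-based formulation:

```python
import numpy as np

# Minimal 1D shallow-water step (Lax-Friedrichs, periodic boundaries).
# State per cell: water height h and momentum hu; flat bottom assumed.
def swe_step(h, hu, dx, dt, g=9.81):
    u = hu / np.maximum(h, 1e-8)          # velocity, guarded against dry cells
    F_h = hu                              # flux of height (mass)
    F_hu = hu * u + 0.5 * g * h * h       # flux of momentum
    def lf(U, F):
        # Lax-Friedrichs: average the neighbors, subtract the flux difference
        return 0.5 * (np.roll(U, -1) + np.roll(U, 1)) \
            - dt / (2.0 * dx) * (np.roll(F, -1) - np.roll(F, 1))
    return lf(h, F_h), lf(hu, F_hu)

# Small "dam break": a hump of water spreads outward over a flat pool.
h = np.ones(64)
h[28:36] += 0.5
hu = np.zeros(64)
for _ in range(20):
    h, hu = swe_step(h, hu, dx=0.1, dt=0.005)
```

With periodic boundaries this update conserves total water volume exactly, which is easy to verify on the dam-break example.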

[ paper ] [ video ] [ bibtex ]

Computational Design of Transforming Pop-up Books
Nan Xiao, Zhe Zhu, Ralph R. Martin, Kun Xu, Jia-Ming Lu, and Shi-Min Hu
ACM Transactions on Graphics, 37(1), 8:1--8:14, 2018. (presented at SIGGRAPH 2018)

We present the first computational tool to help ordinary users create transforming pop-up books. In each transforming pop-up, when the user pulls a tab, an initial flat 2D pattern, i.e. a 2D shape with a superimposed picture, such as an airplane, turns into a new 2D pattern, such as a robot, standing up from the page. Given the two 2D patterns, our approach automatically computes a 3D pop-up mechanism that transforms one pattern into the other; it also outputs a design blueprint, allowing the user to easily make the final model. We also present a theoretical analysis of basic transformation mechanisms; combining these basic mechanisms allows more flexibility of final designs. Using our approach, inexperienced users can create models in a short time; previously, even experienced artists often took weeks to manually create them. We demonstrate our method on a variety of real world examples.

[ paper ] [ video ] [ bibtex ]

A survey of image synthesis and editing with generative adversarial networks
Xian Wu, Kun Xu, and Peter Hall
Tsinghua Science and Technology, 22(6), 660-674, 2017.

This paper presents a survey of image synthesis and editing with Generative Adversarial Networks (GANs). GANs consist of two deep networks, a generator and a discriminator, which are trained in a competitive way. Due to the power of deep networks and the competitive training manner, GANs are capable of producing reasonable and realistic images, and have shown great capability in many image synthesis and editing applications. This paper surveys recent GAN papers regarding topics including, but not limited to, texture synthesis, image inpainting, image-to-image translation, and image editing.
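As a one-line refresher, the competitive training the survey builds on is Goodfellow et al.'s two-player minimax game:

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
+ \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

where the discriminator $D$ is trained to tell real samples from generated ones, and the generator $G$ is trained to fool it.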

[ paper ] [ bibtex ]

Static Scene Illumination Estimation from Video with Applications
Bin Liu, Kun Xu, and Ralph R. Martin
Journal of Computer Science and Technology, 32(3), 430-442, 2017.

We present a system that automatically recovers scene geometry and illumination from a video, providing a basis for various applications. Previous image based illumination estimation methods either require user interaction or external information in the form of a database. We adopt structure-from-motion and multi-view stereo for initial scene reconstruction, and then estimate an environment map represented by spherical harmonics (as these perform better than other bases). We also demonstrate several video editing applications that exploit the recovered geometry and illumination, including object insertion (e.g. for augmented reality), shadow detection, and video relighting.

[ paper ] [ bibtex ] [ videos ]

Efficient, Edge-Aware, Combined Color Quantization and Dithering
Hao-Zhi Huang, Kun Xu, Ralph R. Martin, Fei-Yue Huang, and Shi-Min Hu
IEEE Transactions on Image Processing, 25(3), 1152-1162, 2016.

In this paper we present a novel algorithm to simultaneously accomplish color quantization and dithering of images. This is achieved by minimizing a perception-based cost function which considers pixel-wise differences between filtered versions of the quantized image and the input image. We use edge-aware filters in defining the cost function to avoid mixing colors on opposite sides of an edge. The importance of each pixel is weighted according to its saliency. To rapidly minimize the cost function, we use a modified multi-scale iterative conditional mode (ICM) algorithm which updates one pixel at a time while keeping other pixels unchanged. As ICM is a local method, careful initialization is required to prevent termination at a local minimum far from the global one. To address this problem, we initialize ICM with a palette generated by a modified median-cut method. Compared to previous approaches, our method not only produces high-quality results with fewer visual artifacts but also requires significantly less computational effort.
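The ICM loop can be illustrated with a simplified 1D sketch: update one pixel's palette label at a time, holding the others fixed, to minimize a cost between filtered versions of the quantized and input signals. A plain box filter stands in for the paper's edge-aware filters, and saliency weighting is omitted; all names here are illustrative.

```python
import numpy as np

def box_filter(x, r=1):
    # crude stand-in for the paper's edge-aware filters (assumption)
    pad = np.pad(x, r, mode='edge')
    return np.convolve(pad, np.ones(2 * r + 1) / (2 * r + 1), mode='valid')

def icm_quantize(img, palette, iters=10):
    # initialize with the nearest palette entry per pixel
    labels = np.array([int(np.argmin(np.abs(palette - v))) for v in img])
    target = box_filter(img)
    for _ in range(iters):
        changed = False
        for i in range(len(img)):
            # try every palette entry for pixel i, keep the cheapest
            costs = []
            for c in range(len(palette)):
                trial = labels.copy()
                trial[i] = c
                costs.append(np.sum((box_filter(palette[trial]) - target) ** 2))
            best = int(np.argmin(costs))
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:          # ICM converged to a local minimum
            break
    return palette[labels]

out = icm_quantize(np.array([0.1, 0.1, 0.9, 0.9]), np.array([0.0, 1.0]))
```

Because each update only ever lowers the cost, the loop terminates at a local minimum, which is why the paper's careful palette initialization matters.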

[ paper ] [ bibtex ]

Faithful Completion of Images of Scenic Landmarks using Internet Images
Zhe Zhu, Hao-Zhi Huang, Zhi-Peng Tan, Kun Xu, and Shi-Min Hu
IEEE Transactions on Visualization and Computer Graphics, 22(8), 1945 - 1958, 2016.

Previous works on image completion typically aim to produce visually plausible results rather than factually correct ones. In this paper, we propose an approach to faithfully complete the missing regions of an image. We assume that the input image is taken at a well-known landmark, so similar images taken at the same location can be easily found on the Internet. We first download thousands of images from the Internet using a text label provided by the user. Next, we apply two-step filtering to reduce them to a small set of candidate images for use as source images for completion. For each candidate image, a co-matching algorithm is used to find correspondences of both points and lines between the candidate image and the input image. These are used to find an optimal warp relating the two images. A completion result is obtained by blending the warped candidate image into the missing region of the input image. The completion results are ranked according to a combined score, which considers both warping and blending energy, and the highest ranked ones are shown to the user. Experiments and results demonstrate that our method can faithfully complete images.

[ paper ] [ bibtex ]

Magic Decorator: Automatic Material Suggestion for Indoor Digital Scenes
Kang Chen, Kun Xu, Yizhou Yu, Tian-Yi Wang, Shi-Min Hu
ACM Transactions on Graphics, 34(6), 232:1 - 232:11, 2015. (Proceedings of SIGGRAPH Asia 2015)

Assigning textures and materials within 3D scenes is a tedious and labor-intensive task. In this paper, we present Magic Decorator, a system that automatically generates material suggestions for 3D indoor scenes. To achieve this goal, we introduce local material rules, which describe typical material patterns for a small group of objects or parts, and global aesthetic rules, which account for the harmony among the entire set of colors in a specific scene. Both rules are obtained from collections of indoor scene images. We cast the problem of material suggestion as a combinatorial optimization considering both local material and global aesthetic rules. We have tested our system on various complex indoor scenes. A user study indicates that our system can automatically and efficiently produce a series of visually plausible material suggestions which are comparable to those produced by artists.

[ project page ] [ paper 22M ] [ slides 85M ] [ supplemental 45M ] [ video 45M ] [ bibtex ]

A Practical Algorithm for Rendering Interreflections with All-frequency BRDFs
Kun Xu, Yan-Pei Cao, Li-Qian Ma, Zhao Dong, Rui Wang, Shi-Min Hu
ACM Transactions on Graphics, 33(1), 10:1 - 10:16, 2014. (presented at SIGGRAPH 2014)

Algorithms for rendering interreflection (or indirect illumination) effects often make assumptions about the frequency range of the materials' reflectance properties. For example, methods based on Virtual Point Lights (VPLs) perform well for diffuse and semi-glossy materials but not so for highly-glossy or specular materials; the situation is reversed for methods based on ray tracing. In this paper, we present a practical algorithm for rendering interreflection effects with all-frequency BRDFs. Our method builds upon a Spherical Gaussian representation of the BRDF, based on which a novel mathematical development of the interreflection equation is made. This allows us to efficiently compute one-bounce interreflection from a triangle to a shading point, by using an analytic formula combined with a piecewise linear approximation. We show through evaluation that this method is accurate for a wide range of BRDFs. We further introduce a hierarchical integration method to handle complex scenes (i.e., many triangles) with bounded errors. Finally, we have implemented the present algorithm on the GPU, achieving rendering performance ranging from near-interactive to a few seconds per frame for various scenes with different complexity. 
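For background, a spherical Gaussian (SG) over unit directions $v$ has the standard form below, where $p$ is the lobe axis, $\lambda$ the bandwidth, and $\mu$ the amplitude; the two standard SG identities (spherical integral and product) are what make closed-form manipulation of such lobes tractable:

```latex
G(v;\, p, \lambda, \mu) = \mu\, e^{\lambda (p \cdot v - 1)}, \qquad
\int_{S^2} G(v)\, \mathrm{d}v = \frac{2\pi\mu}{\lambda}\bigl(1 - e^{-2\lambda}\bigr)

G_1(v)\, G_2(v) =
\mu_1 \mu_2\, e^{\lambda_m - \lambda_1 - \lambda_2}\,
e^{\lambda_m (\bar p \cdot v - 1)}, \qquad
\lambda_m = \lVert \lambda_1 p_1 + \lambda_2 p_2 \rVert, \;\;
\bar p = \frac{\lambda_1 p_1 + \lambda_2 p_2}{\lambda_m}
```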

[ project page ] [ paper 1.2M ] [ slides 37M ] [ video 30.9M ] [ bibtex ]

Anisotropic Spherical Gaussians
Kun Xu, Wei-Lun Sun, Zhao Dong, Dan-Yong Zhao, Run-Dong Wu, Shi-Min Hu
ACM Transactions on Graphics 32(6), 209:1 - 209:11, 2013. (Proceedings of SIGGRAPH Asia 2013).

We present a novel anisotropic Spherical Gaussian (ASG) function, built upon the Bingham distribution [Bingham 1974], which is much more effective and efficient in representing anisotropic spherical functions than Spherical Gaussians (SGs). In addition to retaining many desired properties of SGs, ASGs are also rotationally invariant and capable of representing all-frequency signals. To further strengthen the properties of ASGs, we have derived approximate closed-form solutions for their integral, product and convolution operators, whose errors are nearly negligible, as validated by quantitative analysis. Supported by all these operators, ASGs can be adapted in existing SG-based applications to enhance their scalability in handling anisotropic effects. To demonstrate the accuracy and efficiency of ASGs in practice, we have applied ASGs in two important SG-based rendering applications and the experimental results clearly reveal the merits of ASGs. 
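For reference, an ASG evaluates a direction $v$ against an orthonormal frame $(x, y, z)$ (tangent, bi-tangent, and lobe axis) with bandwidths $\lambda, \mu > 0$; up to an amplitude factor it has the form:

```latex
\mathrm{ASG}(v;\, [x, y, z],\, [\lambda, \mu]) =
\mathrm{S}(v;\, z)\; e^{-\lambda (v \cdot x)^2 - \mu (v \cdot y)^2},
\qquad \mathrm{S}(v;\, z) = \max(v \cdot z,\, 0)
```

The smooth term $\mathrm{S}$ keeps the lobe in the hemisphere around $z$, and unequal bandwidths $\lambda \neq \mu$ give the anisotropy that an ordinary SG cannot represent.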

[ project page ] [ paper 2.3M ] [ slides 16.5M ] [ supplemental 1.7M ] [ video 45.8M ] [ bibtex ]

Inverse Image Editing: Recovering a Semantic Editing History from a Before-and-After Image Pair
Shi-Min Hu, Kun Xu, Li-Qian Ma, Bin Liu, Bi-Ye Jiang, Jue Wang
ACM Transactions on Graphics 32(6), 194:1 - 194:11, 2013. (Proceedings of SIGGRAPH Asia 2013).

We study the problem of inverse image editing, which recovers a semantically-meaningful editing history from a source image and an edited copy. Our approach supports a wide range of commonly-used editing operations such as cropping, object insertion and removal, linear and non-linear color transformations, and spatially-varying adjustment brushes. Given an input image pair, we first apply a dense correspondence method between them to match edited image regions with their sources. For each edited region, we determine geometric and semantic appearance operations that have been applied. Finally, we compute an optimal editing path from the region-level editing operations, based on predefined semantic constraints. The recovered history can be used in various applications such as image re-editing, edit transfer, and image revision control. A user study suggests that the editing histories generated from our system are semantically comparable to the ones generated by artists.

[ project page ] [ paper 8.2M ] [ slides 43.4M ] [ supplemental 13.6M ] [ bibtex ]

Change Blindness Images
Li-Qian Ma, Kun Xu, Tien-Tsin Wong, Bi-Ye Jiang, Shi-Min Hu
IEEE Transactions on Visualization and Computer Graphics 19(11), 1808-1819, 2013.

Change blindness refers to human inability to recognize large visual changes between images. In this paper, we present the first computational model of change blindness to quantify the degree of blindness between an image pair. It comprises a novel context-dependent saliency model and a measure of change, the former dependent on the site of the change, and the latter describing the amount of change. This saliency model in particular addresses the influence of background complexity, which plays an important role in the phenomenon of change blindness. Using the proposed computational model, we are able to synthesize changed images with desired degrees of blindness. User studies and comparisons to state-of-the-art saliency models demonstrate the effectiveness of our model. 

[ project page ] [ paper 3.9M ] [ slides 6.7M ] [ supplemental 3.5M ] [ video 13.2M ] [ bibtex ]

Sketch2Scene: Sketch-based Co-retrieval and Co-placement of 3D Models
Kun Xu, Kang Chen, Hongbo Fu, Wei-Lun Sun, Shi-Min Hu
ACM Transactions on Graphics 32(4), 123:1-123:12, 2013. (Proceedings of SIGGRAPH 2013).

This work presents Sketch2Scene, a framework that automatically turns a freehand sketch drawing depicting multiple scene objects into a semantically valid, well-arranged scene of 3D models. Unlike existing works on sketch-based search and composition of 3D models, which typically process individual sketched objects one by one, our technique performs co-retrieval and co-placement of relevant 3D models by jointly processing the sketched objects. This is enabled by summarizing functional and spatial relationships among models in a large collection of 3D scenes as structural groups. Our technique greatly reduces the amount of user intervention needed for sketch-based modeling of 3D scenes and fits well into the traditional production pipeline involving concept design followed by 3D modeling. A pilot study indicates that it is promising to use our technique as an alternative but more efficient tool to standard 3D modeling for 3D scene construction.

[ project page ] [ paper 6.1M ] [ pptx 33.0M ] [ supplemental 4.4M ] [ video 12.5M ] [ bibtex ]

Accurate Translucent Material Rendering under Spherical Gaussian Lights
Ling-Qi Yan, Yahan Zhou, Kun Xu, Rui Wang
Computer Graphics Forum 31(7), 2267–2276, 2012. (Proceedings of Pacific Graphics 2012)

In this paper we present a new algorithm for accurate rendering of translucent materials under Spherical Gaussian (SG) lights. Our algorithm builds upon the quantized-diffusion BSSRDF model recently introduced in [dI11]. Our main contribution is an efficient algorithm for computing the integral of the BSSRDF with an SG light. We incorporate both single and multiple scattering components. Our model improves upon previous work by accounting for the incident angle of each individual SG light. This leads to more accurate rendering results, notably elliptical profiles from oblique illumination. In contrast, most existing models only consider the total irradiance received from all lights, hence can only generate circular profiles. Experimental results show that our method is suitable for rendering of translucent materials under finite-area lights or environment lights that can be approximated by a small number of SGs.

[ paper ] [ bibtex ]

Efficient Antialiased Edit Propagation for Images and Videos
Li-Qian Ma, Kun Xu
Computers & Graphics 36(8), 1005–1012, 2012.

Edit propagation on images/videos has become more and more popular in recent years due to its simple and intuitive interaction. It propagates sparse user edits to the whole data following the policy that nearby regions with similar appearances receive similar edits. While it provides a friendly editing mode, it often produces aliasing artifacts at edge pixels. In this paper, we present a simple algorithm to resolve this artifact for edit propagation. The key to our method is a new representation called the Antialias Map, in which we represent each antialiased edge pixel as a linear interpolation of neighboring pixels around the edge; instead of considering the original edge pixels in solving edit propagation, we consider those neighboring pixels. We demonstrate that our method is effective in preserving antialiased edges for edit propagation and can be easily integrated with existing edit propagation methods.
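The decompose-and-reblend idea can be illustrated on a single edge pixel: express it as a blend of two neighboring "pure" colors, edit those, and re-blend. The least-squares blend weight and all function names below are illustrative, not the paper's exact scheme:

```python
import numpy as np

def blend_weight(p, a, b):
    # least-squares alpha for p ≈ alpha * a + (1 - alpha) * b
    d = a - b
    return float(np.dot(p - b, d) / np.dot(d, d))

def reblend(alpha, a_edited, b_edited):
    # recombine the edited neighbors with the original blend weight
    return alpha * a_edited + (1 - alpha) * b_edited

a = np.array([1.0, 0.0, 0.0])          # pure red side of the edge
b = np.array([0.0, 0.0, 1.0])          # pure blue side of the edge
p = 0.3 * a + 0.7 * b                  # antialiased pixel on the edge
alpha = blend_weight(p, a, b)          # recovers the blend fraction

edit = lambda c: 0.5 * c               # a simple darkening edit
p_edited = reblend(alpha, edit(a), edit(b))
```

Because the edit is applied to the pure neighbors rather than the mixed edge pixel, the antialiased transition survives the edit instead of becoming a hard jagged boundary.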

[ paper ] [ bibtex ]


Interactive Hair Rendering and Appearance Editing under Environment Lighting
Kun Xu, Li-Qian Ma, Bo Ren, Rui Wang, Shi-Min Hu
ACM Transactions on Graphics 30(6), 173:1-173:10, 2011. (Proceedings of SIGGRAPH Asia 2011)

We present an interactive algorithm for hair rendering and appearance editing under complex environment lighting represented as spherical radial basis functions (SRBFs). Our main contribution is to derive a compact 1D circular Gaussian representation that can accurately model the hair scattering function introduced by [Marschner et al. 2003]. The primary benefit of this representation is that it enables us to evaluate, at run-time, closed-form integrals of the scattering function with each SRBF light, resulting in efficient computation of both single and multiple scatterings. In contrast to previous work, our algorithm computes the rendering integrals entirely on the fly and does not depend on expensive precomputation. Thus we allow the user to dynamically change the hair scattering parameters, which can vary spatially. Analyses show that our 1D circular Gaussian representation is both accurate and concise. In addition, our algorithm incorporates the eccentricity of the hair. We implement our algorithm on the GPU, achieving interactive hair rendering and simultaneous appearance editing under complex environment maps for the first time.

[ project page ] [ paper 1.9M ] [ pptx 34.6M, 4.4M (no video) ] [ supplemental 1.3M ] [ video 53.2M ] [ bibtex ]

Efficient Affinity-based Edit Propagation using K-D Tree
Kun Xu, Yong Li, Tao Ju, Shi-Min Hu, Tian-Qiang Liu
ACM Transactions on Graphics 28(5), 118:1-118:6, 2009. (Proceedings of SIGGRAPH Asia 2009)

Image/video editing by strokes has become increasingly popular due to the ease of interaction. Propagating the user inputs to the rest of the image/video, however, is often time- and memory-consuming, especially for large data. We propose here an efficient scheme that allows affinity-based edit propagation to be computed on data containing tens of millions of pixels at interactive rates (in a matter of seconds). The key in our scheme is a novel means for approximately solving the optimization problem involved in edit propagation, using adaptive clustering in a high-dimensional affinity space. Our approximation significantly reduces the cost of existing affinity-based propagation methods while maintaining visual fidelity, and enables interactive stroke-based editing even on high-resolution images and long video sequences using commodity computers.
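The optimization being approximated can be written down in a brute-force O(n²) form: pixels with similar features should receive similar edits, anchored by the user's strokes. The Gaussian affinity, smoothness weight, and all names below are illustrative; the paper's contribution is solving this kind of problem approximately via adaptive clustering in a k-d tree rather than forming the dense system:

```python
import numpy as np

def propagate(features, user_edit, has_edit, lam=1.0, sigma=0.2):
    # pairwise affinities in feature space (dense, for illustration only)
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    z = np.exp(-d2 / (2.0 * sigma ** 2))
    w = has_edit.astype(float)
    # energy: sum_ij w_j z_ij (e_i - g_j)^2 + lam * sum_ij z_ij (e_i - e_j)^2
    # setting the derivative to zero gives the linear system A e = b
    L = np.diag(z.sum(1)) - z                # graph Laplacian of affinities
    A = np.diag(z @ w) + 2.0 * lam * L
    b = z @ (w * user_edit)
    return np.linalg.solve(A, b)

features = np.array([[0.0], [0.05], [1.0], [0.95]])  # two feature clusters
g = np.array([1.0, 0.0, 0.0, 0.0])   # stroke: edit 1 at pixel 0, edit 0 at pixel 2
mask = np.array([1, 0, 1, 0])        # only pixels 0 and 2 carry user strokes
e = propagate(features, g, mask)
```

The unstroked pixels inherit the edit of their feature-space cluster, which is exactly the behavior the k-d-tree approximation must preserve at scale.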

[ paper ] [ slides ] [ video ] [ bibtex ]

Edit Propagation on Bidirectional Texture Functions
Kun Xu, Jiaping Wang, Xin Tong, Shi-Min Hu, Baining Guo
Computer Graphics Forum 28(7), 1871-1877, 2009. (Proceedings of Pacific Graphics 2009)

We propose an efficient method for editing bidirectional texture functions (BTFs) based on an edit propagation scheme. In our approach, users specify sparse edits on a certain slice of the BTF. An edit propagation scheme is then applied to propagate the edits to the whole BTF data. The consistency of the BTF data is maintained by propagating similar edits to points with similar underlying geometry/reflectance. For this purpose, we propose to use view-independent features, including normals and reflectance features reconstructed from each view, to guide the propagation process. We also propose an adaptive sampling scheme to speed up the propagation process. Since our method does not require accurate geometry and reflectance information, it allows users to edit complex BTFs with interactive feedback.

[ paper ] [ video ] [ bibtex ]


Spherical Piecewise Constant Basis Functions for All-Frequency Precomputed Radiance Transfer
Kun Xu, Yun-Tao Jia, Hongbo Fu, Shi-Min Hu, Chiew-Lan Tai
IEEE Transactions on Visualization and Computer Graphics 14(2), 454-467, 2008.

This paper presents a novel basis function, called the spherical piecewise constant basis function (SPCBF), for precomputed radiance transfer. SPCBFs have several desirable properties: rotatability, ability to represent all-frequency signals, and support for efficient multiple products. By partitioning the illumination sphere into a set of subregions, and associating each subregion with an SPCBF valued 1 inside the region and 0 elsewhere, we precompute the light coefficients using the resulting SPCBFs. At run time, we approximate BRDF and visibility coefficients with the same set of SPCBFs through fast lookups of a summed-area table (SAT) and a visibility distance table (VDT), respectively. SPCBFs enable new effects such as object rotation in all-frequency rendering of dynamic scenes and on-the-fly BRDF editing under rotating environment lighting. With graphics hardware acceleration, our method achieves real-time frame rates.

[ paper ] [ video ] [ bibtex ]

Real-time homogeneous translucent material editing
Kun Xu, Yue Gao, Yong Li, Tao Ju, Shi-Min Hu
Computer Graphics Forum 26(3), 545-552, 2007. (Proceedings of Eurographics 2007)

This paper presents a novel method for real-time homogeneous translucent material editing under fixed illumination. We consider the complete analytic BSSRDF model proposed by Jensen et al. [JMLH01], including both multiple scattering and single scattering. Our method allows the user to adjust the analytic parameters of the BSSRDF and provides high-quality, real-time rendering feedback. Inspired by recently developed Precomputed Radiance Transfer (PRT) techniques, we approximate both the multiple scattering diffuse reflectance function and the single scattering exponential attenuation function in the analytic model using basis functions, so that re-computing the outgoing radiance at each vertex as parameters change reduces to simple dot products. In addition, using a non-uniform piecewise polynomial basis, we are able to achieve smaller approximation error than using bases adopted in previous PRT-based works, such as spherical harmonics and wavelets. Using hardware acceleration, we demonstrate that our system generates images comparable to [JMLH01] at real-time frame rates.

[ paper ] [ video ] [ bibtex ]

Spherical Harmonics Scaling
Jiaping Wang, Kun Xu, Kun Zhou, Stephen Lin, Shi-Min Hu, Baining Guo
Pacific Conference on Computer Graphics and Applications, Oct 2006.
The Visual Computer, Volume 22, p713-720, Sept 2006.

In this paper, we present a new SH operation, called spherical harmonics scaling, to shrink or expand a spherical function in the frequency domain. We show that this problem can be elegantly formulated as a linear transformation of SH projections, which is efficient to compute and easy to implement on a GPU. Spherical harmonics scaling is particularly useful for extrapolating visibility and radiance functions at a sample point to points closer to or farther from an occluder or light source. With SH scaling, we present applications to low-frequency shadowing for general deformable objects, and to efficient approximation of spherical irradiance functions within a mid-range illumination environment.

[ paper ] [ video ] [ bibtex ]

Copyright © 2015 Kun Xu