Introduction:
We propose a method for 3D human reconstruction from a single RGB image. Since this problem is highly ill-posed, we adopt a stage-wise, coarse-to-fine approach consisting of three steps, namely inner body estimation, outer surface reconstruction, and frontal surface detail refinement. Once an inner body is estimated from the given image, our method generates a dense semantic representation from it to encode body shape and pose and to bridge the 2D image plane and 3D space. An image-guided volume-to-volume translation CNN is introduced to reconstruct the outer surface given the input image and the dense semantic representation. One key feature of our network is that it fuses image features at different scales into 3D space through volumetric feature transformation, which helps to recover details of the subject's outer surface geometry. The details on the frontal areas of the outer surface are further refined through a normal map refinement network, which can be concatenated with the volume generation network using our proposed volumetric normal projection layer. We also contribute THUman, a 3D real-world human model dataset containing approximately 7000 models, from which all training data for the network is generated. Overall, due to the specific design of our network and the diversity of our dataset, our method enables 3D human reconstruction from only a single image and outperforms state-of-the-art approaches.
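To make the volumetric feature transformation concrete, the following minimal sketch lifts a 2D feature map into a voxel grid by projecting each voxel center onto the image plane. The intrinsics, grid extent, and nearest-pixel sampling here are illustrative assumptions rather than the paper's actual design:

# A minimal sketch of lifting 2D image features into a 3D volume by pinhole
# projection of voxel centers. Intrinsics (fx, fy) and the grid layout are
# illustrative assumptions, not values from the paper.
import numpy as np

def lift_features_to_volume(feat_2d, res=32, fx=500.0, fy=500.0,
                            z_near=1.0, z_far=3.0):
    """feat_2d: (C, H, W) image features -> (C, res, res, res) volume."""
    C, H, W = feat_2d.shape
    cx, cy = W / 2.0, H / 2.0
    volume = np.zeros((C, res, res, res), dtype=feat_2d.dtype)
    # Voxel centers on a regular grid inside an assumed camera-space box.
    xs = np.linspace(-0.5, 0.5, res)
    ys = np.linspace(-0.5, 0.5, res)
    zs = np.linspace(z_near, z_far, res)
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            for k, z in enumerate(zs):
                # Pinhole projection of the voxel center onto the image.
                u = int(round(fx * x / z + cx))
                v = int(round(fy * y / z + cy))
                if 0 <= u < W and 0 <= v < H:
                    volume[:, i, j, k] = feat_2d[:, v, u]
    return volume

vol = lift_features_to_volume(np.random.rand(8, 128, 128).astype(np.float32))
print(vol.shape)  # (8, 32, 32, 32)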
IFVT-Face: AI-Based Analysis of Facial Expressions in RGB Videos
Introduction:
This technology focuses on extracting facial expressions from widely available video (e.g., video captured with a smartphone). To take advantage of AI techniques, we collected a large amount of video data from professional actors, with annotations produced by professional animators. We further developed a deep learning framework to 1) analyze facial expressions in videos, and 2) learn the correlations between facial expressions and 3D facial animation parameters (a sketch of this mapping follows the key-features list). In this way, 3D animation can be created automatically from a 2D video input.
Key features:
1) Constructed a large-scale facial expression database
2) Developed a new deep learning framework to learn the correlation between facial expression and 3D animation by using our new facial expression database
3) Developed a new system for 3D facial animation with a Web-based interface as well as a Maya plug-in.
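As a rough illustration of the expression-to-animation mapping mentioned above, the sketch below regresses per-frame 2D expression features to blendshape-style animation weights. The feature and blendshape dimensions and the ridge-regression model are stand-in assumptions for the actual deep learning framework:

# A minimal sketch, assuming per-frame 2D expression features (e.g., landmark
# offsets) are regressed to 3D animation parameters such as blendshape
# weights. All dimensions and the linear model are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_frames, feat_dim, n_blendshapes = 1000, 136, 51       # assumed sizes

X = rng.normal(size=(n_frames, feat_dim))               # expression features
Y = rng.uniform(0, 1, size=(n_frames, n_blendshapes))   # annotated weights

# Closed-form ridge regression: W = (X^T X + lambda I)^-1 X^T Y.
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(feat_dim), X.T @ Y)

# At run time, each video frame's features map directly to animation curves.
frame_feat = rng.normal(size=(1, feat_dim))
blend_weights = np.clip(frame_feat @ W, 0.0, 1.0)
print(blend_weights.shape)  # (1, 51)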
Human Keypoint Detection
Introduction: ICCV 2017
Multi-person pose estimation in the wild is challenging. Although state-of-the-art human detectors have demonstrated good performance, small errors in localization and recognition are inevitable. These errors can cause failures for a single-person pose estimator (SPPE), especially for methods that solely depend on human detection results. In this work, we propose a novel regional multi-person pose estimation (RMPE) framework to facilitate pose estimation in the presence of inaccurate human bounding boxes. Our framework consists of three components: Symmetric Spatial Transformer Network (SSTN), Parametric Pose Non-Maximum-Suppression (NMS), and Pose-Guided Proposals Generator (PGPG). Our method is able to handle inaccurate bounding boxes and redundant detections, allowing it to achieve 76.7 mAP on the MPII (multi-person) dataset.
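To illustrate the idea behind pose-level NMS, the sketch below suppresses a pose when it lies too close to an already-kept, higher-scoring pose. The simple normalized keypoint distance and fixed threshold are illustrative assumptions; the parametric criterion in RMPE is learned from data:

# A minimal sketch of pose NMS: eliminate redundant detections whose pose
# distance to a kept, higher-scoring pose falls below a threshold.
import numpy as np

def pose_nms(poses, scores, dist_thresh=0.3):
    """poses: (N, K, 2) keypoints, scores: (N,) -> indices of kept poses."""
    order = np.argsort(-scores)
    kept = []
    for i in order:
        redundant = False
        for j in kept:
            # Mean keypoint distance, normalized by the pose's bounding size.
            scale = np.ptp(poses[j], axis=0).max() + 1e-6
            d = np.linalg.norm(poses[i] - poses[j], axis=1).mean() / scale
            if d < dist_thresh:
                redundant = True
                break
        if not redundant:
            kept.append(i)
    return kept

poses = np.random.rand(5, 16, 2) * 100
poses[1] = poses[0] + 1.0          # near-duplicate detection
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
print(pose_nms(poses, scores))     # pose 1 is suppressed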
A General Framework for Editing Motion and Photos via Relative Emotion Strength
Introduction: D2AT @ SIGGRAPH Asia 2017, CAVW 2019
Unlike previous work that encodes emotions as discrete motion style descriptors, we propose a continuous control indicator called emotion strength; by controlling it, our data-driven approach synthesizes motions with fine-grained control over emotions. Rather than interpolating motion features to synthesize new motion as in existing work, our method explicitly learns a model mapping low-level motion features to emotion strength (see the sketch after the feature list). Because the motion synthesis model is learned in the training stage, the computation required to synthesize motions at run time is very low.
Feature 1: the first method to learn the relationship between low-level features and the strength of emotion expressions for human motion and facial expression synthesis
Feature 2: a real-time motion synthesis framework that is controlled solely by the strength of emotion expressions
Feature 3: a real-time face image editing framework that is controlled solely by the strength of emotion expressions
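The core mapping can be illustrated with a toy model: fit a regression from low-level motion features to annotated emotion strength, then edit a feature vector along the learned direction to reach a target strength. The linear model and the feature-space edit are assumptions made for exposition, not the paper's actual synthesis model:

# A minimal sketch: learn strength = f(features), then shift features so the
# predicted strength hits a user-specified target. Data here is synthetic.
import numpy as np

rng = np.random.default_rng(1)
n_clips, feat_dim = 500, 30
X = rng.normal(size=(n_clips, feat_dim))         # low-level motion features
w_true = rng.normal(size=feat_dim)
s = X @ w_true + 0.1 * rng.normal(size=n_clips)  # annotated emotion strength

# Least-squares fit of strength = X @ w.
w, *_ = np.linalg.lstsq(X, s, rcond=None)

def edit_to_strength(feat, target):
    """Shift features along w so the predicted strength equals target."""
    current = feat @ w
    return feat + (target - current) * w / (w @ w)

feat = X[0]
edited = edit_to_strength(feat, target=2.0)
print(edited @ w)  # ~2.0; the model is cheap to evaluate at run time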
Fast Reconstruction of Human Body from a Single RGBD Image
Introduction:
We present a fast 3D reconstruction method that recovers realistic surfaces of complete human bodies, free from the interference of lighting conditions. Specifically, the proposed approach introduces depth images into the reconstruction pipeline to remove the scale ambiguity of color images and to reconstruct 3D human bodies from both front and back views. With depth information, geometry can be decoupled from texture. Transforming images from the perspective view to the orthographic view ensures the completeness of our reconstructed models. Additionally, we make color images independent of lighting environments by encoding geometric information in our networks. Combining the “front” map with the “back” map inferred by our adversarial-learning-based networks, we reconstruct high-resolution 3D human models at 20 fps. Results on the testing set and on real data captured by consumer RGB-D sensors demonstrate the superior performance of our approach.
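The perspective-to-orthographic step can be sketched as back-projecting each depth pixel into camera space with pinhole intrinsics, then re-rasterizing the points under an orthographic camera. The intrinsics, resolutions, and nearest-depth splatting below are illustrative assumptions:

# A minimal sketch of converting a perspective depth image to an
# orthographic depth map. All camera parameters are assumed values.
import numpy as np

def persp_to_ortho(depth, fx=500.0, fy=500.0, out_res=256, extent=1.0):
    H, W = depth.shape
    cx, cy = W / 2.0, H / 2.0
    v, u = np.nonzero(depth > 0)
    z = depth[v, u]
    # Back-projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # Orthographic raster: drop the perspective division, keep nearest depth.
    ortho = np.full((out_res, out_res), np.inf)
    ui = np.clip(((x / extent + 0.5) * out_res).astype(int), 0, out_res - 1)
    vi = np.clip(((y / extent + 0.5) * out_res).astype(int), 0, out_res - 1)
    for a, b, d in zip(vi, ui, z):
        ortho[a, b] = min(ortho[a, b], d)
    ortho[np.isinf(ortho)] = 0.0
    return ortho

depth = np.zeros((240, 320)); depth[60:180, 100:220] = 2.0
print(persp_to_ortho(depth).shape)  # (256, 256)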
Semi-Supervised Human Body Part Semantic Parsing
Introduction: CVPR 2018 (spotlight)
Human body part parsing, or human semantic part segmentation, is important to many computer vision tasks. In conventional semantic segmentation methods, ground truth segmentations are provided and a fully convolutional network (FCN) or one of its variants is trained end-to-end. Although these methods have demonstrated impressive results, their performance depends heavily on the quantity and quality of training data. Obtaining high-quality training data, however, is labor intensive. In this work, we present a novel method to generate synthetic human part segmentation data using easily obtained human keypoint annotations. Our key idea is to exploit the anatomical similarity among humans to transfer the parsing result of one person to another person with a similar pose. Using these estimated results as extra training data, our semi-supervised model exceeds its strongly supervised counterpart by 6 mIoU on the PASCAL-Person-Part dataset, achieving state-of-the-art human parsing results. Our approach is general and readily extends to other object/animal parsing tasks, provided that their anatomical similarity can be annotated with keypoints.
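The pose-similarity lookup at the heart of this label transfer can be sketched as a nearest-neighbor search over normalized keypoint sets; the normalization and distance used below are illustrative assumptions, and the segmentation warping step itself is omitted:

# A minimal sketch: find the labelled person whose (normalized) pose is
# closest to the target, so their part segmentation can be transferred.
import numpy as np

def normalize_pose(kpts):
    """Center keypoints and scale to unit size. kpts: (K, 2)."""
    centered = kpts - kpts.mean(axis=0)
    return centered / (np.linalg.norm(centered) + 1e-6)

def nearest_labelled_pose(target_kpts, labelled_kpts):
    """labelled_kpts: (N, K, 2) -> index of the most similar pose."""
    t = normalize_pose(target_kpts)
    dists = [np.linalg.norm(t - normalize_pose(p)) for p in labelled_kpts]
    return int(np.argmin(dists))

rng = np.random.default_rng(2)
bank = rng.normal(size=(100, 14, 2))       # keypoints of labelled people
target = bank[42] * 1.7 + 3.0              # same pose, new scale/position
print(nearest_labelled_pose(target, bank)) # 42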