VISION-BASED ANALYSIS OF OBJECT POSES AND HUMAN ACTIVITIES FOR HUMAN-COMPUTER INTERACTION APPLICATIONS

Bibliographic Details
Main Authors: Chin-Chun Chang, 張欽圳
Other Authors: Wen-Hsiang Tsai
Format: Others
Language: en_US
Published: 2000
Online Access: http://ndltd.ncl.edu.tw/handle/79224045715994893386
Description
Summary: Ph.D. === National Chiao Tung University (國立交通大學) === Department of Computer and Information Science (資訊科學系) === 89 === To increase productivity and make everyday life easier, scientists and engineers have long tried to build intelligent systems that interact with people in human ways. Such systems must be able to analyze human activities and provide natural feedback. Because computer vision is a noninvasive sensing method, vision-based systems for analyzing human activities are more convenient and user-friendly in many applications than systems based on other sensing methods. Computer vision techniques for analyzing human activities are therefore needed to develop human-computer interaction systems. Since people naturally convey intent through head poses, facial expressions, hand gestures, and leg movements, this dissertation proposes new methods for analyzing these activities. In addition, by placing man-made marks on a person, the person's activity can be determined by analyzing the motion of the marks, a technique often used for precise localization. Many vision-based localization techniques exist, but few of them report the quality of their inputs and estimated results. This dissertation also investigates that problem.
For the analysis of head pose and facial expression, four new methods that use single images of human faces are proposed. Two are direct methods designed for simplified cases; the other two are iterative methods for the general case. The two direct methods and one of the iterative methods are derived from the perspective projection equations of the feature points on the human face. The other iterative method extends the concept of successive scaled orthographic approximations to estimate the parameters of the human face. Experimental results show that the proposed methods are robust. Furthermore, the iterative methods converge in a high percentage of cases, which supports the feasibility of the proposed approach.
For the analysis of free-hand gestures, a new model-based system that analyzes gestures from single images by computer vision techniques is proposed. The orientation and position of the hand, and the joint angles of the fingers and the thumb, are estimated in two separate steps. The orientation and position of the hand are estimated first, using sparse range data generated by laser beams and the generalized Hough transform. Estimating the joint angles of the fingers and the thumb is then treated as an optimization problem: possible configurations of the fingers and the thumb are generated by a novel inverse kinematic technique, and the best configurations are found by a new algorithm based on dynamic programming. Experiments show that the estimated parameters are suitable for 3-D hand gesture animation. The applicability of the proposed system is also demonstrated by a simple hand gesture recognition system. Experimental results show the feasibility of the proposed approach.
For the analysis of leg movements, a vision-based system that tracks and interprets leg motion in image sequences from a single camera is developed, so that a user can control his or her movement in a virtual world with the legs. Twelve control commands are defined. The trajectories of color marks placed on the user's shoes are used to determine the types of leg movement by a first-order Markov process. The types of leg movement are then encoded symbolically as input to Mealy machines, which recognize the control command associated with a sequence of leg movements. The proposed system is implemented on a commercial PC without any special hardware. Because the transition functions of Mealy machines are deterministic, the implementation is simple and the response time is short. Experimental results with a 14 Hz frame rate on image resolution are included to demonstrate the feasibility of the proposed approach.
To develop a reliable computer vision system, the employed algorithm must guarantee good output quality. In this study, to ensure the quality of the pose estimated from line features, two simple test functions based on statistical hypothesis testing are defined. First, an error function based on the relation between the line features and some quality thresholds is defined. The first test function, defined by a lower bound of the error function, detects poor input before the pose is estimated. After pose estimation, the second test function decides whether the estimated result is sufficiently accurate. Experimental results show that the first test function can detect low-quality input or erroneous line correspondences, and that the overall method yields reliable estimates.
In summary, the experimental results for all the proposed approaches show their feasibility and indicate that the proposed systems can serve as a basis for developing more effective human-computer interaction systems.
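The leg-movement recognition step summarized above feeds symbolic movement types into Mealy machines, whose deterministic transitions emit a control command once a movement sequence is complete. The following is only a minimal sketch of that general idea, not the dissertation's implementation: the state names, movement symbols (`LEFT_UP`, `LEFT_DOWN`, and so on), command names, and the reset-on-unknown-input rule are all hypothetical, since this record does not list the twelve commands or the actual machines.

```python
from typing import Dict, List, Optional, Tuple

# (state, input symbol) -> (next state, output command or None).
# All states, symbols, and commands below are invented for illustration.
Transitions = Dict[Tuple[str, str], Tuple[str, Optional[str]]]

TRANSITIONS: Transitions = {
    ("idle", "LEFT_UP"): ("left_raised", None),
    ("left_raised", "LEFT_DOWN"): ("idle", "STEP_LEFT"),
    ("idle", "RIGHT_UP"): ("right_raised", None),
    ("right_raised", "RIGHT_DOWN"): ("idle", "STEP_RIGHT"),
}


def run_mealy(
    symbols: List[str],
    transitions: Transitions = TRANSITIONS,
    start: str = "idle",
) -> List[str]:
    """Feed movement symbols to the machine and collect emitted commands."""
    state = start
    commands: List[str] = []
    for symbol in symbols:
        # Assumption: an undefined (state, symbol) pair resets the machine
        # to the start state and emits nothing.
        state, output = transitions.get((state, symbol), (start, None))
        if output is not None:
            commands.append(output)
    return commands


if __name__ == "__main__":
    print(run_mealy(["LEFT_UP", "LEFT_DOWN", "RIGHT_UP", "RIGHT_DOWN"]))
```

Each input symbol costs one dictionary lookup, which is consistent with the summary's remark that deterministic transition functions keep the implementation simple and the response time short.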