Summary: | When people interact with each other, they not only listen to what the other says, they react to facial expressions, gaze direction, and head movement. Human-computer interaction would be enhanced in a friendly and non-intrusive way if computers could understand and respond to users’ body language in the same way.
This thesis investigates new methods for human-computer interaction that combine information from the body language of the head to recognize emotional and cognitive states. We concentrate on integrating facial expression, eye gaze, and head movement using soft computing techniques. The procedure has two stages. The first stage extracts explicit information from the modalities of facial expression, head movement, and eye gaze. In the second stage, all of this information is fused by soft computing techniques to infer the implicit emotional states.
In this thesis, the frequency of head movement (high or low) is taken into consideration in addition to head nods and head shakes. A very high-frequency head movement may indicate considerably more arousal and activity than a low-frequency one, so the two occupy different positions in the dimensional emotion space. The head movement frequency is obtained by analyzing the tracked coordinates of the detected nostril points.
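As a rough illustration of how a movement frequency could be derived from tracked nostril coordinates, the sketch below counts mean crossings of a one-dimensional coordinate track. This is not the thesis's method: the function names `movement_frequency` and `classify_movement`, the mean-crossing estimator, and the 1.0 Hz threshold are all assumptions made for the example.

```python
def movement_frequency(coords, fps):
    """Estimate the oscillation frequency (Hz) of a 1-D coordinate track.

    coords: nostril x- or y-coordinates, one value per video frame.
    fps:    frame rate of the video.
    The estimate counts crossings of the track's mean value; every full
    oscillation produces two crossings. (Assumed estimator.)
    """
    n = len(coords)
    if n < 2:
        return 0.0
    mean = sum(coords) / n
    centered = [c - mean for c in coords]
    crossings = 0
    for a, b in zip(centered, centered[1:]):
        # A sign change between consecutive frames is one crossing.
        if (a < 0 <= b) or (a >= 0 > b):
            crossings += 1
    duration = (n - 1) / fps  # seconds spanned by the track
    return crossings / (2.0 * duration)


def classify_movement(freq_hz, threshold_hz=1.0):
    """Label a movement as high or low frequency (threshold is hypothetical)."""
    return "high" if freq_hz >= threshold_hz else "low"
```

Applying `movement_frequency` to the vertical coordinate would characterize nod-like motion, and to the horizontal coordinate shake-like motion. A real system would likely smooth the track first, since tracking jitter adds spurious crossings.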
Eye gaze also plays an important role in emotion detection. An eye gaze detector was proposed to determine whether the subject's gaze direction is direct or averted. It relies on the geometric relationship between the nostrils and the two pupils. Four parameters, defined from the changes in angles and in the length proportions among these four feature points, distinguish averted gaze from direct gaze. The sum of these parameters serves as an evaluation parameter that quantifies the gaze level.
Multimodal fusion is performed by hybridizing decision-level fusion with soft computing techniques for classification. This avoids the disadvantages of decision-level fusion while retaining its adaptability and flexibility. We introduce fuzzification strategies that quantify the extracted parameters of each modality as fuzzified values between 0 and 1. These fuzzified values are the inputs to the fuzzy inference systems, which map them to emotional states.
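To make the fuzzification and inference steps concrete, here is a minimal sketch, assuming a linear ramp membership function and three hand-written rules. The membership breakpoints, the rules, the single "arousal" output, and the names `ramp` and `infer_arousal` are hypothetical; the abstract does not specify the thesis's actual membership functions or rule base.

```python
def ramp(x, low, high):
    """Fuzzify x into [0, 1]: 0 at or below low, 1 at or above high, linear between."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (x - low) / (high - low)))


def infer_arousal(head_freq_hz, gaze_score, expression_score):
    """Map three modality parameters to an arousal level in [0, 1].

    All breakpoints and rules below are illustrative assumptions.
    """
    # Fuzzification: one value in [0, 1] per modality.
    mu_fast = ramp(head_freq_hz, 0.5, 2.5)
    mu_direct = ramp(gaze_score, 0.0, 1.0)
    mu_strong = ramp(expression_score, 0.0, 1.0)

    # Rule 1: IF head movement is fast AND gaze is direct THEN arousal is high.
    # Rule 2: IF expression is strong THEN arousal is high.
    # AND is modeled with min, and rules sharing a consequent combine with max.
    high = max(min(mu_fast, mu_direct), mu_strong)

    # Rule 3: IF head movement is NOT fast AND expression is NOT strong
    # THEN arousal is low.
    low = min(1.0 - mu_fast, 1.0 - mu_strong)

    # Defuzzification: weighted average of singleton outputs (low = 0, high = 1).
    total = high + low
    return 0.5 if total == 0 else high / total
```

Because each modality is first reduced to a value between 0 and 1, a modality can be added or removed by editing the rules alone, which reflects the flexibility attributed to decision-level fusion.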
|