Summary: | Determining the pose of a hand is an important task for an artificial agent and helps to facilitate a cognitive system. Hand pose estimation in particular, because of the highly articulated nature of the hand, is essential for a number of applications such as automatic sign language recognition and robot learning from demonstration. A typical basic hand model is formulated using around 30-50 degrees of freedom, implying a wide variety of possible configurations with a high degree of self-occlusion, which leads to ambiguities and difficulties in automatic recognition. In addition, we are often interested in using a passive sensor, such as a camera, to extract this information. These properties of hand poses warrant robust, efficient and consistent visual shape descriptors which can be utilized seamlessly for automatic hand pose estimation and hand tracking. A view of the environment that is conducive to probabilistic modeling is to regard it as controlled by an underlying, unobserved latent variable. Given the observations from the environment (hand images) and the features extracted from them, it is interesting to infer the state of this latent variable, which controls the process generating the data (hand pose). It is therefore essential to investigate both the generative methods, which produce hand images from well-defined poses, and the discriminative inverse problem, in which a hand pose must be recognized from an observed image. Central to both these paradigms is the need to formulate one measure of goodness for comparing high-dimensional data and, separately, another for examining a model tailored to some data. In this project, three prototypical state-of-the-art visual shape descriptors, commonly used for hand and human body pose estimation, are evaluated. 
The mappings from the hand pose space to the feature spaces spanned by the visual shape descriptors are studied in terms of their smoothness, discriminability, and generativity, as well as the robustness of these properties to noise. Based on this, recommendations are given on which types of applications each visual shape descriptor is suitable for. Novel goodness measures are devised to quantify data similarities and to provide a scale for the performance of these visual shape descriptors. The evaluation of the experiments provides a basis for creating novel and improved models for hand pose estimation. === Hand pose estimation is, not least because of its articulated nature, of central importance in a number of applications such as sign language recognition and robot learning from demonstration. A basic model of a hand is formulated with between 30 and 50 degrees of freedom, which entails a wide variety of possible configurations with a high degree of self-occlusion, leading to ambiguities and other difficulties in automatic recognition. Furthermore, it is often of interest to use a passive sensor, for example a camera, to acquire this information. These properties of hand poses motivate a robust, efficient and consistent visual shape descriptor that can be used seamlessly for automatic hand pose estimation and hand tracking. To support a probabilistic model of the situation, one can regard it as controlled by an underlying, hidden variable. Given observations of the situation (hand images) and features extracted from them, it is interesting to obtain an indication of the state of this hidden variable, which governs the generation of the data (the hand pose). It is important to study both the generative methods that produce hand images from well-defined poses and the inverse discriminative problem in which a hand pose is to be recognized from an image. 
Central to both of these problems is formulating a measure for comparing high-dimensional data, as well as separate measures for evaluating models tailored to particular data. In this project, three different prototypical state-of-the-art descriptors for visual shapes, which are often used for human and hand pose estimation, are evaluated. The mappings between the hand pose space and the feature space spanned by the visual shape descriptors are evaluated with respect to their smoothness and their ability to distinguish between different poses. Their robustness to noise in terms of these properties is also studied. Based on this, recommendations are given on which type of visual shape descriptor suits which applications. New measures are devised to quantify similarities in the data and to provide a performance measure for these visual shape descriptors. The evaluation of the experiments provides a basis for creating new and improved models for hand pose estimation.
|