Summary: The recognition approach for people (faces) and objects is general and does not use any contextual information. The algorithm is based on feature points. A learning stage attempts to maximise information-theoretic entropy and learns relational information between the extracted feature points. The result of learning is an ordered list of three-dimensional moves in space and scale, termed "jumps". These jumps are performed during the runtime recognition stage to differentiate between similar feature points and thereby achieve recognition. The approach can be aptly described by the question: "Where do I look next in space and scale to gain the most information about what I am looking at?"

Closely related to localisation is navigation: without localisation, navigation is impossible. The research in this document examines how the two are intertwined. The approach taken to navigation does not require a globally consistent map, which removes the need to build such a map, a process that can be computationally expensive and difficult. An augmented reality application that is able to guide a human user from one location to another is presented. Key-frames are extracted from a training video sequence which shows the path to the destination. Local reference frames are built from pairs of key-frames, and there is no need for a consistent scale across local reference frames. Navigation is achieved by solving a series of smaller navigation tasks: at runtime, a live frame is localised to a local reference frame, and navigation proceeds by moving from one local reference frame to the next. An iteratively re-weighted least squares estimator is used for pose estimation, and a Kalman filter acts as a smoothing agent to reduce the effects of jitter and outlying pose estimates. The system runs at about 5 Hz on 640 × 480 frames.
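To make the learning stage concrete, the following is a minimal sketch of entropy-driven jump selection, assuming a simple discrete descriptor model; `feature_points`, `candidate_jumps`, and `sample_descriptor` are hypothetical names for illustration, not the thesis' actual implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def learn_jump_list(feature_points, candidate_jumps, sample_descriptor, n_jumps):
    """Greedily pick jumps (dx, dy, dscale) whose sampled descriptors have
    maximum entropy over the training feature points, i.e. the moves in
    space and scale that are most informative for telling points apart.
    (Greedy selection is an assumption; the source only states that
    learning maximises entropy and orders the jumps.)"""
    jumps = []
    remaining = list(candidate_jumps)
    for _ in range(n_jumps):
        best = max(
            remaining,
            key=lambda j: entropy([sample_descriptor(fp, j) for fp in feature_points]),
        )
        jumps.append(best)
        remaining.remove(best)
    return jumps  # ordered list: most informative jump first
```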
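The chaining of local reference frames into a series of smaller navigation tasks might be sketched as follows; `localise` and `reached_end` are assumed helpers standing in for the pose estimation and hand-over logic described above.

```python
def navigate(live_frames, local_frames, localise, reached_end):
    """Solve navigation as a chain of smaller tasks: localise each live
    frame to the current local reference frame, and advance to the next
    local reference frame once the end of the current one is reached."""
    i = 0
    for frame in live_frames:
        pose = localise(frame, local_frames[i])  # pose within the local frame
        if reached_end(pose, local_frames[i]) and i + 1 < len(local_frames):
            i += 1                               # hand over to the next local frame
        yield pose, i
```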
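For the pose estimation step, an iteratively re-weighted least squares estimator repeatedly solves a weighted least squares problem, down-weighting outlying residuals on each pass. A generic sketch on a linearised system A x ≈ b is given below; the Huber weighting function and MAD scale estimate are assumptions, as the source does not specify them.

```python
import numpy as np

def irls(A, b, n_iter=10, huber_k=1.345):
    """Iteratively re-weighted least squares: solve A x ≈ b while
    progressively down-weighting residuals flagged as outliers."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]       # ordinary LS initialisation
    for _ in range(n_iter):
        r = b - A @ x                              # current residuals
        s = 1.4826 * np.median(np.abs(r)) + 1e-12  # robust scale (MAD)
        u = np.abs(r) / s
        w = np.where(u <= huber_k, 1.0, huber_k / u)  # Huber weights
        sw = np.sqrt(w)
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x
```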
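The Kalman smoothing stage could, in its simplest form, be a per-component constant-position filter over successive pose estimates; the noise parameters `q` and `r` below are illustrative assumptions, not values from the source.

```python
def kalman_smooth(pose_estimates, q=1e-3, r=1e-1):
    """Scalar constant-position Kalman filter applied to one pose component
    over time, damping jitter and the influence of outlying estimates."""
    x, p = pose_estimates[0], 1.0
    smoothed = [x]
    for z in pose_estimates[1:]:
        p = p + q                # predict: state assumed constant, uncertainty grows
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)      # update towards the new measurement
        p = (1.0 - k) * p
        smoothed.append(x)
    return smoothed
```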