Summary: | This thesis presents a method for efficiently recognizing 3D objects from single 2D
images using a novel probabilistic indexing technique. Indexing is a two-stage
process that includes an offline training stage and a runtime lookup stage. During training,
feature vectors representing object appearance are acquired from several viewpoints
around each object and stored in the index. At runtime, for each image feature vector
detected, a small set of the closest model vectors is recovered from the index and used to
form match hypotheses. This set of nearest neighbours provides interpolation between
the nearby training views of the objects, and is used to compute probability estimates
that proposed matches are correct. The overall recognition process becomes extremely
efficient when hypotheses are verified in order of their probabilities.
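
As an illustration of the lookup stage described above, the following minimal Python sketch ranks match hypotheses using a query vector's nearest neighbours. It is not the implementation used in the thesis: the brute-force neighbour search, the Gaussian distance weighting, and the names `rank_hypotheses`, `k`, and `bandwidth` are all assumptions made for the example.

```python
import math
from collections import defaultdict


def rank_hypotheses(index, query, k=3, bandwidth=1.0):
    """Rank candidate models for one image feature vector.

    index: list of (feature_vector, model_label) pairs stored during training.
    Returns (model_label, probability_estimate) pairs, most probable first.
    """
    # Brute-force k-nearest-neighbour lookup (a stand-in for a tree-based index).
    neighbours = sorted(index, key=lambda entry: math.dist(entry[0], query))[:k]

    # Assumed weighting: closer stored vectors contribute more to their model.
    weights = defaultdict(float)
    for vector, label in neighbours:
        distance = math.dist(vector, query)
        weights[label] += math.exp(-((distance / bandwidth) ** 2))

    total = sum(weights.values())
    if total == 0.0:
        # All weights underflowed; fall back to equal estimates.
        return [(label, 1.0 / len(weights)) for label in weights]

    # Normalize so the estimates over the retrieved models sum to one.
    ranked = [(label, w / total) for label, w in weights.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical training data: 2D feature vectors labelled with model names.
index = [
    ((0.0, 0.0), "cup"),
    ((0.1, 0.0), "cup"),
    ((1.0, 1.0), "stapler"),
    ((5.0, 5.0), "phone"),
]
ranked = rank_hypotheses(index, (0.05, 0.0), k=3)
```

Verifying the entries of `ranked` from first to last corresponds to checking hypotheses in order of estimated probability, so the most plausible match is tested first.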
Contributions of this thesis include the use of an indexing data structure (the
k-d tree) and search algorithm (Best-Bin First search) which, unlike the standard hash
table methods, remain efficient in index spaces of higher dimensionality. This behavior is
critical for discriminating between models in large databases. In addition, the
repertoire of 3D objects that can be recognized has been significantly expanded from
that in most previous indexing work, by explicitly avoiding the requirement for special-case
invariant features. Finally, an incremental learning procedure has been introduced
which extracts model grouping information from real images as the system performs
recognition, and adds it into the index to improve indexing accuracy. A new clustering
algorithm (Weighted Vector Quantization) is used to limit the memory requirements of
this continual learning process.
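
To make the search contribution above more concrete, here is a small Python sketch of approximate nearest-neighbour search in a k-d tree, in the spirit of Best-Bin First search: unexplored branches are kept in a priority queue ordered by their distance from the query, and the search stops after a fixed number of point examinations. The tree layout, the function names `build_kd_tree` and `bbf_nearest`, and the `max_checks` parameter are assumptions for illustration and are not taken from the thesis.

```python
import heapq
import itertools
import math


def build_kd_tree(points, depth=0):
    """Build a k-d tree as nested dicts; each node stores one point."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kd_tree(points[:mid], depth + 1),
        "right": build_kd_tree(points[mid + 1:], depth + 1),
    }


def bbf_nearest(root, query, max_checks=50):
    """Approximate nearest neighbour: visit the closest branches first and
    give up after examining max_checks stored points."""
    best_point, best_dist = None, math.inf
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(0.0, next(counter), root)] if root is not None else []
    checks = 0
    while heap and checks < max_checks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_dist:
            break  # no remaining branch can hold a closer point
        while node is not None and checks < max_checks:
            checks += 1
            dist = math.dist(node["point"], query)
            if dist < best_dist:
                best_point, best_dist = node["point"], dist
            axis = node["axis"]
            gap = query[axis] - node["point"][axis]
            if gap < 0:
                near, far = node["left"], node["right"]
            else:
                near, far = node["right"], node["left"]
            if far is not None:
                # Distance to the splitting plane is a lower bound for the far side.
                heapq.heappush(heap, (max(bound, abs(gap)), next(counter), far))
            node = near
    return best_point, best_dist


# Hypothetical data: 200 deterministic points in a 4-dimensional index space.
points = [tuple(((i * 37 + j * 101) % 97) / 97.0 for j in range(4)) for i in range(200)]
tree = build_kd_tree(points)
query = (0.5, 0.5, 0.5, 0.5)
approx_point, approx_dist = bbf_nearest(tree, query, max_checks=10)
exact_point, exact_dist = bbf_nearest(tree, query, max_checks=10**6)
```

With a very large `max_checks` the sketch behaves like an exact k-d tree search; lowering it trades a chance of missing the true nearest neighbour for a bounded amount of work per query.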
The indexing algorithm has been embedded within a fully functional automatic
recognition system that typically requires only a few seconds to recognize objects in
standard sized images. Experiments with real and synthetic images are presented, using
indexing features derived from groupings of line segments. Indexing accuracy is shown to
be high, as indicated by the rankings assigned to correct hypotheses. Experiments with
the Best-Bin First search algorithm show that, if it is acceptable to miss a small fraction
of the exact closest neighbours, the regime in which k-d tree search remains efficient can be
extended, roughly from 5-dimensional to 20-dimensional spaces, and that this efficiency
holds for very large numbers of stored points. Finally, experiments with the Weighted
Vector Quantization algorithm show that it is possible to incorporate real image data
into the index via incremental learning so that indexing performance is improved without
increasing the memory requirements of the system.
|