Exploration and Design of High-Efficiency Intelligent Visual Recognition Systems

Bibliographic Details
Main Authors: Yi-Min Tsai, 蔡一民
Other Authors: Liang-Gee Chen
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/77533046789339225348
Description
Summary: Ph.D. dissertation, National Taiwan University, Graduate Institute of Electronics Engineering, academic year 101 (ROC calendar). Intelligent visual recognition now plays an essential role in many applications, such as smart automobiles, human-machine interaction, surveillance, and gaming. In this dissertation, we explore high-efficiency visual recognition systems developed with a specific machine-learning algorithm and a generic sparse reconstruction algorithm. The dissertation is divided into two parts.
In the first part, we present an intelligent vision-based system that detects and tracks preceding vehicles on the road, built on computer vision and machine learning techniques. We discuss the system requirements and specifications for vision-based automotive applications. High-accuracy detection is achieved with a machine-learning-based method, and we present an efficient knowledge-based tracking algorithm for multi-vehicle tracking tasks. The framework suits versatile automotive applications, yielding a detection rate above 90% at long range and a 99.1% tracking success rate at middle range. To meet real-time requirements, we implement the system in VLSI and investigate architecture optimizations that reduce hardware cost without significantly degrading accuracy. The resulting intelligent vision SoC is implemented in a 40nm CMOS process with a die size of 3.0 × 3.1 mm². It achieves 3.01TOPS/W power efficiency and 55.6GOPS/mm² area efficiency, and it supports tracking of up to 64 objects. With the proposed knowledge-based tracking processor, power efficiency improves by 1.62x and the frame rate increases by at least 1.79x. For Haar-like object detection, the processing efficiency is 0.327fps/MHz normalized to VGA resolution, outperforming state-of-the-art designs by 3.6x to 8.8x. The architecture achieves an active distance of 140 meters at 60fps and 60 meters at 300fps at Quad-VGA (1280x960) resolution. The chip achieves 354.2fps/W power efficiency with 69mW average power consumption.
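The abstract reports a Haar-like detection efficiency but does not describe how features are computed. Haar-like features are conventionally evaluated on an integral image (summed-area table), which reduces any rectangle sum to four lookups regardless of the rectangle's size. The following minimal Python sketch illustrates that standard technique. The function names are hypothetical, and the sketch makes no claim about the chip's datapath.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> List[List[int]]:
    """Build an (H+1) x (W+1) summed-area table with a zero first row and column.

    ii[y][x] equals the sum of img[r][c] for all r < y and c < x.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current image row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle whose top-left pixel is (x, y), via four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle (left minus right) Haar-like feature; assumes w is even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


if __name__ == "__main__":
    img = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12]]
    ii = integral_image(img)
    print(rect_sum(ii, 0, 0, 4, 3))       # 78: whole image
    print(rect_sum(ii, 1, 1, 2, 2))       # 34: 6 + 7 + 10 + 11
    print(haar_two_rect(ii, 0, 0, 4, 3))  # -12: 33 (left half) - 45 (right half)
```

Because the table is built once per frame, the cost of each feature evaluation stays constant as the detection window scales. This is one reason the technique suits fixed-throughput hardware.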
In the second part, we propose a sparse reconstruction algorithm and an architecture for generic visual recognition systems. The algorithm exploits signal sparsity both for the sparse representation (SR) of object patterns and for decoding signals via compressed sensing (CS). The reconstruction kernels of both CS and SR can be modeled as convex optimization problems, which may incur high computational complexity. Their quadratic, ℓ1-regularized form is known as the LASSO problem. We then develop a generic iterative reconstruction kernel based on the homotopy algorithm. The method can recover a signal using a previously reconstructed result as its starting point, called a warm start, which suits signals with temporal or spatial dependency. The proposed method can rapidly reconstruct a sparse signal under several kinds of dynamic modification. We also explore algorithmic optimizations for practical implementations. We then present a visual object tracking system that simultaneously performs object recognition and is designed with the proposed sparse reconstruction algorithm. We also examine the processing-time improvement of the warm-start algorithm. We develop a versatile architecture for high-dimensional sparse signal reconstruction that supports both compressed sensing and sparse representation and achieves real-time processing for various visual recognition applications. This signal reconstruction platform is designed in a 40nm CMOS process with a die size of 3.7 × 3.7 mm². It dissipates 353.3mW average power at 250MHz with 0.9V/2.5V core/IO voltages. We propose a high-throughput sensing matrix generation engine that produces 4G entries/s (8Gbps). Compared with loading the full sensing matrix from off-chip memory, the engine reduces the bandwidth requirement by over 75%. It also reduces total processing cycles by 77%.
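The LASSO problem is conventionally written as shown below. Because the abstract gives no formulas, the notation here is an assumption: y is the observed vector, A the sensing matrix (CS) or the dictionary of object patterns (SR), λ > 0 the ℓ1 weight, Γ the support of the solution, and z_Γ the signs of its nonzero entries. The second line states the optimality conditions. The third follows from the first condition when Γ and z_Γ are held fixed, assuming A_Γ has full column rank. This is the standard formulation, not necessarily the exact variant used in the dissertation.

```latex
\begin{aligned}
\hat{x}(\lambda) &= \arg\min_{x}\ \tfrac{1}{2}\lVert y - A x\rVert_2^2 + \lambda\lVert x\rVert_1,\\
A_\Gamma^{\top}\bigl(y - A\hat{x}\bigr) &= \lambda\, z_\Gamma, \qquad
\bigl\lVert A_{\Gamma^c}^{\top}\bigl(y - A\hat{x}\bigr)\bigr\rVert_\infty \le \lambda,\\
\hat{x}_\Gamma(\lambda) &= \bigl(A_\Gamma^{\top}A_\Gamma\bigr)^{-1}\bigl(A_\Gamma^{\top}y - \lambda\, z_\Gamma\bigr).
\end{aligned}
```

Between breakpoints, the active entries are affine in λ. A homotopy method can therefore follow the solution path segment by segment, typically adding or removing one index of Γ at each breakpoint. A warm start begins this path-following from a previously reconstructed solution rather than from x = 0, which is why it suits signals whose successive solutions are similar. Each breakpoint changes the matrix A_Γ^T A_Γ by one row and column, and incremental factorization updates target exactly this kind of linear system.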
We propose a generic matrix factorization engine for solving linear equations. The proposed incremental matrix updating scheme reduces the processing time for solving linear equations by over 57%. With the proposed 16 multiprocessing cores, the chip achieves 401GFlops/W power efficiency and 10.4GFlops/mm² area efficiency. The chip supports various sparsity levels depending on the signal dimensions. Compared with software implementations, the chip yields a 292x speedup for surveillance video reconstruction and over 200x improvement in computing time for visual object tracking tasks. For recovering an arbitrary Gaussian-randomized sparse signal with an identity basis, it achieves a 1008x speedup for N=2048, M=1024, and 5% sparsity.
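The abstract credits an incremental matrix updating scheme with a more than 57% reduction in solve time but does not describe the scheme. A common approach in active-set sparse solvers (homotopy, OMP) is to maintain a Cholesky factor of the Gram matrix A^T A. When one column joins A, that factor is extended rather than recomputed. The minimal Python sketch below assumes this technique purely for illustration. Every name in it is hypothetical, it handles only column additions (no removals), and it is not claimed to be the chip's method.

```python
import math
from typing import List

Vector = List[float]
LowerTri = List[List[float]]  # row i stores L[i][0..i]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def forward_sub(L: LowerTri, b: Vector) -> Vector:
    """Solve L z = b by forward substitution."""
    z: Vector = []
    for i, row in enumerate(L):
        z.append((b[i] - dot(row[:i], z)) / row[i])
    return z


def backward_sub_t(L: LowerTri, z: Vector) -> Vector:
    """Solve L^T x = z by backward substitution, reading L column-wise."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = z[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))
        x[i] = s / L[i][i]
    return x


def cholesky_append(L: LowerTri, cols: List[Vector], a: Vector, tol: float = 1e-12) -> None:
    """Grow the Cholesky factor of G = A^T A in place when column a joins A.

    The new factor is [[L, 0], [w^T, d]] with L w = A^T a and d = sqrt(a.a - w.w):
    O(k*m + k^2) work instead of refactoring the (k+1) x (k+1) Gram matrix.
    """
    w = forward_sub(L, [dot(c, a) for c in cols])
    d2 = dot(a, a) - dot(w, w)
    if d2 <= tol:
        raise ValueError("column is numerically dependent on the active set")
    L.append(w + [math.sqrt(d2)])
    cols.append(a)


def solve_gram(L: LowerTri, b: Vector) -> Vector:
    """Solve (A^T A) x = b using the maintained factor: two triangular solves."""
    return backward_sub_t(L, forward_sub(L, b))


if __name__ == "__main__":
    L: LowerTri = []
    cols: List[Vector] = []
    for col in ([1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]):
        cholesky_append(L, cols, col)  # one column per active-set step
    y = [3.0, 5.0, 3.0]  # equals 1*a1 + 2*a2 + 3*a3
    print(solve_gram(L, [dot(c, y) for c in cols]))  # → [1.0, 2.0, 3.0]
```

Each added column costs one forward substitution plus k dot products. A full refactorization would redo O(k³) work at every step. Removing a column would need a separate downdate, which is omitted here.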