Summary: | Cameras are widely adopted for high image quality with the rapid advancement of complementary metal-oxide-semiconductor (CMOS) image sensors while offloading vision applications’ computation to the cloud. It raises concern for time-critical applications such as autonomous driving, surveillance, and defense systems since moving pixels from the sensor’s focal plane are expensive. This paper presents a hardware architecture for smart cameras that understands the salient regions from an image frame and then performs high-level inference computation for sensor-level information creation instead of transporting raw pixels. A visual attention-oriented computational strategy helps to filter a significant amount of redundant spatiotemporal data collected at the focal plane. A computationally expensive learning model is then applied to the interesting regions of the image. The hierarchical processing in the pixels’ data path demonstrates a bottom-up architecture with massive parallelism and gives high throughput by exploiting the large bandwidth available at the image source. We prototype the model in field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) for integrating with a pixel-parallel image sensor. The experiment results show that our approach achieves significant speedup while in certain conditions exhibits up to 45% more energy efficiency with the attention-oriented processing. Although there is an area overhead for inheriting attention-oriented processing, the achieved performance based on energy consumption, latency, and memory utilization overcomes that limitation.
|