A General-Purpose CMOS Vision Sensor with In-Pixel 5-bit Convolutional Layer Computation

Hardware accelerators for deep convolutional neural networks (CNNs) commonly reduce the bit depth of weights and input feature maps to decrease both circuit complexity and power consumption. Nevertheless, even when the input image is encoded with fewer bits, an analog-to-digital conversion is still required, and this conversion is one of the most energy-hungry stages of an image sensor datapath. In this work, a smart pixel capable of computing the first layer of a CNN is presented. In this architecture, the information captured by the sensor is fed directly into the CNN accelerator, which reduces power consumption and input error; the remaining error is limited only by the non-idealities of the analog processing circuitry. Programmability includes stride configuration and kernel size selection among 3×3, 5×5 and 7×7. The data reported in this paper are based on nominal simulations in a standard 180 nm CMOS technology.
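As an illustrative behavioral reference for the operation the pixel array is meant to compute, the following Python sketch evaluates one first-layer convolution with a selectable 3×3, 5×5 or 7×7 kernel and a configurable stride. It rests on several assumptions not stated in the abstract: that "5-bit" refers to signed integer weights in the range [-16, 15], that pixel values are ideal real numbers (no analog non-idealities), and that no zero padding is applied. The function names `conv2d_first_layer` and `quantize_weight` are hypothetical and do not describe the actual circuit.

```python
from typing import List, Sequence

Matrix = List[List[float]]

# Kernel sizes listed as selectable in the abstract.
ALLOWED_KERNELS = (3, 5, 7)


def quantize_weight(w: float, bits: int = 5) -> int:
    """Hypothetical helper: round a weight to a signed `bits`-bit integer code.

    Assumes a two's-complement range, i.e. [-16, 15] for 5 bits.
    Python's round() uses round-half-to-even.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, round(w)))


def conv2d_first_layer(image: Sequence[Sequence[float]],
                       kernel: Sequence[Sequence[float]],
                       stride: int = 1) -> Matrix:
    """Hypothetical reference model of one output feature map.

    Valid convolution (no padding) of an ideal image with a quantized
    square kernel, sampled every `stride` pixels in both directions.
    """
    k = len(kernel)
    if k not in ALLOWED_KERNELS or any(len(row) != k for row in kernel):
        raise ValueError("kernel must be 3x3, 5x5 or 7x7")
    if stride < 1:
        raise ValueError("stride must be >= 1")

    q = [[quantize_weight(w) for w in row] for row in kernel]
    h, w = len(image), len(image[0])
    out: Matrix = []
    # Slide the k x k window over the image with the configured stride.
    for r in range(0, h - k + 1, stride):
        row_out = []
        for c in range(0, w - k + 1, stride):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += q[i][j] * image[r + i][c + j]
            row_out.append(acc)
        out.append(row_out)
    return out
```

For example, a 5×5 image of ones convolved with a 3×3 kernel of ones at stride 2 yields a 2×2 output in which every entry is 9.0. Raising the stride from 1 to 2 reduces the number of window positions, and therefore multiply-accumulate operations, by roughly a factor of four in this model.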

Keywords: CMOS image sensors, convolutional neural networks (CNNs), hardware accelerator, focal-plane processing