A SAR Pipeline ADC Embedding Time Interleaved DAC Sharing for Ultra-low Power Camera Front Ends

The growing need for ultra-low power cameras for sensors, surveillance and consumer applications has resulted in significant advances in compressed domain data acquisition from pixel arrays. In this paper we present a novel 64-input Successive Approximation Register (SAR) Pipeline analog-to-digital converter (ADC) suitable for compressed domain data acquisition in camera front-ends. The proposed architecture features a time interleaved capacitive digital-to-analog converter (DAC) shared between column parallel ADCs for area savings (2.28X), and a shared amplifier stage for power savings (60%), achieving 4X throughput as compared to traditional architectures. Simulations on a 130nm foundry process show that the proposed SAR Pipeline ADC draws 31µW at 2MS/s, with a target Figure-of-Merit (FOM) of 87fJ/conv. step at Nyquist rate. The proposed compressive sensing front end achieves an energy per patch of 0.9nJ.


Introduction
Mobile devices for the Internet of Things (IoT) require CMOS image sensors (CIS) with low power and area [1]. Traditional CIS for wearable devices consume more than 50mW [2]. In a CMOS image sensor system the most power consuming blocks are the digital image processing back end and the column parallel ADCs [3] [4]. In most reported image sensors, column parallel ADCs draw 50-65% of the power of the entire image sensor signal acquisition chip [5] [1]. The power consumed by column parallel ADCs is proportional to the number of measurements the ADC performs, and therefore increases with the number of pixels. For next generation IoT devices such as "always on" camera based image sensors and human machine interface systems with built-in machine intelligence, low power is the key enabler.
Fig. 1 shows the traditional Nyquist domain signal processing chain. Pixel voltages are digitized using high speed column parallel ADCs. The digitized image is then encoded using algorithms such as the discrete cosine transform (DCT) or the discrete wavelet transform (DWT). The power budget for the transmitter blocks is shown in Fig. 2. We can observe that the encoding stage (DCT, DWT) consumes a significant amount of power, followed by analog to digital signal acquisition. As the resolution of the image goes up, the number of measurements per ADC goes up and hence the encoding power also increases. This places a severe power constraint on the acquisition device and the transmitter.

Recently developed compressive sensing (CS) algorithms promise to reduce the number of measurements, with non-linear recovery at the back-end [6]. The signal processing chain for compressive sensing is shown in Fig. 3. This approach simplifies the encoding done at the transmitter by completely eliminating power hungry blocks like DCT and DWT. If the pixel values in a camera are represented as a discrete time signal X of length n, the number of measurements needed in traditional column parallel ADCs is equal to n. Instead of n samples, CS needs only m linear measurements (m << n). Fig. 4 plots the PSNR of the image recovered at the receiver against the number of measurements. We can observe that 250 measurements are sufficient to achieve a PSNR of 30dB, which is sufficient for classifying objects [11]. Therefore the value of m can be as small as n/250. Compressive sensing thus achieves a significant reduction in encoding power and transmission bandwidth. The CS measurement is given by Eq. 1. Here Y[m] is the m-dimensional measured array and φ is a random binary matrix of size m × n whose entries are independent and identically distributed (IID). X is traditionally recovered at the back-end using an optimization algorithm, such as L1-norm minimization [6].
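As an illustration of the measurement model described above, the sketch below forms m linear measurements of an n-pixel patch with a random binary φ. It assumes that Eq. 1 has the usual form Y = φX. The patch values, the sizes n = 256 and m = 16, the seeds and the function names are hypothetical choices made only for this example.

```python
import random

def random_binary_matrix(m, n, seed=0):
    """Return an m x n matrix of IID 0/1 entries (an illustrative phi)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]

def cs_measure(phi, x):
    """Compute Y = phi * X: one dot product per row of phi."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

n, m = 256, 16                         # 16 x 16 patch, 16 measurements (assumed)
rng = random.Random(1)
x = [rng.random() for _ in range(n)]   # hypothetical normalized pixel values
phi = random_binary_matrix(m, n)
y = cs_measure(phi, x)
print(len(x), "pixels ->", len(y), "measurements")
```

Each entry of y is a single number that depends on many pixels at once, which is what allows the acquisition hardware to perform fewer conversions than there are pixels.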
In this paper we present a novel pipeline-SAR ADC architecture with capacitive DAC sharing that is capable of acquiring linear combinations of 64 pixel values in a single conversion cycle, making it suitable for such compressed domain data acquisition.

ADC Architectures for CS Image Acquisition
In prior work on obtaining compressed domain data, both analog and digital techniques have been used to perform compressive measurements on the raw data. Typically, analog implementations of compressed sensing still require an analog to digital converter for accuracy and digital transmission [7] [8]. The resistor based compressed sensing multiplexor reported in [9] suffers from static power dissipation, and its number of inputs (n) is limited, making it suitable for RF receiver applications only. To overcome some of the disadvantages of analog CS circuits, [10] proposed compression in the digital domain after analog to digital conversion. Fig. 5(a) shows the technique proposed in [10]. The entire analog signal is converted into the digital domain by high-speed ADCs and the CS encoder performs compression in the digital domain. This is primarily suited to low bandwidth applications like bio-medical signal processing. However, for a CIS of a typical 256 × 256 size, the ADCs would need to acquire all the samples and convert them to the digital domain. The number of measurements made by the ADC is not reduced, which defeats the purpose of compressed domain data acquisition, and the ADC power for image acquisition remains the same. Further, the size of the digital CS encoder grows exponentially with the number of inputs. CS encoders therefore add significant power on top of the ADC, making this approach infeasible for "always on" imaging front-end applications.

Fig. 6. Simultaneous compression and quantization within [5]
To overcome the limitations of data acquisition followed by compressed domain measurements, Oike et al. proposed a CS camera that performs simultaneous averaging and quantization of pixels using a Σ-∆ ADC [5]. Fig. 5(b) shows the schematic of the resetting Σ-∆ ADC used for such linear measurements. Pixel values are multiplied with random numbers (from the φ matrix) sequentially and passed to the input of the Σ-∆ ADC. This approach requires only m measurements; however, it requires n conversion cycles for each measurement. The architecture uses a 16 × 16 block for linear measurement, so for each measurement of the block the resetting Σ-∆ ADC needs 256 clock cycles, and for m measurements it needs m × 256 clock cycles. During this conversion period, all the high gain amplifiers remain on and consume power. Hence, to lower the total power dissipation, faster conversion with the opportunity for power gating once the conversion is complete is critical.
Once compressed domain data is acquired, the image is often used for online classification to detect potential trigger signals. For such in-situ classification [16] and trigger identification, 8 bits of input resolution are sufficient. We have plotted classification accuracy versus bit resolution for the MNIST database in Fig. 7. We can observe that recognition accuracy saturates beyond 6 bits of resolution. Further, it has been shown that for most machine learning applications moderate resolution (6-8 bits) is sufficient [1][11]. Fig. 8 shows the energy per conversion with respect to the signal to noise and distortion ratio (SNDR) for state of the art SAR, Pipeline and Σ-∆ ADCs. SNDR is related to the effective number of bits by ENOB = (SNDR − 1.76)/6.02 [19]. From this plot we can observe that SAR ADCs have the best FOM (of the order of 10-1000fJ/conv.) for moderate resolution (6-8 bits). Pipeline ADCs also have a competitive FOM for moderate and high speed applications. Since most image sensor speeds range from 1MS/s to 10MS/s and we are interested in 8b of resolution for in-situ image processing applications, we propose a SAR-Pipeline ADC which achieves ultra-low power and high area efficiency.
Fig. 8. Energy vs SNDR for state of the art reported SAR, Pipeline and Σ-∆ ADCs [15]
SAR ADCs are used in most low power applications since they consume ultra-low energy per conversion (Fig. 8). However, for portable image front-end applications with a resolution above 4-5 bits, a SAR ADC occupies a large area since the MSB capacitance grows as 2^N, and each of the many column parallel ADCs needs its own capacitance of 2^N. To alleviate this problem, two-stage SAR Pipeline ADCs have been proposed [20] [21]. A SAR-Pipeline ADC uses two SAR ADC stages and an amplifier which amplifies the residue generated by the Stage-1 SAR ADC (Fig. 9). Both SAR ADC stages operate in parallel and each stage has to resolve fewer bits, giving a shorter DAC settling time and a smaller capacitance (hence less area) than a traditional SAR ADC. Therefore, SAR-Pipeline ADCs can operate at much higher speeds with high area efficiency [21]. One inherent advantage of the SAR-Pipeline ADC is that the residue voltage of the Stage-1 SAR ADC is generated within its DAC after the conversion phase. This avoids the extra DAC and clock phase needed to generate the Stage-1 residue in traditional flash based pipelined ADCs [21].
Fig. 10 shows a previously reported multi-input SAR ADC used for compressed sensing (with 8 bit resolution). It uses charge sharing: the MSB capacitor is equally divided among the inputs, so the inputs are averaged after the sampling cycle. [17] demonstrates a 4 input CS SAR ADC for wireless applications. Measuring 256 inputs with this technique requires 2^8 + 2^4 = 272C of capacitance for an 8 bit ADC (C is the unit capacitor). One of the main limitations of this SAR ADC architecture for portable imaging applications is the area occupied by the sampling capacitors [18]. Dividing the MSB capacitors to accommodate 256 inputs requires 256 switches. For portable applications, the limited supply (≈ 1-1.3V) results in a high switch on-resistance (R_ON). With a minimum sized capacitor of 50fF, this gives a conversion time constant (τ_conv, the DAC settling time) of ≈ 220ns, which allows a maximum sampling frequency of 730kHz. Hence, for high speed cameras (with 30 frames/sec) this ADC architecture will not be able to meet latency requirements. Further, [17] uses calibration for capacitor mismatch, a requirement for more than 6 bits of resolution.
Fig. 11 shows the proposed SAR-Pipeline ADC with DAC sharing. We use a 4-bit ADC as the first stage. Since a 4-bit ADC has a total capacitance of 16C, the capacitors are divided into 16 equal units of value C and 16 inputs are applied. Three further instances of the same DAC are used to access an additional 48 inputs. Sampling is done in two phases. During the first sampling phase (S1), all 4 DACs sample 16 inputs each. During the second sampling phase (S2), charge is redistributed between them. The averaged voltages across the 4 DACs during the S1 phase are given by Eq. 2.
During the second sampling phase (S2), V_dac1 to V_dac4 are averaged. The final voltage across the DAC is therefore given by Eq. 3.
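The two sampling phases can be checked numerically with an idealized charge-sharing model. The sketch below assumes equal unit capacitors, no parasitics, and inputs that have already been multiplied by their φ entries. Under those assumptions, averaging 16 inputs per DAC in S1 and then the four DAC voltages in S2 equals 1/64 of the sum of all 64 weighted inputs. The function names and test voltages are illustrative and are not taken from the design.

```python
import random

def s1_average(inputs):
    """S1: 16 equal unit capacitors share charge, giving the mean of 16 inputs."""
    assert len(inputs) == 16
    return sum(inputs) / 16

def s2_average(v_dacs):
    """S2: four equal 16C DACs share charge, giving the mean of the DAC voltages."""
    assert len(v_dacs) == 4
    return sum(v_dacs) / 4

rng = random.Random(0)
phi = [rng.randint(0, 1) for _ in range(64)]          # hypothetical phi row
pixels = [rng.uniform(0.0, 1.0) for _ in range(64)]   # hypothetical pixel voltages
weighted = [p * x for p, x in zip(phi, pixels)]       # CS multiplexor outputs

v_dacs = [s1_average(weighted[16 * k:16 * (k + 1)]) for k in range(4)]
v_final = s2_average(v_dacs)
v_direct = sum(weighted) / 64                         # scaled dot product of phi and X
print(abs(v_final - v_direct) < 1e-12)
```

Because every capacitor has the same value, the two-step average carries no hidden weighting: each of the 64 inputs contributes equally to the voltage that Stage 1 converts.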
We can observe from Eq. 3 that the final accumulated output represents the dot-product of the input pixel vector X with the sampling matrix φ. φ can be random or programmed, so that both random and structured compressed measurements can be obtained. As soon as S2 is done, 3 DACs are shared with the neighboring column parallel ADCs. Once the conversion in the 4-bit SAR ADC is complete, we amplify the residue by 4x and pass it to a 5-bit fine ADC to resolve the LSBs. Ideally a gain of 16 is required for residue amplification. We use 1-bit digital redundancy in Stage 1 and half reference scaling for Stage 2 to reduce the gain requirement, which helps to reduce the power in the high-gain op-amp [20].
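A behavioral sketch can make the gain reduction concrete. The model below is one common way to arrange a 4-bit coarse stage, a gain of 4 and a 5-bit fine stage with a half-scale reference so that the two codes overlap by one bit (4 + 5 − 1 = 8 bits). The digital correction logic of the actual design is not described in the text, so the combination rule, the normalized reference and the `error` parameter are assumptions. This offset-free version only absorbs a Stage-1 decision that is one LSB too low.

```python
from fractions import Fraction

VREF = Fraction(1)   # normalized full scale (assumption)
N1, N2 = 4, 5        # stage resolutions stated in the text
GAIN = 4             # residue gain stated in the text

def stage1(vin, error=0):
    """Coarse 4-bit decision; `error` is a hypothetical decision error in LSBs."""
    lsb1 = VREF / 2**N1
    d1 = int(vin / lsb1) + error
    d1 = max(0, min(2**N1 - 1, d1))
    return d1, vin - d1 * lsb1       # code and residue left on the DAC

def stage2(vres):
    """Fine 5-bit decision against a half-scale reference (VREF / 2)."""
    lsb2 = (VREF / 2) / 2**N2
    return max(0, min(2**N2 - 1, int(vres / lsb2)))

def convert(vin, error=0):
    """Combine both stages with a 1-bit overlap: 4 + 5 - 1 = 8 bits."""
    d1, residue = stage1(vin, error)
    d2 = stage2(GAIN * residue)
    return d1 * 2**(N2 - 1) + d2

vin = Fraction(201, 512)             # mid-point of 8-bit code 100
print(convert(vin), convert(vin, error=-1))
```

With an ideal Stage 1 the fine stage uses only the lower half of its range. The upper half is the redundancy that lets a low coarse decision still produce the same 8-bit code, which is why a gain of 4 is enough here instead of 16.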
Since all the capacitors we use are identical and of value C, calibration is not required (more details in section III). As 3 DACs are shared with 4 ADCs, we need an additional capacitance of 12C. With this 12C of extra capacitance we can acquire linear measurements of 64 inputs in each conversion cycle. This DAC sharing method significantly improves area efficiency and enables simultaneous acquisition of multiple inputs. In this architecture, the conversion time constant (τ_conv) is determined by the 4-bit ADC settling time even though we are sampling 64 inputs. This makes the architecture suitable for high speed sensing with a large number of inputs. During the conversion period of ADC1, we share the 3 DACs with 3 of the neighboring ADCs: S3 to S8 are the sampling phases of ADC2 to ADC4, and they occur during the conversion period of ADC1. Pipelining allows the Stage-1 and Stage-2 conversion phases to overlap. 1-bit redundancy is added in the first stage to accommodate capacitor mismatch and the offsets of the comparator and amplifiers [20]. We also share the residue amplifier between two neighboring ADCs to reduce the total power [21]. A 10-bit accumulator is used to average 4 consecutive ADC output samples. The accumulator is reset after every 4 sampling cycles (of F_s). A sampler operating at a quarter of the sampling rate is used to capture the averaged output. The averaged output contains a random measurement of 256 inputs. Fig. 12 also shows the control logic used for the proposed CS front-end ADC architecture. A global reset (RST) is used to generate S1, S2 and the conversion phase for ADC1. The S2 phase of ADC1 is used to trigger the sampling phase of the neighboring column parallel ADC. This process continues for all 4 ADCs. The falling edge of the S2 phase triggers the conversion phase of each individual ADC.
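The accumulate-and-dump step described above can be sketched behaviorally. Four 8-bit codes sum to at most 4 × 255 = 1020, which is why a 10-bit register is wide enough. The class name, the example codes and the choice to return the raw sum (leaving the division by 4 to the reader of the register) are assumptions made for illustration.

```python
class Accumulator:
    """10-bit accumulate-and-dump register (behavioral sketch)."""
    WIDTH = 10

    def __init__(self, samples_per_dump=4):
        self.samples_per_dump = samples_per_dump
        self.total = 0
        self.count = 0

    def push(self, code):
        """Add one 8-bit ADC code; return the sum on every 4th sample, else None."""
        assert 0 <= code < 256
        self.total += code
        self.count += 1
        if self.count < self.samples_per_dump:
            return None
        result = self.total
        assert result < 2**self.WIDTH    # 4 * 255 = 1020 fits in 10 bits
        self.total, self.count = 0, 0    # reset after every 4 sampling cycles
        return result

acc = Accumulator()
codes = [255, 255, 255, 255, 10, 20, 30, 40]   # hypothetical ADC outputs
dumps = [s for s in (acc.push(c) for c in codes) if s is not None]
print(dumps, [s / 4 for s in dumps])
```

Each dumped value combines four conversions of 64 inputs, so one output at a quarter of the sampling rate corresponds to a measurement over 256 inputs.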

Design Components
In this section, the design details of the first and second stages of the ADC are discussed. The op-amp open loop gain (A_OL), unity gain frequency (f_u) and swing (V_p-p) targets based on the inter-stage gain are given in Table 1. The required values are derived from the gain error and gain bandwidth (GBW) requirements for the OTA-induced error to stay within 1/2 LSB of the ADC [21]. The worst case values across process corners are listed under the simulated values in the table. We can observe that the simulated gain and bandwidth across process corners are larger than the required values by a factor of two. Fig. 15 shows the telescopic cascode OTA used as the interstage amplifier. It is well suited to a two stage pipeline SAR ADC since the swing requirement is low and it has high gain bandwidth efficiency.

Stage 1 ADC and residue amplification
We use a pre-amplifier with output offset compensation to limit the offset of the Stage 1 SAR ADC. The residual offset (V_os,res) is given by Eq. 4.
Systematic variations have no effect on capacitor matching since all the capacitors in the Stage 1 SAR ADC are equal to C. The standard deviation of the capacitance mismatch for metal-insulator-metal (MiM) capacitors is given by Eq. 5.
where A_∆C/C is a process constant which is 1%·µm for the 0.13µm CMOS process [22], and W and L are the width and length of the capacitor. The minimum size allowed in 0.13µm is 5µm × 5µm. With a minimum sized capacitor, the σ_∆C/C obtained is 0.002.
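The 0.002 figure can be reproduced with a short calculation. The sketch assumes that Eq. 5 has the usual area-scaling form σ_∆C/C = A_∆C/C / sqrt(W·L), which is an assumption about the equation rather than something stated explicitly here. The function name is hypothetical.

```python
import math

def cap_mismatch_sigma(a_percent_um, width_um, length_um):
    """sigma(dC/C) = A / sqrt(W * L), with A given in %*um (assumed form of Eq. 5)."""
    return (a_percent_um / 100.0) / math.sqrt(width_um * length_um)

sigma = cap_mismatch_sigma(1.0, 5.0, 5.0)   # 1 %*um process constant, 5 um x 5 um
print(f"sigma(dC/C) = {sigma:.4f}")
```

Under this form, quadrupling the capacitor area would halve the mismatch, so the minimum-size unit capacitor is the worst case considered in the text.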
As per [23], the maximum allowable capacitor mismatch for a resolution of n bits is given by Eq. 6.
For n=9, (∆C/C)_max is close to 0.002. This shows that the residue generated by the first ADC will fall within 1/8 LSB of error. Hence the proposed architecture is robust to capacitor mismatch.

Simulation Results
The performance of the proposed SAR-Pipeline ADC is verified through design and simulation in a 0.13µm mixed-mode CMOS process. Fig. 17 shows the normalized output frequency spectrum of the proposed ADC for an input frequency (F_in) of 248.34kHz at a sampling rate (F_s) of 1MS/s. A 1024-point FFT shows an SNDR of 49.5dB, which is equivalent to an ENOB of 7.9. Fig. 18 shows the 64 inputs applied to the ADC at each sampling cycle. Each of the 64 inputs corresponds to a CS multiplexor output (the product of the input vector with a random number). Fig. 19 shows the ADC and accumulator outputs at each conversion cycle. For a particular case study, as shown in the figure, ideal averaging without quantization results in an output of 270.11mV. The proposed ADC, after accumulating 4 samples, provides an output of 269.53mV, an error of less than 1 LSB.
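The ENOB quoted for the 49.5dB SNDR follows from the standard relation ENOB = (SNDR − 1.76)/6.02. The helper names below are hypothetical; the sketch simply evaluates that relation in both directions.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits for a given SNDR in dB."""
    return (sndr_db - 1.76) / 6.02

def sndr_from_enob(enob):
    """Inverse relation: SNDR in dB for a given ENOB."""
    return 6.02 * enob + 1.76

print(round(enob_from_sndr(49.5), 2))
```

The same relation can be used to translate an ENOB reported at another input frequency back into an SNDR for comparison against the spectrum plots.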
Fig. 20 shows the SNDR of the proposed ADC over the input frequency range of 0.2MHz to 0.98MHz. The ENOB at Nyquist frequency is 7.56, which corresponds to a Walden FOM [19] of 85fJ/conv. step. The power budget for the proposed ADC is given in Table 2. The energy per patch is given by Eq. 7.
$$E_{\text{Patch}} = \frac{P \cdot N_c}{F_s} \tag{7}$$
where P is the power drawn by the ADC for each conversion, N_c is the number of conversion cycles and F_s is the sampling frequency. The energy per patch for the proposed design is 0.9nJ.
Multiple techniques are proposed to achieve high throughput in column parallel ADCs used for image sensors. The time interleaved DAC sharing technique reduces the number of measurements required by a factor of 4. Sharing the amplifier between neighboring column parallel ADCs reduces the power by 64%. The proposed architecture can be used for wearable devices with ultra-low power requirements. Our design and simulation results show 87fJ/conv. step with an average power of 31µW.

Fig. 14 shows Stage 1 of the proposed SAR-Pipeline ADC. 64 inputs are acquired during S1 and S2. The residue is fed into an amplifier with a gain of 4. Stage

Fig. 21 shows the DNL and INL of the proposed ADC across 256 digital codes. The worst case DNL is within 0.4LSB. The INL is within 1LSB across all digital codes.

Fig. 20. Simulation result of SNDR vs input frequency at Fs=2MHz
Even though the total power consumed from the supply is 50µW, the amplifier is shared between two ADCs, so the power for an individual ADC is 31µW. The number of conversion cycles required for a 16 × 16 patch with a compression ratio of 16 is (16 × 16 × 16)/64 = 64. The power numbers in Table 2 are with respect to a patch size of 16 × 16 and a compression ratio (CR) of 16.
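The conversion count and the energy per patch can be cross-checked with a few lines of arithmetic. The sketch reads Eq. 7 as E = P · N_c / F_s and interprets the cycle count as m = n/CR measurements, each needing n/64 conversions. Both readings are assumptions, as are the use of the 31µW per-ADC power and a 2MS/s sampling rate as inputs. The function names are hypothetical.

```python
def conversion_cycles(n_pixels, compression_ratio, inputs_per_conversion=64):
    """Conversions per patch: (n / CR) measurements, each spanning all n pixels."""
    measurements = n_pixels // compression_ratio
    return measurements * n_pixels // inputs_per_conversion

def energy_per_patch(power_w, n_cycles, fs_hz):
    """Assumed reading of Eq. 7: E = P * N_c / F_s, in joules."""
    return power_w * n_cycles / fs_hz

n_c = conversion_cycles(16 * 16, 16)        # 16 x 16 patch, CR = 16
e = energy_per_patch(31e-6, n_c, 2e6)       # 31 uW per ADC at 2 MS/s (assumed inputs)
print(n_c, f"{e * 1e9:.2f} nJ")
```

With these inputs the result is close to 1nJ, the same order as the 0.9nJ energy per patch reported for the design.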

Table 2. Power and capacitance contribution from individual blocks

Comparison with Reported Works
Table 3 shows the comparison of the proposed design with state of the art CS architectures. The proposed design is scalable and can handle a large number of inputs at the same time. Due to the parallelism achieved by sharing DACs between column parallel ADCs, high energy efficiency per patch is achieved.

Table 3. Comparison with reported works