sensors - MDPI

sensors Article

A CMOS SPAD Imager with Collision Detection and 128 Dynamically Reallocating TDCs for Single-Photon Counting and 3D Time-of-Flight Imaging

Chao Zhang 1,*,†, Scott Lindner 2,3,†, Ivan Michel Antolovic 3, Martin Wolf 2 and Edoardo Charbon 3,4

1 Quantum and Computer Engineering, Delft University of Technology, Mekelweg 4, 2628CD Delft, The Netherlands
2 Biomedical Optics Research Laboratory, University of Zurich, Rämistrasse 71, 8006 Zürich, Switzerland; [email protected] (S.L.); [email protected] (M.W.)
3 Advanced Quantum Architecture Laboratory, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015 Lausanne, Switzerland; [email protected] (I.M.A.); [email protected] (E.C.)
4 Kavli Institute of Nanoscience, 2628CJ Delft, The Netherlands
* Correspondence: [email protected]; Tel.: +31-1527-83663
† These authors contributed equally to this work.

Received: 5 October 2018; Accepted: 15 November 2018; Published: 17 November 2018

 

Abstract: Per-pixel time-to-digital converter (TDC) architectures have been exploited by single-photon avalanche diode (SPAD) sensors to achieve high photon throughput, but at the expense of fill factor, pixel pitch and readout efficiency. In contrast, TDC sharing architectures usually feature a high fill factor at a small pixel pitch and energy-efficient event-driven readout. Their photon throughput is not necessarily lower than that of per-pixel TDC architectures, since throughput is determined not only by the number of TDCs but also by the readout bandwidth. In this paper, a SPAD sensor with 32 × 32 pixels fabricated in a 180 nm CMOS image sensor technology is presented, in which dynamically reallocating TDCs were implemented to achieve the same photon throughput as that of per-pixel TDCs. Every 32 pixels share 4 TDCs via a collision detection bus, which enables a fill factor of 28% with a pixel pitch of 28.5 µm. The TDCs were characterized, obtaining peak-to-peak differential and integral non-linearity of −0.07/+0.08 LSB and −0.38/+0.75 LSB, respectively. The sensor was demonstrated in a scanning light-detection-and-ranging (LiDAR) system equipped with an ultra-low power laser, achieving depth imaging up to 10 m at 6 frames/s with a resolution of 64 × 64 under 50 lux background light.

Keywords: single-photon avalanche diode; SPAD; time-of-flight; dynamic reallocation; time-to-digital converter; collision detection bus; image sensor; light detection and ranging; LiDAR

Sensors 2018, 18, 4016; doi:10.3390/s18114016; www.mdpi.com/journal/sensors

1. Introduction

The demand for 3D imaging systems is growing rapidly, with applications such as facial recognition, robotics, bioimaging, and LiDAR. One of the widely used ranging approaches is triangulation, which, in current implementations, is limited in range due to the base-line parameters [1]. The time-of-flight (TOF) technique, on the other hand, measures the delay between the emitted and backscattered light directly (dTOF) or indirectly (iTOF) [2–10]. iTOF relies on per-pixel photon demodulators, which achieve high resolution but only over relatively short ranges, usually within 20 m [2–4]. dTOF, in contrast, has some key advantages over iTOF, e.g., longer range and multi-echo detection [5–10]. Implementing dTOF requires resolving the arrival time of reflected photons, which in turn requires photodetectors with a high gain. Single-photon avalanche diodes (SPADs), with the ability to produce a digital pulse from a single detected photon and an excellent timing response, have been designed for various applications, including TOF imaging. Given this single-photon sensitivity, each SPAD can be coupled with a TDC, so that large-format pixel arrays can be designed to perform high spatial resolution 3D imaging [5,11,12]. However, with this per-pixel TDC architecture, the fill factor is limited by the large area occupied by the TDC circuitry, which reduces the photon detection efficiency. For instance, Ref. [11] reports a large SPAD array (160 × 128) with a low fill factor (1%) at a large pixel pitch of 50 µm. In addition, in the readout stage, to avoid transmitting the pixel addresses, all the TDCs are output sequentially regardless of data validity, which reduces the readout efficiency in both time and energy because null events are communicated. An event-driven readout method has been presented in [12], which skips the null events by pre-processing the data with a high-speed pipeline before readout. However, the total output bandwidth of 42 Gbps was achieved at the cost of an average power of 8.79 W, which meant the system required liquid cooling. In fact, due to the digitization of photons, a large volume of data can be generated and then read off-chip via bandwidth-limited IOs. In most situations, the IO bandwidth, rather than the number of TDCs, is the major limitation to the photon throughput [5,8,11–13]. Therefore, to avoid the distortion due to photon pileup, the pixel activity must be restricted to 1–3%. In this case, only a small proportion of the TDCs can be triggered, which implies that the per-pixel TDC architecture is not the optimal option in terms of readout efficiency.
In this paper, we report on the design and characterization of a SPAD sensor featuring a TDC sharing architecture that performs dTOF imaging for low light level applications, e.g., fluorescence lifetime imaging microscopy (FLIM) and indoor facial recognition. It features an array of 32 × 32 pixels, where the 32 pixels in each column share 4 TDCs via a timing line. A collision detection bus is used to detect two or more simultaneous SPAD events. TDCs are shared in a dynamic reallocation scheme to detect events sequentially. The number of TDCs is determined from an analysis of the IO readout bandwidth so as to achieve the same photon throughput as that of a per-pixel TDC architecture. The readout efficiency is improved with an event-driven readout method in which only valid events are transmitted off-chip, thus enabling a photon throughput of 222 million counts per second (Mcps) and 465 Mcps in photon timestamping and photon counting modes, respectively. The sensor was first demonstrated with flash imaging that achieved millimetric depth resolution. To extend the spatial resolution, the sensor was measured and characterized in a 2D scanning LiDAR system, equipped with a dual-axis galvanometer scanner and a low power 637 nm diode laser (with average and peak power of 2 mW and 500 mW, respectively). In this setup, all the pixels were combined as one component for TCSPC detection. Instead of transmitting every event to the computer, the histogram of each point was constructed in the field-programmable gate array (FPGA) and transmitted through a universal serial bus 3 (USB3) link for final distance calculation and image display, thus reducing the required transmission bandwidth. Real-time and accurate range images were obtained with a resolution of 64 × 64 pixels, at 6 frames/s, and within a range of 10 m on a 40% reflectivity target. Distance measurements up to 50 m revealed 6.9 cm non-linearity and 0.62 cm precision.
Meanwhile, to improve background light suppression, a new sensor architecture based on the collision detection bus is proposed. Coincidence detection circuits with smaller logic could be achieved, enabling a large pixel array for high background light applications such as automotive LiDAR.

The paper is organized as follows: the sensor architecture is described in detail in Section 2. Section 3 shows the sensor characterization and the experimental results. The proposed sensor architecture is discussed in Section 4. Finally, conclusions are drawn in Section 5.


2. Sensor Design

2.1. Sensor Architecture

The block diagram of the sensor is illustrated in Figure 1. The pixel, collision detection bus and address latching mechanism are also applied in [14], but with a different readout scheme and TDC architecture. The pixel array is connected to the timing circuitry with a shared bus architecture [15,16]. In this instance, all 32 pixels in each column share a single address bus and a timing line. The pixel addresses are coded in a collision detection approach where collision events lead to an invalid address output, thus allowing collisions to be identified. Due to the TDC sharing architecture, each event occupies the bus for a set period, the bus dead time. To reduce this duration, a monostable circuit is included in the pixel structure. Furthermore, the shared architecture also implies that very noisy SPADs could occupy the bus for long periods, thus reducing the sensitivity of the column to real photon arrivals. Therefore, a set of row and column masking shift registers was implemented to shut down noisy pixels according to the DCR level.

Figure 1. Image sensor architecture. [Block diagram: 32 × 32 SPAD array (SPAD[0] to SPAD[31] per column) with row and column masking; per-column ADDR[0:6], TIMING and VPU lines; 32 × 4 ALT bank (Addr_Latch_1 to Addr_Latch_4 with TDC1 to TDC4, enabled by EN1 to EN4) and ALT selector; 32 readout channels, each with FIFO, serializer and counting mode enable; 32 × 160 MHz output pads.]

At the bottom of the columns, a bank of 128 address-latch-and-TDC (ALTDC) functional blocks is used to capture the pixel address and to measure the photon arrival time. Instead of a fixed pixel-to-TDC connection architecture, a dynamically reallocating approach was implemented where events are distributed sequentially among 4 TDCs to improve the detection throughput. The TDCs employ an architecture based on a ring oscillator, the frequency of which is set via an external voltage. In this case, the TDC has a 12-bit range, where the ring oscillator is operated at 2.5 GHz to achieve a least-significant bit (LSB) of 50 ps.

Time-of-flight data is read from each column with a dedicated readout block, which serializes data and streams it off-chip via a column-wise 160 MHz data pad. Each block works in an event-driven readout approach such that only the ALTDCs that have detected photons are read out through a tri-state bus. The data is first pushed into a first-in-first-out (FIFO) block and then a serializer reads the events out in UART format. Compared with the frame-based readout method in [11], which reads all the TDCs regardless of data validity, and the power-hungry pipelined datapath readout method in [12], a higher energy efficiency is achieved with our approach. The readout can operate in either photon timestamping (PT) or photon counting (PC) mode. In PT mode, both the TOF information and the pixel address are read out from the sensor. A transmitted event comprises 23 bits, including a 1-bit start flag, 2-bit TDC identification number, 12-bit TDC code, 7-bit address code, and 1-bit stop flag. In PC mode, the sensor only transmits a 1-bit start flag, 7-bit address code, and 1-bit stop flag, so that the data length is reduced to 11 bits. As such, a maximum photon throughput of 222 Mcps and 465 Mcps can be achieved in PT and PC mode, respectively.
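The quoted throughput figures follow from the pad count, pad rate and event length. The sketch below reproduces that arithmetic and shows one possible packing of a PT-mode event. The field order (start, TDC ID, TDC code, address, stop, most significant bit first) follows the order in which the fields are listed in the text, but the actual bit order on the wire and the flag polarity are assumptions, as are all function names.

```python
# Illustrative sketch only; bit order and start/stop polarity are assumptions.
N_COLUMNS = 32        # one output pad per column (from the text)
PAD_RATE_HZ = 160e6   # each pad runs at 160 MHz (from the text)
PT_EVENT_BITS = 23    # event length in photon timestamping mode (from the text)
PC_EVENT_BITS = 11    # event length in photon counting mode (from the text)


def max_throughput_cps(event_bits):
    """Upper bound on events/s if every pad streams events back to back."""
    return N_COLUMNS * PAD_RATE_HZ / event_bits


def pack_pt_event(tdc_id, tdc_code, addr):
    """Pack one PT event into a 23-bit word (assumed field order, MSB first)."""
    assert 0 <= tdc_id < 4 and 0 <= tdc_code < 4096 and 0 <= addr < 128
    start, stop = 0, 1                 # assumed UART-like polarity
    word = start
    word = (word << 2) | tdc_id        # 2-bit TDC identification number
    word = (word << 12) | tdc_code     # 12-bit TDC code
    word = (word << 7) | addr          # 7-bit address code
    word = (word << 1) | stop          # 1-bit stop flag
    return word


def unpack_pt_event(word):
    """Inverse of pack_pt_event; returns (tdc_id, tdc_code, addr)."""
    addr = (word >> 1) & 0x7F
    tdc_code = (word >> 8) & 0xFFF
    tdc_id = (word >> 20) & 0x3
    return tdc_id, tdc_code, addr


if __name__ == "__main__":
    print(f"PT: {max_throughput_cps(PT_EVENT_BITS) / 1e6:.1f} Mcps")
    print(f"PC: {max_throughput_cps(PC_EVENT_BITS) / 1e6:.1f} Mcps")
```

Note that the three PC-mode fields listed in the text add up to 9 bits, while the stated event length is 11 bits; the sketch uses the stated 11-bit length, which is the value consistent with 465 Mcps.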

2.2. Pixel Schematic and Collision Detection Bus

The sensor employs a SPAD with a p-i-n structure reported in [17]. In order to achieve both high PDP and fill factor, a cascoded quenching circuit, Figure 2, is used to allow the SPAD to operate at excess bias voltages up to 5.2 V without exceeding the 3.6 V reliability limit across the gate-source, gate-drain and drain-source junctions of any device [18]. Since this technique only uses transistors, the layout is very dense, achieving an overall fill factor of 28% with a pixel pitch of 28.5 µm.

Figure 2. Pixel schematic with cascoded quenching. [Schematic: SPAD biased at VOP = VBD + VEX with anode node VA; transistors M1 to M12; control voltages VQ, VM, VCLAMP and MASK; clamp diode D1; output to the bus lines.]

In the pixel, passive quenching and recharge are controlled by voltage VQ, which is typically biased at 0.8 V, leading to a 50 ns dead time. Noisy pixels are disabled with transistors M3, M4 and M6. If MASK is set low, M3 operates in the cut-off region and its impedance is typically in the gigaohm range, thus preventing the SPAD from recharging. However, the leakage current from the SPAD may accumulate at the anode and raise the voltage VA above the tolerable limit. Over time this could cause M1 to break down. To ensure the safety of the pixel, a parallel transistor M4, with its gate biased at 0.2 V, is used to provide a lower impedance path that drains the leakage current and prevents VA from increasing. Furthermore, a diode D1 clamps VX at a safe voltage VCLAMP, normally 1.8 V, to protect M3 and M4 from high voltage in any case. A configurable monostable circuit comprising M9, M10, M11 and a NOR gate was implemented to reduce the pulse duration. Post-layout simulations indicate that pulse widths in the range 0.4–5.5 ns can be achieved through adjustment of VM. As such, the column is only occupied by one firing pixel for a short time, which allows photons from multiple pixels to be detected during the same cycle.

The collision detection bus was implemented in each column to share the address lines between 32 pixels, which enables event collision detection when two or more pixels fire simultaneously [15]. The diagram of the bus is shown in Figure 1, where the 7-bit address lines ADDR[0:6] are connected to all the ALTDCs for address latching, and the TIMING line is shared by 4 TDCs for triggering the start of conversion. Collision detection is achieved by implementing the pixel address in a winner-take-all (WTA) circuit such that each code has three '1's and four '0's, as shown in Table 1. Since each pixel pulls down a different combination of address lines, if two or more pixels fire within the same pull-down period, an invalid address with more than three '1's is generated and can be distinguished.

Table 1. Address code table for 32 pixels.

SPAD  ADDR      SPAD  ADDR      SPAD  ADDR      SPAD  ADDR
0     1110000   8     1000011   16    0011001   24    0001110
1     1100100   9     1000101   17    0011010   25    0101100
2     1100001   10    1000110   18    0010011   26    0101001
3     1100010   11    1010100   19    0010101   27    0101010
4     1101000   12    1010001   20    0010110   28    0100011
5     1001100   13    1010010   21    0000111   29    0100101
6     1001001   14    1011000   22    0001011   30    0100110
7     1001010   15    0011100   23    0001101   31    0110100
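The constant-weight coding can be illustrated with a short behavioural sketch. It copies the codes from Table 1 and models the shared bus, as an assumption, as the bitwise OR of the codes of all pixels firing in the same pull-down period; this matches the statement that a collision produces more than three '1's, but it abstracts away the electrical polarity of the lines. The function names are hypothetical.

```python
# Address codes from Table 1; list index = SPAD number within a column.
CODES = [
    "1110000", "1100100", "1100001", "1100010", "1101000", "1001100", "1001001", "1001010",
    "1000011", "1000101", "1000110", "1010100", "1010001", "1010010", "1011000", "0011100",
    "0011001", "0011010", "0010011", "0010101", "0010110", "0000111", "0001011", "0001101",
    "0001110", "0101100", "0101001", "0101010", "0100011", "0100101", "0100110", "0110100",
]

# Lookup from a valid 7-bit bus value back to the SPAD number.
DECODE = {int(code, 2): pixel for pixel, code in enumerate(CODES)}


def bus_value(firing_pixels):
    """Assumed bus model: bitwise OR of the codes of all pixels firing together."""
    value = 0
    for pixel in firing_pixels:
        value |= int(CODES[pixel], 2)
    return value


def decode(value):
    """Return the SPAD number for a valid code, or None for idle bus or collision."""
    if bin(value).count("1") != 3:   # valid codes have exactly three '1's
        return None
    return DECODE.get(value)


if __name__ == "__main__":
    print(decode(bus_value([5])))       # single pixel: decodes to its own index
    print(decode(bus_value([5, 17])))   # two pixels: weight exceeds three, rejected
```

Because any two distinct codes of weight three differ in at least one '1' position, their OR always has at least four '1's, so every two-pixel collision is flagged in this model.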

2.3. Dynamic Reallocation of Time-to-Digital Converters

Since the detected photons have to be read off-chip for processing, the IO bandwidth determines the maximum photon throughput, and hence the number of photons that can be read out in each cycle. In this case, a per-pixel TDC architecture exhibits low efficiency in both power consumption and area occupancy, due to the sparse photon detection [13]. To improve the fill factor, a TDC sharing scheme is employed in this design instead of a per-pixel TDC architecture. Nevertheless, the TDC bank is sized to achieve the same detection efficiency as that of a per-pixel TDC architecture. Assuming the activity of each pixel is statistically independent, the incident light can be modeled with a Poisson distribution, given by Equation (1):

$$P_N(k) = \frac{N_{\mathrm{column}}^{\,k}\, \exp(-N_{\mathrm{column}})}{k!} \qquad (1)$$

where PN(k) is the probability that k events are detected in the column, and Ncolumn represents the average number of events in one column in one detection cycle. As each column is read out via a GPIO at 160 MHz and the event data length is 23 bits, an Ncolumn of 0.17, 0.34, 0.69 and 1.39 is obtained at 40, 20, 10 and 5 MHz illumination frequency, respectively, which covers the complete TDC dynamic range. The probability distribution and cumulative distribution of PN(k) are shown in Figure 3. We can see that more than 95% of the events can be detected with only three TDCs per column in all the cases.

Figure 3. (a) Poisson distribution and (b) cumulative distribution of PN(k) at column activity of 0.17, 0.34, 0.69 and 1.39.
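The column event rates and the Poisson probabilities of Equation (1) can be checked numerically. The sketch below derives Ncolumn from the 160 MHz pad rate and the 23-bit event length, then evaluates the probability of seeing at most three events per cycle. Function names are hypothetical, and the interpretation of Ncolumn as pad rate divided by event length and illumination frequency is inferred from the quoted values.

```python
import math

PAD_RATE_HZ = 160e6   # per-column output pad rate (from the text)
EVENT_BITS = 23       # PT-mode event length (from the text)


def events_per_cycle(laser_rate_hz):
    """Average events one pad can carry per illumination cycle (N_column)."""
    return PAD_RATE_HZ / EVENT_BITS / laser_rate_hz


def poisson_pmf(k, mean):
    """Equation (1): probability of exactly k events for a given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def poisson_cdf(k_max, mean):
    """Probability of at most k_max events."""
    return sum(poisson_pmf(k, mean) for k in range(k_max + 1))


if __name__ == "__main__":
    for rate_mhz in (40, 20, 10, 5):
        n_col = events_per_cycle(rate_mhz * 1e6)
        print(f"{rate_mhz:>2} MHz: N_column = {n_col:.2f}, "
              f"P(k <= 3) = {poisson_cdf(3, n_col):.4f}")
```

With these inputs the cumulative probability for k ≤ 3 is above 99% for the three lower activities and comes out close to 95% at the highest activity of about 1.39 events per cycle.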

TDC sharing architectures have been implemented in some works [16], where one TDC is shared or multiplexed among a set of pixels. This limits the photon throughput because one firing pixel occupies the TDC and prevents the other pixels in the same cluster from detecting photons. In this paper, we propose a new TDC sharing architecture that dynamically reallocates 4 ALTDCs in one column for address latching and TOF measurement.

The block diagram of the ALT is shown in Figure 4. The idea is to connect the ALTDCs in a daisy chain and to enable each ALTDC sequentially. At any time only one ALTDC is enabled for address latching and TOF detection; each ALTDC is enabled by the previous block and then reset by the subsequent block after data readout. As such, the ALTDCs are enabled and reset in sequence, driven by the column photon detection. TDC conversion is stopped by the signal STOP, which is shared by the whole TDC array and synchronized with the laser clock. However, to prevent the entire ALTDC chain from being reset by the detection of four events in one cycle, one ALTDC is always kept inactive, which limits the maximum number of photons that can be detected in one cycle to 3.

Figure 4. ALTDC daisy chain block diagram. [Block diagram: ALTDC1 to ALTDC4 chained through ALT_EN and ALT_RESb signals, with a shared STOP signal.]
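A purely behavioural sketch of the daisy-chain allocation may help picture the scheme. It assumes that every ALTDC holding an event is read out and reset before the next detection cycle starts, which is a simplification; the class and method names are hypothetical and do not correspond to signals in the design.

```python
class AltdcChain:
    """Behavioural model of one column's ALTDC daisy chain (simplified)."""

    def __init__(self, n_altdc=4):
        self.n = n_altdc
        self.enabled = 0                # index of the ALTDC currently enabled
        self.busy = [False] * n_altdc   # True while an ALTDC holds an unread event
        self.events = []                # (altdc index, address, timestamp)

    def detect(self, address, timestamp):
        """Latch one event; return the ALTDC index used, or None if it is lost."""
        # One ALTDC is always kept inactive, so at most n - 1 events fit per cycle.
        if sum(self.busy) >= self.n - 1:
            return None
        used = self.enabled
        self.busy[used] = True
        self.events.append((used, address, timestamp))
        self.enabled = (used + 1) % self.n   # the enable passes to the next block
        return used

    def readout(self):
        """End of cycle: return the latched events and reset the used ALTDCs.

        Assumes all busy ALTDCs are read before the next cycle begins.
        """
        events, self.events = self.events, []
        self.busy = [False] * self.n
        return events


if __name__ == "__main__":
    chain = AltdcChain()
    print([chain.detect(addr, t) for addr, t in [(3, 10), (7, 42), (1, 99), (9, 120)]])
    print(chain.readout())
```

In this model the fourth event of a cycle is dropped, and the next cycle starts at the ALTDC that was left inactive, so the allocation rotates through the chain rather than always starting from the first block.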

A simplified ALTDC schematic is shown in Figure 5; the block is enabled by ALT_EN. The load capacitances of the ADDR and TIMING lines differ, mainly because of the different WTA circuit connection patterns of the pixels, so the propagation delays on these lines exhibit a certain skew. If the event addresses were latched synchronously at the rising edge of the TIMING signal, incorrect addresses could be captured due to this skew and insufficient flip-flop setup time. Therefore, prior to ADDR latching, the addresses are first captured with a set of fast dynamic logic in synchronization with the rising edge of the TIMING signal, where the correct addresses can be captured as long as the time skew is smaller than the pulse width. After that, the dynamic logic outputs are latched at the falling edge of the TIMING signal. With this method, the timing margin is extended to the entire pulse width, which enables shorter pulses to be detected on the bus, thus reducing the bus dead time.

Figure 5. ALT schematic with the main functionality circuit and interface to the readout block. [Schematic: address latches (ADDR to ADDR_L, 7 bits), TDC (12 bits) with TIMING, STOP, ALT_EN and ALT_RSTb inputs, EOC and reset circuit, VALID, Tri_EN tri-state outputs O_ADDR and O_TDC, ALTDC selector, FIFO and serializer clocked by SYS_CLK and IO_CLK.]


The ALTDC operation timing diagram associated with photon detection is shown in Figure 6. The ALTDC is initialized after the global reset, T0. When one pixel detects a photon, T1 in Figure 6, it generates a short pulse on the TIMING line, and the rising edge of the pulse triggers the TDC conversion. At the same time, ALT_EN deasserts the reset of the dynamic logic, which enables capturing the address on the ADDR bus. At the falling edge of TIMING, T2, ALT_EN rises to logic high, which (1) enables ALTDC for photon detection; (2) latches the address to ADDR_L; (3) triggers the VALID signal to begin the event-driven readout process. At the end of the cycle, T3, the TDC is stopped by the rising edge of STOP, and the signal EOC is generated to indicate the availability of events, latching the address and TDC value into registers for readout. The readout block is synchronized with the clock SYS_CLK, which is phase aligned with STOP to make sure the EOC signal can be sampled correctly. At the rising edge of SYS_CLK, T4, depending on EOC, Tri_EN is asserted to enable the ALTDC to read out through two shared tri-state buses, O_ADDR and O_TDC. Meanwhile, ALT_RSTb is asserted to reset TDC, EOC and ALT_EN. They are released from reset at the next rising edge of SYS_CLK, T5, and the data on the O_ADDR and O_TDC buses are registered into the FIFO for serial output.

The minimum time interval between photons that can be latched is limited by two factors: the ADDR/TIMING pulse width and the propagation delay of the ALTDC chain. Because of the load capacitance mismatch between the TIMING and ADDR buses, pulse skew and non-uniformity make it difficult to latch correct addresses when using short pulses. A minimum photon interval of 1.2 ns is obtained from post-layout simulation. To improve the readout efficiency, an event-driven readout method was implemented, where only the ALTDCs that detect photons are read out. Unlike typical per-pixel TDC architectures [11], no power is dissipated communicating null events, in which no photon impinges. In addition, this approach shows excellent scalability: a daisy chain of any length can be built by simply cascading ALTDCs, without changing any signals except Tri_EN and EOC for the readout, which reduces the complexity of building larger arrays in bus sharing architectures.

Figure 6. ALTDC timing diagram associated with photon detection by ALTDC.
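The event-driven readout described in the text can be illustrated with a small behavioural model. This is only a sketch under assumptions: the `Altdc` class, its field names and the `readout_cycle` function are hypothetical software stand-ins for the hardware signals EOC, ADDR_L, Tri_EN and ALT_RSTb, and the model ignores all timing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Altdc:
    """Toy model of one ALTDC in the daisy chain (illustrative names only)."""
    eoc: bool = False   # end-of-conversion flag, set when an event was latched
    addr_l: int = 0     # latched pixel address (ADDR_L)
    tdc: int = 0        # latched TDC code


def readout_cycle(chain: List[Altdc]) -> List[Tuple[int, int]]:
    """Read out only the ALTDCs whose EOC flag is set, then reset them."""
    fifo = []
    for slot in chain:
        if slot.eoc:
            # Corresponds to asserting Tri_EN for this ALTDC only.
            fifo.append((slot.addr_l, slot.tdc))
            # Corresponds to ALT_RSTb clearing the slot for the next cycle.
            slot.eoc = False
            slot.addr_l = 0
            slot.tdc = 0
    return fifo


# One event (address 15, code 100) in a chain of four ALTDCs.
chain = [Altdc(), Altdc(eoc=True, addr_l=15, tdc=100), Altdc(), Altdc()]
events = readout_cycle(chain)
```

ALTDCs without an event contribute nothing to the output, which is the property the text credits for the power saving; a second call on the same chain returns an empty list.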

The TDC is based on a differential four-stage ring oscillator (RO), shown in Figure 7. Synchronizers were designed to reduce the metastability when the asynchronous signals TIMING and BUSY switch from '0' to '1' [15]. A thick-oxide NMOS transistor M1 is used to regulate the voltage supply for the ring oscillator to mitigate against frequency variations due to IR drops in the ALTDC array. A 9-bit counter connected to the RO clock operates at 2.5 GHz, which provides a coarse resolution of 400 ps. A phase discriminator resolves the 8-bit thermometer coded phases and converts them to a 3-bit binary code, leading to a fine resolution of 50 ps. The 128 column TDCs sharing one common control voltage VBIAS are externally biased, where process-voltage-temperature (PVT) compensation can be implemented off chip via an on-chip replica RO.
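The relationship between the coarse counter, the fine phase code and the measured time can be sketched as follows. The bit packing (coarse count in the upper 9 bits, fine code in the lower 3 bits) is an assumption made for illustration; the text gives only the bit widths and resolutions.

```python
COARSE_BITS = 9       # counter clocked by the RO at 2.5 GHz
FINE_BITS = 3         # decoded from the 8 RO phases
COARSE_LSB_PS = 400   # one RO period
FINE_LSB_PS = COARSE_LSB_PS // (1 << FINE_BITS)   # 400 ps / 8 = 50 ps


def tdc_code(coarse: int, fine: int) -> int:
    """Combine coarse and fine fields into one code (assumed packing)."""
    if not 0 <= coarse < (1 << COARSE_BITS):
        raise ValueError("coarse count out of range")
    if not 0 <= fine < (1 << FINE_BITS):
        raise ValueError("fine code out of range")
    return (coarse << FINE_BITS) | fine


def code_to_ps(code: int) -> int:
    """Convert a combined code to time, using the nominal 50 ps LSB."""
    return code * FINE_LSB_PS


# Nominal full-scale range of the 12-bit code.
full_scale_ps = (1 << (COARSE_BITS + FINE_BITS)) * FINE_LSB_PS

example_ps = code_to_ps(tdc_code(coarse=2, fine=3))   # 2 * 400 ps + 3 * 50 ps
```

With these nominal values, a 12-bit code spans 204.8 ns, which comfortably covers the 50 ns period of the 20 MHz STOP signal used in the characterization.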


Figure 7. TDC schematic based on a four-stage differential ring oscillator.

2.4. Chip Realization

The sensor was fabricated in a 180 nm CMOS technology, and a microphotograph of the chip, with dimensions of 5 mm × 2 mm, is shown in Figure 8. An array of 32 × 32 pixels was implemented, where 4 pixels are not connected to the main array and are only used for SPAD characterization purposes.

Figure 8. Chip microphotograph.

3. Measurement Results

3.1. Chip Pixel Characterization

The SPAD in this design stands as one of the best CMOS SPADs in terms of DCR, yield and PDP so far reported [19–23]. The breakdown voltage was measured at 22 V. The DCR measurement of the whole array at 5 V excess bias voltage is shown in Figure 9, where the median value is 113 cps with an active area of 231 µm², which corresponds to a DCR density of 0.49 cps/µm² at a temperature of 20 °C. Furthermore, high DCR uniformity is achieved, where more than 94% of the SPADs have a DCR of less than 1 kcps. No obvious afterpulsing was observed with 25 ns dead time at 5 V excess bias voltage. This is in agreement with [17], where the afterpulsing of the same device was measured at 0.08% at an excess voltage of 4 V. Notably, the result in [17] was achieved without an integrated quenching circuit, which increases the capacitance at the SPAD anode and degrades the afterpulsing performance due to increased carrier flow during an avalanche [24].

Figure 9. (a) DCR map and (b) DCR cumulative proportion of the whole array with 5 V excess bias voltage at 20 °C.

The photon detection probability (PDP) characterization is shown in Figure 10, where a peak value of 47.8% was achieved at a wavelength of 520 nm with 5 V excess bias. More importantly, a high PDP of 8.4%, 4.7% and 2.8% was achieved at 840, 900 and 940 nm respectively, which provides more flexibility for 3D imaging at near infrared wavelengths. More than 50% peak PDP was achieved at 7 V excess bias voltage, although the reliability of the quenching circuit is not guaranteed at that bias. The full-width-at-half-maximum (FWHM) jitter was measured with a 637 nm laser, which reveals a jitter of 106 ps at 5 V excess bias. Since the jitter of the laser is 40 ps, the pixel jitter, including SPAD, quenching circuit, and IO buffer, can be extracted to be 98 ps.

Figure 10. PDP measurement at excess bias voltage of 3 V, 5 V and 7 V.

3.2. TDC Characterization

In order to characterize the TDCs, a code density test was used. SPAD pixels are employed to generate uncorrelated START signals to trigger the conversion of the TDCs. The STOP signal (Figure 5), which is generated in the FPGA and fed to each TDC through a latency balanced clock tree, is used to stop the conversion. If at most one event is detected during the conversion period, the times of arrival should be uniformly distributed. The simplest way of generating uncorrelated signals with SPADs is to detect the dark count events. By acquiring enough events, e.g., >10⁴ events per bin, the TDC resolution (1 LSB), differential non-linearity (DNL) and integral non-linearity (INL) can be calculated based on the code histogram.
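The calculation from a code histogram can be sketched as below. This is a generic code density computation, not the authors' analysis script; it assumes the histogram bins exactly tile a known time window (for example one STOP period) and that the uncorrelated events are uniformly distributed over that window. The function name and arguments are illustrative.

```python
from typing import List, Tuple


def code_density(hist: List[int], window_ps: float) -> Tuple[float, List[float], List[float]]:
    """Estimate LSB, DNL and INL (both in LSB) from a code density histogram.

    hist      -- event counts per TDC code, for codes that tile the window
    window_ps -- duration covered by those codes, in picoseconds
    """
    if not hist or sum(hist) == 0:
        raise ValueError("histogram is empty")
    n_bins = len(hist)
    mean_count = sum(hist) / n_bins
    lsb_ps = window_ps / n_bins              # average bin width
    # DNL: relative deviation of each bin width from the average width.
    dnl = [count / mean_count - 1.0 for count in hist]
    # INL: running sum of the DNL.
    inl = []
    acc = 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return lsb_ps, dnl, inl


# Four-bin toy histogram covering a 200 ps window.
lsb_ps, dnl, inl = code_density([100, 110, 90, 100], window_ps=200.0)
```

A bin that collects more events than average is wider than 1 LSB, so its DNL is positive. The statistical uncertainty of each DNL value shrinks with the number of events per bin, which is why a large count per bin is needed.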


The nominal bin size or resolution (LSB) of the TDC is 50 ps, where the RO operates at 2.5 GHz. The LSB variation of all the 128 TDCs is presented in Figure 11; a standard deviation of 0.48 ps across the whole array is achieved. The DNL and INL measurement results obtained from the code density test are shown in Figure 12, where −0.07/+0.08 LSB DNL and −0.38/+0.75 LSB INL were achieved with a 20 MHz STOP signal. From the measurement, a periodic DNL/INL nonlinearity component can be observed; this behavior is due to a weak coupling of the SYS_CLK to the RO bias voltage. Figure 13 shows the peak-to-peak (p2p) DNL and INL cumulative distribution of all the TDCs. As expected, the p2p INL is proportional to the TDC conversion time, since more noise is coupled and accumulated. Even so, a median p2p DNL and INL of 0.21 LSB and 0.92 LSB were achieved with a 20 MHz STOP signal, which shows high homogeneity across the image sensor despite the fact that no PVT compensation was applied to the TDCs.

Figure 11. LSB distribution of the 128 TDCs shows a standard deviation of 0.48 ps.

Figure 12. TDC (a) DNL and (b) INL measurement with different STOP frequency.

Figure 13. Peak-to-peak (a) DNL and (b) INL cumulative distribution with different STOP frequency.

The SPAD-TDC single-shot timing characterization was obtained by illuminating the sensor with a pulsed laser at a wavelength of 637 nm. A 5 V excess bias voltage was applied, achieving a minimum FWHM jitter of 2.28 LSB (114 ps), as is shown in Figure 14a. The jitter distribution of the 32 pixels with respect to each of the four TDCs is shown in Figure 14b, where good uniformity is achieved with an average and standard deviation of 2.68 LSB (134 ps) and 0.15 LSB (7.5 ps) respectively. No obvious degradation of the jitter is observed during the signal propagation through the complete length of the bus and ALTDC daisy chain. As discussed in Section 3.1, the jitter of the laser and pixel is 40 and 98 ps, respectively; the TDC quantization error at FWHM is 34 ps. Therefore, the average jitter contributed by the entire collision detection bus and ALTDC daisy chain is only 75 ps. This implies that the architecture could be scaled to build larger pixel arrays.
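The jitter figures quoted above are consistent with subtracting independent contributions in quadrature, as the short check below illustrates. The quadrature model and the conversion of a uniform quantization error to FWHM (factor 2√(2 ln 2) applied to LSB/√12) are assumptions that reproduce the quoted numbers; the text does not spell out the formula used.

```python
import math


def quadrature_remainder(total: float, *parts: float) -> float:
    """Return what is left of `total` after removing `parts` in quadrature."""
    remainder_sq = total ** 2 - sum(p ** 2 for p in parts)
    if remainder_sq < 0:
        raise ValueError("parts exceed the total jitter")
    return math.sqrt(remainder_sq)


LSB_PS = 50.0
SIGMA_TO_FWHM = 2.0 * math.sqrt(2.0 * math.log(2.0))    # about 2.355

# Pixel jitter: 106 ps measured with a laser of 40 ps jitter.
pixel_ps = quadrature_remainder(106.0, 40.0)

# TDC quantization error expressed as FWHM: sigma = LSB / sqrt(12).
quant_fwhm_ps = SIGMA_TO_FWHM * LSB_PS / math.sqrt(12.0)

# Bus and ALTDC chain: 134 ps average minus laser, pixel and quantization.
chain_ps = quadrature_remainder(134.0, 40.0, 98.0, 34.0)
```

Rounded to the nearest picosecond, these evaluate to 98 ps, 34 ps and 75 ps, matching the values in the text.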

Figure 14. (a) Single shot SPAD-TDC timing jitter measurement with a minimum FWHM of 2.28 LSB; (b) jitter distribution of all the pixels at each TDC measurement, leading to an average and standard deviation of 2.68 LSB and 0.15 LSB, respectively.

3.3. Flash 3D Imaging

To validate the sensor, a flash 3D imaging measurement was performed, where a target was illuminated with a diffused laser and the reflected light was collected on a per-pixel basis. An objective was placed in front of the sensor, enabling a field-of-view (FOV) of 40 degrees × 40 degrees. A Xilinx Kintex-7 based FPGA evaluation board (XEM7360, Opal Kelly, Portland, OR, USA) was used to read out the TOF events and transmit them to the computer via a USB 3 interface. Finally, a 3D image can be constructed by histogramming the TOF data of each pixel. TDC calibration was applied for LSB variations among different TDCs, as well as for the time offset due to the skew of the STOP clock. Figure 15 shows the 3D image obtained of a person wearing laser protection glasses, with the right hand raised, standing at a distance of 0.7 m from the sensor. The target was illuminated at a wavelength of 637 nm. Due to the limited laser power, the measurement was performed in dark conditions and with an exposure time of a few seconds. Millimetric detail can be observed thanks to the low timing jitter of the system and the high signal-to-background ratio (SBR).

Figure 15. Flash 3D imaging of a human subject at a distance of 0.7 m with 2D intensity image inset.


3.4. Scanning LiDAR Experiment

In order to extend the image resolution, the sensor was demonstrated in a scanning LiDAR system. To perform scan imaging, the entire pixel array was used as a single detection component, where the mismatch between TDCs is accumulated with time. To improve the accuracy of the measurement, calibration has to be applied to each TDC and SPAD. The single shot timing response of the whole array, Figure 16, was acquired by electrically sweeping the STOP signal with a phase shift step of 25 ps. As expected, the jitter is proportional to the TDC value, as more nonlinearity error is accumulated with distance. To improve the linearity, calibration was performed on two major parameters: the TDC LSB variation and the signal skew. Prior to the calibration, one SPAD in the center of the array (row 16 and column 16) and the first TDC in the column are chosen as a reference for the calibration alignment of all the pixels and TDCs. For the TDC LSB variation, since the LSB of every TDC is characterized in Figure 11, the TDC code offset can be calculated with respect to the reference TDC. The signal skew, including the delay in the pixel circuit, TIMING and STOP signals, is calibrated by illuminating the sensor with a laser; the delay offset of each SPAD with respect to each TDC is measured and stored in a look-up table for calibration. As is shown in Figure 16, after calibration the FWHM jitter is stabilized and reduced from 10.63 LSB to 5.87 LSB on average. However, the average jitter of a single pixel from a single TDC is 2.68 LSB, which is smaller than the system jitter of 5.87 LSB. Two main factors contribute to this calibration degradation. The first is the calibration quantization error. The calibration coefficients are stored in look-up tables in the FPGA for real-time calibration and imaging. However, to reduce the complexity of the firmware, the value of each coefficient is rounded to the nearest integer, which introduces quantization error and reduces the calibration accuracy. The second is the temperature and voltage dependence of the calibration. Since all the ROs operate in open loop, the frequency varies over temperature and voltage, leading to a TDC nonlinearity that is difficult to calibrate with a constant coefficient. A similar situation also occurs for the propagation delay of the SPAD output, TIMING and STOP signals. Nevertheless, the oscillation of the ROs could be stabilized by locking the frequency with an external phase-locked loop, which improves the frequency tolerance to temperature and voltage variation. For the skew calibration, measurements could be performed at different operating conditions to retrieve temperature- and voltage-dependent look-up tables.
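The two calibration steps and the effect of rounding the coefficients can be sketched as below. The function names, the look-up table layout keyed by (row, column, TDC index), and the example values are all hypothetical; the sketch only mirrors the description of LSB rescaling, skew subtraction, and integer-rounded coefficients.

```python
from typing import Dict, Tuple

Key = Tuple[int, int, int]   # (row, column, TDC index), an assumed layout


def build_skew_lut(measured_skew_lsb: Dict[Key, float]) -> Dict[Key, int]:
    """Round each measured skew (in reference LSB) to an integer code offset."""
    return {key: round(skew) for key, skew in measured_skew_lsb.items()}


def calibrate(raw_code: int, tdc_lsb_ps: float, ref_lsb_ps: float,
              skew_code: int) -> int:
    """Rescale a raw code to the reference TDC's LSB, then remove the skew."""
    rescaled = round(raw_code * tdc_lsb_ps / ref_lsb_ps)
    return rescaled - skew_code


# Hypothetical measurements: the reference SPAD/TDC pair has zero skew.
measured = {(16, 16, 0): 0.0, (0, 0, 0): 3.4}
lut = build_skew_lut(measured)

# A TDC whose LSB is 50.5 ps, corrected against a 50 ps reference TDC.
corrected = calibrate(1000, tdc_lsb_ps=50.5, ref_lsb_ps=50.0,
                      skew_code=lut[(0, 0, 0)])

# Error left behind by storing 3 instead of 3.4 LSB.
residual_lsb = measured[(0, 0, 0)] - lut[(0, 0, 0)]
```

The residual of 0.4 LSB in this toy case is the kind of error the text attributes to integer rounding; storing coefficients with a few fractional bits would reduce it at the cost of firmware complexity.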

Figure 16. Jitter measurement before and after calibration, where the average jitter is reduced from 10.63 LSB to 5.87 LSB.

Single-point telemetry, shown in Figure 17, was performed with the same laser at 40 MHz repetition rate, 2 mW average power, and 40 ps pulse width. Even though the unambiguous range with a 40 MHz laser is 3.75 m, a larger range can be characterized by exploiting prior knowledge of the distance offset. In this way, the linearity of the system was characterized, as shown in Figure 17. A 60% reflectivity target was measured up to 50 m, where each distance was measured 10 times in dark conditions, achieving a maximum non-linearity and worst-case precision (σ) of 6.9 cm and 0.62 cm respectively, over the entire range. Instead of controlling the frame time, a constant amount of 50 k photons was collected in every measurement, which gives a high SBR and detection reliability. In this case, the object signal can be distinguished even though the distance is 50 m and the laser peak power is only 0.5 W. However, if the background light is high and the frame time is limited, fewer signal photons will be acquired and the performance will degrade with distance.

Figure 17. (a) Measured distance up to 50 m as a function of the actual distance; (b) the maximum non-linearity and (c) worst-case precision were achieved at 6.9 cm and 0.62 cm, respectively.
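The range arithmetic behind the telemetry measurement can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function names and the idea of supplying a coarse prior distance (`approx_m`) are assumptions used to show how a measurement wrapped into the 3.75 m unambiguous range could be extended when the distance offset is known in advance.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def unambiguous_range(f_rep_hz):
    """Largest distance measurable without aliasing for a pulsed laser."""
    return C / (2.0 * f_rep_hz)


def unwrap_distance(measured_m, approx_m, f_rep_hz):
    """Extend a wrapped distance using a coarse prior (hypothetical helper).

    measured_m: distance folded into [0, unambiguous range)
    approx_m:   prior knowledge of the distance, accurate to better than
                half the unambiguous range
    """
    r = unambiguous_range(f_rep_hz)
    k = round((approx_m - measured_m) / r)  # number of whole laser periods
    return measured_m + k * r


r = unambiguous_range(40e6)   # about 3.75 m for a 40 MHz laser
wrapped = 50.0 % r            # what a 50 m target would read as
print(round(r, 2), round(unwrap_distance(wrapped, 49.0, 40e6), 3))
```

The prior only needs to be good to within half the unambiguous range (about 1.87 m here) for the correct period count to be selected.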

Based on the sensor, a scanning LiDAR system comprising a dual-axis galvanometer scanner (GVS012, Thorlabs, Newton, NJ, USA), an arbitrary waveform generator (AWG, 33600A, Keysight, Santa Rosa, CA, USA), and a 637 nm laser source was built, as shown in Figure 18. Two channels of step signals are generated by the AWG to drive the scanner to perform a configurable raster scan on the target.

[Figure 18 blocks: waveform generator, X-controller, Y-controller, dual-axis scanner, laser, sensor chip & FPGA (USB 3.0); signals: scanner trigger, laser clock; scan axes: horizontal resolution, vertical resolution.]

Figure 18. Block diagram of the LiDAR system.

The scan experiment was performed in dark conditions, where a mannequin was placed 1.3 m away from the sensor in front of a curved background. The facial image of the mannequin is shown in Figure 19, where both depth and intensity images were acquired at the same time with a resolution of 128 × 128. The scanner was operated at a low frequency of 1 kHz, which ensures that more than 10 k photons are detected at each scanning point, thus enabling a high SBR. The distance of each point was calculated by averaging the bins around the peak of the histogram. Millimetric depth resolution was achieved, where details of the face can be clearly recognized, which demonstrates the high linearity of the scanning system.
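The per-point distance estimate described above can be illustrated with a short sketch. It assumes a count-weighted mean of the bins in a small window around the histogram maximum and uses the 50 ps TDC LSB from Table 2; the window half-width, the function name, and the absence of any offset or background correction are assumptions, not details given in the text.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def peak_centroid_distance(hist, lsb_s=50e-12, half_window=3):
    """Estimate distance from a TOF histogram (illustrative only).

    hist:        photon counts per TDC bin
    lsb_s:       TDC bin width in seconds (50 ps in Table 2)
    half_window: bins taken on each side of the peak (assumed value)
    """
    peak = max(range(len(hist)), key=hist.__getitem__)
    lo = max(0, peak - half_window)
    hi = min(len(hist), peak + half_window + 1)
    total = sum(hist[lo:hi])
    if total == 0:
        raise ValueError("histogram contains no counts")
    # Count-weighted mean bin index around the peak.
    centroid = sum(i * hist[i] for i in range(lo, hi)) / total
    tof_s = centroid * lsb_s
    return C * tof_s / 2.0  # halve the round-trip path


hist = [0] * 4096          # 12-bit TDC range
hist[199], hist[200], hist[201] = 50, 100, 50
print(round(peak_centroid_distance(hist), 4))
```

Averaging several bins gives a sub-bin estimate, which is what allows a depth resolution finer than the 7.5 mm that a single 50 ps bin corresponds to.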


Figure 19. Scan imaging of a mannequin at a distance of 1.3 m with a resolution of 128 × 128, where both the depth and intensity images were obtained at the same time.

Furthermore, real-time imaging was carried out with 50 lux of background light at a resolution of 64 × 64, where the scanner was operated at a frequency of 24.5 kHz. As shown in Figure 20, a human subject (reflectivity of about 40%) standing 10 m away from the sensor, waving the right hand and turning around, was recorded at 6 frames/s. The FOV was adjusted to 5 degree × 5 degree, which gives a fine angular resolution of 0.078 degree in both X and Y directions, corresponding to a scanning step of 1.36 cm per point. To improve the SBR, a bandpass optical filter with an FWHM of 10 nm was used to suppress the background light. Thanks to the high PDP and photon throughput, sharp images were recorded with an average and peak laser power as low as 2 mW and 500 mW, respectively. Since a low-power laser and a visible wavelength were employed in the experiment, we believe the ranging performance can be significantly improved by using a high-power near-infrared laser, without affecting other aspects of the system.
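The scan figures quoted above follow from simple geometry, sketched here as a check. The function name and argument layout are illustrative assumptions; the inputs are the values stated in the text (5 degree FOV, 64 points per axis, 10 m distance, 6 frames/s).

```python
import math


def scan_geometry(fov_deg, points_per_axis, distance_m, frames_per_s):
    """Return (angular step in degrees, step on target in m, point rate in Hz)."""
    step_deg = fov_deg / points_per_axis
    step_m = distance_m * math.tan(math.radians(step_deg))
    point_rate_hz = points_per_axis * points_per_axis * frames_per_s
    return step_deg, step_m, point_rate_hz


step_deg, step_m, rate_hz = scan_geometry(5.0, 64, 10.0, 6)
print(round(step_deg, 3), round(step_m * 100, 2), rate_hz)
```

With these inputs the step is about 0.078 degree, or roughly 1.36 cm on a target at 10 m, and 64 × 64 points at 6 frames/s require about 24.5 k scan points per second, which matches the scanner frequency quoted in the text.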

[Figure 20a frame labels: frames 1 to 9 at t = 0 s to t = 1.328 s in steps of 0.166 s.]

Figure 20. (a) Nine consecutive frames were recorded at 6 frames/s with a resolution of 64 × 64 at 10 m, where a human subject was waving his right hand and turning around; (b) image captured with a commercial camera.

Table 2 summarizes the results of the whole system, including the chip characteristics, distance measurement and the LiDAR system performance. The total power consumption, which is strongly dependent on the operating environment, has been measured at 0.31 W where the photon throughput is about 35.5 Mcps. The ALTDC array, readout logic, IO interface, pixel array and debugging circuits contributed 30%, 28%, 27%, 11% and 4%, respectively.

Table 2. Performance summary of the sensor and LiDAR system.

Parameter                        Value                                  Unit

Chip characteristics
Array resolution                 32 × 32                                -
Technology                       180 nm CMOS                            -
Chip size                        5 × 2                                  mm²
Pixel pitch                      28.5                                   µm
Pixel fill-factor                28                                     %
SPAD breakdown voltage           22                                     V
SPAD median DCR                  113 (Vex = 5 V, 20 °C)                 cps
SPAD jitter                      106 (Vex = 5 V)                        ps
SPAD PDP                         47.8 (Vex = 5 V @ 520 nm)              %
TDC LSB                          50                                     ps
TDC resolution                   12                                     bit
No. TDC                          128                                    -
TDC area                         4200                                   µm²
Readout bandwidth                5.12                                   Gbps
Maximum photon throughput        222 (PT mode)                          Mcps
                                 465 (PC mode)                          Mcps

Distance measurement
Measurement range                50                                     m
Non-linearity (Accuracy)         6.9 (0.14%)                            cm
Precision (σ) (Repeatability)    0.62 (0.01%)                           cm

LiDAR experiment
Illumination wavelength          637                                    nm
Illumination power               2 (average)                            mW
                                 500 (peak)                             mW
Frame rate                       6                                      fps
Image resolution                 64 × 64                                -
Field of view (H × V)            5 × 5                                  degree
Target reflectivity              40                                     %
Distance range (LiDAR)           10                                     m
Background light                 50                                     lux
Chip power consumption           0.31 (@ 35.5 Mcps photon throughput)   W
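As a rough reading of the power figures, the per-block contributions can be converted to absolute power at the reported operating point. This is a back-of-the-envelope sketch: the variable names are illustrative, and dividing the total power by the photon throughput to obtain an energy per event is a simplifying assumption that ignores any static power that does not scale with event rate.

```python
TOTAL_POWER_W = 0.31       # measured chip power at the reported operating point
THROUGHPUT_CPS = 35.5e6    # photon throughput at which the power was measured

# Relative contributions stated in the text.
SHARES = {
    "ALTDC array": 0.30,
    "readout logic": 0.28,
    "IO interface": 0.27,
    "pixel array": 0.11,
    "debugging circuits": 0.04,
}

# Absolute power per block in mW.
block_power_mw = {name: TOTAL_POWER_W * share * 1e3 for name, share in SHARES.items()}

# Crude average energy per detected event in nJ (assumes all power scales with rate).
energy_per_event_nj = TOTAL_POWER_W / THROUGHPUT_CPS * 1e9

for name, p in block_power_mw.items():
    print(f"{name}: {p:.1f} mW")
print(f"energy per event: {energy_per_event_nj:.2f} nJ")
```

Because the text notes that the power depends strongly on the operating environment, these numbers apply only to the 35.5 Mcps measurement point.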


4. Proposed Background Light Suppression Architecture

In order to improve the tolerance to background light in SPAD sensors, coincidence photon detection has been applied, whereby only events with more than one photon detected in a coincidence time window are processed by the sensor. In [6,7], referred to as method 1, the authors implemented multiple stages of adders to quantify coincidence photons. In this method, one continuously counts the number of events detected by a set of SPADs in a predefined time window. By compensating the propagation delay between the sum outputs and carry outputs of the adders, TDCs can only be triggered by the second photon of the coincidence event. However, for N SPADs, the same number of bits has to be summed up, which requires a large number of adders and thus limits the suitability of the approach in large arrays. In [5], SPADs were combined onto a single output via independent monostables and a balanced OR-tree. In this method, referred to as method 2, the output of the OR-tree drives a series of shift registers to count the events and validate coincidence detection. Since the silicon area of the OR-tree is much smaller than that of the adder-tree, a 64 × 64 pixel array could be implemented. However, when multiple photons are detected at a time distance shorter than the pulse generated by the monostable, the OR-tree can only output one pulse, which results in missed events and reduced SBR. Another drawback is that the TDCs are always triggered by the first event, while in the case of uncorrelated photons the TDCs will be reset after the coincidence window. Therefore, high TDC activity and power consumption can be expected with strong background light.

Even though background light suppression is not explicitly implemented in this design, the collision detection bus has an intrinsic capability of coincidence photon detection, since the coincidence window begins with the first photon detected and ends a user-determined time delay later. This implicit coincidence window is an effective approach to background suppression, since coincident events (light source) can be recognized within the window, while events outside it (noise and background) can be easily suppressed.

The proposed sensor architecture is shown in Figure 21, where a group of 32 pixels is employed for coincidence detection. As discussed in Section 2.2, collision events will generate an address output with more than three non-zero bits. Therefore, when coincidence events are detected, the most-significant bit (MSB) of the adder, Z, will rise to high, which can be directly used for triggering the TDC.
[Figure 21 blocks: 32 pixels, collision detection bus, adder (7-bit input), TDC; signals: Z, START, STOP, STOP_CLK, TDC_OUT.]

Figure 21. Proposed sensor architecture with coincidence event detection among 32 pixels, based on the collision detection bus.

In comparison with methods 1 and 2, only seven bits have to be processed instead of 32, so much smaller coincidence detection circuitry can be constructed. More specifically, to perform coincidence detection with 32 pixels, method 1 requires an adder tree with 13 full-adders, four half-adders, one AND gate and one 18-input OR gate, while method 2 requires an OR-tree with 31 NAND/NOR gates. Instead, the proposed approach only needs three full-adders and four half-adders. On the other hand, since it is based on an adder, the event-miss problem of method 2 would not occur in this approach. Furthermore, in comparison with method 2, since the TDC can only be triggered by coincidence events via the output Z, low TDC activity can be achieved, thus leading to low power consumption. In comparison to the current sensor architecture, only a minor modification with the implementation of the adders is required, which prevents the features of the pixel array from being affected. Therefore, with the proposed approach, a SPAD sensor with a large pixel array, higher fill factor and high background light suppression is expected.
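The event-miss problem attributed to method 2 can be made concrete with an idealized model. The sketch assumes every photon starts a fixed-width monostable pulse and that the OR-tree output is the union of those pulses, so overlapping pulses merge into one; the arrival times, pulse width, and function name are made-up illustration values, not measurements from [5].

```python
def or_tree_pulses(arrival_times_ns, pulse_width_ns):
    """Count rising edges at the output of an ideal OR of monostable pulses."""
    pulses = 0
    busy_until = float("-inf")  # time at which the OR output returns low
    for t in sorted(arrival_times_ns):
        if t > busy_until:
            pulses += 1  # output was low, so this photon produces a new edge
        # Each photon keeps the output high until its own pulse ends.
        busy_until = max(busy_until, t + pulse_width_ns)
    return pulses


arrivals = [0.0, 1.0, 10.0]   # hypothetical photon arrival times in ns
print(or_tree_pulses(arrivals, pulse_width_ns=2.0), len(arrivals))
```

In this example the first two photons are closer together than the pulse width, so the OR-tree reports two pulses for three photons, whereas a scheme that sums the individual inputs would still register all three.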


5. Conclusions

In this work, we presented a 32 × 32 SPAD imager, fabricated in a 180 nm CMOS technology, where the 32 pixels in each column are connected to a collision detection bus. With the bus-sharing scheme, a fill factor of 28% and a pixel pitch of 28.5 µm were achieved. To improve the photon throughput, a scalable ALTDC mechanism was implemented to dynamically reallocate TDCs for TOF event detection. This enables the same photon throughput as that of per-pixel TDC architectures. The events are read off-chip with an energy-efficient event-driven readout, where 32 channels operate at a total bandwidth of 5.12 Gbps, which enables a maximum throughput of 222 Mcps and 465 Mcps in PT and PC mode, respectively. At an excess bias of 5 V, the SPAD exhibits 47.8% PDP at 520 nm, 113 cps median DCR, 106 ps FWHM jitter and negligible afterpulsing. Ranging measurement at a distance of 50 m achieved 6.9 cm non-linearity (0.14% accuracy) and 0.62 cm precision (σ, 0.01%). Based on the sensor, a scanning LiDAR system achieving depth imaging up to 10 m at 6 frames/s with a resolution of 64 × 64 pixels was demonstrated with 50 lux of background light. The average and peak illumination power were as low as 2 mW and 500 mW, respectively. This sensor provides flexibility for applications in which low-light imaging and high timing resolution are required, such as quantum imaging, biological imaging, as well as indoor flash and scanning LiDAR. To improve the background light suppression, a new sensor architecture based on the concept of the collision detection bus is proposed. Compared to other methods in the literature, the proposed method has the benefit of reduced coincidence detection circuitry area and low TDC power consumption, which provides an approach to designing SPAD sensors with a large pixel array and high fill factor for TOF imaging applications in high background light environments, such as automotive LiDAR.

Author Contributions: This sensor was a collaborative design with a division of labor among different circuit blocks. C.Z. designed the ALTDC dynamic reallocation scheme, the event-driven readout and the firmware for the FPGA, built the scanning LiDAR system, and proposed the background light suppression architecture; S.L. designed and measured the TDC, and carried out the flash 3D imaging measurement; I.M.A. was responsible for the SPAD pixel design and characterization. M.W. co-directed the work. E.C. co-designed the sensor and the system and co-directed the work.

Funding: This research was funded by the Netherlands organization for scientific research (NWO) under project number 12807.

Acknowledgments: The authors would like to acknowledge Juan Mata Pavia and Augusto Ronchini Ximenes for the design and experiment support.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Oike, Y.; Ikeda, M.; Asada, K. A 375 × 365 high-speed 3-D range-finding image sensor using row-parallel search architecture and multisampling technique. IEEE J. Solid-State Circuits 2005, 40, 444–453.
2. Seo, M.W.; Shirakawa, Y.; Masuda, Y.; Kawata, Y.; Kagawa, K.; Yasutomi, K.; Kawahito, S. A programmable sub-nanosecond time-gated 4-tap lock-in pixel CMOS image sensor for real-time fluorescence lifetime imaging microscopy. In Proceedings of the ISSCC, San Francisco, CA, USA, 5–9 February 2017; pp. 70–71.
3. Shcherbakova, O.; Pancheri, L.; Dalla Betta, G.F.; Massari, N.; Stoppa, D. 3D camera based on linear-mode gain-modulated avalanche photodiodes. In Proceedings of the ISSCC, San Francisco, CA, USA, 17–21 February 2013; pp. 490–491.
4. Bronzi, D.; Villa, F.; Tisa, S.; Tosi, A.; Zappa, F.; Durini, D.; Weyers, S.; Brockherde, W. 100,000 Frames/s 64 × 32 Single-Photon Detector Array for 2-D Imaging and 3-D Ranging. IEEE J. Sel. Top. Quantum Electron. 2014, 20, 354–363.
5. Perenzoni, M.; Perenzoni, D.; Stoppa, D. A 64 × 64-Pixel Digital Silicon Photomultiplier Direct ToF Sensor with 100 MPhotons/s/pixel Background Rejection and Imaging/Altimeter Mode with 0.14% Precision up to 6 km for Spacecraft Navigation and Landing. IEEE J. Solid-State Circuits 2017, 52, 151–160.
6. Niclass, C.; Soga, M.; Matsubara, H.; Kato, S.; Kagami, M. A 100-m range 10-Frame/s 340 × 96-pixel time-of-flight depth sensor in 0.18-µm CMOS. IEEE J. Solid-State Circuits 2013, 48, 559–572.
7. Niclass, C.; Soga, M.; Matsubara, H.; Ogawa, M.; Kagami, M. A 0.18-µm CMOS SoC for a 100-m-Range 10-Frame/s 200 × 96-pixel Time-of-Flight Depth Sensor. IEEE J. Solid-State Circuits 2014, 49, 315–330.
8. Villa, F.; Lussana, R.; Bronzi, D.; Tisa, S.; Tosi, A.; Zappa, F.; Mora, A.D.; Contini, D.; Durini, D.; Weyers, S.; et al. CMOS imager with 1024 SPADs and TDCs for single-photon timing and 3-D time-of-flight. IEEE J. Sel. Top. Quantum Electron. 2014, 20, 364–373.
9. Ximenes, A.R.; Padmanabhan, P.; Lee, M.; Yamashita, Y.; Yaung, D.N.; Charbon, E. A 256 × 256 45/65 nm 3D-Stacked SPAD-Based Direct TOF Image Sensor for LiDAR Applications with Optical Polar Modulation for up to 18.6 dB Interference Suppression. In Proceedings of the ISSCC, San Francisco, CA, USA, 11–15 February 2018; pp. 27–29.
10. Lindner, S.; Zhang, C.; Antolovic, I.M.; Pavia, J.M.; Wolf, M.; Charbon, E. Column-Parallel Dynamic TDC Reallocation in SPAD Sensor Module Fabricated in 180 nm CMOS for Near Infrared Optical Tomography. In Proceedings of the 2017 International Image Sensor Workshop, Hiroshima, Japan, 30 May–2 June 2017; pp. 86–89.
11. Veerappan, C.; Richardson, J.; Walker, R.; Li, D.; Fishburn, M.W.; Maruyama, Y.; Stoppa, D.; Borghetti, F.; Gersbach, M.; Henderson, R.K.; et al. A 160 × 128 Single-Photon Image Sensor with On-Pixel 55 ps 10b Time-to-Digital Converter. In Proceedings of the ISSCC, San Francisco, CA, USA, 20–24 February 2011; pp. 312–314.
12. Field, R.M.; Realov, S.; Shepard, K.L. A 100 fps, time-correlated single-photon-counting-based fluorescence-lifetime imager in 130 nm CMOS. IEEE J. Solid-State Circuits 2014, 49, 867–880.
13. Acconcia, G.; Cominelli, A.; Rech, I.; Ghioni, M. High-efficiency integrated readout circuit for single photon avalanche diode arrays in fluorescence lifetime imaging. Rev. Sci. Instrum. 2016, 87, 113110.
14. Lindner, S.; Zhang, C.; Antolovic, I.M.; Wolf, M.; Charbon, E. A 252 × 144 SPAD pixel FLASH LiDAR with 1728 Dual-clock 48.8 ps TDCs, Integrated Histogramming and 14.9-to-1 Compression in 180 nm CMOS Technology. In Proceedings of the IEEE VLSI Symposium, Honolulu, HI, USA, 18–22 June 2018.
15. Pavia, J.M.; Scandini, M.; Lindner, S.; Wolf, M.; Charbon, E. A 1 × 400 Backside-Illuminated SPAD Sensor with 49.7 ps Resolution, 30 pJ/Sample TDCs Fabricated in 3D CMOS Technology for Near-Infrared Optical Tomography. IEEE J. Solid-State Circuits 2015, 50, 2406–2418.
16. Niclass, C.; Sergio, M.; Charbon, E. A CMOS 64 × 48 Single Photon Avalanche Diode Array with Event-Driven Readout. In Proceedings of the ESSCIRC, Montreux, Switzerland, 19–21 September 2006; pp. 556–559.
17. Veerappan, C. Single-Photon Avalanche Diodes for Cancer Diagnosis. Ph.D. Thesis, Delft University of Technology, Delft, The Netherlands, March 2016.
18. Lindner, S.; Pellegrini, S.; Henrion, Y.; Rae, B.; Wolf, M.; Charbon, E. A High-PDE, Backside-Illuminated SPAD in 65/40-nm 3D IC CMOS Pixel with Cascoded Passive Quenching and Active Recharge. IEEE Electron Device Lett. 2017, 38, 1547–1550.
19. Xu, H.; Pancheri, L.; Dalla Betta, G.F.; Stoppa, D. Design and characterization of a p+/n-well SPAD array in 150 nm CMOS process. Opt. Express 2017, 25, 12765–12778.
20. Gyongy, I.; Calder, N.; Davies, A.; Dutton, N.A.W.; Dalgarno, P.; Duncan, R.; Rickman, C.; Henderson, R.K. 256 × 256, 100 kfps, 61% Fill-factor time-resolved SPAD image sensor for time-resolved microscopy applications. IEEE Trans. Electron Devices 2018, 65, 547–554.
21. Lee, M.; Ximenes, A.R.; Padmanabhan, P.; Wang, T.; Huang, K.; Yamashita, Y.; Yaung, D.; Charbon, E. High-Performance Back-Illuminated Three-Dimensional Stacked Single-Photon Avalanche Diode Implemented in 45-nm CMOS Technology. IEEE J. Sel. Top. Quantum Electron. 2018, 24, 1–9.
22. Bronzi, D.; Villa, F.; Bellisai, S.; Markovic, B.; Tisa, S.; Tosi, A.; Zappa, F.; Weyers, S.; Durini, D.; Brockherde, W.; et al. Low-noise and large-area CMOS SPADs with Timing Response free from Slow Tails. In Proceedings of the IEEE ESSDERC, Bordeaux, France, 17–21 September 2012; pp. 230–233.
23. Sanzaro, M.; Gattari, P.; Villa, F.; Tosi, A.; Croce, G.; Zappa, F. Single-Photon Avalanche Diodes in a 0.16 µm BCD Technology with Sharp Timing Response and Red-Enhanced Sensitivity. IEEE J. Sel. Top. Quantum Electron. 2018, 24, 1–9.
24. Cova, S.; Ghioni, M.; Lacaita, A.; Samori, C.; Zappa, F. Avalanche photodiodes and quenching circuits for single-photon detection. Appl. Opt. 1996, 35, 1956–1976.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).