

## Optical Look Up Table

Zhen Li, Sébastien Le Beux, Christelle Monat, Xavier Letartre, Ian O'Connor

### ▶ To cite this version:

Zhen Li, Sébastien Le Beux, Christelle Monat, Xavier Letartre, Ian O'Connor. Optical Look Up Table. Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, Mar 2013, Grenoble, France. pp.873-876, 10.7873/DATE.2013.184. hal-00823743

## HAL Id: hal-00823743 https://inria.hal.science/hal-00823743

Submitted on 17 May 2013

**HAL** is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

# Optical Look Up Table

Zhen Li, Sébastien Le Beux, Christelle Monat, Xavier Letartre, Ian O'Connor

Lyon Institute of Nanotechnology, INL-UMR5270 Ecole Centrale de Lyon, Ecully, F-69134, France

{name.surname}@ec-lyon.fr

*Abstract*— The computation capacity of conventional FPGAs is directly proportional to the size and expressive power of Look Up Table (LUT) resources. Individual LUT performance is limited by transistor switching time and power dissipation, defined by the CMOS fabrication process. In this paper we propose OLUT, an optical core implementation of LUT, which has the potential for low latency and low power computation. In addition, the use of Wavelength Division Multiplexing (WDM) allows parallel computation, which can further increase computation capacity. Preliminary experimental results demonstrate the potential for optically assisted on-chip computation.

*Index Terms*: silicon photonic architectures, WDM, LUT

#### I. INTRODUCTION

The computation capacity of high-performance FPGAs is directly proportional to the size and expressive power of Look Up Table (LUT) resources. However, advances in their design rely on the CMOS fabrication process and are thus limited by transistor switching time and power dissipation. New design paradigms are therefore needed to replace traditional electrical interconnects and computing circuits, which are slow and power consuming. On-chip optical resources could offer an attractive alternative to electrical architectures due to the intrinsic properties of light propagation, namely low latency and high bandwidth. While many related works have already addressed the use of silicon photonics technology for designing high performance on-chip interconnects [1][2], only a few have evaluated its potential for realizing computing architectures [3]. This approach is particularly relevant in the context of reconfigurable computing, where current electronic solutions suffer from high power consumption overhead.

In this paper, we propose an optical core implementation of the LUT named OLUT (which stands for Optical LUT). Our objective is to take advantage of silicon photonics to create an efficient, optically assisted LUT. Traditional (i.e. fully electrical) *n*-LUTs [4] interface *n* data inputs and one data output, i.e. a single operation is performed at a time (see Fig. 1.a). Instead of multiplexing electrical signals by switching transistors on or off as in LUTs, OLUTs rely on the routing of optical signals across an optical network defined by interconnected waveguides and optical switches, along a path specified by the electrical input data. Wavelength Division Multiplexing (WDM) is used in order to realize simultaneous operations on the same input data. An *m*-operation OLUT (so-called *n*-*m*-OLUT) thus interfaces *n* electrical data inputs to *m* electrical data outputs, using *m* optical signals at distinct wavelengths ($\lambda_0, \ldots, \lambda_{m-1}$) (see Fig. 1.b). Hence, in addition to its low latency and low power computation characteristics, the OLUT can further increase its computation capacity through WDM-based parallelism. To our knowledge, this work is the first to propose an optical implementation of LUTs.

The paper is organized as follows. In Section II, we introduce optical devices used for implementing OLUTs and related works in silicon photonics computing architectures. The proposed OLUT architecture is presented in Section III. In Section IV, a preliminary evaluation of the OLUT performance highlights its potential for on-chip low latency and low power computation. Section V concludes the paper and gives perspectives to this work.



Fig. 1. General representation of a) *n*-LUT and b) *n*-*m*-OLUT

#### II. BACKGROUND AND RELATED WORKS

A photonic-based computing architecture typically relies on a few optical devices such as lasers, switches and photodetectors. We first introduce possible implementations of such functions and then present directed logic (DL) optical architectures based on these devices.

#### A. Optical devices

An optical DL network is primarily composed of bistable optical switching elements, whose output can be described as two logic states, typically associated with the presence of light (logic TRUE ('1')) or its absence (logic FALSE ('0')). SOI-based add-drop filters, which consist of a silicon microring resonator side-coupled to two straight waveguides (see Fig. 2), are commonly used as switching elements. These devices can be electrically controlled, are compact (~2µm diameter), and typically provide low switching energy (~3fJ/bit [3]) and fast switching times (~ns) by exploiting relatively high Q-factor (~10,000) resonances [5]. The associated SOI technology is also CMOS compatible. The add-drop filter resonant wavelengths and the free spectral range (FSR) between adjacent resonances are primarily related to the ring diameter and refractive index. By applying a voltage through a p-i-n junction built across the ring [5], the resonant wavelengths can be shifted spectrally within the ns timescale. When considering an input optical signal ($o_i$) at $\lambda_i$ (close to one particular resonance $\lambda_0$), resonant and non-resonant states, with $\lambda_0$ equal to or detuned from $\lambda_i$, respectively, can thus be configured according to the value of the input voltage $e_i$:

• *The resonant state* (Fig. 2.b) is obtained when $e_i=1$. Light travelling into the first (horizontal) waveguide at $\lambda_i$ (equal to $\lambda_0$) is coupled into the microring and is redirected to the second (vertical) waveguide, providing logic states '0' and '1' on the two optical outputs, $o_1$ and $o_2$, respectively.

• *The non-resonant state* (Fig. 2.c) is obtained when $e_i=0$. The ring resonance $\lambda_0$ is tuned away from the wavelength of the incoming light $\lambda_i$, so that the light continues on the same waveguide, resulting in logic states '1' and '0' on $o_1$ and $o_2$, respectively.

Fig. 2. (a) Schematic representation of the microring SOI add-drop filter, with (b) the resonant and (c) the non-resonant states

The add-drop filter can thus be considered either as an electrically controlled optical spatial router or as an electrically controlled optical switch that may change the direction of the optical signal. We use both functionalities for the OLUT architecture proposed in Section III. Finally, using different microring resonators, schemes with multiple optical signals at distinct wavelengths can be implemented (see Section III).
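The two states of the add-drop filter can be summarized with a small behavioural model. This is only an illustrative sketch: the function name `add_drop_filter` and the port names "through" (light stays on the input waveguide) and "drop" (light is redirected to the second waveguide) are assumptions introduced here, and all physical effects such as losses and crosstalk are ignored.

```python
from typing import Tuple


def add_drop_filter(e_i: int, light_in: int) -> Tuple[int, int]:
    """Return the (through, drop) logic levels of one add-drop filter.

    e_i = 1 -> resonant state: incoming light is coupled into the ring
               and leaves on the drop port (second waveguide).
    e_i = 0 -> non-resonant state: incoming light continues on the
               input waveguide and leaves on the through port.
    Hypothetical idealized model: no loss, no crosstalk.
    """
    if e_i not in (0, 1) or light_in not in (0, 1):
        raise ValueError("e_i and light_in must be 0 or 1")
    if e_i == 1:
        return (0, light_in)
    return (light_in, 0)


if __name__ == "__main__":
    for e_i in (0, 1):
        print(f"e_i={e_i}: (through, drop) = {add_drop_filter(e_i, 1)}")
```

Used as a router, both ports feed further waveguides; used as a switch, only the drop port matters, since it is the one that leads to the output.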

The input optical signals are generated from continuous-wave lasers acting as a power supply (on-chip or external). For single wavelength architectures, we could employ a standard laser diode delivering ~10mW at 1550nm. For multi-wavelength systems, multiplexed diode lasers or a single diode laser producing a frequency comb could be envisaged [6].

Finally, the O/E converters could be made of on-chip CMOS compatible Germanium photodiodes having a high efficiency and a high speed response. The generated photocurrent is relatively independent of the signal wavelength.

#### B. Architecture of existing directed logic circuits

Directed logic (DL) was introduced as a logic architecture based on optical Fredkin-like gates [7]. A DL architecture is based on optical switches interconnected through waveguides. While it improves computation latency, the interconnections are fixed and the optical switches are non-configurable, leading to an application specific architecture.

The Reconfigurable DL (RDL) architecture provides additional flexibility since the computing operation can be reconfigured at will [3]. The architecture is composed of two planes of (re)configurable add-drop based cells. It allows logic functions (written as sum-of-products operations) to be mapped; the first and second planes are configured to implement products and sums, respectively. However, RDL does not allow multiple and distinct operations to be computed simultaneously using different wavelengths. Yet, parallelism exploiting WDM is a major advantage of optics for computation. As presented in section III, the proposed OLUT makes the most of silicon photonic technologies, including the use of WDM, for creating more powerful architectures.

#### III. DESCRIPTION OF THE PROPOSED OLUT ARCHITECTURE

#### A. Single wavelength OLUT architectures

OLUTs are directly inspired by the electrical LUTs commonly used in FPGAs. An *n*-input LUT is interfaced through *n* data inputs, one data output and $2^n$ configuration inputs connected to a $2^n$-bit RAM memory. Computation is achieved by directly indexing, from the data inputs, the operation result stored in the memory. Figure 3a illustrates a 2-LUT. The main advantages of LUTs are their constant computation time and their ability to realize any Boolean function, leading to highly flexible architectures.
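As a reminder of how an electrical LUT computes by indexing, the following minimal sketch models an *n*-LUT in software. The function name `lut_eval` and the convention that the first input is the most significant index bit are assumptions made for illustration only.

```python
from typing import Sequence


def lut_eval(config: Sequence[int], inputs: Sequence[int]) -> int:
    """Evaluate an n-input LUT: the data inputs index a 2**n-bit memory.

    Assumption: inputs[0] is the most significant bit of the index.
    """
    n = len(inputs)
    if len(config) != 2 ** n:
        raise ValueError("config must hold exactly 2**n bits")
    index = 0
    for bit in inputs:
        if bit not in (0, 1):
            raise ValueError("inputs must be 0 or 1")
        index = (index << 1) | bit
    return config[index]


# A 2-LUT configured as XOR: memory contents for inputs 00, 01, 10, 11.
XOR_CONFIG = [0, 1, 1, 0]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, lut_eval(XOR_CONFIG, [a, b]))
```

Changing the four configuration bits is enough to realize any other 2-input Boolean function, which is the flexibility the text refers to.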

Here, we propose OLUT, an optical core implementation of the LUT, while keeping input and output data in electrical form. An example of OLUT architecture providing a similar behaviour to the 2-LUT is presented in Fig. 3b.



Fig. 3. Schematic representation of (a) a 2-LUT and (b) its equivalent OLUT

Similarly to the LUT, the OLUT is composed of two stages:

1) The "Routing" part (left half of Fig. 3b): based on the electrical data inputs, a set of interconnected optical routers drive the optical signal into one of the multiple waveguides.

2) The "Memorization" part (right half of Fig. 3b): the OLUT configuration is stored in electrical RAM memories which can take either the '1' or '0' logic values depending on the required Boolean function. Each memory governs the bias applied to the microring, thereby controlling the state of the optical switch.

The OLUT thus described uses an optical signal at a single wavelength  $\lambda_0$ , providing an equivalent behavior to LUTs (i.e. a single operation is computed). For clarity, we used different symbols for the optical routers and the optical switches, in the routing and memorization parts respectively, although these all consist of similar add-drop filters. The relevance of this distinction will become more apparent when introducing the use of WDM in OLUT architectures in order to parallelize computations.

#### B. Multi-wavelength OLUT architectures

WDM can be used in OLUTs in order to perform simultaneous operations on the same input data. An *m*-operation OLUT (so-called *n*-*m*-OLUT) interfaces *n* data inputs to *m* output ports, using *m* optical signals at distinct wavelengths. A general representation is illustrated in Fig. 4. In the routing part, the *m* optical signals follow the same optical path, which is specified by the data inputs. This could be achieved, for instance, by having each of the multiplexed signal wavelengths individually matched to the adjacent resonances of the (identical) microrings that compose the router unit cells, i.e. the frequency spacing between the multiplexed optical signals coincides with the ring FSR. By contrast, the memorization part is replicated *m* times, with each memorization stage being wavelength specific. A practical example is shown in Fig. 5 for a 2-2-OLUT, where the $\lambda_0$ and $\lambda_1$ signals are selectively coupled within the first and second stages of the memorization part, respectively. This could be achieved by using distinct microring diameters at each column (and with a different FSR than for the routing stage unit cell).
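The behaviour described above can be sketched as a purely logical model: the routing part selects one of the $2^n$ waveguides for all wavelengths at once, and each wavelength-specific memorization column then decides whether its signal reaches its photodetector. The names `route` and `olut_eval`, the input bit ordering and the example functions (AND, OR) are hypothetical choices made for this sketch, not part of the proposed architecture.

```python
from typing import List, Sequence


def route(inputs: Sequence[int]) -> int:
    """Routing part: return the index of the waveguide selected by the inputs.

    Assumption: inputs[0] is the most significant bit of the index.
    All m wavelengths follow this same path.
    """
    index = 0
    for bit in inputs:
        if bit not in (0, 1):
            raise ValueError("inputs must be 0 or 1")
        index = (index << 1) | bit
    return index


def olut_eval(memories: Sequence[Sequence[int]], inputs: Sequence[int]) -> List[int]:
    """Evaluate an n-m-OLUT.

    memories[j][w] is the RAM bit driving the switch dedicated to
    wavelength lambda_j on waveguide w ('1' = resonant, light is dropped
    towards the j-th output).
    """
    waveguide = route(inputs)
    outputs = []
    for column in memories:
        if len(column) != 2 ** len(inputs):
            raise ValueError("each memorization column needs 2**n bits")
        outputs.append(column[waveguide])
    return outputs


# Hypothetical 2-2-OLUT: AND on lambda_0 and OR on lambda_1.
AND_OR = [[0, 0, 0, 1], [0, 1, 1, 1]]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, olut_eval(AND_OR, [a, b]))
```

The routing is computed once per evaluation regardless of *m*, which mirrors the sharing of the routing part between wavelengths.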



#### IV. PRELIMINARY EVALUATION OF THE OLUT PERFORMANCE

In this section, we introduce key metrics to evaluate the area size, the performance and the power consumption of n-m-OLUTs. The scalability of the OLUT architecture with the number of data using WDM is estimated and the performance of OLUTs is compared with RDL for the full adder case [3].

#### A. Evaluation metrics for the n-m-OLUT

The area size can be estimated from the total number of add-drop filters  $(N_{AD})$  in the *n*-*m*-OLUT, as the sum of the number of routers in the routing part  $(N_R)$  and the number of switches in the memorization part  $(N_S)$  (see Fig. 4.):

$$N_{AD} = N_R + N_S = (2^n - 1) + m \times 2^n$$
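As a quick numerical check of this count, the sketch below assumes that the routing part is a binary tree of routers with $2^n$ output waveguides (hence $2^n - 1$ routers) and that each of the *m* memorization columns holds one switch per waveguide. This reading is consistent with the 23 add-drops quoted later for the 3-2-OLUT full adder. The function name `count_add_drops` is illustrative.

```python
def count_add_drops(n: int, m: int) -> int:
    """Total number of add-drop filters in an n-m-OLUT.

    Assumes a binary routing tree (2**n - 1 routers) followed by
    m memorization columns of 2**n switches each.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive integers")
    routers = 2 ** n - 1
    switches = m * 2 ** n
    return routers + switches


if __name__ == "__main__":
    for n, m in [(2, 1), (3, 2), (4, 8)]:
        print(f"{n}-{m}-OLUT: {count_add_drops(n, m)} add-drop filters")
```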

The second metric is the latency. The main contributions consist of the O/E conversion time ( $\tau_{conv}$ ), the switching time for the router stage ( $\tau_{sw}$ ) and the time to cross each add-drop in the resonant state ( $\tau_{res}$ ~10ps for a microring with Q~10000), both in the routing and the memorization parts. Note that  $\tau_{sw}$  is equal to the switching time of a single router unit cell (~1ns) since all the router cell states are changed by the data in parallel. By considering the worst case scenario, where light passes through *n* resonant routers in the routing part and one resonant switch in the memorization part, we estimate the associated total *n-m*-OLUT latency to be  $\tau_{conv} + \tau_{sw} + (n+1) \times \tau_{res}$ .
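The latency expression can be evaluated numerically as follows. The default values for $\tau_{sw}$ (~1ns) and $\tau_{res}$ (~10ps) are the orders of magnitude quoted in the text; no value is given for $\tau_{conv}$, so it is left as a required argument, and the function name `olut_latency` is an assumption of this sketch.

```python
def olut_latency(n: int, tau_conv: float,
                 tau_sw: float = 1e-9, tau_res: float = 10e-12) -> float:
    """Worst-case n-m-OLUT latency in seconds.

    tau_conv: O/E conversion time (not specified in the text, caller-supplied).
    tau_sw:   switching time of one router cell (all cells switch in parallel).
    tau_res:  time to cross one add-drop filter in the resonant state.
    Worst case: n resonant routers plus one resonant memorization switch.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return tau_conv + tau_sw + (n + 1) * tau_res


if __name__ == "__main__":
    # Optical contribution only (tau_conv set to 0 for illustration).
    for n in (2, 3, 6):
        print(f"n={n}: {olut_latency(n, tau_conv=0.0) * 1e9:.2f} ns")
```

Note that the latency does not depend on *m*: adding wavelengths adds operations without lengthening the worst-case optical path.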

Ultimately, the power consumption of *n*-*m*-OLUTs is related to the number of active devices in the system, i.e. the number of lasers (*m*), photodetectors (*m*) and add-drop filters ( $N_{AD}$ ). Note that for increasing *n* and *m* values, add-drop filters rapidly predominate in the whole *n*-*m*-OLUT architecture, and become the critical building block.

#### B. Scalability of the OLUT architecture using WDM



Fig. 6. Total number of add-drop filters  $(N_{AD})$  versus the number *n* of input data, required for computing 8 operations simultaneously. The results for *n*-*m*-OLUTs replicated 8/*m* times are shown, with *m* varying between 1 and 8, as specified in the legend

Here, we study how the use of multiple wavelengths in OLUTs reduces the hardware resources (i.e. $N_{AD}$) needed to perform simultaneous computations, and how this scales with the number of input data, *n*. The result for the eight-operation case is plotted in Fig. 6, which compares the total number of add-drop filters in *n*-*m*-OLUT systems that need to be replicated 8/*m* times (with *m* varying between 1 and 8) in order to perform the eight calculations simultaneously. As expected, the number of add-drop filters in *n*-*m*-OLUTs increases with the number *n* of input data. However, the rate of increase is lowest for the *n*-8-OLUT, which exploits 8 multiplexed wavelength signals. Indeed, the higher the number of wavelengths that can be accommodated, the higher the compactness and computation capacity that can be achieved through WDM. In addition, the benefit of WDM in OLUTs increases with the number of input data to be handled. This can be readily understood from Fig. 4, which highlights that the hardware and energy resources of the *n*-*m*-OLUT routing part are shared by the *m* wavelengths, with the complexity of this part increasing with the number of input data.
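The trend discussed above can be reproduced with a few lines of code. This sketch assumes the add-drop count of Section IV.A with a binary routing tree ($2^n - 1$ routers plus $m \times 2^n$ switches per OLUT) and restricts *m* to divisors of the number of operations so that the replication factor is an integer. The function name `total_add_drops` is illustrative.

```python
def total_add_drops(n: int, m: int, operations: int = 8) -> int:
    """Add-drop filters needed to run `operations` computations in parallel
    using n-m-OLUTs replicated operations/m times.

    Assumes (2**n - 1) routers and m * 2**n switches per OLUT.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive integers")
    if operations % m != 0:
        raise ValueError("m must divide the number of operations")
    replicas = operations // m
    per_olut = (2 ** n - 1) + m * 2 ** n
    return replicas * per_olut


if __name__ == "__main__":
    for n in range(2, 7):
        row = {m: total_add_drops(n, m) for m in (1, 2, 4, 8)}
        print(f"n={n}: {row}")
```

In this model the memorization part always costs $8 \times 2^n$ switches in total, while the routing cost is divided by *m*, so the saving from WDM grows with the size of the routing tree. For n=4, for example, the count drops from 248 (m=1) to 143 (m=8).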

#### C. Comparison for a k-bit full carry adder

Even though a complete evaluation of the performance of the proposed OLUT will be the subject of future work, we study the potential of OLUTs in terms of low latency and power consumption on a specific application example, namely a *k*-bit full adder. We compare the OLUT needed to perform this operation with the previously discussed RDL architecture, based on reference [3].

TABLE I. RDL AND OLUT PERFORMANCE FOR FULL CARRY ADDERS

| *k*-bit full adder | Number of lasers | Number of photodetectors | Number of microrings | Latency |
|--------------------|------------------|--------------------------|----------------------|---------|
| RDL                | $k^2+2k+2$       | $k^2+2k+2$               | $\sim 9k^3$          | $2\tau_{conv}+2\tau_{sw}+(k^2+3k+2)\times\tau_{res}$ |
| OLUT               | $k+1$            | $k+1$                    | $\sim 2k \times 4^k$ | $\tau_{conv}+\tau_{sw}+(2k+2)\times\tau_{res}$ |

The results for the number of active devices and the latency are shown in Table I. From [3], RDL requires $\sim 9k^3$ microrings for the expanded 2x2 unit cell structure, but $\sim 6k \times 4^k$ microrings when using 1x1 unit cell based structures. For the full adder case, the former RDL structure has better scalability, as it requires only $k^2+k+1$ products to generate the $k+1$ outputs. Since each product or sum requires one laser and one photodetector in RDL, $k^2+2k+2$ lasers and photodetectors are required in total. In the worst case scenario, light passes through a maximum of $2k+1$ on-resonance switches at the product stage and $k^2+k+1$ switches at the sum stage before the output. The RDL total latency is thus $(k^2+3k+2)\times\tau_{res}$, plus $2\times\tau_{conv}$, which accounts for the two O/E conversions, and $2\times\tau_{sw}$ for the switching time in both stages.

When considering the implementation of the same full adder application in an *n*-*m*-OLUT (i.e. $n=2k+1$ and $m=k+1$), with the formulas presented in Section IV.A, $\sim 2k \times 4^k$ microrings and $k+1$ lasers are required, and the latency is equal to $\tau_{conv}+\tau_{sw}+(2k+2)\times\tau_{res}$. We note that the number of microrings grows exponentially in OLUTs, but this is not too dramatic for a low number of inputs (fewer than 4), making it advantageous to split larger functions into several smaller OLUTs.
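To make the comparison concrete, the expressions quoted above for the *k*-bit full adder can be tabulated for small *k*. This sketch only evaluates those expressions; the function names and dictionary keys are illustrative, and the latency is reported as the number of $\tau_{conv}$, $\tau_{sw}$ and $\tau_{res}$ terms rather than in seconds, since no value is given for $\tau_{conv}$.

```python
from typing import Dict


def rdl_metrics(k: int) -> Dict[str, int]:
    """RDL figures for a k-bit full adder (expanded 2x2 unit cell structure)."""
    return {
        "lasers": k * k + 2 * k + 2,
        "photodetectors": k * k + 2 * k + 2,
        "microrings_approx": 9 * k ** 3,
        "n_tau_conv": 2,
        "n_tau_sw": 2,
        "n_tau_res": k * k + 3 * k + 2,
    }


def olut_metrics(k: int) -> Dict[str, int]:
    """OLUT figures for a k-bit full adder, with n = 2k+1 and m = k+1."""
    n, m = 2 * k + 1, k + 1
    return {
        "lasers": m,
        "photodetectors": m,
        "microrings_approx": 2 * k * 4 ** k,  # leading-order estimate
        "n_tau_conv": 1,
        "n_tau_sw": 1,
        "n_tau_res": n + 1,
    }


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(f"k={k}")
        print("  RDL :", rdl_metrics(k))
        print("  OLUT:", olut_metrics(k))
```

The output makes the trade-off visible: the OLUT needs fewer lasers, photodetectors and latency terms, while its microring count grows exponentially with *k*.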



Fig. 7. Illustration of an OLUT configured as a 1-bit full adder when the inputs are (a)  $x=1,y=1,c_{in}=1$  and (b)  $x=1,y=0,c_{in}=0$ 

Figure 7 illustrates a schematic example of the 1-bit full adder with a 3-2-OLUT, with a latency equal to  $\tau_{conv}+\tau_{sw}+4\times\tau_{res}$  and including 23 add-drops. The first and second memorization stages are configured to realize the sum and  $c_{out}$  computations on the  $\lambda_0$  and  $\lambda_1$  optical signals, respectively. In scenario (a), the three inputs are set to '1' and the optical signals are thus driven, through the router part, into the lowest waveguide of the memorization part. Since both switches dedicated to  $\lambda_0$  and  $\lambda_1$  are configured to the resonant state (i.e. the attached RAM memories contain the '1' value), both optical signals are coupled into the vertical waveguides, resulting in a '1' logic value on both the  $c_{out}$  and sum output ports. In scenario (b), a single input is set to '1'; the optical signals will thus be driven to the upper waveguide, for which only the  $\lambda_0$  microring is set to the resonant state. Hence, light will be coupled into a single vertical waveguide, resulting in values '1' and '0' on the sum and  $c_{out}$  ports, respectively.
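The two scenarios of Fig. 7 can be replayed with a small logical model of the 3-2-OLUT. This is a sketch under stated assumptions: the memorization contents are derived here from the full adder truth table, and the waveguide numbering (index $4x + 2y + c_{in}$) is a hypothetical convention that need not match the physical ordering drawn in the figure.

```python
from itertools import product
from typing import List, Tuple


def full_adder_config() -> Tuple[List[int], List[int]]:
    """Build the two memorization columns of a 3-2-OLUT full adder.

    Returns (sum_bits, cout_bits): the RAM contents for the lambda_0 (sum)
    and lambda_1 (c_out) columns, one bit per routing waveguide.
    """
    sum_bits, cout_bits = [], []
    for x, y, c_in in product((0, 1), repeat=3):
        total = x + y + c_in
        sum_bits.append(total & 1)    # '1' = switch set to resonant state
        cout_bits.append(total >> 1)
    return sum_bits, cout_bits


def olut_full_adder(x: int, y: int, c_in: int) -> Tuple[int, int]:
    """Return (sum, c_out) as read on the two output ports."""
    for bit in (x, y, c_in):
        if bit not in (0, 1):
            raise ValueError("inputs must be 0 or 1")
    sum_bits, cout_bits = full_adder_config()
    waveguide = (x << 2) | (y << 1) | c_in  # path chosen by the routing part
    return sum_bits[waveguide], cout_bits[waveguide]


if __name__ == "__main__":
    print("scenario (a):", olut_full_adder(1, 1, 1))
    print("scenario (b):", olut_full_adder(1, 0, 0))
```

Each of the eight waveguides carries one sum bit and one carry bit, so the 16 memory bits fully define the adder.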

Therefore, we may conclude that OLUTs allow for lower latency and a reduced number of optical devices, hence reducing the power consumption in this full adder configuration when compared to RDL systems. This example also illustrates the potential for further parallel computation through using a higher number of multiplexed wavelengths, which is possible by exploiting integrated photonic technology.

#### V. CONCLUSIONS

In conclusion, we proposed OLUTs, in which the intrinsic advantages of optics are exploited in the core architecture of LUTs to realize photonic-assisted computing systems that offer low latency and are readily reconfigurable. When compared to previously proposed RDL architectures, OLUTs do not require slow and power hungry intermediate O/E conversions. In addition, OLUTs can be uniquely combined with WDM, allowing several computations to be performed simultaneously. This provides a way to save energy and hardware resources by implementing parallelism, a key advantage of optics for computing.

Future directions of our work include a more systematic evaluation of the OLUT performance and of the scaling of the OLUT size and power consumption with the number of inputs and the bit width of the data. The upper speed at which these architectures can be driven will also be calculated more accurately. In particular, we will investigate the use of all-optical switches that should enable, in principle, much faster switching times. This preliminary work demonstrates the potential of CMOS compatible silicon photonics technology for optically assisted on-chip computation.

#### REFERENCES

- [1] A. Shacham, K. Bergman and L. P. Carloni, "Photonic Networks-on-Chip for Future Generations of Chip Multi-Processors," IEEE Transactions on Computers, vol. 57, no. 9, pp. 1246-1260, 2008
- [2] C. Batten, A. Joshi, V. Stojanovic and K. Asanovic, "Designing chip-level nanophotonic interconnection networks," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 2, no. 2, pp. 137-153, 2012
- [3] Q. Xu and R. Soref, "Reconfigurable optical directed-logic circuits using microresonator-based optical switches," Optics Express, vol. 19, no. 6, pp. 5244-5259, 2011
- [4] J. Rose, R. J. Francis, D. Lewis and P. Chow, "Architecture of Programmable Gate Arrays: The Effect of Logic Block Functionality on Area Efficiency," IEEE J. Solid State Circuits, pp. 1217-1225, 1990
- [5] Q. Xu, B. Schmidt, S. Pradhan and M. Lipson, "Micrometer-scale silicon electro-optic modulators," Nature, vol. 435, pp. 325-327, 2005
- [6] G. Wojcik et al., "A single comb laser source for short reach WDM interconnects," SPIE Photonics West, 7230-21, 2009
- [7] J. Hardy and J. Shamir, "Optics inspired logic architecture," Optics Express, vol. 15, pp. 150-165, 2007