Line Search-Based Inverse Lithography Technique for Mask Design

As feature sizes are now much smaller than the wavelength of the illumination source of lithography equipment, resolution enhancement technology (RET) has been increasingly relied upon to minimize image distortions. In advanced process nodes, pixelated masks become essential for RET to achieve an acceptable resolution. In this paper, we investigate the problem of pixelated binary mask design in a partially coherent imaging system. Similar to previous approaches, the mask design problem is formulated as a nonlinear program and is solved by gradient-based search. Our contributions are four novel techniques to achieve significantly better image quality. First, to transform the original bound-constrained formulation into an unconstrained optimization problem, we propose a new non-cyclic transformation of mask variables to replace the well-known cyclic one. As our transformation is monotonic, it enables better control over pixel flipping. Second, based on this new transformation, we propose a highly efficient line search-based heuristic technique to solve the resulting unconstrained optimization. Third, to simplify the optimization, instead of using a discretization regularization penalty, we directly round the optimized gray mask into a binary mask for pattern error evaluation. Fourth, we introduce a jump technique in order to jump out of a local minimum and continue the search.

I would like to use this opportunity to thank those who helped me with my research and this thesis.
First of all, I thank Dr. Chris Chu for giving me the opportunity to pursue my Master's degree at Iowa State University, and for introducing me to such an advanced research topic in the VLSI CAD area. Over these years, his kind guidance has shown me how to set up a research topic, how to model a research problem, and how to solve a problem with a creative method. He gave me much advice that will be very helpful for the rest of my life. Finally, I would like to dedicate this thesis to my wife, without whose support I would not have been able to complete this work.

ABSTRACT
Following Moore's law, microelectronic fabrication techniques have been developed to fabricate deep-submicron devices. Device feature sizes on the wafer have become much smaller than the wavelength of the illumination source of today's widely used lithography equipment, which is 193 nm ultraviolet (UV) light. Because the features are so small, diffraction effects cannot be avoided when patterns are transferred from masks to wafers during lithography. As a result, the patterns transferred from masks to the wafer surface are severely distorted, which causes many problems, such as poly line end shortening or bridging, which result in leakage or short circuits.
The industry has been investigating various alternatives, such as an extreme ultraviolet (EUV) illumination source. However, the next generation of illumination source, EUV with a wavelength of about 13.5 nm, is still a long way from being put into practice. As a result, Resolution Enhancement Technology (RET) has been increasingly relied upon to minimize image distortions. In advanced process nodes, pixelated masks become essential for RET to achieve an acceptable resolution. In this thesis, we investigate the problem of pixelated binary mask design in a partially coherent imaging system. Similar to previous approaches, the mask design problem is formulated as a nonlinear program and is solved by gradient-based search. Our contributions are four novel techniques to achieve significantly better image quality than state-of-the-art technology. First, to transform the original bound-constrained formulation into an unconstrained optimization problem, we propose a new non-cyclic transformation of mask variables to replace the well-known cyclic one. As our transformation is monotonic, it enables better control over pixel flipping. Second, based on this new transformation, we propose a highly efficient line search-based heuristic technique to solve the resulting unconstrained optimization problem. Third, we introduce a jump technique. As gradient-based search techniques can become trapped at a local minimum, the jump technique lets the search jump out of the local minimum and continue, which increases the chance of achieving a better result. Fourth, to simplify the optimization, instead of using the widely used discretization regularization penalty, we directly round the optimized gray mask into a binary mask for pattern error evaluation. Experimental results show that the pattern errors of the state-of-the-art algorithm implemented by Ma and Arce [5] are 8.55% to 358.8% higher than ours.
This thesis focuses on the photolithography process.
To manufacture microelectronic circuits on a wafer, the structure information of a circuit needs to be transferred onto the wafer surface in the photolithography process. Fig. 1.3 shows a detailed PMOS device in cross-section view and top view. The top-view information is converted into a layout, which is then made into a mask.
A typical photolithography processing system is shown in Fig. 1.4. A photomask is projected onto the wafer, which is covered by photoresist. Photoresist is a photosensitive material. After photo-imaging, during the developing process, it resists the action of certain chemicals in the desired areas. In areas of the wafer where the light intensity received is below the threshold of the photoresist, the patterns will not be printed after development.
Otherwise, the patterns are printed on the wafer.

Optical Proximity Correction (OPC)
As semiconductor manufacturers move to advanced process nodes (especially the 45 nm process and below), lithography has become a great challenge due to the fundamental constraints of optical physics. Because the feature size is much smaller than the wavelength of the illumination source (currently 193 nm), the image formed on the wafer surface is increasingly distorted by optical diffraction and interference. In a typical optical lithography processing system, as illustrated in Fig. 1, the image transferred from a mask onto a wafer surface is seriously distorted by diffraction of the light. It is well known that diffraction becomes significant whenever the size of the diffracting object is on the order of, or smaller than, the wavelength. So when the light passes through small features on the mask whose scale is similar to or smaller than its wavelength, diffraction blurs the resulting image on the wafer, and the light intensity spreads outside the region where it should be printed on the wafer surface.
The light does not travel in "ray" paths, as we usually assume, but can bend around obstructions. While the optical lithography processing system still uses a 193 nm light source, the size of the features on the mask has reached 45 nm or even 22 nm. The industry has been investigating various alternatives (e.g., EUV lithography, e-beam lithography), but none of them will be ready in the foreseeable future. As a result, semiconductor manufacturers have no choice but to keep using the existing equipment to pattern progressively smaller features.
Given the limitations of lithography equipment, and because diffraction in an optical lithography processing system cannot be avoided, modifying the input mask before projection is a highly promising way to improve the shape of the resulting image on the wafer.
Resolution Enhancement Technology (RET), such as optical proximity correction (OPC), phase shift masks (PSM) and double patterning, has been increasingly relied upon to minimize image distortions [12]. In recent years, the pixelated mask, which allows great flexibility in the mask pattern, has become essential for OPC to achieve better resolution [2,9,10,3,4,6,5]. Granik [2] considered a constrained nonlinear formulation. Poonawala and Milanfar [9,8,10] proposed an unconstrained nonlinear formulation and employed a regularization framework to control the tone and complexity of the synthesized masks. Ma and Arce [3,5] presented a similar unconstrained nonlinear formulation targeting PSM. Ma and Arce [4,5] focused on partially coherent illumination and used singular value decomposition to expand the partially coherent imaging equation by eigenfunctions into a sum of coherent systems (SOCS). The works discussed above used the steepest descent method to solve the nonlinear programs, whereas Ma and Arce [6] demonstrated that the conjugate gradient method is more efficient. Yu and Pan [13] is an exception to the mathematical programming approach: it proposes a model-based pixel flipping heuristic instead.

Contributions of This Thesis
The inverse lithography technique for OPC mask design was proposed in [2] and [8] in 2006, and has been widely discussed in recent years as semiconductor manufacturers move to advanced process nodes. So far, however, no effective search method has been proposed because of the complicated solution space of this problem. We introduce a novel transformation for mask pixels, which enables an effective line search technique.
In this thesis, we focus on the design of pixelated binary masks in a partially coherent imaging system. Similar to previous approaches, we formulate the problem as an unconstrained nonlinear program and solve it by iterative gradient-based search. The main contributions of this thesis are listed below.
To transform the problem formulation from a bounded optimization to an unconstrained one, we propose a new non-cyclic transformation of mask variables to replace the widely used cyclic one. Our transformation is monotonic and allows better control over pixel flipping.
Based on this new transformation, we present a highly efficient line search-based technique to solve the resulting unconstrained optimization. Because of the non-cyclic nature of the transformation, our line search is not easily trapped at a local minimum. Therefore, our algorithm can find much better binary masks for the inverse lithography problem.
We introduce a jump technique. As gradient-based search techniques can become trapped at a local minimum, the jump technique lets the search jump out of the local minimum and continue.
We apply a direct rounding technique to regularize gray masks into binary ones instead of adding a discretization regularization penalty to the cost function as in [9] and [5]. This simplifies the computation and achieves better results, as the experimental results show.
The rest of this thesis is organized as follows. The formulation of the inverse lithography problem for OPC is explained in Chapter 2. Chapter 3 describes in detail the flow of our algorithm and the four novel techniques that we propose. Chapter 4 presents the experimental results. The thesis is concluded in Chapter 5.

CHAPTER 2. Problem Formulation
In an optical lithography system, a photomask is projected onto a silicon wafer through an optical lens. An aerial image of the mask is then formed on the wafer surface, which is covered by photoresist. After developing and etching, a pattern similar to the one on the mask is formed on the wafer surface. To simulate the pattern formation on the wafer surface for a given mask, we first describe a projection optics model and a photoresist model below. After that, we present the formulation of the mask design problem. In this thesis, we use a pixelated binary mask.

Projection Optics Model
The projection optics system can be modeled as one of three kinds of imaging system: a coherent imaging system, an incoherent imaging system, or a partially coherent imaging system.
For the coherent imaging system, the illumination source is simplified to a single point. The intensity distribution of this kind of imaging system is modeled by Eq. (2.1), in which I is the aerial image (the light intensity distribution obtained when projecting a pixelated mask M), h is the spatial kernel for convolution, and ⊗ indicates the convolution operation. This computation is also illustrated in Fig. 2.1.
The Hopkins diffraction model [1] is widely used to approximate partially coherent imaging systems. To reduce the computational complexity of the Hopkins diffraction model, the Fourier series expansion model [11] is a common approach. In this thesis, we follow this model.
The Fourier series expansion model approximates the partially coherent imaging system as a sum of coherent systems (SOCS). Based on this model, the computation of the aerial image I of a pixelated mask M is given in Eq. (2.4) and illustrated in Fig. 2.4 [5].
Here, the dimensions of the pixelated mask and the image are m × n. The illumination source is partitioned into N × N sources. u and h are the Fourier series coefficients and spatial kernels, respectively.
Note that the convolution h ⊗ M can be computed in the frequency domain using the fast Fourier transform (FFT) and the inverse fast Fourier transform (FFT$^{-1}$).
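As a hedged sketch of the two statements above, the SOCS aerial image and the frequency-domain convolution can be written as follows. The index pair (p, q) over the N × N source partition and the placement of the coefficients u are assumptions made for illustration; they are not copied from Eq. (2.4). The second line is the standard convolution theorem, where ⊙ denotes the element-by-element product.

```latex
% Assumed SOCS form: one coherent term per source partition (p, q).
I \;=\; \sum_{p=1}^{N} \sum_{q=1}^{N} u_{p,q}\, \bigl| h_{p,q} \otimes M \bigr|^{2}

% Convolution theorem used to evaluate each term with FFTs.
h \otimes M \;=\; \mathrm{FFT}^{-1}\!\bigl( \mathrm{FFT}(h) \odot \mathrm{FFT}(M) \bigr)
```

Note that a discrete FFT product yields a circular convolution, so h and M generally need to be zero-padded to a common size if a linear convolution is intended.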

Photoresist Model
To model the reaction of the photoresist to the intensity of light projected on it, we use the constant threshold model
$$ z_i = \begin{cases} 1, & I_i > t_r \\ 0, & \text{otherwise} \end{cases} \qquad (2.6) $$
where $I_i$ and $z_i$ are the light intensity and the corresponding reaction result of the photoresist at pixel $i$ on the wafer surface, respectively, and $t_r$ is the threshold of the photoresist. This means that if the light intensity received at pixel $i$ is higher than the threshold $t_r$, the photoresist will react and the image pattern will be printed on this pixel.
Thus the pattern z formed on the wafer surface can be expressed as a function of the mask M based on Eq. (2.5) and Eq. (2.6). In order to make z differentiable so that gradient-based search can be applied, we approximate the constant threshold model above with the sigmoid function
$$ \mathrm{sig}(x) = \frac{1}{1 + e^{-a(x - v)}} \qquad (2.7) $$
where the parameter $a$ determines the steepness of the sigmoid function around $x = v$.
The larger the value of $a$, the steeper the sigmoid function is, and hence the closer it is to the constant threshold model. The sigmoid function with $a = 10$, $v = 0$ is illustrated in the accompanying figure. Using the sigmoid function with $v = t_r$, the reaction of the photoresist at pixel $i$ for a mask $M$ is
$$ z_i(M) = \frac{1}{1 + e^{-a\,(I_i(M) - t_r)}} \qquad (2.8) $$
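The relationship between the constant threshold model and its sigmoid approximation can be checked numerically. The following is a minimal sketch, not the thesis implementation: the threshold value `t_r = 0.3`, the sample intensities and the function names are hypothetical choices made only for illustration.

```python
import math

def hard_threshold(intensity, t_r):
    """Constant threshold model: the resist reacts only above t_r."""
    return 1.0 if intensity > t_r else 0.0

def sigmoid_resist(intensity, t_r, a):
    """Sigmoid approximation with steepness a, centred at t_r."""
    return 1.0 / (1.0 + math.exp(-a * (intensity - t_r)))

t_r = 0.3  # hypothetical resist threshold
samples = [0.0, 0.2, 0.29, 0.31, 0.4, 1.0]  # hypothetical intensities

# Largest deviation from the hard threshold for increasing steepness.
gaps = []
for a in (10, 100, 1000):
    gap = max(abs(sigmoid_resist(x, t_r, a) - hard_threshold(x, t_r))
              for x in samples)
    gaps.append(gap)
    print(f"a = {a:5d}: largest deviation = {gap:.6f}")
```

The deviation shrinks as a grows, which matches the statement that a larger a brings the sigmoid closer to the constant threshold model, at the price of a steeper (less smooth) function for the gradient-based search.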

Our Inverse Lithography Problem Formulation
Inverse lithography treats mask design as an inverse problem of imaging. Given a target pattern $\hat{z}$, the problem is to find a mask $M^*$ such that the corresponding pattern $z(M^*)$ on the wafer surface is as close to $\hat{z}$ as possible [7].
The error between the target pattern $\hat{z}$ and the generated pattern $z(M)$ for any mask $M$ is measured by the pattern error function $E(M)$ of Eq. (2.9). The inverse lithography problem is therefore to find a mask that minimizes the error between the target pattern and the generated pattern.
In this thesis, we apply an iterative gradient-based search method, which is outlined in Fig. 3. Step 4 of that outline determines the step size S (Sec. 3.2).
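For orientation, a commonly used form of the pattern error and of the resulting optimization problem is sketched below. The squared Euclidean norm is an assumption based on common practice in this literature; the source equations are not reproduced in this text, so this should be read as an illustration rather than as the exact definition used in the thesis.

```latex
% Assumed pattern error: squared L2 distance between target and printed pattern.
E(M) \;=\; \bigl\lVert \hat{z} - z(M) \bigr\rVert_2^{2}
      \;=\; \sum_{i} \bigl( \hat{z}_i - z_i(M) \bigr)^{2}

% Binary mask design problem; relaxing M_i to [0, 1] gives the
% bound-constrained problem that the pixel transformation removes.
M^{*} \;=\; \arg\min_{M \in \{0,1\}^{m \times n}} E(M)
```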

Novel Transformation for Mask Pixel
As explained above, to convert the inverse lithography problem into an unconstrained optimization problem, we need a transformation $T : \mathbb{R} \to [0, 1]$. Then we can use an unbounded variable $\beta_i$ to represent each pixel based on $M_i = T(\beta_i)$.
One such transformation is proposed by Poonawala and Milanfar [9]:
$$ M_i = \frac{1 + \cos(\beta_i)}{2} $$
This idea is widely adopted by later works [10,3,4,6]. We call it the cosine transformation.
In gradient-based search, a line search along the search direction is typically performed to determine the step size S to get to a local minimum (step 4 in Fig. 3.1).
The line search will be more effective if the function E(M) along the search direction is smooth and, better yet, convex. However, the cosine transformation is a cyclic function.
It is clearly not a one-to-one transformation. As the value of $\beta_i$ increases, $M_i$ moves between 0 and 1 periodically. As a result, when β is moved along any direction, E(M) may keep jumping up and down as each $M_i$ keeps switching between 0 and 1.
To illustrate this, we consider the algorithm described in Chapter 7 of Ma and Arce [5], which solves the same problem formulation as this thesis. It also applies the steepest descent method, but it uses the cosine transformation. The pattern error function Eq. (2.9) is turned into Eq. (3.2). Using the software code and the target pattern (as shown in Fig. 3.2) provided by [5], when β is moved along the negative gradient direction of Eq. (3.2), the pattern error does not vary smoothly with the step size. Consequently, the approach taken in [5] is to set the step size to some fine-tuned constant instead of computing it by line search.
We propose a new transformation for $M_i$ based on the sigmoid function (Eq. (2.7)):
$$ M_i = \frac{1}{1 + e^{-A(\beta_i - T_R)}} \qquad (3.3) $$
where $A$ is the steepness control parameter and $T_R$ specifies the transition point of the function. A larger $A$ will cause the pixel values to be closer to 0 or 1. $T_R$ can be set to any value and is set to 0 in this thesis. We call this the sigmoid transformation. As the sigmoid transformation is a strictly increasing function, when β is moved along any direction, each mask pixel is flipped at most once.
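The difference between the cyclic and the monotonic transformation can be made concrete by counting how often a single rounded pixel changes value as β moves along a ray. This is a minimal sketch under stated assumptions: the steepness `A = 4.0`, the starting value, the direction and the step grid are hypothetical, and `count_flips` is an illustrative helper, not part of the thesis code.

```python
import math

def cosine_transform(beta):
    """Cyclic transformation of [9]: M = (1 + cos(beta)) / 2."""
    return (1.0 + math.cos(beta)) / 2.0

def sigmoid_transform(beta, A=4.0, T_R=0.0):
    """Monotonic sigmoid transformation; A = 4.0 is a hypothetical value."""
    return 1.0 / (1.0 + math.exp(-A * (beta - T_R)))

def count_flips(transform, beta0, direction, steps, t_m=0.5):
    """Count changes of the rounded pixel as beta0 moves along direction."""
    flips = 0
    previous = transform(beta0) > t_m
    for s in steps:
        current = transform(beta0 + s * direction) > t_m
        if current != previous:
            flips += 1
        previous = current
    return flips

steps = [0.05 * k for k in range(1, 401)]  # step sizes 0.05 ... 20.0
cosine_flips = count_flips(cosine_transform, -1.0, 1.0, steps)
sigmoid_flips = count_flips(sigmoid_transform, -1.0, 1.0, steps)
print("cosine flips:", cosine_flips)
print("sigmoid flips:", sigmoid_flips)
```

With the cosine transformation the pixel toggles repeatedly as the step grows, while with the sigmoid transformation it toggles at most once, which is the property the line search relies on.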

[Figure: pattern error versus step size]
Based on the sigmoid transformation, the gradient of Eq. (2.9) is expressed using $\mathbf{1} = [1, \ldots, 1]^T \in \mathbb{R}^{m \times n}$, the element-by-element multiplication operator $\odot$, and $h^*$.

Highly Efficient Line Search Technique
In this section, we present a highly efficient line search technique to determine the step size in Step 4 of the algorithm in Fig. 3.1 so as to minimize the pattern error. We observe that in each iteration, the shape of the function E(β) along the direction of the negative gradient is almost always like the curve shown in Fig. 3.5. We employ the golden section method for line search. Golden section search is an iterative technique that successively narrows the search range.
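For readers unfamiliar with the method, a generic golden section search is sketched below. It is a textbook version under stated assumptions, not the thesis implementation: the quadratic `toy_error` is a hypothetical stand-in for the pattern error as a function of the step size, and the tolerance is an arbitrary choice.

```python
import math

def golden_section_search(f, lo, hi, tol=1e-5):
    """Narrow [lo, hi] around the minimum of a unimodal function f."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            # Minimum lies in [lo, x2]; reuse x1 as the new upper probe.
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
        else:
            # Minimum lies in [x1, hi]; reuse x2 as the new lower probe.
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2.0

def toy_error(step):
    """Hypothetical unimodal error curve with its minimum at step = 2."""
    return (step - 2.0) ** 2 + 1.0

best_step = golden_section_search(toy_error, 0.0, 5.0)
print("best step size:", round(best_step, 4))
```

Each iteration needs only one new function evaluation, which matters here because every evaluation of the pattern error requires a full imaging simulation.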
Because the final optimized mask should be a binary one, we need to round the gray mask, which is given by Eq. (3.3), to binary according to some rounding threshold $t_m$.
In other words, each pixel of the gray mask is set to 1 if its value is above $t_m$ and to 0 if it is below (Eq. (3.5)), where $M_{binary}$ is the resulting binary mask. Here we simply set $t_m$ to 0.5.
When moving along the negative gradient direction, the value of each pixel $M_i$ changes monotonically due to our new transformation. Therefore, by setting the step size S, we can easily control the number of pixels flipped (i.e., changed from below $t_m$ to above $t_m$ or vice versa) during line search. This idea is explained below.
Given the current mask specified by β and the negative gradient direction d, Eq. (3.3) can be written as a function of S as
$$ M_i(S) = \frac{1}{1 + e^{-A(\beta_i + S d_i - T_R)}} \qquad (3.6) $$
By substituting Eq. (3.6) into Eq. (3.5) and rearranging, we find that pixel $i$ crosses the rounding threshold $t_m$ at $S = S_i$, where
$$ S_i = \frac{1}{d_i} \left( T_R - \frac{1}{A} \ln\!\left( \frac{1}{t_m} - 1 \right) - \beta_i \right) $$
is the threshold on step size for flipping pixel $i$. At the current mask, if a pixel's value is less than $t_m$ and its negative gradient is positive, or if a pixel's value is larger than $t_m$ and its negative gradient is negative, then the pixel will be flipped when we apply a step size S larger than $S_i$. Other pixels cannot be flipped no matter how large a step size S we use. So it is easy to determine how many pixels can be flipped.
To control the number of pixels flipped during golden section search, we first mark all flippable pixels along the negative gradient direction. Then we calculate the threshold on step size, $S_i$, for each flippable pixel $i$. By sorting these thresholds from smallest to largest, the number of pixels flipped can be controlled by setting the value of S. For example, by using the 50th value of the sorted thresholds as the step size S, 50 pixels will be flipped along the negative gradient. In golden section search, the minimum and maximum sorted thresholds can be used to define the search region. In this thesis, we use a segment in the region from the minimum to the maximum sorted thresholds as our search region. The details are discussed in Chapter 4.
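The threshold computation and the control of the number of flipped pixels can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: the six pixel values, the direction `d` and `A = 4.0` are hypothetical, and the step is placed midway between two consecutive thresholds (rather than exactly on one) so that no pixel sits exactly on the rounding threshold.

```python
import math

def sigmoid_transform(beta, A=4.0, T_R=0.0):
    """Sigmoid pixel transformation; A = 4.0 is a hypothetical value."""
    return 1.0 / (1.0 + math.exp(-A * (beta - T_R)))

def flip_thresholds(betas, d, A=4.0, T_R=0.0, t_m=0.5):
    """Sorted step-size thresholds S_i of all flippable pixels."""
    crossing = T_R - math.log(1.0 / t_m - 1.0) / A  # beta where M == t_m
    thresholds = []
    for beta_i, d_i in zip(betas, d):
        if d_i == 0.0:
            continue  # pixel does not move along this direction
        s_i = (crossing - beta_i) / d_i
        if s_i > 0.0:  # only pixels moving toward the threshold can flip
            thresholds.append(s_i)
    return sorted(thresholds)

def flipped_count(betas, d, step, A=4.0, T_R=0.0, t_m=0.5):
    """Number of rounded pixels that change when stepping by `step`."""
    count = 0
    for beta_i, d_i in zip(betas, d):
        before = sigmoid_transform(beta_i, A, T_R) > t_m
        after = sigmoid_transform(beta_i + step * d_i, A, T_R) > t_m
        if before != after:
            count += 1
    return count

betas = [-1.0, -0.5, 0.8, 0.3, -0.2, 1.5]  # hypothetical pixel variables
d = [0.5, -1.0, -0.2, 0.7, 1.0, -0.5]      # hypothetical negative gradient
thresholds = flip_thresholds(betas, d)

k = 2  # number of pixels we want to flip
step = (thresholds[k - 1] + thresholds[k]) / 2.0
print("sorted thresholds:", thresholds)
print("pixels flipped:", flipped_count(betas, d, step))
```

Because the thresholds are sorted, a search over the step size can be expressed equivalently as a search over the number (or percentage) of pixels flipped.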

Jump Technique
Comparing Fig. 3.3 and Fig. 3.5 shows that, with our new sigmoid transformation, the line search along the negative gradient direction is much less likely to be trapped in local minima. Because of the non-cyclic nature of our transformation, the solution space is less rugged. However, the solution space is still extremely complicated, with many local minima, because of the complexity of the problem itself. As gradient-based search techniques can become trapped at a local minimum, we introduce a new technique named jump in order to jump out of the local minimum and continue the search. A simple illustration in one dimension is shown in Fig. 3.7. During the line search process, if the algorithm cannot find a better solution along the search direction (i.e., it is trapped at some local minimum), instead of terminating, it jumps along the search direction with a large step size to a solution that is probably worse. Then the algorithm continues the gradient-based search starting from the new solution. If the step size is large enough, it is likely that the algorithm will not converge to the previous local minimum. At the end, the algorithm returns the best local minimum that it has found.
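The jump idea can be illustrated on a one-dimensional toy landscape. The sketch below is a simplified analogue under stated assumptions, not the thesis algorithm: the function `f`, the trial step sizes, the jump size and the rule for choosing the direction from the sign of the gradient are all hypothetical choices for illustration.

```python
import math

def f(x):
    """Hypothetical toy landscape with a local minimum near every integer."""
    return -math.cos(2.0 * math.pi * x) - 0.3 * x * x

def grad(x):
    """Derivative of the toy landscape f."""
    return 2.0 * math.pi * math.sin(2.0 * math.pi * x) - 0.6 * x

def search_with_jumps(x, max_jumps, jump_step=1.0,
                      trial_steps=(0.1, 0.03, 0.01, 0.003, 0.001)):
    """Descend along the negative gradient; jump when no trial step helps."""
    best_x, best_f = x, f(x)
    jumps = 0
    while True:
        direction = -1.0 if grad(x) > 0.0 else 1.0
        candidates = [x + s * direction for s in trial_steps]
        x_new = min(candidates, key=f)
        if f(x_new) < f(x):
            x = x_new                    # ordinary improving step
        elif jumps < max_jumps:
            x += jump_step * direction   # jump to a probably worse point
            jumps += 1
        else:
            return best_x, best_f        # best solution seen so far
        if f(x) < best_f:
            best_x, best_f = x, f(x)

_, error_without_jump = search_with_jumps(0.23, max_jumps=0)
_, error_with_jumps = search_with_jumps(0.23, max_jumps=3)
print("without jump:", round(error_without_jump, 3))
print("with jumps:  ", round(error_with_jumps, 3))
```

Without jumps the search stops at the first local minimum it reaches; with a few jumps it moves on to neighbouring basins and keeps the best value seen, which mirrors the "return the best local minimum found" rule described above.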

Direct Rounding of Gray Mask
In order to apply a gradient-based approach, it is unavoidable to relax the integer constraints, as discussed at the beginning of Chapter 3. As a result, the optimized mask becomes a gray one. Because our goal is to generate a binary mask, the optimized gray mask has to be rounded to a binary one in the end. A regularization framework was proposed in Section IV.A of [9] and also in Chapter 6.1 of [5] to bias the output gray mask to be closer to binary. This regularization approach adds to the objective function (i.e., Eq. (2.9)) a quadratic penalty term for each pixel. However, it is still hard to control the change in the image pattern error caused by the rounding of the gray mask at the end.
The optimized gray mask may achieve a low pattern error, but after the gray mask is rounded into a binary one, the pattern error often increases dramatically. Instead of using the quadratic penalty regularization framework, we propose to directly round the optimized gray mask into a binary one in each iteration before evaluating the pattern error. In this way, we simplify the objective function and also guarantee that our search will not be misled by inaccurate pattern error values.
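The effect of evaluating the error on the rounded mask can be sketched with a toy model. Everything below is a hypothetical illustration: `toy_pattern` is a one-dimensional stand-in for the real imaging and resist model (a 3-tap blur followed by a sigmoid), and the mask and target values are invented for the example.

```python
import math

def round_mask(gray_mask, t_m=0.5):
    """Round a gray mask to binary with rounding threshold t_m."""
    return [1.0 if m > t_m else 0.0 for m in gray_mask]

def toy_pattern(mask, t_r=0.5, a=25.0):
    """Hypothetical 1-D stand-in for z(M): 3-tap blur, then sigmoid resist."""
    n = len(mask)
    pattern = []
    for i in range(n):
        left = mask[i - 1] if i > 0 else 0.0
        right = mask[i + 1] if i < n - 1 else 0.0
        intensity = 0.25 * left + 0.5 * mask[i] + 0.25 * right
        pattern.append(1.0 / (1.0 + math.exp(-a * (intensity - t_r))))
    return pattern

def pattern_error(mask, target):
    """Squared error between the target and the toy printed pattern."""
    return sum((t - z) ** 2 for t, z in zip(target, toy_pattern(mask)))

def rounded_pattern_error(gray_mask, target, t_m=0.5):
    """Error of the binary mask obtained by rounding the gray mask first."""
    return pattern_error(round_mask(gray_mask, t_m), target)

target = [0.0, 1.0, 1.0, 1.0, 0.0]   # hypothetical target pattern
gray = [0.1, 0.9, 0.55, 0.9, 0.1]    # hypothetical optimized gray mask
print("error of gray mask:   ", pattern_error(gray, target))
print("error of rounded mask:", rounded_pattern_error(gray, target))
```

The two values generally differ, so a search guided by the gray-mask error can be misled about the quality of the binary mask that is finally produced; evaluating `rounded_pattern_error` in every iteration avoids this.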

CHAPTER 4. Experimental Results
We compare an implementation of our algorithm with the program developed by Ma and Arce [5]. Both of the programs are coded in Matlab and executed on an Intel Xeon(R) X5650 2.67GHz CPU. The runtime reported is CPU time and the programs are restricted to use a single core when running in Matlab.
In [5], the program uses the cosine transformation and a preset step size of 2. In addition, it applies regularization with a quadratic penalty term, as mentioned in Section 3.4, and a complexity penalty term, which can be found in Section IV.B of [9] and in Chapter 6.2 of [5]. For a fair comparison, our program applies the complexity penalty regularization too. However, as mentioned in Section 3.4, our program does not apply the quadratic penalty regularization; instead, it directly rounds the optimized gray mask into a binary one whenever the pattern error is evaluated. In [5], the stopping criterion of the gradient-based search is set in an ad hoc manner according to the target pattern. In order to compare the two programs fairly on various masks, the same stopping criterion is applied to both programs: if the average pattern error over the last 30 iterations is larger than the average pattern error over the 30 iterations before that, the program stops. For the evaluation of pattern error in both programs, the optimized gray mask is rounded using Eq. (3.5) in each iteration. We use the same convolution kernel h as the Matlab program of [5].
For our program, based on our observation of many experiments, in the first iteration of gradient-based search the minimum pattern error can almost always be achieved by flipping fewer than 10% of all pixels. One example is shown in Fig. 3.5 and Fig. 3.6, where the minimum pattern error is at about 7.7%. In later iterations, this region remains about the same or keeps shrinking. So for the first 2 iterations, we set the initial search region of the golden section search to be the region in which the first 10% of all pixels are flipped along the negative gradient direction. Our program records the location of the minimum found in each iteration to guide the search region for the next iteration.
For example, if in the current iteration the minimum error is found at 5% of all pixels flipped, then, to be on the safe side, the search region of the next iteration is automatically set to 1.5 times 5%, which is 7.5% of the pixels flipped. To prevent this search region from shrinking too much, we set a minimum of 2%. The stopping criterion of the golden section search is set to 0.25%, which means that when the search region shrinks to or below 0.25%, our program stops searching. As mentioned above for the jump technique, if our program cannot find a better solution along the search direction before reaching the stopping criterion (i.e., it is trapped at some local minimum), it takes the best solution other than the starting point of that line search as the new solution, although it is actually a worse solution. This constitutes one jump. Our program temporarily accepts this worse solution as the new solution and continues the gradient-based search starting from it. At the end, our program returns the best local minimum that it has found.
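The stopping rule and the update of the search region described above can be sketched as two small helpers. This is an illustration under stated assumptions: the function names are hypothetical, and the choice to keep running until at least 60 error values are available is an assumption, since the text does not say what happens earlier.

```python
def should_stop(errors, window=30):
    """Stop when the mean error of the last `window` iterations exceeds
    the mean error of the `window` iterations before them."""
    if len(errors) < 2 * window:
        return False  # assumption: keep running until two windows exist
    recent = sum(errors[-window:]) / window
    earlier = sum(errors[-2 * window:-window]) / window
    return recent > earlier

def next_search_fraction(best_fraction, safety=1.5, floor=0.02):
    """Fraction of pixels that bounds the next golden section search:
    1.5 times the location of the last minimum, but at least 2%."""
    return max(floor, safety * best_fraction)

improving = [100.0 - i for i in range(60)]   # hypothetical error history
stagnating = [10.0] * 30 + [11.0] * 30       # hypothetical error history
print(should_stop(improving), should_stop(stagnating))
print(next_search_fraction(0.05), next_search_fraction(0.01))
```

For the first two iterations the text uses a fixed 10% region instead, so `next_search_fraction` would only apply from the third iteration on.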
The comparison of the pattern error of the optimized binary masks and the runtime between our program and that of [5] is shown in Table 4.1. We use 9 binary image patterns as predefined targets. For the illumination source, both programs use an annular illumination source as shown in Fig. 2. The pattern errors reported are calculated from the best binary mask generated by each program. The runtimes listed in the last two columns are based on the stopping criterion mentioned above. As the table shows, our program always generates a better optimized binary mask with significantly less pattern error. The pattern errors of [5] are higher than ours by 8.55% to 358.80%, with an average of 97.61%.
Moreover, [5]'s program uses 4.49% more runtime than our program on average.
We report the pattern error of the final binary mask generated by [5]'s program within our program's runtime in column 6 of Table 4.1. To see whether [5]'s program will converge to better solutions if more runtime is allowed, we change the stopping criterion to let it run for the same runtime as our program. The result shows that the error gets worse in all 3 cases. If the pattern error of the best binary mask generated is reported instead, the result is exactly the same as column 5. This indicates that the program fails to get out of the local minima even with more time.
Target pattern No. 1 is obtained from [5], as shown in Fig. 3. Because the program we obtained from [5] is fine-tuned for target pattern No. 1, the result of our algorithm on this pattern is only moderately better than that of [5]. However, on the other target patterns, which cover multiple mask sizes, pixel sizes, and feature sizes, our algorithm has a large advantage due to the line search, which is enabled by our novel transformation for mask pixels.

CHAPTER 5. Conclusion
In this thesis, we introduced a highly efficient gradient-based search technique to solve the inverse lithography problem. We proposed a new non-cyclic transformation of mask variables to replace the well-known cyclic one. Our transformation is monotonic, and it enables much better control over pixel flipping and the use of line search to minimize the pattern error. We introduced a new technique named jump in order to jump out of a local minimum and continue the search. We used a direct rounding technique to simplify the optimization. The experimental results showed that our technique is significantly more effective than the state-of-the-art techniques: it produces better binary masks in a similar runtime. The four techniques we proposed should be applicable to other iterative gradient-based search approaches, such as the conjugate gradient method. We plan to incorporate our techniques into other search methods in the future.