{"title": "Proximity Effect Corrections in Electron Beam Lithography Using a Neural Network", "book": "Advances in Neural Information Processing Systems", "page_first": 443, "page_last": 449, "abstract": null, "full_text": "Proximity Effect Corrections in Electron Beam \n\nLithography Using a Neural Network \n\nRobert C. Frye \nAT&T Bell Laboratories \n600 Mountain Avenue \nMurray Hill, NJ 08854 \n\nKevin D. Cummings* \nAT&T Bell Laboratories \n600 Mountain Avenue \nMurray Hill, NJ 08854 \n\nEdward A. Rietman \nAT&T Bell Laboratories \n600 Mountain Avenue \nMurray Hill, NJ 08854 \n\nAbstract \n\nWe have used a neural network to compute corrections for images written \nby electron beams to eliminate the proximity effects caused by electron \nscattering. Iterative methods are effective, but require prohibitive \ncomputation time. We have instead trained a neural network to perform \nequivalent corrections, resulting in a significant speed-up. We have \nexamined hardware implementations using both analog and digital \nelectronic networks. Both had an acceptably small error of 0.5% compared \nto the iterative results. Additionally, we verified that the neural network \ncorrectly generalized the solution of the problem to include patterns not \ncontained in its training set. We have experimentally verified this approach \non a Cambridge Instruments EBMF 10.5 exposure system. \n\n1 INTRODUCTION \nScattering imposes limitations on the minimum feature sizes that can be reliably \nobtained with electron beam lithography. Linewidth corrections can be used to control \nthe dimensions of isolated features (i.e., intraproximity; Sewell, 1978), but meet with \nlittle success when dealing with the same features in a practical context, where they are \nsurrounded by other features (i.e., interproximity). Local corrections have been \nproposed using a self-consistent method of computation for the desired incident dose \npattern (Parikh, 1978). 
Such techniques require inversion of large matrices and \nprohibitive amounts of computation time. Lynch et al. (1982) have proposed an \nanalytical method for proximity corrections based on a solution of a set of approximate \nequations, resulting in a considerable improvement in speed. \nThe method that we present here, using a neural network, combines the computational \nsimplicity of the method of Lynch et al. with the accuracy of the self-consistent \nmethods. The first step is to determine the scattered energy profile of the electron \nbeam, which depends on the substrate structure, beam size and electron energy. This is \nthen used to compute spatial variations in the dosage that result when a particular \nimage is scattered. This can be used to iteratively compute a corrected image for the \ninput pattern. The goal of the correction is to adjust the written image so that the \nincident pattern of dose after scattering approximates the desired one as closely as \npossible. We have used this iterative method on a test image to form a training set for \na neural network. The architecture of this network was chosen to incorporate the same basic \nmathematical structure as the analytical method of Lynch et al., but relies on an \nadaptive procedure to determine its characteristic parameters. \n\n* Present address: Motorola Inc., Phoenix Corporate Research Laboratories, 2100 East Elliot Rd., Tempe, \nAZ 85284. \n\n2 CALCULATING PROXIMITY CORRECTED PATTERNS \nWe determined the radial distribution of scattered dose from a single pixel by using a \nMonte-Carlo simulation for a variety of substrates and electron beam energies \n(Cummings, 1989). As an example problem, we looked at resist on a heavy metal \nsubstrate. (These are of interest in the fabrication of masks for x-ray lithography.) 
For \na 20 keV electron beam this distribution, or \"proximity function,\" can be approximated \nby the analytical expression \n\nf(r) = 1/[π(1+ν+ξ)] [ e^{-(r/α)^2}/α^2 + ... ] \n\nwhere α = 0.038 μm, γ = 0.045 μm, β = 0.36 μm, ν = 3.49 and ξ = 6.42. \n\nThe unscattered image is assumed to be composed of an array of pixels, I_0(x,y). For a \nbeam with a proximity function f(r) like the one given above, the image after scattering \nwill be \n\nI_s(x,y) = Σ_{m=-∞}^{∞} Σ_{n=-∞}^{∞} I_0(x-m, y-n) f((m^2+n^2)^{1/2}), \n\nwhich is the discrete convolution of the original image with the lineshape f(r). The \napproach suggested by analogy with signal processing is to deconvolve the image by \nan inverse filtering operation. This method cannot be used, however, because it is \nimpossible to generate negative amounts of electron exposure. Restricting the beam to \npositive exposures makes the problem inherently nonlinear, and we must rely instead \non an iterative, rather than analytical, solution. \nFigure 1 shows the pattern that we used to generate a training set for the neural \nnetwork. This pattern was chosen to include examples of the kinds of features that are \ndifficult to resolve because of proximity effects. Minimum feature sizes in the pattern \nare 0.25 μm and the overall image, using 0.125 μm pixels, is 180 pixels (22.5 μm) on a \nside, for a total of 32,400 pixels. The initial incident dose pattern for the iterative \ncorrection of this image started with a relative exposure value of 100% for exposed \npixels and 0 for unexposed ones. 
The scattered intensity distribution was computed \nfrom this incident dose using the discrete two-dimensional convolution with the \nsummation truncated to a finite range, r_0. For the example proximity function 95% of \nthe scattered intensity is contained within a radius of 1.125 μm (9 pixels) and this \nvalue was used for r_0. The scattered intensity distribution was computed and compared \nwith the desired pattern of 100% for exposed and 0 for unexposed pixels. The \ndifference between the resulting scattered and desired distributions is the error. This \nerror was subtracted from the dose pattern to be used for the next iteration. However, \nsince negative doses are not allowed, negative regions in the correction were truncated \nto zero. \n\nFigure 1: Training pattern \n\nUsing this algorithm, a pixel that receives a dosage that is too small will have a \nnegative error, and on the next iteration its intensity will be increased. Unexposed \npixels (i.e., regions where resist is to be removed) will always have some dosage \nscattered into them from adjacent features, and will consequently always show a \npositive error. Because the written dose in these regions is always zero, rather than \nnegative, it is impossible for the iterative solution to completely eliminate the error in \nthe final scattered distribution. However, the nonlinear exposure properties of the resist \nwill compensate for this. Moreover, since all exposed features receive a uniform dose \nafter correction, it is possible to choose a resist with the optimal contrast properties for \nthe pattern. \n\nAlthough this iterative method is effective, it is also time consuming. Each iteration on \nthe test pattern required about 1 hour to run on a 386-based computer. 
Four iterations \nwere required before the smallest features in the resist were properly resolved. Even \nthe expected order-of-magnitude speed increase from a large mainframe computer is \nnot sufficient to correct the image from a full-sized chip consisting of several billion \npixels. The purpose of the neural network is to do these same calculations, but in a \nmuch shorter time. \n\n3 NETWORK ARCHITECTURE AND TRAINING \nFigure 2 shows the relationship between the image being corrected and the neural \nnetwork. The correction for one pixel takes into account the image surrounding it. \nSince the neighborhood must include all of the pixels that contribute appreciable \nscattered intensity to the central pixel being corrected, the size of the network was \ndetermined by the same maximum radius, r_0 = 1.125 μm, that characterized the \nscattering proximity function. The large number of inputs would be difficult to manage \nin an analog network if these inputs were general analog signals, but fortunately the \ninput data are binary, and can be loaded into an analog network using digital shift \nregisters. \n\nFigure 3 shows a schematic diagram of the analog network. The binary signals from \nthe shift registers representing a portion of the image were connected to the buffer \namplifiers through 10 kΩ resistors. Each was connected to only one summing node, \ncorresponding to its radial distance from the center pixel. This stage converted the 19 x \n19 binary representation of the image into 10 analog voltages that represented the \nradial distribution of the surrounding intensity. The summing amplifier at the output \nwas connected to these 10 nodes by variable resistors. This resulted in an output that \nwas a weighted sum of the radial components. 
\n\nFigure 2: Network configuration \n\nFigure 3: Schematic diagram of the analog network \n\nFunctionally, this network performs the operation \n\nV_out = Σ_{r=0}^{9} w_r <I_0>_r \n\nwhere w_r are the weight coefficients set by the adjustable resistors and <I_0>_r are the \naverage values of the pixel intensity at radius r. The form of this relationship is \nidentical to the one proposed by Lynch et al., but uses an adaptive method, rather than \nan analytical one, to determine the coefficients w_r. \nThe prototype analog hardware network was built on a wire wrap board using \n74HC164 8-bit CMOS static shift registers and LM324 quad operational amplifiers for \nthe active devices. The resistors in the first layer were 10 kΩ thin-film resistors in \ndual-in-line packages and had a tolerance of 1%. The ten adjustable resistors in the \nsecond layer of the network were 10-turn precision trimmers. Negative weights were \nmade by inverting the sign of the voltage at the buffer amplifiers. For comparison, we \nalso evaluated a digital hardware implementation of this network. It was implemented \non a floating point array processor built by Eighteen Eight Laboratories using an \nAT&T DSP-32 chip operating at 8 MFLOPS peak rate. The mathematical operation \nperformed by the network is equivalent to a two-dimensional convolution of the input \nimage with an adaptively learned floating point kernel. \nThe adjustable weight values for both networks were determined using the delta rule of \nWidrow and Hoff (1960). 
For each pixel in the trial pattern of Figure 1 there was a \ncorresponding desired output computed by the iterative method. Each pixel in the test \nimage, its surroundings and corresponding analog corrected value (computed by the \niterative method) constituted a single learning trial, and the overall image contained \n32,400 of them. We found that the weight values stabilized after two passes through \nthe test image. \n\n4 NEURAL NETWORK PERFORMANCE \nThe accuracies of the analog and digital networks, compared to the iterative \nsolution, were comparable. Both showed an average error for the test image of 0.5%, \nand a maximum error of 9% on any particular pixel. The accuracy of the networks on \nimages other than the one used to train them was comparable, averaging about 0.5% \noverall. \nConvolution with an adaptively learned kernel is itself a relatively efficient \ncomputational algorithm. The iterative method required 4 hours to compute the \ncorrection for the 32,400 pixel example. Equivalent results were obtained by \nconvolution in about 6.5 minutes using the same computer. Examination of the \nassembled code for the software network showed that the correction for each pixel \nrequired the execution of about 30 times fewer instructions than for the iterative \nmethod. \nThe analog hardware generated corrections for the same example in 13.5 seconds. \nAlmost 95% of this time was used for input/output operations between the network and \nthe computer. It was the time required for the I/O, rather than the speed of the circuit, \nthat limited the dynamic performance of this system. Clearly, with improved I/O \nhardware, the analog network could be made to compute these corrections much more \nquickly. \n\nThe same algorithm, running on the digital floating point array processor, performed the \ncorrection for this example problem in 4.5 seconds. 
The factor of three improvement \nover the analog hardware was primarily a result of the decreased time needed for I/O \nin the DSP-based network. The digital network was not appreciably more accurate than \nthe analog one, indicating that the overall accuracy of operation was determined \nprimarily by the network architecture rather than by limitations in the implementation. \nThese results are summarized in Table 1. \n\nTable 1: Comparison of computational speed for various methods. \n\nMETHOD                      SPEED \nIteration                   6 years/mm^2 \nSoftware network            100 days/mm^2 \nAnalog hardware network     2 days/mm^2 \nDigital hardware network    18 hours/mm^2 \n\n5 EXPERIMENTAL VERIFICATION \nRecently, we have evaluated this method experimentally using a Cambridge \nInstruments EBMF 10.5 exposure system (Cummings et al., 1990). The test image \nwas 1 mm^2 and contained 11,165 Cambridge shapes and 6.7 x 10^7 pixels. The substrate \nwas silicon with 0.5 μm of SAL601-ER7 resist exposed at 20 keV beam energy. The \nrange of the scattered electrons is more than three times greater for these conditions \nthan in the tests described above, requiring a network about ten times larger. The \nneural network computations were done using the digital floating point array processor, \nand required about 18 hours to correct the entire image. Input to the program was \nCambridge source code, which was converted to a bit-mapped array, corrected by the \nneural network and then decomposed into new Cambridge source code. \nFigure 4 shows SEM micrographs comparing one of the test structures written with \nand without the neural network correction. This test structure consists of a 10 μm \nsquare pad next to a 1 μm wide line, separated by a gap of 0.5 μm. Note in the \nuncorrected pattern that the line widens in the region adjacent to the large pad, and that \nwebs of resist extend into the gap. This is caused by excess dosage scattered into \nthese regions from the large pad. 
In the corrected pattern, the dosage in these regions \nhas been adjusted, resulting in a uniform exposure after scattering and greatly \nimproved pattern resolution. \n\n6 CONCLUSIONS \nThe results of our trial experiments clearly demonstrate the computational benefits of a \nneural network for this particular application. The trained analog hardware network \nperformed the corrections more than 1000 times faster than the iterative method using \nthe same computer, and the digital processor was 3000 times faster. This technique is \nreadily applicable to a variety of direct write exposure systems that have the capability \nto write with variable exposure times. Implementation of the network on more \nsophisticated computers with readily available coprocessors can directly lead to another \norder of magnitude improvement in speed, making it practical to correct full chip-sized \nimages. \n\nThe performance of the analog network suggests that with improved speed of I/O \nbetween the computer and the network, it would be possible to obtain much faster \noperation. The added flexibility and generality of the digital approach, however, is a \nconsiderable advantage. \n\nFigure 4: Comparison of a test structure written with and without correction \n\nAcknowledgments \nWe thank S. Waaben and W. T. Lynch for useful discussions, suggestions and \ninformation, and J. Brereton who assisted in building the hardware and trial patterns \nfor initial evaluation. We also thank C. Biddick, C. Lockstampfor, S. Moccio and B. \nVogel for technical support in the experimental verification. \n\nReferences \nH. Sewell, \"Control of Pattern Dimensions in Electron Lithography,\" J. Vac. Sci. \nTechnol. 15, 927 (1978). \n\nM. Parikh, \"Self-Consistent Proximity Effect Correction Technique for Resist Exposure \n(SPECTRE),\" J. Vac. Sci. Technol. 15, 931 (1978). \n\nW. T. Lynch, T. E. 
Smith and W. Fichtner, \"An Algorithm for Proximity Effect \nCorrection with E-Beam Exposure,\" Int. Conf. on Microlithography, Microcircuit \nEngineering, pp. 309-314, Grenoble (1982). \n\nK. D. Cummings, \"Determination of Proximity Parameters for Electron Beam \nLithography,\" AT&T Bell Laboratories Internal Memorandum. \n\nB. Widrow and M. E. Hoff, \"Adaptive Switching Circuits,\" IRE WESCON Convention \nRecord, Part 4, 96-104 (1960). \n\nK. D. Cummings, R. C. Frye and E. A. Rietman, \"Using a Neural Network to \nProximity Correct Patterns Written with a Cambridge EBMF 10.5 Electron Beam \nExposure System,\" Applied Phys. Lett. 57, 1431 (1990). \n", "award": [], "sourceid": 377, "authors": [{"given_name": "Robert", "family_name": "Frye", "institution": null}, {"given_name": "Kevin", "family_name": "Cummings", "institution": null}, {"given_name": "Edward", "family_name": "Rietman", "institution": null}]}