{"title": "Signature Verification using a \"Siamese\" Time Delay Neural Network", "book": "Advances in Neural Information Processing Systems", "page_first": 737, "page_last": 744, "abstract": "", "full_text": "Signature Verification using a \"Siamese\" \n\nTime Delay Neural Network \n\nJane Bromley, Isabelle Guyon, Yann LeCun, \n\nEduard Sickinger and Roopak Shah \n\nAT&T Bell Laboratories \n\nHolmdel, N J 07733 \n\njbromley@big.att.com \n\nCopyrighte, 1994, American Telephone and Telegraph Company used by permission. \n\nAbstract \n\nThis paper describes an algorithm for verification of signatures \nwritten on a pen-input tablet. The algorithm is based on a novel, \nartificial neural network, called a \"Siamese\" neural network. This \nnetwork consists of two identical sub-networks joined at their out(cid:173)\nputs. During training the two sub-networks extract features from \ntwo signatures, while the joining neuron measures the distance be(cid:173)\ntween the two feature vectors. Verification consists of comparing an \nextracted feature vector ~ith a stored feature vector for the signer. \nSignatures closer to this stored representation than a chosen thresh(cid:173)\nold are accepted, all other signatures are rejected as forgeries. \n\n1 \n\nINTRODUCTION \n\nThe aim of the project was to make a signature verification system based on the \nNCR 5990 Signature Capture Device (a pen-input tablet) and to use 80 bytes or \nless for signature feature storage in order that the features can be stored on the \nmagnetic strip of a credit-card. \nVerification using a digitizer such as the 5990, which generates spatial coordinates \nas a function of time, is known as dynamic verification. Much research has been \ncarried out on signature verification. 
Function-based methods, which fit a function to the pen trajectory, have been found to lead to higher performance, while parameter-based methods, which extract some number of parameters from a signature, make a lower requirement on memory space for signature storage (see Lorette and Plamondon (1990) for comments). We chose to use the complete time extent of the signature, with the preprocessing described below, as input to a neural network, and to allow the network to compress the information. We believe that it is more robust to provide the network with low-level features and to allow it to learn higher-order features during the training process, rather than making heuristic decisions, e.g. segmentation into ballistic strokes. We have had success with this method previously (Guyon et al., 1990), as have other authors (Yoshimura and Yoshimura, 1992). \n\n2 DATA COLLECTION \n\nAll signature data was collected using 5990 Signature Capture Devices. These consist of an LCD overlaid with a transparent digitizer. As a guide for signing, a 1 inch by 3 inch box was displayed on the LCD. However, all data captured both inside and outside this box, from first pen down to last pen up, was returned by the device. The 5990 provides the trajectory of the signature in Cartesian coordinates as a function of time. Both the trajectory of the pen on the pad and of the pen above the pad (within a certain proximity of the pad) are recorded. It also uses a pen pressure measurement to report whether the pen is touching the writing screen or is in the air. Forgers usually copy the shape of a signature. Using such a tablet for signature entry means that a forger must copy both dynamic information and the trajectory of the pen in the air. 
Neither of these is easily available to a forger, and it is hoped that capturing such information from signatures will make the task of a forger much harder. Strangio (1976) and Herbst and Liu (1977b) have reported that pen-up trajectory is hard to imitate, but also less repeatable for the signer. The spatial resolution of signatures from the 5990 is about 300 dots per inch, the time resolution 200 samples per second, and the pad's surface is 5.5 inches by 3.5 inches. Performance was also measured using the same data treated to have a lower resolution of 100 dots per inch. This had essentially no effect on the results. \n\nData was collected in a university and at Bell Laboratories and NCR cafeterias. Signature donors were asked to sign their signature as consistently as possible or to make forgeries. When producing forgeries, the signer was shown an example of the genuine signature on a computer screen. The amount of effort made in producing forgeries varied. Some people practiced or signed the signature of people they knew; others made little effort. Hence, forgeries varied from undetectable to obviously different. Skilled forgeries are the most difficult to detect, but in real life a range of forgeries occurs, from skilled ones to the forger's own signature. Except at Bell Labs, the data collection was not closely monitored, so it was no surprise when the data was found to be quite noisy. It was cleaned up according to the following rules: \n\n\u2022 Genuine signatures must have between 80% and 120% of the strokes of the first signature signed and, if readable, be of the same name as that typed into the data collection system. (The majority of the signatures were donated by residents of North America and, as is typical for such signatures, were readable.) 
The aim of this was to remove signatures for which only some part of the signature was present or where people had signed another name, e.g. Mickey Mouse. \n\n\u2022 Forgeries must be an attempt to copy the genuine signature. The aim of this was to remove examples where people had signed completely different names. They must also have 80% to 120% of the strokes of the signature. \n\n\u2022 A person must have signed at least 6 genuine signatures or forgeries. \n\nIn total, 219 people signed between 10 and 20 signatures each; 145 signed genuines, 74 signed forgeries. \n\n3 PREPROCESSING \n\nA signature from the 5990 is typically 800 sets of x, y and pen up-down points. x(t) and y(t) were originally in absolute position coordinates. By calculating the linear estimates for the x and y trajectories as a function of time and subtracting these from the original x and y values, they were converted to a form which is invariant to the position and slope of the signature. Then, dividing by the y standard deviation provided some size normalization (a person may sign their signature in a variety of sizes; this method normalizes them). The next preprocessing step was to resample all signatures, using linear interpolation, to the same length of 200 points, as the neural network requires a fixed input size. Next, further features were computed for input to the network and all input values were scaled so that the majority fell between +1 and -1. 
Ten different features could be calculated, but a subset of eight were used in different experiments: \n\nfeature 1: pen up = -1, pen down = +1, (pud) \nfeature 2: x position, as a difference from the linear estimate for x(t), normalized using the standard deviation of y, (x) \nfeature 3: y position, as a difference from the linear estimate for y(t), normalized using the standard deviation of y, (y) \nfeature 4: speed at each point, (spd) \nfeature 5: centripetal acceleration, (acc-c) \nfeature 6: tangential acceleration, (acc-t) \nfeature 7: the direction cosine of the tangent to the trajectory at each point, (cos \u03b8) \nfeature 8: the direction sine of the tangent to the trajectory at each point, (sin \u03b8) \nfeature 9: cosine of the local curvature of the trajectory at each point, (cos \u03c6) \nfeature 10: sine of the local curvature of the trajectory at each point, (sin \u03c6) \n\nIn contrast to the features chosen for character recognition with a neural network (Guyon et al., 1990), where we wanted to eliminate writer-specific information, features such as speed and acceleration were chosen to carry information that aids the discrimination between genuine signatures and forgeries. At the same time we still needed to have some information about shape to prevent a forger from breaking the system by just imitating the rhythm of a signature, so positional, directional and curvature features were also used. The resampling of the signatures was such as to preserve the regular spacing in time between points. This method penalizes forgers who do not write at the correct speed. \n\nFigure 1: Architecture 1 consists of two identical time delay neural networks. 
Each network has an input of 8 by 200 units, a first layer of 12 by 64 units with receptive fields for each unit being 8 by 11, and a second layer of 16 by 19 units with receptive fields 12 by 10. \n\n4 NETWORK ARCHITECTURE AND TRAINING \n\nThe Siamese network has two input fields to compare two patterns and one output whose state value corresponds to the similarity between the two patterns. Two separate sub-networks based on Time Delay Neural Networks (Lang and Hinton, 1988; Guyon et al., 1990) act on each input pattern to extract features; then the cosine of the angle between the two feature vectors is calculated, and this represents the distance value. Results for two different sub-networks are reported here. \n\nArchitecture 1 is shown in Fig. 1. Architecture 2 differs in the number and size of layers. The input is 8 by 200 units, the first convolutional layer is 6 by 192 units with each unit's receptive field covering 8 by 9 units of the input. The first averaging layer is 6 by 64 units, the second convolutional layer is 4 by 57 with 6 by 8 receptive fields, and the second averaging layer is 4 by 19. To achieve compression in the time dimension, architecture 1 uses a sub-sampling step of 3, while architecture 2 uses averaging. A similar Siamese architecture was independently proposed for fingerprint identification by Baldi and Chauvin (1992). \n\nTraining was carried out using a modified version of back-propagation (LeCun, 1989). All weights could be learnt, but the two sub-networks were constrained to have identical weights. The desired output for a pair of genuine signatures was a small angle (we used cosine = 1.0) between the two feature vectors and a large angle \n\nTable 1: Summary of the Training. 
\n(we used cosine = -0.9 and -1.0) if one of the signatures was a forgery. \n\nNote: GA is the percentage of genuine signature pairs with output greater than 0; FR is the percentage of genuine:forgery signature pairs for which the output was less than 0. The aim of removing all pen-up points for Network 2 was to investigate whether the pen-up trajectories were too variable to be helpful in verification. For Network 4 the training simulation crashed after the 42nd iteration and was not restarted. Performance may have improved if training had continued past this point. \n\nNetwork 1, architecture 1: input features pud, spd, acc-c, acc-t, cos \u03b8, sin \u03b8, cos \u03c6, sin \u03c6. \nNetwork 2, architecture 1: same as Network 1, but trained without pen-up points. \nNetwork 3, architecture 1: same as Network 1, except x and y were used in place of acc-c and acc-t. \nNetwork 4, architecture 1: same as Network 3, but a larger training set. \nNetwork 5, architecture 2: same as Network 4, except architecture 2 was used. \n\nThe training set consisted of 982 genuine signatures from 108 signers and 402 forgeries of about 40 of these signers. We used up to 7,701 signature pairs: 50% genuine:genuine pairs, 40% genuine:forgery pairs and 10% genuine:zero-effort pairs [1]. The validation set consisted of 960 signature pairs in the same proportions as the training set. The network used for verification was that with the lowest error rate on the validation set. \nSee Table 1 for a summary of the experiments. Training took a few days on a SPARC 1+. \n\n5 TESTING \n\nWhen used for verification, only one sub-network is evaluated. The output of this is the feature vector for the signature. The feature vectors for the last six signatures signed by each person were used to make a multivariate normal density model of the person's signature (see pp. 22-27 of Pattern Classification and Scene Analysis by Duda and Hart for a fuller description of this). For simplicity, we assume that the features are statistically independent and that each feature has the same variance. Verification consists of comparing a feature vector with the model of the signature. 
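The joining computation described in Section 4, the cosine of the angle between the two sub-networks' feature vectors, can be sketched as below. This is a minimal numpy illustration, not the original SN2.6 implementation; the feature vectors are shown as plain arrays:

```python
import numpy as np

def cosine_output(f1, f2):
    # Cosine of the angle between the two feature vectors: +1 when they
    # point the same way (target for genuine:genuine pairs), -1 when they
    # point in opposite directions (target for genuine:forgery pairs).
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))
```

Because the two sub-networks share identical weights, the same function maps both signatures into this common feature space.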
\nThe probability density that a test signature is genuine, p-yes, is found by evaluating the normal density function. The probability density that a test signature is a forgery, p-no, is assumed, for simplicity, to be a constant value over the range of interest. An estimate for this value was found by averaging the p-yes values for all forgeries. Then the probability that a test signature is genuine is p-yes/(p-yes + p-no). Signatures closer than a chosen threshold to this stored representation are accepted; all other signatures are rejected as forgeries. \n\n[1] Zero-effort forgeries, also known as random forgeries, are those for which the forger makes no effort to copy the genuine signature; we used genuine signatures from other signers to simulate such forgeries. \n\nFigure 2: Results for Networks 4 (open circles) and 5 (closed circles); the horizontal axis is the percentage of genuine signatures accepted. The training of Network 4 was essentially the same as for Network 3 except that more data was used in training and it had been cleaned of noise. They were both based on architecture 1. Network 5 was based on architecture 2. The signature feature vector from this architecture is just 4 by 19 in size. \n\nNetworks 1, 2 and 3, all based on architecture 1, were tested using a set of 63 genuine signatures and 63 forgeries for 18 different people. There were about 4 genuine test signatures for each of the 18 people, and 10 forgeries for 6 of these people. Networks 1 and 2 had identical training except that Network 2 was trained without pen-up points. Network 1 gave the better results. However, with such a small test set, this difference may be hardly significant. 
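The verification rule described above can be sketched as follows. This is a minimal illustration assuming numpy; the names build_model and p_genuine are hypothetical, and p_no is passed in as a precomputed constant (the paper estimates it by averaging p-yes over all forgeries):

```python
import numpy as np

def build_model(enrolled):
    # enrolled: array of shape (6, d), one feature vector per signature.
    # Features are assumed independent with a single shared variance.
    return enrolled.mean(axis=0), enrolled.var(axis=0).mean()

def p_genuine(test_vec, model, p_no):
    mean, var = model
    d = mean.size
    # Multivariate normal density with covariance var * I.
    p_yes = np.exp(-np.sum((test_vec - mean) ** 2) / (2.0 * var))
    p_yes /= (2.0 * np.pi * var) ** (d / 2.0)
    # Probability the test signature is genuine: p-yes / (p-yes + p-no).
    return p_yes / (p_yes + p_no)
```

A signature is then accepted when p_genuine exceeds the chosen threshold.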
\nThe training of Network 3 was identical to that of Network 1, except that x and y were used as input features rather than acc-c and acc-t. It had somewhat improved performance. No study was made to find out whether the performance improvement came from using x and y or from leaving out acc-c and acc-t. Plamondon and Parizeau (1988) have shown that acceleration is not as reliable as other functions. \nFigure 2 shows the results for Networks 4 and 5. They were tested using a set of 532 genuine signatures and 424 forgeries for 43 different people. There were about 12 genuine test signatures for each person, and 30 forgeries for 14 of the people. This graph compares the performance of the two different architectures. \nIt takes 2 to 3 minutes on a Sun SPARC2 workstation to preprocess 6 signatures, collect the 6 outputs from the sub-network and build the normal density model. \n\n6 RESULTS \n\nBest performance was obtained with Network 4. With the threshold set to detect 80% of forgeries, 95.5% of genuine signatures were detected (24 signatures rejected). Performance could be improved to 97.0% of genuine signatures detected (13 rejected) by removing all first and second signatures from the test set [2]. For 9 of the remaining 13 rejected signatures, pen-up trajectories differed from the person's typical signature. This agrees with other reports (Strangio, 1976; Herbst and Liu, 1977b) that pen-up trajectory is hard to imitate, but is also a less repeatable signature feature. However, removing pen-up trajectories from training and test sets did not lead to any improvement (Networks 1 and 2 had similar performance), leading us to believe that pen-up trajectories are useful in some cases. Using an elastic matching method for measuring distance may help. 
Another cause of error came from a few people who seemed unable to sign consistently and would miss out letters or add new strokes to their signature. \n\nThe requirement to represent a model of a signature in 80 bytes means that the signature feature vector must be encodable in 80 bytes. Architecture 2 was specifically designed with this requirement in mind. Its signature feature vector has 76 dimensions. When testing Network 5, which was based on this architecture, 50% of the outputs were found (surprisingly) to be redundant, and the signature could be represented by a 38-dimensional vector with no loss of performance. One explanation for this redundancy is that, by reducing the dimension of the output (by not using some outputs), it is easier for the neural network to satisfy the constraint that genuine and forgery vectors have a cosine distance of -1 (equivalent to the outputs from two such signatures pointing in opposite directions). \nThese results were gathered on a Sun SPARC2 workstation, where the 38 values were each represented with 4 bytes. A test was made representing each value in one byte. This had no detrimental effect on the performance. Using one byte per value allows the signature feature vector to be coded in 38 bytes, which is well within the size constraint. It may be possible to represent a signature feature vector with even less resolution, but this was not investigated. For a model to be updatable (a requirement of this project), the total of the squares of each component of the signature feature vectors must also be available. This is another 38-dimensional vector. From these two vectors the variance can be calculated and a test signature verified. These two vectors can be stored in 80 bytes. \n\n7 CONCLUSIONS \n\nThis paper describes an algorithm for signature verification. 
A model of a person's signature can easily fit in 80 bytes, and the model can be updated and become more accurate with each successful use of the credit card (surely an incentive for people to use their credit card as frequently as possible). Other beneficial aspects of this verification algorithm are that it is more resistant to forgeries for people who sign consistently, the algorithm is independent of the general direction of signing, and it is insensitive to changes in size and slope. \n\n[2] People commented that they needed to sign a few times to get accustomed to the pad. \n\nAs a result of this project, a demonstration system incorporating the neural network signature verification algorithm was developed. It has been used in demonstrations at Bell Laboratories, where it worked equally well for American, European and Chinese signatures. It has been shown to commercial customers. We hope that a field trial can be run in order to test this technology in the real world. \n\nAcknowledgements \n\nAll the neural network training and testing was carried out using SN2.6, a neural network simulator package developed by Neuristique. We would like to thank Bernhard Boser, John Denker, Donnie Henderson, Vic Nalwa and the members of the Interactive Systems department at AT&T Bell Laboratories, and Cliff Moore at NCR Corporation, for their help and encouragement. Finally, we thank all the people who took time to donate signatures for this project. \n\nReferences \n\nP. Baldi and Y. Chauvin, \"Neural Networks for Fingerprint Recognition\", Neural Computation, 5 (1993). \n\nR. Duda and P. Hart, Pattern Classification and Scene Analysis, John Wiley and Sons, Inc., 1973. \n\nI. Guyon, P. Albrecht, Y. LeCun, J. S. Denker and W. Hubbard, \"A Time Delay Neural Network Character Recognizer for a Touch Terminal\", Pattern Recognition, (1990). \n\nN. M. Herbst and C. N. 
Liu, \"Automatic signature verification based on accelerometry\", \nIBM J. Re,. Develop., 21 (1977)245-253. \nK. J. Lang and G. E. Hinton, \"A Time Delay Neural Network Architecture for Speech \nRecognition\", Technical Report CMU-cs-88-152, Carnegie-Mellon University, Pittsburgh, \nPA,1988. \n\nY. LeCun, \"Generalization and Network Design Strategies\", Technical Report CRG-TR-\n89-4 University of Toronto Connectionist Research Group, Canada, 1989. \n\nG. Lorette and R. Plamondon, \"Dynamic approaches to handwritten signature verifica(cid:173)\ntion\", in Computer processing of handwriting, Eds. R. Plamondon and C. G. Leedham, \nWorld Scientific, 1990. \n\nR. Plamondon and M. Parizeau, \"Signature verification from position, velocity and accel(cid:173)\neration signals: a comparative study\", in Pro<;. 9th Int. Con. on Pattern Recognition, \nRome, Italy, 1988, pp 260-265. \n\nC. E. Strangio, \"Numerical comparison of similarly structured data perturbed by random \nvariations, as found in handwritten signatures\", Technical Report, Dept. of Elect. Eng., \n1976. \nI. Yoshimura and M. Yoshimura, \"On-line signature verification incorporating the direction \nof pen movement - an experimental examination of the effectiveness\", in From pixel, to \nfeatures III: frontiers in Handwriting recognition, Eds. S. Impedova and J. C. Simon, \nElsevier, 1992. \n\n\f", "award": [], "sourceid": 769, "authors": [{"given_name": "Jane", "family_name": "Bromley", "institution": null}, {"given_name": "Isabelle", "family_name": "Guyon", "institution": null}, {"given_name": "Yann", "family_name": "LeCun", "institution": null}, {"given_name": "Eduard", "family_name": "S\u00e4ckinger", "institution": null}, {"given_name": "Roopak", "family_name": "Shah", "institution": null}]}