IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 10, NO. 1, FEBRUARY 2014


Embedded System for Biometric Online Signature Verification

Mariano López-García, Rafael Ramos-Lara, Oscar Miguel-Hurtado, and Enrique Cantó-Navarro

Abstract—This paper describes the implementation on field-programmable gate arrays (FPGAs) of an embedded system for online signature verification. The recognition algorithm consists of three main stages. First, an initial preprocessing is applied to the captured signature, removing noise and normalizing information related to horizontal and vertical positions. Afterwards, a dynamic time warping algorithm is used to align this processed signature with its template previously stored in a database. Finally, a set of features is extracted and passed through a Gaussian mixture model, which reveals the degree of similarity between both signatures. The algorithm was tested using a public database of 100 users, obtaining high recognition rates for both genuine and forgery signatures. The implemented system consists of a vector floating-point unit (VFPU), specifically designed for accelerating the floating-point computations involved in this biometric modality, and a microprocessor, which interacts with the VFPU and executes the rest of the online signature verification process in software. The designed system is capable of finishing a complete verification in less than 68 ms with a clock rated at 40 MHz. Experimental results show that the number of clock cycles is reduced by factors of 4.8 and 11.1 when compared with a system based on an ARM Cortex-A8 and with the same system in which the VFPU is replaced by the floating-point unit provided by Xilinx, respectively.

Index Terms—Biometrics, embedded systems, field-programmable gate arrays (FPGAs), handwritten recognition.

I. INTRODUCTION

Manuscript received December 18, 2012; revised March 18, 2013 and April 22, 2013; accepted May 23, 2013. Date of publication June 17, 2013; date of current version December 12, 2013. This work was supported by the Ministerio de Economía y Competitividad in the framework of the Programa Nacional de Proyectos de Investigación Fundamental, under Project TEC2012-38329C02-02. Paper no. TII-12-0858. M. López-García and R. Ramos-Lara are with the Department of Electronic Engineering, Universitat Politècnica de Catalunya, 08800 Vilanova i la Geltrú, Spain (e-mail: [email protected]; [email protected]). O. Miguel-Hurtado is with the R&D Department of INCITA, 28037 Madrid, Spain (e-mail: [email protected]). E. Cantó-Navarro is with the Department of Electronic, Electrical and Automatic Engineering, Universitat Rovira i Virgili, Tarragona, Spain (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TII.2013.2269031

HANDWRITTEN signature is the most usual method by which a person declares that they accept and take responsibility for a signed document. This method is extensively used by contemporary society and has a solid legal basis, accepted by the international community as a personal authentication method. However, handwritten signature has certain disadvantages, which have hindered its widespread use as a biometric

modality. The main challenge currently faced by researchers is that samples taken from the same individual exhibit a large variability in shape and over time. Besides, forgeries produced by impostors exhibit a small interclass variation, which makes identifying them as intrusive users more difficult [1]–[7]. However, an interesting advantage is that the acquisition process can be readily performed by electronic devices such as pen tablets, touch screens or PDAs [8], [9]. These devices offer not only the possibility of capturing the stroke of the signature (spatial information represented by the horizontal and vertical pen position), but also other measurable characteristics such as pen pressure or pen angle versus time. Online signature verification is the method that includes these additional characteristics to increase the interclass variability between genuine and impostor signatures. Fig. 1 shows the architecture adopted by almost all signature verification systems. During the enrolment phase, each user provides several training signatures, which are processed to extract a set of distinguishing features that are stored in a database. Afterwards, during the classification phase, the user presents a new signature whose features are extracted and compared against those previously stored in the database tailored during the enrolment phase. The result of this comparison produces a score, or degree of similarity, whose value is used to determine the authenticity of the signature. Early work on automatic handwritten signature verification appeared in the mid-sixties [10]. Since then, many researchers have proposed different techniques for online signature verification. A comprehensive survey and a summary of the state-of-the-art of this biometric modality were presented by Impedovo et al. [1]. Signature verification techniques can be roughly classified into three groups: template, statistical and structural matching.
Template matching techniques are based on a straightforward comparison between the features of a captured signature and a template stored in a database. The most popular and effective method for template matching is the dynamic time warping (DTW) algorithm [11]. Statistical matching consists of estimating a statistical model that represents the most salient behavioral features of a particular signature. Neural networks (NN), hidden Markov models (HMM) or Gaussian mixture models (GMM) are examples of statistical matching widely used in signature verification as well as other biometric methods such as speaker or face recognition [12]–[16]. Finally, structural matching is based on syntactic approaches for representing signatures that are compared through graph or tree matching techniques [17]. The method presented in this paper for online signature verification consists of three stages. Firstly, a preprocessing is performed to

1551-3203 © 2013 IEEE

Fig. 1. Generic architecture for enrolment and matching processes.

obtain a normalized signature. Secondly, a DTW algorithm is applied to align the captured signature with its template. Once this alignment is done, a set of features is extracted and used as input for a GMM matcher whose model was previously obtained by using an expectation-maximization (EM) algorithm [18]. Often, biometric algorithms include a set of functions based on different signal processing techniques. Such functions usually have a high computational cost, since they manage a large amount of data and deal with complex operations carried out in floating-point format. Microprocessors clocked at moderate frequencies (50 MHz–400 MHz) are generally too slow for applications requiring such intensive computations. Thus, several recent publications proposed the use of field-programmable gate arrays (FPGAs) as a platform for implementing embedded biometric systems with outstanding performance in terms of execution time and cost [19], [20]. Fons et al. [21], [22] developed a complete fingerprint recognition system embedded on FPGA, which was 10 times faster than its equivalent software implementation on an Intel Core2Duo microprocessor clocked at 1.8 GHz. Ramos et al. [23] implemented on FPGA a speaker verification system based on a support vector machine (SVM) classifier, obtaining performance, in terms of execution time, similar to that achieved with a Pentium IV PC. The work published by López et al. [24] presented an FPGA implementation of an embedded system for iris recognition, whose performance was better than that of a portable commercial device based on a high-performance processor. However, almost all these proposals share the same design philosophy regarding the granularity of the hardware-software partitioning. The granularity refers to the size of those system parts that are implemented as a software program or as dedicated hardware (coprocessors).
Thus, the partitioning level can range from a fine granularity based on simple operations to a coarse granularity based on functions. In previous designs, the hardware-software partitioning was performed at function level, implementing in hardware as many coprocessors as there are time-critical functions. The main advantage of this approach is the reduction in communication delays between the dedicated hardware units and the microprocessor, since the former are usually designed to have direct access to external memory. Nevertheless, their internal architecture is tailored to a specific function, which makes their reuse in a different design difficult [25].

This paper describes the implementation of an embedded online signature verification system on a low-cost FPGA. However, unlike previous biometric hardware implementations, the hardware-software partitioning is carried out at operation level. The system architecture contains a single coprocessor, which accelerates all floating-point computations involved in the whole signature verification algorithm. The coprocessor represents a vector floating-point unit (VFPU), which can be configured at run-time for resolving specific blocks of operations that use as input operands vectors of variable length. Thus, this design aims to provide both the flexibility of software implementations and the speed offered by dedicated hardware accelerators. This paper is organized as follows. Section II describes the basic theory of the algorithm presented based on DTW and GMM. Section III presents the internal structure of the VFPU and the floating-point operations implemented. Section IV shows the experimental results and finally Section V presents the conclusions. II. ONLINE SIGNATURE VERIFICATION ALGORITHM This section describes the main stages involved in the online signature verification algorithm proposed in this paper, which can be divided into: acquisition, signature preprocessing, alignment between captured and template signatures based on DTW, and statistical classifier implemented by a GMM. A. Acquisition The acquisition of a signature is performed by a specialized input device. In our experiments a commercial pen tablet (i.e., Wacom Intuos 2) was used to obtain signature images for testing purposes. It is desirable that these data satisfy the regulations adopted in ISO/IEC 19794-7 [26]. This international standard has been developed to enable the interoperability among products and developments to facilitate their joint integration. It specifies two data interchange formats. 
The first one, known as full format, is well suited for our application, as it stores the raw data in the form of a multidimensional time series vector with a precision of 2 bytes. The second one, named compact format, is oriented to smart-cards or other tokens with limitations in storage and communication capabilities. B. Pre-Processing The aim of a preprocessing stage is to reduce noise and normalize the signature stroke. This process is widely documented


Fig. 2. Representation of signature captured by a commercial device.

in [27], so it is only briefly reviewed here. The preprocessing stage is carried out in four steps: 1) Filtering: Signals acquired by the electronic device are smoothed by applying a low-pass filter that reduces noise introduced in the capturing process. 2) Equally-spacing: The smoothed signals are transformed into an equally-spaced 256-point temporal sequence by using linear interpolation. 3) Location and time normalization: The x-axis and y-axis temporal functions are normalized by centering the origin of coordinates at the signature centre of mass with a specific rotation. 4) Size normalization: The x and y strokes of the signature are normalized by using the norm of the two-dimensional vector [x, y]. Moreover, dynamic characteristics such as pressure and inclination are also normalized by their maximum value. An example of a signature is shown in Fig. 2, where the x and y positions related to the pen stroke and the pen pressure are presented. The same signature, acquired by a different electronic device, can be captured with different value ranges or with a different number of points, depending on the sampling frequency. But, even using the same device, the signature is often affected by other variations such as translations, rotations or duration of the writing. As can be seen in Fig. 3, the preprocessing stage removes these preliminary differences, creating a normalized signature unaffected by these factors. C. Dynamic Time Warping (DTW) DTW has been widely used by many researchers, becoming one of the most popular methods in signature competitions. This technique aims to minimize the effects of distortion and time-shift between two signatures collected in different sessions. Thus, the DTW finds a nonlinear elastic transformation

that allows the optimal alignment (minimization of distance) of similar shapes even if they are out of phase in the time axis. A comprehensive analysis of this algorithm can be found in [28]; it can basically be summarized as follows: 1) Signatures are represented as sequences of bidimensional points that give the horizontal x and vertical y pen position:

S = \{(x_S(i), y_S(i))\}_{i=1}^{N}, \quad T = \{(x_T(j), y_T(j))\}_{j=1}^{N} \quad (1)

where S and T denote the captured and template signatures to be aligned, respectively. In this particular case, provided that signatures have been previously preprocessed, the length N of both sequences is equal to 256. 2) From these two sequences a distance matrix C is built. This matrix represents the Euclidean distance between each pair of elements of S and T according to the following expression:

c(i,j) = \sqrt{(x_S(i) - x_T(j))^2 + (y_S(i) - y_T(j))^2} \quad (2)

3) A warping path P = (p_1, \ldots, p_L), built starting from matrix C, is defined as any sequence of points p_l = (i_l, j_l) with p_1 = (1,1), p_L = (N,N), and i_l \le i_{l+1}, j_l \le j_{l+1}. On the other hand, the cost function related to this warping path P is defined as follows:

C_P(S,T) = \sum_{l=1}^{L} c(p_l) \quad (3)

4) The optimal warping path P^* is defined as the warping path which has a minimal cost function:

P^* = \arg\min_P C_P(S,T) \quad (4)
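The four definitions above can be sketched directly as a dynamic-programming recursion. The following is a minimal illustrative sketch (function and variable names are not from the original implementation; the optional band constraint corresponds to the adjustment window discussed in the text):

```python
import math

def dtw_cost(S, T, window=None):
    """Accumulated DTW cost between two equal-length 2-D sequences.

    S, T: lists of (x, y) tuples of length N (256 after preprocessing).
    window: optional half-width of an adjustment band around the diagonal;
            None means no constraint.
    """
    N = len(S)
    w = window if window is not None else N  # unconstrained by default
    INF = float("inf")
    # D[i][j] = minimal accumulated cost of aligning S[:i] with T[:j]
    D = [[INF] * (N + 1) for _ in range(N + 1)]
    D[0][0] = 0.0
    for i in range(1, N + 1):
        for j in range(max(1, i - w), min(N, i + w) + 1):
            d = math.hypot(S[i-1][0] - T[j-1][0], S[i-1][1] - T[j-1][1])
            D[i][j] = d + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
    return D[N][N]
```

Identical sequences yield a cost of zero, and the optimal path lies on the diagonal of D, as illustrated in Fig. 4.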


Fig. 3. Captured signature after applying the preprocessing stage.

Fig. 4. Optimal warping path after aligning captured and template signatures. The adjustment window corresponds to the diamond shape located around the diagonal line.

Note that the number of possible warping paths P is very high, so that exhaustively covering all possibilities is highly inefficient. In order to reduce the time needed to calculate the optimal path, the dynamic programming algorithm described in [28] is used. Besides, the number of operations needed by this algorithm can be further reduced by introducing restrictions, such as adjustment windows (diamond shape in

Fig. 4), slope constraints, etc., on the number of points to be calculated. Fig. 4 represents the accumulative matrix that results after applying the dynamic programming algorithm to the signature presented in Fig. 3 and its reference template. When two sequences are completely identical, the optimal warping path is ideally placed on the diagonal of this matrix. As Fig. 4 shows, time misalignment between signatures is represented by small deviations of the optimal warping path from the diagonal line. In this particular example, this difference is more accentuated at the central samples. For instance, it can be seen how sample 100 of the captured signature is aligned with sample 80 of the template signature. The effect of this alignment is also observable in Fig. 5, which represents the position (x and y) and pressure of both signatures on the same graphic. Note that the shapes of the captured signature shown in Fig. 3 (only preprocessed) and in Fig. 5 (preprocessed and aligned with the template) are completely identical, since the alignment process only changes the temporal position of samples (x-axis), but not their values (y-axis). D. Gaussian Mixture Model (GMM) GMM statistical models were first used in biometrics for voice recognition, and subsequently applied by different authors to other biometric modalities. The underlying theory of GMMs is widely documented, so this section only summarizes the main steps followed to obtain the probabilistic Gaussian model. 1) GMM Model: The GMM represents the probability that a particular feature vector belongs to a specific user, whose model is


Fig. 5. Alignment of captured signature with template, as result of the application of a DTW algorithm.

represented by λ. This probability is defined by a weighted sum of Gaussian probabilistic density functions as

p(x|λ) = \sum_{i=1}^{M} w_i g_i(x) \quad (5)

where w_i are the weighting coefficients, M represents the number of Gaussians, and g_i is the particular density function of each multivariable Gaussian:

g_i(x) = \frac{1}{(2\pi)^{L/2} |\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right) \quad (6)

In this case M is set at 3, and x refers to the feature vector of length L. Along with the weighting factors w_i, the user's model λ is represented by μ_i and Σ_i, which refer to the mean vectors and the covariance matrices of the template signature, respectively. These parameters are calculated by using a set of training signatures and their derived feature vectors. The expectation-maximization (EM) algorithm described in [18] was used for this purpose. Note that this algorithm is executed off-line using a desktop computer, so that the embedded system only performs the calculation of (5) and (6) based on the previous knowledge of model λ.

2) Feature Extraction: The feature vector represents the most salient characteristics of a signature. This vector is used as input of the GMM model expressed in (5), and is mainly based on the processing of the primary signals captured by the electronic device (once these signals are aligned by the DTW algorithm). In [29], Ly Van et al. realized that the dynamics of the signing process are very writer-dependent. They proposed to exploit this dependence by processing the primary signals in order to emphasize these distinguishing dynamics by applying the following regression formula:

\Delta f(n) = \frac{\sum_{\tau=1}^{R} \tau \big(f(n+\tau) - f(n-\tau)\big)}{2 \sum_{\tau=1}^{R} \tau^2} \quad (7)

where R represents the order of the regression and W the number of primary signals. Thus, the feature vector is made up of W signals obtained directly from the capture device, others derived after their processing using (7), and finally some others extracted from specific parts of the signature [1]. The selection criteria for determining the suited number and type of features can be performed considering different factors or scenarios. For instance, in [30], the authors analyze the optimal size of the feature vector that minimizes the effect of aging over


the recognition rates, presenting a strategy for addressing concerns about both physical and template aging. Furthermore, in [31] Miguel-Hurtado studied the discriminative power of several features. For each particular signal, two distributions, related to the genuine and forgery signatures respectively, are generated. The common area of these two distributions is calculated and used as the selection criterion. The author concludes that the optimal number of signals is 25. Some of these signals match partially or completely the results found by previous authors in [29] and [30]. These signals are described in Table I and were used to derive the feature vector. This vector is finally extracted by calculating the distance between each of the 25 signals derived from the captured signature and its paired signal obtained from the template stored in the database

TABLE I SIGNALS USED TO CALCULATE THE FEATURE VECTOR

d(k) = \sqrt{\sum_{n=1}^{N} \big(s_k(n) - t_k(n)\big)^2} \quad (8)

where N is the number of samples, fixed at 256 in the preprocessing stage, s_k and t_k denote the kth signal derived from the captured and template signatures, respectively, and k = 1, …, 25.

III. VECTOR FLOATING-POINT UNIT ARCHITECTURE

A. Previous Works

Most calculations used in biometric algorithms are solved by mathematical operations involving floating-point numbers. Essentially, the use of this type of arithmetic is necessary because the resolution of these algorithms requires representing the intermediate and final results with high accuracy or with a wide dynamic range. The solutions adopted in previous works, when implementing these algorithms on microprocessors or FPGAs, roughly follow two different approaches. The first one solves the algorithm in software, by implementing a microprocessor that contains a floating-point unit (FPU) that carries out operations on scalar numbers [13], [14]. This proposal is quite flexible, since the same hardware architecture is used for resolving different parts of the algorithm. However, mathematical operations are often performed on vectors whose dimension is variable and depends on the processed data. In these cases, due to the inherent functioning of the FPU, increasing the length of a vector increases the resolution time due to several factors. 1) The number of CPU fetches, related to the execution of specific FPU instructions, is increased. Thus, the number of program memory accesses also increases, along with the number of readings (operands) and writings (results) in data memory. Reducing the number of memory accesses is crucial to avoid an excessive execution time, since many calculations such as distances, filtering, variances, etc., are performed by executing a set of basic operations that generate several partial results only needed to obtain a final value. 2) The FPU operations are carried out sequentially, as are the memory accesses required to obtain their operands and the result.
Thus, the possibility that the FPU increases its throughput by executing several operations in parallel may be limited.

The second approach proposes designing individual dedicated hardware units that minimize memory accesses and parallelize operations [23], [24]. In this approach, the substitution of floating-point computations by fixed-point ones is a usual procedure, in order to simplify the data-path design and to reduce the hardware resources needed for the implementation. Unfortunately, this approach also has some important drawbacks. 1) Before implementing the hardware units it is necessary to determine the number of bits that form the integer and fractional parts of any variable declared as float and represented in fixed-point format. This analysis is necessary to avoid problems of overflow and accuracy. However, as claimed in [23], this adjusting procedure, which determines the bit-lengths, must be repeated whenever the input data change their value.


Fig. 6. Architecture description for both VFPU (left) and FPU (right).

2) Since coprocessors are designed for specific computations, they have minimal flexibility and, depending on their complexity, their design could require a long development time. Moreover, small changes introduced in the algorithm could produce substantial modifications in the internal architecture of the coprocessor, which may involve its complete redesign.

B. VFPU Architecture Description

The VFPU presented in this paper was designed to overcome most of the disadvantages and limitations of previous approaches. Its main features can be summarized as follows: 1) The internal architecture of the VFPU is designed to optimize computations, which are defined as sets of basic floating-point operations. In this way, these computations can be performed avoiding additional accesses to external memory. 2) The VFPU executes computations on scalar numbers or on vectors of arbitrary length, using single-precision operands as defined by the IEEE-754 standard. 3) Computations can be performed with vectors stored in external memory, scalar numbers provided by the microprocessor, or any combination of them. Likewise, the result of any operation can be placed in external memory or read by the microprocessor. Fig. 6 (left) shows the internal architecture of the VFPU, which is divided into five blocks: FIFO memory, bus interface, register file, control unit and FPU. The FIFO memory acts as the connection with the memory controller for managing the reading and writing of data vectors in external RAM. The elements of these vectors are temporarily stored in the FIFO memory as input operands, which can subsequently be used by the FPU. Likewise, this memory can also store any result that should be transferred to external memory. The bus interface carries out a similar role to the FIFO memory, but in this case managing the reading and writing of scalar numbers between the VFPU and the microprocessor. The register file consists of eight 32-bit registers that store the result provided by any operation performed by the FPU. These registers can also be used as operands in subsequent operations. Two multiplexors, which are managed by the control unit, select the proper operands of the FPU from the register file, the FIFO memory or the bus interface. Thus, the VFPU is able to carry out operations using any combination of vectors and scalar numbers. Additionally, the VFPU is arranged with a control unit that is configurable at run-time. This configuration allows the VFPU

to be adapted to any of the stages involved in the algorithm, accelerating the execution time. The internal architecture of the FPU is depicted in Fig. 6 (right). The FPU performs the following basic operations: addition, subtraction, multiplication, division, square root, casting from integer to float, absolute value and sign negation. Except for the division and square root block, the operations are implemented by combinational circuits, all of them interconnected through pipeline registers that store the final result. This result is subsequently normalized and rounded for those operations that generate a change in the mantissa. Moreover, any mathematical operation includes a previous denormalization stage, which also checks for special floating-point numbers such as zero, infinity, not-a-number, etc. Therefore, the FPU solves any arithmetic operation in 4 clock cycles, except square root and division, which need 30 clock cycles. Thus, the FPU is designed in such a way that those operations implemented by combinational circuits work in pipeline, while the rest are scheduled to provide their results as soon as possible along the processing flow. Then, when implementing computations that do not include division or square root operations, the FPU admits a new operation each clock cycle, achieving a throughput of 1 MFLOP/MHz.

C. VFPU Operation and Waveforms

This section shows a particular example that reveals the computational power of the VFPU when operating with vectors and scalar numbers. The application of the DTW requires the calculation of a matrix of Euclidean distances defined by (2). Each element of this matrix can be represented as follows:

C[i][j] = \sqrt{(x_S(j) - x_T(i))^2 + (y_S(j) - y_T(i))^2} \quad (9)

This 256 × 256 matrix can be obtained by tackling the problem as a group of operations between vectors and scalar numbers. These operations can be expressed by a simple formula as

C[i] = \sqrt{(x_S - x_T(i))^2 + (y_S - y_T(i))^2} \quad (10)

being

x_S = [x_S(1), \ldots, x_S(256)], \quad y_S = [y_S(1), \ldots, y_S(256)] \quad (11)


Fig. 7. Waveforms and VFPU register file when computing the Euclidean distance.

where C[i] stands for the ith row of matrix C, x_T(i) and y_T(i) are the scalar elements related to the x and y coordinates of the template signature, and x_S and y_S are vectors of 256 elements representing the x-axis and y-axis positions of the captured signature, respectively. The set of instructions executed by the microprocessor to configure the VFPU for the computation of (10) is represented by the following pseudocode (instruction names other than VFPU_VectorLength, VFPU_States and VFPU_O are reconstructed from the surrounding description):

VFPU_VectorLength(256);        // Vector-length configuration
VFPU_I1(x_S);                  // Configuration of input vectors
VFPU_I2(y_S);
VFPU_States(SUB, SUB, MUL, MUL, ADD, SQRT); // Configuring computation states
for (i = 0; i < 256; i++) {    // For each row
    VFPU_S1(x_T[i]);           // Configuration of scalars
    VFPU_S2(y_T[i]);
    VFPU_O(C[i]);              // Configuration of output vector C[i]
    VFPU_Start();              // Starts a VFPU computation
}

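For reference, the same row-wise computation of (10) can be sketched in software (NumPy is used here purely for illustration; array names follow the text):

```python
import numpy as np

def distance_matrix(x_s, y_s, x_t, y_t):
    """Builds the Euclidean distance matrix row by row, mirroring the
    VFPU schedule: each row i combines the scalar pair (x_t[i], y_t[i])
    with the whole captured-signature vectors x_s and y_s."""
    C = np.empty((len(x_t), len(x_s)))
    for i in range(len(x_t)):  # one VFPU computation per row
        C[i] = np.sqrt((x_s - x_t[i]) ** 2 + (y_s - y_t[i]) ** 2)
    return C
```

For instance, with captured points (0, 0) and (3, 4) and a single template point at the origin, the resulting row is [0, 5].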
The first three VFPU instructions set the vector length and the base addresses of the vectors. This information is used by the FIFO memory to read, from external memory, each element of x_S and y_S, which are stored as input operands. Likewise, the scalars x_T(i) and y_T(i) are read through the bus interface. The fourth instruction manages the configuration of the control unit, implementing a finite state machine of six states. These states are associated with the six operations needed to calculate (10): two subtractions, two multiplications, one addition and one square root. The result of each operation is temporarily stored in the register file. Afterwards, a process of 256 iterations is executed: one iteration per row of the matrix of distances. Each iteration of the loop consists of four instructions that set the scalar numbers x_T(i) and y_T(i), the memory address of vector C[i] that stores the result, and, finally, the instruction for starting the computation. Fig. 7 shows the waveforms and the register file values for a representative set of clock cycles. Several features make this VFPU well suited to carry out these operations. 1) Note that, to process a complete row, only four fetches to instruction memory are required per iteration. 2) Since operations in the FPU are conveniently scheduled, each element of C[i] is calculated in 30 clock cycles, which is the time needed by the square root module to obtain the result. Thus, the VFPU is configured in such a way that, during the calculation of one element of vector C[i], the square root of the previous element is calculated in parallel with the rest of the operations and memory accesses to vectors. Therefore, the number of clock cycles needed to


TABLE II. AREA AND MAXIMUM CLOCK FREQUENCY. PERCENTAGE (%) AGAINST TOTAL NUMBER OF RESOURCES IN THE FPGA

calculate the complete matrix of distances can be roughly approximated by 30 × 256 × 256. 3) The intermediate results are stored in the register file, which avoids additional accesses devoted to write and read operations in data memory.

IV. EXPERIMENTAL RESULTS

The complete embedded system was implemented on an XC3S2000 low-cost Spartan-3 FPGA from Xilinx operating at 40 MHz. The system architecture consists of a Microblaze microprocessor, a soft-core developed by Xilinx suitable for designing embedded systems, and the VFPU described in Section III. The VFPU is connected to the Microblaze through the system bus available for this purpose. Data and program are stored in a 2-MB external SRAM. A memory controller (MC) drives the SRAM and provides access from the microprocessor and the VFPU. Besides, other peripherals such as timers, UARTs, input-output ports, etc., are also implemented as part of the embedded system. Table II presents the area occupied by all these modules and their theoretical maximum frequency reported by the synthesis tool. The fourth row of this table, which refers to the MC and the rest of the peripherals, shows that this frequency is limited to 45.4 MHz. This value is due to the interconnection created, through the system bus, between the microprocessor and the MC.

A. Recognition Results

The accuracy of the proposed signature verification algorithm was tested on the MCyT database, which includes 100 users and contains 25 genuine signatures and 25 skilled forgeries for each one [32]. The protocol for testing this database is described in [14]. As usual, the algorithm performance is evaluated by means of the detection error tradeoff (DET) curve, which is built by representing on the y-axis and x-axis the false nonmatch rate (FNMR) and the false match rate (FMR), respectively. Fig. 8 shows two DET curves with respect to two different sets of values (namely, GMM and GMM modified) used when initializing the training EM algorithm for calculating the GMM model.
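The DET operating points are obtained by sweeping a decision threshold over the genuine and impostor score distributions; the EER is the point where FNMR and FMR coincide. A minimal sketch of this computation (function and variable names are illustrative; expects NumPy arrays of scores, higher meaning more likely genuine):

```python
import numpy as np

def eer(genuine_scores, impostor_scores):
    """Approximate Equal Error Rate from two score distributions.
    Sweeps every observed score as a threshold and returns the best
    achievable max(FNMR, FMR), which approximates the EER."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best = 1.0
    for t in thresholds:
        fnmr = np.mean(genuine_scores < t)    # genuine signatures rejected
        fmr = np.mean(impostor_scores >= t)   # forgeries accepted
        best = min(best, max(fnmr, fmr))
    return best
```

Perfectly separated score sets give an EER of 0; overlapping distributions, as in any real system, give a value strictly above 0.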
Note that the best equal error rate (EER), the parameter usually used for determining the quality of a biometric algorithm, is 2.74%, which is a good result compared with most signature verification systems. This accuracy level is comparable with those obtained by other biometric modalities. For instance, recent international competitions on fingerprint verification showed an average EER ranging from 2% to 3% using different databases [33]. Furthermore, in contrast with other physiological biometrics such as iris or face, authentication based on signatures requires the conscious action of an individual during the acquisition process, which represents a natural method for liveness detection. Hence, biometrics based on the handwritten signature are very useful in applications where both the transaction taking place and the user involved in it must be authenticated.

Fig. 8. DET curves obtained using the MCyT database.

B. Speed Processing

To assess the speed and performance of the VFPU, the signature verification algorithm was also executed on two additional systems: an ARM Cortex-A8 microprocessor clocked at 720 MHz and the MicroBlaze microprocessor configured with its own FPU provided by Xilinx. The results obtained with these systems serve as a reference for establishing the real performance of the proposed VFPU. Note that both the embedded system configured with the VFPU and the one based on the Xilinx FPU are built using the same microprocessor and the same memory controller. Hence, the theoretical maximum frequency for both embedded systems is identical and limited to 45.4 MHz (6th column of Table II). Experimental results for both systems are obtained using the nearest frequency available on the development board used in our experiments, which is 40 MHz.

Table III shows the execution speeds, in clock cycles and in milliseconds (ms), when executing the complete algorithm. This table also specifies the time consumed by each particular function. Note that the VFPU and the FPU resolve every floating-point operation involved in each stage (preprocessing, DTW, and GMM) of the signature verification algorithm. The DTW stage is the most time-consuming for all three embedded systems, representing more than 96% of the total execution time.
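The quadratic cost that makes DTW dominate the runtime comes from filling the full N×M matrix of per-sample Euclidean distances and accumulated costs. The following is a minimal software sketch of that computation in plain Python (sequence contents are illustrative; in the paper these floating-point operations are what the VFPU accelerates in hardware):

```python
import math

# Illustrative DTW sketch: aligning two sequences of feature vectors
# (e.g., pen-position samples) by filling the full N x M cost matrix.
# The O(N*M) distance evaluations are what dominate the reported runtime.

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # cost[i][j] = minimum accumulated distance aligning a[:i+1] with b[:j+1]
    cost = [[INF] * m for _ in range(n)]
    cost[0][0] = euclidean(seq_a[0], seq_b[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = euclidean(seq_a[i], seq_b[j])
            best_prev = min(
                cost[i - 1][j] if i > 0 else INF,                 # step in a
                cost[i][j - 1] if j > 0 else INF,                 # step in b
                cost[i - 1][j - 1] if i > 0 and j > 0 else INF,   # diagonal
            )
            cost[i][j] = d + best_prev
    return cost[n - 1][m - 1]

# Two identical short (x, y) trajectories align with zero cost.
a = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(dtw_distance(a, a))  # → 0.0
```

A hardware vector unit helps precisely here: each `euclidean` call is an elementwise subtract, square, accumulate, and square root that can be issued over a whole vector at once instead of one scalar instruction per element.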
It is observed that the calculation of the matrix of Euclidean distances by the VFPU-based system takes , which is almost the theoretical value of found in Section III-C. As this table shows, when configuring the Microblaze with its scalar FPU


TABLE III: EXECUTION TIME, IN TERMS OF NUMBER OF CLOCK CYCLES AND MS, FOR EACH STAGE OF THE SIGNATURE VERIFICATION ALGORITHM

(3rd column of Table III), the number of clock cycles needed to solve the algorithm is , which is 11.1 times slower than the same system configured with the VFPU. Likewise, the number of clock cycles needed by the ARM microprocessor (4th column of Table III) is , which is about 4.8 times more than the number of clock cycles required by the VFPU. In contrast, the ARM-based system operates at a clock frequency 18 times higher than that of the VFPU-based system; however, the total execution times for these systems are approximately 18 ms and 67 ms, respectively, which represents an acceleration factor of only 3.72.

V. CONCLUSION

This paper describes a complete biometric algorithm for signature verification based on three stages. The signature is normalized by means of a preprocessing step that removes irrelevant information. Subsequently, the captured signature is aligned with its template by applying a DTW algorithm. From this aligned signature, the most salient features are extracted and used as input to a GMM model, whose output is used to confirm or deny the user's identity.

This paper also presented the design of an embedded system for implementing this signature verification algorithm on a low-cost FPGA. The system consists of a generic VFPU that accelerates the floating-point computations usually employed in biometric algorithms. The VFPU is capable of performing multiple operations in parallel using vectors of any length as operands. Furthermore, it reduces the number of accesses to program memory and avoids storing intermediate values that are only needed to obtain a final result. The performance of the VFPU was compared with that of the FPU provided by Xilinx and with an ARM Cortex-A8 microprocessor. The verification algorithm was executed on these three systems, demonstrating that the VFPU offers the best performance. The number of clock cycles required by the VFPU to execute the algorithm was , which represents an acceleration factor of 4.8 and 11.1 when compared with the systems based on the ARM Cortex-A8 and on the FPU provided by Xilinx, respectively.

REFERENCES

[1] D. Impedovo and G. Pirlo, "Automatic signature verification: The state of the art," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 38, no. 5, pp. 609–635, Sep. 2008.
[2] S. Impedovo and G. Pirlo, "Verification of handwritten signatures: An overview," in Proc. 14th Int. Conf. Image Anal. Process., Sep. 2007, pp. 191–196.
[3] J. J. Igarza, L. Gómez, I. Hernáez, and I. Goirizelaia, "Searching for an optimal reference system for on-line signature verification based on (x, y) alignment," in Biometric Authentication (ICBA 2004), LNCS 3072, D. Zhang and A. K. Jain, Eds. Berlin, Germany: Springer-Verlag, 2004, pp. 519–525.
[4] D. Impedovo and G. Pirlo, "On-line signature verification by stroke-dependent representation domains," in Proc. 12th ICFHR, Kolkata, India, Nov. 16–18, 2010, pp. 623–627.
[5] G. Pirlo, "Algorithms for signature verification," in Fundamentals in Handwriting Recognition, NATO-ASI Series, S. Impedovo, Ed. Berlin, Germany: Springer-Verlag, 1994, pp. 433–454.
[6] V. Di Lecce, G. Dimauro, A. Guerriero, S. Impedovo, G. Pirlo, and A. Salzo, "A multi expert system for dynamic signature verification," in Proc. 1st Int. Workshop Multiple Classifier Syst. (MCS 2000), LNCS 1857, J. Kittler and F. Roli, Eds. Cagliari, Italy: Springer-Verlag, Jun. 2000, pp. 320–329.
[7] G. Dimauro, S. Impedovo, M. G. Lucchese, R. Modugno, and G. Pirlo, "Recent advancements in automatic signature verification," in Proc. 9th Int. Workshop Frontiers Handwriting Recognit., Oct. 2004, pp. 179–184.
[8] S. Nabeshima, S. Yamamoto, K. Agusa, and T. Taguchi, "MEMOPEN: A new input device," in Proc. Int. Conf. Companion Human Factors Comput. Syst. (CHI '95), 1995, pp. 256–257.
[9] Y. Komiya, T. Ohishi, and T. Matsumoto, "A pen input on-line signature verifier integrating position, pressure and inclination trajectories," IEICE Trans. Inf. Syst., vol. E84-D, no. 7, pp. 833–838, Jul. 2001.
[10] A. Mauceri, "Feasibility studies of personal identification by signature verification," Space and Information Systems Division, American Aviation Co., Anaheim, CA, USA, Tech. Rep. SID 65 24 RADC TR 65 33, 1965.
[11] B. Fang, C. H. Leung, Y. Y. Tang, K. W. Tse, P. C. K. Kwok, and Y. K. Wong, "Off-line signature verification by tracking of feature and stroke positions," Pattern Recognit., vol. 36, no. 1, pp. 91–101, Jan. 2003.
[12] R. Bajaj and S. Chaudhury, "Signature verification using multiple neural classifiers," Pattern Recognit., vol. 30, no. 1, pp. 1–7, Jan. 1997.
[13] J. Fierrez-Aguilar, J. Ortega-García, D. Ramos, and J. Gonzalez-Rodríguez, "HMM-based on-line signature verification: Feature extraction and signature modeling," Pattern Recognit. Lett., vol. 28, no. 16, pp. 2325–2334, Dec. 2007.
[14] O. Miguel-Hurtado, L. Mengibar-Pozo, and A. Pacut, "A new algorithm for signature verification system based on DTW and GMM," in Proc. 42nd Annu. IEEE Int. Carnahan Conf. Security Technol., Oct. 2008, pp. 206–213.
[15] J. Y. Kim, D. Y. Ko, and S. Y. Na, "Implementation and enhancement of GMM face recognition systems using flatness measure," in Proc. IEEE Int. Workshop Robot Human Interact. Commun., Sep. 2004, pp. 247–251.
[16] J. Gonzalez-Rodriguez, D. Ramos-Castro, D. Torre-Toledano, A. Montero-Asenjo, J. Gonzalez-Dominguez, I. Lopez-Moreno, J. Fierrez-Aguilar, D. Garcia-Romero, and J. Ortega-Garcia, "Speaker recognition: The ATVS-UAM system at NIST SRE 05," IEEE Aerosp. Electron. Syst. Mag., vol. 22, no. 1, pp. 15–21, Jan. 2007.
[17] X. H. Xiao and R. W. Dai, "On-line Chinese signature verification by matching dynamic and structural features with a quasi-relaxation approach," in Proc. 5th Int. Workshop Frontiers Handwriting Recognit. (IWFHR-5), Colchester, U.K., Sep. 1996.
[18] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Roy. Statist. Soc. B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
[19] E. Monmasson, L. Idkhajine, M. N. Cirstea, I. Bahri, A. Tisan, and M. W. Naouar, "FPGAs in industrial control applications," IEEE Trans. Ind. Informat., vol. 7, no. 2, pp. 224–243, May 2011.
[20] S. Jin, D. Kim, T. T. Nguyen, D. Kim, M. Kim, and J. W. Jeon, "Design and implementation of a pipelined datapath for high-speed face detection using FPGA," IEEE Trans. Ind. Informat., vol. 8, no. 1, pp. 158–167, Feb. 2012.


[21] M. Fons, F. Fons, and E. Cantó-Navarro, "Fingerprint image processing acceleration through run-time reconfiguration hardware," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 12, pp. 991–995, Dec. 2010.
[22] M. Fons, F. Fons, E. Cantó-Navarro, and M. López-García, "FPGA-based personal authentication using fingerprints," J. Signal Process. Syst., vol. 66, no. 2, pp. 153–189, Feb. 2012.
[23] R. Ramos-Lara, M. López-García, E. Cantó-Navarro, and L. Puente-Rodriguez, "Real-time speaker verification system implemented on reconfigurable hardware," J. Signal Process. Syst., vol. 71, no. 2, pp. 89–103, May 2013.
[24] M. López-García, J. Daugman, and E. Cantó-Navarro, "Hardware-software co-design of an iris recognition algorithm," IET Inf. Security, vol. 5, no. 1, pp. 60–68, Apr. 2011.
[25] J. Liu-Jiménez, R. Sánchez-Reillo, L. Mengibar-Pozo, and O. Miguel-Hurtado, "Optimisation of biometric ID tokens by using hardware/software co-design," IET Biometrics, vol. 1, no. 3, pp. 168–177, Sep. 2012.
[26] International Organization for Standardization/International Electrotechnical Commission, ISO/IEC DIS 19794-7: Information Technology—Biometric Data Interchange Formats—Part 7: Signature/Sign Time Series Data, 2013.
[27] Y. Sato and K. Kogure, "Online signature verification based on shape, motion, and writing pressure," in Proc. 6th Int. Conf. Pattern Recognit., 1982, pp. 823–826.
[28] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Trans. Acoust., Speech, Signal Process., vol. 26, no. 1, pp. 43–49, Feb. 1978.
[29] B. Ly Van, S. Garcia-Salicetti, and B. Dorizzi, "On using the Viterbi path along with HMM likelihood information for online signature verification," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 5, pp. 1237–1247, Oct. 2007.
[30] M. Erbilek and M. Fairhurst, "Framework for managing ageing effects in signature biometrics," IET Biometrics, vol. 1, no. 2, pp. 136–147, Jun. 2012.
[31] O. Miguel-Hurtado, "Online signature verification algorithms and development of signature international standards," Ph.D. dissertation, Universidad Carlos III de Madrid, Madrid, Spain, 2011. [Online]. Available: http://e-archivo.uc3m.es/bitstream/10016/12580
[32] J. Ortega-García, J. Fierrez-Aguilar, D. Simon, J. Gonzalez, M. Faundez-Zanuy, V. Espinosa, A. Satue, I. Hernaez, J. J. Igarza, C. Vivaracho, D. Escudero, and Q. Moro, "MCyT baseline corpus: A bimodal biometric database," IEE Proc.—Vis., Image, Signal Process., vol. 150, no. 6, pp. 395–401, Dec. 2003.
[33] FVC2006: Fingerprint Verification Competition. [Online]. Available: http://bias.csr.unibo.it/fvc2006/

Mariano López-García received the M.S. and Ph.D. degrees in telecommunication engineering from the Technical University of Catalonia, Barcelona, Spain, in 1996 and 1999, respectively. In 1995, he joined the Department of Electronic Engineering, where he became an Associate Professor in 2001. He currently teaches courses in microelectronics and advanced digital design. He also taught power electronics, analog electronics, and PCB design at the undergraduate level for several years. He spent one year at Cambridge University, in the Faculty of Computer Science and Technology, as a Visiting Scholar collaborating on the implementation of biometric algorithms on low-cost devices. He is currently a member of the "Subcomité Español CTN 71 SC37" of AENOR. His research interests include signal processing, biometrics, embedded systems, hardware-software codesign, and FPGAs.


Rafael Ramos-Lara received the B.S., M.S., and Ph.D. degrees in telecommunication engineering from the Universitat Politècnica de Catalunya, Barcelona, Spain, in 1990, 1996, and 2006, respectively. Since 1990, he has been an Assistant Professor with the Department of Electronic Engineering, Universitat Politècnica de Catalunya. His research interests include nonlinear control, sliding-mode control, power electronics, adaptive signal processing, and the digital implementation of signal processing systems and biometric algorithms.

Oscar Miguel-Hurtado received the Ph.D. degree from the Carlos III University of Madrid, Madrid, Spain, in 2011. His Ph.D. research focused on the online signature biometric modality at both the algorithm and international-standards levels. He is currently working at the R&D Department of INCITA, dealing with national and international R&D projects on biometrics. He has had significant involvement in biometric standards development, being a member of the AENOR Biometrics Subcommittee and the Spanish Delegate for ISO/IEC JTC1 SC37 (responsible for Working Group 3, Biometric data interchange formats) and CEN TC224 WG18 (Interoperability of biometric recorded data). He also serves as the Spanish expert on signature modalities and is Editor and Coeditor of four international standards in this field.

Enrique Cantó-Navarro received the M.Sc. degree in electronics engineering, in 1995, and the Ph.D. degree, in 2001, both from the Universitat Politècnica de Catalunya (UPC), Barcelona, Spain. He has been an Associate Professor at the UPC since 1996 and an Assistant Professor with the Universitat Rovira i Virgili (URV) since 2003. He has participated in several national and international research projects related to smart cards, FPGAs, and biometrics, and has published more than 60 research papers in journals and conferences. His research interests include hardware accelerators for biometric algorithms and run-time reconfigurable embedded systems.