of Hybrid Face and Voice Recognition Methods
for use as an effective identification mechanism in Biometric Systems Security.
Swapna Singh, Cheluri Trivikram, Madhavi. K*
Department of Computer Science Engineering, St.
Peter’s Engineering College, Kompally, Hyderabad, Telangana *[email protected]
Abstract – Biometric systems have well-known limitations. For example, a
fingerprint scanner cannot scan a finger (generally the thumb) that has
cuts on it. Face recognition is a challenging problem, and to date no
technique provides a robust solution in all circumstances. It can fail to
identify a person who is wearing glasses or a hat, or who has a beard. Iris and
retina scan techniques require sophisticated equipment, which makes them
expensive. Voice recognition has low accuracy, and an illness such as a cold
can change a person’s voice, making reliable identification of that person
difficult.
The proposed method combines FACE RECOGNITION and VOICE IDENTIFICATION and
uses both techniques simultaneously. Because the two work together, deceiving
the system becomes far more difficult in practice. The face recognition
technique identifies the face, and the voice recognition technique identifies
the speaker from the voice. The method can be used in industries or places that
require a high level of information protection, and in those at high risk from
outsiders who try to sneak in by various means.
Keywords – Face Recognition, Voice Recognition, Principal Component Analysis,
Eigenfaces, Eigenvectors, Feature Extraction, Feature Matching, Mel
Frequency Cepstral Coefficient (MFCC), Dynamic Time Warping, Support Vector
Machines (SVM), Neural Networks.
Biometrics has recently become an essential means of securing information.
Face recognition is one of the most commonly used biometric methods for
security purposes and is increasingly being used in various applications.
Face Recognition identifies an individual by comparing features of a live
capture with a stored digital image. The rapid development of face recognition
is due to factors such as the active development of algorithms, the
availability of large databases of facial images, and methods for evaluating
the performance of face recognition algorithms. Popular recognition algorithms
include Principal Component Analysis (PCA) [1, 2] and Fisher Linear
Discriminant Analysis (FLD), also called Linear Discriminant Analysis (LDA) [3].
Voice Identification is another well-known biometric technique. It is
the identification of a person from the characteristics of his or her voice and
is also called Speaker Identification. Speaker recognition has a history dating
back several decades and uses acoustic features of speech that have been found
to differ between individuals. These acoustic patterns reflect both anatomy,
including the size and shape of the throat and mouth, and learned behavioural
patterns, including voice pitch and speaking style. Speaker verification
has earned speaker recognition its classification as a “behavioural
biometric”. Voice recognition algorithms include Linear Predictive Coding
(LPC), Hidden Markov Models (HMM) and Artificial Neural Networks (ANN). This
paper focuses on the nonparametric Mel Frequency Cepstral Coefficient (MFCC)
method and the non-linear sequence alignment technique Dynamic Time Warping
(DTW) [4], which are discussed in the following sections.
To identify a person with either of the two techniques, the following two
steps are necessary:
a. Feature Extraction,
After these two comes,
In Face Recognition, feature extraction is performed on a 2-dimensional image,
considering shape and texture parameters. These features are later classified
using the most suitable algorithm.
1.1 Principal Component Analysis (PCA) or Eigenfaces:
Eigenfaces is the name given to a set of eigenvectors when they are used in
the computer vision problem of human face recognition. The approach of using
eigenfaces for recognition was developed by Sirovich and Kirby (1987) and was
used by Matthew Turk and Alex Pentland for face classification in 1991. Their
approach detects and recognizes human faces and is demonstrated with a
real-time system that tracks a person’s head and recognizes the person by
comparing characteristics of the face with those of known individuals. Face
images, which are usually upright 2-dimensional views, are projected into the
feature space (face space).
In Voice Recognition, feature extraction is performed on a 1-dimensional
signal, and the parameters considered are frequency, amplitude, etc. The
feature extraction algorithms used for Face Recognition and Voice Recognition
differ because their inputs differ in dimension. The algorithm used for
classification may be the same or different.
The face space is defined by the eigenfaces, which are the eigenvectors of the
set of faces. This provides the ability to learn and identify new faces in an
unsupervised manner [1], and it can also be implemented using neural networks
[2]. Compared with other methods, this one performs well in speed, simplicity,
learning capacity and relative insensitivity to small or gradual changes in the
face image. The performance of PCA was 88.7% in noisy circumstances and 94.5%
in non-noisy circumstances [3].
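The projection into face space can be illustrated with a minimal sketch. It
assumes each face image has already been flattened into a row vector of equal
length, and it uses nearest-neighbour matching on the projection weights. The
function names train_eigenfaces and recognize are hypothetical, and the
matching rule is only one common choice, not necessarily the one used in [1-3].

```python
import numpy as np

def train_eigenfaces(faces, num_components):
    """faces: (n_images, n_pixels) array of flattened grayscale images."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of vt are the eigenvectors of the covariance matrix of the
    # centered faces (the eigenfaces), ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:num_components]
    # Each known face is represented by its weights in face space.
    weights = centered @ eigenfaces.T
    return mean_face, eigenfaces, weights

def recognize(face, mean_face, eigenfaces, weights, labels):
    """Return the label of the known face closest to `face` in face space."""
    w = (face - mean_face) @ eigenfaces.T
    distances = np.linalg.norm(weights - w, axis=1)
    return labels[int(np.argmin(distances))]
```

A new image is recognized by projecting it with the same mean face and
eigenfaces and comparing its weights with those stored during training.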
1.2 Fisher Linear Discriminant Analysis (FLD):
Fisher Linear Discriminant Analysis, also called Linear Discriminant Analysis
(LDA), is a method used in statistics, pattern recognition and machine learning
to find a linear combination of features that characterizes or separates two or
more classes of objects or events. The resulting combination may be used as a
linear classifier or, more commonly, for dimensionality reduction before later
classification. LDA is closely related to PCA, since both are based on linear
transformations, that is, matrix multiplications. In PCA, the transformation
minimizes the mean square error between the original data vectors and the data
vectors estimated from the reduced-dimensionality representation. PCA does not
take class differences into account, whereas in LDA the transformation
maximizes the ratio of “between-class variance” to “within-class variance”,
with the aim of reducing data variation within each class and increasing the
separation between classes. The performance of the FLD method was 93.8% and
95.5% for noisy and non-noisy cases, respectively [3].
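The ratio of between-class to within-class variance can be made concrete with
a small two-class sketch. It assumes the within-class scatter matrix is
invertible, which usually requires reducing the dimension of face images first
(for example with PCA). The names fisher_direction and fisher_ratio are
hypothetical and are not taken from [3].

```python
import numpy as np

def fisher_direction(class_a, class_b):
    """Unit vector w maximizing between-class over within-class scatter."""
    mean_a, mean_b = class_a.mean(axis=0), class_b.mean(axis=0)
    dev_a, dev_b = class_a - mean_a, class_b - mean_b
    # Within-class scatter: summed scatter of each class around its own mean.
    within = dev_a.T @ dev_a + dev_b.T @ dev_b
    w = np.linalg.solve(within, mean_a - mean_b)
    return w / np.linalg.norm(w)

def fisher_ratio(w, class_a, class_b):
    """Between-class scatter divided by within-class scatter along w."""
    proj_a, proj_b = class_a @ w, class_b @ w
    between = (proj_a.mean() - proj_b.mean()) ** 2
    within = ((proj_a - proj_a.mean()) ** 2).sum() \
        + ((proj_b - proj_b.mean()) ** 2).sum()
    return between / within
```

Projecting samples onto the returned direction gives a one-dimensional feature
on which a simple threshold can act as the linear classifier.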
2. Voice Identification Algorithms: Voice Recognition Algorithms using Mel
Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques:
Digital processing of the speech signal is a crucial part of any voice
recognition algorithm. Voice signal identification is the process that converts
a speech waveform into features for further processing [5]. The human voice
conveys information about the gender, emotion and identity of the speaker [6].
Voice recognition in this section is demonstrated using two techniques, MFCC
and DTW [7].
In this technique, the human voice is first converted into digital form to
produce digital data. The digitized speech samples are then processed using
MFCC to produce voice features, and these features are passed through DTW to
select the pattern that matches the database. Both techniques are implemented
in MATLAB and together form the voice recognition algorithm. The algorithm has
two important phases [8]:
1. Training sessions
In sound processing, the mel-frequency cepstrum
(MFC) is a representation of the short-term power spectrum of
a sound, based on a linear cosine transform of a log power spectrum on a
nonlinear mel scale of frequency.
Mel-frequency cepstral coefficients (MFCCs) are the coefficients that
collectively make up an MFC. They are derived from a type of cepstral
representation of the audio clip (a nonlinear
“spectrum-of-a-spectrum”). The difference between the cepstrum and
the mel-frequency cepstrum is that in the MFC the frequency bands are equally
spaced on the mel scale, which approximates the human auditory system’s
response more closely than the linearly-spaced frequency bands used in
the normal cepstrum. This frequency warping can allow for a better
representation of sound, for example, in audio compression.
MFCCs are commonly derived as follows:
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using
triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it
were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.
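The derivation above can be sketched for a single frame as follows. This is an
illustration under stated assumptions, not the MATLAB implementation referred
to in this paper: the Hamming window, the common 2595 * log10(1 + f / 700) mel
formula, 26 filters and 13 coefficients are all assumed defaults, and the
function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(num_filters, fft_size, sample_rate):
    """Triangular overlapping filters with centres equally spaced in mel."""
    mel_points = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), num_filters + 2)
    bins = np.floor((fft_size + 1) * mel_to_hz(mel_points) / sample_rate)
    bins = bins.astype(int)
    bank = np.zeros((num_filters, fft_size // 2 + 1))
    for i in range(1, num_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):      # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - centre)
    return bank

def mfcc_frame(frame, sample_rate, num_filters=26, num_coeffs=13):
    """MFCCs of one frame of audio samples."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2                  # step 1
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    mel_energies = bank @ power                                 # step 2
    log_energies = np.log(mel_energies + 1e-10)                 # step 3
    # Step 4: discrete cosine transform (type II) of the log energies.
    n = np.arange(num_filters)
    k = np.arange(num_coeffs)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * num_filters))
    return dct_basis @ log_energies                             # step 5
```

In practice the signal is split into short overlapping frames and the function
is applied to each, giving one feature vector per frame.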
Dynamic Time Warping (DTW) is an algorithm whose objective is to match the
features of the voice against those stored in the existing database; it is
based on dynamic programming [9]. It warps two time series against each other
to determine how closely they resemble one another. The key principle of DTW is
to compare two dynamic patterns and measure their similarity by calculating the
minimum distance between them.
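The dynamic-programming idea behind DTW can be sketched as below. The sketch
assumes a Euclidean distance between individual frames and no constraint on
the warping path. The helper best_match and its template dictionary are
hypothetical stand-ins for a lookup in the speaker database.

```python
import numpy as np

def dtw_distance(a, b):
    """Minimum cumulative distance between two sequences of feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return float(cost[n, m])

def best_match(query, templates):
    """Name of the stored template with the smallest DTW distance to query."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))
```

Because the warping path may repeat frames, two utterances of the same pattern
spoken at different speeds can still yield a small distance.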
This section has discussed two voice recognition algorithms that are important
for improving voice recognition performance. The technique was able to verify a
particular speaker based on the individual information contained in the voice
signal. The results show that these techniques can be used effectively for
voice recognition.
a) Support Vector Machines (SVM):
In machine learning, support vector machines are supervised learning models
with associated learning algorithms that analyze data for classification and
regression analysis. Given a set of training examples, each marked as belonging
to one or the other of two classes, an SVM training algorithm builds a model
that assigns new examples to one category or the other, making it a
non-probabilistic binary linear classifier. An SVM model is a representation of
the examples as points in space, mapped so that the examples of the separate
categories are divided by a clear gap that is as wide as possible.
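A linear SVM of this kind can be sketched with a few lines of NumPy. The
sketch assumes labels in {-1, +1} and minimizes the regularized hinge loss by
batch subgradient descent, which is a simplification chosen for brevity;
practical SVM solvers usually rely on quadratic programming instead. The
function names and hyperparameter values are hypothetical.

```python
import numpy as np

def train_linear_svm(x, y, reg=0.01, epochs=200, lr=0.1):
    """Fit w, b by minimizing reg/2 * |w|^2 + mean hinge loss.

    x: (n_samples, n_features) array, y: labels in {-1, +1}.
    """
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (x @ w + b)
        active = margins < 1            # samples inside or beyond the margin
        grad_w = reg * w - (y[active][:, None] * x[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    """Assign each row of x to class +1 or -1 by the side of the hyperplane."""
    return np.where(x @ w + b >= 0, 1, -1)
```

The regularization term keeps the weight vector small, which corresponds to
keeping the gap between the two classes wide.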
Support vector machines(SVM) and Artificial
Intrusion detection is a critical component of secure information systems.
Identifying the important input features is a key issue in building an
intrusion detection system (IDS). Since eliminating insignificant or useless
inputs simplifies the problem, faster and more accurate detection may result.
Feature ranking and selection is therefore a chief issue in intrusion
detection. In [11], the technique of deleting one feature at a time is applied
in experiments on SVMs and neural networks to rank the significance of input
features for the DARPA collected intrusion data. Important features for each of
the 5 classes of intrusion patterns in the DARPA data are identified. It is
shown that SVM-based and neural-network-based IDSs using a reduced number of
features can deliver enhanced or comparable performance [11].
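The idea of ranking features by deleting one at a time can be sketched
generically. The sketch below uses a simple nearest-centroid classifier as a
stand-in for the SVMs and neural networks of [11], and ranks a feature by the
drop in test accuracy when it is removed. Both choices, and the function
names, are assumptions made for illustration.

```python
import numpy as np

def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Accuracy of a nearest-centroid classifier (a stand-in classifier)."""
    classes = np.unique(train_y)
    centroids = np.array([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    predicted = classes[np.argmin(dists, axis=1)]
    return float(np.mean(predicted == test_y))

def rank_features(train_x, train_y, test_x, test_y,
                  evaluate=nearest_centroid_accuracy):
    """Rank features by the accuracy lost when each one is deleted."""
    num_features = train_x.shape[1]
    baseline = evaluate(train_x, train_y, test_x, test_y)
    drops = []
    for f in range(num_features):
        keep = [i for i in range(num_features) if i != f]
        acc = evaluate(train_x[:, keep], train_y, test_x[:, keep], test_y)
        drops.append(baseline - acc)
    # Largest drop first: deleting that feature hurts the classifier most.
    order = [int(i) for i in np.argsort(drops)[::-1]]
    return order, drops
```

Features whose deletion leaves accuracy unchanged, or improves it, are
candidates for removal from the reduced feature set.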
Intrusion can be described as any set of actions that attempt to compromise
the integrity, confidentiality or availability of a facility. In the context of
information systems, intrusion refers to any unauthorized access, unauthorized
attempt to access or malicious use of information resources. Intrusions can be
categorized into two groups: anomaly intrusions and misuse intrusions [12].
In addition to performing linear classification, SVMs can efficiently perform
non-linear classification using what is called the kernel trick, implicitly
mapping their inputs into high-dimensional feature spaces.
When data are not labeled, supervised learning is not possible, and an
unsupervised learning approach is required, which attempts to find a natural
clustering of the data into groups and then maps new data to these groups. The
clustering algorithm that extends support vector machines in this way is called
support vector clustering. It is often used in industrial applications, either
when data are not labeled or when only some data are labeled, as a
preprocessing step for a classification pass.
b) Neural Networks:
Neural Networks is a field of Artificial Intelligence (AI) in which, taking
inspiration from the human brain, we find data structures and algorithms for
learning and classifying data. Many tasks that humans perform naturally and
quickly, such as recognizing a familiar face, prove to be very complicated for
a computer when conventional programming methods are used. By applying neural
network techniques, a program can learn from examples and create an internal
structure of rules to classify different inputs, such as images.
DEMERITS OF THE EXISTING SYSTEMS
For the algorithms used in MATLAB-based face recognition, the implementation
is nowhere explained, nor has one been put into practice.
There has been a series of incidents in which Face Recognition failed to be
accurate.
technique which includes Face Recognition and Voice Identification as well
would help us a lot more when it comes to security of highly confidential
Using both together would not be expensive, since fingerprint scanning already
exists and Voice Recognition is simply added to it to raise the level of
security.
Individual biometric techniques have not been up to the mark, but using both
together could be beneficial.
CONCLUSION AND FUTURE WORK
This research is all based on both the Face and
Voice Recognition techniques being used together, famously called as
Strong algorithms should be chosen for both technologies for the approach to be
practical.
Using the two together would be really useful in real-world scenarios where
security and confidentiality are not otherwise assured, and combining them as
part of a Multi-Biometrics technique can take security to another level.
1. Eigenfaces for
M Turk, A Pentland
using eigenfaces MA Turk, AP Pentland
3. FACIAL FEATURE EXTRACTION TECHNIQUES FOR FACE
RECOGNITION Rahib H. Abiyev
4.Voice recognition algorithms using mel
frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques
Harris, Speaker identification using MFCC and HMM based techniques, University of
6. Cheong Soo Yee and Abdul Manan Ahmad, Malay
Language Text Independent Speaker Verification using NN-MLP classifier with MFCC, 2008 International
Conference on Electronic Design
7. Zaidi Razak, Noor Jamilah Ibrahim, Emran Mohd
Yamani Idna Idris, Mohd Yaakob Tamil, Mohd Yusoff, Quranic verse recitation feature
extraction using mel frequency cepstral coefficient (MFCC), University of Malaya
Stan Salvador and Philip
Chan, FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space, Florida
Institute of Technology, Melbourne.
Chunsheng Fang, From
Dynamic time warping (DTW) to Hidden Markov Model (HMM), University of
important features for intrusion detection using
support vector machines and neural networks
12. Intrusion Detection Using Neural Networks
and Support Vector Machines, Srinivas Mukkamala, Guadalupe Janoski, Andrew Sung
Vapnik VN (1995) The
Nature of Statistical Learning Theory. Springer, Berlin Heidelberg New York.