RESEARCH SEMINARS IN
COMPUTER SCIENCE AND TELECOMMUNICATION ENGINEERING
2007-2008
Continuing Education Activity
of the Official Postgraduate Program in Computer Science and
Telecommunication Engineering
Escuela Politécnica Superior, Universidad Autónoma de
Madrid
Tuesday, 26 February 2008, 13:00
Room 3, Escuela Politécnica Superior,
Universidad Autónoma de Madrid
Optimization of Accuracy and Calibration of Binary and Multiclass Pattern
Recognizers for Wide Ranges of Applications
Niko Brümmer
Spescom DataVoice,
South Africa, http://www.datavoice.co.za
Abstract
It is common practice in many fields of basic pattern recognition
research to evaluate performance as the misclassification error-rate on
a given evaluation database. A limitation of this approach is that it
implicitly assumes that all types of misclassification have equal cost
and that the prior class distribution equals the relative proportions
of classes in the evaluation database.
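The implicit assumption can be made concrete with a small sketch. The function names, the toy labels, and the simplified cost model (one cost per true class, regardless of which wrong class was chosen) are assumptions made for illustration only; they are not taken from the talk.

```python
from collections import Counter

def class_error_rates(labels, decisions):
    """Fraction of trials of each true class that were decided as another class."""
    totals = Counter(labels)
    errors = Counter(y for y, d in zip(labels, decisions) if y != d)
    return {k: errors[k] / totals[k] for k in totals}

def expected_cost(error_rates, priors, costs):
    """Prior- and cost-weighted error: sum over k of priors[k] * costs[k] * error_rates[k]."""
    return sum(priors[k] * costs[k] * error_rates[k] for k in error_rates)

# Hypothetical evaluation database with 3 trials of class "a" and 1 of class "b".
labels = ["a", "a", "a", "b"]
decisions = ["a", "b", "a", "a"]
rates = class_error_rates(labels, decisions)

# The plain error rate is the special case of unit costs and priors equal to
# the class proportions in the evaluation database.
database_priors = {k: n / len(labels) for k, n in Counter(labels).items()}
plain_error = expected_cost(rates, database_priors, {"a": 1.0, "b": 1.0})

# The same decisions judged for a different application:
# equal priors and a five times higher cost for misclassifying class "b".
other_cost = expected_cost(rates, {"a": 0.5, "b": 0.5}, {"a": 1.0, "b": 5.0})
```

Here `plain_error` coincides with the fraction of misclassified trials, while `other_cost` shows how the same recognizer output is scored differently once the application changes.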
In this talk, we generalize the traditional error-rate evaluation to
create an evaluation criterion that allows optimization of pattern
recognizers for wide ranges of applications, having different class
priors and misclassification costs. We further show that this same
strategy optimizes the amount of relevant information that recognizers
deliver to the user.
In particular, we consider a class of evaluation objectives known as
"proper scoring rules", which effectively optimize the ability of
pattern recognizers to make minimum-expected-cost Bayes decisions. In
this framework, we design our pattern recognizers to:
- extract from the input as much relevant information as possible about
the unknown classes, and
- output this information in the form of well-calibrated class
likelihoods.
We refer to this form of output as "application-independent". Then, when
application-specific priors and costs are added, the likelihoods can be
used in a straightforward and standard way to make
minimum-expected-cost Bayes decisions.
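As a minimal sketch of this standard decision step, the following combines recognizer likelihoods with application priors via Bayes' rule and picks the decision with the lowest expected cost. The function name `bayes_decision`, the cost-matrix layout, and the numbers are illustrative assumptions, not part of the talk.

```python
def bayes_decision(likelihoods, priors, cost):
    """Return the index of the minimum-expected-cost decision.

    likelihoods[k]: P(input | class k), supplied by the recognizer
    priors[k]:      P(class k), supplied by the application
    cost[d][k]:     cost of making decision d when the true class is k
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)
    posteriors = [j / evidence for j in joint]  # Bayes' rule
    expected = [sum(c * q for c, q in zip(row, posteriors)) for row in cost]
    return min(range(len(cost)), key=lambda d: expected[d])

# One application-independent recognizer output, reused for three applications.
likelihoods = [0.6, 0.4]
zero_one = [[0, 1], [1, 0]]   # every error costs 1
costly = [[0, 1], [10, 0]]    # deciding class 1 when the truth is class 0 costs 10

decision_flat = bayes_decision(likelihoods, [0.5, 0.5], zero_one)
decision_skewed = bayes_decision(likelihoods, [0.1, 0.9], zero_one)
decision_costly = bayes_decision(likelihoods, [0.1, 0.9], costly)
```

The likelihoods stay fixed throughout; only the priors and costs change, and the decision changes with them.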
A given proper scoring rule can be interpreted as a weighted
combination of misclassification costs, with a weight distribution over
different costs and/or priors. On the other hand, proper scoring rules
can also be interpreted as generalized measures of uncertainty and
therefore as generalized measures of information. We show that one
particular weighting distribution yields the logarithmic proper
scoring rule, whose associated uncertainty measure is Shannon's
entropy, the canonical information measure. We conclude that
optimizing the logarithmic
scoring rule not only minimizes error-rates and misclassification
costs, but it also maximizes the effective amount of relevant
information delivered to the user by the recognizer.
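The link between the logarithmic scoring rule and Shannon's entropy can be checked numerically. The sketch below uses a hypothetical three-class distribution and illustrative function names; it only demonstrates the well-known property that the expected logarithmic score is smallest when the reported probabilities equal the true ones, and that this minimum is the entropy.

```python
import math

def log_score(predicted, true_class):
    """Logarithmic scoring rule in bits: penalty for the probability given to the truth."""
    return -math.log2(predicted[true_class])

def expected_log_score(true_dist, predicted):
    """Average log score when classes occur according to true_dist."""
    return sum(p * log_score(predicted, k) for k, p in enumerate(true_dist) if p > 0)

def entropy(dist):
    """Shannon entropy in bits, i.e. the expected log score of the true distribution."""
    return expected_log_score(dist, dist)

true_dist = [0.5, 0.25, 0.25]  # hypothetical class distribution

honest = expected_log_score(true_dist, true_dist)
overconfident = expected_log_score(true_dist, [0.9, 0.05, 0.05])
underconfident = expected_log_score(true_dist, [1 / 3, 1 / 3, 1 / 3])
```

Both the overconfident and the underconfident reports score worse than the honest one, which is what makes the rule "proper".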
We discuss separately our strategies for binary and multiclass pattern
recognition:
- We illustrate the binary case with the example of speaker
recognition, where the calibration of detection scores in
likelihood-ratio form is of particular importance for forensic
applications.
- We illustrate the multiclass case with examples from the recent 2007
NIST Language Recognition Evaluation, where we experiment with the
language recognizers of 7 different research teams, all of which had
been designed with one particular language detection application in
mind. We show that, once re-calibrated by optimizing a multiclass
logarithmic scoring rule, these recognizers can be successfully
applied to a range of thousands of other applications.
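To give a flavour of what re-calibration by optimizing a multiclass logarithmic scoring rule could look like, the sketch below fits one scale factor and one offset per class to a recognizer's scores by plain gradient descent. The affine form of the calibration, the flat prior, the optimizer, and the toy scores are all assumptions chosen for illustration; the abstract does not specify the calibration method used in the talk.

```python
import math

def log_softmax(logits):
    """Numerically stable log of softmax(logits)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]

def calibrated(score_vec, scale, offsets):
    """Affine calibration: scale * score + per-class offset."""
    return [scale * s + b for s, b in zip(score_vec, offsets)]

def mean_log_loss(scores, labels, scale, offsets):
    """Multiclass logarithmic score (in nats) of posteriors under a flat prior."""
    total = 0.0
    for s, y in zip(scores, labels):
        total -= log_softmax(calibrated(s, scale, offsets))[y]
    return total / len(scores)

def recalibrate(scores, labels, steps=2000, lr=0.02):
    """Fit scale and offsets by gradient descent on mean_log_loss."""
    n, n_classes = len(scores), len(scores[0])
    scale, offsets = 1.0, [0.0] * n_classes
    for _ in range(steps):
        g_scale, g_off = 0.0, [0.0] * n_classes
        for s, y in zip(scores, labels):
            log_post = log_softmax(calibrated(s, scale, offsets))
            for k in range(n_classes):
                # Gradient of the log loss w.r.t. logit k: posterior minus one-hot truth.
                err = math.exp(log_post[k]) - (1.0 if k == y else 0.0)
                g_scale += err * s[k]
                g_off[k] += err
        scale -= lr * g_scale / n
        offsets = [b - lr * g / n for b, g in zip(offsets, g_off)]
    return scale, offsets

# Hypothetical uncalibrated scores for 3 classes, two trials per class.
scores = [[3, -2, -1], [2, 1, -3], [-3, 2, 0], [1, 3, -2], [-2, -1, 3], [0, 2, 1]]
labels = [0, 0, 1, 1, 2, 2]

before = mean_log_loss(scores, labels, 1.0, [0.0, 0.0, 0.0])
scale, offsets = recalibrate(scores, labels)
after = mean_log_loss(scores, labels, scale, offsets)
```

Because the toy data has the same number of trials per class, the plain average used here matches a class-balanced (flat-prior) average. In practice the calibration parameters would be fitted on data held out from the evaluation set.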
Niko Brümmer's CV
Niko Brümmer received the M.Eng. degree from the University of
Stellenbosch, Stellenbosch, South Africa, in 1988. He is currently
pursuing the Ph.D. degree at the University of Stellenbosch. His
dissertation is entitled “Measuring, refining, and calibrating speaker
and language information extracted from speech.” Since 1990, he has
been a Research Engineer with Spescom DataVoice, Stellenbosch, South
Africa, on behalf of whom he has participated in five NIST Speaker
Recognition Evaluations between 2000 and 2006, and also two NIST
Language Recognition Evaluations in 2005 and 2007. His contributions on
application-independent evaluation of systems have been adopted by NIST
in Speaker and Language Recognition Evaluations since 2006. His
research interests include speaker and language recognition and the
evaluation and improvement of pattern recognition and machine-learning
technologies via information theory. He has co-chaired Odyssey 2008,
the ISCA Speaker and Language Recognition Workshop.