RESEARCH SEMINARS IN COMPUTER AND TELECOMMUNICATION ENGINEERING 2007-2008


Continuing Education Activity of the Official Postgraduate Programme in Computer and Telecommunication Engineering


Escuela Politécnica Superior, Universidad Autónoma de Madrid



Tuesday, 26 February 2008, 13:00

Aula 3 (Room 3), Escuela Politécnica Superior, Universidad Autónoma de Madrid


Optimization of Accuracy and Calibration of Binary and Multiclass Pattern Recognizers for Wide Ranges of Applications

Niko Brümmer

Spescom Datavoice, South Africa, http://www.datavoice.co.za


Abstract

In many fields of basic pattern recognition research, it is common practice to evaluate performance as the misclassification error rate on a given evaluation database. A limitation of this approach is that it implicitly assumes that all types of misclassification have equal cost and that the prior class distribution equals the relative proportions of the classes in the evaluation database.

In this talk, we generalize the traditional error-rate evaluation to create an evaluation criterion that allows pattern recognizers to be optimized for wide ranges of applications with different class priors and misclassification costs. We further show that this same strategy maximizes the amount of relevant information that the recognizers deliver to the user.
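To make the generalization concrete, the following minimal Python sketch (our illustration, not code from the talk) computes the empirical expected cost of hard decisions for an application defined by class priors and a cost matrix. The function name `empirical_bayes_risk` is hypothetical. With priors equal to the class proportions in the evaluation data and a unit cost for every error, it reduces to the ordinary error rate; other priors and costs describe other applications.

```python
def empirical_bayes_risk(labels, decisions, priors, costs):
    """Expected cost of hard decisions measured on an evaluation set.

    labels[t], decisions[t]: true class and decided class of trial t
    priors[i]: assumed prior of class i in the target application
    costs[i][j]: cost of deciding class j when the true class is i
    """
    k = len(priors)
    counts = [[0] * k for _ in range(k)]  # counts[i][j]: true i, decided j
    for y, d in zip(labels, decisions):
        counts[y][d] += 1
    risk = 0.0
    for i in range(k):
        n_i = sum(counts[i])
        if n_i == 0:
            continue  # class absent from the data: its term cannot be estimated
        for j in range(k):
            # prior of class i times cost times empirical rate of deciding j
            risk += priors[i] * costs[i][j] * counts[i][j] / n_i
    return risk


labels = [0, 0, 0, 0, 1, 1]
decisions = [0, 0, 0, 1, 1, 0]
zero_one = [[0, 1], [1, 0]]

# Priors equal to the database proportions + unit costs: the error rate (2/6).
error_rate = empirical_bayes_risk(labels, decisions, [4 / 6, 2 / 6], zero_one)

# A different application: equal priors, missing class 1 costs 10.
other_app = empirical_bayes_risk(labels, decisions, [0.5, 0.5], [[0, 1], [10, 0]])
```

The same decisions therefore score 1/3 under the traditional error rate and 2.625 under the second (hypothetical) application, which is why a single error rate cannot stand in for all applications.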

In particular, we consider a class of evaluation objectives known as "proper scoring rules", which effectively optimize the ability of pattern recognizers to make minimum-expected-cost Bayes decisions. In this framework, we design our pattern recognizers to:

- extract from the input as much relevant information as possible about the unknown classes, and
- output this information in the form of well-calibrated class likelihoods.

We refer to this form of output as "application-independent". Then, when application-specific priors and costs are added, the likelihoods can be used in a straightforward, standard way to make minimum-expected-cost Bayes decisions.
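The standard decision step can be sketched as follows, assuming the recognizer outputs class likelihoods P(x | class i) and the application supplies priors and a cost matrix `costs[i][j]` (the cost of deciding j when the true class is i). The function name `bayes_decision` is hypothetical; the rule itself is the usual Bayes decision: form posteriors from priors and likelihoods, then pick the decision with the lowest expected cost.

```python
def bayes_decision(likelihoods, priors, costs):
    """Return the index of the minimum-expected-cost decision.

    likelihoods[i]: P(x | class i) from the recognizer (only ratios matter;
                    at least one prior * likelihood product must be positive)
    priors[i]: application-specific prior of class i
    costs[i][j]: cost of deciding class j when the true class is i
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    posterior = [v / total for v in joint]  # Bayes' rule
    k = len(priors)
    # Expected cost of each decision j under the posterior.
    expected = [sum(posterior[i] * costs[i][j] for i in range(k)) for j in range(k)]
    return min(range(k), key=lambda j: expected[j])


same_output = [0.2, 0.6]  # class 1 is three times as likely as class 0

d1 = bayes_decision(same_output, [0.5, 0.5], [[0, 1], [1, 0]])  # decides 1
d2 = bayes_decision(same_output, [0.5, 0.5], [[0, 5], [1, 0]])  # false class-1 costly: decides 0
d3 = bayes_decision(same_output, [0.9, 0.1], [[0, 1], [1, 0]])  # low class-1 prior: decides 0
```

The recognizer output stays the same in all three calls; only the application parameters change, which is the sense in which calibrated likelihoods are application-independent.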

A given proper scoring rule can be interpreted as a weighted combination of misclassification costs, with a weighting distribution over different costs and/or priors. Alternatively, proper scoring rules can be interpreted as generalized measures of uncertainty, and therefore as generalized measures of information. We show that a particular weighting distribution yields the logarithmic proper scoring rule, whose associated uncertainty measure is Shannon's entropy, the canonical information measure. We conclude that optimizing the logarithmic scoring rule not only minimizes error rates and misclassification costs, but also maximizes the effective amount of relevant information that the recognizer delivers to the user.
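As an illustration (our own numerical check, not material from the talk), the binary case can be verified directly. Let q be the recognizer's posterior for class 1, and let a decision at threshold t choose class 1 when q >= t, with cost 1 - t for missing class 1 and cost t for a false class-1 decision. Integrating this cost over t with the weight w(t) = 1/(t(1 - t)), which corresponds to a uniform weighting over the prior log-odds ln(t/(1 - t)), recovers the logarithmic score -ln q (or -ln(1 - q)). This is one standard way to express the relation; the parameterization used in the talk may differ. The sketch also checks that when the reported posterior equals the true class probability, the expected log score is the Shannon entropy. All function names are hypothetical.

```python
import math


def log_score(q, y):
    """Logarithmic proper scoring rule (nats) for binary posterior q = P(class 1)."""
    return -math.log(q) if y == 1 else -math.log(1.0 - q)


def cost_weighted_error(q, y, t):
    """Cost at threshold t (decide class 1 iff q >= t): a miss of class 1
    costs 1 - t, a false class-1 decision costs t, correct decisions cost 0."""
    if y == 1:
        return (1.0 - t) if q < t else 0.0
    return t if q >= t else 0.0


def integrated_cost(q, y, steps=100_000):
    """Midpoint-rule integral over t in (0, 1) of the cost-weighted error,
    weighted by w(t) = 1 / (t * (1 - t))."""
    h = 1.0 / steps
    total = 0.0
    for n in range(steps):
        t = (n + 0.5) * h
        total += cost_weighted_error(q, y, t) / (t * (1.0 - t)) * h
    return total


def entropy(p):
    """Shannon entropy (nats) of a two-class distribution (p, 1 - p)."""
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


p = 0.3  # true probability of class 1
honest = p * log_score(p, 1) + (1 - p) * log_score(p, 0)        # equals entropy(p)
dishonest = p * log_score(0.5, 1) + (1 - p) * log_score(0.5, 0)  # larger (properness)
```

The comparison of `honest` and `dishonest` also shows why the rule is called proper: reporting anything other than the true probability can only increase the expected score.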

We discuss separately our strategies for binary and multiclass pattern recognition:

- We illustrate the binary case with the example of speaker recognition, where the calibration of detection scores in likelihood-ratio form is of particular importance for forensic applications.

- We illustrate the multiclass case with examples from the recent 2007 NIST Language Recognition Evaluation, where we experiment with the language recognizers of seven different research teams, all of which had been designed with one particular language detection application in mind. We show that, by recalibrating these recognizers through optimization of a multiclass logarithmic scoring rule, they can be successfully applied to thousands of other applications.
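For the binary case, the logarithmic scoring rule is commonly summarized in speaker recognition as the log-likelihood-ratio cost C_llr: the log score at a prior of 0.5, expressed in bits, so that a recognizer that always outputs a likelihood ratio of 1 scores exactly 1 bit. The minimal Python sketch below computes C_llr, a multiclass analogue at a flat prior, and a deliberately simple recalibration: one scale factor on the log-likelihood scores, chosen by grid search. The functions `cllr`, `multiclass_cllr` and `fit_scale`, the grid, the toy scores, and the scale-only calibration are our assumptions for illustration; the recalibration used in the talk is not specified here and may be richer (for example, affine with per-class offsets).

```python
import math


def softplus(z):
    """log(1 + e^z), computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def cllr(target_llrs, nontarget_llrs):
    """Binary C_llr in bits; inputs are natural-log likelihood ratios."""
    c_tar = sum(softplus(-s) for s in target_llrs) / len(target_llrs)
    c_non = sum(softplus(s) for s in nontarget_llrs) / len(nontarget_llrs)
    return (c_tar + c_non) / (2.0 * math.log(2.0))


def multiclass_cllr(loglikes, labels, scale=1.0):
    """Multiclass log score (bits) at a flat prior, averaged per class.

    loglikes[t][i]: log-likelihood score of class i for trial t
    labels[t]: true class; scale: hypothetical recalibration s -> scale * s
    """
    k = len(loglikes[0])
    totals, counts = [0.0] * k, [0] * k
    for scores, y in zip(loglikes, labels):
        z = [scale * s for s in scores]
        m = max(z)
        log_norm = m + math.log(sum(math.exp(v - m) for v in z))  # log-sum-exp
        totals[y] += -(z[y] - log_norm) / math.log(2.0)  # -log2 posterior of truth
        counts[y] += 1
    present = [i for i in range(k) if counts[i] > 0]
    return sum(totals[i] / counts[i] for i in present) / len(present)


def fit_scale(loglikes, labels, grid=None):
    """Pick the scale on a grid that minimizes the multiclass log score."""
    grid = grid or [0.05 * n for n in range(1, 101)]  # 0.05 ... 5.0
    return min(grid, key=lambda a: multiclass_cllr(loglikes, labels, a))


# Toy, overconfident three-class scores: the last trial is a confident error.
scores = [[10, 0, 0], [0, 10, 0], [0, 0, 10], [10, 0, 0]]
truth = [0, 1, 2, 1]
before = multiclass_cllr(scores, truth)
alpha = fit_scale(scores, truth)
after = multiclass_cllr(scores, truth, alpha)
```

On these toy scores the log score drops from roughly 2.4 bits to roughly 0.82 bits: shrinking the scores gives up a little confidence on the correct trials in exchange for a much smaller penalty on the confident error.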


Niko Brümmer's CV

Niko Brümmer received the M.Eng. degree from the University of Stellenbosch, Stellenbosch, South Africa, in 1988. He is currently pursuing the Ph.D. degree at the University of Stellenbosch. His dissertation is entitled “Measuring, refining, and calibrating speaker and language information extracted from speech.” Since 1990, he has been a Research Engineer with Spescom DataVoice, Stellenbosch, South Africa, on whose behalf he has participated in five NIST Speaker Recognition Evaluations between 2000 and 2006, as well as two NIST Language Recognition Evaluations in 2005 and 2007. His contributions on application-independent evaluation of systems have been adopted by NIST in the Speaker and Language Recognition Evaluations since 2006. His research interests include speaker and language recognition and the evaluation and improvement of pattern recognition and machine-learning technologies via information theory. He has co-chaired Odyssey 2008, the ISCA Speaker and Language Recognition Workshop.