SEMINARIOS EN INGENIERÍA INFORMÁTICA Y DE TELECOMUNICACIÓN 2006-2007


Doctorado en Ingeniería Informática y de Telecomunicación

Programa Oficial de Posgrado en Ingeniería Informática y de Telecomunicación
Escuela Politécnica Superior, Universidad Autónoma de Madrid

Escuela Politécnica Superior                        


8 de junio de 2006, 12:00

Salón de Grados, Escuela Politécnica Superior, Universidad Autónoma de Madrid


Using Algorithmic Information Theory and Stochastic Modeling to
Improve Classification and Evolutionary Computation

Manuel Cebrián Ramos

Escuela Politécnica Superior, Universidad Autónoma de Madrid

     

Abstract

This dissertation presents theoretical and practical contributions in Algorithmic Information Theory and (Algorithmic) Stochastic Modeling. Algorithmic Information Theory is the theory concerned with obtaining an absolute measure of the information contained in an object. Stochastic Modeling is a methodology to improve an algorithm's performance by means of the introduction of random elements in its logic.

One of the most interesting advances of the Algorithmic Information Theory is the development of an absolute measure of similarity between objects. This measure can only be estimated, as it is incomputable by definition. The typical estimation relies on the use of data compression algorithms, being this estimation known as the compression distance. The two theoretical contributions presented analyze the quality of this estimation. The first quantifies the estimation robustness when the information contained in the objects is noise-altered, concluding that it is considerably resistant to noise. The second studies the impact of the compression algorithm implementation on the estimation, yielding some practical recipes for making this choice.

We use variants of the compression distance to develop three applications for classification, being one an application to evolutionary computation as well. The first application addresses the problem of detecting similarities in objects which have been generated by a predecessor common source, independently of whether they use or not the same coding scheme: this includes detecting document translation and reconstructing phylogenetic threes from genetic material. We make use of the already proved usefulness of compression based similarity distances for educational plagiarism detection to develop our second application: AC, an integrated source code plagiarism detection environment. The third application makes use of this distance as a fitness function, which is used by evolutionary algorithms to automatically generate music in a given pre-defined style.

Another three new applications are derived using Stochastic Modeling, two of them for evolutionary computation and one for classification. Two of them are intimately related and make use of the presence of Heavy Tail probability distributions in the optimization processes involved in the generation of fractals by an evolutionary algorithm, and in the training process of a multilayer perceptron. This discovery is used to improve the performance of both algorithms by means of restart strategies. The last application presented in this dissertation is a successful story of the use of a special randomized heuristic in a simple genetic algorithm to yield a state-of-the art evolutionary algorithm for solving Constraint Satisfaction Problems.

PDF presentation

Manuel Cebrián Ramos

Manuel Cebrian received the B.S. (2003) and M.S. (2005) degrees in computer science from the Universidad Autonoma de Madrid, Spain. He is currently a doctoral student in computer science and telecommunications at the same university, where he also teaches and conducts research in the Department of Computer Science. His research interest include information theory,
evolutionary computation and combinatorial optimization.