SEMINARIOS EN INGENIERÍA INFORMÁTICA Y DE
TELECOMUNICACIÓN 2006-2007
Doctorado en Ingeniería
Informática y de
Telecomunicación
Programa Oficial de Posgrado en Ingeniería
Informática y de Telecomunicación
Escuela Politécnica Superior, Universidad Autónoma de
Madrid

8 de junio de 2006, 12:00
Salón de Grados, Escuela Politécnica Superior,
Universidad Autónoma de Madrid
Using Algorithmic Information Theory and
Stochastic Modeling to
Improve Classification and Evolutionary Computation
Manuel Cebrián Ramos
Escuela Politécnica
Superior, Universidad Autónoma de Madrid
Abstract
This dissertation presents theoretical and practical contributions in
Algorithmic Information Theory and (Algorithmic) Stochastic Modeling.
Algorithmic Information Theory is the theory concerned with obtaining an
absolute measure of the information contained in an object. Stochastic Modeling
is a methodology to improve an algorithm's performance by means of the
introduction of random elements in its logic.
One of the most interesting advances of the Algorithmic Information Theory is
the development of an absolute measure of similarity between objects. This
measure can only be estimated, as it is incomputable by definition. The typical
estimation relies on the use of data compression algorithms, being this
estimation known as the compression distance. The two theoretical contributions
presented analyze the quality of this estimation. The first quantifies the
estimation robustness when the information contained in the objects is noise-altered,
concluding that it is considerably resistant to noise. The second studies the
impact of the compression algorithm implementation on the estimation, yielding
some practical recipes for making this choice.
We use variants of the compression distance to develop three applications for
classification, being one an application to evolutionary computation as well.
The first application addresses the problem of detecting similarities in objects
which have been generated by a predecessor common source, independently of
whether they use or not the same coding scheme: this includes detecting document
translation and reconstructing phylogenetic threes from genetic material. We
make use of the already proved usefulness of compression based similarity
distances for educational plagiarism detection to develop our second application:
AC, an integrated source code plagiarism detection environment. The third
application makes use of this distance as a fitness function, which is used by
evolutionary algorithms to automatically generate music in a given pre-defined
style.
Another three new applications are derived using Stochastic Modeling, two of
them for evolutionary computation and one for classification. Two of them are
intimately related and make use of the presence of Heavy Tail probability
distributions in the optimization processes involved in the generation of
fractals by an evolutionary algorithm, and in the training process of a
multilayer perceptron. This discovery is used to improve the performance of both
algorithms by means of restart strategies. The last application presented in
this dissertation is a successful story of the use of a special randomized
heuristic in a simple genetic algorithm to yield a state-of-the art evolutionary
algorithm for solving Constraint Satisfaction Problems.
PDF
presentation
Manuel Cebrián Ramos
Manuel Cebrian received the B.S. (2003) and M.S. (2005) degrees in computer
science from the Universidad Autonoma de Madrid, Spain. He is currently a
doctoral student in computer science and telecommunications at the same
university, where he also teaches and conducts research in the Department of
Computer Science. His research interest include information theory,
evolutionary computation and combinatorial optimization.