DOCTORAL SEMINARS 2005-2006
Doctorate in Computer Science and Telecommunication Engineering
Escuela Politécnica Superior, Universidad Autónoma de Madrid

April 27, 2006, 12:00
Salón de Grados, Escuela Politécnica Superior, Universidad Autónoma de Madrid
Detecting Translations of the Same Text and Data with Common Source
Kostadin Koroutchev
Escuela Politécnica Superior, Universidad Autónoma de Madrid
Abstract
Compression-based similarity distances have one main drawback: the objects being
compared must use the same coding scheme. In some situations, significant
similarity exists with no literally shared information, as with text
translations or different coding schemes. To overcome this problem, we present
a similarity measure that compares the redundancy structure of the data,
extracted by means of a Lempel-Ziv compression scheme. Each text is represented
as a graph in which vertices are text positions and edges represent shared
information; under our measure, two texts are similar if they have the same
referential topology when compressed.
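The idea of comparing referential structure rather than literal content can be
illustrated with a minimal sketch. It is not the measure presented in the talk:
the function names (lz_references, topology_signature, topology_similarity), the
parameters (min_len, window, bins), and the use of histogram intersection over
binned relative positions are all assumptions chosen for illustration. A greedy
LZ77-style parse records, for each repeated phrase, an edge from its position
to the earlier position it copies; two texts are then compared by where those
edges fall, relative to text length.

```python
from collections import Counter

def lz_references(seq, min_len=3, window=4096):
    """Greedy LZ77-style parse (illustrative, quadratic time).

    Returns a list of (position, source, length) edges: the phrase starting
    at `position` repeats `length` symbols first seen at `source`.
    """
    edges = []
    i, n = 0, len(seq)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < n and seq[j + k] == seq[i + k]:
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        if best_len >= min_len:
            edges.append((i, best_src, best_len))
            i += best_len
        else:
            i += 1
    return edges

def topology_signature(seq, bins=8, **kw):
    """Histogram of edges over binned relative (position, source) pairs.

    Assumption: binning by relative position is a stand-in for a real
    graph-topology comparison.
    """
    n = len(seq)
    sig = Counter()
    for pos, src, length in lz_references(seq, **kw):
        sig[(pos * bins // n, src * bins // n)] += length
    total = sum(sig.values())
    return {k: v / total for k, v in sig.items()} if total else {}

def topology_similarity(a, b, bins=8, **kw):
    """Histogram intersection of two signatures, in [0, 1]."""
    sa = topology_signature(a, bins, **kw)
    sb = topology_signature(b, bins, **kw)
    return sum(min(sa.get(k, 0.0), sb.get(k, 0.0)) for k in set(sa) | set(sb))
```

Because the parse depends only on which symbols are equal to which, re-encoding
a text with a one-to-one symbol substitution leaves the edges unchanged, so the
two versions score as maximally similar even though they share no literal
characters. Real translations only preserve the structure approximately, which
is why this sketch compares coarse, binned positions.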
In this paper we give empirical evidence, and a phenomenological explanation,
that this new measure is a robust indicator that detects similarity between
data coded in different languages.
We also consider textual data without any structure but with a common source,
and find that we can detect such data and distinguish this situation from the
previous one.
Kostadin Koroutchev
K. Koroutchev obtained a B.Sc. in Theoretical Physics from Sofia University
in 1983 and a PhD in Computer Science from Universidad Autónoma de Madrid in
2003. He has worked at the Institute of Personal Computer and Communication
Systems of the Bulgarian Academy of Sciences (1983-) and at the Instituto de
Ingeniería del Conocimiento, Madrid (1995-2003).
Since 1998 he has taught in the Department of Computer Engineering of the
Universidad Autónoma de Madrid.
His research interests focus on Image Processing, Neural Networks and
Compression Algorithms.