DOCTORAL SEMINARS 2005-2006


Doctorate in Computer and Telecommunication Engineering
Escuela Politécnica Superior, Universidad Autónoma de Madrid



April 27, 2006, 12:00

Salón de Grados, Escuela Politécnica Superior, Universidad Autónoma de Madrid


Detecting Translations of the Same Text and Data with Common Source

Kostadin Koroutchev

Escuela Politécnica Superior, Universidad Autónoma de Madrid.

     

Abstract

Compression-based similarity distances have one main drawback: the objects being compared must share the same coding scheme. In some situations, significant similarity exists with no literally shared information, as with text translations or different coding schemes. To overcome this problem, we present a similarity measure that compares the redundancy structure of the data, as extracted by a Lempel-Ziv compression scheme. Each text is represented as a graph in which vertices are text positions and edges represent shared information; under our measure, two texts are similar if they have the same referential topology when compressed.
In this paper we give empirical evidence and a phenomenological explanation that this new measure is a robust indicator that detects similarity between data coded in different languages.
We also consider textual data that has no structure but shares a common source, and find that we can detect such data and distinguish this situation from the previous one.
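
The idea of a referential graph can be illustrated with a minimal sketch. It is not the speaker's implementation: the function name lz_reference_edges, the greedy longest-match parse, and the min_len threshold are all assumptions made for illustration. The sketch records, for each back-reference found by an LZ77-style parse, an edge from the current text position to the earlier position it copies from. Because the edges depend only on which positions repeat earlier content, and not on the symbols themselves, a one-to-one recoding of the alphabet leaves them unchanged.

```python
def lz_reference_edges(data, min_len=3):
    """Greedy LZ77-style parse (illustrative, cubic-time in the worst case).

    Returns a list of (position, source_position, length) tuples, one per
    back-reference of at least min_len symbols. These tuples play the role
    of graph edges between text positions.
    """
    edges = []
    n = len(data)
    i = 0
    while i < n:
        best_len, best_src = 0, -1
        # Search every earlier start position for the longest match.
        for j in range(i):
            k = 0
            while i + k < n and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:  # strict '>' keeps the earliest source on ties
                best_len, best_src = k, j
        if best_len >= min_len:
            edges.append((i, best_src, best_len))
            i += best_len  # skip over the copied span
        else:
            i += 1  # emit a literal symbol, no edge
    return edges


# Hypothetical example: the same text under a one-to-one letter recoding.
text = "the cat sat on the mat; the cat sat on the hat"
recode = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                       "nopqrstuvwxyzabcdefghijklm")
recoded = text.translate(recode)

print(lz_reference_edges(text) == lz_reference_edges(recoded))  # True
```

A real translation changes word order and word lengths, so its edges would match only approximately; comparing such graphs would need a tolerant measure rather than the exact equality used in this toy check.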

presentation PDF

Kostadin Koroutchev

K. Koroutchev obtained a B.Sc. in Theoretical Physics from Sofia University in 1983 and a Ph.D. in Computer Science from the Universidad Autónoma de Madrid in 2003.
He has worked at the Institute of Personal Computer and Communication Systems of the Bulgarian Academy of Science (1983-) and at the Instituto de Ingeniería del Conocimiento, Madrid (1995-2003).
He has taught in the Department of Computer Engineering of the Universidad Autónoma de Madrid since 1998.
His research interests center on image processing, neural networks, and compression algorithms.