News

Seminars of the Machine Learning Group (Grupo de Aprendizaje Automático) 2019-2020

  • The Machine Learning Group (Grupo de Aprendizaje Automático) is once again holding research seminars this year. Each week, researchers from the group will discuss papers of interest to anyone interested in machine learning. The seminars take place on Mondays at 12:00 h in room B-351 of the EPS-UAM. Information on the upcoming seminar is updated weekly in the Events tab of the group's website. Feel free to come along if a seminar interests you!

Research seminar: Uncovering latent jet substructure

  • Speaker: Dr. Barry M. Dillon. Jozef Stefan Institute in Ljubljana, Slovenia
  • Date and place: Thursday, September 23, 2019, 12:00 h. Room 07 (A-123).
  • Abstract: The primary goal at the Large Hadron Collider (LHC) is now to discover new physics, often termed Beyond the Standard Model (BSM) physics, and machine learning techniques could prove essential to this discovery. In this talk I will illustrate jet substructure tools, and explain the need for more powerful algorithms to better understand the complex signatures that arise in the LHC data. I will briefly review some of the most important applications of machine learning tools used in studying LHC data, and discuss their advantages and disadvantages. The precise focus of the talk will be on unsupervised searches for BSM physics using Bayesian generative modelling, in particular the Latent Dirichlet Allocation (LDA) algorithm. I will motivate the use of these techniques with an approximate mapping between the process through which particle collisions at the LHC evolve into measurements in the detectors, and the process of document generation described by the LDA model. The goal in using this technique is to extract "topics" from the data, which describe the physics underlying the signals that have been measured in the collider. With these topics in hand, we can then use them to classify individual signals as having arisen from different underlying processes. Two advantages of this technique are that (i) it is unsupervised and hence insensitive to modelling inaccuracies, and (ii) the extraction of topics allows the user to analyse what has been learned by the algorithm. I will conclude the talk with two applications of this technique: the first is in uncovering a pair-produced top quark signal, and the second is in uncovering a W′ signal.
  • Speaker CV: I studied for my undergraduate master's degree in Applied Mathematics and Physics at Queen's University Belfast in Northern Ireland, graduating in 2011. After one year working as a software consultant in the private sector, I began my PhD studies in 2012 at the University of Sussex in the UK, working primarily on Beyond the Standard Model physics. After graduating in 2016 I moved to the University of Plymouth (also UK) to take my first role as a postdoctoral researcher. In 2018 I started my current role as a postdoctoral researcher at the Jozef Stefan Institute in Ljubljana, Slovenia, where I work on the application of machine learning tools to the search for new physics at the LHC.

Research seminar: Comparison of chemical data from glass evidence to calculate likelihood ratios and improve significance statements in court

  • Speaker: Dr. Jose Almirall. Director, Center for Advanced Research in Forensic Science (CARFS) and Professor, Department of Chemistry and Biochemistry, Florida International University, Miami, Florida
  • Date and place: Thursday, June 20, 2019, 12:00 h. Sala de Grados A, EPS-UAM.
  • Abstract: According to the National Highway Traffic Safety Administration Fatal Analysis Report System (NHTSA, 2018), approximately 6% of the 34,247 fatal crashes in the United States during 2017 involved a hit-and-run driver. These accidents resulted in approximately 2000 fatalities, a number that has seen a significant increase over the last decade. The State of Florida and Dade County, in particular, are hot spots for hit-and-run accidents. If drivers leave the scene of a hit-and-run collision without rendering aid and/or providing information to the others involved in the crash, the result is a crime scene with a variety of forensic evidence that can be used to reveal the parties involved. Trace evidence found at the scene of a violent collision between vehicles, or between a vehicle and a pedestrian, often provides leads that can assist in an investigation. While the victim may shed significant biological material, the nature of a hit-and-run investigation is that the vehicle must be located first, making timely leads from trace evidence pivotal. Plastic pieces of vehicle parts, paint chips and smears, glass shards, garment impressions, air bag residues, and other "trace evidence" are very often more useful than biological evidence (DNA) at the early stages of the investigation. Trace evidence may provide answers to pertinent questions such as what type of vehicle was involved, and who was driving at the time of the crash. Laser Ablation-Inductively Coupled Plasma-Mass Spectrometry (LA-ICP-MS) is the "gold standard" for the forensic analysis and comparison of glass evidence. A match criterion is used to compare the elemental profile of the known sample to a recovered sample and, if the glass samples are determined to be indistinguishable, this may be followed by the use of a verbal scale to report the examiner's conclusion.
This approach has several disadvantages: a fixed match criterion suffers from the "fall-off-the-cliff" effect, the rarity of an elemental profile is not taken into account, and the use of a verbal scale to assign a weight of evidence may be considered subjective and can vary by examiner. An alternative approach includes the use of a continuous likelihood ratio that provides a quantitative measure of the significance of the evidence and accounts for the rarity of an elemental profile through the use of a glass database. In the present study, three glass databases were used to evaluate the performance of the likelihood ratio: the first database includes 420 automotive windshield samples, the second includes 385 glass samples from casework, and the third is a combination of the two. The multivariate kernel model was used for the calculation of the likelihood ratio. However, this model led to unreasonably large (or small) likelihood ratios. Thus, a calibration step, using the Pool Adjacent Violators (PAV) algorithm, was necessary in order to limit the likelihood ratio to reasonable values. The calibrated likelihood ratio led to improved rates of misleading evidence for same-source glass comparisons (less than 1.0 %) and comparable rates of misleading evidence for different-source glass comparisons (less than 1.0 %). In addition, the likelihood ratio limited the magnitude of the misleading evidence, providing only weak support for the incorrect hypothesis. Finally, most of the pairs found to be wrongly supporting the same-source hypothesis were explained by similarity of manufacturer of the glass source.
  • Speaker CV: José R. Almirall is a Professor in the Department of Chemistry and Biochemistry, Director Emeritus of the International Forensic Research Institute at Florida International University and Director of the National Science Foundation-funded Center for Advanced Research in Forensic Science (CARFS). He was a practicing forensic scientist at the Miami-Dade Police Department Crime Laboratory for 12 years, where he testified in over 100 criminal cases in state and federal courts prior to his academic appointment at FIU in 1998. Professor Almirall has authored one book and ~140 peer-reviewed scientific publications in the field of analytical and forensic chemistry. The interests of Prof. Almirall's research group include fundamental analytical chemistry and the development of analytical chemistry tools for use in forensic science, including materials analyses using LA-ICP-MS and LIBS. Dr. Almirall is interested in the standardization of analytical methods used by forensic scientists and currently leads a global effort to standardize the analysis of glass evidence using LA-ICP-MS and the interpretation of the data for use in courts of law. Prof. Almirall has been a Fellow of the American Academy of Forensic Sciences (AAFS) since 1998, and is a past member of the editorial board of the Journal of Forensic Sciences and Editor-in-Chief of Forensic Chemistry, an Elsevier journal.
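
The calibration step named in the abstract, Pool Adjacent Violators (PAV), can be illustrated with a minimal pure-Python sketch. This is not the speaker's implementation: the function name `pav` and the toy labels are hypothetical, and the sketch only shows the core idea of fitting a non-decreasing sequence by repeatedly pooling neighbouring values that violate monotonicity.

```python
def pav(values):
    """Least-squares non-decreasing fit to `values` (Pool Adjacent Violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Pool backwards while the previous block mean exceeds the last block mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)  # every member of a block gets the block mean
    return fitted


# Hypothetical toy data: 1 = same-source pair, 0 = different-source pair,
# listed in order of increasing uncalibrated score.
labels = [0, 0, 1, 0, 1, 1]
calibrated = pav(labels)
```

In a likelihood-ratio setting, the fitted values are typically read as calibrated probabilities of the same-source hypothesis and then converted to likelihood ratios; that conversion is left out of this sketch.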

Research seminar: Identifying Quantum Phase Transitions with Adversarial Neural Networks

  • Speaker: Dr. Alexandre Dauphin. Institut de Ciències Fotòniques (ICFO), Barcelona.
  • Date and place: Thursday, June 3, 2019, 12:00 h. Sala de Grados A, EPS-UAM.
  • Abstract: The identification of phases of matter is a challenging task, especially in quantum mechanics, where the complexity of the ground state appears to grow exponentially with the size of the system. We address this problem with state-of-the-art deep learning techniques: adversarial domain adaptation. We derive the phase diagram of the whole parameter space starting from a fixed and known subspace using unsupervised learning. The input data set contains both labeled and unlabeled data instances. The first kind is a system that admits an accurate analytical or numerical solution, and one can recover its phase diagram. The second type is the physical system with an unknown phase diagram. Adversarial domain adaptation uses both types of data to create invariant feature extracting layers in a deep learning architecture. Once these layers are trained, we can attach an unsupervised learner to the network to find phase transitions. We show the success of this technique by applying it to several paradigmatic models: the Ising model with different temperatures, the Bose-Hubbard model, and the SSH model with disorder. The input is the ground state without any manual feature engineering, and the dimension of the parameter space is unrestricted. The method finds unknown transitions successfully and predicts transition points in close agreement with standard methods. This study opens the door to the classification of physical systems where the phase boundaries are complex, such as the many-body localization problem or the Bose glass phase.
  • Speaker CV: Dr. Alexandre Dauphin is a theoretical physicist working as a postdoctoral researcher (Juan de la Cierva Fellow) at the Institute of Photonic Sciences (ICFO, Barcelona). He did his PhD at the ULB under the supervision of Prof. P. Gaspard (ULB) and Prof. M.-A. Martin-Delgado (Universidad Complutense de Madrid, UCM). Alexandre Dauphin works on several topics at the interface of condensed matter physics, quantum optics and atomic physics, with a focus on the field of topological insulators and, more recently, on the application of machine learning in physics. His main focus is the quantum simulation and detection of topological insulators in cold atom experiments in optical lattices and in photonics.

Seminars of the Machine Learning Group (Grupo de Aprendizaje Automático) 2018-2019

  • The Machine Learning Group (Grupo de Aprendizaje Automático) is once again holding research seminars this year. Each week, researchers from the group will discuss papers of interest to anyone interested in machine learning. The seminars take place on Thursdays at 12:00 h in room B-351 of the EPS-UAM. Information on the upcoming seminar is updated weekly in the Events tab of the group's website. Feel free to come along if a seminar interests you!

Research seminar: Semi-supervised learning in imbalanced classification settings: building a donor-recipient matching model for liver transplantation

  • Speaker: Dr. Pedro Antonio Gutiérrez (UCO).
  • Date and place: Thursday, January 25, 2018, 12:00 h. Sala de Grados A, EPS-UAM.
  • Abstract: Liver transplantation is a promising and widely accepted treatment for patients with end-stage liver disease. However, the treatment is limited by the shortage of donors, which causes many deaths on the waiting list. Our work proposes a new donor-recipient matching system that uses machine learning to predict graft survival after transplantation, based on a database of transplants performed at King's College Hospital in London. From a methodological point of view, the main novelty of the system is that we tackle the class imbalance of the problem (in the number of patterns per class) with semi-supervised learning, analysing its potential to obtain more robust and fairer models. To this end, we propose two different sources of unlabelled data (recent transplants whose outcome is not yet known and virtual donor-recipient pairs), together with two methods for using these data to build the model (a semi-supervised algorithm and a label propagation scheme). We show how the virtual pairs and the label propagation method are able to alleviate the imbalance problem, which constitutes a novel way of approaching this type of problem. The results show that the joint use of real and synthetic information helps to improve and stabilize the performance of the model and leads to fairer decisions. Finally, we propose using the resulting model together with a severity criterion associated with the recipient, in order to reach a trade-off between the severity of the patient's condition and the prognosis of the transplant.
  • Speaker CV: Pedro Antonio Gutiérrez Peña holds a PhD in Computer Science from the Universidad de Granada, a degree in Computer Engineering from the Universidad de Sevilla and a Master's degree in Soft Computing and Intelligent Systems, also from the Universidad de Granada. He is currently an Associate Professor (Profesor Titular) in the Department of Computer Science and Numerical Analysis at the Universidad de Córdoba, and previously worked at the Instituto de Agricultura Sostenible of the CSIC. He is a member of the AYRNA research group (Aprendizaje y Redes Neuronales Artificiales). His research focuses on machine learning, covering the design of artificial neural networks using bio-inspired techniques, the development and evaluation of models for ordinal classification, and the application of all these techniques to real-world problems in renewable energy and biomedicine.

Regularized Learning: When Data Are Not Enough

  • Tuesday, December 19, 2017. 11:30 h. FuzzyMad, Math Faculty, UCM.
  • Dr. Carlos María Alaíz Gudín (UAM).
  • Regularized learning has become a key field of machine learning because of the need to deal with problems involving large volumes of data. In particular, learning with convex terms has desirable theoretical properties, and these regularizers are the most widely used. Among them, some non-differentiable regularization terms endow the resulting models with interesting properties, such as being sparse or piecewise constant. This talk will cover some of the classical non-differentiable convex regularizers and the effects they produce when used in machine learning, and will introduce some algorithms based on proximal optimization for solving the resulting optimization problems.
  • More information
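
As a small illustration of the proximal optimization mentioned in the abstract, the sketch below applies proximal gradient descent to a toy problem with the non-differentiable L1 regularizer. It is only an assumed example, not material from the talk: the function names `soft_threshold` and `prox_grad_l1`, the objective and the data are all hypothetical choices made for illustration.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def prox_grad_l1(y, lam, step=1.0, iters=50):
    """Minimize 0.5 * ||w - y||^2 + lam * ||w||_1 by proximal gradient descent."""
    w = [0.0] * len(y)
    for _ in range(iters):
        # Gradient step on the smooth quadratic term, then the prox of the L1 term.
        w = [soft_threshold(wi - step * (wi - yi), step * lam)
             for wi, yi in zip(w, y)]
    return w


y = [3.0, -0.5, 0.2, -2.0]
w = prox_grad_l1(y, lam=1.0)
```

Entries of `y` whose magnitude is below `lam` are set exactly to zero, which is the sparsity effect the abstract attributes to non-differentiable regularizers.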

Community Detection in Directed Networks

  • Tuesday, July 26, 2016, 12:00 h. Sala de Grados A, EPS-UAM
  • Dr. Carlos Alaiz (Katholieke Universiteit Leuven)
  • Communities in directed networks have often been characterized as regions with a high density of links, or as sets of nodes with certain patterns of connection. Our approach to community detection combines the optimization of a quality function and a spectral clustering of a deformation of the combinatorial Laplacian, the so-called magnetic Laplacian. The eigenfunctions of the magnetic Laplacian, which we call magnetic eigenmaps, incorporate structural information. Hence, using the magnetic eigenmaps, dense communities including directed cycles can be revealed, as well as "role" communities in networks with a running flow, which are usually discovered using mixture models. Furthermore, in the spirit of the Markov stability method, an approach for studying communities at different energy levels in the network is put forward, based on a quantum mechanical system at finite temperature.
  • More information
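
To make the magnetic Laplacian in the abstract more concrete, here is a minimal pure-Python sketch that builds it for a tiny directed graph. It assumes one common convention (symmetrized weights, with a complex phase proportional to the antisymmetric part of the adjacency matrix); the convention used in the talk may differ, and the function name `magnetic_laplacian`, the parameter `g` and the toy graph are hypothetical.

```python
import cmath
import math


def magnetic_laplacian(W, g=0.25):
    """Magnetic Laplacian L = D - Ws * exp(i * theta), taken entrywise.

    Assumed convention: Ws = (W + W^T) / 2, D holds the row sums of Ws,
    and theta[i][j] = 2 * pi * g * (W[i][j] - W[j][i]).
    """
    n = len(W)
    # Symmetrized weights and the corresponding degrees.
    Ws = [[0.5 * (W[i][j] + W[j][i]) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in Ws]
    L = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # The phase is antisymmetric in (i, j), so it encodes edge direction.
            theta = 2.0 * math.pi * g * (W[i][j] - W[j][i])
            diagonal = deg[i] if i == j else 0.0
            L[i][j] = diagonal - Ws[i][j] * cmath.exp(1j * theta)
    return L


# Toy directed 3-cycle: 0 -> 1 -> 2 -> 0.
W = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
L = magnetic_laplacian(W)
```

The resulting matrix is Hermitian, so its eigenvalues are real; the phases of its leading eigenvectors are what a magnetic-eigenmap style embedding would inspect. Computing them needs an eigensolver, which this sketch leaves out.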

Data Visualization of Directed Networks

  • Wednesday, July 20, 2016, 15:00 h. Aula C-105, Edif. C, EPS-UAM
  • Dr. Angela Fernandez Pascual (Katholieke Universiteit Leuven)
  • Data visualization is a crucial field for revealing information in a clear and efficient way, and a helpful tool for analyzing data. In this presentation, we will talk about a new method for visualizing directed graphs, called Magnetic Eigenmaps, which is based on the analysis of the Magnetic Laplacian, a complex deformation of the well-known combinatorial Laplacian. The main advantage of this method is that it is able to highlight, in a flexible way, groups present in the network according to the density of links and the directionality patterns of the graph, which are revealed through the study of the phases of the first magnetic eigenfunctions.
  • More information

EPS-UAM Doctoral Course: Machine Learning with Functional Data. Dr. José Luis Torrecilla

  • Doctoral course: Machine Learning with Functional Data.
  • Lecturer: Dr. José Luis Torrecilla. Machine Learning Group (Grupo de Aprendizaje Automático), Escuela Politécnica Superior, Universidad Autónoma de Madrid.
  • Dates: July 18 to 21, 2016, 11:00 to 13:00 h.
  • Venue: Room 5 (Aula 5), first floor, Building A (Alan Turing), Escuela Politécnica Superior, Universidad Autónoma de Madrid. Ciudad Universitaria de Cantoblanco. Calle Francisco Tomás y Valiente, 11. 28049 Madrid, Spain.
  • Abstract: Functional data arise frequently in many different fields. Although such data can be handled with multivariate analysis techniques, in many cases it is more useful to treat them as functions and adapt the general techniques. Functional data analysis is the branch of statistics that studies this type of data, and it is currently a very active research area. This course is an introduction to functional data analysis that presents some of the main characteristics and particularities of functional data, as well as methodological aspects and techniques for handling them. The contents include:
    Basic notions of temporal variables (stochastic processes) and function spaces.
    Introduction to functional data analysis: examples, tools...
    Representation of functional data.
    Exploratory analysis: notions of depth, dimension reduction...
    Learning models: regression and classification (with special attention to the latter).
  • More details

Job offer

  • The Machine Learning Group (Grupo de Aprendizaje Automático, GAA) of the Escuela Politécnica Superior, Universidad Autónoma de Madrid [www.eps.uam.es/~gaa] is looking for candidates to carry out a research project.
  • Tasks: Machine learning applied to problems in geology.
  • Profile: Degree (Licenciado or Ingeniero) in Computer Science or related areas. Demonstrable research experience in the topic of the project will be valued.
  • Duration: September 1, 2016 to December 31, 2016
  • Gross salary: 1195.83 €/month
  • Interested candidates: send your CV (in Spanish or English) and a copy of your degree certificate to alberto.suarez@uam.es [Subject: Oferta GAA 2015/06] by Thursday, 2016/06/30.

Doctoral course: Bayesian Optimization

  • 16-21 December 2015, 11:00-13:00 h, LAB 16, 3rd fl., Bdg. A, EPS-UAM
  • Lecturer: Dr. José Miguel Hernández Lobato (Harvard University)
  • Registration form
  • More information

Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks

  • Monday, 21 December 2015, 11:00-13:00 h
  • Dr. José Miguel Hernández Lobato (Harvard University)
  • Large multilayer neural networks trained with backpropagation have recently achieved state-of-the-art results in a wide range of problems. However, using backprop for neural net learning still has some disadvantages, e.g., having to tune a large number of hyperparameters to the data, a lack of calibrated probabilistic predictions, and a tendency to overfit the training data. In principle, the Bayesian approach to learning neural networks does not have these problems. However, existing Bayesian techniques do not scale to large datasets and network sizes. In this work we present a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP). Similar to classical backpropagation, PBP works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients. A series of experiments on ten real-world datasets shows that PBP is significantly faster than other techniques, while offering competitive predictive abilities. Our experiments also show that PBP provides accurate estimates of the posterior variance on the network weights.
  • More information

Master/PhD/PostDoc Open Position

Shaping Social Activity by Incentivizing Users


Manuel Gómez Rodríguez (Max Planck Institute for Software Systems)

  • November 20, 2014 at 12:00 h
  • Sala de Grados, Escuela Politécnica Superior, Universidad Autónoma de Madrid
  • More information

Three reasons why control is hard: learning, planning and representing


Bert Kappen (Radboud University Nijmegen)

  • February 26, 2014 at 11:00 h
  • Sala de Grados, Escuela Politécnica Superior, Universidad Autónoma de Madrid
  • More information

Practical Implications of Classification Calibration


Irene Rodríguez Luján (Biocircuits Institute. University of California San Diego)

  • January 13, 2014 at 12:00 h
  • Sala de Grados, Escuela Politécnica Superior, Universidad Autónoma de Madrid
  • More information

Training nested functions using auxiliary coordinates


Miguel Á. Carreira-Perpiñán (University of California, Merced)

  • January 8, 2014 at 12:00 h
  • Sala de Grados, Escuela Politécnica Superior, Universidad Autónoma de Madrid
  • More information

Previous seminars