Thamar Solorio is an Associate Professor of the Department of Computer Science at the University of Houston (UH). She holds graduate degrees in Computer Science from the Instituto Nacional de Astrofísica, Óptica y Electrónica, in Puebla, Mexico. Her research interests include information extraction from social media data, enabling technology for code-switched data, stylistic modeling of text and more recently multimodal approaches to online content understanding. She is the director and founder of the Research in Text Understanding and Language Analysis Lab at UH. She is the recipient of an NSF CAREER award for her work on authorship attribution, and recipient of the 2014 Emerging Leader ABIE Award in Honor of Denice Denton. She is an elected board member of the North American Chapter of the Association of Computational Linguistics (2020-2021). Her research is currently funded by the National Science Foundation and ADOBE and in the past she has received support from the Office of Naval Research and the Defense Advanced Research Projects Agency (DARPA).
Recent Findings on Multimodal Prediction Systems
There are many cases where an AI system can benefit from observing evidence in more than one single modality. In this context, modality refers to text, speech, images or video. For example, in movie classification, it is reasonable to expect that relevant information for the task can come from any combination of the speech, the audio, images and video segments. Then the key to attain acceptable accuracies lies in the approach to combine them. During this talk I will present recent work that relies on the Gated Multimodal Unit (GMUs) to provide an adaptable internal mechanism for combining modalities during representation learning. I will motivate the use of these GMUs with successful examples of multimodal problems.
I will also argue that multimodality is not always the optimal approach. Just because we have access to multimodal data does not necessarily mean we should exploit the multiple modalities. I will present examples where multimodality fails to deliver improvements over unimodal models due to inherent characteristics of the problem/data at thand, as well as the current capabilities of the underlying models.