MATCHING VIDEO SEGMENTS WITH RELEVANT DOCUMENTS
Ilan Morgenstern Kaplan1, Bin Bi2.
1Instituto Tecnológico Autónomo de México (ITAM), Mexico City, MX, 2University of California, Los Angeles, Los Angeles, CA.
The USC Shoah Foundation has collected over 52,000 video testimonies from survivors and other witnesses of the Holocaust. The goal of this project is to improve the learning experience of people who watch the testimonies by integrating external knowledge into the archive. For example, if a person in a testimony is talking about their experience in the Warsaw Ghetto, we would like to display more information related to this topic (e.g., a Wikipedia article) next to the video. Since the videos are indexed with keywords for each minute of video, we use these data to extend various information retrieval methods to match each segment to a Wikipedia article. These methods include the vector space model, latent semantic indexing (with tf-idf weighting), language models, and topic models. Building on these existing techniques, we will propose a new method that exploits the metadata of the document corpus to match the video segments with relevant documents. One challenge of this project is that we are searching for the most relevant documents within a specific category (in this case, the Holocaust). We predict that incorporating query-expansion procedures into our methods, using definitions for each keyword and the information provided by the corpus' metadata, will improve on the performance of the standard methods in all cases. We will report results from this work.
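As an illustration of the vector space baseline described above, the sketch below ranks toy "Wikipedia article" token lists against a segment's minute-level keywords using tf-idf weighting and cosine similarity. This is a minimal, self-contained example; the function names and the sample article data are assumptions for illustration, not the project's actual implementation or corpus.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) for a list of token lists."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                     # document frequency per term
    idf = {t: math.log(n / df[t]) for t in df}  # inverse document frequency
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * idf[t] for t in tf})
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def best_match(segment_keywords, article_vecs, idf):
    """Return the index of the article most similar to a segment's keywords."""
    tf = Counter(segment_keywords)
    query = {t: tf[t] * idf.get(t, 0.0) for t in tf}
    scores = [cosine(query, v) for v in article_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy corpus standing in for Wikipedia articles (illustrative only).
articles = [
    ["warsaw", "ghetto", "deportation", "ghetto", "uprising"],
    ["kindertransport", "britain", "children", "rescue"],
]
vecs, idf = tfidf_vectors(articles)
print(best_match(["warsaw", "ghetto"], vecs, idf))  # → 0
```

Query expansion, as proposed in the abstract, would amount to appending extra terms (e.g., tokens from a keyword's definition or from corpus metadata) to `segment_keywords` before scoring.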