RETRIEVAL OF MAXIMALLY RELEVANT ARTICLE SECTIONS FOR INDEXED DATA
Mateo Wirth1, Bin Bi2.
1Princeton University, Princeton, NJ, 2University of California, Los Angeles, Los Angeles, CA.
The USC Shoah Foundation has collected over 52,000 video testimonies from survivors and other witnesses of the Holocaust. This project aims to improve the educational value of this database by incorporating information from external archives. In particular, we focus on matching video segments collected and indexed by the Shoah Foundation to a relevant section of an external article (e.g., a Wikipedia article). The challenge is twofold: to formulate an effective query for searching the external archive, and to compute the relevance of the results to the video segment. Because we are interested only in documents about a specific topic (the Holocaust), we use query expansion techniques to make the search effective. We rank the documents by relevance using methods from information retrieval, namely latent semantic indexing and probabilistic language modeling, and we adapt these methods to take advantage of the structure of the indexing terms developed by the Shoah Foundation. Finally, we will investigate how these adaptations affect performance and report the results.
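As an illustration of the ranking step, the following is a minimal sketch of query-likelihood scoring with a probabilistic language model. It is not the method used in this project: the function names (expand_query, rank_sections), the fixed list of expansion terms, the whitespace tokenizer, and the choice of Jelinek-Mercer smoothing with weight lam are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; a real system would normalize more).
    return text.lower().split()


def expand_query(index_terms, topic_terms=("holocaust",)):
    # Hypothetical query expansion: append fixed topic terms to the
    # indexing terms of a video segment, skipping duplicates.
    expanded = [t.lower() for t in index_terms]
    for t in topic_terms:
        if t not in expanded:
            expanded.append(t)
    return expanded


def rank_sections(query_terms, sections, lam=0.5):
    # Score each section by log P(query | section), mixing the section's
    # term distribution with the collection's (Jelinek-Mercer smoothing).
    docs = [Counter(tokenize(s)) for s in sections]
    collection = Counter()
    for d in docs:
        collection.update(d)
    total = sum(collection.values())

    scored = []
    for i, d in enumerate(docs):
        doc_len = sum(d.values())
        score = 0.0
        for t in query_terms:
            p_coll = collection[t] / total if total else 0.0
            if p_coll == 0.0:
                continue  # term absent from every section: ignore it
            p_doc = d[t] / doc_len if doc_len else 0.0
            score += math.log(lam * p_doc + (1 - lam) * p_coll)
        scored.append((score, i))

    # Return section indices, best match first.
    return [i for _, i in sorted(scored, key=lambda pair: -pair[0])]
```

Calling rank_sections(expand_query(["ghetto", "uprising"]), sections) on a list of article sections returns their indices ordered from most to least relevant under these assumptions.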