Geometry of Presentation Videos and Slides, and the Semantic Linking of Instructional Content (SLIC) System

Presentation slides are now a de facto standard in most classroom lectures, business meetings, and conference talks. Until recently, electronic presentation materials have been disjointed from each other: the video file and the corresponding slides are typically available separately for viewing or d...

Full description

Bibliographic Details
Main Author: Kharitonova, Yekaterina
Other Authors: Barnard, Kobus
Language:en_US
Published: The University of Arizona. 2016
Subjects:
Online Access:http://hdl.handle.net/10150/612447
http://arizona.openrepository.com/arizona/handle/10150/612447
Description
Summary:Presentation slides are now a de facto standard in most classroom lectures, business meetings, and conference talks. Until recently, electronic presentation materials have been disjointed from each other: the video file and the corresponding slides are typically available separately for viewing or download. In this work, we exploit the fact that video frames of a presentation and the corresponding slides are mapped into one another by a geometric transformation, called a homography. This mapping allows us to synchronize a video with the slides shown in it, enabling users to interactively view presentation materials, and search within and across presentations. We show how we can approximate homographies with affine transformations. Similarly to the original homographies, such transformations allow us to project slides back into the video (i.e., perform backprojection), which improves their resulting appearance. The advantage of our method is that we use homographies to compress the original video, reducing bandwidth used to transmit the video file, and then carry out backprojection using affine transformations on the client side. Additionally, we introduce a novel approach to slide appearance approximation, which improves SIFT-based matching for videos with out-of-plane rotation of the projection screen. This method also allows us to split each slide into three overlapping panels, and generate rotated versions of each such panel. Using these panels during matching, we detect slide's content that is projected onto a speaker (what we call "slide tattoos"). We treat these "tattoos" as implicit structured light, which provides hints about the scene geometry. We then use the homography obtained from detecting "slide tattoos" to compute a fundamental matrix. The main significance of this contribution is that it allows us to infer 3-D information from 2-D presentation materials. Finally, we present the Semantically Linked Instructional Content (SLIC) Portal, an online system for accessing presentations that exploits our slide-video matching. Aspects of the SLIC system fully developed or significantly improved as part of this work include: *a publicly-open web collection of video presentations indexed by slides *a unified clear interface displaying a video player along with slide images synchronized with their appearance in the video *a categorization tree that allows browsing for presentations by topic/category *an ability to query slide words within and across presentations; querying is integrated with the"browsing"mode, where the search results can be narrowed to only the selected categories *an easy integration with the audio transcript: the ability to preview and search within speech words *cross-platform and mobile support. We conducted user studies at the University of Arizona to measure the effect of synchronized presentation materials on learners, and discuss students' favorable response to the SLIC Portal, which they used during the experiments.