Multimodal Video Search by Examples


Summary

Searching large video archives, such as BBC TV programmes, effectively and efficiently is a significant challenge. Search is typically done via keyword queries over pre-defined metadata such as titles, tags and viewers' notes.

However, it is difficult to use keywords to find specific moments in a video where a particular speaker talks about a specific topic at a particular location. Video search by examples suits this scenario because it lets users search using one or more examples of the content of interest, without having to express that interest in keywords. However, video search by examples is notoriously challenging, and its performance remains poor.

To improve search performance, multiple modalities should be considered: image, sound, voice and text. Each modality provides a separate search cue, and combining multiple cues should identify more relevant content.
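As a rough illustration of how separate cues might be combined, the Python sketch below applies weighted-sum late fusion to per-modality similarity scores. The function `fuse_scores`, the modality names, the weights and the scores are all hypothetical. The sketch assumes each modality has already produced scores on a comparable [0, 1] scale, and it is not the fusion method this project will develop.

```python
from typing import Dict, List, Tuple


def fuse_scores(modality_scores: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Hypothetical weighted-sum late fusion: combine per-modality
    similarity scores for each candidate segment into one ranking score."""
    fused: Dict[str, float] = {}
    for modality, scores in modality_scores.items():
        w = weights.get(modality, 0.0)
        for segment_id, score in scores.items():
            # A segment missing from a modality contributes nothing for that cue.
            fused[segment_id] = fused.get(segment_id, 0.0) + w * score
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative scores only (assumed already normalised to [0, 1]).
modality_scores = {
    "face":  {"seg1": 0.9, "seg2": 0.2},
    "voice": {"seg1": 0.7, "seg3": 0.8},
    "topic": {"seg2": 0.6, "seg3": 0.5},
}
weights = {"face": 0.4, "voice": 0.3, "topic": 0.3}
ranked = fuse_scores(modality_scores, weights)
print(ranked)
```

Here a segment that matches strongly on face and voice (`seg1`) outranks one that matches only on voice and topic (`seg3`), which is the intuition that several weaker cues together can be more informative than any single one.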

We call this approach multimodal video search by examples (MVSE). In this project we will study efficient, effective, scalable and robust MVSE where video archives are large, historical and dynamic, and where the modalities are person (face or voice), context and topic. The aim is to develop a framework for MVSE and validate it through the development of a prototype search tool.

Such a search tool will be useful for organisations such as the BBC and British Library, who maintain large collections of video archives and want to provide a search tool for their own staff as well as for the public.

It will also be useful for companies such as YouTube that host videos from the public and want to enable video search by examples. We will address key challenges in video segmentation, content representation, hashing, ranking and fusion.


Objectives

The aim of this research project is to create a general framework for multimodal video search by examples (MVSE) so that video archives can be searched by examples of any content; and to build a prototype MVSE tool that is efficient, effective, scalable and extensible, in order to demonstrate the framework.

This project will benefit public and private organisations, such as the BBC, that maintain large video archives and want to use the MVSE tool for their own staff as well as to enable public access to these archives. The project will also benefit other organisations that collect large quantities of audio-visual material and want to develop their own search tools using the framework developed by this project.

Measurable objectives are the following:

  1. To develop state-of-the-art methods for content-based video segmentation, in order to index each video by content at the right time and for the right duration. This will support the aim of search by examples of any content. It will be assessed through evaluation feedback from the BBC on the prototype search tool.
  2. To develop state-of-the-art methods for multimodal content representation that is invariant to variations in age, lighting, pose and quality. This will ensure the prototype search tool is effective and, in particular, can deal with historical archives, which exhibit wide variation. It will be assessed by advances on the state of the art in each modality, as well as by contributions to the performance of the prototype search tool.
  3. To develop state-of-the-art methods for content ranking based on hash codes and feature vectors for effective video search. This will enable the prototype search tool to handle the uncertainty arising from inaccuracies in segmentation and representation. It will be assessed by effectiveness improvements over existing approaches, and by contributions to the prototype.
  4. To develop state-of-the-art methods for incremental content hashing, so that the tool can efficiently handle dynamic archives, large or small. This capability will be exercised as the prototype tool is built in successive versions on increasingly complex data, and incrementally as the BBC makes more videos available.
  5. To establish a network of researchers and practitioners in the area of example-based multimedia retrieval. The Challenge Arena, part of the annual workshop, will provide an opportunity to develop shared tasks, in collaboration with the BBC, with objectives directly linked to the aim of this project. It will also provide an opportunity for individuals and organisations external to the project to benchmark against the MVSE tool, to challenge it, or to try it to meet their search needs.
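To make objectives 3 and 4 more concrete, the following is a minimal Python sketch of one common two-stage pattern: compact binary hash codes give a cheap Hamming-distance shortlist, and the full feature vectors then re-rank that shortlist by cosine similarity. It assumes NumPy, uses random-hyperplane (SimHash-style) hashing purely as a stand-in for the learned hash functions the project targets, and the `HashIndex` class and its parameters are hypothetical. Because the hyperplanes are fixed, new segments can be added without recomputing existing codes, which is one simple reading of incremental hashing.

```python
import numpy as np


class HashIndex:
    """Hypothetical two-stage index: random-hyperplane hash codes for a fast
    Hamming-distance shortlist, then cosine re-ranking on full vectors."""

    def __init__(self, dim: int, n_bits: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Fixed random hyperplanes (stand-in for a learned hash function).
        self.planes = rng.standard_normal((n_bits, dim))
        self.ids: list = []
        self.codes: list = []
        self.vectors: list = []

    def _hash(self, x: np.ndarray) -> np.ndarray:
        # One bit per hyperplane: which side of the plane the vector falls on.
        return (self.planes @ x > 0).astype(np.uint8)

    def add(self, segment_id: str, vector) -> None:
        # Incremental insertion: existing codes are never recomputed.
        v = np.asarray(vector, dtype=float)
        self.ids.append(segment_id)
        self.codes.append(self._hash(v))
        self.vectors.append(v / np.linalg.norm(v))

    def search(self, query, shortlist: int = 10, top_k: int = 3):
        q = np.asarray(query, dtype=float)
        q_code = self._hash(q)
        codes = np.array(self.codes)
        # Stage 1: Hamming distance between binary codes.
        hamming = (codes != q_code).sum(axis=1)
        candidates = np.argsort(hamming, kind="stable")[:shortlist]
        # Stage 2: exact cosine similarity on the shortlisted vectors only.
        q = q / np.linalg.norm(q)
        sims = np.array(self.vectors)[candidates] @ q
        order = np.argsort(-sims, kind="stable")[:top_k]
        return [(self.ids[candidates[i]], float(sims[i])) for i in order]


# Usage with random stand-in features (no real video data).
rng = np.random.default_rng(1)
vecs = [rng.standard_normal(8) for _ in range(20)]
index = HashIndex(dim=8, n_bits=16)
for i, v in enumerate(vecs):
    index.add(f"seg{i}", v)
# Query with a scaled copy of seg7's features; cosine ignores the scale.
results = index.search(2.0 * vecs[7], shortlist=5, top_k=3)
print(results)
```

Fixed hyperplanes keep insertion trivially incremental at the cost of hash quality; a learned hash function would trade that simplicity for more discriminative codes.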

Impact

The project’s objective is to provide scalable, next-generation ‘search by example’ functionality across national video archives. The project will advance the state of the art in video segmentation and in content representation, matching and ranking, and these outputs are intended to have a positive, disruptive impact on multimedia search capability across the media industry, nationally and internationally.

The beneficiaries of this project’s outputs will include academics, journalists, broadcasters, TV viewers, multimedia companies and organisations hosting and managing large video or multimedia repositories.

Journalists and broadcasters will benefit directly from time savings and the rapid discovery of relevant content when using this new technology. This in turn will enable better, more relevant and richer TV programming to be produced in less time, yielding economic savings.

TV viewers will benefit from more relevant TV programmes made possible by the effective repurposing of content within large media archives. The immediate beneficiary will be the BBC, a key partner, which is likely to adopt and integrate the new technology within its workflows to improve the discovery of media content when producing TV programmes.

However, the technologies developed are also transferable to other broadcasters and to major online companies such as YouTube that rely on semantically enriched search technologies.

Academics will benefit from the dissemination of, and inspiration provided by, the project’s new research findings and search technologies for rapidly discovering relevant video and multimedia content using new intelligent algorithms.

The Pathways to Impact document outlines a series of activities, including co-creation workshops and licensing, to increase the likelihood of research impact and of adoption of the novel, disruptive technologies produced in this project.