Multimedia Content Analysis and Retrieval Technology.ppt


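Multimedia Analysis and Retrieval Technology
Note: this lecture draws on the Digital Media Technology Fundamentals courseware of the Institute of Digital Media, Peking University.
Digital Media Technology Fundamentals, Lecture 5 (8 class hours)

Course content and schedule
- Part 1: Introduction to digital media
- Part 2: Digital media fundamentals
  - Digital color image fundamentals
  - Image/video processing fundamentals
- Part 3: Key digital media technologies
  - Multimedia compression and coding
  - Multimedia analysis and retrieval
  - Multimedia communication
  - Digital rights management

Teaching objectives
- Through this chapter, master the basic research problems and methods of "multimedia analysis and retrieval", the most active research direction in the multimedia field, together with its latest progress.
- ACM Multimedia, ACM ICMR, ICME, MMM, ICIMCS, ICCV, CVPR, ICIP, ICPR

Teaching content
- Overview of multimedia retrieval (2)
- Content-based image analysis and retrieval (CBIR) (2)
- Video analysis and retrieval (3)
- Audio analysis and retrieval (1)

I. Overview of Multimedia Retrieval

Multimedia Analysis and Retrieval
- Internet: videos, images, audio, Flash, animations
- Local: videos, images
- How do you locate the information you are interested in within such massive multimedia data?
- How to effectively organize, manage, browse, and retrieve?
- Image/video indexing should be analogous to text document indexing.

Introduction
- "Multimedia search engine"
  - A search engine that can search multimedia documents
  - Multimedia document: may contain multiple modalities, such as text, images, video, and audio
  - In the broad sense: a search engine that can search non-text information, that is, visual and auditory information

Characteristics of multimedia documents
- Multimedia documents contain rich non-text information.
- The non-text information that corresponds to a keyword may be too broad.
  - Example keyword: "rolling waves of wheat" (麦浪滚滚)

Multimedia retrieval concepts
- Provide multimedia query input
  - Multimedia and text queries can be entered conveniently
- Build multimedia indexes over multimedia documents
  - Feature index: text features (characters, words, phrases), visual features (color histogram, Gabor texture, shape features, ...), audio features (pitch, tone, ...)
  - Semantic index: metadata, concepts, events
- Provide multimedia result presentation
  - Present multimedia and text information intuitively
  - Present deeper information intuitively
  - Synthesize across documents (multimedia and text information)
  - Make it easy to browse large numbers of documents

How to retrieve?

Retrieval method 1: text-based (QBT)
Key question: how are keyword annotations obtained?
- Method 1: manual annotation
  - The workload is enormous, so it is not feasible.
  - Even for the same image, different people give different descriptions.
- Method 2: automatic annotation
  - Various machine learning algorithms
  - Poor performance: only a few concepts can be extracted, and accuracy is low (30%)
- Method 3: metadata analysis (URL, anchor text, title, linked pages)
  - Metadata example (keyword: count): Siberian tiger: 5, tiger: 3, animal: 2, China: 1, Russia: 1, Changbai Mountain: 1, ...
  - Problem: metadata is not necessarily related to the content of the multimedia document.
    - Metadata may be missing or incomplete.
    - Metadata may be unrelated to the image content.
- Method 4: social tagging (folksonomy)
  - Provide ordinary users with a platform for uploading and sharing
  - Encourage all users to comment on and tag uploaded documents
  - These comments and tags are made directly about the documents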
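The metadata-analysis idea (method 3 above) amounts to counting how often candidate keywords occur across an image's metadata sources, which is how a tally such as "Siberian tiger: 5, tiger: 3" could arise. A minimal Python sketch, assuming the metadata has already been split into keywords; the `keyword_counts` helper, the source names, and the sample data are all hypothetical and not taken from the lecture:

```python
from collections import Counter


def keyword_counts(metadata_sources):
    """Count how often each candidate keyword appears across metadata sources.

    metadata_sources maps a source name (hypothetical, e.g. "title") to a
    list of keywords that were already extracted from that source.
    """
    counts = Counter()
    for keywords in metadata_sources.values():
        counts.update(keywords)
    return counts


# Hypothetical, already-tokenized metadata for a single image
metadata = {
    "url": ["siberian tiger", "animal"],
    "anchor_text": ["siberian tiger", "tiger"],
    "title": ["siberian tiger", "changbai mountain"],
    "linked_pages": ["siberian tiger", "tiger", "animal", "china", "russia"],
}

counts = keyword_counts(metadata)
# Keywords sorted from most to least frequent; these become candidate annotations
ranked = counts.most_common()
```

All sources are weighted equally here. A real system would probably trust some sources more than others (a title over a linked page, for example), and the caveat from the slides still applies: a frequent metadata keyword may be unrelated to what the image shows.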

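Difficulties of QBT
- Needs are hard to describe precisely in words
  - Non-text needs
  - Users are unwilling to type much text
- User needs are not very specific
  - Most people's imagination is not rich enough
  - The results the system returns strongly influence the user's needs
  - Users must browse more documents to find the results they need
- Most important: images, video, and audio are often hard to describe accurately in words
  - A picture is worth a thousand words
  - Text annotation methods generally have low accuracy

The complexity of describing visual information

Retrieval method 2: content-based / example-based
- Content-based image/video retrieval (CBIR/CBVR)
- Query-by-Example (QBE)
- What is "content"? The visual characteristics of images and video
- How to describe it? (mathematical model)
- How to match? (similarity computation method)
- How to index? (finding similar documents quickly)
- How to submit a query?

Content-based image/video retrieval
- Mathematical model of "content"
  - Text documents: vector model
  - Multimedia documents: feature extraction
    - Multiple physical quantities that represent visual properties make up the features describing the document content
- Visual features: color, texture, shape, motion
- Audio features: frequency, sound quality, tone, ...

Examples of visual and audio features
- Color, camera motion, motion activity, mosaic
- Color, motion trajectory, parametric motion, spatio-temporal shape
- Color, shape, position, texture
- Spoken content, spectral characterization
- Music: timbre, melody, pitch

Content-based image retrieval
- Query by content: color, texture ...
- ... eigenvectors of matrix)
- Turning function based (similar to Fourier descriptor): convex/concave polygons
- Wavelet transforms: leverage multiresolution
- Chamfer matching for comparing 2 shapes (linear dimension rather than area)
- 3-D object representations using similar invariant features
- Well-known edge detection algorithms

Feature example: color features
- Colour histograms (CH)
- Global CH generated directly from RGB space, with 125 (5x5x5) bins

Bosch, IVC, 2006

Feature example: edge features
- Edge histogram (EHD)
- Captures the spatial distribution of edges in six states: 0, 45, 90, and 135 degrees, non-directional, and no edge
- Global EHD of an image: concatenate 16 sub-EHDs into 96 bins
- Local EHD of a segment: group the edge histograms of the image blocks that fall into the segment

Feature example: point features
- Detect patches [Mikolajczyk and Schmid '02] [Sivic et al. '03]
- Compute SIFT descriptor [Lowe '99]

Global vs. local features

Region segmentation
- An open problem in computer vision

Similarity measures
- Datta, et al., Image retrieval: Ideas, influences, and trends of the new age, ACM Computing Surveys, 2008
- CBIR does not aim at exact matching. It computes the visual similarity between the query image and the images in the database, so the retrieval result is not a single image but a list of images ranked by their similarity to the query image.
- Different similarity measures significantly affect the performance of a CBIR system.

Query by Example
- Pick query examples and ask the system to retrieve "similar" images.
- Figure label: Query Sample

Relevance Feedback
- User gives feedback on the query results
- System recalculates feature weights
- Figure label: Initial sample

Relevance feedback: Online Feature Weighting
- From the query examples, the system determines the feature weighting (k x k) matrix W
- Figure label: Query

Relevance-feedback retrieval interface
- User selects relevant images
- If good images are found, add them
- When no more images to add, the search converges
- Slider or checkbox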
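The query-by-example and relevance-feedback steps on the preceding slides can be illustrated with a small sketch. It assumes a 125-bin (5x5x5) global RGB histogram as the only feature, a weighted squared Euclidean distance in which W is restricted to a diagonal matrix, and an inverse-standard-deviation reweighting rule. The reweighting rule and all function names are assumptions for illustration, since the slides do not say how W is computed.

```python
import numpy as np


def color_histogram(image, bins_per_channel=5):
    """Global RGB histogram with bins_per_channel**3 bins, normalized to sum to 1.

    image: uint8 array of shape (H, W, 3).
    """
    pixels = image.reshape(-1, 3).astype(np.int64)
    # Map each channel value 0..255 to a bin index 0..bins_per_channel-1
    idx = pixels * bins_per_channel // 256
    # Combine the three per-channel indices into one bin number
    flat = (idx[:, 0] * bins_per_channel + idx[:, 1]) * bins_per_channel + idx[:, 2]
    hist = np.bincount(flat, minlength=bins_per_channel ** 3).astype(np.float64)
    return hist / hist.sum()


def rank_by_similarity(query, database, weights=None):
    """Return database indices ordered from most to least similar to the query.

    Uses a weighted squared Euclidean distance; `weights` is a diagonal
    stand-in for the k x k matrix W (an assumption of this sketch).
    """
    if weights is None:
        weights = np.ones(query.shape[0])
    dists = ((database - query) ** 2 * weights).sum(axis=1)
    return np.argsort(dists, kind="stable")


def reweight(relevant_features, eps=1e-6):
    """Assumed heuristic: features that vary little across the images the
    user marked as relevant receive larger weights."""
    w = 1.0 / (relevant_features.std(axis=0) + eps)
    return w / w.sum()


# Hypothetical data: six random 8x8 RGB images stand in for a database
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(6, 8, 8, 3), dtype=np.uint8)
database = np.stack([color_histogram(img) for img in images])

query = database[2]                                   # query by example
ranking = rank_by_similarity(query, database)         # initial result list
weights = reweight(database[ranking[:3]])             # user marks the top 3 as relevant
reranked = rank_by_similarity(query, database, weights)
```

The result is a ranked list rather than a single match, in line with the point made on the similarity-measure slide. Swapping in another distance (histogram intersection, for example) only changes `rank_by_similarity`.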

Evaluation metric: Average Precision
- Computed only over the relevant documents that are returned
- The earlier the system ranks the relevant documents it retrieves (the higher their rank), the higher the AP
- MAP (Mean Average Precision) is the average AP over all queries
- Example: suppose there are two queries; query 1 has 4 relevant images and query 2 has 5 relevant images. For query 1 the system retrieves 4 relevant images at ranks 1, 2, 4, 7; for query 2 it retrieves 3 relevant images at ranks 1, 3, 5.
  - Query 1: AP = (1/1 + 2/2 + 3/4 + 4/7) / 4 = 0.83
  - Query 2: AP = (1/1 + 2/3 + 3/5 + 0 + 0) / 5 = 0.45
  - MAP = (0.83 + 0.45) / 2 = 0.64

Real-world CBIR system examples
- Visual similarity search in a specific domain: a photo-sharing community with more than a million airplane-related pictures
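The AP and MAP arithmetic from the evaluation example above can be checked with a few lines of Python. This is a sketch under the slide's convention that relevant images which are never retrieved contribute zero; the function names are illustrative.

```python
def average_precision(relevant_ranks, num_relevant):
    """AP from the 1-based ranks at which relevant items were retrieved.

    num_relevant is the total number of relevant items for the query;
    relevant items that were never retrieved add 0 to the sum.
    """
    ranks = sorted(relevant_ranks)
    # The precision at the i-th relevant hit is (i / rank of that hit)
    precision_sum = sum((i + 1) / rank for i, rank in enumerate(ranks))
    return precision_sum / num_relevant


def mean_average_precision(queries):
    """queries: list of (relevant_ranks, num_relevant) pairs, one per query."""
    aps = [average_precision(ranks, n) for ranks, n in queries]
    return sum(aps) / len(aps)


ap1 = average_precision([1, 2, 4, 7], 4)  # query 1, about 0.83
ap2 = average_precision([1, 3, 5], 5)     # query 2, about 0.45
map_score = mean_average_precision([([1, 2, 4, 7], 4), ([1, 3, 5], 5)])  # about 0.64
```

Dividing by the total number of relevant images, not by the number retrieved, is what penalizes query 2 for the two relevant images it missed.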

Real-world CBIR system examples
- A public-domain search engine that incorporates image retrieval and face recognition for searching pictures of people and products on the Web

Image Annotation/Tagging: toward semantic image retrieval
- Example tags: ship, water, tree, sky
- Used for keyword-based image retrieval
- Relevance models: w1, w2, w3, ..., wn
- J. Jeon, et al., Automatic image annotation and retrieval using cross-media relevance models, SIGIR, 2003

Annotation Examples

Bridge User Intention Gap
- User queries are usually short and ambiguous
- How to capture user search intent?

Visual Query Suggestion
- To help users specify and deliver their search intents
- Zheng-Jun Zha, et al., Visual Query Suggestion, ACM MM, 2009

IGroup: presenting web image search results in semantic clusters
- The result of "tiger" in MSN image search: mixed with "tiger woods" and "tiger animal"
- The screen of IGroup: the general view
- The screen of IGroup: the cluster view

3D MARS: 3D presentation of image retrieval
- Image retrieval and browsing in 3D virtual reality
- The user can see more images without occlusion
- Query results can be displayed by various criteria
- Results by color features, by texture, or by a combination of color and texture

Copy Detection
- Definition: a copy is a video segment taken from a source video whose content or format has undergone one or more transformations (attacks).
- Content-based copy detection (CBCD): given a query video, decide whether it is a copy of some video in the database, and determine the start and end times of the corresponding original segment.

Transformations
- Global quality decrease: original, blur, brightness, adding noise
- Partial content alteration: original, insert caption, picture in picture, crop, shift

Applications of Copy Detection
- Copyright protection for digital media
- Data mining of commercial video (station logo and advertisement detection)
- Legal investigation and forensics based on video content
- De-duplication in video search and video databases
- Automatic content management for personal media libraries
- Intelligent video search and content indexing, etc.

Copy Detection
- Evaluations
  - Copy detection evaluation, CIVR 2007
  - Copy detection evaluation task, TRECVID 2008
- Challenges
  - Features: invariant patterns
  - Large scale: speed, indexing
- Zhang 2009
  - Keyframe
  - Global: block gradient histogram
  - Local: SIFT + spatial constraints
  - Temporal consistency constraints
  - Yong-Dong Zhang et al., TRECVID 2009 of MCG-ICT-CAS, 2009

Presentation (20 minutes)
- Background & motivations
- Contributions or differences
- Framework
- Key techniques
- Evaluations
  - Dataset
  - Metric
  - Results
