Structure from Motion


What can we compute from a collection of pictures?
- 3D structure
- camera poses and parameters

This is one of the most important and exciting results in computer vision from the 90s. It is difficult in practice, largely because of the numerical computation involved, but it is very powerful: two SIGGRAPH papers and several sketches build on it this year alone. (Show a few demo videos.) Now let's see how this works.

Input: a collection of pictures.
Output: (1) camera parameters, (2) sparse 3D scene structure.

Consider one camera first. What is the relation between pixels and rays in space? A point (X, Y, Z) in camera coordinates projects to the pixel (fX/Z, fY/Z), where f is the focal length. In homogeneous coordinates,

  (fX, fY, Z)^T = diag(f, f, 1) [I | 0] (X, Y, Z, 1)^T,

i.e. x = K [I | 0] X_cam with the simplified intrinsic matrix K = diag(f, f, 1). If the camera centre is C and its orientation is the rotation R, a world point X maps into camera coordinates as X_cam = R (X - C).
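The projection above can be checked numerically. A minimal NumPy sketch, with a made-up focal length, pose and point:

```python
import numpy as np

f = 2.0                          # focal length (made-up value)
K = np.diag([f, f, 1.0])         # simplified intrinsics diag(f, f, 1)
R = np.eye(3)                    # camera aligned with the world axes
C = np.array([0.0, 0.0, -1.0])   # camera centre

# P = K R [I | -C]
P = K @ R @ np.hstack([np.eye(3), -C.reshape(3, 1)])

X = np.array([1.0, 2.0, 3.0, 1.0])  # homogeneous world point
x = P @ X                            # homogeneous pixel
x = x / x[2]                         # normalize: (fX/Z, fY/Z, 1)
print(x)  # X_cam = (1, 2, 4), so x = (0.5, 1.0, 1.0)
```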

Substituting, x = K [I | 0] X_cam = K R [I | -C] X = K [R | t] X with t = -R C, so the projection is simply x = P X with P = K [R | t], a 3x4 matrix. This simplified projective camera has 7 degrees of freedom: 1 from the focal length, 3 from the rotation, and 3 from the translation.

Given one image, we observe x. Can we recover X or P? If P is known, what do we know about X? Only the ray through the camera centre and x. If X is known, can we recover P? P has 7 unknowns and each correspondence gives 2 equations, so 2n >= 7, i.e. n >= 4 points suffice. This is the camera calibration problem:

Input: n >= 4 world-to-image point correspondences X_i <-> x_i.
Output: camera parameters P = K [R | t].

Direct Linear Transform (DLT): x_i = P X_i holds only up to scale, so instead solve x_i x P X_i = 0 (a cross product), written with the skew-symmetric matrix of x = (x, y, w):

  [x]_x = [  0  -w   y ]
          [  w   0  -x ]
          [ -y   x   0 ]
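The cross-product trick can be verified directly: when x is exactly P X, multiplying P X by the skew matrix of x gives the zero vector. A small NumPy sketch with made-up values:

```python
import numpy as np

def skew(v):
    """[v]_x such that skew(v) @ u == np.cross(v, u)."""
    x, y, w = v
    return np.array([[0.0, -w,  y],
                     [w,  0.0, -x],
                     [-y,  x, 0.0]])

# Any projective camera and world point (made-up values).
P = np.array([[1.0, 0.0, 0.0,  0.5],
              [0.0, 1.0, 0.0, -0.2],
              [0.0, 0.0, 1.0,  1.0]])
X = np.array([0.3, -0.7, 2.0, 1.0])

x = P @ X                   # observed pixel (exact, no noise)
residual = skew(x) @ P @ X  # x cross (P X) = 0 when x ~ P X
print(residual)             # -> [0. 0. 0.]
```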

Direct Linear Transform (DLT), n >= 4 points: stack the equations into A p = 0, where p holds the entries of P, and minimize ||A p|| subject to the constraint ||p|| = 1. Use the SVD A = U S V^T; p is the last column of V: p = V_n.

Objective: given n >= 4 3D-to-2D point correspondences X_i <-> x_i, determine P.
Algorithm:
(i) Linear solution: normalize the data (X_i' = U X_i, x_i' = T x_i), then run DLT.
(ii) Minimization of geometric error: iterative optimization (Levenberg-Marquardt).
(iii) Denormalization: P = T^-1 P' U.

Implementation in practice: the camera centre C is the point for which P C = 0, i.e. the right null vector of P.

Objective: given a camera projection matrix P, decompose it as P = K [R | t].
Algorithm: write P = M [I | -C] and perform an RQ decomposition of M = K R, so that K is upper triangular and R is orthonormal. This works because P = K [R | t] = K [R | -RC] = K R [I | -C].

This is what we learn from one camera.
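Both steps can be sketched in a few lines of NumPy/SciPy. Note this version solves for the full 11-dof 3x4 P, which needs n >= 6 points rather than the 4 of the simplified 7-dof model, and the sign handling in `decompose` is a simplification rather than the full algorithm (it assumes the overall scale of P is positive):

```python
import numpy as np
from scipy.linalg import rq

def dlt_calibrate(X, x):
    """Estimate the 3x4 P from n >= 6 correspondences.
    X: (n, 4) homogeneous world points; x: (n, 3) homogeneous pixels."""
    rows = []
    for Xi, (u, v, w) in zip(X, x):
        rows.append(np.hstack([np.zeros(4), -w * Xi,  v * Xi]))
        rows.append(np.hstack([ w * Xi, np.zeros(4), -u * Xi]))
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)   # p = last right-singular vector

def decompose(P):
    """P = K [R | t] via RQ decomposition of M = P[:, :3]."""
    K, R = rq(P[:, :3])
    S = np.diag(np.sign(np.diag(K)))  # force a positive diagonal of K
    K, R = K @ S, S @ R
    t = np.linalg.solve(K, P[:, 3])   # P[:, 3] = K t
    return K / K[2, 2], R, t
```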

Now let's consider two cameras. Two questions arise:
(i) Correspondence geometry: given an image point x in the first image, how does this constrain the position of the corresponding point x' in the second image?
(ii) Camera geometry (motion): given a set of corresponding image points x_i <-> x_i', i = 1..n, what are the cameras P and P' for the two views?

The first question is answered by the fundamental matrix F: corresponding pixels satisfy x'^T F x = 0. If the intrinsic parameters (in our camera model, the focal lengths) of both cameras are known, say K and K', one can derive (not here) that K'^T F K = [t]_x R, where R and t are the rotation and translation of the second camera, i.e. P = [I | 0] and P' = [R | t].

The good thing is that F itself can be computed from a set of pixel correspondences x <-> x'. Writing out x'^T F x = 0 and separating the known data from the unknown entries of F gives one linear equation per correspondence:

  (x'x, x'y, x', y'x, y'y, y', x, y, 1) . (f11, f12, f13, f21, f22, f23, f31, f32, f33)^T = 0

Stacking n such rows gives A f = 0. How many correspondences do we need? F has 9 entries defined only up to scale, so 8 correspondences suffice (the eight-point algorithm).

What can we do now?
(1) Given 8 correspondences x <-> x', we can compute F.
(2) Given F, K and K', we can estimate the relative rotation and translation of the two cameras: P = [I | 0] and P' = [R | t].

This answers the second question. But how to make this automatic?
(1) Estimating the intrinsics K and K' (auto-calibration) will not be discussed here.
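The linear step of the eight-point algorithm, plus the usual rank-2 enforcement, might be sketched as follows (normalization of the point coordinates, which matters greatly in practice, is omitted for brevity):

```python
import numpy as np

def eight_point(x1, x2):
    """Linear estimate of F from n >= 8 correspondences.
    x1, x2: (n, 2) pixel coords in images 1 and 2; x2^T F x1 = 0."""
    rows = [[u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0]
            for (u1, v1), (u2, v2) in zip(x1, x2)]
    _, _, Vt = np.linalg.svd(np.array(rows))
    F = Vt[-1].reshape(3, 3)      # null vector of the data matrix
    # Enforce rank 2: a true fundamental matrix has det F = 0.
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt
```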

(It involves a lot of projective geometry knowledge.)
(2) Let's see how to find correspondences automatically, i.e. feature detection and matching.

Lowe's SIFT features are invariant to position, orientation and scale:
- Scale: look for strong responses of a DoG (Difference-of-Gaussian) filter over scale space; only consider local maxima in both position and scale.
- Orientation: create a histogram of local gradient directions computed at the selected scale, and assign the canonical orientation at the peak of the smoothed histogram.
Each keypoint thus specifies stable 2D coordinates: (x, y, scale, orientation).

Simple matching: for each feature in image 1, find the feature in image 2 that is most similar (compute the correlation of the two descriptor vectors), and vice versa; keep the mutual best matches. On top of this, one can design a very robust RANSAC-type algorithm.

What have we learnt so far? Single-camera calibration and two-view geometry. Now consider more than two cameras.
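The mutual-best-match rule might look like this in NumPy (cosine similarity stands in for the correlation of descriptor vectors; the function and variable names are made up):

```python
import numpy as np

def mutual_best_matches(desc1, desc2):
    """Match two descriptor sets; keep only mutual best pairs.
    desc1: (n, d), desc2: (m, d) descriptors (e.g. SIFT's 128-d)."""
    a = desc1 / np.linalg.norm(desc1, axis=1, keepdims=True)
    b = desc2 / np.linalg.norm(desc2, axis=1, keepdims=True)
    sim = a @ b.T                # cosine similarity, shape (n, m)
    best12 = sim.argmax(axis=1)  # best match in image 2 for each feature in 1
    best21 = sim.argmax(axis=0)  # best match in image 1 for each feature in 2
    return [(i, j) for i, j in enumerate(best12) if best21[j] == i]
```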

Objective: given N images Q1, ..., QN with reasonable overlaps, compute N camera projection matrices P1, ..., PN, where each Pi = Ki [Ri | ti]; Ki is the intrinsic matrix, Ri and ti are the rotation and translation respectively.

Algorithm (first attempt):
(1) Find M tracks T = {T1, T2, ..., TM}:
  (i) for every pair of images Qi, Qj: detect SIFT feature points in Qi and Qj, then match them robustly (RANSAC);
  (ii) link matches across multiple images to construct tracks.
(2) Estimate P1, ..., PN and a 3D position for each track:
  (i) select one well-conditioned pair of images Q1, Q2, and let T12 be their associated overlapping tracks;
  (ii) estimate K1 and K2, then compute P1, P2 and the 3D positions of T12 from the fundamental matrix;
  (iii) incrementally add a new camera Pk to the system, estimating its camera matrix by DLT (calibration);
  (iv) repeat (iii) until all cameras are estimated.

However, this won't work! The purely linear estimates are too sensitive to noise. The fix is to replace each linear step with a more robust non-linear optimization:
- in step (2)(ii), after the linear estimate, non-linearly minimize the reprojection errors (Levenberg-Marquardt);
- in step (2)(iii), use DLT only for the initial value, then non-linearly optimize the whole system;
- as before, repeat until all cameras are estimated.

Tired? Recall that the camera calibration algorithm had exactly the same shape: a linear solution (normalization + DLT) used only to initialize an iterative minimization of the geometric error (Levenberg-Marquardt), followed by denormalization.

We are lucky: for the first time, a huge amount of visual data is easily accessible, and high-level descriptions of these data are also becoming available. How do we explore them? Analyze them? Use them wisely?
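Step (2)(ii) needs the 3D position of a track given two camera matrices. The standard linear (DLT-style) triangulation is a natural sketch for this step (the function name is made up; this gives only the linear initial value, before any reprojection-error minimization):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear triangulation: recover a 3D point from two views.
    P1, P2: 3x4 camera matrices; x1, x2: (u, v) pixel observations."""
    # Each view contributes two rows of the form u * P[2] - P[0] etc.
    A = np.array([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]            # null vector = homogeneous 3D point
    return X[:3] / X[3]   # dehomogenize
```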

What's the contribution of this paper? How to extract high-level information?
- Computer vision and machine learning tools: structure from motion and other computer vision tools have reached a point robust enough for graphics applications.
- The Internet: image search.
- Human labeling: games with a purpose.

What is the space of all the pictures, in the past, the present, the future? What is the space of all the videos, in the past, the present, the future? What else? Using a search engine? Using human computation power?

Book: "Multiple View Geometry in Computer Vision", Hartley and Zisserman.
Online tutorial:
http://www.cs.unc.edu/marc/tutorial.pdf
http://www.cs.unc.edu/marc/tutorial/
Matlab toolbox:
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TORR1/index.html

(End)
