Data Processing, Spectral Analysis, and Data Mining.ppt

Uploaded by: 创****公 | Document ID: 1726785 | Upload date: 2019-10-23 | Format: PPT | Pages: 63 | Size: 9.29 MB

Alignment, Mining, and Functional Analysis of Gene Sequences
Quan Zou (Ph.D., Professor), School of Computer Science and Technology, Tianjin University, October 2017

Outline
Sequence alignment: algorithm; parallel
Identification and mining: microRNA; machine learning related works
Function prediction: miRNA-disease relationship; crop yield related genes

Multiple Sequence Alignment (MSA) vs. BLAST

Multiple Sequence Alignment (MSA): What & Where
Scope, from general to specific: multiple sequence alignment; multiple DNA sequence alignment; multiple similar DNA sequence alignment (our focus).
Applications: phylogenetic trees, virus sequences, population SNV calling.

Techniques for similar DNA MSA
1. k-band dynamic programming
[Figure: the k-band inside the dynamic programming matrix]
How to set k for k-band?
Greedy search with suffix tree: T=GTCCTGAAGCTCCGT, S=GTCCGAAGCTCCGG. The matched segments are (1,1,4) and (5,6,9), which read as (start in S, start in T, length).
2. Center star strategy
[Figure: sequences S1 to S5 under tree alignment and under the center star strategy; steps: sum up, update, final result]

Extreme MSA for Very Similar DNA Sequences

Experiments
Data: 100 human mitochondrial genome sequences, about 16k in length (1555 KB in total).
Output size: ours 1558 KB, Clustal 1627 KB.
[Figure: time cost of each step]
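The slide asks how to set k for k-band dynamic programming, but the extracted text does not record the answer. One textbook option, shown here as a minimal sketch and not as the authors' method, is to start with a small band and double k until the banded score is at most k; at that point an optimal path cannot have left the band. The unit edit costs and the function names `banded_edit_distance` and `edit_distance_kband` are assumptions made for illustration.

```python
def banded_edit_distance(s, t, k):
    """Edit distance using only cells with |i - j| <= k.

    Returns None when the band cannot reach the bottom-right corner.
    """
    n, m = len(s), len(t)
    if abs(n - m) > k:
        return None
    inf = float("inf")
    prev = {j: j for j in range(min(m, k) + 1)}  # row 0 of the matrix
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            best = inf
            if j > 0 and (j - 1) in prev:        # match or mismatch
                best = prev[j - 1] + (s[i - 1] != t[j - 1])
            if j in prev:                        # gap in t
                best = min(best, prev[j] + 1)
            if j > 0 and (j - 1) in cur:         # gap in s
                best = min(best, cur[j - 1] + 1)
            cur[j] = best
        prev = cur
    return prev[m]


def edit_distance_kband(s, t):
    """Double k until the banded result is provably optimal.

    Returns (distance, final k). Assumed stopping rule: a path with
    at most k edits never strays more than k cells from the diagonal.
    """
    k = max(1, abs(len(s) - len(t)))
    while True:
        d = banded_edit_distance(s, t, k)
        if d is not None and d <= k:
            return d, k
        k *= 2


# The two sequences from the slide.
print(edit_distance_kband("GTCCGAAGCTCCGG", "GTCCTGAAGCTCCGT"))
```

For very similar sequences the loop stops at a small k, so only a narrow strip of the matrix is ever filled, which is the point of the k-band idea.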

Multiple sequence alignment in Hadoop

Multiple sequence alignment in Spark

Running time of different software tools on mtDNA datasets

Running time with HPTree on 16S rRNA datasets

Comparison of CPU-based, GPU-based, and Spark-based MSA
CPU-based MSA can only handle small datasets (10% memory size), and does so slowly.
GPU-based MSA can handle small datasets in less time than CPU-based MSA.
Spark-based MSA can handle ultra-large datasets in acceptable time.
[Figure: running time (sec) by software; some entries are marked Memory Limit Exceeded]

http:/

2. Web Server
Step 1: After you click the link (http:/) shown above, you will see the HAlign web server.
Step 2: After you submit your experiment task successfully, wait a moment and you will see the results.
Step 3: You can now view the visualization of your multiple sequence alignment results by clicking the View link.
Step 4: You can now view the visualization of your phylogenetic tree by clicking the Generate link.

References on MSA
Quan Zou, Qinghua Hu, Maozu Guo, Guohua Wang. HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment Based on the Centre Star Strategy. Bioinformatics. 2015, 31(15): 2475-2481.
Xi Chen, Chen Wang, Shanjiang Tang, Ce Yu, Quan Zou. CMSA: A heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment. BMC Bioinformatics. 2017, 18: 315.
Shixiang Wan, Quan Zou*. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing. Algorithms for Molecular Biology. 2017, 12: 25.
Wenhe Su, Quan Zou, et al. MASC: A Linear Method for Multiple Nucleotide Sequence Alignment on Spark Parallel Framework. Journal of Computational Biology. Accepted.
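The HAlign reference above names the centre star strategy, and the earlier slide lists its steps as sum up, update, and final result. The sketch below is a plain Python illustration of the textbook version, not the HAlign code: every sequence is aligned to a center independently (the step that a Hadoop or Spark job could presumably distribute), the gap counts are merged, and each row is padded to match. The unit costs, the choice of center by smallest total distance, and all function names are assumptions.

```python
def align(a, b):
    """Global alignment with unit costs; returns (cost, a_row, b_row)."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                          d[i - 1][j] + 1,
                          d[i][j - 1] + 1)
    ra, rb, i, j = [], [], n, m
    while i > 0 or j > 0:  # trace back from the bottom-right corner
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            ra.append(a[i - 1]); rb.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ra.append(a[i - 1]); rb.append("-"); i -= 1
        else:
            ra.append("-"); rb.append(b[j - 1]); j -= 1
    return d[n][m], "".join(reversed(ra)), "".join(reversed(rb))


def gaps_before(center_row, n):
    """g[j] = gaps placed before the j-th center character (g[n]: trailing)."""
    g, j = [0] * (n + 1), 0
    for ch in center_row:
        if ch == "-":
            g[j] += 1
        else:
            j += 1
    return g


def expand(center_row, other_row, target):
    """Pad other_row so the center carries target[j] gaps at every position."""
    out, j, run = [], 0, 0
    for x, y in zip(center_row, other_row):
        if x == "-":
            out.append(y)
            run += 1
        else:
            out.append("-" * (target[j] - run))
            out.append(y)
            j, run = j + 1, 0
    out.append("-" * (target[j] - run))
    return "".join(out)


def center_star(seqs):
    # Center: the sequence with the smallest total distance to all others.
    c = min(range(len(seqs)),
            key=lambda i: sum(align(seqs[i], s)[0] for s in seqs))
    center = seqs[c]
    pairs = [align(center, s)[1:] for s in seqs]         # independent per sequence
    profiles = [gaps_before(row, len(center)) for row, _ in pairs]
    target = [max(col) for col in zip(*profiles)]         # merged gap profile
    return [expand(row, other, target) for row, other in pairs]


for row in center_star(["GTCCTGAAGCTCCGT", "GTCCGAAGCTCCGG", "GTCCTGAAGCTCGT"]):
    print(row)
```

Only the merge of gap profiles needs information from all pairwise alignments, which is why the strategy splits cleanly into an independent per-sequence phase and a short combining phase.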

Identification of microRNA
[Figure: the 22-nt sequence AUCGUGCAGAGACUAGACUGAC shown five times, with three numbered reads]
1 tgcgcgaauucacccauggauccauucaucuuccaagggcaccagc
2 agcgcgaauuccaagucacccauggauccauucaucuggcagcgu
3 agucgcgaauucaucaucuuccaagggcacccauggauccaucca

microRNA prediction based on machine learning
Slide labels: obvious differences; weak generalization.
[Flowchart labels: 100nt, 100nt; Parameter Filter; Prediction Model; Extend; Compute Secondary Structures; Extract; Human CDs; Human Mature microRNAs; Blast; Mature-like Reads; Original Negative Set; Mined Sequences; Rebuilt; Replace; innovation point]

microRNA family identification

http:/ miRNA found by our method

Dinoflagellate genome
Lin, et al. The Symbiodinium kawagutii genome illuminates dinoflagellate gene expression and coral symbiosis. Science. 2015, 350(6261): 691-694.

Machine learning frame in gene identification
[Figure: four example numeric feature vectors]

Ensemble learning: combine weak classifiers into a strong one
[Figure: weak classifiers h1() to h7() are combined to form the final strong classifier, which produces the classification result]

Ensemble learning for Class Imbalance Problem

http:/ in Bioinformatics
DNA-binding proteins: Li Song, Dapeng Li, Xiangxiang Zeng, Yunfeng Wu, Li Guo*, Quan Zou*. nDNA-prot: Identification of DNA-binding Proteins Based on Unbalanced Classification. BMC Bioinformatics. 2014, 15: 298.
tRNA: Quan Zou, et al. Improving tRNAscan-SE annotation results via ensemble classifiers. Molecular Informatics. 2015, 34(11-12): 761-770.
miRNA: Leyi Wei, Minghong Liao, Yue Gao, Rongrong Ji, Zengyou He*, Quan Zou*. Improved and Promising Identification of Human MicroRNAs by Incorporating a High-quality Negative Set. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014, 11(1): 192-201.
Circular RNA: Xiangxiang Zeng, Wei Lin, Maozu Guo, Quan Zou*. A comprehensive overview and evaluation of circular RNA detection tools. PLoS Computational Biology. 2017, 13(6): e1005420.

Using the ensemble learning method proposed by Associate Professor Quan Zou,

References
Leyi Wei, Minghong Liao, Yue Gao, Rongrong Ji, Zengyou He*, Quan Zou*. Improved and Promising Identification of Human MicroRNAs by Incorporating a High-quality Negative Set. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014, 11(1): 192-201.
Quan Zou*, Yaozong Mao, Lingling Hu, Yunfeng Wu, Zhiliang Ji*. miRClassify: An advanced web server for miRNA family classification and annotation. Computers in Biology and Medicine. 2014, 45: 157-160.
Chen Lin, Wenqiang Chen, Cheng Qiu, Yunfeng Wu, Sridhar Krishnan, Quan Zou*. LibD3C: Ensemble Classifiers with a Clustering and Dynamic Selection Strategy. Neurocomputing. 2014, 123: 424-435.
Quan Zou, Jiancang Zeng, Liujuan Cao, Rongrong Ji. A Novel Features Ranking Metric with Application to Scalable Visual and Bioinformatics Data Classification. Neurocomputing. 2016, 173: 346-354.

Similarity between two microRNAs
[Figure: panels (A), (B), and (C), each comparing the targets of miR1 with the targets of miR2]
Quan Zou, et al. Similarity computation strategies in the microRNA-disease network: A Survey. Briefings in Functional Genomics. 2016, 15(1): 55-64.

Wei Tang, Zhijun Liao, Quan Zou*. Which statistical significance test best detects oncomiRNAs in cancer tissues? An exploratory analysis. Oncotarget. DOI: 10.18632/oncotarget.12828.

http:/ Origin Detection
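The slide on similarity between two microRNAs compares the targets of miR1 with the targets of miR2, but the extracted text does not say which measure each panel uses. As one simple possibility, and only as an assumption, the overlap of two target sets can be scored with the Jaccard index. The function name `jaccard` and the gene labels `g1` to `g4` are placeholders, not real targets.

```python
def jaccard(targets_a, targets_b):
    """Shared targets divided by all targets of either miRNA."""
    a, b = set(targets_a), set(targets_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty target sets
    return len(a & b) / len(a | b)


# Placeholder target sets for two hypothetical miRNAs.
mir1_targets = {"g1", "g2", "g3"}
mir2_targets = {"g2", "g3", "g4"}
print(jaccard(mir1_targets, mir2_targets))
```

A set-overlap score of this kind ignores how strongly each target is regulated, so it is best read as a baseline against which the strategies in the survey cited above can be compared.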

http:/ Zeng, Xuan Zhang, Quan Zou*. Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks. Briefings in Bioinformatics. 2016, 17(2): 193-203.
Yuansheng Liu, Xiangxiang Zeng, Zengyou He*, Quan Zou*. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2017, 14(4): 905-915.
Wei Tang, Zhijun Liao, Quan Zou*. Which statistical significance test best detects oncomiRNAs in cancer tissues? An exploratory analysis. Oncotarget. 2016, 7(51): 85613-85623.
Wei Tang, Shixiang Wan, Zhen Yang, Andrew E. Teschendorff*, Quan Zou*. Tumor Origin Detection with Tissue-Specific miRNA and DNA methylation Markers. Bioinformatics. DOI: 10.1093/bioinformatics/btx622.
References
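Returning to the ensemble-learning slides, where weak classifiers h1() to h7() are combined into a final strong classifier and applied to the class imbalance problem, the idea can be illustrated with a small sketch. It uses under-sampled bagging of decision stumps with a majority vote. This is a generic technique chosen for illustration and is not claimed to be the method of the cited papers; the function names, the seed, and the toy data are assumptions, and both classes are assumed to be present.

```python
import random


def train_stump(rows, labels):
    """Weak classifier: the (feature, threshold, sign) with fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for thr in sorted({r[f] for r in rows}):
            for sign in (1, -1):
                errs = sum((1 if sign * (r[f] - thr) >= 0 else 0) != y
                           for r, y in zip(rows, labels))
                if best is None or errs < best[0]:
                    best = (errs, f, thr, sign)
    return best[1:]


def stump_predict(stump, row):
    f, thr, sign = stump
    return 1 if sign * (row[f] - thr) >= 0 else 0


def train_balanced_ensemble(rows, labels, n_models=7, seed=0):
    """Train each stump on all minority samples plus an equal-size
    random draw from the majority class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    models = []
    for _ in range(n_models):
        idx = minority + rng.sample(majority, len(minority))
        models.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx]))
    return models


def predict(models, row):
    """Strong classifier: majority vote over the weak classifiers."""
    votes = sum(stump_predict(m, row) for m in models)
    return 1 if 2 * votes > len(models) else 0


# Toy imbalanced data: 40 negatives and 4 positives on one feature.
rows = [[i / 40] for i in range(40)] + [[2.0], [2.1], [2.2], [2.3]]
labels = [0] * 40 + [1] * 4
models = train_balanced_ensemble(rows, labels)
print(predict(models, [2.5]), predict(models, [0.3]))
```

Because every weak classifier sees a balanced sample, none of them can reach low error by always predicting the majority class, which is the usual failure mode on imbalanced data.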
