IQVIA Real-World Insights

How Machines Learn in Healthcare

Machine learning is transforming every facet of healthcare, as computer systems are being taught how to use Big Data to derive insights and support decision making. In this respect, teaching a computer, no less than teaching a child, is to “shape
the future.” Educating a computer is a surprisingly labor-intensive process, requiring massive amounts of data, a nuanced understanding of every data element from every data source, years of trial and error, and extensive domain expertise. The key differentiator in machine learning is not the specific technology and science applied; it is in the volume and quality of the instructional material and the knowledge of the instructor.

Advances in Data Science
Andrei Stoica, PhD
Vice President, Systems Development, IQVIA

Today, we're surrounded by computing systems that can learn from experience and handle new
situations. Behind our internet searches, spam filters, online music curation, and virtual assistants, computers are studying away, becoming “smarter” with each interaction we have with them.

Machine learning is destined to accelerate the pace of healthcare transformation, as it allows us to extract meaning
from otherwise insurmountable volumes of data. It is proving valuable in supporting research and development, identifying populations at risk, improving diagnostics, providing clinical decision support and optimizing sales and marketing.

A little-understood fact is that a machine learns in much the same
way as humans. The ingredients are a scientific model (from simple rules to complex algorithms), information, and a knowledgeable teacher (the domain expert). When these elements come together in the right way, machines are able to perform high-volume automation, recognize patterns, spot anomalies, provide linkages, offer recommendations, run simulations and make predictions about future outcomes with great reliability.

IQVIA was a Big Data company long before the term “Big Data” was coined. Thirty years ago, we had more data than we could effectively move over the internet and had to use elaborate “sneakernets” to transport data. Today, we are in a similar position with machine learning. The term has been over-hyped by vendors that have limited experience with healthcare data.

This article is the first in a series examining what it takes to do machine learning in healthcare, based on the knowledge of experts who have been applying these models and algorithms for decades. We define the entire process from data processing to analytics and the intrinsic interdependencies between the various stages underpinning the quality of results. This article provides additional details about data processing as the foundation for analytics.

Good Data Hygiene

Sometimes, answering healthcare business questions calls for data of great breadth. Other times, for data of great depth. But in most cases, and especially for business-critical decisions, the data must be clean. That's why most data mining systems that claim they work
on “dirty data” in fact have an intensive data-cleansing step prior to data processing.

It's worth reviewing the three basic steps involved in data cleansing and processing: bridging, coding and linking. These steps not only prepare the data, but they are the foundation for quality machine learning in the processing and analytics stages.

All healthcare records contain multiple references to entities (such as diagnoses, products, physicians, procedures, outlets and companies), and there can be thousands of attributes linked to each entity. For example, a medical encounter can have hundreds of attributes, including details on the procedures, imaging, notes, etc.

In some cases, there are standard codes by which these entities can be referenced, such as the National Drug Code (NDC), a universal product identifier for human drugs in the U.S. Where standard codes such as this exist, they must be assigned to the entity in
the data record and subsequently validated. This assignment is called bridging. If the entity does not have a standard code, a unique one must be created as a reference in a process called coding.

To prepare data for bridging and coding, simple rules are first introduced to the computer. For example, one
basic rule might be to remove all extra blank spaces in names. Then, with greater exposure to more data and situations, complex machine-learning algorithms (such as neural networks, score engines, n-grams, random forests, Bayesian networks, genetic algorithms and many others) begin reasoning about data attributes in every individual data stream. At this stage, machines can differentiate, for example, between a doctor who has just changed her name after marriage and another doctor with the same last name who has just graduated from medical school and started practicing in the same city.

Machine learning requires human, healthcare knowledge.

AccessPoint Volume 7 Issue 14

Experience shows that complex machine-learning inferences used to bridge and code data on pharmaceutical packaging and dosage, doctor addresses, distribution outlets and volumes, patients and medical procedures, and hundreds of other attributes must be highly specific, even down to the individual data stream or data supplier. Achieving this level of specificity requires deep, detailed knowledge of the field. Only someone with a history of working with a given supplier would know, for instance, that promotions for an individual pharmacy store are coded as one record while promotions at the chain-store level are coded as multiple transactions. This example is just one simple rule that IQVIA's massive machine-learning infrastructure has learned over time. Currently there are hundreds to, in some cases, thousands of rules for each of our 800,000 data suppliers around the world.

Once an entity in a record is properly bridged or coded, it can be linked to records in other data sets to allow for cross-referencing. Billions of healthcare records are generated and exchanged between different parties in the healthcare industry each year, and accurate bridging and coding must take place with every data exchange to maintain the quality and usefulness of the data. “Dirty” data can have significant implications for the decisions that pharmaceutical companies make, from research and development to commercial planning.

The privacy laws of different countries add complexity to the challenge of linking records. Where laws allow de-identified data to be collected, the de-identification algorithms have to be both secure (irreversible) and versatile enough to allow linkage, for example for longitudinal patient analytics. For de-identification, homogeneous systems are key to data security and linkage. Data de-identified under different standards are less likely to be linkable, and where they can be linked, doing so is more likely to affect the security of the data. For this reason, IQVIA has global standards for de-identification that are constantly reviewed by security experts and designed to support linkage where permitted by law. Machine learning plays a crucial role in identifying patterns that weaken the security and in cleaning the data prior to de-identification.

A Case in Point

Machine learning is changing healthcare in real time. IQVIA built a decision-support system using machine learning to help sponsors manage physician selection in clinical trials, a task fundamental to trial success. Experts in the therapeutic area defined multi-dimensional models to express all of the study protocol details. Data scientists with complementary expertise worked as part of the same team to define matching multi-dimensional models to express all physician prescribing/treatment patterns and history. Through deep learning, the system was trained on eight petabytes of clean (bridged, coded and linked) claims and electronic medical records (EMR) data. The result was a prioritized list of investigators with the highest probability of success. This resulted in a double-digit percentage decrease in non-enrolling investigators and, at the same time, a double-digit percentage increase in patient enrollment for rheumatoid arthritis studies.

Since both physician behavior and study protocols can be very complex, it is incumbent upon domain experts to understand how to model this complexity. The accuracy of the results depends on training, and that is directly influenced by the quality of the data. Machine learning depends on the availability of clean data for training and constant supervision from domain expert operators who tune the results.

The Ongoing Human Role

Even after a machine-learning system is mature and has been successfully processing billions or trillions of transactions every year, as our systems have been doing for the past 60 years, day-to-day human tasks are still critical to data operations and data mining. Domain experts must work alongside the
machine, overseeing, and sometimes correcting, its work. Healthcare is continuously evolving, and data structures are volatile; every day there are variances with unaccounted-for scenarios that the machine has not yet learned. Imagine that the machine learning does 99 percent of the work but fails to identify one new pharmaceutical product in the market. In that case, domain experts must teach the machine about that product so that it can then be processed automatically.

In the absence of domain experts who sample data and direct the machine when confidence levels are low, we find that machine learning quality degrades anywhere from 0.1 to 10 percent every month, depending on the data source. Even at the 0.1 percent level, a machine operating for a year without a domain expert will eventually produce outputs at unacceptable quality levels. For instance, IQVIA operates a machine-learning algorithm in one set of
product data that processes seven million records and learns 300,000 new product reference keys each month. About half of the new records undergo some human-assisted quality control operations.

Machine Learning + Data + Domain Expertise = Impactful Analytics

Clean, linked and attribute-rich Big Data represent the foundation for quality and impactful analytics. Every stakeholder within healthcare (pharma, payers, providers, etc.), along every dimension (countries, languages, suppliers, data type), and for every specific use case (therapy, clinical, commercial, R&D, etc.) requires its own machine-learning algorithms
and specific configurations. These can only be developed over time by constantly building upon an ever-growing knowledge base.

Analytics are divided into three categories: descriptive, predictive and prescriptive. Descriptive analytics summarize, describe, quantify and analyze the past trends of an activity. Predictive analytics use past behavior to train machine-learning algorithms to predict possible future trends. Prescriptive analytics look at different possible outcomes in the future, why they can happen, how to navigate them and the impact of possible decisions. Prescriptive analytics are used in intelligent and decision-support systems.

[Figure 1: Humans and Machines Work in Tandem from Data Processing to Advanced Analytics. Labels: All records now standard format and can be stacked; Machine mines records for advanced analytics; Machine to bridge and code records; Product records; Records checked by human teacher.]

In both predictive and prescriptive analytics, the first phase is modeling. A team of domain experts, including clinicians and data scientists, analyzes the problem and the available data and selects the machine-learning algorithm
that will have the highest success rate. This is a “human only” intensive process that requires great expertise in clinical practices, data, computer science and machine-learning algorithms, and region-specific healthcare delivery and application processes.

At this point we have a capable computer system
(a machine) with the right “instructions” to learn. If we go back to our initial analogy, we have a capable “genius baby” with no knowledge, just the capacity to learn. There are supervised and unsupervised learning methods, but they all depend on large amounts of data already categorized by human operators: “teachers,” or domain experts, who must tell a computer when it is right or wrong. With every choice that the human makes, the computer “learns” and will be able to apply that decision process in the next similar instance. This iterative process can be set so that experts review only the machine's choices that fall outside of certain confidence limits. Or, it can be set so that the machine recommends a choice for the expert's approval.

There is a basic data volume threshold that must be met for the process to work at all. If too little data are used, the system will simply not have enough information to train the model. After that threshold is met, the more data available to represent every situation with enough examples, the better the machine can be taught.

Just how much data are required depends on the situation. When patterns repeat infrequently, it may take several years' worth of data to train a computer. When data are clean and bear standard reference codes, smaller amounts are needed than when the data are unstructured.

Because data and the practice of healthcare are constantly changing, machine learning demands constant review of core algorithms and settings. Only domain expert teams with detailed knowledge of the data, science and healthcare can determine when and how to refine the model. Such permanent day-to-day interaction with the machine and regular supervision from domain experts is resource intensive, but absolutely essential for the performance of the computer systems.

Large-scale data processing using machine learning is the strong foundation required to build a meaningful data repository, train machines, and constantly refine domain knowledge expertise.

[Figure 2: Building Blocks for Machine Learning in Healthcare. Labels: Quality and volume of data; Bridging & coding; Prescriptive analytics; Predictive analytics; Automation; Descriptive analytics; Specificity of algorithms; Domain knowledge.]

With this foundation, it is now possible to perform the next generation of clinical decision support analytics, including use cases such as non-adherence, disease progression and care management, safety signal detection and evaluation, therapy dosing and response, or using the digital footprint of diagnosed patients to find new, undiagnosed patients. Machine learning models such as vectorization, natural language processing term extraction, ensemble suites of algorithms using deep learning and many others can return high-quality results for these and many other use cases in real world eviden
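The bridging and coding steps described under Good Data Hygiene can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not IQVIA's implementation: the names `normalize_name` and `ProductCoder`, the reference table, the placeholder standard code, the case-insensitive matching and the `INT-` internal code format are all hypothetical. It applies the one cleansing rule the article mentions (removing extra blank spaces in names), assigns an existing standard code where one is found (bridging), and otherwise creates a unique internal reference (coding).

```python
import re
from dataclasses import dataclass, field


def normalize_name(raw: str) -> str:
    """Simple cleansing rule: collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


@dataclass
class ProductCoder:
    # Hypothetical reference table: cleaned, lower-cased name -> standard code.
    standard_codes: dict[str, str]
    # Internal codes created for entities that have no standard code.
    internal_codes: dict[str, str] = field(default_factory=dict)

    def assign(self, raw_name: str) -> tuple[str, str]:
        """Return (code, method), where method is 'bridged' or 'coded'."""
        # Lower-casing is an assumption of this sketch, not a rule from the article.
        key = normalize_name(raw_name).lower()
        if key in self.standard_codes:
            # Bridging: a standard code exists, so attach it to the record.
            return self.standard_codes[key], "bridged"
        if key not in self.internal_codes:
            # Coding: no standard code exists, so create a unique reference.
            self.internal_codes[key] = f"INT-{len(self.internal_codes) + 1:06d}"
        return self.internal_codes[key], "coded"


if __name__ == "__main__":
    # The product name and the code below are made up for illustration.
    coder = ProductCoder({"examplemab 10 mg tablet": "00000-0000-00"})
    for name in ["  Examplemab   10 mg  tablet ", "Unknown  product", "unknown product"]:
        print(repr(name), "->", coder.assign(name))
```

In this sketch the two spellings of the unknown product receive the same internal code, which is what later allows records to be linked across data sets. A production system would replace the exact-match lookup with the supplier-specific rules and learned models the article describes, and would route low-confidence matches to a domain expert.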
