Sparks of Artificial General Intelligence: Early experiments with GPT-4

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

Microsoft Research

Abstract

Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4 [Ope23], was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

Contents

1 Introduction
    1.1 Our approach to studying GPT-4's intelligence
    1.2 Organization of our demonstration
2 Multimodal and interdisciplinary composition
    2.1 Integrative ability
    2.2 Vision
        2.2.1 Image generation beyond memorization
        2.2.2 Image generation following detailed instructions (à la Dall-E)
        2.2.3 Possible application in sketch generation
    2.3 Music
3 Coding
    3.1 From instructions to code
        3.1.1 Coding challenges
        3.1.2 Real world scenarios
    3.2 Understanding existing code
4 Mathematical abilities
    4.1 A mathematical conversation with GPT-4
        4.1.1 A first generalization of the original question
        4.1.2 A second variant of the original question
        4.1.3 Analysis of the limitations highlighted by conversation
    4.2 Performance on mathematical problem datasets
    4.3 Mathematical modeling in various domains
    4.4 Higher level mathematics
5 Interaction with the world
    5.1 Tool use
        5.1.1 Using multiple tools to solve more complex tasks
        5.1.2 Discussion
    5.2 Embodied Interaction
        5.2.1 Warmup: navigating a map
        5.2.2 Text-based games
        5.2.3 Real world problems
        5.2.4 Discussion
6 Interaction with humans
    6.1 Understanding Humans: Theory of Mind
        6.1.1 Testing specific aspects of theory of mind
        6.1.2 Testing theory of mind in realistic scenarios
        6.1.3 Discussion
    6.2 Talking to Humans: Explainability
7 Discriminative Capabilities
    7.1 PII Detection
    7.2 Misconceptions and Fact-Checking
        7.2.1 Why Are Current Metrics Insufficient?
        7.2.2 GPT-4 as a Judge
8 Limitations of autoregressive architecture highlighted by GPT-4
    8.1 Warm-up with two basic examples
    8.2 Lack of planning in arithmetic/reasoning problems
    8.3 Lack of planning in text generation
9 Societal influences
    9.1 Challenges of erroneous generations
    9.2 Misinformation and manipulation
    9.3 Bias
    9.4 Human expertise, jobs, and economics
    9.5 Constellation of influences and considerations
10 Directions and Conclusions
    10.1 Definitions of intelligence, AI, and AGI
    10.2 On the path to more general artificial intelligence
    10.3 What is actually happening?
A GPT-4 has common sense grounding
B Appendix for multimodal and interdisciplinary composition
    B.1 Further details on integrative ability results
    B.2 Further details on vision results
    B.3 Graphic novel design example
C Appendix for the Coding section
    C.1 Measuring human performance on LeetCode
    C.2 Example of GPT-4 visualizing IMDb data
    C.3 More examples on visualization
    C.4 Example for 2D HTML game development
    C.5 Example for graphical user interface programming
    C.6 Example for reverse engineering
    C.7 Testing GPT-4's ability to execute (pseudo) code
D Additional examples for mathematical reasoning
    D.1 Limitations
    D.2 Further examples
    D.3 Generating math problems with GPT-4
    D.4 Mitigating calculation errors via external code execution
E Additional Interpretability Examples
    E.1 Explanation Agent Mismatches
F Additional examples for interaction with the world
    F.1 Interact with tools
    F.2 Examples for interaction with environments

arXiv:2303.12712v1 [cs.CL] 22 Mar 2023

"Something unknown is doing we don't know what."
Sir Arthur Eddington

1 Introduction

Intelligence is a multifaceted and elusive concept that has long challenged psychologists, philosophers, and computer scientists. An attempt to capture its essence was made in 1994 by a group of 52 psychologists who signed onto a broad definition published in an editorial about the science of intelligence [Got97]. The consensus group defined intelligence as a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. This definition implies that intelligence is not limited to a specific domain or
task, but rather encompasses a broad range of cognitive skills and abilities. Building an artificial system that exhibits the kind of general intelligence captured by the 1994 consensus definition is a long-standing and ambitious goal of AI research. In early writings, the founders of the modern discipline of artificial intelligence (AI) research called out sets of aspirational goals for understanding intelligence [MMRS06]. Over decades, AI researchers have pursued principles of intelligence, including generalizable mechanisms for reasoning (e.g., [NSS59, LBFL93]) and construction of knowledge bases containing large corpora of commonsense knowledge [Len95]. However, many of the more recent successes in AI research can be described as being narrowly focused on well-defined tasks and challenges, such as playing chess or Go, which were mastered by AI systems in 1996 and 2016, respectively. In the late-1990s and into the 2000s, there were increasing calls for developing more general AI systems (e.g., [SBD+96]) and scholarship in the field has sought to identify principles that might underlie more generally intelligent systems (e.g., [Leg08, GHT15]). The phrase, “artificial general intelligence” (AGI), was popularized in the early-2000s (see [Goe14]) to emphasize the aspiration of moving from the “narrow AI”, as demonstrated in the focused, real-world applications being developed, to broader notions of intelligence, harkening back to the long-term aspirations and dreams of earlier AI research. We use AGI to refer to systems that demonstrate broad capabilities of intelligence as captured in the 1994 definition above, with the additional requirement, perhaps implicit in the work of the consensus group, that these capabilities are at or above human-level. We note however that there is no single definition of AGI that is broadly accepted, and
we discuss other definitions in the conclusion section.

The most remarkable breakthrough in AI research of the last few years has been the advancement of natural language processing achieved by large language models (LLMs). These neural network models are based on the Transformer architecture [VSP+17] and trained on massive corpora of web-text data, using at their core a self-supervised objective of predicting the next word in a partial sentence. In this paper, we report on evidence that a new LLM developed by OpenAI, which is an early and non-multimodal version of GPT-4 [Ope23], exhibits many traits of intelligence, according to the 1994 definition. Despite being purely a language model, this early version of GPT-4 demonstrates remarkable capabilities on a variety of domains and tasks, including abstraction, comprehension, vision, coding, mathematics, medicine, law, understanding of human motives and emotions, and more.
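
The self-supervised objective mentioned above, predicting the next word in a partial sentence, can be illustrated with a deliberately tiny sketch. This is not how GPT-4 or any Transformer is trained; it is a toy bigram counter, and the corpus string and the names `train_bigram` and `predict_next` are assumptions made purely for illustration. It shows only how raw text supplies its own training signal: every word serves as the label for the word before it.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    # Each adjacent pair (current word, next word) is one training example;
    # the "label" comes from the text itself, so no annotation is needed.
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, chosen only for illustration.
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigram(corpus)

print(predict_next(model, "the"))    # most frequent word after "the"
print(predict_next(model, "sat"))
print(predict_next(model, "piano"))  # a word that never occurs in the corpus
```

A large language model replaces the lookup table with a neural network that conditions on the whole preceding context rather than one word, but the prediction target is of the same kind.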

We interacted with GPT-4 during its early development by OpenAI using purely natural language queries (prompts)¹. In Figure 1.1, we display some preliminary examples of outputs from GPT-4, asking it to write a proof of infinitude of primes in the form of a poem, to draw a unicorn in TikZ (a language for creating graphics in LaTeX), to create a complex animation in Python, and to solve a high-school level mathematical problem. It easily succeeds at all these tasks, and produces outputs that are essentially indistinguishable from (or even better than) what humans could produce. We also compare GPT-4's performance to that of previous LLMs, most notably ChatGPT, which is a fine-tuned version of (an improved) GPT-3 [BMR+20]. In Figure 1.2, we display the results of asking ChatGPT for both the infinitude of primes poem and the TikZ unicorn drawing. While the system performs non-trivially on both tasks, there is no comparison with the outputs from GPT-4. These preliminary observations will repeat themselves throughout the paper, on a great variety of tasks. The combination of the generality of GPT-4's capabilities, with numerous abilities spanning a broad swath of domains, and its performance on a wide spectrum of tasks at
or beyond human-level, makes us comfortable with saying that GPT-4 is a significant step towards AGI.

¹ As GPT-4's development continued after our experiments, one should expect different responses from the final version of GPT-4. In particular, all quantitative results should be viewed as estimates of the model's potential, rather than definitive numbers. We repeat this caveat throughout the paper to clarify that the experience on the deployed model may differ. Moreover we emphasize that the version we tested was text-only for inputs, but for simplicity we refer to it as GPT-4 too.

GPT-4

Prompt: Can you write a proof that there are infinitely many primes, with every line that rhymes?

GPT-4:
Yes, I think I can,
Though it might take a clever plan.
I'll start by noting Euclid's proof,
Which shows that primes aren't just aloof.
Assume we have a finite list,
Of primes, and that none have been missed.
Multiply them all together,
And add one, just to be clever.
The result will be a number,
That has no prime factors, oh wonder!
But every number has a prime,
That divides it, it's just a matter of time.
So we've found a contradiction,
And our finite list needs eviction.
There must be infinitely many primes,
And that's the end of my rhyming lines.

Prompt: Draw a unicorn in TiKZ.

GPT-4: [Produces LaTeX compiling to following picture.]

Prompt: Can you generate a pyplot for the following data: x = [1, 3, 5, 6, 8], y1 = [2, 3, 5, 18, 1], y2 = [3, 5, 6, 8, 1], y3 = [5, 1, 2, 3, 4], y4 = [9, 7, 2, 3, 1]. I want y1, y2 to be in the same plot, but y3 is in another plot next to that plot, y4 is in
below. I want the legend of y1 to be "bob", y2 to be "alice", y3 to be "bilbo", y4 to be "allie". I want the x-axis to be labeled with "time" and y axis to be labeled with "money". I want to add a 10%-40% random error bar to each curve, through all times (including non-integers). I want smoothed curves to show the plot, and smoothed error bar. Do not use linear interpolation, use smooth interpolation! I want to also add some small zig-zag to the smoothed curve to make it look more real. I want to put a baseline as the mean of each line. I want to put a pie chart below indicating the fraction of the four people in each time step. I also want an animation to show how the fractions are changing in the pie chart in *continuous time*. Interpolate the missing fractions! I also want vertical line animation in other three plots to match the pie chart. I want the fanciest plot. Please add as many fancy things as possible.
Here is a possible code to generate a pyplot for the following data, with some comments to explain the steps:

GPT-4: [We give two snapshots from the animation resulting from the produced code]

Prompt: Suppose $g(x) = f^{-1}(x)$, $g(0) = 5$, $g(4) = 7$, $g(3) = 2$, $g(7) = 9$, $g(9) = 6$ what is $f(f(f(6)))$?

GPT-4: There are different ways to approach this problem, but one possible method is to use the fact that $g(x) = f^{-1}(x)$ implies that $f(g(x)) = x$ for any $x$ in the domain of $g$. That is, applying $f$ to both sides of the equation $g(x) = f^{-1}(x)$ gives $f(g(x)) = f(f^{-1}(x)) = x$. This means that we can undo the effect of $f$ by applying $g$, and vice versa. Using this idea, we can try to find $f(6)$ by looking for the value of $x$ such that $g(x) = 6$. From the given information, we see that $g(9) = 6$, so $f(6) = f(g(9)) = 9$. Similarly, we can find $f(9)$ by looking for the value of $x$ such that $g(x) = 9$. We see that $g(7) = 9$, so $f(9) = f(g(7)) = 7$. Finally, we can find $f(7)$ by looking for the value of $x$ such that $g(x) = 7$.
We see that $g(4) = 7$, so $f(7) = f(g(4)) = 4$. Therefore, $f(f(f(6))) = f(f(9)) = f(7) = 4$.

Figure 1.1: Preliminary examples of GPT-4's capabilities in language, vision, coding, and mathematics.

ChatGPT

Prompt: Can you write a proof that there are infinitely many primes, with every line that rhymes?

ChatGPT:
Sure! Here is a proof that there are infinitely many primes, with each line of the proof rhyming:
Consider the set of all primes.
Call this set P and let it start with just two, the first prime.
If we take all the primes in P and add one,
We get a new set of numbers that has never been done.
This set, call it S, is composed of only composite numbers.
No primes will be found among them, that's for sure.
But we know that there must be at least one prime,
Since S is infinite, and that much is clear.
If we take the smallest number in S that is not in P,
And call it p, we can add it to our set, don't you see?
But this process can be repeated indefinitely.
Thus, our set P must also be infinite, you'll agree.
So there you have it, a proof that's quite solid.
There are infinitely many primes, that much we can see.
No need to be coy, no need to be contrite.
Infinity is a big number, just like P.

Prompt: Draw a unicorn in TiKZ.

ChatGPT: [Produces LaTeX compiling to following picture.]

Figure 1.2: ChatGPT output on the first two prompts from Figure 1.1.

Our claim that GPT-4 represents progress towards AGI does not mean that it is perfect at what it does, or that it comes close to being able to do anything that a human can do (which is one of the usual definitions
of AGI; see the conclusion section for more on this), or that it has inner motivation and goals (another key aspect in some definitions of AGI). In fact, even within the restricted context of the 1994 definition of intelligence, it is not fully clear how far GPT-4 can go along some of those axes of intelligence, e.g., planning (see Section 8), and arguably it is entirely missing the part on “learn quickly and learn from experience” as the model is not continuously updating (although it can learn within a session, see Section 5 for example). Overall GPT-4 still has many limitations, and biases, which we discuss in d
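
The inverse-function question shown in Figure 1.1 lends itself to a mechanical check. The sketch below is a verification aid, not something taken from the paper: it stores the given values of g as a dictionary, inverts it to obtain f at the points where f is determined, and applies f three times starting from 6. The names `g_values`, `invert` and `apply_n` are assumptions introduced for illustration.

```python
def invert(mapping):
    """Invert a one-to-one dict: if g(x) = y then f(y) = x."""
    inverse = {}
    for x, y in mapping.items():
        if y in inverse:
            raise ValueError(f"mapping is not one-to-one at value {y}")
        inverse[y] = x
    return inverse


def apply_n(func_table, start, n):
    """Apply the tabulated function n times, starting from `start`."""
    value = start
    for _ in range(n):
        value = func_table[value]  # raises KeyError if f is not determined here
    return value


# Values of g = f^{-1} as given in the prompt.
g_values = {0: 5, 4: 7, 3: 2, 7: 9, 9: 6}

# Since g is the inverse of f, f(g(x)) = x, so f is g with keys and values swapped.
f_values = invert(g_values)

result = apply_n(f_values, 6, 3)
print(result)  # 4, via f(6) = 9, f(9) = 7, f(7) = 4
```

The intermediate values 9 and 7 are the same ones traced step by step in the GPT-4 answer, so the final value 4 agrees with it.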
