Embedded Systems 3 - Embedded System Design - Hardware/Software Co-Partitioning Techniques

Uploaded by: qwe****56 | Document ID: 71061671 | Uploaded: 2023-01-31 | Format: PDF | Pages: 46 | Size: 348.25 KB

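Embedded System Design
Hardware/Software Partitioning

Hardware/software partitioning
• Takes place at the start of the embedded system life cycle.
• The main task is partitioning the algorithm between hardware and software components.
• A wrong decision in a critical part of the design can cause the product to fail.

Partitioning: the hardware/software duality
• In an embedded system, hardware and software work together to carry out functions.
• How do we decide whether a function is implemented in hardware or in software?
• The decision is usually driven by speed and cost requirements.

Comparison

                 Software    Reconfigurable hardware    Dedicated hardware
  Flexibility    Highest     High                       Low
  Cost           Low         Highest                    High
  Speed          Slow        Fast                       Fastest

HW/SW cost analysis example
• Adding an additional ASIC to the embedded system:

  Annual volume    500,000 units    1,000 units
  Cost per ASIC    $6.00            $14.00
  Total cost       $3,000,000       $14,000

• Developing an ASIC and adding it to the embedded system:

  Annual volume                   500,000 units    1,000 units
  Development cost (2 eng-yrs)    $300,000         $300,000
  Amortized tool costs            $25,000          $25,000
  NRE foundry charges             $200,000         $200,000
  Total cost                      $525,000         $525,000
  Cost per device                 $5.25            $525

HW/SW cost analysis example (2)
• Software development cost scenario:

  Annual volume                    500,000 units    1,000 units
  Cost to develop (2 eng-yrs)      $300,000         $300,000
  Cost per device                  $0.60            $300
  Total cost                       $300,000         $300,000

• Total cost:

  Volume             500,000       500,000     1,000       1,000
                     Hardware      Software    Hardware    Software
  Cost to develop    $525,000      $300,000    $525,000    $300,000
  Cost per device    $6.00         $0.60       $14.00      $300
  Total cost         $3,525,000    $300,000    $539,000    $300,000

HW/SW cost analysis example (3)
• Suppose we are developing an inkjet printer. How should the product be priced?

                                       Software solution    HW solution
  Printing speed (pages per minute)    1                    4
  Selling price                        $89                  $139

Which partitioning should be used?
• Risk analysis:
    An ASIC typically has a 30% chance of needing a second design pass.
      • Cost: $200,000 plus 2 months
    Software development rarely keeps strictly to schedule.
      • Average overrun: 3 months
    The time-to-market window for consumer electronics is only 4 months.

Hardware design
• HDLs revolutionized hardware design.
    The common ones are Verilog and VHDL.
    They are C-like, with extensions for real time and hardware implementation.
    They free hardware design from transistors and routing and turn it into the design of algorithms and state machines.
• Hardware designed in an HDL is compiled into silicon.
    A fab (fabrication facility) produces the chips.
    Silicon compilation.
• A single hardware designer can now develop an IC, work that previously took a whole project team many years.
    This led to new concepts and technologies such as systems-on-a-chip (SOC).

ASIC: hardware/software duality example
• In an ASIC, the design of both the hardware and the control software can almost entirely be regarded as software design.
    HDL is compiled into manufacturing instructions that are handed to the silicon foundry.
    Software (C, C++, Java, Ada, Pascal) is compiled into embedded control code (firmware).
• Partitioning is simply an engineering choice.
• The newest ASICs, namely FPGAs, can be reconfigured dynamically.

1 - Logical AND: C is true only if A is true and B is true
2 - C language: Boolean A, B, C; C = A & B;
3 - Gate-level HW design
4 - Verilog language construct: wire A, B, C; assign C = A & B;
[Figure: gate-level schematic of inputs A and B and output C, implemented with a 7408 device (pins A1-A4, B1-B4, C1-C4, +5 VDC, GND)]

Another view of the hardware/software duality
Software implementation:
    2 - C language: Boolean A, B, C; C = A & B;
    → C compiler / assembler / linker / loader
    → Algorithm implements the AND function
Hardware implementation:
    4 - Verilog language: wire A, B, C; assign C = A & B;
    → Verilog compiler / IC design library / IC fabrication
    → Hardware created to implement AND

Another view of the hardware/software duality (2)
ASIC (SOC)
[Figure: ASIC block diagram. Labels: embedded microprocessor; commercially available "devices" (IP); user-designed elements; firmware; analog I/O; digital I/O]
Transistor- and gate-level hardware design
VHDL hardware design

Partitioning approaches
[Figure: structural partitioning versus functional partitioning. Labels: description, control unit, datapath, partitioning, synthesis]

System partitioning
System-level partitioning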

• Assigning an operation to software or to hardware determines the latency of that operation.
• Splitting operations between the processor and an ASIC adds communication delay.
• A good partition minimizes communication delay.
Assigning functions to software or hardware

System partitioning
• More software operations on a single processor increase processor utilization.
• System performance depends on the hardware/software partition through processor utilization and through utilization of the bus bandwidth between the processor and the hardware.
• Characteristic of a partition: its effect on the system comes down to the balance between the functions implemented in hardware and those implemented in software.

Partitioning trends: functional partitioning
• Performing partitioning before synthesis has many advantages:
    Order-of-magnitude reduction in logic synthesis runtime.
    Improved system performance, as smaller processes can be synthesized with a shorter clock period than one large process.
    Improved satisfaction of I/O and size capacity constraints on a package, reducing inter-package signals (compared to structural partitioning).
    Many applications consist of one or a small number of very large processes.

Functional partitioning
• The system functional specification is divided into multiple sub-specifications.
• Each sub-specification represents one functional component of the system.
• Each component is then synthesized down to transistors or compiled to machine code.

The five tasks of typical partitioning
• Model creation: converts the input to an internal model (call graph model).
• Allocation: instantiates processors of varying types (done beforehand).
• Partitioning: divides the input process among the allocated processors.
• Transformation: modifies the input process into one with a different organization but the same overall functionality, leading to a better partition.
• Estimation: provides data used to create values for design metrics. Pre-estimation and online estimation.

The goal
• Eliminate the distinction between software design and hardware design in embedded systems.
• Focus on the design of the algorithm and let the partition emerge as a natural result of the design.

Partitioning Methodology
• Three-step method:
[Figure: sequence of partitioning steps proposed by Vahid. Access graph → granularity selection → pre-clustering → N-way assignment → partitioned access graph, supported by pre-estimation and online estimation]

Step 1: Granularity Selection
• Goal: Extract procedures from the specification, which are to be assigned to processors during N-way assignment.
• Granularity is a measure of complexity.
    Fine: many procedures of low complexity.
      • Little pre-estimation is possible and online estimation is less accurate. Online estimation has to be made more complex to reach higher accuracy.
      • Can be more time consuming and may prohibit the use of assignment heuristics that need many estimations.
    Coarse: few procedures of high complexity.
      • Many behaviors are grouped together into an inseparable unit, so any possible solution that separates those behaviors is excluded.

Granularity
• Procedures are selected very carefully to balance the above effects.
• Each statement is treated as an atomic unit.
• Granularity selection problem: partition statements into procedures such that (1) procedures are as coarse-grained as possible, to enable maximum pre-estimation and the application of powerful N-way heuristics, and (2) statements are grouped into a procedure only if their separation would yield an inferior solution.

Granularity
• A straightforward heuristic: choose a specification construct to represent a procedure, i.e., each statement or block. User-defined procedures can also be used for partitioning.
• Transformations can be used to improve the above strategy:
    Procedure inlining: replaces a procedure call by the procedure's contents, making granularity coarser. The inlined procedure disappears.
    Procedure cloning: makes a copy of a procedure for exclusive use by a particular caller. Example: a procedure called from multiple places might grow excessively if inlined and might need more communication if not inlined. Cloning is a compromise.

Illustration
[Figure: an input specification with many procedures and its access graph. The process Mwt (variable level: byte) sequences through modes, which then call other procedures: LcdSend(byte), Mode1(), LcdClear(), Mode2(), LcdUpdate(byte, byte), LcdInit(), XmitLevel(byte), XmitData(bit). Access-graph edges are annotated with frequency and bits, e.g. Freq=1 bits=0, Freq=1 bits=8, Freq=48 bits=8]

Transformation contd.
• Procedure exlining: replaces a subsequence of a procedure's statements by a call to a new procedure containing only that subsequence (the opposite of inlining). This technique moves toward finer granularity.
    Redundancy exlining: replaces two or more near-identical sequences of statements by one procedure (uses a string matching method in which statements are encoded as characters).
    Distinct computation exlining: divides a large sequence of statements into several smaller procedures such that the statements within a procedure are tightly related and would not be separated in an N-way assignment solution.

Illustration of exlining
[Figure: access graph after exlining. Nodes: Mwt, LcdInit, Mode1, Mode1a, Mode2, LcdUpdate, LcdSend, Level, XmitData, XmitLevel. Edge annotations include Freq=1 bits=8 and Freq=48 bits=8]

Step 2: Pre-clustering
• Goal: Reduce the number of procedures for the subsequent N-way assignment by merging procedures whose separation among parts would never represent a good solution.
• Different from the granularity step: the procedures being clustered here may not be such that they could be exlined into a single new procedure, i.e., the calls to these procedures are non-adjacent.
• Different from N-way assignment: a cluster does not represent a processor and therefore cannot be guided by direct design metric estimates.

Pre-clustering method
• Uses hierarchical clustering:
    • Each procedure remaining after granularity selection is converted to a graph node, and an edge is created between every pair of nodes, weighted by the closeness of the two nodes.
    • The closest pair of nodes is merged into a new node. This is repeated until no pair's closeness exceeds the threshold weight.

Illustration of pre-clustering
• The two procedures LcdUpdate and LcdSend communicate heavily: 48 times per call.
• These two should never be separated. Since LcdSend appears 48 times inside LcdUpdate, inlining during granularity selection was not a reasonable option.
[Figure: access graph with nodes Mwt, LcdInit, Mode1, Mode1a, Mode2, LcdUpdate, LcdSend, Level, XmitData, XmitLevel. Edge annotations include Freq=1 bits=8 and Freq=48 bits=8]

More on pre-clustering
• Can reduce the runtime of N-way assignment by 30% or more.
• See the Ethernet example in the reference.

Step 3: N-way assignment
• Goal: Distribute the procedures among a given set of processors. The procedures are those created by granularity selection and pre-clustering.
• Constructive heuristics are used to create an initial solution and can include random distribution and clustering.
• There is an additional metric, "balanced size": the size of an implementation of both sets of nodes divided by the size of all nodes. This favors merging small sets over large ones.
• Heuristics applied: greedy, simulated annealing, hill climbing.

N-way assignments
    Greedy algorithm: linear-time heuristic that moves nodes that reduce the value of the cost function.
    Simulated annealing: randomized hill climbing that avoids local minima, at the price of a long runtime.
    Extended hill climbing: with some restrictions and a tightly coupled data structure, O(n log n) runtime.
• The cloning transformation can be applied selectively here.
• Port calling is another transformation, used for I/O balance and to ease access to shared ports. (I/O procedures that take care of send/receive etc. are used in place of external port accesses.)

Illustration of N-way assignments
[Figure: access graph with nodes Mwt, LcdInit, Mode1, Mode1a, Mode2, LcdUpdate, LcdSend, Level, XmitData, XmitLevel. Edge annotations include Freq=1 bits=8 and Freq=48 bits=8]

Other partitions of operations
• Aparty: among datapath modules, using multi-stage clustering.
• Vulcan: among packages, using iterative improvement heuristics.
• Chop: among packages, focusing on providing a suite of feasible solutions for each package that would satisfy the overall constraints.
• Multipar: among packages, simultaneous with scheduling and allocation, using linear programming.
• SpecPart: partitions procedures among packages using clustering and iterative improvement.

Limitation of the three-step approach
• The total hardware increase may be large for examples with small controllers and large datapaths.
• Problems that have a large number of small processes are much more like a scheduling problem.
    • Parallel execution on processors
• Reference: Frank Vahid, "A three-step approach to the functional partitioning of large behavioral processes".

Binary partitioning
• Goal: Map each node of a directed acyclic graph (DAG) to hardware or software (a binary choice) and determine the schedule for each node.
• DAG: The task-level description of an application is specified as an SDF (synchronous data flow) graph. The SDF graph is then translated to a DAG representing the precedence relationships among the nodes. The DAG is the input to the partitioning tool.
• Note: For a given mapping of a node (hw or sw), the node can be implemented using various algorithms and synthesis mechanisms, which differ in their area and delay outcomes. Call these "implementation bins".

Extended partitioning
Goal: Combine implementation-bin selection with binary partitioning.
• A joint problem of mapping the nodes of the DAG to hw or sw and, within each mapping, selecting a suitable implementation for better results.

Hardware/Software Mapping and Scheduling
[Figure: binary partitioning + implementation-bin selection = extended partitioning]

Assumptions
1. The precedences between the tasks are specified as a DAG G = (N, A). The throughput constraints on the SDF graph translate to a deadline constraint D, i.e., the execution time of the DAG should not exceed D clock cycles.
2. Target architecture: a programmable processor and a custom datapath. These components have constraints. Software: program and data size are limited by AS, the memory capacity. Hardware has a maximum size AH.
3. Communication cost of the interface: ah_comm = hardware area, such as glue logic for the interface; as_comm = software area, the code size for send/receive; t_comm = number of cycles to transfer the data.

Assumptions
4. Self-timed, blocking, memory-mapped interface.
5. The communication cost of sw-sw and hw-hw transfers is neglected.
6. Area and time estimates of each node are known.
7. Nodes mapped to hw do not share resources.

Binary partitioning problem
• Given a DAG, area and time estimates for the hw and sw mappings of all nodes, and the communication costs, subject to the resource capacity constraints and the deadline D, determine for each node i the hw or sw mapping (M_i) and the start time for the execution of the node (schedule t_i), such that the total area occupied by the nodes mapped to hardware is minimum.

Partitioning Algorithm (with various notations)
[Figure: the GCLP algorithm takes the graph parameters (G, D, ah_i, as_i, th_i, size_i) and the architecture constraints (AH, AS, ah_comm, as_comm, t_comm) and produces the outputs (M_i, t_i)]
th_i: software execution time estimate for node i
size_i: size of node i (number of atomic operations)
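
The binary partitioning problem stated above can be made concrete on a toy instance. The following is a minimal sketch only: it uses exhaustive search rather than the GCLP algorithm named in the slides, and it simplifies the model (software nodes run one at a time in the given topological order, hardware nodes run in parallel, every hw/sw boundary crossing costs t_comm cycles, and the interface areas ah_comm and as_comm are ignored). The function name `partition`, the dictionary keys `ah`, `a_s`, `th`, `ts`, and all numbers are hypothetical and chosen for illustration.

```python
from itertools import product


def partition(nodes, edges, D, AH, AS, t_comm=1):
    """Exhaustively search hw/sw mappings of a small DAG (illustrative only).

    nodes: dict name -> {"ah": hw area, "a_s": sw size, "th": hw time, "ts": sw time},
           listed in topological order (an assumption of this sketch).
    edges: list of (src, dst) precedence arcs.
    Returns (hw_area, mapping, start_times), or None if no mapping is feasible.
    """
    names = list(nodes)
    preds = {n: [s for s, d in edges if d == n] for n in names}
    best = None
    for choice in product(("sw", "hw"), repeat=len(names)):
        m = dict(zip(names, choice))
        hw_area = sum(nodes[n]["ah"] for n in names if m[n] == "hw")
        sw_size = sum(nodes[n]["a_s"] for n in names if m[n] == "sw")
        if hw_area > AH or sw_size > AS:
            continue  # violates a resource capacity constraint
        start, finish, cpu_free = {}, {}, 0
        for n in names:  # visit nodes in topological order
            ready = 0
            for p in preds[n]:
                # Crossing the hw/sw boundary costs t_comm cycles.
                delay = t_comm if m[p] != m[n] else 0
                ready = max(ready, finish[p] + delay)
            if m[n] == "sw":
                ready = max(ready, cpu_free)  # single processor: serialize
                finish[n] = ready + nodes[n]["ts"]
                cpu_free = finish[n]
            else:
                finish[n] = ready + nodes[n]["th"]  # hw nodes share nothing
            start[n] = ready
        if max(finish.values()) > D:
            continue  # misses the deadline
        if best is None or hw_area < best[0]:
            best = (hw_area, m, start)
    return best


# Hypothetical three-node DAG: a and b both feed c.
nodes = {
    "a": {"ah": 10, "a_s": 2, "th": 2, "ts": 8},
    "b": {"ah": 20, "a_s": 3, "th": 3, "ts": 9},
    "c": {"ah": 15, "a_s": 2, "th": 1, "ts": 4},
}
edges = [("a", "c"), ("b", "c")]
result = partition(nodes, edges, D=16, AH=100, AS=100)
print(result)
```

With these numbers an all-software mapping misses the deadline of 16 cycles, and the cheapest feasible choice moves only node a to hardware. Exhaustive search grows as 2^n in the number of nodes, so it is useful only for checking small cases by hand, which is why heuristics are needed for realistic graphs.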
