Interprocedural optimization - Wikipedia
Computer program optimization method
Interprocedural optimization (IPO) is a collection of compiler techniques used in computer programming to improve performance in programs containing many frequently used functions of small or medium length. IPO differs from other compiler optimization because it analyzes the entire program; other optimizations look at only a single function, or even a single block of code.
IPO seeks to reduce or eliminate duplicate calculations and inefficient use of memory, and to simplify iterative sequences such as loops. If a call to another routine occurs within a loop, IPO analysis may determine that it is best to inline that routine. Additionally, IPO may re-order the routines for better memory layout and locality.
IPO may also include typical compiler optimizations on a whole-program level, for example dead code elimination (DCE), which removes code that is never executed. To accomplish this, the compiler tests for branches that are never taken and removes the code in that branch. IPO also tries to ensure better use of constants. Modern compilers offer IPO as an option at compile-time. The actual IPO process may occur at any step between the human-readable source code and producing a finished executable binary program.
For languages that compile on a file-by-file basis, effective IPO across translation units (module files) requires knowledge of the "entry points" of the program so that a whole program optimization (WPO) can be run. In many cases, this is implemented as a link-time optimization (LTO) pass, because the whole program is visible to the linker.
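Why the entry points matter for whole-program dead code elimination can be illustrated with a minimal Python sketch. It is not how any particular compiler implements DCE; the call graph, the function names in it, and the helper `reachable_functions` are all hypothetical and chosen only for illustration. A function counts as live if it can be reached from an entry point by following calls, and everything else is a candidate for removal.

```python
from collections import deque


def reachable_functions(call_graph, entry_points):
    """Return the set of functions reachable from the given entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical program: function name -> functions it calls.
CALL_GRAPH = {
    "main": ["parse", "run"],
    "parse": ["read_token"],
    "run": ["step"],
    "step": [],
    "read_token": [],
    "debug_dump": ["read_token"],  # never called from main
}

live = reachable_functions(CALL_GRAPH, ["main"])
dead = sorted(set(CALL_GRAPH) - live)
print(dead)  # ['debug_dump']
```

If `debug_dump` were also declared an entry point (for example, as an exported symbol), nothing in this graph could be removed, which is why the set of entry points has to be known before the analysis can be trusted.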
Contents
1 Analysis
2 WPO and LTO
3 Example
3.1 In general
4 History
5 Flags and implementation
5.1 Unix-like
5.1.1 Non-LTO options
5.2 Other
6 See also
7 References
8 External links
Analysis
The objective of any optimization for speed is to have the program run as swiftly as possible; the problem is that it is not possible for a compiler to correctly analyze a program and determine what it will do, much less what the programmer intended for it to do. By contrast, human programmers start at the other end with a purpose, and attempt to produce a program that will achieve it, preferably without expending a lot of thought in the process.
For various reasons, including readability, programs are frequently broken up into a number of procedures, which handle a few general cases. However, the generality of each procedure may result in wasted effort in specific usages. Interprocedural optimization represents an attempt at reducing this waste.
Suppose there is a procedure that evaluates F(x), and the code requests the result of F(6) and then later, F(6) again. This second evaluation is almost certainly unnecessary: the result could have instead been saved and referred to later, assuming that F is a pure function. This simple optimization is foiled the moment that the implementation of F(x) becomes impure; that is, its execution involves reference to parameters other than the explicit argument 6 that have been changed between the invocations, or side effects such as printing some message to a log, counting the number of evaluations, accumulating the CPU time consumed, preparing internal tables so that subsequent invocations for related parameters will be facilitated, and so forth. Losing these side effects by skipping the second evaluation may be acceptable, or it may not.
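The trade-off can be made concrete with a small Python sketch. The function body (`x * x + 1`), the names `f_impure` and `f_cached`, and the evaluation log are all assumptions made for illustration; the article does not define F. Saving the result of the first call means the second call never runs, so its side effect (here, an entry in the log) is lost.

```python
from functools import lru_cache

call_log = []  # side effect: records every real evaluation


def f_impure(x):
    """Hypothetical F: computes a value and logs that it was evaluated."""
    call_log.append(x)
    return x * x + 1


@lru_cache(maxsize=None)
def f_cached(x):
    """F with its result saved, as an optimizer might do for a pure function."""
    return f_impure(x)


first = f_cached(6)
second = f_cached(6)  # served from the cache; f_impure is not called again
print(first, second, len(call_log))  # 37 37 1
```

Both calls return the same value, but the log shows only one evaluation. Whether that missing entry matters depends on what the side effect was for.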
More generally, aside from optimization, the second reason to use procedures is to avoid duplication of code that would produce the same results, or almost the same results, each time the procedure is performed. A general approach to optimization would therefore be to reverse this: some or all invocations of a certain procedure are replaced by the respective code, with the parameters appropriately substituted. The compiler will then try to optimize the result.
WPO and LTO
Whole program optimization (WPO) is the compiler optimization of a program using information about all the modules in the program. Normally, optimizations are performed on a per module, "compiland", basis; but this approach, while easier to write and test and less demanding of resources during the compilation itself, does not allow certainty about the safety of a number of optimizations such as aggressive inlining and thus cannot perform them even if they would actually turn out to be efficiency gains that do not change the semantics of the emitted object code.
Link-time optimization (LTO) is a type of program optimization performed by a compiler to a program at link time. Link time optimization is relevant in programming languages that compile programs on a file-by-file basis, and then link those files together (such as C and Fortran), rather than all at once (such as Java's just-in-time compilation (JIT)).
Once all files have been compiled separately into object files, traditionally, a compiler links (merges) the object files into a single file, the executable. However, in LTO as implemented by the GNU Compiler Collection (GCC) or LLVM, the compiler is able to dump its intermediate representation (GIMPLE bytecode or LLVM bitcode) to disk so that all the different compilation units that will go to make up a single executable can be optimized as a single module when the link finally happens. This expands the scope of interprocedural optimizations to encompass the whole program (or, rather, everything that is visible at link time). With link-time optimization, the compiler can apply various forms of interprocedural optimization to the whole program, allowing for deeper analysis, more optimization, and ultimately better program performance.
In practice, LTO does not always optimize the entire program: library functions, especially dynamically linked shared objects, are intentionally kept out to avoid excessive duplication and to allow for updating. Static linking does naturally lend itself to LTO, but it only works with library archives that contain IR objects as opposed to machine-code-only object files.[1] Due to performance concerns, not even the entire unit is always directly used; a program could be partitioned in a divide-and-conquer style LTO such as GCC's WHOPR.[2] And of course, when the program being built is itself a library, the optimization would keep every externally-available (exported) symbol, without trying too hard at removing them as a part of DCE.[1]
A much more limited form of WPO is still possible without LTO, as exemplified by GCC's -fwhole-program switch. This mode makes GCC assume that the module being compiled contains the entry point (usually main()) of the entire program, so that every other function in it is not externally used and can be safely optimized away. Since it applies to only one module, it cannot truly encompass the whole program. (It can be combined with LTO in the one-big-module sense, useful when the linker is not communicating back to GCC about what entry points or symbols are being used externally.)[1]
Example
 Program example;
  integer b;              {A variable "global" to the procedure Silly.}
  Procedure Silly(a,x)
   if x < 0 then a:=x + b else a:=-6;
  End Silly;              {Reference to b, not a parameter, makes Silly "impure" in general.}
  integer a,x;            {These variables are visible to Silly only if parameters.}
  x:=7; b:=5;
  Silly(a,x); write(x);
  Silly(x,a); write(x);
  Silly(b,b); write(b);
 End example;
If the parameters to Silly are passed by value, the actions of the procedure have no effect on the original variables, and since Silly does nothing to its environment (read from a file, write to a file, modify global variables such as b, etc.) its code plus all invocations may be optimized away entirely, leaving the value of a undefined (which doesn't matter) so that just the print statements remain, and those print constant values.
If instead the parameters are passed by reference, then action on them within Silly does indeed affect the originals. This is usually done by passing the machine address of the parameters to the procedure so that the procedure's adjustments are to the original storage area.
Thus in the case of call by reference, procedure Silly has an effect. Suppose that its invocations are expanded in place, with parameters identified by address: the code amounts to
 x:=7; b:=5;
 if x < 0 then a:=x + b else a:=-6; write(x);   {a is changed.}
 if a < 0 then x:=a + b else x:=-6; write(x);   {Because the parameters are swapped.}
 if b < 0 then b:=b + b else b:=-6; write(b);   {Two versions of variable b in Silly, plus the global usage.}
The compiler could then in this rather small example follow the constants along the logic (such as it is) and find that the predicates of the if-statements are constant and so...
 x:=7; b:=5;
 a:=-6; write(7);            {b is not referenced, so this usage remains "pure".}
 x:=-1; write(-1);           {b is referenced...}
 b:=-6; write(-6);           {b is modified via its parameter manifestation.}
And since the assignments to a, b and x deliver nothing to the outside world - they do not appear in output statements, nor as input to subsequent calculations (whose results in turn do lead to output, else they also are needless) - there is no point in this code either, and so the result is
write(7);
write(-1);
write(-6);
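The by-reference result can be cross-checked with a small Python simulation. This is only a sketch: Python has no by-reference parameters, so a hypothetical `Ref` class stands in for a variable's storage location, and `run_by_reference` and the list of written values are names invented for illustration.

```python
class Ref:
    """A mutable cell standing in for a variable's storage location."""

    def __init__(self, value=None):
        self.value = value


def run_by_reference():
    """Run the example with by-reference parameters; return what is written."""
    b = Ref(5)            # the "global" b
    a, x = Ref(), Ref(7)  # a starts out undefined, as in the original

    def silly(a, x):
        # a and x are storage cells; b is reached directly, not as a parameter
        if x.value < 0:
            a.value = x.value + b.value
        else:
            a.value = -6

    written = []
    silly(a, x)
    written.append(x.value)
    silly(x, a)
    written.append(x.value)
    silly(b, b)
    written.append(b.value)
    return written


print(run_by_reference())  # [7, -1, -6]
```

The simulation writes the same three values as the fully reduced code, which is the property an optimizer must preserve when it replaces the calls with constants.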
A variant method for passing parameters that appear to be "by reference" is copy-in, copy-out whereby the procedure works on a local copy of the parameters whose values are copied back to the originals on exit from the procedure. If the procedure has access to the same parameter but in different ways as in invocations such as Silly(a,a) or Silly(a,b), discrepancies can arise. So, if the parameters were passed by copy-in, copy-out in left-to-right order then Silly(b,b) would expand into
 p1:=b; p2:=b;                               {Copy in. Local variables p1 and p2 are equal.}
 if p2 < 0 then p1:=p2 + b else p1:=-6;      {Thus p1 may no longer equal p2.}
 b:=p1; b:=p2;                               {Copy out. In left-to-right order, the value from p1 is overwritten.}
And in this case, copying the value of p1 (which has been changed) to b is pointless, because it is immediately overwritten by the value of p2, which value has not been modified within the procedure from its original value of b, and so the third statement becomes
 write(5);          {Not -6}
Such differences in behavior are likely to cause puzzlement, exacerbated by questions as to the order in which the parameters are copied: will it be left to right on exit as well as entry? These details are probably not carefully explained in the compiler manual, and if they are, they will likely be passed over as being not relevant to the immediate task and long forgotten by the time a problem arises. If (as is likely) temporary values are provided via a stack storage scheme, then it is likely that the copy-back process will be in the reverse order to the copy-in, which in this example would mean that p1 would be the last value returned to b instead.
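The dependence on copy-back order can be shown with a short Python sketch of Silly(b,b) under copy-in, copy-out. The function name `silly_copy_in_out` and its `copy_back_order` parameter are hypothetical, introduced only to make the two orders comparable; no real compiler exposes such a switch.

```python
def silly_copy_in_out(b, copy_back_order):
    """Simulate Silly(b, b) with copy-in, copy-out; return the final value of b."""
    p1, p2 = b, b  # copy in: both locals start equal to b
    if p2 < 0:
        p1 = p2 + b  # b itself is unchanged until the copy-out
    else:
        p1 = -6
    copies = {"p1": p1, "p2": p2}
    for name in copy_back_order:  # copy out: the last copy written wins
        b = copies[name]
    return b


print(silly_copy_in_out(5, ("p1", "p2")))  # 5: left-to-right copy-out
print(silly_copy_in_out(5, ("p2", "p1")))  # -6: reverse (stack-like) copy-out
```

The same source text therefore writes 5 or -6 depending only on the order in which the copies are written back.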
The process of expanding a procedure in-line should not be regarded as a variant of textual replacement (as in macro expansions) because syntax errors may arise as when parameters are modified and the particular invocation uses constants as parameters. Because it is important to be sure that any constants supplied as parameters will not have their value changed (constants can be held in memory just as variables are) lest subsequent usages of that constant (made via reference to its memory location) go awry, a common technique is for the compiler to generate code copying the constant's value into a temporary variable whose address is passed to the procedure, and if its value is modified, no matter; it is never copied back to the location of the constant.
Put another way, a carefully written test program can report on whether parameters are passed by value or reference, and if used, what sort of copy-in and copy-out scheme. However, variation is endless: simple parameters might be passed by copy whereas large aggregates such as arrays might be passed by reference; simple constants such as zero might be generated by special machine codes (such as Clear, or LoadZ) while more complex constants might be stored in memory tagged as read-only with any attempt at modifying it resulting in immediate program termination, etc.
In general
This example is extremely simple, although complications are already apparent. More likely it will be a case of many procedures, having a variety of deducible or programmer-declared properties that may enable the compiler's optimizations to find some advantage. Any parameter to a procedure might be read only, be written to, be both read and written to, or be ignored altogether, giving rise to opportunities such as constants not needing protection via temporary variables, but what happens in any given invocation may well depend on a complex web of considerations. Other procedures, especially function-like procedures, will have certain behaviours that in specific invocations may enable some work to be avoided: for instance, the Gamma function, if invoked with an integer parameter, could be converted to a calculation involving integer factorials.
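The Gamma function case rests on the identity Gamma(n) = (n-1)! for positive integers n. A minimal Python sketch of such a specialization follows; `gamma_specialized` is a hypothetical name, and the dispatch is done at run time here only to show the idea, whereas a compiler would make the choice at the call site when the argument's type is known.

```python
import math


def gamma_specialized(x):
    """Gamma(x), using an integer factorial when x is a positive integer."""
    if isinstance(x, int) and x > 0:
        return math.factorial(x - 1)  # Gamma(n) = (n-1)!
    return math.gamma(x)  # general floating-point evaluation


print(gamma_specialized(5))  # 24
```

For integer arguments the result is an exact integer rather than a floating-point approximation, which is one kind of work-saving the paragraph above alludes to.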
Some computer languages enable (or even require) assertions as to the usage of parameters, and might further offer the opportunity to declare that variables have their values restricted to some set (for instance, 6