Logit Regression | R Data Analysis Examples


Logistic regression, also called a logit model, is used to model dichotomous outcome variables. In the logit model the log odds of the outcome is modeled as a linear combination of the predictor variables.

This page uses the following packages. Make sure that you can load them before trying to run the examples on this page. If you do not have a package installed, run: install.packages("packagename"), or if you see the version is out of date, run: update.packages().

library(aod)
library(ggplot2)

Version info: Code for this page was tested in R version 3.0.2 (2013-09-25)
On: 2013-12-16
With: knitr 1.5; ggplot2 0.9.3.1; aod 1.3

Please note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses.

Examples

Example 1. Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1): win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively, and whether or not the candidate is an incumbent.

Example 2. A researcher is interested in how variables, such as GRE (Graduate Record Exam scores), GPA (grade point average) and prestige of the undergraduate institution, affect admission into graduate school. The response variable, admit/don’t admit, is a binary variable.

Description of the data

For our data analysis below, we are going to expand on Example 2 about getting into graduate school. We have generated hypothetical data, which can be obtained from our website from within R. Note that R requires forward slashes (/), not back slashes (\), when specifying a file location, even if the file is on your hard drive.
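The data for this page are read from the authors' website. As a self-contained stand-in, here is a minimal sketch that simulates a hypothetical data set with the same variable names as Example 2 (admit, gre, gpa, rank) and fits a logit model with glm() and family = binomial. The seed, sample size, value ranges, and "true" coefficients are assumptions chosen only for illustration, so the estimates will not match the output on this page. The last lines compute the two quantities that define the logit model: the linear predictor (the log odds, a linear combination of the predictors) and the fitted probabilities (its inverse logit). The names d, fit, eta_hat, and p_hat are illustrative.

```r
# Hypothetical stand-in for the admissions data (all values simulated)
set.seed(1)                                   # arbitrary seed for reproducibility
n <- 400                                      # assumed sample size
d <- data.frame(gre  = runif(n, 200, 800),    # made-up GRE-like scores
                gpa  = runif(n, 2, 4),        # made-up GPAs
                rank = factor(sample(1:4, n, replace = TRUE)))  # rank as a factor
# Assumed "true" model on the log-odds scale; plogis() is the inverse logit
d$admit <- rbinom(n, 1, plogis(-4 + 0.002 * d$gre + 0.8 * d$gpa -
                                 0.5 * (as.integer(d$rank) - 1)))

# Because rank is a factor, glm() creates indicator terms rank2, rank3, rank4
fit <- glm(admit ~ gre + gpa + rank, data = d, family = binomial)
summary(fit)

# Log odds: a linear combination of the predictors (the "link" scale)
eta_hat <- predict(fit, type = "link")
# Fitted probabilities: the inverse logit of those log odds
p_hat <- predict(fit, type = "response")
```

Converting rank to a factor before fitting is what makes glm() estimate separate terms for ranks 2, 3, and 4 against rank 1, rather than a single slope for rank.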
mydata

##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.98998    1.13995   -3.50  0.00047 ***
## gre          0.00226    0.00109    2.07  0.03847 *
## gpa          0.80404    0.33182    2.42  0.01539 *
## rank2       -0.67544    0.31649   -2.13  0.03283 *
## rank3       -1.34020    0.34531   -3.88  0.00010 ***
## rank4       -1.55146    0.41783   -3.71  0.00020 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 499.98  on 399  degrees of freedom
## Residual deviance: 458.52  on 394  degrees of freedom
## AIC: 470.5
##
## Number of Fisher Scoring iterations: 4

In the output above, the first thing we see is the call; this is R reminding us what model we ran, what options we specified, etc.

Next we see the deviance residuals, which are a measure of model fit. This part of the output shows the distribution of the deviance residuals for individual cases used in the model. Below we discuss how to use summaries of the deviance statistic to assess model fit.

The next part of the output shows the coefficients, their standard errors, the z-statistic (sometimes called a Wald z-statistic), and the associated p-values. Both gre and gpa are statistically significant, as are the three terms for rank. The logistic regression coefficients give the change in the log odds of the outcome for a one unit increase in the predictor variable.

For every one unit change in gre, the log odds of admission (versus non-admission) increases by 0.002.

For a one unit increase in gpa, the log odds of being admitted to graduate school increases by 0.804.

The indicator variables for rank have a slightly different interpretation. For example, having attended an undergraduate institution with a rank of 2, versus an institution with a rank of 1, changes the log odds of admission by -0.675.

Below the table of coefficients are fit indices, including the null and residual deviances and the AIC. Later we show an example of how you can use these values to help assess model fit.

We can use the confint function to obtain confidence intervals for the coefficient estimates. Note that for logistic models, confint bases the confidence intervals on the profiled log-likelihood function. We can also get CIs based on just the standard errors by using the default method, confint.default.

## CIs using profiled log-likelihood
confint(mylogit)
## Waiting for profiling to be done...
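To make the coefficient interpretation concrete, here is a minimal sketch on a hypothetical simulated data set (made-up values, not the page's data) that checks two statements from the text: a coefficient is the change in the log odds for a one-unit increase in its predictor, and the intervals from confint.default() are simply estimate ± qnorm(0.975) × standard error. The profile-likelihood intervals from confint() are not recomputed here. The names d, fit, base, up, and wald_ci are illustrative.

```r
# Hypothetical simulated data (not the page's data)
set.seed(1)
n <- 400
d <- data.frame(gre  = runif(n, 200, 800),
                gpa  = runif(n, 2, 4),
                rank = factor(sample(1:4, n, replace = TRUE)))
d$admit <- rbinom(n, 1, plogis(-4 + 0.002 * d$gre + 0.8 * d$gpa -
                                 0.5 * (as.integer(d$rank) - 1)))
fit <- glm(admit ~ gre + gpa + rank, data = d, family = binomial)

# Two applicants who differ only by one unit of gpa (values are arbitrary)
base <- data.frame(gre = 600, gpa = 3, rank = factor(2, levels = levels(d$rank)))
up   <- transform(base, gpa = gpa + 1)
# Difference in predicted log odds; this equals the gpa coefficient
diff_logodds <- predict(fit, newdata = up) - predict(fit, newdata = base)

# Wald (standard-error based) 95% intervals, as confint.default() reports them
se      <- sqrt(diag(vcov(fit)))
z       <- qnorm(0.975)                        # about 1.96
wald_ci <- cbind(lower = coef(fit) - z * se, upper = coef(fit) + z * se)
wald_ci
```

Because the model is linear on the log-odds scale, the difference does not depend on the gre value or rank chosen for the two applicants.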
##                 2.5 %   97.5 %
## (Intercept) -6.271620 -1.79255
## gre          0.000138  0.00444
## gpa          0.160296  1.46414
## rank2       -1.300889 -0.05675
## rank3       -2.027671 -0.67037
## rank4       -2.400027 -0.75354

## CIs using standard errors
confint.default(mylogit)
##                2.5 %   97.5 %
## (Intercept) -6.22424 -1.75572
## gre          0.00012  0.00441
## gpa          0.15368  1.45439
## rank2       -1.29575 -0.05513
## rank3       -2.01699 -0.66342
## rank4       -2.37040 -0.73253

We can test for an overall effect of rank using the wald.test function of the aod library. The order in which the coefficients are given in the table of coefficients is the same as the order of the terms in the model. This is important because the wald.test function refers to the coefficients by their order in the model. In the call below, b supplies the coefficients, Sigma supplies the variance-covariance matrix of the coefficient estimates, and Terms tells R which terms in the model are to be tested; in this case, terms 4, 5, and 6 are the three terms for the levels of rank.

wald.test(b = coef(mylogit), Sigma = vcov(mylogit), Terms = 4:6)
## Wald test:
## ----------
##
## Chi-squared test:
## X2 = 20.9, df = 3, P(> X2) = 0.00011

The chi-squared test statistic of 20.9, with three degrees of freedom, is associated with a p-value of 0.00011, indicating that the overall effect of rank is statistically significant.

We can also test additional hypotheses about the differences in the coefficients for the different levels of rank. Below we test that the coefficient for rank=2 is equal to the coefficient for rank=3. The first line of code below creates a vector l that defines the test we want to perform. In this case, we want to test the difference (subtraction) of the terms for rank=2 and rank=3 (i.e., the 4th and 5th terms in the model). To contrast these two terms, we multiply one of them by 1, and the other by -1. The other terms in the model are not involved in the test, so they are multiplied by 0. The second line of code below uses L = l to tell R that we wish to base the test on the vector l (rather than using the Terms option as we did above).

l <- cbind(0, 0, 0, 1, -1, 0)
wald.test(b = coef(mylogit), Sigma = vcov(mylogit), L = l)
## Wald test:
## ----------
##
## Chi-squared test:
## X2 = 5.5, df = 1, P(> X2) = 0.019

The chi-squared test statistic of 5.5 with 1 degree of freedom is associated with a p-value of 0.019, indicating that the difference between the coefficient for rank=2 and the coefficient for rank=3 is statistically significant.
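Both tests rest on the Wald chi-squared statistic W = (L b)' (L V L')^(-1) (L b), where b holds the coefficient estimates, V is their variance-covariance matrix, and each row of L is one linear restriction; W is compared with a chi-squared distribution whose degrees of freedom equal the number of rows of L. Here is a minimal base-R sketch of that statistic on a hypothetical simulated data set, so it does not need aod and will not reproduce the numbers above. The helper wald_chisq() is a hypothetical name; it illustrates the standard Wald statistic and is not aod's implementation.

```r
# Hypothetical simulated data (not the page's data)
set.seed(1)
n <- 400
d <- data.frame(gre  = runif(n, 200, 800),
                gpa  = runif(n, 2, 4),
                rank = factor(sample(1:4, n, replace = TRUE)))
d$admit <- rbinom(n, 1, plogis(-4 + 0.002 * d$gre + 0.8 * d$gpa -
                                 0.5 * (as.integer(d$rank) - 1)))
fit <- glm(admit ~ gre + gpa + rank, data = d, family = binomial)

# Hypothetical helper: Wald chi-squared test of H0: L %*% b = 0
wald_chisq <- function(b, V, L) {
  Lb   <- L %*% b                                   # estimated restrictions
  stat <- drop(t(Lb) %*% solve(L %*% V %*% t(L)) %*% Lb)
  df   <- nrow(L)                                   # one df per restriction
  c(chi2 = stat, df = df, p = pchisq(stat, df, lower.tail = FALSE))
}

b <- coef(fit)
V <- vcov(fit)

# Joint test of terms 4:6 (rank2, rank3, rank4): rows 4-6 of the identity
overall <- wald_chisq(b, V, diag(length(b))[4:6, , drop = FALSE])

# Single contrast rank2 - rank3 = 0, encoded as a 1 x 6 row matrix
l <- cbind(0, 0, 0, 1, -1, 0)
contrast <- wald_chisq(b, V, l)

overall
contrast
```

For a single coefficient (one row of the identity), W reduces to the square of the Wald z-statistic printed by summary(), which is a quick way to sanity-check the helper.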
You can also exponentiate the coefficients and interpret them as odds ratios. R will do this computation for you. To get the exponentiated coefficients, you tell R that you want to exponentiate (exp), and that the object you want to exponentiate is called coefficients and it is part of mylogit (coef(mylogit)). We can use the same logic to get odds ratios and their confidence intervals, by exponentiating the confidence intervals from before. To put it all in one table, we use cbind to bind the coefficients and confidence intervals column-wise.

## odds ratios only
exp(coef(mylogit))
## (Intercept)         gre         gpa       rank2       rank3       rank4
##      0.0185      1.0023      2.2345      0.5089      0.2618      0.2119

## odds ratios and 95% CI
exp(cbind(OR = coef(mylogit), confint(mylogit)))
## Waiting for profiling to be done...
##                 OR   2.5 % 97.5 %
## (Intercept) 0.0185 0.00189  0.167
## gre         1.0023 1.00014  1.004
## gpa         2.2345 1.17386  4.324
## rank2       0.5089 0.27229  0.945
## rank3       0.2618 0.13164  0.512
## rank4       0.2119 0.09072  0.471

Now we can say that for a one unit increase in gpa, the odds of being admitted to graduate school (versus not being admitted) increase by a factor of 2.23. For more information on interpreting odds ratios, see our FAQ page "How do I interpret odds ratios in logistic regression?". Note that while R produces it, the odds ratio for the intercept is not generally interpreted.

You can also use predicted probabilities to help you understand the model. Predicted probabilities can be computed for both categorical and continuous predictor variables. In order to create predicted probabilities, we first need to create a new data frame with the values we want the independent variables to take on to create our predictions.

We will start by calculating the predicted probability of admission at each value of rank, holding gre and gpa at their means. First we create and view the data frame.

newdata1
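The odds-ratio and predicted-probability ideas can be checked numerically. The sketch below, on a hypothetical simulated data set rather than the page's data, (1) builds an odds-ratio table by exponentiating the coefficients with Wald intervals from confint.default() (swapped in for the profile-likelihood intervals to keep the sketch to base-R defaults), (2) confirms that raising gpa by one unit multiplies the predicted odds by exp of the gpa coefficient, and (3) computes predicted probabilities of admission at each level of rank with gre and gpa held at their means. The object names (or_table, two, at_means, and so on) are illustrative, not the page's own code.

```r
# Hypothetical simulated data (not the page's data)
set.seed(1)
n <- 400
d <- data.frame(gre  = runif(n, 200, 800),
                gpa  = runif(n, 2, 4),
                rank = factor(sample(1:4, n, replace = TRUE)))
d$admit <- rbinom(n, 1, plogis(-4 + 0.002 * d$gre + 0.8 * d$gpa -
                                 0.5 * (as.integer(d$rank) - 1)))
fit <- glm(admit ~ gre + gpa + rank, data = d, family = binomial)

# (1) Odds ratios with Wald 95% intervals, bound column-wise
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))

# (2) One more unit of gpa multiplies the odds by exp(coef): compare two rows
two <- data.frame(gre = 600, gpa = c(3, 4),
                  rank = factor(1, levels = levels(d$rank)))
p2  <- predict(fit, newdata = two, type = "response")
odds_ratio_gpa <- (p2[2] / (1 - p2[2])) / (p2[1] / (1 - p2[1]))

# (3) Predicted probability at each rank, gre and gpa held at their means
at_means <- data.frame(gre  = mean(d$gre),
                       gpa  = mean(d$gpa),
                       rank = factor(1:4, levels = levels(d$rank)))
at_means$p_admit <- predict(fit, newdata = at_means, type = "response")
at_means
```

Setting type = "response" asks predict() for probabilities; the default, type = "link", would return log odds instead.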


