11.2 Probit and Logit Regression
Introduction to Econometrics with R (published with bookdown)
The linear probability model has a major flaw: it assumes the conditional probability function to be linear. This does not restrict \(P(Y=1\vert X_1,\dots,X_k)\) to lie between \(0\) and \(1\). We can easily see this in our reproduction of Figure 11.1 of the book: for \(P/I \ ratio \geq 1.75\), (11.2) predicts the probability of a mortgage application denial to be bigger than \(1\). For applications with \(P/I \ ratio\) close to \(0\), the predicted probability of denial is even negative, so the model has no meaningful interpretation here.

This circumstance calls for an approach that uses a nonlinear function to model the conditional probability function of a binary dependent variable. Commonly used methods are Probit and Logit regression.

Probit Regression

In Probit regression, the cumulative standard normal distribution function \(\Phi(\cdot)\) is used to model the regression function when the dependent variable is binary, that is, we assume
\[\begin{align}
E(Y\vert X) = P(Y=1\vert X) = \Phi(\beta_0 + \beta_1 X). \tag{11.4}
\end{align}\]

\(\beta_0 + \beta_1 X\) in (11.4) plays the role of a quantile \(z\). Remember that
\[\Phi(z) = P(Z \leq z), \quad Z \sim \mathcal{N}(0,1),\]
such that the Probit coefficient \(\beta_1\) in (11.4) is the change in \(z\) associated with a one unit change in \(X\). Although the effect on \(z\) of a change in \(X\) is linear, the link between \(z\) and the dependent variable \(Y\) is nonlinear since \(\Phi\) is a nonlinear function of \(X\).

Since the conditional probability is a nonlinear function of the regressors, the coefficient on \(X\) has no simple interpretation. According to Key Concept 8.1, the expected change in the probability that \(Y=1\) due to a change in \(X\) (here, \(P/I \ ratio\)) can be computed as follows:

1. Compute the predicted probability that \(Y=1\) for the original value of \(X\).
2. Compute the predicted probability that \(Y=1\) for \(X + \Delta X\).
3. Compute the difference between both predicted probabilities.
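The three steps can be sketched in a few lines of R. This is only an illustration: the coefficient values `beta_0` and `beta_1` and the helper `probit_prob()` are hypothetical and are not estimates from the HMDA data.

```r
# Hypothetical Probit coefficients, chosen only for illustration
beta_0 <- -2.2
beta_1 <- 3

# predicted probability P(Y = 1 | X = x) = Phi(beta_0 + beta_1 * x)
probit_prob <- function(x, b0 = beta_0, b1 = beta_1) {
  z <- b0 + b1 * x   # linear index (the "quantile" z)
  pnorm(z)           # standard normal CDF evaluated at z
}

x_old   <- 0.3
delta_x <- 0.1

p_old  <- probit_prob(x_old)            # step 1: probability at X
p_new  <- probit_prob(x_old + delta_x)  # step 2: probability at X + Delta X
effect <- p_new - p_old                 # step 3: difference in probabilities

# the same Delta X starting from a different X gives a different effect,
# because Phi is nonlinear
effect_high <- probit_prob(1 + delta_x) - probit_prob(1)

c(effect = effect, effect_high = effect_high)
```

The two effects differ although \(\Delta X\) is the same in both cases, which is why the change in probability has to be evaluated at specific values of \(X\) rather than read off the coefficient.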
Of course we can generalize (11.4) to Probit regression with multiple regressors to mitigate the risk of omitted variable bias. Probit regression essentials are summarized in Key Concept 11.2.

Key Concept 11.2
Probit Model, Predicted Probabilities and Estimated Effects

Assume that \(Y\) is a binary variable. The model
\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + u\]
with
\[P(Y=1\vert X_1, X_2, \dots, X_k) = \Phi(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)\]
is the population Probit model with multiple regressors \(X_1, X_2, \dots, X_k\), and \(\Phi(\cdot)\) is the cumulative standard normal distribution function.

The predicted probability that \(Y=1\) given \(X_1, X_2, \dots, X_k\) can be calculated in two steps:

1. Compute \(z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k\).
2. Look up \(\Phi(z)\) by calling pnorm().

\(\beta_j\) is the effect on \(z\) of a one unit change in regressor \(X_j\), holding constant all other \(k-1\) regressors.

The effect on the predicted probability of a change in a regressor can be computed as in Key Concept 8.1.

In R, Probit models can be estimated using the function glm() from the package stats. Using the argument family we specify that we want to use a Probit link function.

We now estimate a simple Probit model of the probability of a mortgage denial.

# load the AER package, which provides the HMDA data as well as coeftest() and vcovHC()
library(AER)
data(HMDA)

# deny is stored as a factor; recode it to 0/1 unless this has already been done
if (is.factor(HMDA$deny)) HMDA$deny <- as.numeric(HMDA$deny) - 1

# estimate the simple probit model
denyprobit <- glm(deny ~ pirat,
                  family = binomial(link = "probit"),
                  data = HMDA)

# coefficient summary with heteroskedasticity-robust standard errors
coeftest(denyprobit, vcov. = vcovHC, type = "HC1")
#> z test of coefficients:
#>
#>             Estimate Std. Error  z value  Pr(>|z|)
#> (Intercept) -2.19415    0.18901 -11.6087 < 2.2e-16 ***
#> pirat        2.96787    0.53698   5.5269 3.259e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The estimated model is
\[\begin{align}
\widehat{P(deny \vert P/I \ ratio)} = \Phi(-\underset{(0.19)}{2.19} + \underset{(0.54)}{2.97} \, P/I \ ratio). \tag{11.5}
\end{align}\]

Just as in the linear probability model, we find that the relation between the probability of denial and the payments-to-income ratio is positive and that the corresponding coefficient is highly significant.

The following code chunk reproduces Figure 11.2 of the book.
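# plot data
plot(x = HMDA$pirat,
     y = HMDA$deny,
     main = "Probit Model of the Probability of Denial, Given P/I Ratio",
     xlab = "P/I ratio",
     ylab = "Deny",
     pch = 20,
     ylim = c(-0.4, 1.4),
     cex.main = 0.85)

# add horizontal dashed lines and text
abline(h = 1, lty = 2, col = "darkred")
abline(h = 0, lty = 2, col = "darkred")
text(2.5, 0.9, cex = 0.8, "Mortgage denied")
text(2.5, -0.1, cex = 0.8, "Mortgage approved")

# add estimated regression line
x <- seq(0, 3, 0.01)
y <- predict(denyprobit, newdata = data.frame(pirat = x), type = "response")
lines(x, y, lwd = 1.5, col = "steelblue")

We use predict() to compute the predicted change in the denial probability when the payment-to-income ratio increases from \(0.3\) to \(0.4\).

# 1. compute predictions for P/I ratio = 0.3, 0.4
predictions <- predict(denyprobit,
                       newdata = data.frame(pirat = c(0.3, 0.4)),
                       type = "response")

# 2. compute difference in probabilities
diff(predictions)
#>          2
#> 0.06081433

We find that an increase in the payment-to-income ratio from \(0.3\) to \(0.4\) is predicted to increase the probability of denial by approximately \(6.1\) percentage points.

We continue by using an augmented Probit model to estimate the effect of race on the probability of a mortgage application denial.

# estimate the Probit model with the additional regressor black
denyprobit2 <- glm(deny ~ pirat + black,
                   family = binomial(link = "probit"),
                   data = HMDA)

coeftest(denyprobit2, vcov. = vcovHC, type = "HC1")
#> z test of coefficients:
#>
#>              Estimate Std. Error  z value  Pr(>|z|)
#> (Intercept) -2.258787   0.176608 -12.7898 < 2.2e-16 ***
#> pirat        2.741779   0.497673   5.5092 3.605e-08 ***
#> blackyes     0.708155   0.083091   8.5227 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The estimated model equation is
\[\begin{align}
\widehat{P(deny \vert P/I \ ratio, black)} = \Phi(-\underset{(0.18)}{2.26} + \underset{(0.50)}{2.74} \, P/I \ ratio + \underset{(0.08)}{0.71} \, black). \tag{11.6}
\end{align}\]

All coefficients are highly significant, and the estimated coefficients on the payments-to-income ratio and on the indicator for African American descent are both positive. Again, the coefficients are difficult to interpret, but they indicate that, first, African Americans have a higher probability of denial than white applicants, holding constant the payments-to-income ratio, and second, applicants with a high payments-to-income ratio face a higher risk of being rejected.

How big is the estimated difference in denial probabilities between two hypothetical applicants who differ in race but have the same payments-to-income ratio? As before, we may use predict() to compute this difference.

# 1. compute predictions for P/I ratio = 0.3
predictions <- predict(denyprobit2,
                       newdata = data.frame(black = c("no", "yes"),
                                            pirat = c(0.3, 0.3)),
                       type = "response")

# 2. compute difference in probabilities
diff(predictions)
#>         2
#> 0.1578117

In this case, the estimated difference in denial probabilities is about \(15.8\) percentage points.

Logit Regression

Key Concept 11.3 summarizes the Logit regression function.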
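Key Concept 11.3
Logit Regression

The population Logit regression function is
\[\begin{align*}
P(Y=1\vert X_1, X_2, \dots, X_k) =& \, F(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k) \\
=& \, \frac{1}{1+e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)}}.
\end{align*}\]

The idea is similar to Probit regression except that a different CDF is used:
\[F(x) = \frac{1}{1+e^{-x}}\]
is the CDF of a standard logistically distributed random variable.

As for Probit regression, there is no simple interpretation of the model coefficients and it is best to consider predicted probabilities or differences in predicted probabilities. Here again, \(t\)-statistics and confidence intervals based on large sample normal approximations can be computed as usual.

It is fairly easy to estimate a Logit regression model using R.

# estimate the simple logit model
denylogit <- glm(deny ~ pirat,
                 family = binomial(link = "logit"),
                 data = HMDA)

coeftest(denylogit, vcov. = vcovHC, type = "HC1")
#> z test of coefficients:
#>
#>             Estimate Std. Error  z value  Pr(>|z|)
#> (Intercept) -4.02843    0.35898 -11.2218 < 2.2e-16 ***
#> pirat        5.88450    1.00015   5.8836 4.014e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The subsequent code chunk reproduces Figure 11.3 of the book.

# plot data
plot(x = HMDA$pirat,
     y = HMDA$deny,
     main = "Probit and Logit Models of the Probability of Denial, Given P/I Ratio",
     xlab = "P/I ratio",
     ylab = "Deny",
     pch = 20,
     ylim = c(-0.4, 1.4),
     cex.main = 0.9)

# add horizontal dashed lines and text
abline(h = 1, lty = 2, col = "darkred")
abline(h = 0, lty = 2, col = "darkred")
text(2.5, 0.9, cex = 0.8, "Mortgage denied")
text(2.5, -0.1, cex = 0.8, "Mortgage approved")

# add estimated regression line of Probit and Logit models
x <- seq(0, 3, 0.01)
y_probit <- predict(denyprobit, newdata = data.frame(pirat = x), type = "response")
y_logit <- predict(denylogit, newdata = data.frame(pirat = x), type = "response")
lines(x, y_probit, lwd = 1.5, col = "steelblue")
lines(x, y_logit, lwd = 1.5, col = "black", lty = 2)

# add a legend
legend("topleft",
       horiz = TRUE,
       legend = c("Probit", "Logit"),
       col = c("steelblue", "black"),
       lty = c(1, 2))

We extend the simple Logit model of mortgage denial with the additional regressor \(black\).

# estimate a Logit regression with multiple regressors
denylogit2 <- glm(deny ~ pirat + black,
                  family = binomial(link = "logit"),
                  data = HMDA)

coeftest(denylogit2, vcov. = vcovHC, type = "HC1")
#> z test of coefficients:
#>
#>             Estimate Std. Error  z value  Pr(>|z|)
#> (Intercept) -4.12556    0.34597 -11.9245 < 2.2e-16 ***
#> pirat        5.37036    0.96376   5.5723 2.514e-08 ***
#> blackyes     1.27278    0.14616   8.7081 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We obtain
\[\begin{align}
\widehat{P(deny=1 \vert P/I \ ratio, black)} = F(-\underset{(0.35)}{4.13} + \underset{(0.96)}{5.37} \, P/I \ ratio + \underset{(0.15)}{1.27} \, black). \tag{11.7}
\end{align}\]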
As for the Probit model (11.6), all model coefficients are highly significant and we obtain positive estimates for the coefficients on \(P/I \ ratio\) and \(black\). For comparison we compute the predicted probability of denial for two hypothetical applicants who differ in race and have a \(P/I \ ratio\) of \(0.3\).

# 1. compute predictions for P/I ratio = 0.3
predictions <- predict(denylogit2,
                       newdata = data.frame(black = c("no", "yes"),
                                            pirat = c(0.3, 0.3)),
                       type = "response")

predictions
#>          1          2
#> 0.07485143 0.22414592

# 2. compute difference in probabilities
diff(predictions)
#>         2
#> 0.1492945

We find that the white applicant faces a denial probability of only \(7.5\%\), while the African American applicant is rejected with a probability of \(22.4\%\), a difference of \(14.9\) percentage points.

Comparison of the Models

The Probit model and the Logit model deliver only approximations to the unknown population regression function \(E(Y\vert X)\). It is not obvious how to decide which model to use in practice. The linear probability model has the clear drawback of not being able to capture the nonlinear nature of the population regression function, and it may predict probabilities that lie outside the interval \([0,1]\). Probit and Logit models are harder to interpret but capture the nonlinearities better than the linear approach: both models produce predictions of probabilities that lie inside the interval \([0,1]\). Predictions of all three models are often close to each other. The book suggests using the method that is easiest to use in the statistical software of choice. As we have seen, it is equally easy to estimate Probit and Logit models using R. We can therefore give no general recommendation on which method to use.
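The comparison can be made concrete with a small simulation. The following is a minimal sketch on artificial data, not on the HMDA data: the sample size, the seed, the data-generating coefficients and all object names (`sim`, `grid`, `pred`) are assumptions made only for illustration. It fits the linear probability model, the Probit model and the Logit model to the same sample and tabulates their predicted probabilities on a grid of regressor values.

```r
set.seed(1)

# artificial data: the true model is assumed to be a Probit model
# with made-up coefficients -2 and 1.5
n <- 1000
x <- runif(n, min = 0, max = 3)
y <- rbinom(n, size = 1, prob = pnorm(-2 + 1.5 * x))
sim <- data.frame(y = y, x = x)

# fit the three models to the same sample
lpm    <- lm(y ~ x, data = sim)
probit <- glm(y ~ x, family = binomial(link = "probit"), data = sim)
logit  <- glm(y ~ x, family = binomial(link = "logit"), data = sim)

# predicted probabilities on a grid of regressor values
grid <- data.frame(x = c(0, 0.75, 1.5, 2.25, 3))
pred <- data.frame(
  x      = grid$x,
  lpm    = predict(lpm, newdata = grid),
  probit = predict(probit, newdata = grid, type = "response"),
  logit  = predict(logit, newdata = grid, type = "response")
)

print(round(pred, 3))
```

The Probit and Logit columns are bounded by \(0\) and \(1\) by construction, whereas the linear probability model is not restricted in this way and may leave the unit interval near the edges of the grid.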