What is Logistic regression? | IBM
文章推薦指數: 80 %
This type of statistical model (also known as logit model) is often used for classification and predictive analytics. Logistic regression estimates the ... Whatislogisticregression? Learnhowlogisticregressioncanhelpmakepredictionstoenhancedecision-making Whatislogisticregression? Thistypeofstatisticalmodel(alsoknownaslogitmodel)isoftenusedforclassificationandpredictiveanalytics.Logisticregressionestimatestheprobabilityofaneventoccurring,suchasvotedordidn’tvote,basedonagivendatasetofindependentvariables.Sincetheoutcomeisaprobability,thedependentvariableisboundedbetween0and1.Inlogisticregression,alogittransformationisappliedontheodds—thatis,theprobabilityofsuccessdividedbytheprobabilityoffailure.Thisisalsocommonlyknownasthelogodds,orthenaturallogarithmofodds,andthislogisticfunctionisrepresentedbythefollowingformulas: Logit(pi)=1/(1+exp(-pi)) ln(pi/(1-pi))=Beta_0+Beta_1*X_1+…+B_k*K_k Inthislogisticregressionequation,logit(pi)isthedependentorresponsevariableandxistheindependentvariable.Thebetaparameter,orcoefficient,inthismodeliscommonlyestimatedviamaximumlikelihoodestimation(MLE).Thismethodtestsdifferentvaluesofbetathroughmultipleiterationstooptimizeforthebestfitoflogodds.Alloftheseiterationsproducetheloglikelihoodfunction,andlogisticregressionseekstomaximizethisfunctiontofindthebestparameterestimate.Oncetheoptimalcoefficient(orcoefficientsifthereismorethanoneindependentvariable)isfound,theconditionalprobabilitiesforeachobservationcanbecalculated,logged,andsummedtogethertoyieldapredictedprobability.Forbinaryclassification,aprobabilitylessthan.5willpredict0whileaprobabilitygreaterthan0willpredict1. Afterthemodelhasbeencomputed,it’sbestpracticetoevaluatethehowwellthemodelpredictsthedependentvariable,whichiscalledgoodnessoffit.TheHosmer–Lemeshowtestisapopularmethodtoassessmodelfit. Interpretinglogisticregression Logoddscanbedifficulttomakesenseofwithinalogisticregressiondataanalysis.Asaresult,exponentiatingthebetaestimatesiscommontotransformtheresultsintoanoddsratio(OR),easingtheinterpretationofresults.TheORrepresentstheoddsthatanoutcomewilloccurgivenaparticularevent,comparedtotheoddsoftheoutcomeoccurringintheabsenceofthatevent.IftheORisgreaterthan1,thentheeventisassociatedwithahigheroddsofgeneratingaspecificoutcome.Conversely,iftheORislessthan1,thentheeventisassociatedwithaloweroddsofthatoutcomeoccurring.Basedontheequationfromabove,theinterpretationofanoddsratiocanbedenotedasthefollowing:theoddsofasuccesschangesbyexp(cB_1)timesforeveryc-unitincreaseinx.Touseanexample,let’ssaythatweweretoestimatetheoddsofsurvivalontheTitanicgiventhatthepersonwasmale,andtheoddsratioformaleswas.0810.We’dinterprettheoddsratioastheoddsofsurvivalofmalesdecreasedbyafactorof.0810whencomparedtofemales,holdingallothervariablesconstant. Readthewhitepaper(776KB) Linearregressionvslogisticregression Bothlinearandlogisticregressionareamongthemostpopularmodelswithindatascience,andopen-sourcetools,likePythonandR,makethecomputationforthemquickandeasy. Linearregressionmodelsareusedtoidentifytherelationshipbetweenacontinuousdependentvariableandoneormoreindependentvariables.Whenthereisonlyoneindependentvariableandonedependentvariable,itisknownassimplelinearregression,butasthenumberofindependentvariablesincreases,itisreferredtoasmultiplelinearregression.Foreachtypeoflinearregression,itseekstoplotalineofbestfitthroughasetofdatapoints,whichistypicallycalculatedusingtheleastsquaresmethod. Similartolinearregression,logisticregressionisalsousedtoestimatetherelationshipbetweenadependentvariableandoneormoreindependentvariables,butitisusedtomakeapredictionaboutacategoricalvariableversusacontinuousone.Acategoricalvariablecanbetrueorfalse,yesorno,1or0,etcetera.Theunitofmeasurealsodiffersfromlinearregressionasitproducesaprobability,butthelogitfunctiontransformstheS-curveintostraightline. Whilebothmodelsareusedinregressionanalysistomakepredictionsaboutfutureoutcomes,linearregressionistypicallyeasiertounderstand.Linearregressionalsodoesnotrequireaslargeofasamplesizeaslogisticregressionneedsanadequatesampletorepresentvaluesacrossalltheresponsecategories.Withoutalarger,representativesample,themodelmaynothavesufficientstatisticalpowertodetectasignificanteffect. Typesoflogisticregression Therearethreetypesoflogisticregressionmodels,whicharedefinedbasedoncategoricalresponse. Binarylogisticregression:Inthisapproach,theresponseordependentvariableisdichotomousinnature—i.e.ithasonlytwopossibleoutcomes(e.g.0or1).Somepopularexamplesofitsuseincludepredictingifane-mailisspamornotspamorifatumorismalignantornotmalignant.Withinlogisticregression,thisisthemostcommonlyusedapproach,andmoregenerally,itisoneofthemostcommonclassifiersforbinaryclassification. Multinomiallogisticregression:Inthistypeoflogisticregressionmodel,thedependentvariablehasthreeormorepossibleoutcomes;however,thesevalueshavenospecifiedorder. Forexample,moviestudioswanttopredictwhatgenreoffilmamoviegoerislikelytoseetomarketfilmsmoreeffectively.Amultinomiallogisticregressionmodelcanhelpthestudiotodeterminethestrengthofinfluenceaperson'sage,gender,anddatingstatusmayhaveonthetypeoffilmthattheyprefer.Thestudiocanthenorientanadvertisingcampaignofaspecificmovietowardagroupofpeoplelikelytogoseeit. Ordinallogisticregression:Thistypeoflogisticregressionmodelisleveragedwhentheresponsevariablehasthreeormorepossibleoutcome,butinthiscase,thesevaluesdohaveadefinedorder.ExamplesofordinalresponsesincludegradingscalesfromAtoForratingscalesfrom1to5. Aglimpseinsidethemindofadatascientist(776KB) Logisticregressionandmachinelearning Withinmachinelearning,logisticregressionbelongstothefamilyofsupervisedmachinelearningmodels.Itisalsoconsideredadiscriminativemodel,whichmeansthatitattemptstodistinguishbetweenclasses(orcategories).Unlikeagenerativealgorithm,suchasnaïvebayes,itcannot,asthenameimplies,generateinformation,suchasanimage,oftheclassthatitistryingtopredict(e.g.apictureofacat). Previously,wementionedhowlogisticregressionmaximizestheloglikelihoodfunctiontodeterminethebetacoefficientsofthemodel.Thischangesslightlyunderthecontextofmachinelearning.Withinmachinelearning,thenegativeloglikelihoodusedasthelossfunction,usingtheprocessofgradientdescenttofindtheglobalmaximum.Thisisjustanotherwaytoarriveatthesameestimationsdiscussedabove. Logisticregressioncanalsobepronetooverfitting,particularlywhenthereisahighnumberofpredictorvariableswithinthemodel.Regularizationistypicallyusedtopenalizeparameterslargecoefficientswhenthemodelsuffersfromhighdimensionality. Scikit-learn(linkresidesoutsideIBM)providesvaluabledocumentationtolearnmoreaboutthelogisticregressionmachinelearningmodel. Usecasesoflogisticregression Logisticregressioniscommonlyusedforpredictionand classificationproblems.Someoftheseusecasesinclude: Frauddetection:Logisticregressionmodelscanhelpteamsidentifydataanomalies,whicharepredictiveoffraud.Certainbehaviorsorcharacteristicsmayhaveahigherassociationwithfraudulentactivities,whichisparticularlyhelpfultobankingandotherfinancialinstitutionsinprotectingtheirclients.SaaS-basedcompanieshavealsostartedtoadoptthesepracticestoeliminatefakeuseraccountsfromtheirdatasetswhenconductingdataanalysisaroundbusinessperformance. Diseaseprediction:Inmedicine,thisanalyticsapproachcanbeusedtopredictthelikelihoodofdiseaseorillnessforagivenpopulation.Healthcareorganizationscansetuppreventativecareforindividualsthatshowhigherpropensityforspecificillnesses. Churnprediction:Specificbehaviorsmaybeindicativeofchurnindifferentfunctionsofanorganization.Forexample,humanresourcesandmanagementteamsmaywanttoknowiftherearehighperformerswithinthecompanywhoareatriskofleavingtheorganization;thistypeofinsightcanpromptconversationstounderstandproblemareaswithinthecompany,suchascultureorcompensation.Alternatively,thesalesorganizationmaywanttolearnwhichoftheirclientsareatriskoftakingtheirbusinesselsewhere.Thiscanpromptteamstosetuparetentionstrategytoavoidlostrevenue. Examplesoflogisticregressionsuccess Assesscreditrisk Binarylogisticregressioncanhelpbankersassesscreditrisk.Imaginethatyouarealoanofficeratabankandyouwanttoidentifycharacteristicsofpeoplewhoarelikelytodefaultonloans.Thenyouwanttousethosecharacteristicstoidentifygoodandbadcreditrisks.Youhavedataon850customers.Thefirst700arecustomerswhohavealreadyreceivedloans.Seehowyoucanusearandomsampleofthese700customerstocreatealogisticregressionmodelandclassifythe150remainingcustomersasgoodorbadrisks. Increaseprofitsinthebankingindustry FirstTennesseeBankboostedprofitabilitywithIBMSPSSsoftwareandachievedincreasesofupto600percentincross-salecampaigns.LeadersatthisregionalbankintheUSwantedtoapproachtherightcustomerswiththerightproductsandservices.Thereisnoshortageofdatatohelp,butitwasachallengetobridgethegapfromhavingdatatotakingaction.FirstTennesseeisusingpredictiveanalyticsandlogisticanalyticstechniqueswithinananalyticssolutiontogaingreaterinsightintoallofitsdata.Asaresult,decision-makingisimprovedtooptimizecustomerinteractions.(1MB) Relatedsolutions IBMSPSSAdvancedStatistics Reachmoreaccurateconclusionswhenanalyzingcomplexrelationshipsusingunivariateandmultivariatemodelingtechniques. ExploreSPSSAdvancedStatistics IBMSPSSModeler Drivereturnoninvestmentwithadrag-and-dropdatasciencetool. ExploreSPSSModeler IBMSPSSRegression Predictcategoricaloutcomesandapplyawiderangeofnonlinearregressionprocedures. ExploreSPSSRegression IBMWatsonStudio BuildandtrainAIandmachine-learningmodels,prepareandanalyzedata—allinaflexible,hybridcloudenvironment. ExploreWatsonStudio IBMWatsonDiscovery Getasmart,simplewaytomineandexploreallyourunstructureddatawithcognitiveexploration,powerfultextanalyticsandmachine-learningcapabilities. ExploreWatsonDiscovery Resources IBMSPSSStatistics14-dayfreetrial IBMSPSSStatisticsstatisticalanalysisdemo LearnmoreaboutIBMWatsonStudioLocal
延伸文章資訊
- 1What is Logistic regression? | IBM
This type of statistical model (also known as logit model) is often used for classification and p...
- 2羅吉斯迴歸分析(Logistic regression, logit model) - 永析統計
羅吉斯迴歸分析(Logistic regression, logit model)-統計說明與SPSS操作 · 1. 模式係數的Omnibus測試:相當於線性迴歸裡的ANOVA-F檢定,探討羅吉...
- 3Logistic regression - Wikipedia
- 4Logit Model - an overview | ScienceDirect Topics
Logistic regression is a well-known procedure that can be used for classification. This is a vari...
- 5Logistic Regression 介紹
有兩個,則為二元的邏輯式迴歸(Binary logistic regression),若是類別超過三. 個以上則為Polytomous logistic regression,model 相對複...