What is Logistic regression? | IBM

文章推薦指數: 80 %
投票人數:10人

This type of statistical model (also known as logit model) is often used for classification and predictive analytics. Logistic regression estimates the ... Whatislogisticregression? Learnhowlogisticregressioncanhelpmakepredictionstoenhancedecision-making Whatislogisticregression? Thistypeofstatisticalmodel(alsoknownaslogitmodel)isoftenusedforclassificationandpredictiveanalytics.Logisticregressionestimatestheprobabilityofaneventoccurring,suchasvotedordidn’tvote,basedonagivendatasetofindependentvariables.Sincetheoutcomeisaprobability,thedependentvariableisboundedbetween0and1.Inlogisticregression,alogittransformationisappliedontheodds—thatis,theprobabilityofsuccessdividedbytheprobabilityoffailure.Thisisalsocommonlyknownasthelogodds,orthenaturallogarithmofodds,andthislogisticfunctionisrepresentedbythefollowingformulas: Logit(pi)=1/(1+exp(-pi)) ln(pi/(1-pi))=Beta_0+Beta_1*X_1+…+B_k*K_k Inthislogisticregressionequation,logit(pi)isthedependentorresponsevariableandxistheindependentvariable.Thebetaparameter,orcoefficient,inthismodeliscommonlyestimatedviamaximumlikelihoodestimation(MLE).Thismethodtestsdifferentvaluesofbetathroughmultipleiterationstooptimizeforthebestfitoflogodds.Alloftheseiterationsproducetheloglikelihoodfunction,andlogisticregressionseekstomaximizethisfunctiontofindthebestparameterestimate.Oncetheoptimalcoefficient(orcoefficientsifthereismorethanoneindependentvariable)isfound,theconditionalprobabilitiesforeachobservationcanbecalculated,logged,andsummedtogethertoyieldapredictedprobability.Forbinaryclassification,aprobabilitylessthan.5willpredict0whileaprobabilitygreaterthan0willpredict1. Afterthemodelhasbeencomputed,it’sbestpracticetoevaluatethehowwellthemodelpredictsthedependentvariable,whichiscalledgoodnessoffit.TheHosmer–Lemeshowtestisapopularmethodtoassessmodelfit. Interpretinglogisticregression   Logoddscanbedifficulttomakesenseofwithinalogisticregressiondataanalysis.Asaresult,exponentiatingthebetaestimatesiscommontotransformtheresultsintoanoddsratio(OR),easingtheinterpretationofresults.TheORrepresentstheoddsthatanoutcomewilloccurgivenaparticularevent,comparedtotheoddsoftheoutcomeoccurringintheabsenceofthatevent.IftheORisgreaterthan1,thentheeventisassociatedwithahigheroddsofgeneratingaspecificoutcome.Conversely,iftheORislessthan1,thentheeventisassociatedwithaloweroddsofthatoutcomeoccurring.Basedontheequationfromabove,theinterpretationofanoddsratiocanbedenotedasthefollowing:theoddsofasuccesschangesbyexp(cB_1)timesforeveryc-unitincreaseinx.Touseanexample,let’ssaythatweweretoestimatetheoddsofsurvivalontheTitanicgiventhatthepersonwasmale,andtheoddsratioformaleswas.0810.We’dinterprettheoddsratioastheoddsofsurvivalofmalesdecreasedbyafactorof.0810whencomparedtofemales,holdingallothervariablesconstant.     Readthewhitepaper(776KB)  Linearregressionvslogisticregression Bothlinearandlogisticregressionareamongthemostpopularmodelswithindatascience,andopen-sourcetools,likePythonandR,makethecomputationforthemquickandeasy. Linearregressionmodelsareusedtoidentifytherelationshipbetweenacontinuousdependentvariableandoneormoreindependentvariables.Whenthereisonlyoneindependentvariableandonedependentvariable,itisknownassimplelinearregression,butasthenumberofindependentvariablesincreases,itisreferredtoasmultiplelinearregression.Foreachtypeoflinearregression,itseekstoplotalineofbestfitthroughasetofdatapoints,whichistypicallycalculatedusingtheleastsquaresmethod. Similartolinearregression,logisticregressionisalsousedtoestimatetherelationshipbetweenadependentvariableandoneormoreindependentvariables,butitisusedtomakeapredictionaboutacategoricalvariableversusacontinuousone.Acategoricalvariablecanbetrueorfalse,yesorno,1or0,etcetera.Theunitofmeasurealsodiffersfromlinearregressionasitproducesaprobability,butthelogitfunctiontransformstheS-curveintostraightline.  Whilebothmodelsareusedinregressionanalysistomakepredictionsaboutfutureoutcomes,linearregressionistypicallyeasiertounderstand.Linearregressionalsodoesnotrequireaslargeofasamplesizeaslogisticregressionneedsanadequatesampletorepresentvaluesacrossalltheresponsecategories.Withoutalarger,representativesample,themodelmaynothavesufficientstatisticalpowertodetectasignificanteffect. Typesoflogisticregression Therearethreetypesoflogisticregressionmodels,whicharedefinedbasedoncategoricalresponse. Binarylogisticregression:Inthisapproach,theresponseordependentvariableisdichotomousinnature—i.e.ithasonlytwopossibleoutcomes(e.g.0or1).Somepopularexamplesofitsuseincludepredictingifane-mailisspamornotspamorifatumorismalignantornotmalignant.Withinlogisticregression,thisisthemostcommonlyusedapproach,andmoregenerally,itisoneofthemostcommonclassifiersforbinaryclassification. Multinomiallogisticregression:Inthistypeoflogisticregressionmodel,thedependentvariablehasthreeormorepossibleoutcomes;however,thesevalueshavenospecifiedorder. Forexample,moviestudioswanttopredictwhatgenreoffilmamoviegoerislikelytoseetomarketfilmsmoreeffectively.Amultinomiallogisticregressionmodelcanhelpthestudiotodeterminethestrengthofinfluenceaperson'sage,gender,anddatingstatusmayhaveonthetypeoffilmthattheyprefer.Thestudiocanthenorientanadvertisingcampaignofaspecificmovietowardagroupofpeoplelikelytogoseeit. Ordinallogisticregression:Thistypeoflogisticregressionmodelisleveragedwhentheresponsevariablehasthreeormorepossibleoutcome,butinthiscase,thesevaluesdohaveadefinedorder.ExamplesofordinalresponsesincludegradingscalesfromAtoForratingscalesfrom1to5.  Aglimpseinsidethemindofadatascientist(776KB)  Logisticregressionandmachinelearning Withinmachinelearning,logisticregressionbelongstothefamilyofsupervisedmachinelearningmodels.Itisalsoconsideredadiscriminativemodel,whichmeansthatitattemptstodistinguishbetweenclasses(orcategories).Unlikeagenerativealgorithm,suchasnaïvebayes,itcannot,asthenameimplies,generateinformation,suchasanimage,oftheclassthatitistryingtopredict(e.g.apictureofacat). Previously,wementionedhowlogisticregressionmaximizestheloglikelihoodfunctiontodeterminethebetacoefficientsofthemodel.Thischangesslightlyunderthecontextofmachinelearning.Withinmachinelearning,thenegativeloglikelihoodusedasthelossfunction,usingtheprocessofgradientdescenttofindtheglobalmaximum.Thisisjustanotherwaytoarriveatthesameestimationsdiscussedabove. Logisticregressioncanalsobepronetooverfitting,particularlywhenthereisahighnumberofpredictorvariableswithinthemodel.Regularizationistypicallyusedtopenalizeparameterslargecoefficientswhenthemodelsuffersfromhighdimensionality. Scikit-learn(linkresidesoutsideIBM)providesvaluabledocumentationtolearnmoreaboutthelogisticregressionmachinelearningmodel. Usecasesoflogisticregression Logisticregressioniscommonlyusedforpredictionand classificationproblems.Someoftheseusecasesinclude: Frauddetection:Logisticregressionmodelscanhelpteamsidentifydataanomalies,whicharepredictiveoffraud.Certainbehaviorsorcharacteristicsmayhaveahigherassociationwithfraudulentactivities,whichisparticularlyhelpfultobankingandotherfinancialinstitutionsinprotectingtheirclients.SaaS-basedcompanieshavealsostartedtoadoptthesepracticestoeliminatefakeuseraccountsfromtheirdatasetswhenconductingdataanalysisaroundbusinessperformance. Diseaseprediction:Inmedicine,thisanalyticsapproachcanbeusedtopredictthelikelihoodofdiseaseorillnessforagivenpopulation.Healthcareorganizationscansetuppreventativecareforindividualsthatshowhigherpropensityforspecificillnesses. Churnprediction:Specificbehaviorsmaybeindicativeofchurnindifferentfunctionsofanorganization.Forexample,humanresourcesandmanagementteamsmaywanttoknowiftherearehighperformerswithinthecompanywhoareatriskofleavingtheorganization;thistypeofinsightcanpromptconversationstounderstandproblemareaswithinthecompany,suchascultureorcompensation.Alternatively,thesalesorganizationmaywanttolearnwhichoftheirclientsareatriskoftakingtheirbusinesselsewhere.Thiscanpromptteamstosetuparetentionstrategytoavoidlostrevenue. Examplesoflogisticregressionsuccess Assesscreditrisk Binarylogisticregressioncanhelpbankersassesscreditrisk.Imaginethatyouarealoanofficeratabankandyouwanttoidentifycharacteristicsofpeoplewhoarelikelytodefaultonloans.Thenyouwanttousethosecharacteristicstoidentifygoodandbadcreditrisks.Youhavedataon850customers.Thefirst700arecustomerswhohavealreadyreceivedloans.Seehowyoucanusearandomsampleofthese700customerstocreatealogisticregressionmodelandclassifythe150remainingcustomersasgoodorbadrisks. Increaseprofitsinthebankingindustry FirstTennesseeBankboostedprofitabilitywithIBMSPSSsoftwareandachievedincreasesofupto600percentincross-salecampaigns.LeadersatthisregionalbankintheUSwantedtoapproachtherightcustomerswiththerightproductsandservices.Thereisnoshortageofdatatohelp,butitwasachallengetobridgethegapfromhavingdatatotakingaction.FirstTennesseeisusingpredictiveanalyticsandlogisticanalyticstechniqueswithinananalyticssolutiontogaingreaterinsightintoallofitsdata.Asaresult,decision-makingisimprovedtooptimizecustomerinteractions.(1MB) Relatedsolutions IBMSPSSAdvancedStatistics Reachmoreaccurateconclusionswhenanalyzingcomplexrelationshipsusingunivariateandmultivariatemodelingtechniques. ExploreSPSSAdvancedStatistics IBMSPSSModeler Drivereturnoninvestmentwithadrag-and-dropdatasciencetool. ExploreSPSSModeler IBMSPSSRegression Predictcategoricaloutcomesandapplyawiderangeofnonlinearregressionprocedures. ExploreSPSSRegression IBMWatsonStudio BuildandtrainAIandmachine-learningmodels,prepareandanalyzedata—allinaflexible,hybridcloudenvironment. ExploreWatsonStudio IBMWatsonDiscovery Getasmart,simplewaytomineandexploreallyourunstructureddatawithcognitiveexploration,powerfultextanalyticsandmachine-learningcapabilities. ExploreWatsonDiscovery Resources IBMSPSSStatistics14-dayfreetrial IBMSPSSStatisticsstatisticalanalysisdemo LearnmoreaboutIBMWatsonStudioLocal



請為這篇文章評分?