Image Recognition Guide | Fritz AI

Almost everything you need to know about how image recognition works

Introduction

Image recognition is a computer vision technique that allows machines to interpret and categorize what they “see” in images or videos. Often referred to as “image classification” or “image labeling”, this core task is a foundational component in solving many computer vision-based machine learning problems.

But how does image recognition actually work? What are the different approaches, what are its potential benefits and limitations, and how might you use it in your business?

In this guide, you’ll find answers to all of those questions and more. Whether you’re an experienced machine learning engineer considering implementation, a developer wanting to learn more, or a product manager looking to explore what’s possible with computer vision and image recognition, this guide is for you.

Here’s a look at what we’ll cover:

Part 1: Image recognition – the basics
- What is image recognition?
- Modes and types of image recognition
- Why is image recognition important?
Part 2: How does image recognition work?
- Inputs and outputs
- Basic structure
- Model architecture overview
- How image recognition works on the edge
Part 3: Use cases and applications
- Visual search
- Image organization
- Content moderation
- Accessibility
Part 4: Resources
- Getting started
- Tutorials
- Literature review
- Datasets available

Part 1: Image recognition – the basics

What is image recognition?

Image recognition is a computer vision task that works to identify and categorize various elements of images and/or videos. Image recognition models are trained to take an image as input and output one or more labels describing the image. The set of possible output labels is referred to as the target classes. Along with a predicted class, image recognition models may also output a confidence score related to how certain the model is that an image belongs to a class.

For instance, if you wanted to build an image recognition model that automatically determined whether or not a dog was in a given image, the pipeline would, broadly speaking, look like this:

- Image recognition model trained on images that have been labeled as “dog” or “not dog”
- Model input: Image or video frame
- Model output: Class name (e.g. “dog”) with a confidence score that indicates the likelihood of that image containing that class of object

Modes and types of image recognition

Image recognition is a broad and wide-ranging computer vision task that’s related to the more general problem of pattern recognition. As such, there are a number of key distinctions that need to be made when considering what solution is best for the problem you’re facing.

Broadly speaking, we can break image recognition into two separate problems: single and multiclass recognition. (Elsewhere in the literature, these two settings are usually called single-label and multi-label classification.)

In single class image recognition, models predict only one label per image. If you’re training a dog or cat recognition model, a picture with a dog and a cat will still only be assigned a single label. In cases where only two classes are involved (dog; no dog), we refer to these models as binary classifiers.

Multiclass recognition models can assign several labels to an image. An image with a cat and a dog can have one label for each. Multiclass models typically output a confidence score for each possible class, describing the probability that the image belongs to that class.

While there are a number of traditional statistical approaches to image recognition (linear classifiers, Bayesian classification, support vector machines, decision trees, etc.), this guide will focus on image recognition techniques that employ neural networks, as those have become the state-of-the-art approaches to image recognition.

Why is image recognition important?

Image recognition is one of the most foundational and widely-applicable computer vision tasks. Recognizing image patterns and extracting features is a building block of other, more complex computer vision techniques (e.g. object detection, image segmentation, etc.), but it also has numerous standalone applications that make it an essential machine learning task.

Image recognition’s broad and highly-generalizable functionality can enable a number of transformative user experiences, including but not limited to:

- Automated image organization
- User-generated content moderation
- Enhanced visual search
- Automated photo and video tagging
- Interactive marketing / creative campaigns

Of course, this isn’t an exhaustive list, but it includes some of the primary ways in which image recognition is shaping our future.

Part 2: How does image recognition work?

Now that we know a bit about what image recognition is, the distinctions between different types of image recognition, and what it can be used for, let’s explore in more depth how it actually works.

In this section, we’ll look at several deep learning-based approaches to image recognition and assess their advantages and limitations. While there are a number of traditional methods, including the ones mentioned above, for the purposes of this overview we’re going to look at the approaches that use neural networks, which have become the state-of-the-art methods for image recognition.

Popular image recognition benchmark datasets include CIFAR, ImageNet, COCO, and Open Images. Though many of these datasets are used in academic research contexts, they aren’t always representative of images found in the wild. As such, you should always be careful when generalizing models trained on them. For example, a full 3% of images within the COCO dataset contain a toilet.

Basic structure

In general, deep learning architectures suitable for image recognition are based on variations of convolutional neural networks (CNNs). For a gentle introduction to CNNs, check out this overview.

Nearly all image recognition models begin with an encoder. Encoders are made up of blocks of layers that learn statistical patterns in the pixels of images that correspond to the labels they’re attempting to predict. High performing encoder designs featuring many narrowing blocks stacked on top of each other provide the “deep” in “deep neural networks”. The specific arrangement of these blocks and the different layer types they’re constructed from will be covered in later sections.

The encoder is then typically connected to a fully connected or dense layer that outputs confidence scores for each possible label. It’s important to note here that image recognition models output a confidence score for every label and input image. In the case of single-class image recognition, we get a single prediction by choosing the label with the highest confidence score. In the case of multi-class recognition, a label is assigned only if its confidence score is over a particular threshold.

Finally, a note about accuracy. Most image recognition models are benchmarked using common accuracy metrics on common datasets. Top-1 accuracy refers to the fraction of images for which the model output class with the highest confidence score is equal to the true label of the image. Top-5 accuracy refers to the fraction of images for which the true label falls in the set of model outputs with the top 5 highest confidence scores.

Model architecture overview

Many neural network architectures exist for image recognition. Given the simplicity of the task, it’s common for new neural network architectures to be tested on image recognition problems and then applied to other areas, like object detection or image segmentation. This section will cover a few major neural network architectures developed over the years.

AlexNet

AlexNet, named after its creator, was a deep neural network that won the ImageNet classification challenge in 2012 by a huge margin. Though it wasn’t the first convolutional neural network to be used for image recognition, it’s widely credited with sparking a resurgence of interest in using deep convolutional neural networks to solve computer vision problems. The network, however, is relatively large, with over 60 million parameters and many internal connections, thanks to dense layers that make the network quite slow to run in practice.

VGGNet

Two years after AlexNet, researchers from the Visual Geometry Group (VGG) at Oxford University developed a new neural network architecture dubbed VGGNet. VGGNet has more convolution blocks than AlexNet, making it “deeper”, and it comes in 16 and 19 layer varieties, referred to as VGG16 and VGG19, respectively. The deeper network structure improved accuracy but also doubled its size and increased runtimes compared to AlexNet.

Despite the size, VGG architectures remain a popular choice for server-side computer vision models due to their usefulness in transfer learning. VGG architectures have also been found to learn hierarchical elements of images like texture and content, making them popular choices for training style transfer models.

Inception

The Inception architecture, also referred to as GoogLeNet, was developed to solve some of the performance problems with VGG networks. Though accurate, VGG networks are very large and require huge amounts of compute and memory due to their many densely connected layers. The Inception architecture solves this problem by introducing a block of layers that approximates these dense connections with more sparse, computationally-efficient calculations. Inception networks were able to achieve comparable accuracy to VGG using only one tenth the number of parameters.

ResNet

The success of AlexNet and VGGNet opened the floodgates of deep learning research. As architectures got larger and networks got deeper, however, problems started to arise during training. When networks got too deep, training could become unstable and break down completely.

ResNets, short for residual networks, solved this problem with a clever bit of architecture. Blocks of layers are split into two paths, with one undergoing more operations than the other, before both are merged back together. In this way, some paths through the network are deep while others are not, making the training process much more stable overall. The most common variant of ResNet is ResNet50, containing 50 layers, but larger variants can have over 100 layers. The residual blocks have also made their way into many other architectures that don’t explicitly bear the ResNet name.

SqueezeNet

Even the smallest network architecture discussed thus far still has millions of parameters and occupies dozens or hundreds of megabytes of space. SqueezeNet was designed to prioritize speed and size while, quite astoundingly, giving up little ground in accuracy. Despite being 50 to 500X smaller than AlexNet (depending on the level of compression), SqueezeNet achieves similar levels of accuracy as AlexNet. This feat is possible thanks to a combination of residual-like layer blocks and careful attention to the size and shape of convolutions. SqueezeNet is a great choice for anyone training a model with limited compute resources or for deployment on embedded or edge devices.

MobileNet

The MobileNet architectures were developed by Google with the explicit purpose of identifying neural networks suitable for mobile devices such as smartphones or tablets. They’re typically larger than SqueezeNet, but achieve higher accuracy.

MobileNet architectures brought two important innovations to network designs: depthwise separable convolutions and a hyperparameter known as a width multiplier. Depthwise separable convolutions are a replacement for traditional convolution layers, having fewer parameters and being more computationally efficient. The width multiplier is a parameter that controls how many parameters are used for each convolution layer. This allows for the creation of multiple networks along a tradeoff curve of size and speed versus accuracy. A continuum of models can be created with the same basic architecture so that more powerful devices can receive larger, more accurate models, while less powerful devices can use smaller, less accurate models.

Neural Architecture Search

For much of the last decade, new state-of-the-art results were accompanied by a new network architecture with its own clever name. In certain cases, it’s clear that some level of intuitive deduction can lead a person to a neural network architecture that accomplishes a specific goal. As with many tasks that rely on human intuition and experimentation, however, someone eventually asked if a machine could do it better.

Neural architecture search (NAS) uses optimization techniques to automate the process of neural network design. Given a goal (e.g. model accuracy) and constraints (network size or runtime), these methods rearrange composable blocks of layers to form new architectures never before tested. Though NAS has found new architectures that beat out their human-designed peers, the process is incredibly computationally expensive, as each new variant needs to be trained. It’s estimated that some papers released by Google would cost millions of dollars to replicate due to the compute required. For all this effort, it has been shown that random architecture search produces results that are at least competitive with NAS.

One final fact to keep in mind is that the network architectures discovered by all of these techniques typically don’t look anything like those designed by humans. For all the intuition that has gone into bespoke architectures, it doesn’t appear that there’s any universal truth in them. They simply work and, in many cases, that’s enough.

How image recognition works on the edge

If your use case requires that image recognition work in real-time, without internet connectivity, or on private data, you might be considering running your image recognition model directly on an edge device like a mobile phone or IoT board.

In those cases, you’ll need to choose specific model architectures to make sure everything runs smoothly on these lower power devices. Here are a few tips and tricks to ensure your models are ready for edge deployment:

- Prune your network to include fewer convolution blocks. Most papers use network architectures that are not constrained by compute or memory resources. This leads to networks with far more layers and parameters than are required to generate good predictions.
- Add a width multiplier to your model so you can adjust the number of parameters in your network to meet your computation and memory constraints. The number of filters in a convolution layer, for example, greatly impacts the overall size of your model. Many papers and open-source implementations will treat this number as a fixed constant, but most of these models were never intended for mobile use. Adding a parameter that multiplies the base number of filters by a constant fraction allows you to modulate the model architecture to fit the constraints of your device. For some tasks, you can create much, much smaller networks that perform just as well as large ones.
- Shrink models with quantization, but beware of accuracy drops. Quantizing model weights can save a bunch of space, often reducing the size of a model by a factor of 4 or more. However, accuracy usually suffers to some degree. Make sure you test quantized models rigorously to determine if they meet your needs.
- Input and output sizes can be smaller than you think! If you’re designing a photo organization app, it’s tempting to think that your image recognition model needs to be able to accept full resolution photos as an input. In most cases, edge devices won’t have nearly enough processing power to handle this. Instead, it’s common to train image recognition models at modest resolutions, then downscale input images at runtime.

To see just how small you can make these networks with good results, check out this post on creating a tiny image recognition model for mobile devices.

Part 3: Use cases and applications

In this section, we’ll provide an overview of real-world use cases for image recognition. We’ve mentioned several of them in previous sections, but here we’ll dive a bit deeper and explore the impact this computer vision technique can have across industries.

Specifically, we’ll examine how image recognition can be used in the following areas:

- Visual search
- Image organization
- Content moderation
- Accessibility

Visual search

Broadly speaking, visual search is the process of using real-world images to produce more reliable, accurate online searches. Visual search allows retailers to suggest items that thematically, stylistically, or otherwise relate to a given shopper’s behaviors and interests.

Using a deep learning approach to image recognition allows retailers to more efficiently understand the content and context of these images, thus allowing for the return of highly-personalized and responsive lists of related results.

These techniques are already being used by major retailers (eBay, ASOS, Neiman Marcus), tech giants (Google Lens), and social media companies (Pinterest Lens), and though these approaches are still (relatively speaking) in their infancy, the results are compelling:

- 55% of consumers say Visual Search is instrumental in developing their style and taste.
- The “Global Visual Search Market” is estimated to surpass $14.7 billion by 2023.
- When shopping online for clothing or furniture, more than 85% of respondents put more weight on visual info than text info.

Image organization

With modern smartphone camera technology, it’s become incredibly easy and fast to snap countless photos and capture high-quality videos. However, with higher volumes of content, another challenge arises: creating smarter, more efficient ways to organize that content.

With ML-powered image recognition, photos and captured video can more easily and efficiently be organized into categories that can lead to better accessibility, improved search and discovery, seamless content sharing, and more.

Google Photos already employs this functionality, helping users organize photos by places, objects within those photos, people, and more, all without requiring any manual tagging.

Many of the current applications of automated image organization (including Google Photos and Facebook) also employ facial recognition, which is a specific task within the image recognition domain.

Content moderation

Many of the most dynamic social media and content sharing communities exist because of reliable and authentic streams of user-generated content (UGC). But when a high volume of UGC is a necessary component of a given platform or community, a particular challenge presents itself: verifying and moderating that content to ensure it adheres to platform/community standards.

To better exemplify the importance of this need for content moderation, let’s consider the example of One Bite, an online community that relies heavily on user-generated pizza reviews to power its network of more than 100,000 restaurants and establishments across the United States.

Manually reviewing this volume of UGC is unrealistic and would cause large bottlenecks of content queued for release. To ensure that the content being submitted from users across the country actually contains reviews of pizza, the One Bite team turned to on-device image recognition to help automate the content moderation process.

To submit a review, users must take and submit an accompanying photo of their pie. Any irregularities (or any images that don’t include a pizza) are then passed along for human review.

This kind of automated content moderation can be an essential tool in more effectively ensuring that community spaces are focused, safe, and fulfilling their intended purposes, all of which is made possible with the help of AI-powered image recognition.

Accessibility

One of the more promising applications of automated image recognition is in creating visual content that’s more accessible to individuals with visual impairments. Providing alternative sensory information (sound or touch, generally) is one way to create more accessible applications and experiences using image recognition.

Facebook was an early adopter of this technological advancement. In 2016, they introduced automatic alternative text to their mobile app, which uses deep learning-based image recognition to allow users with visual impairments to hear a list of items that may be shown in a given photo.

Similarly, apps like Aipoly and Seeing AI employ AI-powered image recognition tools that help users find common objects, translate text into speech, describe scenes, and more. And because there’s a need for real-time processing and usability in areas without reliable internet connections, these apps (and others like them) rely on on-device image recognition to create authentically accessible experiences.

Part 4: Resources for image recognition

We hope the above overview was helpful in understanding the basics of image recognition and how it can be used in the real world. But as with all things, more answers lead to more questions.

This final section will provide a series of organized resources to help you take the next step in learning all there is to know about image recognition. As a reminder, image recognition is also commonly referred to as image classification or image labeling.

In the interest of keeping this list relatively accessible, we’ve curated our top resources for each of the following areas:

- Getting started
- Tutorials
- Literature review
- Available datasets

Getting started

- Beginner’s Guide: Image Recognition and Deep Learning
- Image Recognition Demystified
- How Image Recognition Works

Tutorials

- The Complete Beginner’s Guide to Deep Learning: Convolutional Neural Networks and Image Classification
- Basics of Image Classification with PyTorch
- Recreating Dominos “Points for Pies” app on iOS with on-device image labeling
- Image Recognition with 10 lines of code

Literature review

- [GitHub] Awesome Image Classification
- Papers with Code: Image Classification
- A Survey on Image Classification and Activity Recognition using Deep CNN Architecture

Datasets available

- COCO dataset
- ImageNet
- MNIST
- CIFAR
- Open Images
- Fashion MNIST

Image recognition on mobile

The benefits of using image recognition aren’t limited to applications that run on servers or in the cloud.

In fact, image recognition models can be made small and fast enough to run directly on mobile devices, opening up a range of possibilities, including better search functionality, content moderation, improved app accessibility, and much more.

From brand loyalty, to user engagement and retention, and beyond, implementing image recognition on-device has the potential to delight users in new and lasting ways, all while reducing cloud costs and keeping user data private.

Fritz AI is the machine learning platform that makes it easy to teach devices how to see, hear, sense, and think. To learn more about how Fritz AI can help you build visual search experiences with image recognition, check out our Image Recognition API (referred to as image labeling).

For more inspiration, check out our tutorial for recreating the Domino’s “Points for Pies” image recognition app on iOS.
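As a companion to the “Basic structure” section in Part 2, here is a minimal sketch of how confidence scores might be turned into labels in the two modes the guide describes: picking the single highest-scoring class, versus keeping every class whose score clears a threshold. The class names, the raw scores, and the 0.5 threshold are made-up values for illustration; softmax and sigmoid are common choices for producing the scores, but the guide itself does not specify which function a given model uses.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (one label per image)."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Squash one raw score into an independent confidence between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_single_label(logits, class_names):
    """Return the one class with the highest confidence score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

def predict_multi_label(logits, class_names, threshold=0.5):
    """Return every class whose own confidence exceeds the threshold."""
    scores = [sigmoid(x) for x in logits]
    return [(name, s) for name, s in zip(class_names, scores) if s > threshold]

CLASSES = ["dog", "cat", "pizza"]  # hypothetical target classes
LOGITS = [2.0, 1.5, -3.0]          # made-up outputs of the final dense layer

label, confidence = predict_single_label(LOGITS, CLASSES)
print("single label:", label, round(confidence, 3))
print("multiple labels:", predict_multi_label(LOGITS, CLASSES))
```

With these numbers, the single-label path reports only the dog, while the thresholded path keeps both the dog and the cat, mirroring the dog-and-cat photo example from Part 1.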

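The top-1 and top-5 accuracy metrics defined in Part 2 can be computed directly from a model’s confidence scores. The sketch below assumes scores arrive as one list per image and labels as integer class indices; the score table is invented purely to exercise the function.

```python
def top_k_accuracy(score_rows, true_labels, k=1):
    """Fraction of images whose true label is among the k highest-scoring classes."""
    if len(score_rows) != len(true_labels):
        raise ValueError("need exactly one true label per row of scores")
    if not true_labels:
        raise ValueError("need at least one image")
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        # Class indices sorted from highest to lowest confidence
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)

# Made-up confidence scores for 4 images over 6 classes
SCORES = [
    [0.50, 0.20, 0.12, 0.08, 0.06, 0.04],  # true class 0 is ranked 1st
    [0.30, 0.40, 0.12, 0.08, 0.06, 0.04],  # true class 0 is ranked 2nd
    [0.02, 0.40, 0.25, 0.15, 0.10, 0.08],  # true class 0 is ranked 6th
    [0.10, 0.04, 0.60, 0.12, 0.08, 0.06],  # true class 2 is ranked 1st
]
TRUE_LABELS = [0, 0, 0, 2]

print("top-1:", top_k_accuracy(SCORES, TRUE_LABELS, k=1))
print("top-5:", top_k_accuracy(SCORES, TRUE_LABELS, k=5))
```

Top-5 accuracy can never be lower than top-1 on the same predictions, since any top-1 hit is also a top-5 hit.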

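To make the MobileNet ideas from Part 2 concrete, the following sketch counts weights for one convolution layer in three configurations: a standard convolution, a depthwise separable convolution, and the separable version after a width multiplier shrinks the channel counts. It assumes square kernels and ignores biases; the 3x3 kernel, the 128 and 256 channel counts, and the 0.5 multiplier are arbitrary illustrative values, and `scaled_channels` is a hypothetical helper rather than any library’s API.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """One k x k filter per input channel, followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

def scaled_channels(base_channels, width_multiplier):
    """Scale a layer's base channel count by the width multiplier (never below 1)."""
    return max(1, int(round(base_channels * width_multiplier)))

KERNEL, C_IN, C_OUT = 3, 128, 256  # illustrative layer shape
ALPHA = 0.5                        # illustrative width multiplier

standard = standard_conv_params(KERNEL, C_IN, C_OUT)
separable = depthwise_separable_params(KERNEL, C_IN, C_OUT)
thin = depthwise_separable_params(
    KERNEL, scaled_channels(C_IN, ALPHA), scaled_channels(C_OUT, ALPHA)
)

print("standard convolution:      ", standard)
print("depthwise separable:       ", separable)
print("separable, half the width: ", thin)
```

Because the pointwise term grows with the product of input and output channels, halving the width cuts that term to roughly a quarter, which is why a single multiplier gives such a wide range of model sizes.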

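The quantization tip from the edge deployment section can be illustrated with a small sketch of 8-bit affine quantization, one common scheme among several. It assumes the original weights are stored as 32-bit floats (4 bytes each), which is where the factor-of-4 size reduction comes from; the weight values are made up, and real toolchains apply this per layer or per channel with additional calibration.

```python
def quantize_uint8(weights):
    """Map float weights onto the integers 0..255 using a scale and an offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255
    if scale == 0:
        scale = 1.0  # all weights are equal; any nonzero scale works
    quantized = [round((w - lo) / scale) for w in weights]
    return quantized, scale, lo

def dequantize(quantized, scale, lo):
    """Recover approximate float weights from their 8-bit codes."""
    return [q * scale + lo for q in quantized]

WEIGHTS = [-0.82, -0.10, 0.0, 0.03, 0.41, 1.20]  # made-up layer weights

codes, scale, offset = quantize_uint8(WEIGHTS)
restored = dequantize(codes, scale, offset)
max_error = max(abs(w - r) for w, r in zip(WEIGHTS, restored))

float32_bytes = len(WEIGHTS) * 4   # assumes 32-bit float storage
uint8_bytes = len(bytes(codes))    # one byte per quantized weight

print("codes:", codes)
print("largest rounding error:", max_error)
print("size reduction factor:", float32_bytes / uint8_bytes)
```

The rounding error per weight is bounded by half the scale, so layers with a wide spread of weight values lose more precision. That is one reason to test quantized models rigorously before shipping them.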