What Is Image Recognition? | by Chris Kuo/Dr. Dataman


Published in Dataman in AI

The first question you may have is what the difference is between computer vision and image recognition. Indeed, computer vision has been vigorously developed by Google, Amazon, and many AI developers, and the two terms “computer vision” and “image recognition” may have been used interchangeably. Computer vision (CV) lets a computer imitate human vision and take action. For example, CV can be designed to sense a running child on the road and produce a warning signal for the driver. In contrast, image recognition is about the pixel and pattern analysis of an image to recognize the image as a particular object. Computer vision means it can “do something” with recognized images. Because in this post I will describe the machine learning techniques for image recognition, I will still use the term “image recognition”.

In that article, I give a gentle introduction to image data and explain why Convolutional Autoencoders are the preferred method for dealing with image data. I think it is helpful to mention the three broad data categories: (1) multivariate data (in contrast with serial data), (2) serial data (including text and voice stream data), and (3) image data. Deep learning has three basic variations to address each data category: (1) the standard feedforward neural network, (2) RNN/LSTM, and (3) Convolutional NN (CNN). For readers who are looking for tutorials for each type, I recommend “Explaining Deep Learning in a Regression-Friendly Way” for (1), “A Technical Guide for RNN/LSTM/GRU on Stock Price Prediction” for (2), and “Deep Learning with PyTorch Is Not Torturing”, “What Is Image Recognition?”, “Anomaly Detection with Autoencoders Made Easy”, and “Convolutional Autoencoders for Image Noise Reduction” for (3). You can bookmark the summary article “Dataman Learning Paths - Build Your Skills, Drive Your Career”.

What is image recognition?

Just like the phrase “What-you-see-is-what-you-get” says, human brains make vision easy. It doesn't take
any effort for humans to tell apart a dog, a cat, or a flying saucer. But this process is quite hard for a computer to imitate: it only seems easy because God designed our brains to be incredibly good at recognizing images.

A common example of image recognition is optical character recognition (OCR). A scanner can identify the characters in an image and convert the text in the image into a text file. With the same process, OCR can be applied to recognize the text of a license plate in an image. Since you are interested in image recognition, I encourage you to take a look at this interesting video.

How does image recognition work?

How do we train a computer to tell one image apart from another? The process of building an image recognition model is no different from the general process of machine learning modeling. I list the modeling process for image recognition in Steps 1 through 4.

Modeling Step 1: Extract pixel features from an image

Figure (A)

First, a great number of characteristics, called features, are extracted from the image. An image is actually made of “pixels”, as shown in Figure (A). Each pixel is represented by a number or a set of numbers, and the number of bits used to store these numbers is called the color depth (or bit depth). In other words, the color depth indicates the maximum number of potential colors that can be used in an image: n bits allow at most 2^n values, so 8 bits give 256 and 24 bits give 16,777,216. In an (8-bit) greyscale image (black and white), each pixel has one value that ranges from 0 to 255. Most images today use 24-bit color or higher. An RGB color image means the color in a pixel is a combination of red, green, and blue, each ranging from 0 to 255. This RGB color generator shows how any color can be generated by RGB. So a pixel contains a set of three values: RGB (102, 255, 102) refers to color #66ff66. An image 800 pixels wide and 600 pixels high has 800 × 600 = 480,000 pixels = 0.48 megapixels (a “megapixel” is 1 million pixels). An image with a resolution of 1024 × 768 is a grid with 1,024 columns and 768 rows, which therefore contains 1,024 × 768 = 786,432 pixels, or about 0.79 megapixels.

Modeling Step 2: Prepare labeled images to train the model

Figure (B)

Once each image is converted to thousands of features, we can use them, together with the known labels of the images, to train a model. Figure (B) shows many labeled images that belong to different categories such as “dog” or “fish”. The more images we can use for each category, the better a model can be trained to tell whether an image is a dog or a fish image. Here we already know the category that each image belongs to, and we use these labeled images to train the model. This is called supervised machine learning.

Modeling Step 3: Train the model to be able to categorize images

Figure (C)

Figure (C) demonstrates how a model is trained with the pre-labeled images. The huge network in the middle can be considered a giant filter. The images, in their extracted forms, enter on the input side, and the labels are on the output side. The purpose here is to train the network so that an image whose features enter from the input side will match its label on the right.

Modeling Step 4: Recognize (or predict) a new image to be one of the categories

Once a model is trained, it can be used to recognize (or predict) an unknown image. Figure (D) shows a new image recognized as a dog image. Notice that the new image also goes through the pixel feature extraction process.

Convolutional Neural Networks: the algorithm for image recognition

The networks in Figure (C) and (D) imply that the popular models are neural network models. Convolutional Neural Networks (CNNs or ConvNets) have been widely applied in image classification, object detection, and image recognition.

A gentle explanation of Convolutional Neural Networks

I will use the MNIST handwritten digit images to explain CNNs. The MNIST images are free-form black-and-white images of the digits 0 to 9. It is easier to explain the concept with black-and-white images because each pixel has only one value (from 0 to 255), whereas a color image has three values in each pixel.

The network layers of CNNs are different from those of typical neural networks. There are four types of layers: the convolution, the ReLU, the pooling, and the fully connected layers, as shown in Figure (E). What does each of the four types do? Let me explain.

1. Convolution layer

Figure (F)

The first thing a CNN does is create many small pieces called features, like the 2x2 boxes. To visualize the process, I use three colors to represent the three features in Figure (F). Each feature characterizes some shape of the original image. Let each feature scan through the original image. If there is a perfect match, there is a high score in that box. If there is a low match or no match, the score is low or zero. This process of producing the scores is called filtering.

Figure (G)

Figure (G) shows the three features. Each feature produces a filtered image with high scores and low scores when scanning through the original image. For example, the red box found four areas in the original image that show a perfect match with the feature, so scores are high for those four areas. The pink boxes are the areas that match to some extent. The act of trying every possible match by scanning through the original image is called convolution. The filtered images are stacked together to become the convolution layer.

2. ReLU layer

The Rectified Linear Unit (ReLU) step is the same as in typical neural networks. It rectifies any negative value to zero, that is, ReLU(x) = max(0, x). This keeps only the positive match scores and gives the network the non-linearity it needs to learn complex shapes.

3. Max pooling layer

Figure (H)

Pooling shrinks the image size. In Figure (H), a 2x2 window scans through each of the filtered images and assigns the maximum value of that 2x2 window to a 1x1 box in a new image. As illustrated in the figure, the maximum value in the first 2x2 window is a high score (represented by red), so the high score is assigned to the 1x1 box. The 2x2 window then moves to the second position, where there is a high score (red) and a low score (pink), so the high score is assigned to the 1x1 box. After pooling, a new stack of smaller filtered images is produced.

4. Fully connected layer (the final layer)

Now we flatten the smaller filtered images and stack them into a single list, as shown in Figure (I). Each value in the single list contributes to a score, and ultimately a probability, for each of the final values 1, 2, …, and 0. This part is the same as the output layer in typical neural networks. In our example, “2” receives the highest total score from all the nodes of the single list, so the CNN recognizes the original handwritten image as “2”.

What is the difference between CNNs and typical NNs?

Typical neural networks flatten the original image into a list and turn it into the input layer, so the information about which pixels are neighbors may not be retained. In contrast, CNNs construct the convolution layer, which retains the information between neighboring pixels.

Is there any pre-trained CNN code that I can use?

Yes. If you are interested in learning the code, Keras has several pre-trained CNNs, including Xception, VGG16, VGG19, ResNet50, InceptionV3, InceptionResNetV2, MobileNet, DenseNet, NASNet, and MobileNetV2. It's also worth mentioning the large image database ImageNet, which you can contribute to or download for research purposes.

Business Applications

Image recognition has wide applications. In the next module, I will show you how image recognition can be applied to claims handling in insurance.

Chris Kuo/Dr. Dataman: The Dataman articles are my reflections on data science and teaching notes at Columbia University (https://sps.columbia.edu/faculty/chris-kuo).
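The pixel arithmetic in Modeling Step 1 can be checked with a few lines of Python. This is a minimal sketch that assumes 8 bits per color channel. The helper names `rgb_to_hex`, `megapixels`, and `max_colors` are hypothetical and were chosen only for illustration.

```python
def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Encode an RGB triple (8 bits per channel, 0-255) as a #rrggbb string."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel value {channel} is outside 0-255")
    return f"#{r:02x}{g:02x}{b:02x}"


def megapixels(width: int, height: int) -> float:
    """Total pixel count expressed in millions of pixels."""
    return width * height / 1_000_000


def max_colors(bit_depth: int) -> int:
    """Maximum number of distinct values a given bit depth can represent."""
    return 2 ** bit_depth


print(rgb_to_hex(102, 255, 102))                    # #66ff66
print(800 * 600, megapixels(800, 600))              # 480000 0.48
print(1024 * 768, round(megapixels(1024, 768), 2))  # 786432 0.79
print(max_colors(8), max_colors(24))                # 256 16777216
```

Each hex pair is one channel written in base 16, so 102 becomes `66` and 255 becomes `ff`.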

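The four CNN layer types described above can be made concrete with a minimal pure-Python sketch of one forward pass: convolution, ReLU, 2x2 max pooling, and a fully connected layer followed by softmax. This is not the article's MNIST model. The 5x5 stroke images, the two 2x2 kernels, the class names, and the fully connected weights are all hypothetical and hand-picked. A real CNN learns its kernels and weights during training (Modeling Step 3). The kernels use +1 and -1 entries, so a mismatch produces a negative score that the ReLU step then sets to zero.

```python
import math
from typing import List

Matrix = List[List[float]]


def convolve(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (stride 1, no padding) and record the match
    score, i.e. the sum of element-wise products, at every position. As in most
    deep learning libraries, the kernel is not flipped (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Replace every negative score with zero: max(0, x)."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap: Matrix, size: int = 2) -> Matrix:
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def flatten(fmaps: List[Matrix]) -> List[float]:
    """Unroll a stack of feature maps into a single list."""
    return [v for fmap in fmaps for row in fmap for v in row]


def fully_connected(x: List[float], weights: Matrix) -> List[float]:
    """One score per class: dot product of the flattened list with that class's weights."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical hand-picked 2x2 features (a trained CNN learns these itself).
KERNELS = [
    [[1, -1], [-1, 1]],   # matches a "\" stroke
    [[-1, 1], [1, -1]],   # matches a "/" stroke
]
# Hypothetical fully connected weights: one row per class, 8 inputs each
# (2 feature maps x 2x2 pooled values).
LABELS = ["backslash", "slash"]
WEIGHTS = [
    [1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
]


def classify(image: Matrix):
    """Convolution -> ReLU -> max pooling per kernel, then flatten -> dense -> softmax."""
    pooled = [max_pool(relu(convolve(image, k))) for k in KERNELS]
    probs = softmax(fully_connected(flatten(pooled), WEIGHTS))
    best = max(range(len(LABELS)), key=lambda c: probs[c])
    return LABELS[best], probs


backslash = [[1 if i == j else 0 for j in range(5)] for i in range(5)]
slash = [[1 if i + j == 4 else 0 for j in range(5)] for i in range(5)]
print(classify(backslash)[0])  # backslash
print(classify(slash)[0])      # slash
```

For the "\" image, the pooled maps are [[2, 0], [0, 2]] from the first kernel and [[1, 1], [1, 1]] from the second. The "backslash" class therefore scores 4 against 2 for "slash". The same flow scales to MNIST's ten digit classes, with many more learned kernels.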

