{"668495":{"#nid":"668495","#data":{"type":"news","title":"Stress Test Method Detects When Object Recognition Models Are Using Shortcuts","body":[{"value":"\u003Cp\u003EA new \u201cstress test\u201d method created by a Georgia Tech researcher allows programmers to more easily determine whether trained visual recognition models are sensitive to input changes or rely too heavily on context clues to perform their tasks.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EViraj Prabhu, a Ph.D. student in Georgia Tech\u2019s School of Interactive Computing, introduced the LANCE (Language-Guided Counterfactuals) method in a recent research paper that shows how deep object recognition models are prone to taking shortcuts through context clues when classifying images.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EIdeally, models should understand exactly what they\u2019re prompted to search for, Prabhu said, but because of spurious correlation, they tend to use irrelevant information in images as they make predictions.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EPrabhu used LANCE to stress test well-known models that have been trained on the image database ImageNet. Working with Assistant Professor Judy Hoffman and co-authors Sriram Yenamandra and Prithvijit Chattopadhyay, he discovered many instances in which the models were overly reliant on context in the images they were asked to classify.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EIn some examples, the models showed they were using weather in the background to classify images rather than recognizing the object of interest.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EIn another stress test, Prabhu challenged the models to classify images with seatbelts. All the test images contained seatbelts inside cars. When Prabhu generated new images by changing the prompt to \u201cseatbelts on a bus,\u201d the performance and accuracy of the trained models dropped. 
This suggested the models thought seatbelts were exclusive to cars.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u201cWhen a model is getting something right, is it getting it right because it really understands it, or is it picking up on some context clues and relying on them?\u201d Prabhu said.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u201cThere is no reason why it should be relying on what kind of vehicle it is to know whether there is a seatbelt, but models often do this. It\u2019s more generally known as model bias or a spurious correlation problem.\u201d\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe models displayed the same flaws when Prabhu used LANCE to test images for dog sleds. The models almost exclusively associated dog sleds with Huskies, leading them to rely on the breed rather than on the sled itself.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cimg alt=\u0022Three students working together\u0022 height=\u0022567\u0022 src=\u0022https:\/\/www.cc.gatech.edu\/sites\/default\/files\/images\/general\/2023\/208A9981.jpg\u0022 width=\u0022850\u0022 \/\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EFrom left to right, Sriram Yenamandra, Viraj Prabhu, and Prithvijit Chattopadhyay discuss their LANCE method for detecting input changes that deep object recognition models are sensitive to. Photos by Kevin Beasley\/College of Computing.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EPrabhu said the prompts given to the models were generated by fine-tuning LLaMA, a large language model created by Meta AI, on training data automatically generated by OpenAI\u2019s ChatGPT. For an image of someone riding a bike, he generated a caption using an automated captioning system. 
Then, he used the fine-tuned LLaMA to make a structured change to the caption, changing only a single concept at a time.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u201cIt would change \u2018person riding a bicycle\u2019 to \u2018person carrying a bicycle,\u2019 and then we pass it to the generative model and use it to generate a new image while changing nothing else,\u201d he said. \u201cUsing a recently introduced targeted editing technique from Google Research based on prompt-to-prompt tuning, we can now change only the relationship between the person and bicycle. Then we get an image of a person carrying a bicycle, with everything else being the same. Now we can use this as a counterfactual test image.\u201d\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThat allows Prabhu to compare the model\u2019s new prediction to the original. If the prediction has changed, it\u2019s likely the model is relying on spurious correlations.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EPrabhu said the LANCE method can be applied at scale to any new dataset.\u003C\/p\u003E\r\n\r\n\u003Cp\u003ESpurious correlation has been a known weak link for deep learning models, but Prabhu said the benefit of LANCE is that it allows programmers to probe their models for those weaknesses before deployment.\u003C\/p\u003E\r\n\r\n\u003Cp\u003ETraditionally, these models are trained through goal-oriented methods in which the models receive points for making the correct prediction and lose points for getting it wrong. Prabhu said that\u2019s the most likely reason why the models try to find shortcuts, like using contextual clues, to achieve their goals.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe implications also extend beyond diagnosing object recognition models trained on ImageNet. 
LANCE can be applied to computer vision technology used in self-driving vehicles, which need to be as foolproof as possible before they\u2019re deployed on the road.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u201cIn high-stakes applications like self-driving, people are using discriminative approaches \u2014 you have an object detection system that can detect cars and pedestrians and draw boxes around them,\u201d Prabhu said. \u201cUsing LANCE, we can probe these discriminative models using generative approaches and make them better. The hope is we can discover failures before they happen.\u201d\u003C\/p\u003E\r\n","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003ENew research from Georgia Tech\u0027s School of Interactive Computing illustrates how deep object recognition models can use irrelevant information in images as they make predictions.\u003C\/p\u003E\r\n","format":"limited_html"}],"field_summary_sentence":[{"value":"New research illustrates how deep object recognition models can use irrelevant information in images as they make predictions."}],"uid":"32045","created_gmt":"2023-07-17 18:36:17","changed_gmt":"2023-07-17 18:51:35","author":"Ben Snedeker","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2023-07-17T00:00:00-04:00","iso_date":"2023-07-17T00:00:00-04:00","tz":"America\/New_York"},"extras":[],"groups":[{"id":"1188","name":"Research Horizons"},{"id":"50876","name":"School of Interactive Computing"}],"categories":[],"keywords":[{"id":"187915","name":"go-researchnews"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003ENathan Deen, Comms. 
Officer I\u003Cbr \/\u003E\r\nSchool of Interactive Computing\u003C\/p\u003E\r\n\r\n\u003Cp\u003Enathan.deen@cc.gatech.edu\u003C\/p\u003E\r\n","format":"limited_html"}],"email":[],"slides":[],"orientation":[],"userdata":""}}}