{"636603":{"#nid":"636603","#data":{"type":"news","title":"New Training Data Labeling System for Machine Learning Helps Developers","body":[{"value":"\u003Cp\u003EMachine learning (ML) has become one of the most prominent forms of data analysis for everything from fraud detection to visual quality control. Yet the analytic results can often suffer from insufficiently labeled training data.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EA team of Georgia Tech researchers has created a system that allows users to more effectively label a training dataset with higher accuracy than current methods.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;We are looking at the problem from a data management perspective,\u0026rdquo; said School of Computer Science (SCS) Assistant Professor \u003Ca href=\u0022https:\/\/www.cc.gatech.edu\/~xchu33\/\u0022\u003E\u003Cstrong\u003EXu Chu\u003C\/strong\u003E\u003C\/a\u003E. \u0026ldquo;In contrast to a lot of ML research that tries to tackle the lack of sufficient training data from an ML algorithm design perspective, we aim at building a system that helps users effectively label a dataset.\u0026rdquo;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe system, called GOGGLES, labels datasets using affinity coding, a paradigm that allows ML engineers to use various affinity functions that input two unlabeled examples and output a real-valued score.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;You can think of affinity as similarity,\u0026rdquo; said Chu. \u0026ldquo;The core premise of the work is that two examples share the same label if they are similar according to some affinity functions (or similarity functions).\u0026rdquo;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003EThe benefits of affinity coding\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EGOGGLES uses a set of affinity functions that can capture various affinities found in the image. 
Next, using a new unlabeled dataset and these affinity functions, GOGGLES constructs an affinity matrix, from which it can assign classes to unlabeled images. Unlike previous approaches, this doesn\u0026rsquo;t require any metadata or developer intervention.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EFor each new dataset, users can potentially reuse many of the existing affinity functions already in the\u0026nbsp;library, making GOGGLES a domain-agnostic labeling system. Users and developers can always add more affinity functions to increase the labeling power of GOGGLES.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EOn five common image classification tasks, GOGGLES reaches up to 98 percent accuracy without requiring extensive developer effort. It also outperforms other well-known data programming systems by up to 21 percent.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EChu co-wrote the paper, \u003Ca href=\u0022https:\/\/www.cc.gatech.edu\/~xchu33\/chu-papers\/GOGGLES-SIGMOD2020.pdf\u0022\u003E\u003Cem\u003EGOGGLES: Automatic Image Labeling with Affinity Coding, \u003C\/em\u003E\u003C\/a\u003E\u0026nbsp;with Ph.D. students \u003Ca href=\u0022http:\/\/nilakshdas.com\/\u0022\u003E\u003Cstrong\u003ENilaksh Das\u003C\/strong\u003E\u003C\/a\u003E and \u003Ca href=\u0022https:\/\/wurenzhi.github.io\/\u0022\u003E\u003Cstrong\u003ERenzhi Wu\u003C\/strong\u003E\u003C\/a\u003E, master\u0026rsquo;s alumni \u003Ca href=\u0022https:\/\/www.linkedin.com\/in\/sanyachaba\/\u0022\u003E\u003Cstrong\u003ESanya Chaba\u003C\/strong\u003E\u003C\/a\u003E and \u003Ca href=\u0022https:\/\/www.linkedin.com\/in\/sakshigandhi\/\u0022\u003E\u003Cstrong\u003ESakshi Gandhi\u003C\/strong\u003E\u003C\/a\u003E, and School of Computational Science and Engineering Professor \u003Ca href=\u0022https:\/\/poloclub.github.io\/polochau\/\u0022\u003E\u003Cstrong\u003EPolo Chau\u003C\/strong\u003E\u003C\/a\u003E. 
They presented it at the \u003Ca href=\u0022https:\/\/en.wikipedia.org\/wiki\/Association_for_Computing_Machinery\u0022 title=\u0022Association for Computing Machinery\u0022\u003EAssociation for Computing Machinery\u003C\/a\u003E\u0026#39;s \u003Ca href=\u0022https:\/\/sigmod2020.org\/\u0022 title=\u0022Symposium on Principles of Database Systems\u0022\u003ESpecial Interest Group on Management of Data (SIGMOD) and Symposium on Principles of Database Systems (PODS)\u003C\/a\u003E, held virtually from June 14 to 19.\u003C\/p\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"A team of Georgia Tech researchers has created a system that allows users to more effectively label a training dataset with higher accuracy than current methods. "}],"uid":"34541","created_gmt":"2020-06-29 21:17:28","changed_gmt":"2020-06-29 21:19:57","author":"Tess Malone","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2020-06-29T00:00:00-04:00","iso_date":"2020-06-29T00:00:00-04:00","tz":"America\/New_York"},"extras":[],"hg_media":{"636604":{"id":"636604","type":"image","title":"Goggles","body":null,"created":"1593465569","gmt_created":"2020-06-29 21:19:29","changed":"1593465569","gmt_changed":"2020-06-29 21:19:29","alt":"Goggles","file":{"fid":"242196","name":"person-holding-blue-goggles.jpg","image_path":"\/sites\/default\/files\/images\/person-holding-blue-goggles.jpg","image_full_path":"http:\/\/www.tlwarc.hg.gatech.edu\/\/sites\/default\/files\/images\/person-holding-blue-goggles.jpg","mime":"image\/jpeg","size":64436,"path_740":"http:\/\/www.tlwarc.hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/images\/person-holding-blue-goggles.jpg?itok=t3pfG_Hd"}}},"media_ids":["636604"],"groups":[{"id":"47223","name":"College of Computing"},{"id":"50877","name":"School of Computational Science and 
Engineering"},{"id":"50875","name":"School of Computer Science"}],"categories":[],"keywords":[],"core_research_areas":[],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003ETess Malone, Communications Officer\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Ca href=\u0022mailto:tess.malone@cc.gatech.edu\u0022\u003Etess.malone@cc.gatech.edu\u003C\/a\u003E\u003C\/p\u003E\r\n","format":"limited_html"}],"email":[],"slides":[],"orientation":[],"userdata":""}}}