{"615576":{"#nid":"615576","#data":{"type":"news","title":"Georgia Tech Researchers Improve Fairness in the Machine Learning Pipeline","body":[{"value":"\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EGeorgia Tech researchers have developed a new algorithm to mitigate bias from one of the first steps in the machine learning (ML) process. Known as fair principal component analysis (PCA), the new algorithm runs as fast as existing PCAs, but can reduce bias in low-dimensional representations of large datasets.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EBias is one of the most \u003Ca href=\u0022https:\/\/www.scs.gatech.edu\/news\/610888\/jamie-morgenstern-wants-bring-fairness-machine-learning\u0022\u003Epressing issues\u003C\/a\u003E as ML is used for everything from image classification to determining loans. Although there are plenty of stories about obvious bias like ML algorithms only showing images of white men when asked to query the term \u0026ldquo;CEO,\u0026rdquo; much of the bias is more insidious.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EMany researchers believe unfair ML is the result of biased data or faulty algorithms, but Tech researchers determined it can start as early as the data processing step.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003EReducing the dimension, increasing bias\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EData with high dimension is often the start of the problem. When a dataset needs to be mathematically represented, each feature is represented as one dimension. For example, a 200x200-pixel image transforms into a vector with 40,000 dimensions. Working with such a large representation is often too difficult to process efficiently, so computer scientists use standard PCA to reduce the dimension while keeping as much information from the original data set as possible.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EPCA runs by looking for the main directions that the data is distributed and projects the data onto those directions. 
Scientists then evaluate the accuracy by measuring how far the projection is from the original data, that is, how much information is lost.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EAlthough this makes the data easier to work with, the low-dimensional representation can be biased, according to the researchers. They ran PCA on a dataset of 1,300 images of males and females and calculated the average error for each population. The male population was always represented more accurately than the female population.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;If you\u0026rsquo;re already representing one population much better at the preprocessing step, it injects some bias no matter what you\u0026rsquo;re trying to do,\u0026rdquo; School of Computer Science (SCS) Ph.D. student \u003Ca href=\u0022http:\/\/www.samirasamadi.com\u0022\u003E\u003Cstrong\u003ESamira Samadi\u003C\/strong\u003E\u003C\/a\u003E said.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003EReducing bias\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EOne way to combat this bias is to define what a fair projection means. The researchers evaluate this through marginal error, which measures how far the output projection is from the best possible projection for each population in the data. 
They concluded that, to be considered fair, the maximum marginal error must be low for both populations, or equal across them.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThis became the basis for the new fair PCA algorithm.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;If you use PCA and care about fairness, now you can use fair PCA,\u0026rdquo; Samadi said.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe researchers presented the work at one of the leading \u003Ca href=\u0022https:\/\/www.scs.gatech.edu\/news\/614427\/georgia-tech-will-show-latest-research-ais-hottest-conference\u0022\u003Emachine learning conferences\u003C\/a\u003E of the year, the \u003Ca href=\u0022https:\/\/nips.cc\/\u0022\u003EConference on Neural Information Processing Systems (NeurIPS)\u003C\/a\u003E, in Montreal Dec. 2-8. Samadi co-authored the paper, \u003Ca href=\u0022https:\/\/arxiv.org\/abs\/1811.00103\u0022\u003E\u003Cem\u003EThe Price of Fair PCA: One Extra Dimension\u003C\/em\u003E\u003C\/a\u003E, with SCS Ph.D. student \u003Ca href=\u0022https:\/\/www.cc.gatech.edu\/~uthaipon3\/\u0022\u003E\u003Cstrong\u003EUthaipon (Tao) Tantipongpipat\u003C\/strong\u003E\u003C\/a\u003E, School of Industrial and Systems Engineering Associate Professor \u003Ca href=\u0022https:\/\/www2.isye.gatech.edu\/~msingh94\/\u0022\u003E\u003Cstrong\u003EMohit Singh\u003C\/strong\u003E\u003C\/a\u003E, SCS Assistant Professor \u003Ca href=\u0022http:\/\/jamiemorgenstern.com\/\u0022\u003E\u003Cstrong\u003EJamie Morgenstern\u003C\/strong\u003E\u003C\/a\u003E, and SCS Professor \u003Ca href=\u0022https:\/\/www.cc.gatech.edu\/~vempala\/\u0022\u003E\u003Cstrong\u003ESantosh Vempala\u003C\/strong\u003E\u003C\/a\u003E.\u003C\/p\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"Georgia Tech researchers have developed a new algorithm to mitigate bias from one of the first steps in the machine learning (ML) process. 
"}],"uid":"34541","created_gmt":"2018-12-18 18:18:51","changed_gmt":"2018-12-18 18:20:18","author":"Tess Malone","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2018-12-18T00:00:00-05:00","iso_date":"2018-12-18T00:00:00-05:00","tz":"America\/New_York"},"extras":[],"hg_media":{"615577":{"id":"615577","type":"image","title":"Fair PCA Image","body":null,"created":"1545157197","gmt_created":"2018-12-18 18:19:57","changed":"1545157197","gmt_changed":"2018-12-18 18:19:57","alt":"Scale showing gender bias in illustrated form","file":{"fid":"234375","name":"Gender-Bias.jpg","image_path":"\/sites\/default\/files\/images\/Gender-Bias.jpg","image_full_path":"http:\/\/www.tlwarc.hg.gatech.edu\/\/sites\/default\/files\/images\/Gender-Bias.jpg","mime":"image\/jpeg","size":42256,"path_740":"http:\/\/www.tlwarc.hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/images\/Gender-Bias.jpg?itok=fRqnCQRK"}}},"media_ids":["615577"],"groups":[{"id":"47223","name":"College of Computing"},{"id":"50875","name":"School of Computer Science"}],"categories":[],"keywords":[],"core_research_areas":[{"id":"39541","name":"Systems"}],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003ETess Malone, Communications Officer\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Ca href=\u0022mailto:tess.malone@cc.gatech.edu\u0022\u003Etess.malone@cc.gatech.edu\u003C\/a\u003E\u003C\/p\u003E\r\n","format":"limited_html"}],"email":["tess.malone@cc.gatech.edu"],"slides":[],"orientation":[],"userdata":""}}}