{"635081":{"#nid":"635081","#data":{"type":"news","title":"OMSCS Student Uses Machine Learning to Help Understand COVID-19","body":[{"value":"\u003Cp\u003EWith dozens of research papers about COVID-19 being published each week, it can be difficult for doctors and scientists to keep up with the most important studies.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EA student at Georgia Tech, however, is using artificial intelligence (AI) techniques like natural language processing and machine learning (ML) to narrow down the most relevant information in this growing data set.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Ca href=\u0022http:\/\/www.erskinelaw.com\/ken-miller\/\u0022\u003E\u003Cstrong\u003EKenneth Miller\u003C\/strong\u003E\u003C\/a\u003E, a student in Georgia Tech\u0026rsquo;s Online Master of Science in Computer Science (OMSCS), is using these tools to develop algorithms to ensure that the most important COVID-19 research reaches doctors. His work is part of an ongoing challenge to use ML to empower the medical community to find the best COVID-19 studies.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003EInformation Overload\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe challenge started when Kaggle, a Google data science and ML community, partnered with the White House and several leading research groups to create the COVID-19 Open Research Dataset (CORD-19). With more than 47,000 scholarly articles about COVID-19 and other coronaviruses, it\u0026rsquo;s one of the most comprehensive research databases for the pandemic.\u003C\/p\u003E\r\n\r\n\u003Cp\u003ETo sift through the data, Kaggle \u003Ca href=\u0022https:\/\/www.kaggle.com\/allen-institute-for-ai\/CORD-19-research-challenge\u0022\u003Ereleased\u003C\/a\u003E CORD-19 to its community and asked them to use it to answer some of the \u003Ca href=\u0022https:\/\/www.kaggle.com\/covid19\u0022\u003Etoughest research questions\u003C\/a\u003E about COVID-19. 
As an incentive, for every task completed successfully, participants like Miller receive $1,000 in prize money.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EAs an OMSCS student specializing in ML, Miller has joined a few previous Kaggle challenges, but for much less significant tasks like predicting home values or NCAA brackets. For Miller, working on this dataset presented an especially relevant problem.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;I am fascinated with everything AI, so when I heard about this, I figured if any of my skills could help anyone, I should try,\u0026rdquo; said Miller, who is a lawyer outside of his studies.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003EKeep it Simple\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EMiller said his OMSCS studies prepared him for the challenge. The AI track focuses on the practical implementation of AI methods. This made it easier for Miller to start with an overwhelming amount of data and get to an endpoint that solves the problem. His experience using the programming language Python for class also enabled him to work nimbly with the data.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EArmed with this knowledge, Miller applied a strategy he uses on every project.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;Whenever I start a new project, I try and see if I can craft a simple yet effective solution from scratch,\u0026rdquo; he said.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EHe has worked on specific Kaggle challenges where he can apply this strategy. The first ML model Miller developed finds the most relevant sentences in a study. To accomplish this, he used a simple scoring algorithm that counts how many times keywords appear in a sentence. 
Then the model measures the ratio of keyword occurrences to sentence length.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EFor a separate challenge, Miller created a \u003Ca href=\u0022http:\/\/edocdiscovery.com\/covid_19\/index.php\u0022\u003Esearch engine\u003C\/a\u003E for common COVID-19 research questions, such as: What is the average time the disease takes to incubate? How long is it contagious? How long until symptoms appear?\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003EUp to the Challenge\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThese are just a few of Miller\u0026rsquo;s models, and he continues to work on new challenges Kaggle offers. Tasks now include deep dives into epidemiology, \u003Ca href=\u0022https:\/\/www.kaggle.com\/allen-institute-for-ai\/CORD-19-research-challenge\/discussion\/139355\u0022\u003Eunderstanding how many patients a study was based on\u003C\/a\u003E, and identifying what scientific method was employed.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;The trick, as in any project like this, is understanding and assimilating the data to start with,\u0026rdquo; Miller said. \u0026ldquo;But using Python makes the initial data wrangling pretty easy. The hardest part is building new ways to squeeze more desired info out of the documents.\u0026rdquo;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EMiller\u0026rsquo;s efforts have been noticed. 
His work has been \u003Ca href=\u0022https:\/\/www.kaggle.com\/covid-19-contributions\u0022\u003Ecited\u003C\/a\u003E several times on the contributions page.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EFor more coverage of Georgia Tech\u0026rsquo;s response to the coronavirus pandemic, please visit our \u003Ca href=\u0022https:\/\/helpingstories.gatech.edu\/\u0022\u003EResponding to COVID-19 page\u003C\/a\u003E.\u003C\/p\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"A student at Georgia Tech is using artificial intelligence (AI) techniques like natural language processing and machine learning (ML) to narrow down the most relevant information in this growing COVID-19 data set. "}],"uid":"34541","created_gmt":"2020-05-05 16:28:57","changed_gmt":"2020-06-04 13:11:23","author":"Tess Malone","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2020-05-05T00:00:00-04:00","iso_date":"2020-05-05T00:00:00-04:00","tz":"America\/New_York"},"extras":[],"hg_media":{"635083":{"id":"635083","type":"image","title":"Ken Miller","body":null,"created":"1588696222","gmt_created":"2020-05-05 16:30:22","changed":"1588696222","gmt_changed":"2020-05-05 16:30:22","alt":"Ken Miller","file":{"fid":"241674","name":"ken headshot large.jpg","image_path":"\/sites\/default\/files\/images\/ken%20headshot%20large.jpg","image_full_path":"http:\/\/www.tlwarc.hg.gatech.edu\/\/sites\/default\/files\/images\/ken%20headshot%20large.jpg","mime":"image\/jpeg","size":362900,"path_740":"http:\/\/www.tlwarc.hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/images\/ken%20headshot%20large.jpg?itok=TorRwQU1"}}},"media_ids":["635083"],"groups":[{"id":"47223","name":"College of 
Computing"}],"categories":[],"keywords":[],"core_research_areas":[],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003ETess Malone, Communications Officer\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Ca href=\u0022mailto:tess.malone@cc.gatech.edu\u0022\u003Etess.malone@cc.gatech.edu\u003C\/a\u003E\u003C\/p\u003E\r\n","format":"limited_html"}],"email":[],"slides":[],"orientation":[],"userdata":""}}}