{"617061":{"#nid":"617061","#data":{"type":"news","title":"See and Say: Abhishek Das Working to Provide Crucial Communication Tools for Intelligent Agents","body":[{"value":"\u003Cp\u003ESchool of Interactive Computing Ph.D. student \u003Ca href=\u0022https:\/\/abhishekdas.com\/\u0022\u003E\u003Cstrong\u003EAbhishek Das\u003C\/strong\u003E\u003C\/a\u003E remembers the moment his interests in computer vision and language began to come into focus. It was early in his time as a Ph.D. student when he came across an algorithm that could generate a one-line natural language description of an image with incredible accuracy. When he saw the results, it seemed almost magical, he said.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;I was blown away because you could give it any image, and it would generate a fairly plausible sentence,\u0026rdquo; he said. \u0026ldquo;I had never seen that before.\u0026rdquo;\u003C\/p\u003E\r\n\r\n\u003Cp\u003ESix months later, there were papers being published on question answering, where the algorithm could not only generate a sentence but could even answer questions about the image. He was similarly floored by the impressive results.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EHe was advised by \u003Cstrong\u003E\u003Ca href=\u0022https:\/\/www.cc.gatech.edu\/~dbatra\/\u0022\u003EDhruv Batra\u003C\/a\u003E\u003C\/strong\u003E and also working closely with \u003Cstrong\u003E\u003Ca href=\u0022https:\/\/www.cc.gatech.edu\/~parikh\/\u0022\u003EDevi Parikh\u003C\/a\u003E\u003C\/strong\u003E, both assistant professors at Virginia Tech at the time. When they joined Georgia Tech, Das brought his thirst for research in that space to Atlanta, as well. 
Now, nearly two years later, he has published a number of research papers on projects ranging from \u003Ca href=\u0022https:\/\/visualdialog.org\/\u0022\u003Evisual dialogue\u003C\/a\u003E to a task called \u0026ldquo;\u003Ca href=\u0022https:\/\/embodiedqa.org\/\u0022\u003Eembodied question answering\u003C\/a\u003E.\u0026rdquo; He is working toward additional research involving multiple agents, and sees a not-so-distant future in which this simulation-based research informs the development of hardware for assistive technology like in-home robots.\u003C\/p\u003E\r\n\r\n\u003Ch3\u003E\u0026#39;It feels within reach...\u0026#39;\u003C\/h3\u003E\r\n\r\n\u003Cp\u003EIt\u0026rsquo;s a future that has been featured in popular culture for years \u0026ndash; think of \u003Ca href=\u0022http:\/\/thejetsons.wikia.com\/wiki\/Rosey\u0022\u003ERosie, the robot maid who first appeared on \u003Cem\u003EThe Jetsons \u003C\/em\u003Ein 1962\u003C\/a\u003E \u0026ndash; but is one that Das is beginning to see on the horizon.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;It feels within reach, the vision that we see in science fiction,\u0026rdquo; he said. \u0026ldquo;Movies of robots that you can talk to or give instructions to.\u0026rdquo;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EWhile people outside of the research sphere may see only the cold steel exterior of these imagined robots, building a viable foundation for them requires many different elements. These include work in computer vision, which involves the analysis of visual information by a machine, and language, which involves written or verbal communication and instruction. 
Das works at the intersection of both domains.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EBroadly, his research has focused on developing algorithms and intelligent agents that can see, talk, and ultimately act on that understanding in physical environments, taking actions such as navigating or executing instructions.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Ca href=\u0022https:\/\/embodiedqa.org\/paper.pdf\u0022\u003EFindings from a recent research project\u003C\/a\u003E were published and \u003Ca href=\u0022https:\/\/www.youtube.com\/watch?v=gz2VoDrvX-A\u0026amp;feature=youtu.be\u0026amp;t=1h29m14s\u0022\u003Epresented\u003C\/a\u003E at the \u003Ca href=\u0022http:\/\/cvpr2018.thecvf.com\/\u0022\u003E2018 Computer Vision and Pattern Recognition conference\u003C\/a\u003E in Salt Lake City, Utah. The project explored an idea called embodied question answering, in which an agent is asked a question and must find the answer by moving through and exploring its environment.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;It combines these three modalities: computer vision, language understanding, and reinforcement learning to take actions in this environment,\u0026rdquo; Das said.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe application here could be an assistive robot that takes a question or a command \u0026ndash; \u0026ldquo;Where are my keys?\u0026rdquo; for example \u0026ndash; and provides an answer or performs a task based on its understanding of the environment. He\u0026rsquo;s also conducting similar work with multiple agents, which could learn to coordinate to perform certain tasks.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;I\u0026rsquo;m not currently working with the hardware side of things,\u0026rdquo; he said. \u0026ldquo;All of this is simulation, but these are the end goals. The vision is that these will make it to robots with these sorts of capabilities. 
And, more importantly, the algorithms that I\u0026rsquo;m building will hopefully generalize and be useful for a wide variety of tasks.\u0026rdquo;\u003C\/p\u003E\r\n\r\n\u003Ch3\u003EA culture of collaboration\u003C\/h3\u003E\r\n\r\n\u003Cp\u003EDas\u0026rsquo; work has received extensive media attention, and he has had the opportunity to work under some prestigious grants and fellowships. Currently, he is supported by fellowships from Facebook, Adobe, and Snap. He was also recently awarded fellowships from Microsoft Research and NVIDIA, both of which he declined in favor of the Facebook fellowship.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EOne of the great benefits, he said, of working at Georgia Tech in this space has been the opportunity to collaborate with individuals who are conducting research in complementary domains.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;On my floor in the College of Computing, there are people who are experts in computer vision, natural language processing, reinforcement learning, in robotics, or other areas, and it\u0026rsquo;s always awesome to bounce ideas off of them,\u0026rdquo; he said.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;Just this semester, I was taking (Associate Professor) \u003Ca href=\u0022https:\/\/www.cc.gatech.edu\/~chernova\/\u0022\u003E\u003Cstrong\u003ESonia Chernova\u003C\/strong\u003E\u003C\/a\u003E\u0026rsquo;s course in human-robot interaction, and we prototyped a version of a tabletop embodied robot that could actually implement a very primitive version of the embodied question answering algorithm. That was a very interesting experience.\u0026rdquo;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDas is gaining valuable new experience this semester, as well. 
Having interned three times at Facebook AI Research, he is spending this semester in London interning with DeepMind, where he will work in areas related to this general space of agents that can see, talk, and act.\u003C\/p\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"School of Interactive Computing student Abhishek Das has published a number of research papers in projects ranging from visual dialogue to a task called \u201cembodied question answering.\u201d"}],"uid":"33939","created_gmt":"2019-01-30 18:19:28","changed_gmt":"2019-01-30 18:19:28","author":"David Mitchell","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2019-01-30T00:00:00-05:00","iso_date":"2019-01-30T00:00:00-05:00","tz":"America\/New_York"},"extras":[],"hg_media":{"617059":{"id":"617059","type":"image","title":"Abhishek Das","body":null,"created":"1548872305","gmt_created":"2019-01-30 18:18:25","changed":"1548872305","gmt_changed":"2019-01-30 18:18:25","alt":"Abhishek Das","file":{"fid":"234841","name":"Abhishek Das.jpeg","image_path":"\/sites\/default\/files\/images\/Abhishek%20Das.jpeg","image_full_path":"http:\/\/www.tlwarc.hg.gatech.edu\/\/sites\/default\/files\/images\/Abhishek%20Das.jpeg","mime":"image\/jpeg","size":36461,"path_740":"http:\/\/www.tlwarc.hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/images\/Abhishek%20Das.jpeg?itok=xL1MbCWA"}}},"media_ids":["617059"],"groups":[{"id":"47223","name":"College of Computing"},{"id":"1299","name":"GVU Center"},{"id":"576481","name":"ML@GT"},{"id":"50876","name":"School of Interactive Computing"}],"categories":[],"keywords":[{"id":"176750","name":"Abhishek Das"},{"id":"11506","name":"computer vision"},{"id":"180344","name":"nlp"},{"id":"23981","name":"natural language processing"},{"id":"173615","name":"dhruv batra"},{"id":"173616","name":"devi parikh"},{"id":"180345","name":"embodied question 
answering"},{"id":"176752","name":"visual dialogue"},{"id":"166848","name":"School of Interactive Computing"},{"id":"654","name":"College of Computing"},{"id":"1051","name":"Computer Science"},{"id":"667","name":"robotics"},{"id":"2352","name":"robots"}],"core_research_areas":[{"id":"39501","name":"People and Technology"},{"id":"39521","name":"Robotics"}],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003EDavid Mitchell\u003C\/p\u003E\r\n\r\n\u003Cp\u003ECommunications Officer\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Ca href=\u0022http:\/\/david.mitchell@cc.gatech.edu\u0022\u003Edavid.mitchell@cc.gatech.edu\u003C\/a\u003E\u003C\/p\u003E\r\n","format":"limited_html"}],"email":[],"slides":[],"orientation":[],"userdata":""}}}