<node id="489091">
  <nid>489091</nid>
  <type>event</type>
  <uid>
    <user id="28150"><![CDATA[28150]]></user>
  </uid>
  <created>1453390703</created>
  <changed>1492118218</changed>
  <title><![CDATA[Accelerating Advanced Analytics - Arun Kumar]]></title>
  <body><![CDATA[<p class="p1">Title:&nbsp;Accelerating Advanced Analytics</p><p class="p1"><br /></p><p class="p1">Abstract:</p><p class="p2">Advanced analytics -- the analysis of large and complex data with machine learning (ML) -- is becoming ubiquitous, with a growing demand for advanced analytics tools in the enterprise domains. However, there exist several challenging bottlenecks in the end-to-end process of building and deploying advanced analytics applications. My research focuses on abstractions, algorithms, and systems to mitigate such bottlenecks and accelerate advanced analytics from a data management standpoint.</p><p class="p2">&nbsp;</p><p class="p1">In this talk, I will focus on my work on mitigating one such pervasive bottleneck in the process of feature engineering for ML -- joins of multiple tables. Many real-world datasets are multi-table, connected by key-foreign key relationships, but almost all ML toolkits expect single-table inputs. This forces data scientists to join all tables and materialize a single table that collects all features. Alas, such joins often cause the output to blow up in size, which slows down ML, increases costs, and leads to data maintenance headaches. In my work, I show how it is possible to mitigate these issues by "avoiding joins physically," i.e., pushing ML down through joins. This reduces runtime without affecting accuracy. 
Going further, I apply statistical learning theory to show how it is often possible to also "avoid joins logically," i.e., ignore entire tables outright without losing much accuracy, while still achieving significant runtime gains.</p><p class="p2">&nbsp;</p><p class="p1">Bio:</p><p class="p1">Arun Kumar is a Ph.D. candidate at the University of Wisconsin-Madison. His primary research interests are in data management and its intersection with machine learning. He is co-advised by Jeffrey Naughton and Jignesh M. Patel, and has also worked closely with Christopher Re and Xiaojin Zhu. Systems and ideas from his research have been shipped in products by EMC, Oracle, Cloudera, and IBM. A paper co-authored by him was accorded the Best Paper Award at ACM SIGMOD 2014. He was awarded the Anthony C. Klug NCR Fellowship in database systems in 2015. He received his M.S. from UW-Madison in 2011 and his B.Tech. from IIT Madras in 2009.</p><p class="p1"><br /></p><p class="p1">Webpage:</p><p class="p3"><a href="http://pages.cs.wisc.edu/~arun/">http://pages.cs.wisc.edu/~arun/</a></p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Accelerating Advanced Analytics - Arun Kumar]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2016-02-02T10:00:00-05:00]]></value>
      <value2><![CDATA[2016-02-02T11:00:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[0.00]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Undergraduate students]]></value>
      </item>
          <item>
        <value><![CDATA[Faculty/Staff]]></value>
      </item>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
          <item>
        <value><![CDATA[Graduate students]]></value>
      </item>
      </field_audience>
  <field_media>
          <item>
        <nid>
          <node id="489201">
            <nid>489201</nid>
            <type>image</type>
            <title><![CDATA[Arun Kumar]]></title>
            <body><![CDATA[]]></body>
                          <field_image>
                <item>
                  <fid>204398</fid>
                  <filename><![CDATA[facecrop.jpg]]></filename>
                  <filepath><![CDATA[/sites/default/files/images/facecrop.jpg]]></filepath>
                  <file_full_path><![CDATA[http://www.tlwarc.hg.gatech.edu//sites/default/files/images/facecrop.jpg]]></file_full_path>
                  <filemime>image/jpeg</filemime>
                  <image_740><![CDATA[]]></image_740>
                  <image_alt><![CDATA[Arun Kumar]]></image_alt>
                </item>
              </field_image>
            
                      </node>
        </nid>
      </item>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[<p>Susie McClain</p><p><a href="mailto:smcclain@cc.gatech.edu">smcclain@cc.gatech.edu</a></p>]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>47223</item>
          <item>50875</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[College of Computing]]></item>
          <item><![CDATA[School of Computer Science]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1795</tid>
        <value><![CDATA[Seminar/Lecture/Colloquium]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>654</tid>
        <value><![CDATA[College of Computing]]></value>
      </item>
          <item>
        <tid>109</tid>
        <value><![CDATA[Georgia Tech]]></value>
      </item>
          <item>
        <tid>166941</tid>
        <value><![CDATA[School of Computer Science]]></value>
      </item>
      </field_keywords>
  <userdata><![CDATA[]]></userdata>
</node>
