<node id="620823">
  <nid>620823</nid>
  <type>news</type>
  <uid>
    <user id="34540"><![CDATA[34540]]></user>
  </uid>
  <created>1556041732</created>
  <changed>1556118787</changed>
  <title><![CDATA[Understanding How Data Scientists Understand Machine Learning Models]]></title>
  <body><![CDATA[<p>How do data scientists read and understand machine learning model outputs? This is the question that a new design probe built by a team of researchers led by&nbsp;<a href="https://www.cse.gatech.edu/">School of Computational Science and Engineering</a>&nbsp;(CSE) Ph.D. student&nbsp;<strong>Fred Hohman&nbsp;</strong>aims to answer.</p>

<p>&ldquo;Without good models and the right tools to interpret them, data scientists risk making decisions based on hidden biases, spurious correlations, and false generalizations. This has led to a rallying cry for model interpretability,&rdquo; said Hohman.</p>

<p>To address this issue, Hohman teamed up with&nbsp;U.C. Berkeley Ph.D. candidate&nbsp;<strong>Andrew Head</strong>&nbsp;and Microsoft researchers&nbsp;<strong>Rich Caruana</strong>,&nbsp;<strong>Robert DeLine</strong>, and&nbsp;<strong>Steven Drucker</strong>&nbsp;to create&nbsp;<a href="https://fredhohman.com/papers/gamut">Gamut</a>, an interactive system designed to investigate how data scientists interpret models and how interactive interfaces can support them in answering questions about model interpretability.</p>

<p>&ldquo;Machine learning is doing all this amazing work nowadays like cancer prediction, predicting fire risks in buildings, and poverty prediction via satellite images. But there are many applications where demographic bias such as gender, age, or race, is learned from data,&rdquo; continued Hohman.</p>

<p>&ldquo;That brings us to Gamut, which focuses on an area of machine learning called interpretability, which is essentially trying to understand what a machine learning algorithm has actually learned so data scientists can trust its predictions.&rdquo;</p>

<p>[VIDEO::https://youtu.be/R-amW_yNX6I::aVideoStyle]</p>

<p>The system combines generalized additive models (GAMs), a class of models that pairs high accuracy with an inherently intelligible structure, with interactive data visualization to display model results and predictions. With this combination, the team can study how data scientists use explainable interfaces for interpretability.</p>

<p>Surprisingly, while the term interpretability loosely describes a human understanding of some component of a model, no formally agreed-upon definition has been reached about which components should be understood, according to Hohman. This is another reason why Gamut is a critical piece of the interpretability puzzle.</p>

<p>Rather than aiming to define interpretability, Hohman says Gamut instead aims to&nbsp;operationalize it: to turn the&nbsp;<a href="https://en.wikipedia.org/wiki/Fuzzy_concept" title="Fuzzy concept">fuzzy concept</a>&nbsp;of interpretability into something usable and actionable.</p>

<p>&ldquo;Since machine learning models are still being used despite their problems, the idea is that we can break interpretability down into a suite of techniques to help data scientists interpret models today. And, by collaborating with Microsoft, our human-centered approach using rich user interaction and data visualization can be informed and tested by professional data scientists who work with machine learning daily.</p>

<p>&ldquo;Our investigation showed that interpretability is not a monolithic concept. Data scientists have different reasons to interpret models and tailor explanations for specific audiences, often balancing competing concerns of simplicity and completeness,&rdquo; Hohman said.</p>

]]></body>
  <field_subtitle>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_subtitle>
  <field_dateline>
    <item>
      <value>2019-04-24T00:00:00-04:00</value>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_dateline>
  <field_summary_sentence>
    <item>
      <value><![CDATA[CSE Ph.D. student Fred Hohman releases Gamut, an interactive system that aims to help data scientists understand machine learning model outputs.]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_media>
          <item>
        <nid>
          <node id="620818">
            <nid>620818</nid>
            <type>image</type>
            <title><![CDATA[Gamut - Visualization Software]]></title>
            <body><![CDATA[]]></body>
                          <field_image>
                <item>
                  <fid>236429</fid>
                  <filename><![CDATA[19-gamut-chi.png]]></filename>
                  <filepath><![CDATA[/sites/default/files/images/19-gamut-chi.png]]></filepath>
                  <file_full_path><![CDATA[http://www.tlwarc.hg.gatech.edu//sites/default/files/images/19-gamut-chi.png]]></file_full_path>
                  <filemime>image/png</filemime>
                  <image_740><![CDATA[]]></image_740>
                  <image_alt><![CDATA[Interacting with Gamut's multiple coordinated views together. (A) Selecting the OverallQual feature from the sorted Feature Sidebar displays its shape curve in the Shape Curve View. (B) Brushing over either explanation for Instance 550 or Instance 798 shows the contribution of the Ove]]></image_alt>
                </item>
              </field_image>
            
                      </node>
        </nid>
      </item>
      </field_media>
  <field_contact_email>
    <item>
      <email><![CDATA[kristen.perez@cc.gatech.edu]]></email>
    </item>
  </field_contact_email>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_contact>
    <item>
      <value><![CDATA[<p>Kristen Perez</p>

<p>Communications Officer</p>
]]></value>
    </item>
  </field_contact>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <!--  TO DO: correct to not conflate categories and news room topics  -->
  <links_related> </links_related>
  <files> </files>
  <og_groups>
          <item>47223</item>
          <item>431631</item>
          <item>50877</item>
      </og_groups>
  <og_groups_both>
          <item>
        <![CDATA[Student Research]]>
      </item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>8862</tid>
        <value><![CDATA[Student Research]]></value>
      </item>
      </field_categories>
  <core_research_areas>
          <term tid="39431"><![CDATA[Data Engineering and Science]]></term>
      </core_research_areas>
  <field_news_room_topics>
      </field_news_room_topics>
  <og_groups_both>
          <item><![CDATA[College of Computing]]></item>
          <item><![CDATA[OMS]]></item>
          <item><![CDATA[School of Computational Science and Engineering]]></item>
      </og_groups_both>
  <field_keywords>
          <item>
        <tid>9167</tid>
        <value><![CDATA[machine learning]]></value>
      </item>
          <item>
        <tid>7257</tid>
        <value><![CDATA[visualization]]></value>
      </item>
          <item>
        <tid>167449</tid>
        <value><![CDATA[software]]></value>
      </item>
      </field_keywords>
  <field_userdata>
      <![CDATA[]]>
  </field_userdata>
</node>
