<node id="615776">
  <nid>615776</nid>
  <type>event</type>
  <uid>
    <user id="34868"><![CDATA[34868]]></user>
  </uid>
  <created>1546444364</created>
  <changed>1546444364</changed>
  <title><![CDATA[ISyE Seminar - Pragya Sur]]></title>
  <body><![CDATA[<p><strong>Title:&nbsp;</strong></p>

<p>A modern maximum-likelihood approach for high-dimensional logistic regression</p>

<p><strong>Abstract:&nbsp;</strong></p>

<p>Logistic regression is arguably the most widely used and studied non-linear model in statistics.&nbsp;Statistical inference based on classical maximum-likelihood theory is ubiquitous in this context. This theory&nbsp;hinges on well-known fundamental results: (1) the maximum-likelihood estimate (MLE) is asymptotically unbiased and normally distributed, (2) its variability can be quantified via the inverse Fisher information, and (3) the likelihood-ratio test (LRT) statistic is asymptotically chi-squared distributed. In this talk, I will show that in the common modern setting where the number of features and the sample size are both large and comparable, classical results are far from accurate. In fact, (1) the MLE is biased, (2) its variability is far greater than classical results predict, and (3) the LRT statistic is not chi-squared distributed. Consequently, p-values based on classical theory are completely invalid in high dimensions.&nbsp;In turn, I will propose a new theory that characterizes the asymptotic behavior of both the MLE and the LRT in a high-dimensional setting, under some assumptions on the covariate distribution. Empirical evidence demonstrates that this asymptotic theory provides accurate inference in finite samples. Practical implementation of these results necessitates the estimation of a single scalar, the overall signal strength, and I will propose a procedure for estimating this parameter precisely.&nbsp;This is based on joint work with Emmanuel Cand&egrave;s and Yuxin Chen.</p>

<p><strong>Bio:</strong></p>

<p>Pragya Sur is a fifth-year Ph.D. student in the <a href="http://www-stat.stanford.edu/">Department of Statistics</a> at Stanford University. She is fortunate to be advised by <a href="http://statweb.stanford.edu/~candes/">Prof. Emmanuel Cand&egrave;s</a>, and is supported by a generous <a href="https://humsci.stanford.edu/current-students/fellowships-and-funding">Ric Weiland Graduate Fellowship</a> in the <a href="https://humsci.stanford.edu/">Stanford School of Humanities and Sciences</a>. In 2017, she spent a wonderful summer as a research intern at <a href="https://www.microsoft.com/en-us/research/">Microsoft Research</a>, mentored by <a href="https://www.seas.harvard.edu/directory/dwork">Prof. Cynthia Dwork</a>.</p>

<p>Her research broadly spans high-dimensional statistical inference, controlled variable selection and its connections to causality, and fairness in machine learning algorithms.</p>

<p>Prior to joining Stanford, she received a Bachelor of Statistics in 2012 and a Master of Statistics in 2014 from the <a href="http://www.isical.ac.in/">Indian Statistical Institute, Kolkata</a>.</p>
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[A modern maximum-likelihood approach for high-dimensional logistic regression]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p><strong>Abstract:&nbsp;</strong></p>

<p>Logistic regression is arguably the most widely used and studied non-linear model in statistics.&nbsp;Statistical inference based on classical maximum-likelihood theory is ubiquitous in this context. This theory&nbsp;hinges on well-known fundamental results: (1) the maximum-likelihood estimate (MLE) is asymptotically unbiased and normally distributed, (2) its variability can be quantified via the inverse Fisher information, and (3) the likelihood-ratio test (LRT) statistic is asymptotically chi-squared distributed. In this talk, I will show that in the common modern setting where the number of features and the sample size are both large and comparable, classical results are far from accurate. In fact, (1) the MLE is biased, (2) its variability is far greater than classical results predict, and (3) the LRT statistic is not chi-squared distributed. Consequently, p-values based on classical theory are completely invalid in high dimensions.&nbsp;In turn, I will propose a new theory that characterizes the asymptotic behavior of both the MLE and the LRT in a high-dimensional setting, under some assumptions on the covariate distribution. Empirical evidence demonstrates that this asymptotic theory provides accurate inference in finite samples. Practical implementation of these results necessitates the estimation of a single scalar, the overall signal strength, and I will propose a procedure for estimating this parameter precisely.&nbsp;This is based on joint work with Emmanuel Cand&egrave;s and Yuxin Chen.</p>
]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2019-01-07T11:00:00-05:00]]></value>
      <value2><![CDATA[2019-01-07T12:00:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Faculty/Staff]]></value>
      </item>
          <item>
        <value><![CDATA[Postdoc]]></value>
      </item>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
          <item>
        <value><![CDATA[Graduate students]]></value>
      </item>
          <item>
        <value><![CDATA[Undergraduate students]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[https://www.isye.gatech.edu/about/maps-directions/isye-building-complex]]></url>
      <title><![CDATA[ISyE Building Complex]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>1242</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[School of Industrial and Systems Engineering (ISYE)]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1795</tid>
        <value><![CDATA[Seminar/Lecture/Colloquium]]></value>
      </item>
      </field_categories>
  <field_keywords>
      </field_keywords>
  <userdata><![CDATA[]]></userdata>
</node>
