<node id="65151">
  <nid>65151</nid>
  <type>event</type>
  <uid>
    <user id="27187"><![CDATA[27187]]></user>
  </uid>
  <created>1300976838</created>
  <changed>1475891678</changed>
  <title><![CDATA[UPS Delivers Optimal Phase Diagram for High Dimensional Variable Selection]]></title>
  <body><![CDATA[<p><strong>TITLE:</strong> UPS Delivers Optimal Phase Diagram for High Dimensional Variable Selection</p><p><strong>SPEAKER:</strong> Jiashun Jin</p><p><strong>ABSTRACT:</strong></p><p>Consider a linear regression model \begin{equation*}<br />Y = X \beta + z, \qquad z \sim N(0, I_n), \qquad X = X_{n, p},<br />\end{equation*} where both $p$ and $n$ are large but $p &gt; n$. The vector $\beta$ is unknown but is sparse in the sense that only a small proportion of its coordinates is nonzero, and we are interested in identifying these nonzero coordinates. We model the coordinates of $\beta$ as samples from a two-component mixture $(1 - \varepsilon) \nu_0 + \varepsilon \pi$, and the rows of $X$ as samples from $N(0, \frac{1}{n}\Omega)$, where $\nu_0$ is the point mass at $0$, $\pi$ is a distribution, and $\Omega$ is a $p$ by $p$ correlation matrix which is unknown but presumably sparse.</p><p>We propose a two-stage variable selection procedure which we call the <em>UPS</em>. This is a Screen and Clean procedure, in which we screen with Univariate thresholding and clean with the Penalized MLE. In many situations, the UPS possesses two important properties: Sure Screening and Separable After Screening (SAS). These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. As a result, the UPS is effective both in theory and in computation.</p><p>We measure the performance of a variable selection procedure by the Hamming distance, and use an asymptotic framework where $p \to \infty$ and $(\varepsilon, \pi, n, \Omega)$ depend on $p$. We find that in many situations, the UPS achieves the optimal rate of convergence. We also find that in the $(\varepsilon_p, \pi_p)$ space, there is a three-phase diagram shared by many choices of $\Omega$. In the first phase, it is possible to recover all signals. In the second phase, exact recovery is impossible, but it is possible to recover most of the signals. In the third phase, successful variable selection is impossible. The UPS partitions the phase space in the same way that the optimal procedures do, and recovers most of the signals as long as successful variable selection is possible.</p><p>The lasso and subset selection (also known as the $L^1$- and $L^0$-penalization methods, respectively) are well-known approaches to variable selection. Somewhat surprisingly, however, there are regions in the phase space where neither the lasso nor subset selection is rate optimal, even for very simple $\Omega$. The lasso is non-optimal because it is too loose in filtering out fake signals (i.e., noise that is highly correlated with a signal), and subset selection is non-optimal because it tends to kill one or more signals in correlated pairs, triplets, etc.</p>]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[UPS Delivers Optimal Phase Diagram for High Dimensional Variable Selection]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2011-04-07T12:00:00-04:00]]></value>
      <value2><![CDATA[2011-04-07T13:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>1242</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[School of Industrial and Systems Engineering (ISYE)]]></item>
      </og_groups_both>
  <field_categories>
      </field_categories>
  <field_keywords>
      </field_keywords>
  <userdata><![CDATA[]]></userdata>
</node>
