<node id="595496">
  <nid>595496</nid>
  <type>event</type>
  <uid>
    <user id="34003"><![CDATA[34003]]></user>
  </uid>
  <created>1504630162</created>
  <changed>1505147676</changed>
  <title><![CDATA[PACE Big Data Workshop]]></title>
  <body><![CDATA[<p>About this workshop:<br />
<br />
This workshop is sponsored by the NSF&#39;s XSEDE (Extreme Science and Engineering Discovery Environment, <a href="https://www.xsede.org/" id="LPlnk509258" rel="noopener noreferrer" target="_blank">https://www.xsede.org/</a>) program. Staff members from the Texas Advanced Computing Center (TACC, <a href="https://www.tacc.utexas.edu/" rel="noopener noreferrer" target="_blank">https://www.tacc.utexas.edu/</a>) will teach the workshop. The workshop is organized as four separate sessions covering various topics in big data analysis. Participants are strongly encouraged to attend all sessions, but the workshop is designed so that participants can attend only the sessions that suit their background, schedule, and needs.</p>

<p>&nbsp;</p>

<p>About the instructors:<br />
<br />
Ruizhu Huang is a research associate in the data intensive computing group at TACC. He has years of experience in big data analytics, machine learning, and data visualization. He has been involved in various projects developing technologies that bridge the gap between traditional machine learning approaches and next-generation, data intensive computing methods involving High-Performance Computing (HPC) resources.<br />
<br />
Amit Gupta is a Research Engineering/Scientist Associate III in the Data Mining and Statistics group at TACC. His research interests include distributed systems and tools for scaling big data applications on HPC infrastructure, parallel programming, and information retrieval systems for text. He has extensive experience with applications ranging from scaling transportation simulations to text mining of biological literature. He earned an MS in Computer Science from the University of Colorado at Boulder, with thesis research in the area of operating systems.<br />
<br />
Dr. Weijia Xu is a research scientist and manager of the Data Mining and Statistics group at TACC. He received his Ph.D. in Computer Science from The University of Texas at Austin. Dr. Xu has over 50 peer-reviewed conference and journal publications in similarity-based data retrieval, data analysis, and information visualization with data from various scientific domains. He has served on program committees for several workshops and conferences in the areas of big data and high-performance computing.</p>

<p>Part One: Introduction to Hadoop and Spark [<a class="x_OWAAutoLink" href="http://training.gatech.edu/courses/searchupcoming#view-14949" rel="noopener noreferrer" target="_blank">register here</a>]</p>

<div>Time: Sept 28 08:30am-12:30pm</div>

<div>Location: Marcus Nano Rm 1116</div>

<div>Capacity: 30 people</div>

<div>&nbsp;</div>

<div>This session introduces Hadoop and Spark clusters to beginners. Topics include:</div>

<ul>
	<li>basic concepts of the MapReduce programming model</li>
	<li>major components of a Hadoop cluster</li>
	<li>how to get started with Hadoop on your own computer and with computing resources at TACC</li>
	<li>the Spark programming model and how Spark can work with a Hadoop cluster</li>
	<li>different ways to use Hadoop and Spark for analysis</li>
</ul>

<div>&nbsp;</div>

<div>Participants do not need any particular programming background, but working knowledge of the Linux operating system is preferred. The class includes 3 hours of lecture and 1 hour of hands-on practice.</div>

<div>&nbsp;</div>

<div>A no-show fee of $25.00 applies if you miss the session without cancelling at least 5 days before the class.</div>

<div>&nbsp;</div>

<div>Part Two: Developing a scalable application with Spark [<a class="x_OWAAutoLink" href="http://training.gatech.edu/courses/searchupcoming#view-14950" rel="noopener noreferrer" target="_blank">register here</a>]</div>

<div>&nbsp;</div>

<div>Time: Sept 28 1:30pm-5:30pm</div>

<div>Location: Marcus Nano Rm 1116</div>

<div>Capacity: 30 people</div>

<div>&nbsp;</div>

<div>This session focuses on how to develop a scalable application with the Spark programming model. Topics include:</div>

<div>&nbsp;</div>

<ul>
	<li>review of the Spark programming model</li>
	<li>basic introduction to the Scala programming language</li>
	<li>how to run a Spark application</li>
	<li>key features for building scalable applications</li>
	<li>how to get started developing with Spark after the class</li>
</ul>

<div>&nbsp;</div>

<div>Participants are expected to have prior knowledge of Hadoop and Spark cluster concepts. Knowledge of any programming language is preferred but not required. The class includes 3 hours of lecture and 1 hour of hands-on practice.</div>

<div>&nbsp;</div>

<div>A no-show fee of $25.00 applies if you miss the session without cancelling at least 5 days before the class.</div>

<div>&nbsp;</div>

<div>Part Three: Common Practices on Hadoop and Spark Ecosystem [<a class="x_OWAAutoLink" href="http://training.gatech.edu/courses/searchupcoming#view-14951" rel="noopener noreferrer" target="_blank">register here</a>]</div>

<div>&nbsp;</div>

<div>Time: Sept 29 08:30am-12:30pm</div>

<div>Location: Marcus Nano Rm 1116</div>

<div>Capacity: 30 people</div>

<div>&nbsp;</div>

<div>This session focuses on common practices for practical analysis problems. Topics include:</div>

<ul>
	<li>running batch jobs with different cluster deployment modes</li>
	<li>running interactive jobs</li>
	<li>exploring existing libraries and applications, including Hadoop Streaming, MLlib, Spark SQL, and GraphX</li>
	<li>using Hadoop/Spark with R and Python</li>
</ul>

<div>&nbsp;</div>

<div>Participants should have basic programming experience, be comfortable with coding, and know the Hadoop system and the concepts of parallelism. The class includes 3 hours of lecture and 1 hour of hands-on practice.</div>

<div>&nbsp;</div>

<div>A no-show fee of $25.00 applies if you miss the session without cancelling at least 5 days before the class.</div>

<div>&nbsp;</div>

<div>Part Four: Advanced Topic on Big Data Analysis [<a class="x_OWAAutoLink" href="http://training.gatech.edu/courses/searchupcoming#view-14952" rel="noopener noreferrer" target="_blank">register here</a>]</div>

<div>&nbsp;</div>

<div>Time: Sept 29 01:30pm-03:30pm</div>

<div>Location: Marcus Nano Rm 1116</div>

<div>Capacity: 30 people</div>

<p>&nbsp;</p>

<div>This session covers algorithms in more detail and provides hands-on consultation on GT researchers&#39; applications. We will collect use cases before the session and walk through selected ones in detail to demonstrate how to solve real-world problems more efficiently.</div>
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Big Data Training]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p>This workshop is provided by Texas Advanced Computing Center (TACC) researchers. It aims to introduce the big data toolset to GT researchers, help them map their research problems to the big data world, and find solutions to the problems at hand. There are four sessions, and researchers can choose one or more to attend based on their programming level and experience.</p>
]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2017-09-28T01:00:00-04:00]]></value>
      <value2><![CDATA[2017-09-29T01:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[None]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Faculty/Staff]]></value>
      </item>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
          <item>
        <value><![CDATA[Graduate students]]></value>
      </item>
          <item>
        <value><![CDATA[Undergraduate students]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[<p>Fang (Cherry) Liu (Ph.D.)</p>

<p>fang.liu at gatech.edu</p>
]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[http://www.myatlascms.com/map/?id=82&amp;mrkIid=11278]]></url>
      <title><![CDATA[Marcus Nano Rm 1116]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>337231</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Georgia Tech High Performance Computing (PACE)]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1789</tid>
        <value><![CDATA[Conference/Symposium]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>15092</tid>
        <value><![CDATA[big data]]></value>
      </item>
          <item>
        <tid>175412</tid>
        <value><![CDATA[Hadoop]]></value>
      </item>
          <item>
        <tid>167041</tid>
        <value><![CDATA[spark]]></value>
      </item>
          <item>
        <tid>9167</tid>
        <value><![CDATA[machine learning]]></value>
      </item>
      </field_keywords>
  <userdata><![CDATA[]]></userdata>
</node>
