<![CDATA[MS Proposal by Joshua Kuperman]]>

Joshua Kuperman
[Advisor: Dr. Evangelos Theodorou]
will propose a master’s thesis entitled,
Integrating Perception into Safe Differentiable Control
On
Tuesday, December 13 at 10:00 a.m.
Weber SS&T 200
Abstract
A great challenge exists at the intersection of perception and controls – integrating the uncertainty
present in perception-based state and obstacle estimation into safe control and trajectory optimization.
This thesis proposes a model-based learning framework with a policy defined by a safe differentiable
optimal controller. We will leverage many of the ideas of the world model, an unsupervised
reinforcement learning technique that has achieved human-level or better-than-human performance on
many Atari games. Specifically, we intend to train a variational autoencoder, a common
unsupervised image processing technique, to learn a latent space representation that is decoded into a
form that a safety function can be defined on, such as a depth map or occupancy grid. We will learn the
dynamics of this latent space, as well as a mapping from the latent space directly to a safety function, to
provide a differentiable controller with information on how the agent and the environment change
over time as a function of the control actions. The controller will have the safety function embedded
into the dynamics using barrier states. The barrier state (BaS), together with its discrete counterpart (DBaS), is a
recently developed method of embedding the safety of a system into the dynamics, providing greater
safety information than penalty methods, a regularizing effect, and safety guarantees for complex
dynamical systems in environments with many obstacles. Tolerant discrete barrier states (TDBaS)
approximate the safety guarantees of DBaS while improving exploration, allowing for unsafe initial state
trajectories, and providing several parameters that can be intuitively tuned for any application. This
thesis explores how differentiable trajectory optimization can learn these TDBaS safety parameters
given safety uncertainty in a reinforcement learning setting with limited supervision. Towards this end,
we will explore a variety of strategies and structures for the encoder-decoder network, the dynamics
network, the safety function network, and the differentiable controller, such as Parametric Differentiable
Dynamic Programming (PDDP), Pontryagin Differentiable Programming (PDP), Barrier Nets, and
Differentiable MPC. We will test this framework in simulation and, if time allows, on hardware in the
Indoor Flight Laboratory or Robotarium.
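As a rough illustration of what embedding a safety function into the dynamics can look like, the sketch below appends a barrier value to the state of a toy system and propagates the two together. It is not the formulation used in the thesis: the 1-D single-integrator dynamics, the wall at x = 1, the inverse barrier B(h) = 1/h, and every function name are assumptions chosen only to keep the example short.

```python
# Illustrative sketch only: all names, dynamics, and constants are hypothetical.

def dynamics(x, u, dt=0.1):
    """Toy single integrator: x_{k+1} = x_k + dt * u_k."""
    return x + dt * u


def safety(x, obstacle=1.0):
    """Safety function h(x); h(x) > 0 means the state is safe (left of the wall)."""
    return obstacle - x


def barrier(h):
    """Inverse barrier B(h) = 1/h, which grows without bound as h -> 0+."""
    return 1.0 / h


def augmented_step(state, u):
    """Advance the augmented state (x, beta), where beta tracks B(h(x))."""
    x, _beta = state
    x_next = dynamics(x, u)
    beta_next = barrier(safety(x_next))
    return (x_next, beta_next)


def rollout(x0, controls):
    """Roll the augmented system forward from x0 under a control sequence."""
    state = (x0, barrier(safety(x0)))
    trajectory = [state]
    for u in controls:
        state = augmented_step(state, u)
        trajectory.append(state)
    return trajectory


# Drive toward the wall: the barrier state rises as the safety margin shrinks.
traj = rollout(0.0, [1.0] * 8)
```

Because the barrier value is part of the state, a trajectory optimizer that penalizes or bounds that extra state is steered away from the wall without a separate constraint. The tolerant variant (TDBaS) described above would replace the hard inverse barrier with a relaxed one, which this sketch does not attempt.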
Committee
• Prof. Evangelos Theodorou – School of Aerospace Engineering (advisor)
• Prof. Kyriakos G. Vamvoudakis – School of Aerospace Engineering
• Prof. Patricio Vela – School of Electrical and Computer Engineering

]]>