Markov Decision Processes




Blimp Control

The blimp proved extremely difficult to control, for many reasons. It was therefore thought that it might be possible to machine-learn a method for flying the blimp. If data could be gathered from actual flights of the blimp, then perhaps the blimp could plan appropriate motor controls.

This does not solve the problem of inaccurate headings from the tracker. Perhaps it would also be possible to plan paths which would help the tracker provide better data (similar to active sensing?), but this was not investigated.

Markov Decision Processes

A Markov decision process, or MDP, models a control problem as a set of states, stochastic movement (transition) functions, and reward values for different states. Given these, the value of each location in state space is estimated recursively, and from those values a policy for how to maximize reward is computed.
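The recursive estimate described above can be illustrated with a minimal value iteration sketch. This is not the code used for the blimp; the five-state chain, the slip probability, the discount factor `gamma`, and the function names are all assumptions chosen only to keep the example small.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6, max_sweeps=10_000):
    """Tabular value iteration.

    P[a, s, s2] is the probability of moving from s to s2 under action a.
    R[s] is the reward collected in state s.
    Returns the value of each state and a greedy policy.
    """
    V = np.zeros(P.shape[1])
    for _ in range(max_sweeps):
        # Q[a, s] = R[s] + gamma * sum over s2 of P[a, s, s2] * V[s2]
        Q = R[None, :] + gamma * (P @ V)
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final values.
    Q = R[None, :] + gamma * (P @ V)
    return V, Q.argmax(axis=0)

def chain_mdp(n=5, slip=0.1):
    """Hypothetical toy problem: a line of n states, actions 0=left, 1=right.
    With probability `slip` the move goes the opposite way."""
    P = np.zeros((2, n, n))
    for s in range(n):
        for a, step in enumerate((-1, +1)):
            intended = min(max(s + step, 0), n - 1)
            opposite = min(max(s - step, 0), n - 1)
            P[a, s, intended] += 1.0 - slip
            P[a, s, opposite] += slip
    R = np.zeros(n)
    R[n - 1] = 1.0  # reward only at the rightmost state
    return P, R

P, R = chain_mdp()
V, policy = value_iteration(P, R)
print(V, policy)
```

In this toy problem the values rise toward the rewarded state and the greedy policy moves right everywhere, which is the same qualitative behavior expected of the blimp values around its goal.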

In this problem, a large reward value is given for a certain goal position, and the transition functions are given by the accelerations from the blimp motors. The state space therefore incorporates both the location and the speed of the blimp. It would be ideal to incorporate uncertainty of position into this model as well, but this makes the problem difficult to scale.
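One way to picture such a state space is the sketch below, which discretizes one axis into (position, velocity) cells and treats each motor command as a noisy acceleration. Every number here (grid ranges, acceleration levels, time step, noise distribution) is a made-up placeholder rather than measured blimp data, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical discretization of one axis of motion.
POSITIONS = np.linspace(-2.0, 2.0, 21)
VELOCITIES = np.linspace(-1.0, 1.0, 11)
ACCELS = np.array([-0.2, 0.0, 0.2])   # assumed motor commands
DT = 1.0                              # assumed time step
# Assumed disturbance on the commanded acceleration: (offset, probability).
NOISE = [(-0.2, 0.25), (0.0, 0.5), (0.2, 0.25)]

def nearest(grid, x):
    """Index of the grid cell closest to x (out-of-range values clamp to the edge)."""
    return int(np.abs(grid - x).argmin())

# The rewarded state: zero position and zero velocity.
GOAL = (nearest(POSITIONS, 0.0), nearest(VELOCITIES, 0.0))

def reward(ip, iv):
    """Large reward at the goal cell, nothing elsewhere."""
    return 1.0 if (ip, iv) == GOAL else 0.0

def successors(ip, iv, ia):
    """Distribution over next (position, velocity) cells for one action."""
    out = {}
    for da, prob in NOISE:
        p = POSITIONS[ip] + VELOCITIES[iv] * DT
        v = VELOCITIES[iv] + (ACCELS[ia] + da) * DT
        key = (nearest(POSITIONS, p), nearest(VELOCITIES, v))
        out[key] = out.get(key, 0.0) + prob
    return out

# Example: hovering at the goal with zero commanded acceleration.
dist = successors(GOAL[0], GOAL[1], 1)
print(dist)
```

Adding position uncertainty would mean tracking a distribution over these cells rather than a single cell, which is where the scaling difficulty mentioned above comes from.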

Results

Although the computed policies were never used to control the blimp, the initial results seem promising and appear similar to the controls produced by hand-coded heuristics.

This image shows the computed values for different vertical states. The goal location is at (0, 0). Brighter colors indicate larger values.

This image shows the computed policy. Brighter colors mean that the motors should be driven harder. Note that the acceleration data was gathered from the very noisy and jumpy tracking data.

Movies

These movies show the values and policies converging through value iteration. Note that initially only the center of the state space is initialized and the information spreads outward.
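The outward spread can be reproduced on a toy problem. The sketch below is an illustration under simplifying assumptions (a deterministic 1-D line with stay, left, and right moves, and an arbitrary discount of 0.9), not the blimp model: each sweep lets value reach one more cell on either side of the rewarded center.

```python
import numpy as np

def sweep(V, R, gamma=0.9):
    """One synchronous value iteration sweep on a 1-D line.
    Moves are deterministic: stay, step left, or step right (edges clamp)."""
    left = np.concatenate(([V[0]], V[:-1]))    # value of each cell's left neighbor
    right = np.concatenate((V[1:], [V[-1]]))   # value of each cell's right neighbor
    return R + gamma * np.maximum(V, np.maximum(left, right))

n = 11
R = np.zeros(n)
R[n // 2] = 1.0          # only the center cell carries reward
V = np.zeros(n)

reach = []               # number of cells with nonzero value after each sweep
for _ in range(4):
    V = sweep(V, R)
    reach.append(int(np.count_nonzero(V)))

print(reach)  # [1, 3, 5, 7]
```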