Implement TtcaDca::GetGradient() correctly
The gradient implementation of the TtcaDca cost function (aka Dutra) appears to contain mistakes. As a result, TtcaDca does not work properly in a "gradient" policy, and we can only approximate the algorithm's intended behavior via sampling.
From what Axel and I can tell, the current code is a literal implementation of the equations from the Dutra paper, and we cannot see what is wrong with it. Maybe the paper itself contains a mistake?
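
One way to narrow this down might be a numerical gradient check: compare the analytic gradient against a central finite-difference estimate of the cost and see which components disagree. The sketch below is generic and does not use the real TtcaDca interface. `Vec`, `NumericalGradient`, and `MaxGradientError` are hypothetical names, and the cost and gradient are passed in as plain callables, so hooking up the actual cost evaluation and `GetGradient()` is left as an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical alias for a parameter vector; not the project's own type.
using Vec = std::vector<double>;

// Central-difference estimate of the gradient of `cost` at `x`.
// `x` is taken by value so it can be perturbed one component at a time.
Vec NumericalGradient(const std::function<double(const Vec&)>& cost,
                      Vec x, double h = 1e-6) {
  Vec grad(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double saved = x[i];
    x[i] = saved + h;
    const double plus = cost(x);
    x[i] = saved - h;
    const double minus = cost(x);
    x[i] = saved;  // restore before moving to the next component
    grad[i] = (plus - minus) / (2.0 * h);
  }
  return grad;
}

// Largest absolute component-wise difference between the analytic gradient
// (`gradient`) and the finite-difference estimate at `x`.
double MaxGradientError(const std::function<double(const Vec&)>& cost,
                        const std::function<Vec(const Vec&)>& gradient,
                        const Vec& x, double h = 1e-6) {
  const Vec analytic = gradient(x);
  const Vec numeric = NumericalGradient(cost, x, h);
  assert(analytic.size() == numeric.size());
  double worst = 0.0;
  for (std::size_t i = 0; i < numeric.size(); ++i) {
    worst = std::max(worst, std::abs(analytic[i] - numeric[i]));
  }
  return worst;
}
```

If the error is large for only some components, that would point at specific terms of the equations. If the code matches the paper term by term and still fails the check, that would support the suspicion that the paper itself is wrong.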