This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

I don't understand a detail about the transition_reward_model loss function #13

Open
TianQi-777 opened this issue Apr 21, 2021 · 1 comment

Comments


TianQi-777 commented Apr 21, 2021

Hi, I'm confused about why the loss of transition_reward_model is defined as follows (here):

diff = (pred_next_latent_mu - next_h.detach()) / pred_next_latent_sigma
loss = torch.mean(0.5 * diff.pow(2) + torch.log(pred_next_latent_sigma))

In particular, can you explain the term torch.log(pred_next_latent_sigma), or point me to a relevant reference?

@Yufei-Kuang

I guess this is the maximum-likelihood loss for a Gaussian distribution, i.e. the negative log-likelihood $-\log\left(\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right) = \frac{(x-\mu)^2}{2\sigma^2} + \log\sigma + \text{const}$, which matches the two terms in the code.
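To make this concrete, here is a small sketch (not code from the repo) checking that the loss in the snippet equals the full Gaussian negative log-likelihood up to the additive constant 0.5 * log(2π), which has zero gradient and is therefore dropped during optimization:

```python
import math

def gaussian_nll(x, mu, sigma):
    # Full negative log-likelihood of N(mu, sigma^2) evaluated at x.
    return 0.5 * ((x - mu) / sigma) ** 2 + math.log(sigma) + 0.5 * math.log(2 * math.pi)

def snippet_loss(x, mu, sigma):
    # The per-element loss from the snippet: 0.5 * diff^2 + log(sigma),
    # with diff = (mu - x) / sigma.
    diff = (mu - x) / sigma
    return 0.5 * diff ** 2 + math.log(sigma)

# The two differ only by the constant 0.5 * log(2*pi).
const = 0.5 * math.log(2 * math.pi)
for x, mu, sigma in [(0.3, 0.1, 0.5), (-1.2, 0.4, 2.0)]:
    assert abs(gaussian_nll(x, mu, sigma) - snippet_loss(x, mu, sigma) - const) < 1e-12
```

So minimizing the snippet's loss is equivalent to maximizing the likelihood of next_h under the predicted Gaussian; the log(sigma) term penalizes the model for inflating its predicted uncertainty.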

2 participants