This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

I don't understand a detail about the transition_reward_model loss function #13

Open
TianQi-777 opened this issue Apr 21, 2021 · 1 comment

Comments


TianQi-777 commented Apr 21, 2021

Hi, I'm confused about why the loss of transition_reward_model is defined as follows (here):

diff = (pred_next_latent_mu - next_h.detach()) / pred_next_latent_sigma
loss = torch.mean(0.5 * diff.pow(2) + torch.log(pred_next_latent_sigma))

In particular, can you explain the term torch.log(pred_next_latent_sigma), or point me to a relevant reference?

@Yufei-Kuang

I guess this is the maximum-likelihood loss for a Gaussian distribution, i.e. the negative log-likelihood $-\log\left(\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right) = \frac{(x-\mu)^2}{2\sigma^2} + \log\sigma + \text{const}$, which matches the two terms in the code.
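To make this concrete, here is a small sketch (not code from the repo) checking that the loss in the snippet equals the full Gaussian negative log-likelihood up to the additive constant 0.5 * log(2π), which has zero gradient and is therefore dropped during optimization:

```python
import math

def gaussian_nll(x, mu, sigma):
    # Full negative log-likelihood of N(mu, sigma^2) evaluated at x.
    return 0.5 * ((x - mu) / sigma) ** 2 + math.log(sigma) + 0.5 * math.log(2 * math.pi)

def snippet_loss(x, mu, sigma):
    # The per-element loss from the snippet: 0.5 * diff^2 + log(sigma),
    # with diff = (mu - x) / sigma.
    diff = (mu - x) / sigma
    return 0.5 * diff ** 2 + math.log(sigma)

# The two differ only by the constant 0.5 * log(2*pi).
const = 0.5 * math.log(2 * math.pi)
for x, mu, sigma in [(0.3, 0.1, 0.5), (-1.2, 0.4, 2.0)]:
    assert abs(gaussian_nll(x, mu, sigma) - snippet_loss(x, mu, sigma) - const) < 1e-12
```

So minimizing the snippet's loss is equivalent to maximizing the likelihood of next_h under the predicted Gaussian; the log(sigma) term penalizes the model for inflating its predicted uncertainty.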

2 participants