Hi @ultmaster, first of all, thanks for your awesome reproduction work. It really helps! However, there is something I don't understand and would like to discuss with you. In the original paper, a sigmoid layer is added to the output of the predictor to constrain the prediction to [0.1, 1]. In the current code, however, the sigmoid layer is missing, since the model is trained with MSE loss (the prediction is within [0, 100]). My guess is that they trained the model with BCE loss and used MSE only to evaluate the final performance (the prediction would then need to be multiplied by the base accuracy). Is my understanding right? Your help is much appreciated.
Hi @cardwing. My bad. I re-read the paper and I think I completely missed that part. As I'm busy with other projects at the moment and setting up the environment again would take a while, I was wondering whether you have tried adding the sigmoid and BCE loss. If you have, sharing your experiment results would be highly appreciated. You can also submit a pull request if you're interested.
Currently, adding the sigmoid layer and training with BCE loss performs poorly (the MSE loss on the test set is around 7). I will share the results here if it ends up outperforming the MSE loss.
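For reference, here is a minimal sketch of the setup discussed in this thread: a sigmoid on the predictor output, BCE loss for training, and MSE on the percentage scale for evaluation. It is plain Python rather than the repository's code, and everything in it is an assumption for illustration: the helper names, the example logits and accuracies, and in particular the choice to use `accuracy / 100` as a soft BCE target. Whether the paper scales targets this way (or by a base accuracy, as guessed above) is not confirmed here.

```python
import math


def sigmoid(z):
    """Squash a raw predictor output (logit) into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(logits, targets):
    """Mean binary cross-entropy with soft targets in [0, 1]."""
    eps = 1e-7  # clamp probabilities to keep log() finite
    total = 0.0
    for z, t in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(logits)


def mse_percent(logits, acc_percent):
    """MSE between rescaled predictions and accuracies, both in [0, 100]."""
    preds = [100.0 * sigmoid(z) for z in logits]
    return sum((p - a) ** 2 for p, a in zip(preds, acc_percent)) / len(preds)


# Hypothetical validation accuracies (percent) and raw predictor outputs.
acc_percent = [93.2, 90.5, 94.1]
targets = [a / 100.0 for a in acc_percent]  # assumed scaling to [0, 1]
logits = [2.5, 2.0, 3.0]

train_loss = bce_loss(logits, targets)       # objective used for training
test_mse = mse_percent(logits, acc_percent)  # metric used for evaluation
print(f"BCE (train objective): {train_loss:.4f}")
print(f"MSE (percent scale):   {test_mse:.4f}")
```

With soft targets, BCE is minimised when `sigmoid(z)` equals the target, so both objectives agree on the optimum. They weight errors differently, though, which may be one reason the two training setups give different test MSE.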