Evaluation details of the paper. #4

Open
LuPaoPao opened this issue Dec 18, 2024 · 4 comments

Comments

@LuPaoPao

Thank you for your outstanding contribution. The paper says that the evaluation protocol follows papers [4], [41], and [66]; however, Waymo was not evaluated in those papers. I'd like to compare with your approach, but I can't find any more details. On which scenes are your methods trained and tested? Was T+5 evaluated on all frames (from the 5th to the 200th)? Looking forward to your reply with more details of the experimental setup. Thank you for your help!

@yifanlu0227
Collaborator

Hi, we use the official validation set and take exactly the 100-th frame as the input (T=100), then render the 100-th, 105-th, and 110-th frames for evaluation. We render images at the original resolution for SCube and all baseline methods.

Since not all scenes in the official validation set are static, we further apply dynamic-object masks for the 105-th/110-th frames: we project the dynamic bounding boxes from both the input frame (100-th) and the target frame (105-th/110-th) into the image to mask out possible dynamics.
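For anyone reimplementing this protocol, here is a minimal sketch of what such a dynamic-object mask could look like. This is not the repo's actual code: the function names, the pinhole camera convention, and the conservative rectangle-fill approximation of each projected box are all assumptions. It projects the dynamic 3D boxes into the image, unions the masks from the input and target frames, and computes the metric on static pixels only.

```python
import numpy as np

def boxes_to_mask(boxes_3d, K, T_world_to_cam, img_hw):
    """Rasterize projected 3D box corners into a binary image mask.

    boxes_3d: (N, 8, 3) world-space corners of dynamic bounding boxes.
    K: (3, 3) camera intrinsics; T_world_to_cam: (4, 4) extrinsics.
    """
    H, W = img_hw
    mask = np.zeros((H, W), dtype=bool)
    for corners in boxes_3d:
        pts = np.concatenate([corners, np.ones((8, 1))], axis=1)  # homogeneous
        cam = (T_world_to_cam @ pts.T).T[:, :3]
        cam = cam[cam[:, 2] > 0]  # keep corners in front of the camera
        if len(cam) == 0:
            continue
        uv = (K @ cam.T).T
        uv = uv[:, :2] / uv[:, 2:3]
        # conservative approximation: fill the 2D bounding rectangle
        u0, v0 = np.floor(uv.min(axis=0)).astype(int)
        u1, v1 = np.ceil(uv.max(axis=0)).astype(int)
        mask[max(v0, 0):min(v1, H), max(u0, 0):min(u1, W)] = True
    return mask

def masked_psnr(pred, gt, mask_input, mask_target):
    """PSNR over the union of masks from the input (T=100) and target
    (T=105/110) frames, so only static pixels are scored."""
    valid = ~(mask_input | mask_target)
    mse = np.mean((pred[valid] - gt[valid]) ** 2)
    return 10 * np.log10(1.0 / mse)  # images assumed in [0, 1]
```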

@LuPaoPao
Author

LuPaoPao commented Dec 18, 2024

Thank you for your response. One last question: do you train on the full Waymo training set? Looking forward to your reply, thank you very much!

@yifanlu0227
Collaborator

yifanlu0227 commented Dec 18, 2024

It depends on the training stage:

  • For the geometry reconstruction stage, we use both static and dynamic scenes with good voxels. Bad clips (e.g. those with little ego movement, so few voxels get accumulated) are removed from the full training set.
  • For the appearance reconstruction stage, since we need future timestamps for supervision, we restrict the training set to static scenes with good voxels only.

You can find the difference in the corresponding configuration YAML files; the filtering logic is sketched below.
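To make the stage-dependent selection concrete, here is a hedged sketch of what such filtering might look like. The metadata field names (`is_static`, `ego_travel_m`, `num_voxels`) and the thresholds are illustrative assumptions, not keys or values from the SCube configs.

```python
def select_scenes(scenes, stage, min_ego_travel_m=10.0, min_voxels=50_000):
    """Illustrative per-stage scene filtering over a list of metadata dicts."""
    selected = []
    for s in scenes:
        # "good voxels": drop clips with little ego movement, where too few
        # voxels get accumulated from the LiDAR sweeps.
        if s["ego_travel_m"] < min_ego_travel_m or s["num_voxels"] < min_voxels:
            continue
        # The appearance stage needs future frames of the same static scene
        # for supervision, so dynamic scenes are excluded there.
        if stage == "appearance" and not s["is_static"]:
            continue
        selected.append(s)
    return selected
```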

@LuPaoPao
Author

Thank you for your reply, and kudos for the huge amount of work! You solved my problem!
