#17 rename full render script
balthazarneveu committed Mar 21, 2024
1 parent aaf9429 commit ddedc98
Showing 6 changed files with 41 additions and 25 deletions.
22 changes: 10 additions & 12 deletions report/content/1_introduction.tex
@@ -1,18 +1,16 @@
\section{Introduction}
\label{sec:intro}
The purpose of this report is a review of the paper ADOP: Approximate Differentiable One-Pixel Point Rendering by \citet{ruckert2022adop}.
Since the NeRF paper was published at ECCV 2020, there has been an incredible number of papers on neural rendering. Different approaches have been proposed with an underlying 3D data structure which allows rendering novel views of a scene. Neural radiance fields use a volumetric representation, but other families of methods use a "proxy" such as point clouds \cite{Aliev2020} or even meshes \cite{worchel2022nds}. \\
The purpose of this report is a review of the paper ADOP: Approximate Differentiable One-Pixel Point Rendering by \citet{ruckert2022adop}.
Novel view synthesis has been an intense research topic since Neural Radiance Fields (NeRF \cite{mildenhall2020nerf}) showed that a neural network could model a complex radiance field and lead to impressive novel view synthesis using volumetric rendering. NeRF jointly recovers geometry and object appearance without any prior knowledge of the geometry.
Other families of methods use a geometric "proxy" of the scene such as a point cloud \cite{Aliev2020} (or even meshes \cite{worchel2022nds}). \\
Let's put things simply: point based rendering leads to images filled with holes, and at first sight a point cloud does not really look like an appropriate data structure to render continuous surfaces of objects.
We'll see how ADOP manages to use a point cloud structure jointly with a CNN (processing in the image space) to sample dense novel views of large real scenes.

A re-implementation from scratch in PyTorch of some of the key elements of the paper has been made in order to understand the most important points of the ADOP paper. To simplify the study, it seemed like a good idea to work on calibrated synthetic scenes. This way, we can focus on trying to evaluate the relevance of point based rendering and avoid the difficulties inherent to working with real world scenes, most notably:
We'll see how ADOP:
\begin{itemize}
\item We assume linear RGB cameras without tone mapping.
\item We discard the environment map (i.e. our background is black).
\item We generate photorealistic renders of synthetic meshes.
\item Camera poses are perfectly known.
\item Using meshes allows us to sample point clouds with normals without estimation errors such as the ones we would face with COLMAP.
\item We can easily control the number of points so that tests can run on a limited capacity GPU.
\item manages to use a point cloud structure jointly with a CNN (processing in the image space) to sample dense novel views of large real scenes.
\item makes a special effort to model the camera pipeline in order to improve the quality of the rendered images.
\item does not inherently have the ability to model view-dependent effects such as specularities or reflections.
\end{itemize}

\noindent Our code is available on ~\href{https://github.com/balthazarneveu/per-pixel-point-rendering}{GitHub}.
A re-implementation from scratch in PyTorch of some of the key elements of the paper has been made in order to understand the core aspects of the ADOP paper (which were already present in a previous paper named Neural Point Based Graphics \cite{Aliev2020}). To simplify the study, it seemed like a good idea to work on \textbf{calibrated synthetic scenes}. This way, I have been able to focus on trying to evaluate the relevance of point based rendering, see its limitations and avoid the difficulties inherent to working with real world scenes (large amounts of data, large point clouds, imperfect estimations).

\noindent Finally, my code is fully available on ~\href{https://github.com/balthazarneveu/per-pixel-point-rendering}{GitHub} and offers the possibility to generate novel views interactively.
30 changes: 24 additions & 6 deletions report/content/3_our_work.tex
@@ -6,17 +6,26 @@ \section{Reimplementation Strategy}
\begin{itemize}
\item I used BlenderProc \cite{Denninger2023} to script and generate multiple synthetic scenes from the samples used in the NeRF paper. All camera positions are known and share the same reference frame as my PyTorch point renderer.
\item A perfect point cloud is sampled at random from the mesh (through the .obj file).
\item A PyTorch function allows projecting the points onto the image plane and includes a soft depth test and normal culling.
\item A few PyTorch based functions allow projecting the points onto the image plane and include a soft depth test and normal culling (a simplified sketch is given after this list).
\item A CNN is trained to predict the right dense colors of the points in the image space.
\end{itemize}
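
For illustration, the projection step mentioned above can be sketched as follows in PyTorch (a minimal sketch: tensor names and the intrinsics layout are assumptions, not the actual repository code).

\begin{verbatim}
# Minimal sketch of the pinhole projection step (illustrative only;
# names and intrinsics layout are assumptions, not the repository code).
import torch

def project_points(points_cam: torch.Tensor, focal: float,
                   cx: float, cy: float) -> torch.Tensor:
    """points_cam: [N, 3] points in camera coordinates (Z forward)."""
    z = points_cam[:, 2:3].clamp(min=1e-8)  # avoid division by zero
    x = focal * points_cam[:, 0:1] / z + cx
    y = focal * points_cam[:, 1:2] / z + cy
    return torch.cat([x, y, z], dim=1)  # pixel coordinates and depth
\end{verbatim}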

Each element has been carefully checked, mostly through visual tests.
Below is the list of all simplifications that have been made compared to the original ADOP paper.
\begin{itemize}
\item We assume linear RGB cameras without tone mapping.
\item We discard the environment map (i.e. our background is black).
\item We generate photorealistic renders of synthetic meshes instead of using real photos.
\item Camera poses are perfectly known.
\item Using meshes allows us to sample point clouds with normals without estimation errors such as the ones we would face with COLMAP (a possible sampling sketch is given after this list).
\item We can easily control the number of points so that tests can run on a limited capacity GPU.
\end{itemize}
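
As an illustration of the last two points, a point cloud with exact normals can be sampled from the mesh along the following lines (a sketch only: trimesh is an assumption here, not necessarily the route taken in the repository).

\begin{verbatim}
# Possible way to sample a point cloud with exact normals from the mesh.
# trimesh is an assumption for this sketch, not necessarily what the
# repository actually uses.
import trimesh

mesh = trimesh.load("scene.obj", force="mesh")
points, face_idx = trimesh.sample.sample_surface(mesh, count=100_000)
normals = mesh.face_normals[face_idx]  # exact normals, no estimation error
\end{verbatim}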

\begin{figure}[H]
\centering
\includegraphics[width=0.5\textwidth]{figures/data_prep_and_training.png}
\caption{Workflow: A Python script allows preparing multi-view camera positions saved as \texttt{.json} files in order to render photorealistic views of \texttt{.blend} or \texttt{.obj} files which come from internet resources or test scenes generated in Python.
The .obj and view .json files tie together the BlenderProc rendering and my PyTorch point rendering implementation: the point cloud is sampled from the mesh and the camera poses are known. A neural network is trained to predict the colors of the points by trying to match the multiple photorealistic renderings of BlenderProc.}
\caption{Workflow: \href{https://github.com/balthazarneveu/per-pixel-point-rendering/blob/main/studies/photorealistic\_rendering.py}{\texttt{photorealistic\_rendering.py}} allows preparing multi-view camera positions saved as \texttt{.json} files in order to render photorealistic views of \texttt{.blend} or \texttt{.obj} files which come from internet resources or test scenes generated in Python.
The .obj and view .json files tie together the BlenderProc rendering and my PyTorch point rendering implementation: the point cloud is sampled from the mesh and the camera poses are known. A neural network is trained using \href{https://github.com/balthazarneveu/per-pixel-point-rendering/blob/main/scripts/optimize\_point\_based\_neural\_renderer.py}{\texttt{optimize\_point\_based\_neural\_renderer.py}} to predict the colors of the points by trying to match the multiple photorealistic renderings. CNN weights, point cloud, normals and pseudo-colors are all saved in a \texttt{.pt} file which later allows performing live novel view synthesis with \href{https://github.com/balthazarneveu/per-pixel-point-rendering/blob/main/scripts/novel\_views\_interactive.py}{\texttt{novel\_views\_interactive.py}}, based on the self-developed \href{https://github.com/balthazarneveu/interactive\_pipe}{interactive\_pipe} library.}
\label{fig:data_and_train}
\end{figure}
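
The \texttt{.pt} scene file mentioned in \cref{fig:data_and_train} can be thought of as a single dictionary; the sketch below is purely hypothetical (key names and shapes are assumptions, not the actual file format used in the repository).

\begin{verbatim}
# Hypothetical layout of the exported .pt scene file (key names and
# dummy shapes are assumptions, not the actual repository format).
import torch
import torch.nn as nn

cnn = nn.Conv2d(8, 3, kernel_size=3, padding=1)  # dummy stand-in CNN
n = 1000
scene = {
    "cnn_state_dict": cnn.state_dict(),   # CNN weights
    "point_cloud": torch.rand(n, 3),      # xyz positions
    "normals": torch.rand(n, 3),          # per-point normals
    "pseudo_colors": torch.rand(n, 8),    # learned per-point features
}
torch.save(scene, "scene.pt")  # reloaded later for live novel view synthesis
\end{verbatim}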

@@ -110,17 +119,26 @@ \subsubsection{Scatter operation.}
\begin{itemize}
\item $(C1)$ The point must be inside the image frame $0\leq x<W$ and $0\leq y <H$.
\item $(C2)$ The point must be in front of the camera ($Z>0$).
\item $(C3)$ The normal of the point must be facing the camera.
\item $(C3)$ The normal of the point must face the camera (see~\cref{fig:normal_culling_validation}).
\item $(C4)$ The point must not be occluded by another point (handled by the soft depth test described below; a sketch of the first three tests follows this list).
\end{itemize}
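
These conditions translate naturally into boolean masks. The sketch below covers $(C1)$ to $(C3)$ (tensor names are illustrative, not the actual identifiers of the code base); $(C4)$ is handled by the soft depth test described next.

\begin{verbatim}
# Minimal sketch of the visibility tests (C1)-(C3); tensor names are
# illustrative only. (C4) is handled by the soft depth test.
import torch

def visibility_mask(xy: torch.Tensor, z: torch.Tensor,
                    normals_cam: torch.Tensor, view_dirs: torch.Tensor,
                    width: int, height: int) -> torch.Tensor:
    in_frame = (xy[:, 0] >= 0) & (xy[:, 0] < width) \
             & (xy[:, 1] >= 0) & (xy[:, 1] < height)   # (C1)
    in_front = z > 0                                   # (C2)
    # Back-face culling: normal and viewing direction must oppose.
    facing = (normals_cam * view_dirs).sum(dim=1) < 0  # (C3)
    return in_frame & in_front & facing
\end{verbatim}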

\begin{figure}[htpb]
\centering
\includegraphics[width=0.45\textwidth]{figures/normal_culling_validation.png}
\caption{Validation of normal culling. We do not render points with normals pointing away from the camera.}
\label{fig:normal_culling_validation}
\end{figure}



\noindent \textbf{Soft depth test.}
That last condition requires some work:
Using a Z-buffer, it is possible to take the closest point to the camera. Since several points located on the same surface may fall into the same pixel cell, keeping only the closest one leads to aliasing. The authors rely on previous work \cite{schutz2021rendering} to average pixels in a tiny range of depths behind the closest point. This is called a soft depth test and we'll describe this part in detail as it required a tricky implementation with PyTorch.
Using a Z-buffer, it is possible to take the closest point to the camera. Since several points located on the same surface may fall into the same pixel cell, keeping only the closest one leads to aliasing. The authors rely on previous work \cite{schutz2021rendering} to average point colors in a tiny range of depths located behind the closest point. This is called a soft depth test and we'll describe this part in detail as it required a tricky implementation with PyTorch.
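
A possible way to express this soft depth test with scatter operations is sketched below (a simplified illustration under assumed tensor layouts, not the actual implementation of the repository).

\begin{verbatim}
# Simplified sketch of the soft depth test with torch scatter operations
# (assumed tensor layouts, not the actual repository implementation).
import torch

def soft_depth_test(pixel_idx: torch.Tensor,  # [N] flat pixel index (long)
                    z: torch.Tensor,          # [N] point depths
                    colors: torch.Tensor,     # [N, C] point colors
                    num_pixels: int, alpha: float = 0.01) -> torch.Tensor:
    # Hard Z-buffer: closest depth per pixel.
    z_min = torch.full((num_pixels,), float("inf"))
    z_min.scatter_reduce_(0, pixel_idx, z, reduce="amin")
    # Keep every point within a small depth band behind the closest one.
    kept = z <= (1.0 + alpha) * z_min[pixel_idx]
    kept_idx = pixel_idx[kept]
    # Average the colors of the kept points per pixel.
    accum = torch.zeros(num_pixels, colors.shape[1])
    count = torch.zeros(num_pixels, 1)
    accum.index_add_(0, kept_idx, colors[kept])
    count.index_add_(0, kept_idx, torch.ones(kept_idx.shape[0], 1))
    return accum / count.clamp(min=1)
\end{verbatim}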

\begin{figure}[H]
\centering
\includegraphics[width=0.5\textwidth]{figures/fuzzy_depth_test_aliasing_large.png}
\includegraphics[width=0.45\textwidth]{figures/fuzzy_depth_test_aliasing_large.png}
\caption{Fuzzy depth test acts as an anti-aliasing filter. The test scene is a point cloud made of 500,000 points located in the same plane, with alternating vertical red and blue stripes. We use $\alpha=0$ on the top (hard depth test) and $\alpha=0.01$ on the bottom (soft depth test). On the right side, when using larger scales (lower resolutions), the aliasing effect is naturally amplified.}
\label{fig:fuzzy_depth_test_AA}
\end{figure}
Binary file added report/figures/normal_culling_validation.png
Binary file added report/figures/teaser_figure_2.png
14 changes: 7 additions & 7 deletions report/report.tex
@@ -17,12 +17,12 @@
\title{Review of ADOP: Approximate Differentiable One-Pixel Point Rendering}

%% AUTHORS
\author{Balthazar Neveu}
\affiliation{%
\institution{ENS Paris-Saclay}
\city{Saclay}
\country{France}
}
\author{Balthazar Neveu - ENS Paris-Saclay}
% \affiliation{%
% \institution{ENS Paris-Saclay}
% % \city{Saclay}
% % \country{France}
% }
\email{[email protected]}


@@ -34,7 +34,7 @@

%% Teaser figure
\begin{teaserfigure}
\includegraphics[width=1.\textwidth]{figures/teaser_figure.png}
\includegraphics[width=1.\textwidth]{figures/teaser_figure_2.png}
\centering
\caption{Overview of our partial re-implementation to study the ADOP \cite{ruckert2022adop} paper in ideal conditions with calibrated scenes. \\
\textit{Left}: Point based neural rendering reconstructs novel views from a point cloud. The original ADOP implementation works on real photos of large scale scenes. It therefore tries to model camera exposure and non-linear tone mapping to adapt to each camera rendering. \\
File renamed without changes.
