
change readme
alexdenker committed Sep 27, 2024
1 parent 2217e0a commit 3b8573b
6 changes: 4 additions & 2 deletions README.md
@@ -19,8 +19,10 @@
### 2) BSREM preconditioner, DoWG step size rule, SGD (in branch: ews_sgd)
We employ the traditional BSREM algorithm. The number of subsets is automatically calculated using some heuristics (see below). We use [DoWG](https://arxiv.org/abs/2305.16284) (Distance over Weighted Gradients) for the step size calculation.
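
For illustration only (this is not the code in this repository), a DoWG-style step size inside an SGD loop could look like the sketch below; `subset_gradient`, `r_eps`, and the plain update without the BSREM preconditioner are assumptions made for the example.

```python
import numpy as np

def dowg_sgd(x0, subset_gradient, n_iter=100, r_eps=1e-4):
    """Sketch of SGD with a DoWG (Distance over Weighted Gradients) step size.

    `subset_gradient(x, t)` is a hypothetical callable returning the gradient
    of one subset of the objective at iterate `x` in iteration `t`.
    """
    x = x0.copy()
    r_bar = r_eps   # running estimate of the distance ||x_t - x_0||
    v = 0.0         # weighted sum of squared gradient norms
    for t in range(n_iter):
        g = subset_gradient(x, t)
        r_bar = max(r_bar, np.linalg.norm(x - x0))
        v += r_bar**2 * np.linalg.norm(g)**2
        eta = r_bar**2 / np.sqrt(v)   # DoWG step size
        x = x - eta * g               # a BSREM-type preconditioner could scale g here
    return x
```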

### 3) BSREM preconditioner, full gradient descent, Barzilai-Borwein step size rule (in branch: full_gd)
The characteristics of the datasets varied a lot, e.g., we had low-count data, different scanner setups, TOF data, and so on. We tried many different algorithms and approaches, both classical and deep-learning based, but it was hard to design a method that works consistently across all these settings. Based on this experience, we submit a full gradient descent algorithm with a Barzilai-Borwein step size rule and a BSREM-type preconditioner. Using the full gradient goes against almost all empirical results, which show that convergence can be sped up by using subsets. However, most works look at the speed-up with respect to the number of iterations, whereas for the challenge we are interested in raw computation time. With respect to raw computation time, we sometimes saw only a minor difference between full gradient descent and gradient descent using subsets.
### 3) adaptive preconditioner, full gradient descent, Barzilai-Borwein step size rule (in branch: full_gd)
The characteristics of the datasets varied a lot, e.g., we had low-count data, different scanner setups, TOF data, and so on. We tried many different algorithms and approaches, both classical and deep-learning based, but it was hard to design a method that works consistently across all these settings. Based on this experience, we submit a full gradient descent algorithm with a Barzilai-Borwein step size rule. Using the full gradient goes against almost all empirical results, which show that convergence can be sped up by using subsets. However, most works look at the speed-up with respect to the number of iterations, whereas for the challenge we are interested in raw computation time. With respect to raw computation time, we sometimes saw only a minor difference between full gradient descent and gradient descent using subsets.
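
As a rough sketch (not the submission's actual implementation), full gradient descent with a Barzilai-Borwein step size could look as follows; `grad` and `step0` are placeholders, and the preconditioner is omitted.

```python
import numpy as np

def bb_gradient_descent(x0, grad, n_iter=50, step0=1e-3):
    """Sketch of full gradient descent with the Barzilai-Borwein (BB1) step size.

    `grad(x)` is a hypothetical callable returning the gradient of the full
    objective; the first update uses a fixed step because BB needs two iterates.
    """
    x_prev = x0.copy()
    g_prev = grad(x_prev)
    x = x_prev - step0 * g_prev                      # initial plain gradient step
    for _ in range(n_iter):
        g = grad(x)
        s, y = x - x_prev, g - g_prev                # iterate and gradient differences
        step = float(np.sum(s * s) / np.sum(s * y))  # BB1 ("long") step size
        x_prev, g_prev = x, g
        x = x - step * g                             # a preconditioner could scale g here
    return x
```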

At the start of the optimisation we compute the ratio of the norm of the gradient of the RDP prior to the norm of the gradient of the full objective function. If this ratio is less than 0.5, we simply use the BSREM-type preconditioner. If it is larger than 0.5, we use a preconditioner similar to [Tsai et al. (2018)](https://pubmed.ncbi.nlm.nih.gov/29610077/). However, we do not update the Hessian row sum of the likelihood term. Further, we found that the Hessian row sum of the RDP prior was unstable, so we instead use only the diagonal elements of the RDP Hessian evaluated at the current iterate. This results in a somewhat unusual preconditioner in which only the RDP part is updated.
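
A minimal sketch of this switching logic, with placeholder names for the quantities involved (none of these functions are the repository's or SIRF's actual API):

```python
import numpy as np

def choose_preconditioner(x0, grad_rdp, grad_objective,
                          likelihood_hessian_rowsum, rdp_hessian_diag,
                          sensitivity, eps=1e-9):
    """Sketch of the adaptive preconditioner choice described above.

    All arguments are hypothetical callables/arrays standing in for the
    corresponding likelihood and RDP quantities.
    """
    ratio = np.linalg.norm(grad_rdp(x0)) / np.linalg.norm(grad_objective(x0))

    if ratio < 0.5:
        # BSREM-type preconditioner: current image divided by the sensitivity image.
        def preconditioner(x):
            return x / (sensitivity + eps)
    else:
        # Tsai-et-al.-style preconditioner: the likelihood Hessian row sum is
        # computed once and kept fixed, while the diagonal of the RDP Hessian
        # is re-evaluated at the current iterate.
        rowsum_fixed = likelihood_hessian_rowsum(x0)

        def preconditioner(x):
            return 1.0 / (rowsum_fixed + rdp_hessian_diag(x) + eps)

    return preconditioner
```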


### Subset Choice

