diff --git a/README.md b/README.md index ea0ee59..a328786 100644 --- a/README.md +++ b/README.md @@ -8,18 +8,19 @@ ## Reconstruction Methods - Educated Warm Start -To reduce the time required to reach the minimiser, we want to start closer to the minimiser. A better initialisation could reduce the number of steps an iterative algorithm needs and thus reduce the time. To this end, we employ a neural network, to learn a suitable initial image. The network is a (small) 3D convolutional neural network. It takes as input only the OSEM image, provided by the organisers. The network weights are available in the folder *checkpoint/*. +To reduce the time required to reach the minimiser, we want to start the iterations closer to it. A better initialisation could reduce the number of steps an iterative algorithm needs and thus the overall runtime. To this end, we employ a neural network to learn a suitable initial image. The network is a (small) 3D convolutional neural network. All layers have no bias and use ReLU activations, which makes the network (positively) 1-homogeneous: scaling the input scales the output by the same factor, so the network should behave consistently across images with different overall intensities. It takes as input only the OSEM image provided by the organisers. The network weights are available in the folder *checkpoint/*. We employ three different classical iterative algorithms. +### 1) BSREM preconditioner, DoWG step size rule, SAGA gradient estimation (in branch: main) +We employ the traditional BSREM algorithm. The number of subsets is calculated automatically using some heuristics (see below). We use [DoWG](https://arxiv.org/abs/2305.16284) (Distance over Weighted Gradients) for the step-size calculation and SAGA to obtain an estimate of the full gradient. -### 1) BSREM - DOwG (in branch: main) -First, we just employ the traditional BSREM algorithm. The number of subsets is automatically calculated using some heuristics (see below). We employ [DoWG](https://arxiv.org/abs/2305.16284) (Distance over Weighted Gradients) for the step size calculation. -### 2) BSREM - Full Gradient Descent -The characteristics of the datasets varied a lot, i.e., we had low count data, different scanner setups, TOF data, and so on. We tried a lot of different algorithms and approaches, both classical and deep learning based, but it was hard to design a method, which works consistently for all these different settings. Based on this experience, we submit a full gradient descent algorithm with a Barzilai-Borwein step size rule and a BSREM-type preconditioner. Using the full gradient goes against almost all empirical results, which show that the convergence can be speed by using subsets. However, most work look at a speed up with respect to number of iterations. For the challenge, we are interested in raw computation time. With respect to raw computation time, we sometimes saw only a minor different between full gradient descent and gradient descent using subsets. +### 2) BSREM preconditioner, DoWG step size rule, SGD (in branch: ews_sgd) +We employ the traditional BSREM algorithm. The number of subsets is calculated automatically using some heuristics (see below). We use [DoWG](https://arxiv.org/abs/2305.16284) (Distance over Weighted Gradients) for the step-size calculation.
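DoWG keeps two scalars across iterations: the largest distance travelled from the initial image, and a running sum of squared update norms weighted by that distance. The following is a minimal NumPy sketch of the rule as it is applied to the preconditioned gradient in `bsrem_saga.py`; the arrays stand in for SIRF image objects, and the initial value of 0.1 for `r` and the `1e-4` floor follow that implementation.

```python
import numpy as np

def dowg_step(x, x0, update, r, v, floor=1e-4):
    """One evaluation of the DoWG (Distance over Weighted Gradients) rule.

    x, x0  : current and initial image estimates
    update : preconditioned gradient used in the current step
    r, v   : running state: largest distance from x0 seen so far, and the
             weighted sum of squared update norms
    Returns (step_size, r, v).
    """
    r = max(np.linalg.norm(x - x0), r)        # r starts from a small value, e.g. 0.1
    v += r**2 * np.linalg.norm(update)**2     # accumulate weighted gradient energy
    return max(r**2 / np.sqrt(v), floor), r, v

# usage sketch: the state (r, v) persists across iterations
r, v = 0.1, 0.0
x0 = np.full((8, 8, 8), 1.0)                  # stand-in for the OSEM image
x = x0.copy()
g = np.random.rand(*x.shape)                  # stand-in for the preconditioned gradient
step, r, v = dowg_step(x, x0, g, r, v)
x = np.maximum(x + step * g, 0.0)             # ascent step followed by non-negativity
```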
-### 3) To be decided +### 3) BSREM preconditioner, full gradient descent, Barzilai-Borwein step size rule (in branch: full_gd) +The characteristics of the datasets varied a lot: low count data, different scanner setups, TOF data, and so on. We tried many different algorithms and approaches, both classical and deep-learning based, but it was hard to design a method that works consistently across all these settings. Based on this experience, we submit a full gradient descent algorithm with a Barzilai-Borwein step size rule and a BSREM-type preconditioner. Using the full gradient goes against almost all empirical results, which show that convergence can be sped up by using subsets. However, most works measure the speed-up with respect to the number of iterations, whereas for the challenge we are interested in raw computation time. With respect to raw computation time, we sometimes saw only a minor difference between full gradient descent and gradient descent using subsets. ### Subset Choice diff --git a/bsrem.py b/bsrem.py deleted file mode 100644 index 313947f..0000000 --- a/bsrem.py +++ /dev/null @@ -1,163 +0,0 @@ -# -# SPDX-License-Identifier: Apache-2.0 -# -# Classes implementing the BSREM algorithm in sirf.STIR -# -# Authors: Kris Thielemans -# -# Copyright 2024 University College London - -import numpy -import numpy as np -import sirf.STIR as STIR -from sirf.Utilities import examples_data_path - - -from cil.optimisation.algorithms import Algorithm -from utils.herman_meyer import herman_meyer_order -import time - - -class BSREMSkeleton(Algorithm): - ''' Main implementation of a modified BSREM algorithm - - This essentially implements constrained preconditioned gradient ascent - with an EM-type preconditioner. - - In each update step, the gradient of a subset is computed, multiplied by a step_size and a EM-type preconditioner. - Before adding this to the previous iterate, an update_filter can be applied. - - ''' - def __init__(self, data, initial, - update_filter=STIR.TruncateToCylinderProcessor(), - **kwargs): - ''' - Arguments: - ``data``: list of items as returned by `partitioner` - ``initial``: initial estimate - ``initial_step_size``, ``relaxation_eta``: step-size constants - ``update_filter`` is applied on the (additive) update term, i.e. before adding to the previous iterate. - Set the filter to `None` if you don't want any. - ''' - super().__init__(**kwargs) - self.x = initial.copy() - self.data = data - self.num_subsets = len(data) - - # compute small number to add to image in preconditioner - # don't make it too small as otherwise the algorithm cannot recover from zeroes.
- self.eps = initial.max()/1e3 - self.average_sensitivity = initial.get_uniform_copy(0) - for s in range(len(data)): - self.average_sensitivity += self.subset_sensitivity(s)/self.num_subsets - # add a small number to avoid division by zero in the preconditioner - self.average_sensitivity += self.average_sensitivity.max()/1e4 - - self.subset = 0 - self.update_filter = update_filter - self.configured = True - - self.subset_order = herman_meyer_order(self.num_subsets) - - self.x_prev = None - self.x_update_prev = None - - self.x_update = initial.get_uniform_copy(0) - self.new_x = initial.get_uniform_copy(0) - self.last_x = initial.get_uniform_copy(0) - - def subset_sensitivity(self, subset_num): - raise NotImplementedError - - def subset_gradient(self, x, subset_num): - raise NotImplementedError - - def epoch(self): - return (self.iteration + 1) // self.num_subsets - - def step_size(self): - return self.initial_step_size / (1 + self.relaxation_eta * self.epoch()) - - def update(self): - - g = self.subset_gradient(self.x, self.subset_order[self.subset]) - - g.multiply(self.x + self.eps, out=self.x_update) - self.x_update.divide(self.average_sensitivity, out=self.x_update) - - if self.iteration == 0: - step_size = min(max(1/(self.x_update.norm() + 1e-3), 0.005), 3.5) - else: - delta_x = self.x - self.x_prev - delta_g = self.x_update_prev - self.x_update - - dot_product = delta_g.dot(delta_x) # (deltag * deltax).sum() - alpha_long = 1.25*delta_x.norm()**2 / np.abs(dot_product) - #dot_product = delta_x.dot(delta_g) - #alpha_short = np.abs((dot_product).sum()) / delta_g.norm()**2 - #print("short / long: ", alpha_short, alpha_long) - - step_size = max(alpha_long, 0.01) #np.sqrt(alpha_long*alpha_short) - #print("step size: ", step_size) - #print("step size: ", step_size) - - self.x_prev = self.x.copy() - self.x_update_prev = self.x_update.copy() - - if self.update_filter is not None: - self.update_filter.apply(self.x_update) - - momentum = 0.4 - self.new_x.fill(self.x + step_size * self.x_update + momentum * (self.x - self.last_x)) - self.last_x = self.x.copy() - - self.x.fill(self.new_x) - #self.x.sapyb(1.0, self.x_update, step_size, out=self.x) - #self.x += beta * (self.x - self.last_x) - #self.x += self.x_update * step_size - - # threshold to non-negative - self.x.maximum(0, out=self.x) - self.subset = (self.subset + 1) % self.num_subsets - - - def update_objective(self): - # required for current CIL (needs to set self.loss) - self.loss.append(self.objective_function(self.x)) - - def objective_function(self, x): - ''' value of objective function summed over all subsets ''' - v = 0 - #for s in range(len(self.data)): - # v += self.subset_objective(x, s) - return v - - def subset_objective(self, x, subset_num): - ''' value of objective function for one subset ''' - raise NotImplementedError - - -class BSREM(BSREMSkeleton): - ''' BSREM implementation using sirf.STIR objective functions''' - def __init__(self, data, obj_funs, initial, **kwargs): - ''' - construct Algorithm with lists of data and, objective functions, initial estimate, initial step size, - step-size relaxation (per epoch) and optionally Algorithm parameters - ''' - self.obj_funs = obj_funs - super().__init__(data, initial, **kwargs) - - def subset_sensitivity(self, subset_num): - ''' Compute sensitiSvity for a particular subset''' - self.obj_funs[subset_num].set_up(self.x) - # note: sirf.STIR Poisson likelihood uses `get_subset_sensitivity(0) for the whole - # sensitivity if there are no subsets in that likelihood - return 
self.obj_funs[subset_num].get_subset_sensitivity(0) - - def subset_gradient(self, x, subset_num): - ''' Compute gradient at x for a particular subset''' - return self.obj_funs[subset_num].gradient(x) - - def subset_objective(self, x, subset_num): - ''' value of objective function for one subset ''' - return self.obj_funs[subset_num](x) diff --git a/bsrem_bb.py b/bsrem_bb.py deleted file mode 100644 index b004ef6..0000000 --- a/bsrem_bb.py +++ /dev/null @@ -1,159 +0,0 @@ -# -# SPDX-License-Identifier: Apache-2.0 -# -# Classes implementing the BSREM algorithm in sirf.STIR -# -# Authors: Kris Thielemans -# -# Copyright 2024 University College London - -import numpy -import numpy as np -import sirf.STIR as STIR -from sirf.Utilities import examples_data_path - - -from cil.optimisation.algorithms import Algorithm -from utils.herman_meyer import herman_meyer_order -import time - - -class BSREMSkeleton(Algorithm): - ''' Main implementation of a modified BSREM algorithm - - This essentially implements constrained preconditioned gradient ascent - with an EM-type preconditioner. - - In each update step, the gradient of a subset is computed, multiplied by a step_size and a EM-type preconditioner. - Before adding this to the previous iterate, an update_filter can be applied. - - ''' - def __init__(self, data, initial, - update_filter=STIR.TruncateToCylinderProcessor(), - **kwargs): - ''' - Arguments: - ``data``: list of items as returned by `partitioner` - ``initial``: initial estimate - ``initial_step_size``, ``relaxation_eta``: step-size constants - ``update_filter`` is applied on the (additive) update term, i.e. before adding to the previous iterate. - Set the filter to `None` if you don't want any. - ''' - super().__init__(**kwargs) - self.x = initial.copy() - self.data = data - self.num_subsets = len(data) - - # compute small number to add to image in preconditioner - # don't make it too small as otherwise the algorithm cannot recover from zeroes. 
- self.eps = initial.max()/1e3 - self.average_sensitivity = initial.get_uniform_copy(0) - for s in range(len(data)): - self.average_sensitivity += self.subset_sensitivity(s)/self.num_subsets - # add a small number to avoid division by zero in the preconditioner - self.average_sensitivity += self.average_sensitivity.max()/1e4 - - - - self.precond = initial.get_uniform_copy(0) - - self.subset = 0 - self.update_filter = update_filter - self.configured = True - - self.subset_order = herman_meyer_order(self.num_subsets) - - self.x_prev = None - self.x_update_prev = None - - self.x_update = initial.get_uniform_copy(0) - - def subset_sensitivity(self, subset_num): - raise NotImplementedError - - def subset_gradient(self, x, subset_num): - raise NotImplementedError - - def epoch(self): - return (self.iteration + 1) // self.num_subsets - - def step_size(self): - return self.initial_step_size / (1 + self.relaxation_eta * self.epoch()) - - def update(self): - - g = self.subset_gradient(self.x, self.subset_order[self.subset]) - - g.multiply(self.x + self.eps, out=self.x_update) - - self.x_update.divide(self.average_sensitivity, out=self.x_update) - - if self.iteration == 0: - step_size = min(max(1/(self.x_update.norm() + 1e-3), 0.005), 3.0) - else: - delta_x = self.x - self.x_prev - delta_g = self.x_update_prev - self.x_update - - dot_product = delta_g.dot(delta_x) # (deltag * deltax).sum() - alpha_long = delta_x.norm()**2 / np.abs(dot_product) - #dot_product = delta_x.dot(delta_g) - #alpha_short = np.abs((dot_product).sum()) / delta_g.norm()**2 - #print("short / long: ", alpha_short, alpha_long) - - step_size = max(alpha_long, 0.01) #np.sqrt(alpha_long*alpha_short) - #print("step size: ", step_size) - #print("step size: ", step_size) - - self.x_prev = self.x.copy() - self.x_update_prev = self.x_update.copy() - - if self.update_filter is not None: - self.update_filter.apply(self.x_update) - - self.x.sapyb(1.0, self.x_update, step_size, out=self.x) - - # threshold to non-negative - self.x.maximum(0, out=self.x) - self.subset = (self.subset + 1) % self.num_subsets - - - def update_objective(self): - # required for current CIL (needs to set self.loss) - self.loss.append(self.objective_function(self.x)) - - def objective_function(self, x): - ''' value of objective function summed over all subsets ''' - v = 0 - #for s in range(len(self.data)): - # v += self.subset_objective(x, s) - return v - - def subset_objective(self, x, subset_num): - ''' value of objective function for one subset ''' - raise NotImplementedError - - -class BSREM(BSREMSkeleton): - ''' BSREM implementation using sirf.STIR objective functions''' - def __init__(self, data, obj_funs, initial, **kwargs): - ''' - construct Algorithm with lists of data and, objective functions, initial estimate, initial step size, - step-size relaxation (per epoch) and optionally Algorithm parameters - ''' - self.obj_funs = obj_funs - super().__init__(data, initial, **kwargs) - - def subset_sensitivity(self, subset_num): - ''' Compute sensitiSvity for a particular subset''' - self.obj_funs[subset_num].set_up(self.x) - # note: sirf.STIR Poisson likelihood uses `get_subset_sensitivity(0) for the whole - # sensitivity if there are no subsets in that likelihood - return self.obj_funs[subset_num].get_subset_sensitivity(0) - - def subset_gradient(self, x, subset_num): - ''' Compute gradient at x for a particular subset''' - return self.obj_funs[subset_num].gradient(x) - - def subset_objective(self, x, subset_num): - ''' value of objective function for one subset ''' 
- return self.obj_funs[subset_num](x) diff --git a/bsrem_saga.py b/bsrem_saga.py index a9f34af..84632d9 100644 --- a/bsrem_saga.py +++ b/bsrem_saga.py @@ -31,43 +31,51 @@ class SAGASkeleton(Algorithm): Before adding this to the previous iterate, an update_filter can be applied. ''' - def __init__(self, data, initial, average_sensitivity, - update_filter=STIR.TruncateToCylinderProcessor(), **kwargs): + def __init__(self, data, initial, + update_filter=STIR.TruncateToCylinderProcessor(), + **kwargs): ''' Arguments: ``data``: list of items as returned by `partitioner` ``initial``: initial estimate + ``initial_step_size``, ``relaxation_eta``: step-size constants ``update_filter`` is applied on the (additive) update term, i.e. before adding to the previous iterate. Set the filter to `None` if you don't want any. ''' super().__init__(**kwargs) - - self.x = initial - self.initial = initial.copy() + self.x = initial.copy() self.data = data self.num_subsets = len(data) - self.average_sensitivity = average_sensitivity - self.eps = self.dataset.OSEM_image.max()/1e3 + + # compute small number to add to image in preconditioner + # don't make it too small as otherwise the algorithm cannot recover from zeroes. + self.eps = initial.max()/1e3 + self.average_sensitivity = initial.get_uniform_copy(0) + for s in range(len(data)): + self.average_sensitivity += self.subset_sensitivity(s)/self.num_subsets + # add a small number to avoid division by zero in the preconditioner + self.average_sensitivity += self.average_sensitivity.max()/1e4 + self.precond = initial.get_uniform_copy(0) + self.subset = 0 self.update_filter = update_filter self.configured = True - # DOG parameters - self.max_distance = 0 - self.sum_gradient = 0 - - self.alpha = None - self.last_alpha = None self.subset_order = herman_meyer_order(self.num_subsets) + self.x_prev = None + self.x_update_prev = None + + self.x_update = initial.get_uniform_copy(0) + self.gm = [self.x.get_uniform_copy(0) for _ in range(self.num_subsets)] self.sum_gm = self.x.get_uniform_copy(0) self.x_update = self.x.get_uniform_copy(0) - self.last_objective_function = self.objective_function_inter(self.x) - self.gamma = 1.0 # scaling for learning rate + self.r = 0.1 + self.v = 0 # weighted gradient sum def subset_sensitivity(self, subset_num): raise NotImplementedError @@ -85,52 +93,34 @@ def epoch(self): return self.iteration // self.num_subsets def update(self): - if self.epoch() % 4 == 0 and self.iteration % self.num_subsets == 0 and self.epoch() > 0: - loss = self.objective_function_inter(self.x) - #print("Objective at ", self.epoch(), " is = ", loss) - if loss < self.last_objective_function: - #print("Reduce learning rate!") - self.gamma = self.gamma * 0.75 - - self.last_objective_function = loss # for the first epochs just do SGD if self.epoch() < 2: # construct gradient of subset subset_choice = self.subset_order[self.subset] g = self.subset_gradient(self.x, subset_choice) - #print("Gradient norm: ", g.norm()) + g.multiply(self.x + self.eps, out=self.x_update) self.x_update.divide(self.average_sensitivity, out=self.x_update) - #self.x_update = (self.x + self.eps) * g / self.average_sensitivity - # SGD for two epochs - if self.iteration == 0: - step_size_estimate = min(max(1/(self.x_update.norm() + 1e-3), 0.05), 3.0) - self.alpha = step_size_estimate - - distance = (self.x - self.initial).norm() - if distance > self.max_distance: - self.max_distance = distance - - self.sum_gradient += self.x_update.norm()**2 - - if self.iteration > 0: - self.alpha = self.max_distance 
/ np.sqrt(self.sum_gradient) + # DOwG learning rate: DOG unleashed! + self.r = max((self.x - self.initial).norm(), self.r) + self.v += self.r**2 * self.x_update.norm()**2 + step_size = self.r**2 / np.sqrt(self.v) + step_size = max(step_size, 1e-4) # dont get too small if self.update_filter is not None: self.update_filter.apply(self.x_update) #print(self.alpha, self.sum_gradient) - self.x.sapyb(1.0, self.x_update, self.alpha, out=self.x) + self.x.sapyb(1.0, self.x_update, step_size, out=self.x) #self.x += self.alpha * self.x_update self.x.maximum(0, out=self.x) # do SAGA else: # do one step of full gradient descent to set up subset gradients - - if (self.epoch() in [2]) and self.iteration % self.num_subsets == 0: + if (self.epoch() in [2,6,10,14]) and self.iteration % self.num_subsets == 0: # construct gradient of subset #print("One full gradient step to intialise SAGA") g = self.x.get_uniform_copy(0) @@ -141,22 +131,20 @@ def update(self): #g += gm g /= self.num_subsets - #print("Gradient norm: ", g.norm()) - distance = (self.x - self.initial).norm() - if distance > self.max_distance: - self.max_distance = distance + + # DOwG learning rate: DOG unleashed! + self.r = max((self.x - self.initial).norm(), self.r) + self.v += self.r**2 * self.x_update.norm()**2 + step_size = self.r**2 / np.sqrt(self.v) + step_size = max(step_size, 1e-4) # dont get too small g.multiply(self.x + self.eps, out=self.x_update) self.x_update.divide(self.average_sensitivity, out=self.x_update) - #self.x_update = (self.x + self.eps) * g / self.average_sensitivity - self.sum_gradient += self.x_update.norm()**2 - self.alpha = self.max_distance / np.sqrt(self.sum_gradient) - + if self.update_filter is not None: self.update_filter.apply(self.x_update) - self.x.sapyb(1.0, self.x_update, self.gamma*self.alpha, out=self.x) - #self.x += self.alpha * self.x_update + self.x.sapyb(1.0, self.x_update, step_size, out=self.x) # threshold to non-negative self.x.maximum(0, out=self.x) @@ -169,27 +157,21 @@ def update(self): subset_choice = self.subset_order[self.subset] g = self.subset_gradient(self.x, subset_choice) - #gradient = self.num_subsets * (g - self.gm[subset_choice]) + self.sum_gm #/ self.num_subsets gradient = (g - self.gm[subset_choice]) + self.sum_gm / self.num_subsets - - distance = (self.x - self.initial).norm() - if distance > self.max_distance: - self.max_distance = distance - + gradient.multiply(self.x + self.eps, out=self.x_update) self.x_update.divide(self.average_sensitivity, out=self.x_update) - #self.x_update = (self.x + self.eps) * gradient / self.average_sensitivity - - self.sum_gradient += self.x_update.norm()**2 - + if self.update_filter is not None: self.update_filter.apply(self.x_update) - # DOG lr - self.alpha = self.max_distance / np.sqrt(self.sum_gradient) + # DOwG learning rate: DOG unleashed! 
+ self.r = max((self.x - self.initial).norm(), self.r) + self.v += self.r**2 * self.x_update.norm()**2 + step_size = self.r**2 / np.sqrt(self.v) + step_size = max(step_size, 1e-4) # dont get too small - self.x.sapyb(1.0, self.x_update, self.gamma*self.alpha, out=self.x) - #self.x += self.alpha * self.x_update + self.x.sapyb(1.0, self.x_update, step_size, out=self.x) # threshold to non-negative self.x.maximum(0, out=self.x) @@ -197,9 +179,7 @@ def update(self): self.sum_gm = self.sum_gm - self.gm[subset_choice] + g self.gm[subset_choice] = g - self.subset = (self.subset + 1) % self.num_subsets - self.last_alpha = self.alpha def update_objective(self): # required for current CIL (needs to set self.loss) diff --git a/legacy_stuff/bsrem.py b/legacy_stuff/bsrem.py index 4534a17..313947f 100644 --- a/legacy_stuff/bsrem.py +++ b/legacy_stuff/bsrem.py @@ -7,122 +7,119 @@ # # Copyright 2024 University College London - +import numpy +import numpy as np import sirf.STIR as STIR from sirf.Utilities import examples_data_path + from cil.optimisation.algorithms import Algorithm from utils.herman_meyer import herman_meyer_order -import numpy as np +import time + class BSREMSkeleton(Algorithm): - def __init__(self, data, - update_filter=STIR.TruncateToCylinderProcessor(), **kwargs): + ''' Main implementation of a modified BSREM algorithm + + This essentially implements constrained preconditioned gradient ascent + with an EM-type preconditioner. + + In each update step, the gradient of a subset is computed, multiplied by a step_size and a EM-type preconditioner. + Before adding this to the previous iterate, an update_filter can be applied. + + ''' + def __init__(self, data, initial, + update_filter=STIR.TruncateToCylinderProcessor(), + **kwargs): + ''' + Arguments: + ``data``: list of items as returned by `partitioner` + ``initial``: initial estimate + ``initial_step_size``, ``relaxation_eta``: step-size constants + ``update_filter`` is applied on the (additive) update term, i.e. before adding to the previous iterate. + Set the filter to `None` if you don't want any. + ''' super().__init__(**kwargs) - initial = self.dataset.OSEM_image - self.initial = initial.copy() self.x = initial.copy() self.data = data self.num_subsets = len(data) - self.g = initial.get_uniform_copy(0) - + # compute small number to add to image in preconditioner + # don't make it too small as otherwise the algorithm cannot recover from zeroes. 
self.eps = initial.max()/1e3 self.average_sensitivity = initial.get_uniform_copy(0) for s in range(len(data)): self.average_sensitivity += self.subset_sensitivity(s)/self.num_subsets # add a small number to avoid division by zero in the preconditioner self.average_sensitivity += self.average_sensitivity.max()/1e4 - - #self.kappa_sq = self.dataset.kappa.power(2) - - self.x_update = initial.get_uniform_copy(0) + self.subset = 0 self.update_filter = update_filter - self.subset_order = herman_meyer_order(self.num_subsets) self.configured = True - self.v_t = initial.get_uniform_copy(0) - -class BSREM(BSREMSkeleton): - def __init__(self, data, obj_funs, accumulate_gradient_iter, accumulate_gradient_num, gamma, **kwargs): - - self.obj_funs = obj_funs - - super().__init__(data, **kwargs) - self.accumulate_gradient_iter = accumulate_gradient_iter - self.accumulate_gradient_num = accumulate_gradient_num - - self.alpha = None + self.subset_order = herman_meyer_order(self.num_subsets) - self.gamma = gamma + self.x_prev = None + self.x_update_prev = None - # DOG parameters - self.max_distance = 0 - self.sum_gradient = 0 + self.x_update = initial.get_uniform_copy(0) + self.new_x = initial.get_uniform_copy(0) + self.last_x = initial.get_uniform_copy(0) - self.num_subsets_initial = len(data) + def subset_sensitivity(self, subset_num): + raise NotImplementedError - # check list of accumulate_gradient_iter is monotonically increasing - assert all(self.accumulate_gradient_iter[i] < self.accumulate_gradient_iter[i+1] for i in range(len(self.accumulate_gradient_iter)-1)) - # check if accumulate_gradient_iter and accumulate_gradient_num have the same length - assert len(self.accumulate_gradient_iter) == len(self.accumulate_gradient_num) + def subset_gradient(self, x, subset_num): + raise NotImplementedError def epoch(self): - return (self.iteration + 1) // self.num_subsets_initial + return (self.iteration + 1) // self.num_subsets - def get_number_of_subsets_to_accumulate_gradient(self): - for index, boundary in enumerate(self.accumulate_gradient_iter): - if self.iteration < boundary*self.num_subsets_initial: - return self.accumulate_gradient_num[index] - return self.num_subsets + def step_size(self): + return self.initial_step_size / (1 + self.relaxation_eta * self.epoch()) def update(self): - if self.iteration == 0: - num_to_accumulate = self.num_subsets - else: - num_to_accumulate = self.get_number_of_subsets_to_accumulate_gradient() - - # use at most all subsets - if num_to_accumulate > self.num_subsets_initial: - num_to_accumulate = self.num_subsets_initial - - #print(f"Use {num_to_accumulate} subsets at iteration {self.iteration}") - for i in range(num_to_accumulate): - if i == 0: - self.g = self.obj_funs[self.subset_order[self.subset]].gradient(self.x) - else: - self.g += self.obj_funs[self.subset_order[self.subset]].gradient(self.x) - self.subset = (self.subset + 1) % self.num_subsets - #print(f"\n Added subset {i+1} (i.e. 
{self.subset}) of {num_to_accumulate}\n") - self.g /= num_to_accumulate - #self.x_update = self.g / (self.kappa_sq + 0.01) #(self.x + self.eps) * self.g / self.average_sensitivity - self.x_update = (self.x + self.eps) * self.g / self.average_sensitivity + g = self.subset_gradient(self.x, self.subset_order[self.subset]) - if self.update_filter is not None: - self.update_filter.apply(self.x_update) + g.multiply(self.x + self.eps, out=self.x_update) + self.x_update.divide(self.average_sensitivity, out=self.x_update) if self.iteration == 0: - self.v_t = self.x_update + step_size = min(max(1/(self.x_update.norm() + 1e-3), 0.005), 3.5) else: - self.v_t = self.gamma * self.v_t + (1 - self.gamma) * self.x_update + delta_x = self.x - self.x_prev + delta_g = self.x_update_prev - self.x_update - # compute DOG learning rate - if self.iteration == 0: - step_size_estimate = min(max(1/(self.v_t.norm() + 1e-3), 0.05), 3.0) - self.alpha = step_size_estimate + dot_product = delta_g.dot(delta_x) # (deltag * deltax).sum() + alpha_long = 1.25*delta_x.norm()**2 / np.abs(dot_product) + #dot_product = delta_x.dot(delta_g) + #alpha_short = np.abs((dot_product).sum()) / delta_g.norm()**2 + #print("short / long: ", alpha_short, alpha_long) - distance = (self.x - self.initial).norm() - if distance > self.max_distance: - self.max_distance = distance + step_size = max(alpha_long, 0.01) #np.sqrt(alpha_long*alpha_short) + #print("step size: ", step_size) + #print("step size: ", step_size) - self.sum_gradient += self.x_update.norm()**2 + self.x_prev = self.x.copy() + self.x_update_prev = self.x_update.copy() - if self.iteration > 0: - self.alpha = self.max_distance / np.sqrt(self.sum_gradient) - self.x += self.alpha * self.v_t + if self.update_filter is not None: + self.update_filter.apply(self.x_update) + + momentum = 0.4 + self.new_x.fill(self.x + step_size * self.x_update + momentum * (self.x - self.last_x)) + self.last_x = self.x.copy() + + self.x.fill(self.new_x) + #self.x.sapyb(1.0, self.x_update, step_size, out=self.x) + #self.x += beta * (self.x - self.last_x) + #self.x += self.x_update * step_size + + # threshold to non-negative self.x.maximum(0, out=self.x) + self.subset = (self.subset + 1) % self.num_subsets + def update_objective(self): # required for current CIL (needs to set self.loss) @@ -132,12 +129,35 @@ def objective_function(self, x): ''' value of objective function summed over all subsets ''' v = 0 #for s in range(len(self.data)): - # v += self.obj_funs[s](x) + # v += self.subset_objective(x, s) return v + def subset_objective(self, x, subset_num): + ''' value of objective function for one subset ''' + raise NotImplementedError + + +class BSREM(BSREMSkeleton): + ''' BSREM implementation using sirf.STIR objective functions''' + def __init__(self, data, obj_funs, initial, **kwargs): + ''' + construct Algorithm with lists of data and, objective functions, initial estimate, initial step size, + step-size relaxation (per epoch) and optionally Algorithm parameters + ''' + self.obj_funs = obj_funs + super().__init__(data, initial, **kwargs) + def subset_sensitivity(self, subset_num): - ''' Compute sensitivity for a particular subset''' + ''' Compute sensitiSvity for a particular subset''' self.obj_funs[subset_num].set_up(self.x) # note: sirf.STIR Poisson likelihood uses `get_subset_sensitivity(0) for the whole # sensitivity if there are no subsets in that likelihood - return self.obj_funs[subset_num].get_subset_sensitivity(0) \ No newline at end of file + return 
self.obj_funs[subset_num].get_subset_sensitivity(0) + + def subset_gradient(self, x, subset_num): + ''' Compute gradient at x for a particular subset''' + return self.obj_funs[subset_num].gradient(x) + + def subset_objective(self, x, subset_num): + ''' value of objective function for one subset ''' + return self.obj_funs[subset_num](x) diff --git a/legacy_stuff/bsrem_bb.py b/legacy_stuff/bsrem_bb.py index 5d3feed..b004ef6 100644 --- a/legacy_stuff/bsrem_bb.py +++ b/legacy_stuff/bsrem_bb.py @@ -12,10 +12,12 @@ import sirf.STIR as STIR from sirf.Utilities import examples_data_path + from cil.optimisation.algorithms import Algorithm from utils.herman_meyer import herman_meyer_order import time + class BSREMSkeleton(Algorithm): ''' Main implementation of a modified BSREM algorithm @@ -26,7 +28,7 @@ class BSREMSkeleton(Algorithm): Before adding this to the previous iterate, an update_filter can be applied. ''' - def __init__(self, data, initial, initial_step_size, bb_init_mode, beta, + def __init__(self, data, initial, update_filter=STIR.TruncateToCylinderProcessor(), **kwargs): ''' @@ -41,8 +43,7 @@ def __init__(self, data, initial, initial_step_size, bb_init_mode, beta, self.x = initial.copy() self.data = data self.num_subsets = len(data) - self.initial_step_size = initial_step_size - self.bb_init_mode = bb_init_mode + # compute small number to add to image in preconditioner # don't make it too small as otherwise the algorithm cannot recover from zeroes. self.eps = initial.max()/1e3 @@ -51,8 +52,10 @@ def __init__(self, data, initial, initial_step_size, bb_init_mode, beta, self.average_sensitivity += self.subset_sensitivity(s)/self.num_subsets # add a small number to avoid division by zero in the preconditioner self.average_sensitivity += self.average_sensitivity.max()/1e4 + - np.save("average_sens.npy", self.average_sensitivity.as_array()) + + self.precond = initial.get_uniform_copy(0) self.subset = 0 self.update_filter = update_filter @@ -63,21 +66,7 @@ def __init__(self, data, initial, initial_step_size, bb_init_mode, beta, self.x_prev = None self.x_update_prev = None - self.x_update_epoch_prev = initial.get_uniform_copy(0) - self.x_epoch = None - self.x_epoch_prev = None - - self.g_minus1 = None - - self.g = initial.get_uniform_copy(0) - - self.beta = beta #0.6 # / self.num_subsets - - self.alpha = self.initial_step_size - - #self.phi = lambda x: (x + 1) // self.num_subsets + 1 - - self.c = 1 + self.x_update = initial.get_uniform_copy(0) def subset_sensitivity(self, subset_num): raise NotImplementedError @@ -92,88 +81,42 @@ def step_size(self): return self.initial_step_size / (1 + self.relaxation_eta * self.epoch()) def update(self): + g = self.subset_gradient(self.x, self.subset_order[self.subset]) - self.x_update = (self.x + self.eps) * g / self.average_sensitivity + g.multiply(self.x + self.eps, out=self.x_update) + self.x_update.divide(self.average_sensitivity, out=self.x_update) + if self.iteration == 0: + step_size = min(max(1/(self.x_update.norm() + 1e-3), 0.005), 3.0) + else: + delta_x = self.x - self.x_prev + delta_g = self.x_update_prev - self.x_update + + dot_product = delta_g.dot(delta_x) # (deltag * deltax).sum() + alpha_long = delta_x.norm()**2 / np.abs(dot_product) + #dot_product = delta_x.dot(delta_g) + #alpha_short = np.abs((dot_product).sum()) / delta_g.norm()**2 + #print("short / long: ", alpha_short, alpha_long) + step_size = max(alpha_long, 0.01) #np.sqrt(alpha_long*alpha_short) + #print("step size: ", step_size) + #print("step size: ", step_size) - 
#print(self.iteration, self.epoch()) - - if (self.iteration + 1) % self.num_subsets == 0: - #print("iteration: ", self.iteration) - if self.x_epoch is not None: - #print("self.x_epoch_prev = self.x_epoch.clone() ") - self.x_epoch_prev = self.x_epoch.clone() - - #print("self.x_epoch = self.x.clone()") - self.x_epoch = self.x.clone() - - - if self.epoch() >= 2: - if (self.iteration + 1) % self.num_subsets == 0: - #print("iteration: ", self.iteration) - #print("calculate step size!") - delta_x = self.x_epoch - self.x_epoch_prev - delta_g = self.g_minus1 - self.g - - numerator = (delta_x * (self.x + self.eps) / self.average_sensitivity * delta_x) - if self.update_filter is not None: - self.update_filter.apply(self.numerator) - - self.alpha = 1. / self.num_subsets * numerator.sum() / np.abs((delta_x * delta_g).sum()) - print("step size: ", self.alpha) - #self.alpha = np.clip(self.alpha, 0.1, 3.0) - k = self.epoch() - phik = 0.1 * (k + 1) - self.c = self.c ** ((k-2)/(k-1)) * (self.alpha*phik) ** (1/(k-1)) - self.alpha = self.c / phik - self.alpha = np.clip(self.alpha, 0.05, 10) - - if (self.iteration + 1) % self.num_subsets == 0 or self.iteration == 0: - - self.g_minus1 = self.g.clone() - self.g = self.g.get_uniform_copy(0) - - if self.epoch() < 2: - ## compute step size - - """ - if self.x_prev is not None: - delta_x = self.x - self.x_prev - delta_g = self.x_update_prev - self.x_update - - alpha_long = delta_x.norm()**2 / np.abs((delta_x * delta_g).sum()) - alpha_short = np.abs((delta_x * delta_g).sum()) / delta_g.norm()**2 - - if self.bb_init_mode == "long": - self.alpha = alpha_long - elif self.bb_init_mode == "short": - self.alpha = alpha_short - elif self.bb_init_mode == "mean": - self.alpha = np.sqrt(alpha_long*alpha_short) - else: - raise NotImplementedError - print(alpha_short, alpha_long, self.alpha) - """ - self.x_prev = self.x.clone() - self.x_update_prev = self.x_update.clone() - - self.alpha = self.step_size() + self.x_prev = self.x.copy() + self.x_update_prev = self.x_update.copy() if self.update_filter is not None: self.update_filter.apply(self.x_update) + self.x.sapyb(1.0, self.x_update, step_size, out=self.x) - self.g = self.beta * self.x_update + (1 - self.beta) * self.g - self.x += self.x_update * self.alpha # threshold to non-negative self.x.maximum(0, out=self.x) self.subset = (self.subset + 1) % self.num_subsets - def update_objective(self): # required for current CIL (needs to set self.loss) self.loss.append(self.objective_function(self.x)) @@ -181,23 +124,24 @@ def update_objective(self): def objective_function(self, x): ''' value of objective function summed over all subsets ''' v = 0 - for s in range(len(self.data)): - v += self.subset_objective(x, s) + #for s in range(len(self.data)): + # v += self.subset_objective(x, s) return v def subset_objective(self, x, subset_num): ''' value of objective function for one subset ''' raise NotImplementedError -class BSREM1(BSREMSkeleton): + +class BSREM(BSREMSkeleton): ''' BSREM implementation using sirf.STIR objective functions''' - def __init__(self, data, obj_funs, initial, initial_step_size=1, **kwargs): + def __init__(self, data, obj_funs, initial, **kwargs): ''' construct Algorithm with lists of data and, objective functions, initial estimate, initial step size, step-size relaxation (per epoch) and optionally Algorithm parameters ''' self.obj_funs = obj_funs - super().__init__(data, initial, initial_step_size, **kwargs) + super().__init__(data, initial, **kwargs) def subset_sensitivity(self, subset_num): ''' Compute 
sensitiSvity for a particular subset''' @@ -213,35 +157,3 @@ def subset_gradient(self, x, subset_num): def subset_objective(self, x, subset_num): ''' value of objective function for one subset ''' return self.obj_funs[subset_num](x) - -class BSREM2(BSREMSkeleton): - ''' BSREM implementation using acquisition models and prior''' - def __init__(self, data, acq_models, prior, initial, initial_step_size=1, relaxation_eta=0, **kwargs): - ''' - construct Algorithm with lists of data and acquisition models, prior, initial estimate, initial step size, - step-size relaxation (per epoch) and optionally Algorithm parameters. - - WARNING: This version will use the same prior in each subset without rescaling. You should - therefore rescale the penalisation_factor of the prior before calling this function. This will - change in the future. - ''' - self.acq_models = acq_models - self.prior = prior - super().__init__(data, initial, initial_step_size, relaxation_eta, **kwargs) - - def subset_sensitivity(self, subset_num): - ''' Compute sensitivity for a particular subset''' - self.acq_models[subset_num].set_up(self.data[subset_num], self.x) - return self.acq_models[subset_num].backward(self.data[subset_num].get_uniform_copy(1)) - - def subset_gradient(self, x, subset_num): - ''' Compute gradient at x for a particular subset''' - f = self.acq_models[subset_num].forward(x) - quotient = self.data[subset_num] / f - return self.acq_models[subset_num].backward(quotient - 1) - self.prior.gradient(x) - - def subset_objective(self, x, subset_num): - ''' value of objective function for one subset ''' - f = self.acq_models[subset_num].forward(x) - return self.data[subset_num].dot(f.log()) - f.sum() - self.prior(x) - diff --git a/bsrem_bb_rdp.py b/legacy_stuff/bsrem_bb_rdp.py similarity index 100% rename from bsrem_bb_rdp.py rename to legacy_stuff/bsrem_bb_rdp.py diff --git a/bsrem_bb_saga.py b/legacy_stuff/bsrem_bb_saga.py similarity index 100% rename from bsrem_bb_saga.py rename to legacy_stuff/bsrem_bb_saga.py diff --git a/bsrem_bb_subset.py b/legacy_stuff/bsrem_bb_subset.py similarity index 100% rename from bsrem_bb_subset.py rename to legacy_stuff/bsrem_bb_subset.py diff --git a/bsrem_dowg.py b/legacy_stuff/bsrem_dowg.py similarity index 100% rename from bsrem_dowg.py rename to legacy_stuff/bsrem_dowg.py diff --git a/modified_petric.py b/legacy_stuff/modified_petric.py similarity index 100% rename from modified_petric.py rename to legacy_stuff/modified_petric.py diff --git a/one_step_model.py b/legacy_stuff/one_step_model.py similarity index 100% rename from one_step_model.py rename to legacy_stuff/one_step_model.py diff --git a/plot_results.py b/legacy_stuff/plot_results.py similarity index 100% rename from plot_results.py rename to legacy_stuff/plot_results.py diff --git a/reduced_petric.py b/legacy_stuff/reduced_petric.py old mode 100755 new mode 100644 similarity index 100% rename from reduced_petric.py rename to legacy_stuff/reduced_petric.py diff --git a/setup_model.py b/legacy_stuff/setup_model.py similarity index 100% rename from setup_model.py rename to legacy_stuff/setup_model.py diff --git a/test_single_iteration.py b/legacy_stuff/test_single_iteration.py similarity index 100% rename from test_single_iteration.py rename to legacy_stuff/test_single_iteration.py diff --git a/train_single_iteration.py b/legacy_stuff/train_single_iteration.py similarity index 100% rename from train_single_iteration.py rename to legacy_stuff/train_single_iteration.py diff --git a/main.py b/main.py index 02a6abe..6a47290 
100644 --- a/main.py +++ b/main.py @@ -8,7 +8,7 @@ """ -from bsrem_bb import BSREM +from bsrem_saga import BSREM from utils.number_of_subsets import compute_number_of_subsets from sirf.contrib.partitioner import partitioner @@ -36,10 +36,12 @@ def __call__(self, algorithm: Algorithm): class Submission(BSREM): def __init__(self, data, - update_objective_interval: int = 2, + update_objective_interval: int = 10, **kwargs): - num_subsets = 1 + tof = (data.acquired_data.shape[0] > 1) + views = data.acquired_data.shape[2] + num_subsets = compute_number_of_subsets(views, tof) data_sub, _, obj_funs = partitioner.data_partition(data.acquired_data, data.additive_term, data.mult_factors, num_subsets, diff --git a/main_BSREM.py b/main_BSREM.py deleted file mode 100644 index 3a2021d..0000000 --- a/main_BSREM.py +++ /dev/null @@ -1,67 +0,0 @@ -"""Main file to modify for submissions. - -Once renamed or symlinked as `main.py`, it will be used by `petric.py` as follows: - ->>> from main import Submission, submission_callbacks ->>> from petric import data, metrics ->>> algorithm = Submission(data) ->>> algorithm.run(np.inf, callbacks=metrics + submission_callbacks) -""" -from cil.optimisation.algorithms import Algorithm -from cil.optimisation.utilities import callbacks - -from sirf.contrib.partitioner import partitioner -from copy import deepcopy -from utils.number_of_subsets import compute_number_of_subsets -from bsrem_bb_saga import BSREM - -assert issubclass(BSREM, Algorithm) - - -class MaxIteration(callbacks.Callback): - """ - The organisers try to `Submission(data).run(inf)` i.e. for infinite iterations (until timeout). - This callback forces stopping after `max_iteration` instead. - """ - def __init__(self, max_iteration: int, verbose: int = 1): - super().__init__(verbose) - self.max_iteration = max_iteration - - def __call__(self, algorithm: Algorithm): - if algorithm.iteration >= self.max_iteration: - raise StopIteration - -class Submission(BSREM): - def __init__(self, data, - update_objective_interval: int = 10, - **kwargs): - - tof = (data.acquired_data.shape[0] > 1) - views = data.acquired_data.shape[2] - num_subsets = compute_number_of_subsets(views, tof) - print("Number of views: ", views, " use ", num_subsets, " subsets") - data_sub, _, obj_funs = partitioner.data_partition(data.acquired_data, data.additive_term, - data.mult_factors, num_subsets, - initial_image=data.OSEM_image) - - # WARNING: modifies prior strength with 1/num_subsets (as currently needed for BSREM implementations - self.dataset = data - data.prior.set_penalisation_factor(data.prior.get_penalisation_factor() / len(obj_funs)) - data.prior.set_up(data.OSEM_image) - - #print("prior: ", data.prior(data.OSEM_image)) - #data.prior = data.prior.set_penalisation_factor(data.prior.get_penalisation_factor()) - #data.prior.set_up(data.OSEM_image) - #print(data.prior.get_penalisation_factor()) - - - #print("prior: ", data.prior(data.OSEM_image)) - - super().__init__(data_sub, - obj_funs, - prior=data.prior, - initial=data.OSEM_image, - update_objective_interval=update_objective_interval) - - -submission_callbacks = [] #[MaxIteration(660)] diff --git a/main_Dowg.py b/main_Dowg.py deleted file mode 100644 index 5f7882f..0000000 --- a/main_Dowg.py +++ /dev/null @@ -1,78 +0,0 @@ - -from cil.optimisation.algorithms import Algorithm -from cil.optimisation.utilities import callbacks - -""" -EWS: Postprocessing + some iterative method - -""" - - -from bsrem_dowg import BSREM -from utils.number_of_subsets import compute_number_of_subsets - -from 
sirf.contrib.partitioner import partitioner -#from utils.partioner_function import data_partition -#from utils.partioner_function_no_obj import data_partition - -assert issubclass(BSREM, Algorithm) - - -import torch -torch.cuda.set_per_process_memory_fraction(0.8) - -import setup_postprocessing - - -class MaxIteration(callbacks.Callback): - def __init__(self, max_iteration: int, verbose: int = 1): - super().__init__(verbose) - self.max_iteration = max_iteration - - def __call__(self, algorithm: Algorithm): - if algorithm.iteration >= self.max_iteration: - raise StopIteration - - -class Submission(BSREM): - def __init__(self, data, - update_objective_interval: int = 10, - **kwargs): - - tof = (data.acquired_data.shape[0] > 1) - views = data.acquired_data.shape[2] - num_subsets = compute_number_of_subsets(views, tof) - - data_sub, _, obj_funs = partitioner.data_partition(data.acquired_data, data.additive_term, - data.mult_factors, num_subsets, - initial_image=data.OSEM_image, - mode = "staggered") - - self.dataset = data - - # WARNING: modifies prior strength with 1/num_subsets - data.prior.set_penalisation_factor(data.prior.get_penalisation_factor() / len(data_sub)) - data.prior.set_up(data.OSEM_image) - - DEVICE = "cuda" - - initial_images = torch.from_numpy(data.OSEM_image.as_array()).float().to(DEVICE).unsqueeze(0).unsqueeze(0) - with torch.no_grad(): - x_pred = setup_postprocessing.postprocessing_model(initial_images) - x_pred[x_pred < 0] = 0 - - #del setup_model.network_precond - del initial_images - - initial = data.OSEM_image.clone() - initial.fill(x_pred.detach().cpu().numpy().squeeze()) - - for f in obj_funs: # add prior evenly to every objective function - f.set_prior(data.prior) - - super().__init__(data=data_sub, - obj_funs=obj_funs, - initial=initial, - update_objective_interval=update_objective_interval) - -submission_callbacks = [] \ No newline at end of file diff --git a/main_EWS_SAGA.py b/main_EWS_SAGA.py deleted file mode 100644 index 6e1b74b..0000000 --- a/main_EWS_SAGA.py +++ /dev/null @@ -1,102 +0,0 @@ - -from cil.optimisation.algorithms import Algorithm -from cil.optimisation.utilities import callbacks - - -from bsrem_saga import SAGA -from utils.number_of_subsets import compute_number_of_subsets - -from sirf.contrib.partitioner import partitioner -#from utils.partioner_function import data_partition -#from utils.partioner_function_no_obj import data_partition - -assert issubclass(SAGA, Algorithm) - - -import torch -torch.cuda.set_per_process_memory_fraction(0.8) - -import setup_model - - -class MaxIteration(callbacks.Callback): - def __init__(self, max_iteration: int, verbose: int = 1): - super().__init__(verbose) - self.max_iteration = max_iteration - - def __call__(self, algorithm: Algorithm): - if algorithm.iteration >= self.max_iteration: - raise StopIteration - - -class Submission(SAGA): - def __init__(self, data, - update_objective_interval: int = 10, - **kwargs): - - tof = (data.acquired_data.shape[0] > 1) - views = data.acquired_data.shape[2] - num_subsets = compute_number_of_subsets(views, tof) - - - data_sub, _, obj_funs = partitioner.data_partition(data.acquired_data, data.additive_term, - data.mult_factors, num_subsets, - initial_image=data.OSEM_image, - mode = "staggered") - - self.dataset = data - - # WARNING: modifies prior strength with 1/num_subsets - data.prior.set_penalisation_factor(data.prior.get_penalisation_factor() / len(data_sub)) - data.prior.set_up(data.OSEM_image) - - sensitivity = data.OSEM_image.get_uniform_copy(0) - for s in 
range(len(data_sub)): - obj_funs[s].set_up(data.OSEM_image) - sensitivity.add(obj_funs[s].get_subset_sensitivity(0), out=sensitivity) - - pll_grad = data.OSEM_image.get_uniform_copy(0) - for s in range(len(data_sub)): - pll_grad.add(obj_funs[s].gradient(data.OSEM_image), out=pll_grad) - - average_sensitivity = sensitivity.clone() / num_subsets - average_sensitivity += average_sensitivity.max()/1e4 - - sensitivity += sensitivity.max()/1e4 - eps = data.OSEM_image.max()/1e3 - - prior_grad = data.prior.gradient(data.OSEM_image) * num_subsets - - grad = (data.OSEM_image + eps) * pll_grad / sensitivity - prior_grad = (data.OSEM_image + eps) * prior_grad / sensitivity - - DEVICE = "cuda" - - initial_images = torch.from_numpy(data.OSEM_image.as_array()).float().to(DEVICE).unsqueeze(0) - prior_grads = torch.from_numpy(prior_grad.as_array()).float().to(DEVICE).unsqueeze(0) - pll_grads = torch.from_numpy(grad.as_array()).float().to(DEVICE).unsqueeze(0) - - model_inp = torch.cat([initial_images, pll_grads, prior_grads], dim=0).unsqueeze(0) - with torch.no_grad(): - x_pred = setup_model.network_precond(model_inp) - x_pred[x_pred < 0] = 0 - - #del setup_model.network_precond - del initial_images - del prior_grads - del pll_grads - del model_inp - - initial = data.OSEM_image.clone() - initial.fill(x_pred.detach().cpu().numpy().squeeze()) - - for f in obj_funs: # add prior evenly to every objective function - f.set_prior(data.prior) - - super().__init__(data=data_sub, - obj_funs=obj_funs, - initial=initial, - average_sensitivity=average_sensitivity, - update_objective_interval=update_objective_interval) - -submission_callbacks = [] \ No newline at end of file diff --git a/main_Full_Gradient.py b/main_Full_Gradient.py deleted file mode 100644 index 007732e..0000000 --- a/main_Full_Gradient.py +++ /dev/null @@ -1,54 +0,0 @@ -"""Main file to modify for submissions. - -Once renamed or symlinked as `main.py`, it will be used by `petric.py` as follows: - ->>> from main import Submission, submission_callbacks ->>> from petric import data, metrics ->>> algorithm = Submission(data) ->>> algorithm.run(np.inf, callbacks=metrics + submission_callbacks) -""" -from cil.optimisation.algorithms import Algorithm -from cil.optimisation.utilities import callbacks - -from sirf.contrib.partitioner import partitioner - -from bsrem_bb import BSREM - -assert issubclass(BSREM, Algorithm) - - -class MaxIteration(callbacks.Callback): - """ - The organisers try to `Submission(data).run(inf)` i.e. for infinite iterations (until timeout). - This callback forces stopping after `max_iteration` instead. 
- """ - def __init__(self, max_iteration: int, verbose: int = 1): - super().__init__(verbose) - self.max_iteration = max_iteration - - def __call__(self, algorithm: Algorithm): - if algorithm.iteration >= self.max_iteration: - raise StopIteration - -class Submission(BSREM): - def __init__(self, data, - update_objective_interval: int = 1, - **kwargs): - - data_sub, _, obj_funs = partitioner.data_partition(data.acquired_data, data.additive_term, - data.mult_factors, 1, - initial_image=data.OSEM_image) - # WARNING: modifies prior strength with 1/num_subsets (as currently needed for BSREM implementations - data.prior.set_penalisation_factor(data.prior.get_penalisation_factor() / len(obj_funs)) - data.prior.set_up(data.OSEM_image) - for f in obj_funs: # add prior evenly to every objective function - f.set_prior(data.prior) - self.dataset = data - - super().__init__(data_sub, - obj_funs, - initial=data.OSEM_image, - update_objective_interval=update_objective_interval) - - -submission_callbacks = [] #[MaxIteration(660)] diff --git a/main_SAGA.py b/main_SAGA.py deleted file mode 100644 index 4b8fda1..0000000 --- a/main_SAGA.py +++ /dev/null @@ -1,57 +0,0 @@ - -from cil.optimisation.algorithms import Algorithm -from cil.optimisation.utilities import callbacks - - -from bsrem_saga import SAGA -from utils.number_of_subsets import compute_number_of_subsets - -#from sirf.contrib.partitioner import partitioner -from utils.partioner_function import data_partition - -assert issubclass(SAGA, Algorithm) - - -#import torch -#torch.cuda.set_per_process_memory_fraction(0.8) - -#from one_step_model import NetworkPreconditioner - - -class MaxIteration(callbacks.Callback): - def __init__(self, max_iteration: int, verbose: int = 1): - super().__init__(verbose) - self.max_iteration = max_iteration - - def __call__(self, algorithm: Algorithm): - if algorithm.iteration >= self.max_iteration: - raise StopIteration - - -class Submission(SAGA): - def __init__(self, data, - update_objective_interval: int = 10, - **kwargs): - - tof = (data.acquired_data.shape[0] > 1) - views = data.acquired_data.shape[2] - num_subsets = compute_number_of_subsets(views, tof) - print("Number of views: ", views, " use ", num_subsets, " subsets") - data_sub, _, obj_funs = data_partition(data.acquired_data, data.additive_term, - data.mult_factors, num_subsets, - initial_image=data.OSEM_image, - mode = "staggered") - self.dataset = data - - # WARNING: modifies prior strength with 1/num_subsets (as currently needed for BSREM implementations) - data.prior.set_penalisation_factor(data.prior.get_penalisation_factor() / len(obj_funs)) - data.prior.set_up(data.OSEM_image) - for f in obj_funs: # add prior evenly to every objective function - f.set_prior(data.prior) - - super().__init__(data_sub, - obj_funs, - data.OSEM_image, - update_objective_interval=1) - -submission_callbacks = [] \ No newline at end of file
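As a reference for the "Educated Warm Start" section of the README above: a network built only from bias-free convolutions and ReLUs satisfies f(a*x) = a*f(x) for a >= 0, which is the 1-homogeneity property mentioned there. The sketch below illustrates this in PyTorch; the depth and channel widths are placeholders and not the architecture of the weights shipped in *checkpoint/*.

```python
import torch
import torch.nn as nn

class WarmStartCNN(nn.Module):
    """Small bias-free 3D CNN; positively 1-homogeneous because every layer is
    either linear without bias or a ReLU. Depth/channels are illustrative only."""
    def __init__(self, channels=16, depth=3):
        super().__init__()
        layers = [nn.Conv3d(1, channels, kernel_size=3, padding=1, bias=False), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False), nn.ReLU()]
        layers += [nn.Conv3d(channels, 1, kernel_size=3, padding=1, bias=False)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch, 1, D, H, W), e.g. the OSEM image
        return self.net(x)

# 1-homogeneity check: scaling the input scales the output by the same factor
model = WarmStartCNN().eval()
osem = torch.rand(1, 1, 8, 8, 8)
with torch.no_grad():
    assert torch.allclose(model(2.0 * osem), 2.0 * model(osem), atol=1e-4)
```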