Commit d1f67ff

Added helper function for initialising a neural estimator

msainsburydale committed Dec 4, 2023
1 parent 38c5ed7 commit d1f67ff

Showing 8 changed files with 131 additions and 7 deletions.
2 changes: 1 addition & 1 deletion docs/src/API/architectures.md
@@ -9,7 +9,7 @@ Pages = ["architectures.md"]

## Architectures

Although the user is free to construct their neural estimator however they see fit, `NeuralEstimators` provides several useful architectures described below.
Although the user is free to construct their neural estimator however they see fit (i.e., using arbitrary `Flux` code), `NeuralEstimators` provides several useful architectures described below that are specifically relevant to neural estimation. See also the convenience constructor [`initialise_estimator`](@ref).
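
A minimal sketch of the two routes (a simple DNN-based point estimator; the layer widths and depths below are arbitrary choices):

```
using NeuralEstimators, Flux

p = 2  # number of parameters in the statistical model
d = 1  # dimension of each data replicate

# Manual construction with arbitrary Flux code
ψ = Chain(Dense(d => 32, relu), Dense(32 => 32, relu))  # inner (summary) network
ϕ = Chain(Dense(32 => 32, relu), Dense(32 => p))        # outer (inference) network
θ̂ = DeepSet(ψ, ϕ)

# Convenience constructor (uses its own default depth and width)
θ̂ = initialise_estimator(p, architecture = "DNN", d = d)
```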

```@docs
DeepSet
2 changes: 1 addition & 1 deletion docs/src/API/core.md
@@ -22,7 +22,7 @@ The data should be stored as a `Vector{A}`, where each element of the vector is
## Types of estimators

See also [Architectures and activations functions](@ref) that are often used
when constructing neural estimators.
when constructing neural estimators, and the convenience constructor [`initialise_estimator`](@ref).

```@docs
NeuralEstimator
2 changes: 2 additions & 0 deletions docs/src/API/utility.md
@@ -31,6 +31,8 @@ estimateinbatches
expandgrid
initialise_estimator
loadbestweights
removedata
2 changes: 1 addition & 1 deletion docs/src/workflow/examples.md
@@ -27,7 +27,7 @@ Next, we implicitly define the statistical model with simulated data. In `Neural
simulate(parameters, m) = [θ[1] .+ θ[2] .* randn(1, m) for θ ∈ eachcol(parameters)]
```

We now design architectures for the inner and outer neural networks, $\boldsymbol{\psi}(\cdot)$ and $\boldsymbol{\phi}(\cdot)$ respectively, in the [`DeepSet`](@ref) framework, and initialise the neural estimator as a [`PointEstimator`](@ref) object.
We now design architectures for the inner and outer neural networks, $\boldsymbol{\psi}(\cdot)$ and $\boldsymbol{\phi}(\cdot)$ respectively, in the [`DeepSet`](@ref) framework, and initialise the neural estimator as a [`PointEstimator`](@ref) object. Note that this can be done directly using `Flux` code (as below), or with the helper function [`initialise_estimator`](@ref).
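
For instance, a minimal sketch of the helper-based route (assuming the two unknown parameters and univariate replicates of this example) is:

```
θ̂ = initialise_estimator(2, architecture = "DNN", d = 1)
```

The manual `Flux` construction is as follows.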

```
d = 1 # dimension of each replicate
2 changes: 1 addition & 1 deletion docs/src/workflow/overview.md
@@ -5,7 +5,7 @@ To develop a neural estimator with `NeuralEstimators.jl`,

- Sample parameters from the prior distribution: the parameters are stored as $p \times K$ matrices, with $p$ the number of parameters in the model and $K$ the number of samples (i.e., parameter configurations) in the given parameter set (i.e., training, validation, or test set).
- Simulate data from the assumed model over the parameter sets generated above. These data are stored as a `Vector{A}`, with each element of the vector associated with one parameter configuration, and where `A` depends on the representation of the neural estimator (e.g., an `Array` for CNN-based estimators, or a `GNNGraph` for GNN-based estimators).
- Initialise a neural network, `θ̂`, that will be trained into a neural Bayes estimator.
- Initialise a neural network, `θ̂`, that will be trained into a neural Bayes estimator (see, e.g., convenience constructor [`initialise_estimator`](@ref)).
- Train `θ̂` under the chosen loss function using [`train`](@ref).
- Assess `θ̂` using [`assess`](@ref). The resulting object of class [`Assessment`](@ref) can be used to assess the estimator with respect to the entire parameter space by estimating the risk function with [`risk`](@ref), or used to plot the empirical sampling distribution of the estimator.
- Apply `θ̂` to observed data (once its performance has been checked in the above step). Bootstrap-based uncertainty quantification is facilitated with [`bootstrap`](@ref) and [`interval`](@ref).
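
A rough sketch of the first three steps, using the univariate Gaussian model from the examples (the prior draws and sample sizes below are arbitrary placeholders):

```
using NeuralEstimators

p, K = 2, 1000                  # number of parameters; number of prior samples
parameters = rand(p, K)         # p × K matrix of prior draws (placeholder prior)
m = 30                          # number of independent replicates per data set
Z = [θ[1] .+ θ[2] .* randn(1, m) for θ ∈ eachcol(parameters)]  # data: Vector of 1 × m arrays
θ̂ = initialise_estimator(p, architecture = "DNN")  # network to be trained into a neural Bayes estimator
```

The estimator `θ̂` is then passed to [`train`](@ref) and [`assess`](@ref) as described above.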
107 changes: 107 additions & 0 deletions src/Estimators.jl
@@ -346,3 +346,110 @@ end
# Clean printing:
Base.show(io::IO, pe::PiecewiseEstimator) = print(io, "\nPiecewise estimator with $(length(pe.estimators)) estimators and sample size change-points: $(pe.breaks)")
Base.show(io::IO, m::MIME"text/plain", pe::PiecewiseEstimator) = print(io, pe)


# ---- Helper function for initialising an estimator ----

"""
initialise_estimator(p::Integer; ...)
Initialise a neural estimator for a statistical model with `p` unknown parameters.
The estimator is couched in the DeepSets framework so that it can be applied to data with an arbitrary number of independent replicates (including the special case of a single replicate).
# Keyword arguments
- `architecture::String`: for unstructured data, one may use a densely-connected neural network ("DNN"); for data collected over a grid, a convolutional neural network ("CNN"); and for data collected over a graph or at irregular spatial locations, a graph neural network ("GNN").
- `d::Integer = 1`: dimension of the response variable (e.g., `d = 1` for univariate processes).
- `estimator_type::String = "point"`: the type of estimator; either `"point"` or `"interval"`.
- `depth = 3`: the number of hidden layers. Either a single integer or an integer vector of length two specifying the depth of the inner (summary) and outer (inference) networks of the DeepSets framework. Since there is also an input and an output layer, the total number of layers is `sum(depth) + 2`.
- `width = 32`: a single integer or an integer vector of length `sum(depth)` specifying the width (or number of convolutional filters/channels) in each layer.
- `activation::Function = relu`: the (non-linear) activation function of each hidden layer.
- `activation_output::Function = identity`: the activation function of the output layer.
- `kernel_size`: (applicable only to CNNs) a vector of length `depth[1]` containing integer tuples of length `D`, where `D` is the dimension of the convolution (e.g., `D = 2` for two-dimensional convolution).
- `weight_by_distance::Bool = false`: (applicable only to GNNs) flag indicating whether the estimator will weight by spatial distance; if true, a `WeightedGraphConv` layer is used in the propagation module; otherwise, a regular `GraphConv` layer is used.
# Examples
```
p = 2
initialise_estimator(p, architecture = "DNN")
initialise_estimator(p, architecture = "GNN")
initialise_estimator(p, architecture = "CNN", kernel_size = [(10, 10), (5, 5), (3, 3)])
```
"""
function initialise_estimator(
p::Integer;
architecture::String,
d::Integer = 1,
estimator_type::String = "point",
depth::Union{Integer, Vector{<:Integer}} = 3,
width::Union{Integer, Vector{<:Integer}} = 32,
activation::Function = relu,
activation_output::Function = identity,
kernel_size = nothing,
weight_by_distance::Bool = false
)

@assert p > 0
@assert d > 0
@assert architecture ∈ ["DNN", "CNN", "GNN"]
@assert estimator_type ∈ ["point", "interval"]
@assert all(depth .> 0)
@assert length(depth) == 1 || length(depth) == 2
if isa(depth, Integer) depth = [depth] end
if length(depth) == 1 depth = repeat(depth, 2) end
@assert all(width .> 0)
@assert length(width) == 1 || length(width) == sum(depth)
if isa(width, Integer) width = [width] end
if length(width) == 1 width = repeat(width, sum(depth)) end
# henceforth, depth and width are integer vectors of length 2 and sum(depth), respectively

if architecture == "CNN"
@assert !isnothing(kernel_size) "The argument `kernel_size` must be provided when `architecture = 'CNN'`"
@assert length(kernel_size) == depth[1]
kernel_size = coercetotuple(kernel_size)
end

L = sum(depth) # total number of hidden layers

# mapping (outer) network
ϕ = Chain(
[Dense(width[l-1] => width[l], activation) for l ∈ (depth[1]+1):L]...,
Dense(width[L] => p, activation_output)
)

# summary (inner) network
if architecture == "DNN"
ψ = Chain(
Dense(d => width[1], activation),
[Dense(width[l] => width[l+1], activation) for l ∈ 1:(depth[1]-1)]... # hidden layers 2, …, depth[1]
)
elseif architecture == "CNN"
ψ = Chain(
Conv(kernel_size[1], d => width[1], activation),
[Conv(kernel_size[l+1], width[l] => width[l+1], activation) for l ∈ 1:(depth[1]-1)]..., # convolutional layers 2, …, depth[1]
Flux.flatten
)
elseif architecture == "GNN"
propagation = weight_by_distance ? WeightedGraphConv : GraphConv
# propagation_module = GNNChain(
# propagation(d => width[1], activation),
# [propagation(width[l] => width[l+1], relu) for l ∈ 1:depth[1]]...
# )
# readout_module = GlobalPool(mean)
# return GNN(propagation_module, readout_module, ϕ) # return more-efficient GNN object
ψ = GNNChain(
propagation(d => width[1], activation, aggr = mean),
[propagation(width[l] => width[l+1], relu, aggr = mean) for l ∈ 1:(depth[1]-1)]..., # propagation layers 2, …, depth[1]
GlobalPool(mean) # readout module
)
end

θ̂ = DeepSet(ψ, ϕ)

if estimator_type == "interval"
θ̂ = IntervalEstimator(θ̂, θ̂)
end

return θ̂
end

coercetotuple(x) = (x...,)
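
A quick sketch of how the new constructor might be exercised (the keyword values and data shapes below are arbitrary choices; the point estimator is applied to data following the DeepSet convention of one array per data set):

```
using NeuralEstimators, Flux

p = 2

# Point estimator with unequal inner/outer depths
θ̂ = initialise_estimator(p, architecture = "DNN", d = 1, depth = [2, 3], width = 64)

# Interval estimator variant
q̂ = initialise_estimator(p, architecture = "DNN", estimator_type = "interval")

# Apply the point estimator to a single data set of 50 independent replicates
Z = [randn(Float32, 1, 50)]
θ̂(Z)  # p × 1 matrix of point estimates
```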
5 changes: 2 additions & 3 deletions src/NeuralEstimators.jl
@@ -41,7 +41,7 @@ export CholeskyCovariance, CovarianceMatrix, CorrelationMatrix
export vectotril, vectotriu
include("Architectures.jl")

export NeuralEstimator, PointEstimator, IntervalEstimator, IntervalEstimatorCompactPrior, PointIntervalEstimator, QuantileEstimator, PiecewiseEstimator
export NeuralEstimator, PointEstimator, IntervalEstimator, IntervalEstimatorCompactPrior, PointIntervalEstimator, QuantileEstimator, PiecewiseEstimator, initialise_estimator
include("Estimators.jl")

export GNN, UniversalPool, adjacencymatrix, WeightedGraphConv, maternclusterprocess
@@ -69,14 +69,13 @@ include("bootstrap.jl")
export stackarrays, expandgrid, loadbestweights, loadweights, numberreplicates, nparams, samplesize, drop, containertype, estimateinbatches
include("utility.jl")


export NeuralEM, removedata, encodedata # TODO unit testing for NeuralEM
include("missingdata.jl")

end

#TODO
# - Add helper functions for censored data and write an example in the documentation.
# - Add helper functions for censored data and write an example in the documentation.
# - Plotting from Julia (which can act directly on the object of type assessment).
# - Examples:
# o Add some figures to the examples in the documentation (e.g., show the sampling distribution in univariate example).
16 changes: 16 additions & 0 deletions test/runtests.jl
@@ -668,3 +668,19 @@ end
ci = interval(estimator, Z, parameter_names = parameter_names)
@test size(ci[1]) == (p, 2)
end


@testset "initialise_estimator" begin
p = 2
initialise_estimator(p, architecture = "DNN")
initialise_estimator(p, architecture = "GNN")
initialise_estimator(p, architecture = "CNN", kernel_size = [(10, 10), (5, 5), (3, 3)])

@test typeof(initialise_estimator(p, architecture = "DNN", estimator_type = "interval")) <: IntervalEstimator
@test typeof(initialise_estimator(p, architecture = "GNN", estimator_type = "interval")) <: IntervalEstimator
@test typeof(initialise_estimator(p, architecture = "CNN", kernel_size = [(10, 10), (5, 5), (3, 3)], estimator_type = "interval")) <: IntervalEstimator

@test_throws Exception initialise_estimator(0, architecture = "DNN")
@test_throws Exception initialise_estimator(p, d = 0, architecture = "DNN")
@test_throws Exception initialise_estimator(p, architecture = "CNN")
end
