diff --git a/docs/src/API/architectures.md b/docs/src/API/architectures.md
index 0bf4f91..ca1c116 100644
--- a/docs/src/API/architectures.md
+++ b/docs/src/API/architectures.md
@@ -9,7 +9,7 @@ Pages = ["architectures.md"]
 
 ## Architectures
 
-Although the user is free to construct their neural estimator however they see fit, `NeuralEstimators` provides several useful architectures described below.
+Although the user is free to construct their neural estimator however they see fit (i.e., using arbitrary `Flux` code), `NeuralEstimators` provides several useful architectures described below that are specifically relevant to neural estimation. See also the convenience constructor [`initialise_estimator`](@ref).
 
 ```@docs
 DeepSet
diff --git a/docs/src/API/core.md b/docs/src/API/core.md
index 88ed37e..69c7519 100644
--- a/docs/src/API/core.md
+++ b/docs/src/API/core.md
@@ -22,7 +22,7 @@ The data should be stored as a `Vector{A}`, where each element of the vector is
 ## Types of estimators
 
 See also [Architectures and activations functions](@ref) that are often used
-when constructing neural estimators.
+when constructing neural estimators, and the convenience constructor [`initialise_estimator`](@ref).
 
 ```@docs
 NeuralEstimator
diff --git a/docs/src/API/utility.md b/docs/src/API/utility.md
index 8704121..cd70ff4 100644
--- a/docs/src/API/utility.md
+++ b/docs/src/API/utility.md
@@ -31,6 +31,8 @@ estimateinbatches
 
 expandgrid
 
+initialise_estimator
+
 loadbestweights
 
 removedata
diff --git a/docs/src/workflow/examples.md b/docs/src/workflow/examples.md
index 55c1407..49132d7 100644
--- a/docs/src/workflow/examples.md
+++ b/docs/src/workflow/examples.md
@@ -27,7 +27,7 @@ Next, we implicitly define the statistical model with simulated data. In `Neural
 simulate(parameters, m) = [θ[1] .+ θ[2] .* randn(1, m) for θ ∈ eachcol(parameters)]
 ```
 
-We now design architectures for the inner and outer neural networks, $\boldsymbol{\psi}(\cdot)$ and $\boldsymbol{\phi}(\cdot)$ respectively, in the [`DeepSet`](@ref) framework, and initialise the neural estimator as a [`PointEstimator`](@ref) object.
+We now design architectures for the inner and outer neural networks, $\boldsymbol{\psi}(\cdot)$ and $\boldsymbol{\phi}(\cdot)$ respectively, in the [`DeepSet`](@ref) framework, and initialise the neural estimator as a [`PointEstimator`](@ref) object. Note that this can be done directly using `Flux` code (as below), or with the helper function [`initialise_estimator`](@ref).
 
 ```
 d = 1 # dimension of each replicate
diff --git a/docs/src/workflow/overview.md b/docs/src/workflow/overview.md
index f66f0ad..4e0b1af 100644
--- a/docs/src/workflow/overview.md
+++ b/docs/src/workflow/overview.md
@@ -5,7 +5,7 @@ To develop a neural estimator with `NeuralEstimators.jl`,
 
 - Sample parameters from the prior distribution: the parameters are stored as $p \times K$ matrices, with $p$ the number of parameters in the model and $K$ the number of samples (i.e., parameter configurations) in the given parameter set (i.e., training, validation, or test set).
 - Simulate data from the assumed model over the parameter sets generated above. These data are stored as a `Vector{A}`, with each element of the vector associated with one parameter configuration, and where `A` depends on the representation of the neural estimator (e.g., an `Array` for CNN-based estimators, or a `GNNGraph` for GNN-based estimators).
-- Initialise a neural network, `θ̂`, that will be trained into a neural Bayes estimator.
+- Initialise a neural network, `θ̂`, that will be trained into a neural Bayes estimator (see, e.g., the convenience constructor [`initialise_estimator`](@ref)).
 - Train `θ̂` under the chosen loss function using [`train`](@ref).
 - Assess `θ̂` using [`assess`](@ref). The resulting object of class [`Assessment`](@ref) can be used to assess the estimator with respect to the entire parameter space by estimating the risk function with [`risk`](@ref), or used to plot the empirical sampling distribution of the estimator.
 - Apply `θ̂` to observed data (once its performance has been checked in the above step). Bootstrap-based uncertainty quantification is facilitated with [`bootstrap`](@ref) and [`interval`](@ref).
diff --git a/src/Estimators.jl b/src/Estimators.jl
index 667d14a..870c11c 100644
--- a/src/Estimators.jl
+++ b/src/Estimators.jl
@@ -346,3 +346,110 @@ end
 # Clean printing:
 Base.show(io::IO, pe::PiecewiseEstimator) = print(io, "\nPiecewise estimator with $(length(pe.estimators)) estimators and sample size change-points: $(pe.breaks)")
 Base.show(io::IO, m::MIME"text/plain", pe::PiecewiseEstimator) = print(io, pe)
+
+
+# ---- Helper function for initialising an estimator ----
+
+"""
+    initialise_estimator(p::Integer; ...)
+Initialise a neural estimator for a statistical model with `p` unknown parameters.
+
+The estimator is couched in the DeepSets framework so that it can be applied to data with an arbitrary number of independent replicates (including the special case of a single replicate).
+
+# Keyword arguments
+- `architecture::String`: for unstructured data, one may use a densely-connected neural network ("DNN"); for data collected over a grid, a convolutional neural network ("CNN"); and for graphical or irregular spatial data, a graph neural network ("GNN").
+- `d::Integer = 1`: dimension of the response variable (e.g., `d = 1` for univariate processes).
+- `estimator_type::String = "point"`: the type of estimator; either `"point"` or `"interval"`.
+- `depth = 3`: the number of hidden layers. Either a single integer or an integer vector of length two specifying the depth of the inner (summary) and outer (inference) networks of the DeepSets framework. Since there is an input and an output layer, the total number of layers is `sum(depth) + 2`.
+- `width = 32`: a single integer or an integer vector of length `sum(depth)` specifying the width (or number of convolutional filters/channels) in each layer.
+- `activation::Function = relu`: the (non-linear) activation function of each hidden layer.
+- `activation_output::Function = identity`: the activation function of the output layer.
+- `kernel_size`: (applicable only to CNNs) a vector of length `depth[1]` containing integer tuples of length `D`, where `D` is the dimension of the convolution (e.g., `D = 2` for two-dimensional convolution).
+- `weight_by_distance::Bool = false`: (applicable only to GNNs) flag indicating whether the estimator will weight by spatial distance; if `true`, a `WeightedGraphConv` layer is used in the propagation module; otherwise, a regular `GraphConv` layer is used.
+
+# Examples
+```
+p = 2
+initialise_estimator(p, architecture = "DNN")
+initialise_estimator(p, architecture = "GNN")
+initialise_estimator(p, architecture = "CNN", kernel_size = [(10, 10), (5, 5), (3, 3)])
+```
+"""
+function initialise_estimator(
+    p::Integer;
+    architecture::String,
+    d::Integer = 1,
+    estimator_type::String = "point",
+    depth::Union{Integer, Vector{<:Integer}} = 3,
+    width::Union{Integer, Vector{<:Integer}} = 32,
+    activation::Function = relu,
+    activation_output::Function = identity,
+    kernel_size = nothing,
+    weight_by_distance::Bool = false
+    )
+
+    @assert p > 0
+    @assert d > 0
+    @assert architecture ∈ ["DNN", "CNN", "GNN"]
+    @assert estimator_type ∈ ["point", "interval"]
+    @assert all(depth .> 0)
+    @assert length(depth) == 1 || length(depth) == 2
+    if isa(depth, Integer) depth = [depth] end
+    if length(depth) == 1 depth = repeat(depth, 2) end
+    @assert all(width .> 0)
+    @assert length(width) == 1 || length(width) == sum(depth)
+    if isa(width, Integer) width = [width] end
+    if length(width) == 1 width = repeat(width, sum(depth)) end
+    # henceforth, depth and width are integer vectors of length 2 and sum(depth), respectively
+
+    if architecture == "CNN"
+        @assert !isnothing(kernel_size) "The argument `kernel_size` must be provided when `architecture = 'CNN'`"
+        @assert length(kernel_size) == depth[1]
+        kernel_size = coercetotuple.(kernel_size)
+    end
+
+    L = sum(depth) # total number of hidden layers
+
+    # mapping (outer) network
+    ϕ = Chain(
+        [Dense(width[l-1] => width[l], activation) for l ∈ (depth[1]+1):L]...,
+        Dense(width[L] => p, activation_output)
+    )
+
+    # summary (inner) network
+    if architecture == "DNN"
+        ψ = Chain(
+            Dense(d => width[1], activation),
+            [Dense(width[l] => width[l+1], activation) for l ∈ 1:depth[1]]...
+        )
+    elseif architecture == "CNN"
+        ψ = Chain(
+            Conv(kernel_size[1], d => width[1], activation),
+            [Conv(kernel_size[l], width[l] => width[l+1], activation) for l ∈ 1:depth[1]]...,
+            Flux.flatten
+        )
+    elseif architecture == "GNN"
+        propagation = weight_by_distance ? WeightedGraphConv : GraphConv
+        # propagation_module = GNNChain(
+        #     propagation(d => width[1], activation),
+        #     [propagation(width[l] => width[l+1], relu) for l ∈ 1:depth[1]]...
+        #     )
+        # readout_module = GlobalPool(mean)
+        # return GNN(propagation_module, readout_module, ϕ) # return more-efficient GNN object
+        ψ = GNNChain(
+            propagation(d => width[1], activation, aggr = mean),
+            [propagation(width[l] => width[l+1], activation, aggr = mean) for l ∈ 1:depth[1]]...,
+            GlobalPool(mean) # readout module
+        )
+    end
+
+    θ̂ = DeepSet(ψ, ϕ)
+
+    if estimator_type == "interval"
+        θ̂ = IntervalEstimator(θ̂, θ̂)
+    end
+
+    return θ̂
+end
+
+coercetotuple(x) = (x...,)
diff --git a/src/NeuralEstimators.jl b/src/NeuralEstimators.jl
index b413aa5..f3f0400 100644
--- a/src/NeuralEstimators.jl
+++ b/src/NeuralEstimators.jl
@@ -41,7 +41,7 @@ export CholeskyCovariance, CovarianceMatrix, CorrelationMatrix
 export vectotril, vectotriu
 include("Architectures.jl")
 
-export NeuralEstimator, PointEstimator, IntervalEstimator, IntervalEstimatorCompactPrior, PointIntervalEstimator, QuantileEstimator, PiecewiseEstimator
+export NeuralEstimator, PointEstimator, IntervalEstimator, IntervalEstimatorCompactPrior, PointIntervalEstimator, QuantileEstimator, PiecewiseEstimator, initialise_estimator
 include("Estimators.jl")
 
 export GNN, UniversalPool, adjacencymatrix, WeightedGraphConv, maternclusterprocess
@@ -69,14 +69,13 @@ include("bootstrap.jl")
 export stackarrays, expandgrid, loadbestweights, loadweights, numberreplicates, nparams, samplesize, drop, containertype, estimateinbatches
 include("utility.jl")
 
-
 export NeuralEM, removedata, encodedata # TODO unit testing for NeuralEM
 include("missingdata.jl")
 
 end
 
 #TODO
-# - Add helper functions for censored data and write an example in the documentation.
+# - Add helper functions for censored data and write an example in the documentation.
 # - Plotting from Julia (which can act directly on the object of type assessment).
 # - Examples:
 #   o Add some figures to the examples in the documentation (e.g., show the sampling distribution in univariate example).
diff --git a/test/runtests.jl b/test/runtests.jl
index 737a837..f7c79e1 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -668,3 +668,19 @@
     ci = interval(estimator, Z, parameter_names = parameter_names)
     @test size(ci[1]) == (p, 2)
 end
+
+
+@testset "initialise_estimator" begin
+    p = 2
+    initialise_estimator(p, architecture = "DNN")
+    initialise_estimator(p, architecture = "GNN")
+    initialise_estimator(p, architecture = "CNN", kernel_size = [(10, 10), (5, 5), (3, 3)])
+
+    @test typeof(initialise_estimator(p, architecture = "DNN", estimator_type = "interval")) <: IntervalEstimator
+    @test typeof(initialise_estimator(p, architecture = "GNN", estimator_type = "interval")) <: IntervalEstimator
+    @test typeof(initialise_estimator(p, architecture = "CNN", kernel_size = [(10, 10), (5, 5), (3, 3)], estimator_type = "interval")) <: IntervalEstimator
+
+    @test_throws Exception initialise_estimator(0, architecture = "DNN")
+    @test_throws Exception initialise_estimator(p, d = 0, architecture = "DNN")
+    @test_throws Exception initialise_estimator(p, architecture = "CNN")
+end
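Below is a brief usage sketch of the new `initialise_estimator` helper (illustration only, not part of the diff). It assumes `NeuralEstimators` and `Flux` are loaded and follows the data layout described in the workflow overview, with each data set stored as a `d × m` matrix; the sample size `m = 50`, the simulated data `Z`, and the final call `θ̂(Z)` are illustrative assumptions rather than part of this change.

```
using NeuralEstimators, Flux

p = 2   # number of unknown parameters in the statistical model

# DeepSets-based point estimator with densely-connected inner and outer networks
θ̂ = initialise_estimator(p, architecture = "DNN")

# Hypothetical data: one data set of m = 50 independent replicates of a univariate (d = 1) response
Z = [rand(Float32, 1, 50)]
θ̂(Z)    # should return a p × 1 matrix, one column of estimates per data set

# Interval estimator variant, as exercised in the new unit tests
q̂ = initialise_estimator(p, architecture = "DNN", estimator_type = "interval")
```

The `"CNN"` and `"GNN"` variants are constructed analogously; as in the docstring example, the CNN additionally requires `kernel_size`.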