[question] Achieve constraints via return of Inf #82
@Maximilian-Stefan-Ernst I'm not sure how Optim deals with Inf. Can you give an example of what you tried?
Yes! Okay, so our objective is of the form shown in the MWE below and requires a matrix Σ to be positive definite. So what works fine for us with Optim and NLopt is just to check whether Σ is positive definite and return Inf if it is not:

using Zygote, Optim, ForwardDiff, ProximalAlgorithms, ProximalOperators, LinearAlgebra, ProximalCore
function f(x)
# in reality, Σ is a more complicated function of the parameters
Σ = [x[1] 0.0
0.0 x[2]]
# Since Σ is diagonal, it is positive definite iff x[1] > 0, x[2] > 0
if !isposdef(Σ)
return Inf
else
return logdet(Σ) + tr(inv(Σ))
end
end
# works just fine
sol = Optim.optimize(f, [2.0, 2.0], BFGS(); autodiff = :forward)
# finds the correct values of [1.0; 1.0]
sol.minimizer
# the same with ProximalAlgorithms
ProximalAlgorithms.PANOC()(x0 = [2.0, 2.0], f = f, g = NormL1(0.0))
# ERROR: MethodError: Cannot `convert` an object of type Nothing to an object of type Float64
# The reason seems to be that different autodiff backends handle inadmissible values differently
ForwardDiff.gradient(f, [-1.0, -1.0])
# returns [0.0; 0.0]
Zygote.gradient(f, [-1.0, -1.0])
# returns (nothing,)

In reality, for performance reasons, we give analytical gradients instead. For inadmissible parameter values, we just fill the gradient with ones (i.e. anything that does not fulfill Optim's convergence criteria on the gradient norm). However, this fails for ProximalAlgorithms:

struct myproblem end
function ProximalCore.gradient!(grad, f::myproblem, x)
Σ = [x[1] 0.0
0.0 x[2]]
if !isposdef(Σ)
grad .= 1.0
return Inf
else
# ∇Σ: Jacobian of vec(Σ) with respect to x (4 entries of Σ, 2 parameters)
∇Σ = [1.0 0.0
0.0 0.0
0.0 0.0
0.0 1.0]
# chain rule: ∂(logdet(Σ) + tr(inv(Σ)))/∂Σ = inv(Σ) - inv(Σ)^2, composed with ∇Σ
grad' .= vec(inv(Σ) - inv(Σ)^2)'*∇Σ
return logdet(Σ) + tr(inv(Σ))
end
end
ProximalAlgorithms.PANOC()(x0 = [2.0, 2.0], f = myproblem(), g = NormL1(0.0))
# output: ([-6.39742852169722e17, -6.39742852169722e17], 11)

But it starts to work again if we return floatmax(eltype(x)) instead of Inf:

function ProximalCore.gradient!(grad, f::myproblem, x)
Σ = [x[1] 0.0
0.0 x[2]]
if !isposdef(Σ)
grad .= 1.0
return floatmax(eltype(x)) # <- only this line changed
else
∇Σ = [1.0 0.0
0.0 0.0
0.0 0.0
0.0 1.0]
grad' .= vec(inv(Σ) - inv(Σ)^2)'*∇Σ
return logdet(Σ) + tr(inv(Σ))
end
end
ProximalAlgorithms.PANOC()(x0 = [2.0, 2.0], f = myproblem(), g = NormL1(0.0))
# output: ([0.9999999997915161, 0.9999999997915161], 7)

Sorry for the long post, but I could not find an easier MWE that reproduces the problem.
Not at all, thanks for the detailed report! Leaving aside the AD issues: to summarize, I'm not surprised if things break in your case 🙂 and now I'm curious how Optim deals with it.
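For illustration only, a minimal sketch of one hypothetical reason Inf can be more troublesome than floatmax inside a backtracking-style test. This is generic floating-point behaviour, not ProximalAlgorithms' actual internals, and all names and numbers are made up: once Inf enters a difference, the result can become NaN, and NaN makes every subsequent comparison return false, whereas floatmax keeps the arithmetic finite.

# hypothetical backtracking-style check; all names and values are made up
f_x = 2.0                        # objective value at the current (feasible) point
model = f_x - 0.5                # some finite model value a trial point is compared against

f_trial = Inf                    # Inf variant: objective at an infeasible trial point
println(f_trial <= model)        # false — the point is rejected, so far so good
println(f_trial - f_x)           # Inf — differences stay infinite...
println((f_trial - f_x) - (f_trial - f_x))   # NaN — ...and cancel to NaN (Inf - Inf)
println(NaN <= model, " ", NaN > model)      # false false — NaN defeats every comparison

f_trial = floatmax(Float64)      # floatmax variant: everything stays finite
println(f_trial <= model)        # false — still rejected
println(f_trial - f_x > 0)       # true — differences and comparisons keep working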
If you don't want questions in your issues, please just delete.
We have complicated constraints that cannot (or at least it is not obvious how to) be enforced via a proximal operator. However, with Optim.jl we've had good experiences just returning Inf whenever the constraints are not met (and there are some theoretical reasons why this should work, as well as a lot of practical experience). Using ProximalAlgorithms, this strategy leads to convergence problems. Do you know if ProximalAlgorithms handles Inf values differently than Optim? Is there any advantage of returning floatmax(...) or a very high number instead of Inf?
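For the diagonal toy example above, a minimal sketch of the proximal-operator route, i.e. encoding the positivity constraint in g instead of returning Inf from f. Assumptions: IndBox from ProximalOperators with a small, arbitrary lower bound of 1e-4 standing in for strict positivity, a hypothetical wrapper type myproblem_box, and the isposdef guard from the earlier snippet kept as a safety net in case the line search evaluates f outside the box.

struct myproblem_box end

function ProximalCore.gradient!(grad, f::myproblem_box, x)
    Σ = [x[1] 0.0
         0.0 x[2]]
    if !isposdef(Σ)
        # safety net: g's projection keeps the iterates in the box, but the
        # line search may still probe trial points outside of it
        grad .= 1.0
        return floatmax(eltype(x))
    end
    ∇Σ = [1.0 0.0
          0.0 0.0
          0.0 0.0
          0.0 1.0]
    grad' .= vec(inv(Σ) - inv(Σ)^2)' * ∇Σ
    return logdet(Σ) + tr(inv(Σ))
end

# positivity is now encoded in g: the prox of IndBox clamps iterates to [1e-4, Inf)
g_box = IndBox(1e-4, Inf)
ProximalAlgorithms.PANOC()(x0 = [2.0, 2.0], f = myproblem_box(), g = g_box)
# expected to converge to roughly [1.0, 1.0], as in the floatmax version above

This only helps when the constraint has a cheap projection, which is exactly what the question above says is not available for the real problem, so it is meant purely as a contrast to the Inf / floatmax variants.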