-
-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Jacobian cache causes copy bug #473
Comments
Further investigation: when I run the tests with
I get a different error: julia> sol1 = solve(bvp1, solver; dt=dt)
ERROR: BoundsError: attempt to access 4-element UnitRange{Int64} at index [5]
Stacktrace:
[1] throw_boundserror(A::UnitRange{Int64}, I::Int64)
@ Base ./abstractarray.jl:737
[2] getindex
@ ./range.jl:930 [inlined]
[3] _colorediteration!
@ ~/.julia/packages/FiniteDiff/wm6iC/src/iteration_utils.jl:4 [inlined]
[4] forwarddiff_color_jacobian!(J::SubArray{…}, f::BoundaryValueDiffEq.__Fix3{…}, x::Vector{…}, jac_cache::ForwardColorJacCache{…})
@ SparseDiffTools ~/.julia/packages/SparseDiffTools/RGOJw/src/differentiation/compute_jacobian_ad.jl:392
[5] sparse_jacobian!
@ ~/.julia/packages/SparseDiffTools/RGOJw/src/highlevel/forward_mode.jl:63 [inlined]
[6] __mirk_mpoint_jacobian!(J::Matrix{…}, ::BandedMatrices.BandedMatrix{…}, x::Vector{…}, bc_diffmode::AutoForwardDiff{…}, nonbc_diffmode::AutoSparse{…}, bc_diffcache::SparseDiffTools.ForwardDiffJacobianCache{…}, nonbc_diffcache::SparseDiffTools.ForwardDiffJacobianCache{…}, loss_bc::BoundaryValueDiffEq.__Fix3{…}, loss_collocation::BoundaryValueDiffEq.__Fix3{…}, resid_bc::Vector{…}, resid_collocation::Vector{…}, L::Int64)
@ BoundaryValueDiffEq ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/solve/mirk.jl:390
[7] #268
@ ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/solve/mirk.jl:371 [inlined]
[8] JacobianCache
@ ~/.julia/dev/NonlinearSolve/src/internal/jacobian.jl:149 [inlined]
[9] JacobianCache
@ ~/.julia/dev/NonlinearSolve/src/internal/jacobian.jl:120 [inlined]
[10] __step!(cache::NonlinearSolve.GeneralizedFirstOrderAlgorithmCache{…}; recompute_jacobian::Nothing, kwargs::@Kwargs{})
@ NonlinearSolve ~/.julia/dev/NonlinearSolve/src/core/generalized_first_order.jl:230
[11] __step!
@ ~/.julia/dev/NonlinearSolve/src/core/generalized_first_order.jl:226 [inlined]
[12] #step!#131
@ ~/.julia/dev/NonlinearSolve/src/core/generic.jl:50 [inlined]
[13] step!
@ ~/.julia/dev/NonlinearSolve/src/core/generic.jl:45 [inlined]
[14] solve!(cache::NonlinearSolve.GeneralizedFirstOrderAlgorithmCache{…})
@ NonlinearSolve ~/.julia/dev/NonlinearSolve/src/core/generic.jl:13
[15] #__solve#130
@ ~/.julia/dev/NonlinearSolve/src/core/generic.jl:4 [inlined]
[16] __solve
@ ~/.julia/dev/NonlinearSolve/src/core/generic.jl:1 [inlined]
[17] macro expansion
@ ~/.julia/dev/NonlinearSolve/src/default.jl:282 [inlined]
[18] __solve(::NonlinearProblem{…}, ::NonlinearSolvePolyAlgorithm{…}; stats::SciMLBase.NLStats, alias_u0::Bool, verbose::Bool, kwargs::@Kwargs{…})
@ NonlinearSolve ~/.julia/dev/NonlinearSolve/src/default.jl:255
[19] __perform_mirk_iteration(cache::BoundaryValueDiffEq.MIRKCache{…}, abstol::Float64, adaptive::Bool; nlsolve_kwargs::@NamedTuple{}, kwargs::@Kwargs{…})
@ BoundaryValueDiffEq ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/solve/mirk.jl:175
[20] __perform_mirk_iteration
@ ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/solve/mirk.jl:171 [inlined]
[21] solve!(cache::BoundaryValueDiffEq.MIRKCache{…})
@ BoundaryValueDiffEq ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/solve/mirk.jl:152
[22] #__solve#296
@ ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/BoundaryValueDiffEq.jl:51 [inlined]
[23] __solve
@ ~/.julia/packages/BoundaryValueDiffEq/YN0of/src/BoundaryValueDiffEq.jl:49 [inlined]
[24] #solve_call#44
@ ~/.julia/packages/DiffEqBase/vscDj/src/solve.jl:612 [inlined]
[25] solve_call
@ ~/.julia/packages/DiffEqBase/vscDj/src/solve.jl:569 [inlined]
[26] solve_up(prob::BVProblem{…}, sensealg::Nothing, u0::Vector{…}, p::Tuple{…}, args::MIRK4{…}; kwargs::@Kwargs{…})
@ DiffEqBase ~/.julia/packages/DiffEqBase/vscDj/src/solve.jl:1084
[27] solve_up
@ ~/.julia/packages/DiffEqBase/vscDj/src/solve.jl:1070 [inlined]
[28] solve(prob::BVProblem{…}, args::MIRK4{…}; sensealg::Nothing, u0::Nothing, p::Nothing, wrap::Val{…}, kwargs::@Kwargs{…})
@ DiffEqBase ~/.julia/packages/DiffEqBase/vscDj/src/solve.jl:1007
[29] top-level scope
@ ~/Work/GitHub/Julia/Sandbox/test.jl:47
Some type information was truncated. Use `show(err)` to see complete types. So the one I was observing before might have been due to incorrect use of |
And in this same part of FiniteDiff, the array we're incorrectly accessing is
|
it should be square? What are the dimensions? |
|
oh it's differencing the banded subproblem, I see. |
Shouldn't this end up in the bandedmatrix iteration https://github.com/JuliaArrays/ArrayInterface.jl/blob/master/ext/ArrayInterfaceBandedMatricesExt.jl#L56-L62? |
It probably doesn't because |
I think there is something fundamentally wrong with the default |
Agreed, but it looks like there's two errors here. One is that NonlinearSolve changes are now missing the BandedMatrix dispatches which isn't great, but then the second is that the dense fallback should be corrected. |
Only the second one is an error, the first one is a suboptimality. I guess we need to ask @avik-pal where the jacobian prototype is built for this one, and how it ends up being a |
https://github.com/JuliaDiff/FiniteDiff.jl/blob/b574440c5d54708aa23a90c48f4a000dd1c6d915/ext/FiniteDiffBandedMatricesExt.jl#L13 if you relax |
Partial bisection done git bisect start
# status: waiting for both good and bad commits
# good: [f667683bb6c5bf78a8cef5defa52f887c526fc71] feat: integrate SciMLJacobianOperators into NonlinearSolve
git bisect good f667683bb6c5bf78a8cef5defa52f887c526fc71
# status: waiting for bad commit, 1 good commit known
# bad: [d56476c69c749a4b575b47d58c03cf366ab7134c] fix: use DI preparation result when initializing Jacobian (#472)
git bisect bad d56476c69c749a4b575b47d58c03cf366ab7134c
# skip: [fe1bd3dd6ec4923afad60453600316038b3cd9f1] chore: bump minimum versions
git bisect skip fe1bd3dd6ec4923afad60453600316038b3cd9f1
# skip: [59b7d022bc282a04f858b973954b26fc52792771] fix: update bruss testing
git bisect skip 59b7d022bc282a04f858b973954b26fc52792771
# skip: [92c59dbb2d360a4a2ed968668395473be57fc81e] chore: run formatter
git bisect skip 92c59dbb2d360a4a2ed968668395473be57fc81e
# skip: [058b87d5af338ad7e2049f2f27c95b476798d32b] feat: remove Setfield dep
git bisect skip 058b87d5af338ad7e2049f2f27c95b476798d32b
# skip: [3f59f4949829b74d78b9eea8de834dd4f9239152] fix: autodiff cannot be nothing
git bisect skip 3f59f4949829b74d78b9eea8de834dd4f9239152
# skip: [40eb6c867fbed99368e7005ae1ba4246083269eb] chore: mark master as DEV (DON'T TAG)
git bisect skip 40eb6c867fbed99368e7005ae1ba4246083269eb
# skip: [adac560c3f0f737b5c854d61642a182078bb50a1] fix: __value_derivative removal from line searches
git bisect skip adac560c3f0f737b5c854d61642a182078bb50a1
# skip: [b2bb25252e3526b753b3852dfbfff9cc386e3f7a] fix: missing qualifier
git bisect skip b2bb25252e3526b753b3852dfbfff9cc386e3f7a
# skip: [3b12e73e945941ac8d0d4eebe038f1c8ae15d5cf] fix: unwrap sparse AD for derivative call
git bisect skip 3b12e73e945941ac8d0d4eebe038f1c8ae15d5cf
# skip: [43df907989a67b707e433753b09ef3bcd3d08a4c] test: structured jacobians
git bisect skip 43df907989a67b707e433753b09ef3bcd3d08a4c
# skip: [0f2a62f00334d680096e55c9a8ecdc8012819aef] fix: minor test fixes
git bisect skip 0f2a62f00334d680096e55c9a8ecdc8012819aef
# good: [bdacb9ca9327b9d5d2fe6a8f0c08f44ac206f8ec] docs: disable ILU for now
git bisect good bdacb9ca9327b9d5d2fe6a8f0c08f44ac206f8ec
# bad: [b6221b7d61027339ec5fc50bb1f3f90398ce607e] refactor: reorder imports
git bisect bad b6221b7d61027339ec5fc50bb1f3f90398ce607e
# skip: [ba45097e841f9875a7e53d94a1d86f1ffcfd1425] fix: sparse jacobian
git bisect skip ba45097e841f9875a7e53d94a1d86f1ffcfd1425
# good: [6201fcc2f49e2ec2019eb61491f30bc9664e0dc5] fix: DI now works with ReverseDiff
git bisect good 6201fcc2f49e2ec2019eb61491f30bc9664e0dc5 |
Yes, if I relax that I get a lot of these warnings but at least the solve works:
|
Okay cool, one mystery solved. And that looks optimal too, since it's using the 2D indexing. That's worth a PR. The warnings, we should probably pass verbose false there but they make sense. Now for the fallback, i.e. they are defined the same as For dense then we should specialize on Matrix and do a fast path if square. If non-square, these should be: julia> n = 4
4
julia> m = 5
5
julia> A = rand(4,5)
4×5 Matrix{Float64}:
0.0732819 0.236601 0.971541 0.0239939 0.834559
0.297407 0.472245 0.963563 0.841125 0.296606
0.467831 0.297999 0.760833 0.87123 0.309619
0.781913 0.879559 0.871359 0.120851 0.0749186
julia> col_idxs = vec(stack([i*ones(Int,n) for i in 1:m]))
20-element Vector{Int64}:
1
1
1
1
2
2
2
⋮
4
4
5
5
5
5
julia> row_idxs = vec(stack([1:m for i in 1:n]))
20-element Vector{Int64}:
1
2
3
4
5
1
2
⋮
5
1
2
3
4
5 |
found it [a8e60d5] feat: use DI for dense jacobians |
Trying to figure out why the prototype isn't banded. We get a |
That makes a lot of sense. It must've changed the jac or sparsity type here. We can:
|
For 1 I have a hunch: if you're using DI with a dense backend to compute your prototype, you're always gonna get something that is the result of |
infil> f.jac_prototype
8×8 AlmostBandedMatrix{Float64} with bandwidths (5, 4) and fill rank 4 with data 8×8 BandedMatrix{Float64} with bandwidths (5, 4) and fill 4×8 Matrix{Float64}:
1.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0
1.0 1.0 1.0 1.0 1.0 1.0 0.0 0.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
⋅ 1.0 1.0 1.0 1.0 1.0 1.0 1.0
⋅ ⋅ 1.0 1.0 1.0 1.0 1.0 1.0
infil> similar(f.jac_prototype)
8×8 Matrix{Float64}:
2.122e-314 1.2732e-313 1.4854e-313 1.9098e-313 2.3342e-313 2.54639e-313 2.75859e-313 7.0e-323
4.24399e-314 3.0e-323 1.4854e-313 1.9098e-313 2.3342e-313 2.54639e-313 2.75859e-313 3.18299e-313
8.48798e-314 1.4854e-313 1.6976e-313 2.122e-313 2.54639e-313 2.54639e-313 2.75859e-313 3.18299e-313
8.48798e-314 1.4854e-313 1.6976e-313 2.122e-313 2.54639e-313 2.54639e-313 2.97079e-313 3.18299e-313
8.48798e-314 1.4854e-313 1.6976e-313 2.122e-313 2.54639e-313 2.75859e-313 2.97079e-313 3.18299e-313
8.48798e-314 1.4854e-313 1.6976e-313 2.122e-313 2.54639e-313 2.75859e-313 2.97079e-313 3.18299e-313
1.061e-313 1.4854e-313 1.6976e-313 2.122e-313 2.54639e-313 2.75859e-313 2.97079e-313 7.4e-323
1.2732e-313 1.4854e-313 1.9098e-313 2.3342e-313 2.54639e-313 2.75859e-313 2.97079e-313 7.4e-323 |
Okay then that's not on me 🤣 good find Avik! |
I have always hated Julia's fallback semantics for similar 😠 |
Oh I see what's going on here. Is an internal setup to say "are you going to use structural_nz anyways? Don't make the arrays if you don't have to". It checks the sparsity pattern and says, "banded matrix? We have one of those at home https://github.com/JuliaDiff/FiniteDiff.jl/blob/b574440c5d54708aa23a90c48f4a000dd1c6d915/ext/FiniteDiffBandedMatricesExt.jl#L13-L27 which effectively just uses a trivial rows/cols, so don't spend the time building that array". Then |
So what's the plan of action here? If we define a generic But it would maybe send a signal that dense matrices also have a structure |
Those should probably just be rows_index = 1:size(J, 1)
cols_index = 1:size(J, 2) on the inside which is non-allocating so it's fine to do it there. |
Or we can just make sure it returns a correct non-allocating iterator. What would be bad is to allocate some big arrays (O(nonzeros)) that we don't end up using because the iteration scheme uses an analytical solution though. That would be a bit wasteful because for a matrix it's as big as the matrix itself. |
Here's an easy solution: julia> using Base.Iterators
julia> using BenchmarkTools
julia> function findstructralnz(x::AbstractMatrix)
cartind = vec(CartesianIndices(x))
return Iterators.map(first ∘ Tuple, cartind), Iterators.map(last ∘ Tuple, cartind)
end
findstructralnz (generic function with 1 method)
julia> collect.(findstructralnz(rand(2, 3)))
([1, 2, 1, 2, 1, 2], [1, 1, 2, 2, 3, 3])
julia> @btime findstructralnz($(rand(2, 3)));
28.162 ns (0 allocations: 0 bytes) |
👍that looks pretty decent. |
If that's okay I would like to let you insert it in ArrayInterface and modify SparseDiffTools though, I don't know these code bases and I don't want to accidentally break something |
Have a nice weekend! |
One day I'll remember what weekends are 😅 |
I think the least disruptive fix is this one: |
And to avoid the diagnosis issues we've ran into here: |
And JuliaDiff/FiniteDiff.jl#194 is the other part, and SciML/FastAlmostBandedMatrices.jl#16 is the better similar. So I think all facets of this are getting handled. |
Merged, yay. |
Describe the bug 🐞
The new NonlinearSolve v3.15 (with DI in it) causes a BoundaryValueDiffEq solve to error or segfault.
Expected behavior
The solve should run without errors, as it does with NonlinearSolve 3.14.
Minimal Reproducible Example 👇
Error & Stacktrace⚠️
Environment (please complete the following information):
using Pkg; Pkg.status()
using Pkg; Pkg.status(; mode = PKGMODE_MANIFEST)
versioninfo()
Additional context
The bug is probably not deterministic. Sometimes it doesn't even happen, and sometimes it downright causes a segfault.
By
pkg> dev
-ing NonlinearSolve and adding some logging before linesrc/internal/linear_solve.jl:231
, I obtained some more information about the matriceslincache.A
andnew_A
:In other words, the
copyto!
fails even though both matrices have the samesize
, because the one fromlincache
doesn't have the rightlength
. Note thatlength(lincache.A)
is not always 1.The text was updated successfully, but these errors were encountered: