Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct adjoint for truncated SVD #15

Merged
merged 29 commits into from
Jul 10, 2024
Merged

Correct adjoint for truncated SVD #15

merged 29 commits into from
Jul 10, 2024

Conversation

pbrehmer
Copy link
Collaborator

@pbrehmer pbrehmer commented Mar 4, 2024

This PR implements the correct adjoint for a truncated SVD, as explained in this pre-print. To that end, we use KrylovKit's iterative svdsolve which is wrapped in the function itersvd, so that a custom rrule can defined locally.

I will also add a small test script that compares against the previous adjoint on some gauge-invariant loss function.

@pbrehmer pbrehmer marked this pull request as draft March 4, 2024 18:12
@pbrehmer pbrehmer marked this pull request as ready for review March 5, 2024 15:11
@pbrehmer
Copy link
Collaborator Author

pbrehmer commented Mar 5, 2024

I think this is ready to merge from my side. The itersvd adjoint seems to be compatible with the TensorKit.tsvd adjoint for the loss function I tested it on. So this seems to work.

I scrapped the test_rrule bit for now since I thought we could add it later once we have unit tests running. Then I'll also add the rrule for AbstractMatrix.

@pbrehmer
Copy link
Collaborator Author

I updated to the newest KrylovKit version that uses VectorInterface.jl to simplify the coupled linear problem that arises in the SVD adjoint. This allows us to just use a Tuple and to get rid of all the reshapes, so this is much nicer now.

@lkdvos
Copy link
Member

lkdvos commented Apr 4, 2024

Jutho/KrylovKit.jl#83
Just wanted to link these PRs

@pbrehmer
Copy link
Collaborator Author

I was thinking about this again, so let me list a few TODOs that should still be implemented before merging:

  • Make itersvd compatible with symmetric tensors
  • Write some interface which makes it possible to choose different SVD functions during CTMRG. It could be as easy as overloading the alg kwarg in tsvd
  • Rewrite test_svd_adjoint.jl as an actual test

@lkdvos
Copy link
Member

lkdvos commented May 29, 2024

My take on this:

Symmetric tensors + KrylovKit.svdsolve

In the case of an explicit input tensor, the mapping is straightforward, as this can be achieved simply block-by-block:

foreach blocksector(t) do c
   Us, Ss, Vs, info = svdsolve(block(t, c), num, which, alg)
   block(U, c)[:, 1:num] .= stack(Us)
    block(S, c)[diagind(Ss)] .= Ss
    block(v, c)[1:num, :] .= stack(Vs)'
end

The annoying thing is to determine how you want to specify num. As far as I can tell, the easiest way of achieving this is just through a truncspace(V)-like specification. This then immediately fixes the structure of U, S, V, so filling in the blocks is also straightforward (I am skipping over all details here).

Symmetric tensors + KrylovKit.svdsolve + arbitrary function

In principle, it should also be possible to supply an implicit tensormap (e.g. as the contraction of a network instead of the dense tensormap). This still works, but now we need:

  1. the "row spaces" or "codomain" of the initial guess, or an explicit initial guess
  2. the number of svdvalues per sector, which directly translates to inner spaces between U,S, V

The question here is whether or not we want to allow the initial guess to be an arbitrary tensormap, or restrict it to a tensor. In the latter case, it is quite straightforward to then build U and V, as this is just catdomain and catcodomain, but in the former there is an additional permutation involved. I guess this is still possible, but there are definitely some subtleties to consider

Writing an interface

I guess the easiest is to figure out the implementation of the above first, and then check what input arguments we need. Then we just define a wrapper function in PEPSKit that dispatches to either KrylovKit or TensorKit, depending on if alg isa SVD/SDD or alg isa GKL.


In principle, I would be very happy to have this wrapper in a package extension of TensorKit, as a very similar thing exists for eigsolve and expsolve, but I would think here is a good place to have a first implementation, which we could consider migrating afterwards

@pbrehmer
Copy link
Collaborator Author

I think your take is pretty much spot on. For the case of symmetric tensor + KrylovKit.svdsolve that's what I would have implemented. The case of an arbitrary function is not yet super relevant for PEPSKit I think (apparently it doesn't make too much of a performance difference, unless you go to really large dimensions) but of course it should be implemented eventually.

I'm currently writing such a SVD interface with a wrapper function. I think that makes the most sense because that would make user-specified custom SVD functions easy to implement (and that's what I frequently need at the moment). Regarding the input arguments, I would try to keep it quite general. I will push something next week and maybe try to finish the PR then.

@lkdvos
Copy link
Member

lkdvos commented May 31, 2024

Great! I think the KrylovKit AD PR is getting its final touches, so by then this should be merged and AD might work out of the box. Let me know if I can do something!

@pbrehmer
Copy link
Collaborator Author

pbrehmer commented Jun 20, 2024

I added some wrapper structs and a wrapper function for the SVD. I thought it would be a useful thing to have extra structs for the SVD algorithm since you can store the different algorithm parameters and still have a single svdwrap interface that will get called inside the CTMRG routine.

I'm not sure yet how to implement the adjoint for the IterSVD which uses KrylovKit.GKL. The problem with the adjoint inside KrylovKit is that it uses vectors of vectors under the hood instead of matrices to perform all operations. I wonder what the best way is to convert back and forth from Matrix{ComplexF64} to Vector{Vector{ComplexF64}}.

Also, both the tsvd! and KrylovKit.svdsolve adjoint lack the possibility to do Lorentzian broadening to avoid divergences in case of degenerate singular values. However, this will be necessary in some PEPS simulations. So here I'm also not sure what the best way is to proceed; either we implement the broadening directly in TensorKit and KrylovKit, or we need a custom adjoint inside PEPSKit.

Note that I also included the OldSVD with the previously used (wrong) adjoint. For now it is useful for testing purposes, I think, but we can still throw it out later.

I could use some help or input on how to reuse the KrylovKit.svdsolve adjoint and also how to realize the Lorentzian broadening. Of course we could also go for a different interface altogether, I'd happy about suggestions! @lkdvos

Copy link
Member

@lkdvos lkdvos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple comments in the code, but in general I quite like the idea.

Some more generic remarks:
I think, if possible, we should decouple the forwards and backwards implementations as much as possible. This means using something like hook_pullback, but also that we can move the Lorentzian broadening question into the configuration of the backwards algorithm instead of the forwards. (Am I correct in assuming that the broadening only makes sense when performing AD, and you would otherwise rather have the broadening turned off?) This would also mean that we automatically have the broadening only for the final iteration which is used in the fixed point differentiation, while the bulk of the CTMRG iterations just uses a full decomposition.

I dont like the term svdwrap, but I guess that can be changed however we like. I wouldnt mind reusing either svdsolve or tsvd without importing it, i.e. TensorKit.tsvd would not be the same as PEPSKit.tsvd, which would only matter within this package, and not change it outwards, and then afterwards we could merge the two?

src/algorithms/ctmrg.jl Outdated Show resolved Hide resolved
src/algorithms/ctmrg.jl Outdated Show resolved Hide resolved
src/algorithms/ctmrg.jl Outdated Show resolved Hide resolved
src/utility/svd.jl Outdated Show resolved Hide resolved
src/utility/svd.jl Outdated Show resolved Hide resolved
alg::KrylovKit.GKL = KrylovKit.GKL(; tol=1e-14, krylovdim=25)
howmany::Int = 20
lorentz_broad::Float64 = 0.0
alg_rrule::Union{GMRES,BiCGStab,Arnoldi} = GMRES(; tol=1e-14)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep this alg_rrule separate, maybe re-using the hook_pullback interface, to decouple the implementation of the forward and backward rules

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in terms of usability, I think the SVD algorithm struct would be the best way to set the SVD adjoint parameters. So if I were to use hook_pullback in this case, I would need to store the alg_rrule and lorentz_broad settings in the CTMRG struct, which I also don't like that much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, that's a very good point.

How about a wrapper then?

struct SVDrrule{F,T,R,E<:Real}
    svd_alg::F
    truncation_alg::T
    rrule_alg::R
    lorentz_broadening::E
end

In this case, we still at least semantically separate the rrule from the forward pass (which I really like), while also grouping them together, which I also quite like. It also looks to me to be quite flexible.

I think we should probably investigate (not just here, but throughout PEPSKit), if we want to include the truncation algorithm here or as a separate field. My guess would be that it's actually not a bad idea to keep them separate, considering we presumably might want to have varying truncspace settings throughout the unit cell, without changing the SVD algorithm.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now added such a wrapper. It leads to the cool thing that you can easily choose the rrule with the corresponding algorithm struct and this decouples the forward SVD algorithm and its reverse rule. I decided to hide the lorentz_broadening away inside the rrule_alg since in general Lorentzian broadening might not be needed in every SVD implementation.

The only thing I'm unsure about is the naming; maybe something like SVDWithRrule, SVDWithAdjoint, DiffSVD or DifferentiableSVD would be more descriptive?

Copy link

codecov bot commented Jun 28, 2024

Codecov Report

Attention: Patch coverage is 89.14729% with 14 lines in your changes missing coverage. Please review.

Files Coverage Δ
src/PEPSKit.jl 100.00% <ø> (ø)
src/algorithms/ctmrg.jl 94.81% <100.00%> (+0.13%) ⬆️
src/utility/svd.jl 87.93% <87.93%> (ø)

... and 1 file with indirect coverage changes

@lkdvos
Copy link
Member

lkdvos commented Jul 2, 2024

as a small heads-up, the latest version of tensorkit now is less strict with the tsvd types: Jutho/TensorKit.jl#134 (comment)

@pbrehmer
Copy link
Collaborator Author

pbrehmer commented Jul 2, 2024

Thanks for the heads-up, that is good timing! I will work on the PR tomorrow, so the TensorKit update comes in handy.

@pbrehmer
Copy link
Collaborator Author

pbrehmer commented Jul 4, 2024

What I just realized: We can get rid of eigsolve.jl since that is now implemented in the new KrylovKit versions. I'll take care of that in this PR. (And of course we should adjust the compat entry for KrylovKit then.)

@lkdvos
Copy link
Member

lkdvos commented Jul 4, 2024

It might conflict with the MPSKit compat entries, but I'll try and get a release out for that one asap

src/utility/svd.jl Outdated Show resolved Hide resolved
src/utility/svd.jl Outdated Show resolved Hide resolved
@pbrehmer
Copy link
Collaborator Author

pbrehmer commented Jul 8, 2024

Will add a IterSVD test for symmetric tensors tomorrow and try to fix the other tests. Then the PR should be finally done (hopefully).

@pbrehmer
Copy link
Collaborator Author

pbrehmer commented Jul 9, 2024

Alright, I think I would be fine with merging the PR like this (once the tests work). Maybe we could first merge #49 and get the tests to run there, and then merge master into this branch (such that the tests still hopefully work).

src/utility/svd.jl Outdated Show resolved Hide resolved
src/utility/svd.jl Outdated Show resolved Hide resolved
src/utility/svd.jl Outdated Show resolved Hide resolved
src/utility/svd.jl Outdated Show resolved Hide resolved
test/ctmrg/svd_wrapper.jl Outdated Show resolved Hide resolved
@pbrehmer
Copy link
Collaborator Author

pbrehmer commented Jul 9, 2024

@lkdvos I'm not sure how to make the PR compatible with Julia 1.8 since I have to use Base.get_extension for the sparse SVD adjoint in order to access functions from the KrylovKit AD package extension. AFAIK package extensions were only added after v1.8 - so do you think there is a way around it somehow, without losing backwards compatibility? Other than that the tests should run!

@lkdvos
Copy link
Member

lkdvos commented Jul 9, 2024

@lkdvos I'm not sure how to make the PR compatible with Julia 1.8 since I have to use Base.get_extension for the sparse SVD adjoint in order to access functions from the KrylovKit AD package extension. AFAIK package extensions were only added after v1.8 - so do you think there is a way around it somehow, without losing backwards compatibility? Other than that the tests should run!

I think in 1.8, these extensions are just loaded as regular modules, so you can just import them accordingly. For reference, I did a similar thing here

@pbrehmer
Copy link
Collaborator Author

pbrehmer commented Jul 9, 2024

I think in 1.8, these extensions are just loaded as regular modules, so you can just import them accordingly

Gave it a try and I'm not so sure, why I can't access the extensions from the main module. I don't quite understand why the TensorOperations example works. I might need some help here

@lkdvos
Copy link
Member

lkdvos commented Jul 9, 2024

I'm super confused myself now. I have no problem getting it to work from the REPL on 1.8, and apparently it's also fine from a package extension, but somehow the construction in the main package of a module seems to not work.
I am doing some further digging, but it seems like this is really a limitation of how package extensions and "Requires" work, and in particular from what I am gathering this code is only included after the main module is loaded, so there really is no way to access it.

Technically, that construction is also just very illegal, Extensions should not define new functions, so accessing them from a different package is just going against all guidelines. I have an idea on how to avoid the entire problem, but the simplest solution is to just restrict to julia 1.9 😁 I am definitely okay with this restriction, it seems like we will have to do this for TensorKit in the near future as well, as some multithreaded implementation there requires this.

@pbrehmer
Copy link
Collaborator Author

pbrehmer commented Jul 9, 2024

Oh well, what a mess. Thanks for digging deep :-) I would also be very okay with restricting to v1.9.

@lkdvos
Copy link
Member

lkdvos commented Jul 9, 2024

So, in principle, we could support 1.8 by computing the rrule block-per-block in the forwards pass using the normal chainrules interface, and then in the pullback applying these rules block per block. I don't think this makes any difference performance-wise, but I am definitely a little lazy right now. @leburgel , do you have any strong feelings against restricting to 1.9?

@pbrehmer
Copy link
Collaborator Author

pbrehmer commented Jul 9, 2024

Alright that would be possible, but it seems like quite a big hassle for supporting a non-LTS version... I'd still go with restricting to v1.9 :)

@leburgel
Copy link
Member

So, in principle, we could support 1.8 by computing the rrule block-per-block in the forwards pass using the normal chainrules interface, and then in the pullback applying these rules block per block. I don't think this makes any difference performance-wise, but I am definitely a little lazy right now. @leburgel , do you have any strong feelings against restricting to 1.9?

I have no objections whatsoever to restricting to 1.9, go for it.

lkdvos
lkdvos previously approved these changes Jul 10, 2024
Copy link
Member

@lkdvos lkdvos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tests are green, good to go for me

@pbrehmer
Copy link
Collaborator Author

Just noticed a duplicate line, so I removed it. If the tests run, this is also good to go for me!

@lkdvos lkdvos merged commit 5cd0304 into master Jul 10, 2024
7 of 9 checks passed
@lkdvos lkdvos deleted the svd_adjoint branch July 10, 2024 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants