Refactor taylorize (#156)

* Add RetAlloc to store preallocated objects * Add allocated vars definitions to parsed `jetcoeffs!` * Use a RetAlloc argument in parsed `jetcoeffs!` * Split array declaration by number of indices up to 3 indices * Increase test coverage of common interface with DiffEqs Note that `full_cache` was never used, so I commented * Remove `debug` flags in parse_eqs.jl * Allow integration of ODEs with abstract arrays with tests (common interface) * Add tests related to #143, somehow solved * Comment __jetcoeffs! methods for DynamicalODEFunctions which are not used or seemingly useless * Precisions in docs/taylorize.md * Further documentation fixes * Increase `maxsteps` in examples to make fairer comparisons * Add declaration of Taylor1 arrays involving 4 indices * Fix an error, and error when more than 5-index array is included * Fix typo and fix tests * Add test for error thrown with 5-index arrays * Fix added test so it works with Julia v1.6 too * Comment a seemingly useless line * Pass t-variable with complete order to ODE function * Uncomment and fix Kepler benchmarks with parse_eqs=false * Bump minor version * Few more tests... for coverage * TaylorSeries v0.13 and OrdinaryDiffEq v6
PerezHz · Feb 8, 2023 · da195d4 · da195d4 · lbenet · Feb 8, 2023
1 parent 5422e9b
commit da195d4
Show file tree

Hide file tree

Showing 11 changed files with 616 additions and 345 deletions.
diff --git a/Project.toml b/Project.toml
@@ -26,13 +26,13 @@ BenchmarkTools = "1.3"
 DiffEqBase = "6"
 Elliptic = "0.5, 1.0"
 Espresso = "0.6.1"
-OrdinaryDiffEq = "5, 6"
+OrdinaryDiffEq = "6"
 PkgBenchmark = "0.2"
 RecursiveArrayTools = "2"
 Reexport = "0.2, 1"
 Requires = "0.5.2, 1"
 StaticArrays = "0.12.5, 1"
-TaylorSeries = "0.12"
+TaylorSeries = "0.13"
 julia = "1"
 
 [extras]

diff --git a/benchmark/kepler_benchmarks.jl b/benchmark/kepler_benchmarks.jl
@@ -115,23 +115,23 @@ let
     SUITE["Kepler"]["kepler6-3"] = @benchmarkable taylorinteg(
         kepler6!, $q0, $t0, $tf3, $_order, $_abstol, -1.0, maxsteps=$maxsteps3)
 
-    # # ==========
-    # # KeplerNotParsed
-    # # ==========
-    # SUITE["KeplerNotParsed"] = BenchmarkGroup()
-
-    # SUITE["KeplerNotParsed"]["kepler1"] = @benchmarkable taylorinteg(
-    #     kepler1!, $q0, $t0, $tf, $_order, $_abstol, -1.0, maxsteps=$maxsteps,
-    #     parse_eqs=false)
-    # SUITE["KeplerNotParsed"]["kepler2"] = @benchmarkable taylorinteg(
-    #     kepler2!, $q0, $t0, $tf, $_order, $_abstol, $pars, maxsteps=$maxsteps,
-    #     parse_eqs=false)
-    #
-    # SUITE["KeplerNotParsed"]["kepler5"] = @benchmarkable taylorinteg(
-    #     kepler5!, $q0, $t0, $tf, $_order, $_abstol, maxsteps=$maxsteps,
-    #     parse_eqs=false)
-    # SUITE["KeplerNotParsed"]["kepler6"] = @benchmarkable taylorinteg(
-    #     kepler6!, $q0, $t0, $tf, $_order, $_abstol, -1.0, maxsteps=$maxsteps,
-    #     parse_eqs=false)
-    #
+    # ==========
+    # KeplerNotParsed
+    # ==========
+    SUITE["KeplerNotParsed"] = BenchmarkGroup()
+
+    SUITE["KeplerNotParsed"]["kepler1"] = @benchmarkable taylorinteg(
+        kepler1!, $q0, $t0, $tf2, $_order, $_abstol, -1.0, maxsteps=$maxsteps2,
+        parse_eqs=false)
+    SUITE["KeplerNotParsed"]["kepler2"] = @benchmarkable taylorinteg(
+        kepler2!, $q0, $t0, $tf2, $_order, $_abstol, $pars, maxsteps=$maxsteps2,
+        parse_eqs=false)
+
+    SUITE["KeplerNotParsed"]["kepler5"] = @benchmarkable taylorinteg(
+        kepler5!, $q0, $t0, $tf2, $_order, $_abstol, maxsteps=$maxsteps2,
+        parse_eqs=false)
+    SUITE["KeplerNotParsed"]["kepler6"] = @benchmarkable taylorinteg(
+        kepler6!, $q0, $t0, $tf2, $_order, $_abstol, -1.0, maxsteps=$maxsteps2,
+        parse_eqs=false)
+
 end
diff --git a/docs/src/taylorize.md b/docs/src/taylorize.md
@@ -17,19 +17,22 @@ order to compute the recurrence relations that are used to construct
 the Taylor expansion of the solution. This is done for each order of
 the series in [`TaylorIntegration.jetcoeffs!`](@ref). These computations are
 not optimized: they waste memory due to repeated allocations of some temporary
-arrays, and perform some operations whose result has been previously
+arrays, and perform some operations whose result has already been previously
 computed.
 
 Here we describe one way to optimize this: The idea is to replace the
 default method of [`TaylorIntegration.jetcoeffs!`](@ref) by another
 method (same function name) which is called by dispatch, and that in principle
 performs better.
-The new method is constructed specifically for the specific function
+The new method is constructed specifically for the function
 defining the equations of motion by parsing its expression. This new
 method performs in principle *exactly* the same operations, but avoids
-the extra allocations; to do the latter, the macro also creates an *internal*
+repeating some operations and the extra allocations.
+To achieve the latter, the macro also creates an *internal*
 function `TaylorIntegration._allocate_jetcoeffs!`, which allocates all temporary
-`Taylor1` objects as well as the declared `Array{Taylor1,1}`s.
+`Taylor1` objects as well as the declared `Array{Taylor1,N}`s, which are stored
+in a [`RetAlloc{T}`](@ref) struct for efficiency, and include arrays (of `Taylor1{T}`
+objects) with up-to-three indices.
 
 
 ## An example
@@ -41,7 +44,7 @@ using the default method, as described [before](@ref pendulum).
 ```@example taylorize
 using TaylorIntegration
 
-function pendulum!(dx, x, p, t)
+function pendulumNP!(dx, x, p, t) # `pendulum!` ODEs, not parsed
     dx[1] = x[2]
     dx[2] = -sin(x[1])
     return dx
@@ -53,28 +56,30 @@ tf = 10000.0
 q0 = [1.3, 0.0]
 
 # The actual integration
-t1, x1 = taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500); # warm-up run
-e1 = @elapsed taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500);
-all1 = @allocated taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500);
+t1, x1 = taylorinteg(pendulumNP!, q0, t0, tf, 25, 1e-20, maxsteps=50000); # warm-up run
+e1 = @elapsed taylorinteg(pendulumNP!, q0, t0, tf, 25, 1e-20, maxsteps=50000);
+all1 = @allocated taylorinteg(pendulumNP!, q0, t0, tf, 25, 1e-20, maxsteps=50000);
 e1, all1
 ```
 
-We note that the initial number of methods defined for
-`TaylorIntegration.jetcoeffs!` is 2.
+The initial number of methods defined for
+`TaylorIntegration.jetcoeffs!` is 2; yet, since `@taylorize` was used
+in [an example previously](@ref diffeqinterface), the current number
+of methods is 3, as explained below.
 ```@example taylorize
-length(methods(TaylorIntegration.jetcoeffs!)) == 2 # initial value
+println(methods(TaylorIntegration.jetcoeffs!)) # default methods
 ```
-Similarly, the number of methods for `TaylorIntegration._allocate_jetcoeffs!` is
-also 2.
+Similarly, the number of methods for `TaylorIntegration._allocate_jetcoeffs!` originally is 2, and for the same reasons it is
+currently 3.
 ```@example taylorize
-length(methods(TaylorIntegration._allocate_jetcoeffs!)) == 2 # initial value
+println(methods(TaylorIntegration._allocate_jetcoeffs!)) # default methods
 ```
-Using `@taylorize` will increase this number by creating a new method for these
-two functions.
+Using `@taylorize` will increase this number by creating a new method
+for these functions.
 
 The macro [`@taylorize`](@ref) is intended to be used in front of the function
 that implements the equations of motion. The macro does the following: it
-first parses the function as it is, so the integration can be computed
+first parses the function as it is, so the integration can still be computed
 using [`taylorinteg`](@ref) as above, by explicitly using the keyword
 argument `parse_eqs=false`; this also declares the function of the ODEs, whose name
 is used for parsing. It then creates and evaluates a new method of
@@ -93,30 +98,30 @@ println(methods(pendulum!))
 ```
 
 ```@example taylorize
-println(methods(TaylorIntegration.jetcoeffs!))
+println(methods(TaylorIntegration.jetcoeffs!)) # result should be 4
 ```
 
 We see that there is only one method of `pendulum!`, and
-there is a *new* method (three in total) of `TaylorIntegration.jetcoeffs!`,
+there is a *new* method (four in total) of `TaylorIntegration.jetcoeffs!`,
 whose signature appears
-in this documentation as `Val{Main.ex-taylorize.pendulum!}`. It is
+in this documentation as `Val{Main.pendulum!}`. It is
 a specialized version for the function `pendulum!` (with some extra information
 about the module where the function was created). This method
 is selected internally if it exists (default), exploiting dispatch, when
-calling [`taylorinteg`](@ref) or [`lyap_taylorinteg`](@ref); to integrate
-using the hard-coded (default) method of [`TaylorIntegration.jetcoeffs!`](@ref) of the
+calling [`taylorinteg`](@ref) or [`lyap_taylorinteg`](@ref). In order to integrate
+using the hard-coded standard (default) method of [`TaylorIntegration.jetcoeffs!`](@ref) of the
 integration above, the keyword argument `parse_eqs` has to be set to `false`.
 Similarly, one can check that there exists a new method of
-`TaylorIntegration._allocate_jetcoeffs!`
+`TaylorIntegration._allocate_jetcoeffs!`.
 
 Now we carry out the integration using the specialized method; note that we
 use the same instruction as above; the default value for the keyword argument `parse_eqs`
 is `true`, so we may omit it.
 
 ```@example taylorize
-t2, x2 = taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500); # warm-up run
-e2 = @elapsed taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500);
-all2 = @allocated taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500);
+t2, x2 = taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=50000); # warm-up run
+e2 = @elapsed taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=50000);
+all2 = @allocated taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=50000);
 e2, all2
 ```
 
@@ -135,10 +140,10 @@ created by [`@taylorize`](@ref), [`taylorinteg`](@ref) and
 [`lyap_taylorinteg`](@ref) recognize the keyword argument `parse_eqs`;
 setting it to `false` causes the standard method to be used.
 ```@example taylorize
-taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500, parse_eqs=false); # warm-up run
+taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=50000, parse_eqs=false); # warm-up run
 
-e3 = @elapsed taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500, parse_eqs=false);
-all3 = @allocated taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500, parse_eqs=false);
+e3 = @elapsed taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=50000, parse_eqs=false);
+all3 = @allocated taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=50000, parse_eqs=false);
 e1/e3, all1/all3
 ```
 
@@ -153,20 +158,21 @@ e4 = @elapsed solve(prob, TaylorMethod(25), abstol=1e-20, parse_eqs=true);
 
 e1/e4
 ```
-Note that there is an additional marginal cost to using `solve` in comparison
-with `taylorinteg`.
+Note that there is an additional cost to using `solve` in comparison
+with `taylorinteg`, but still `@taylorize` yields improved running times.
 
 The speed-up obtained comes from the design of the new (specialized) method of
 `TaylorIntegration.jetcoeffs!` as described [above](@ref idea): it avoids some
 allocations and some repeated computations. This is achieved by knowing the
 specific AST of the function of the ODEs integrated, which is walked
 through and *translated* into the actual implementation, where some
-required auxiliary arrays are created, and the low-level functions defined in
+required auxiliary arrays are created and reused, and the low-level functions defined in
 `TaylorSeries.jl` are used.
 For this, we heavily rely on [`Espresso.jl`](https://github.com/dfdx/Espresso.jl) and
 some metaprogramming; we thank Andrei Zhabinski for his help and comments.
 
-The new `TaylorIntegration.jetcoeffs!` and `TaylorIntegration._allocate_jetcoeffs!` method can be inspected by
+The new `TaylorIntegration.jetcoeffs!` and `TaylorIntegration._allocate_jetcoeffs!`
+methods can be inspected by
 constructing the expression corresponding to the function, and using
 [`TaylorIntegration._make_parsed_jetcoeffs`](@ref):
 
@@ -182,7 +188,7 @@ new_ex1, new_ex2 = TaylorIntegration._make_parsed_jetcoeffs(ex)
 
 The first function has a similar structure as the hard-coded method of
 `TaylorIntegration.jetcoeffs!`, but uses low-level functions in `TaylorSeries`
-(e.g., `sincos!` above); all temporary `Taylor1` objects as well as declared
+(e.g., `sincos!` above). Temporary `Taylor1` objects as well as declared
 arrays are allocated once by `TaylorIntegration._allocate_jetcoeffs!`.
 More complex functions quickly become very difficult to read. Note that,
 if necessary, one can further optimize `new_ex` manually.
@@ -205,13 +211,17 @@ list some limitations and provide some advice.
   example, the expression `x += y` is not recognized by `@taylorize`. Likewise,
   expressions such as `x = x+y` are not supported by `@taylorize` and should be
   replaced by equivalent expressions in which variables appear only on one side
-  of the assignment; e.g. `z = x+y; x = z`.
+  of the assignment; e.g. `z = x+y; x = z`. The introduction of such temporary
+  variables `z` is left to the user.
 
-- The macro allows the use of array declarations through `Array`, but other ways
-  (e.g. `similar`) are not yet implemented.
+- The macro allows the use of array declarations through `Array` or `Vector`, but other ways
+  (e.g. `similar`) are not yet implemented. Note that certain temporary arrays
+  may be introduced to avoid re-computating certain expressions; only up-to-three
+  indices expressions are currently handled.
 
-- Avoid using variables prefixed by an underscore, in particular `_T`, `_S` and
-  `_N`; using them may lead to name collisions with some internal variables.
+- Avoid using variables prefixed by an underscore, in particular `_T`, `_S`,
+  `_N` and `__idx`, as well as `ord`; using them may lead to name collisions
+  with some internal variables used in the constructed expressions.
 
 - Broadcasting is not recognized by `@taylorize`.
 
@@ -226,33 +236,35 @@ list some limitations and provide some advice.
   rather than comparing against numeric literals.
 
 - Input and output lengths should be determined at the time of `@taylorize`
-  application, not at runtime.  Do not use the length of the input as an
+  application (parse time), not at runtime.  Avoid using the length of the input as an
   implicit indicator of whether to write all elements of the output.  If
-  conditional output of auxiliary equations is desired, use explicit methods,
-  such as through parameters or by setting auxiliary `t0` vector elements
+  conditional output of auxiliary equations is desired use explicit methods,
+  such as through parameters or by setting auxiliary vector elements
   to zero, and assigning unwanted auxiliary outputs zero.
 
 - Expressions which correspond to function calls (so the `head` field is
   `:call`) which are not recognized by the parser are simply copied. The
   heuristics used, especially for vectors, may not work for all cases.
 
-- Use `local` for internal parameters (simple constant values); this improves
-  performance. Do not use it if the variable is Taylor expanded during the integration.
-
+- Use `local` for internal parameters, e.g., simple constant values; this improves
+  performance. Do not use it if the variable is needed to be Taylor expanded
+  during the integration step.
+
 - To examine the code generated for `jetcoeffs!` and `_allocate_jetcoeffs!`
   for a specific ODE function, follow the pendulum example above; create an expression
-  by wrapping the ODE function (without `@taylorize` prefix) in `:()` and
+  by wrapping the ODE function (without `@taylorize` prefix) in a `:()`-block, and
   supply the expression to `TaylorIntegration._make_parsed_jetcoeffs`.  This
   can help in debugging issues with either function generated by `@taylorize`.
-  
+
 - `@taylorize` supports multi-threading via `Threads.@threads`. **WARNING**:
   this feature is experimental. Since thread-safety depends on the definition
   of each ODE, we cannot guarantee the resulting code to be thread-safe in
   advance. The user should check the resulting code to ensure that it is indeed
   thread-safe. For more information about multi-threading, the reader is
   referred to the [Julia documentation](https://docs.julialang.org/en/v1/manual/multi-threading/#man-multithreading).
 
-It is recommended to skim `test/taylorize.jl`, which implements different
-cases and highlights cases where the macro doesn't work and how to solve the problem.
+We recommend to skim `test/taylorize.jl`, which implements different
+cases and highlights examples where the macro does not work, and how to solve the problem;
+read the information that is in the comments.
 
 Please report any problems you may encounter.
diff --git a/src/TaylorIntegration.jl b/src/TaylorIntegration.jl
@@ -12,10 +12,10 @@ using InteractiveUtils: methodswith
 
 export taylorinteg, lyap_taylorinteg, @taylorize
 
+include("parse_eqs.jl")
 include("explicitode.jl")
 include("lyapunovspectrum.jl")
 include("rootfinding.jl")
-include("parse_eqs.jl")
 
 function __init__()
     @require DiffEqBase = "2b5f629d-d688-5b77-993f-72d75c75574e" include("common.jl")