Skip to content

Commit

Permalink
Refactor taylorize (#156)
Browse files Browse the repository at this point in the history
* Add RetAlloc to store preallocated objects

* Add allocated vars definitions to parsed `jetcoeffs!`

* Use a RetAlloc  argument in parsed `jetcoeffs!`

* Split array declaration by number of indices up to 3 indices

* Increase test coverage of common interface with DiffEqs

Note that `full_cache` was never used, so I commented

* Remove `debug` flags in parse_eqs.jl

* Allow integration of ODEs with abstract arrays with tests (common interface)

* Add tests related to #143, somehow solved

* Comment __jetcoeffs! methods for DynamicalODEFunctions

which are not used or seemingly useless

* Precisions in docs/taylorize.md

* Further documentation fixes

* Increase `maxsteps` in examples to make fairer comparisons

* Add declaration of Taylor1 arrays involving 4 indices

* Fix an error, and error when more than 5-index array is included

* Fix typo and fix tests

* Add test for error thrown with 5-index arrays

* Fix added test so it works with Julia v1.6 too

* Comment a seemingly useless line

* Pass t-variable with complete order to ODE function

* Uncomment and fix Kepler benchmarks with parse_eqs=false

* Bump minor version

* Few more tests... for coverage

* TaylorSeries v0.13 and OrdinaryDiffEq v6
  • Loading branch information
lbenet authored Feb 8, 2023
1 parent 5422e9b commit da195d4
Show file tree
Hide file tree
Showing 11 changed files with 616 additions and 345 deletions.
4 changes: 2 additions & 2 deletions Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -26,13 +26,13 @@ BenchmarkTools = "1.3"
DiffEqBase = "6"
Elliptic = "0.5, 1.0"
Espresso = "0.6.1"
OrdinaryDiffEq = "5, 6"
OrdinaryDiffEq = "6"
PkgBenchmark = "0.2"
RecursiveArrayTools = "2"
Reexport = "0.2, 1"
Requires = "0.5.2, 1"
StaticArrays = "0.12.5, 1"
TaylorSeries = "0.12"
TaylorSeries = "0.13"
julia = "1"

[extras]
Expand Down
38 changes: 19 additions & 19 deletions benchmark/kepler_benchmarks.jl
Original file line number Diff line number Diff line change
Expand Up @@ -115,23 +115,23 @@ let
SUITE["Kepler"]["kepler6-3"] = @benchmarkable taylorinteg(
kepler6!, $q0, $t0, $tf3, $_order, $_abstol, -1.0, maxsteps=$maxsteps3)

# # ==========
# # KeplerNotParsed
# # ==========
# SUITE["KeplerNotParsed"] = BenchmarkGroup()

# SUITE["KeplerNotParsed"]["kepler1"] = @benchmarkable taylorinteg(
# kepler1!, $q0, $t0, $tf, $_order, $_abstol, -1.0, maxsteps=$maxsteps,
# parse_eqs=false)
# SUITE["KeplerNotParsed"]["kepler2"] = @benchmarkable taylorinteg(
# kepler2!, $q0, $t0, $tf, $_order, $_abstol, $pars, maxsteps=$maxsteps,
# parse_eqs=false)
#
# SUITE["KeplerNotParsed"]["kepler5"] = @benchmarkable taylorinteg(
# kepler5!, $q0, $t0, $tf, $_order, $_abstol, maxsteps=$maxsteps,
# parse_eqs=false)
# SUITE["KeplerNotParsed"]["kepler6"] = @benchmarkable taylorinteg(
# kepler6!, $q0, $t0, $tf, $_order, $_abstol, -1.0, maxsteps=$maxsteps,
# parse_eqs=false)
#
# ==========
# KeplerNotParsed
# ==========
SUITE["KeplerNotParsed"] = BenchmarkGroup()

SUITE["KeplerNotParsed"]["kepler1"] = @benchmarkable taylorinteg(
kepler1!, $q0, $t0, $tf2, $_order, $_abstol, -1.0, maxsteps=$maxsteps2,
parse_eqs=false)
SUITE["KeplerNotParsed"]["kepler2"] = @benchmarkable taylorinteg(
kepler2!, $q0, $t0, $tf2, $_order, $_abstol, $pars, maxsteps=$maxsteps2,
parse_eqs=false)

SUITE["KeplerNotParsed"]["kepler5"] = @benchmarkable taylorinteg(
kepler5!, $q0, $t0, $tf2, $_order, $_abstol, maxsteps=$maxsteps2,
parse_eqs=false)
SUITE["KeplerNotParsed"]["kepler6"] = @benchmarkable taylorinteg(
kepler6!, $q0, $t0, $tf2, $_order, $_abstol, -1.0, maxsteps=$maxsteps2,
parse_eqs=false)

end
110 changes: 61 additions & 49 deletions docs/src/taylorize.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,19 +17,22 @@ order to compute the recurrence relations that are used to construct
the Taylor expansion of the solution. This is done for each order of
the series in [`TaylorIntegration.jetcoeffs!`](@ref). These computations are
not optimized: they waste memory due to repeated allocations of some temporary
arrays, and perform some operations whose result has been previously
arrays, and perform some operations whose result has already been previously
computed.

Here we describe one way to optimize this: The idea is to replace the
default method of [`TaylorIntegration.jetcoeffs!`](@ref) by another
method (same function name) which is called by dispatch, and that in principle
performs better.
The new method is constructed specifically for the specific function
The new method is constructed specifically for the function
defining the equations of motion by parsing its expression. This new
method performs in principle *exactly* the same operations, but avoids
the extra allocations; to do the latter, the macro also creates an *internal*
repeating some operations and the extra allocations.
To achieve the latter, the macro also creates an *internal*
function `TaylorIntegration._allocate_jetcoeffs!`, which allocates all temporary
`Taylor1` objects as well as the declared `Array{Taylor1,1}`s.
`Taylor1` objects as well as the declared `Array{Taylor1,N}`s, which are stored
in a [`RetAlloc{T}`](@ref) struct for efficiency, and include arrays (of `Taylor1{T}`
objects) with up-to-three indices.


## An example
Expand All @@ -41,7 +44,7 @@ using the default method, as described [before](@ref pendulum).
```@example taylorize
using TaylorIntegration
function pendulum!(dx, x, p, t)
function pendulumNP!(dx, x, p, t) # `pendulum!` ODEs, not parsed
dx[1] = x[2]
dx[2] = -sin(x[1])
return dx
Expand All @@ -53,28 +56,30 @@ tf = 10000.0
q0 = [1.3, 0.0]
# The actual integration
t1, x1 = taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500); # warm-up run
e1 = @elapsed taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500);
all1 = @allocated taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500);
t1, x1 = taylorinteg(pendulumNP!, q0, t0, tf, 25, 1e-20, maxsteps=50000); # warm-up run
e1 = @elapsed taylorinteg(pendulumNP!, q0, t0, tf, 25, 1e-20, maxsteps=50000);
all1 = @allocated taylorinteg(pendulumNP!, q0, t0, tf, 25, 1e-20, maxsteps=50000);
e1, all1
```

We note that the initial number of methods defined for
`TaylorIntegration.jetcoeffs!` is 2.
The initial number of methods defined for
`TaylorIntegration.jetcoeffs!` is 2; yet, since `@taylorize` was used
in [an example previously](@ref diffeqinterface), the current number
of methods is 3, as explained below.
```@example taylorize
length(methods(TaylorIntegration.jetcoeffs!)) == 2 # initial value
println(methods(TaylorIntegration.jetcoeffs!)) # default methods
```
Similarly, the number of methods for `TaylorIntegration._allocate_jetcoeffs!` is
also 2.
Similarly, the number of methods for `TaylorIntegration._allocate_jetcoeffs!` originally is 2, and for the same reasons it is
currently 3.
```@example taylorize
length(methods(TaylorIntegration._allocate_jetcoeffs!)) == 2 # initial value
println(methods(TaylorIntegration._allocate_jetcoeffs!)) # default methods
```
Using `@taylorize` will increase this number by creating a new method for these
two functions.
Using `@taylorize` will increase this number by creating a new method
for these functions.

The macro [`@taylorize`](@ref) is intended to be used in front of the function
that implements the equations of motion. The macro does the following: it
first parses the function as it is, so the integration can be computed
first parses the function as it is, so the integration can still be computed
using [`taylorinteg`](@ref) as above, by explicitly using the keyword
argument `parse_eqs=false`; this also declares the function of the ODEs, whose name
is used for parsing. It then creates and evaluates a new method of
Expand All @@ -93,30 +98,30 @@ println(methods(pendulum!))
```

```@example taylorize
println(methods(TaylorIntegration.jetcoeffs!))
println(methods(TaylorIntegration.jetcoeffs!)) # result should be 4
```

We see that there is only one method of `pendulum!`, and
there is a *new* method (three in total) of `TaylorIntegration.jetcoeffs!`,
there is a *new* method (four in total) of `TaylorIntegration.jetcoeffs!`,
whose signature appears
in this documentation as `Val{Main.ex-taylorize.pendulum!}`. It is
in this documentation as `Val{Main.pendulum!}`. It is
a specialized version for the function `pendulum!` (with some extra information
about the module where the function was created). This method
is selected internally if it exists (default), exploiting dispatch, when
calling [`taylorinteg`](@ref) or [`lyap_taylorinteg`](@ref); to integrate
using the hard-coded (default) method of [`TaylorIntegration.jetcoeffs!`](@ref) of the
calling [`taylorinteg`](@ref) or [`lyap_taylorinteg`](@ref). In order to integrate
using the hard-coded standard (default) method of [`TaylorIntegration.jetcoeffs!`](@ref) of the
integration above, the keyword argument `parse_eqs` has to be set to `false`.
Similarly, one can check that there exists a new method of
`TaylorIntegration._allocate_jetcoeffs!`
`TaylorIntegration._allocate_jetcoeffs!`.

Now we carry out the integration using the specialized method; note that we
use the same instruction as above; the default value for the keyword argument `parse_eqs`
is `true`, so we may omit it.

```@example taylorize
t2, x2 = taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500); # warm-up run
e2 = @elapsed taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500);
all2 = @allocated taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500);
t2, x2 = taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=50000); # warm-up run
e2 = @elapsed taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=50000);
all2 = @allocated taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=50000);
e2, all2
```

Expand All @@ -135,10 +140,10 @@ created by [`@taylorize`](@ref), [`taylorinteg`](@ref) and
[`lyap_taylorinteg`](@ref) recognize the keyword argument `parse_eqs`;
setting it to `false` causes the standard method to be used.
```@example taylorize
taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500, parse_eqs=false); # warm-up run
taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=50000, parse_eqs=false); # warm-up run
e3 = @elapsed taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500, parse_eqs=false);
all3 = @allocated taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=1500, parse_eqs=false);
e3 = @elapsed taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=50000, parse_eqs=false);
all3 = @allocated taylorinteg(pendulum!, q0, t0, tf, 25, 1e-20, maxsteps=50000, parse_eqs=false);
e1/e3, all1/all3
```

Expand All @@ -153,20 +158,21 @@ e4 = @elapsed solve(prob, TaylorMethod(25), abstol=1e-20, parse_eqs=true);
e1/e4
```
Note that there is an additional marginal cost to using `solve` in comparison
with `taylorinteg`.
Note that there is an additional cost to using `solve` in comparison
with `taylorinteg`, but still `@taylorize` yields improved running times.

The speed-up obtained comes from the design of the new (specialized) method of
`TaylorIntegration.jetcoeffs!` as described [above](@ref idea): it avoids some
allocations and some repeated computations. This is achieved by knowing the
specific AST of the function of the ODEs integrated, which is walked
through and *translated* into the actual implementation, where some
required auxiliary arrays are created, and the low-level functions defined in
required auxiliary arrays are created and reused, and the low-level functions defined in
`TaylorSeries.jl` are used.
For this, we heavily rely on [`Espresso.jl`](https://github.com/dfdx/Espresso.jl) and
some metaprogramming; we thank Andrei Zhabinski for his help and comments.

The new `TaylorIntegration.jetcoeffs!` and `TaylorIntegration._allocate_jetcoeffs!` method can be inspected by
The new `TaylorIntegration.jetcoeffs!` and `TaylorIntegration._allocate_jetcoeffs!`
methods can be inspected by
constructing the expression corresponding to the function, and using
[`TaylorIntegration._make_parsed_jetcoeffs`](@ref):

Expand All @@ -182,7 +188,7 @@ new_ex1, new_ex2 = TaylorIntegration._make_parsed_jetcoeffs(ex)

The first function has a similar structure as the hard-coded method of
`TaylorIntegration.jetcoeffs!`, but uses low-level functions in `TaylorSeries`
(e.g., `sincos!` above); all temporary `Taylor1` objects as well as declared
(e.g., `sincos!` above). Temporary `Taylor1` objects as well as declared
arrays are allocated once by `TaylorIntegration._allocate_jetcoeffs!`.
More complex functions quickly become very difficult to read. Note that,
if necessary, one can further optimize `new_ex` manually.
Expand All @@ -205,13 +211,17 @@ list some limitations and provide some advice.
example, the expression `x += y` is not recognized by `@taylorize`. Likewise,
expressions such as `x = x+y` are not supported by `@taylorize` and should be
replaced by equivalent expressions in which variables appear only on one side
of the assignment; e.g. `z = x+y; x = z`.
of the assignment; e.g. `z = x+y; x = z`. The introduction of such temporary
variables `z` is left to the user.

- The macro allows the use of array declarations through `Array`, but other ways
(e.g. `similar`) are not yet implemented.
- The macro allows the use of array declarations through `Array` or `Vector`, but other ways
(e.g. `similar`) are not yet implemented. Note that certain temporary arrays
may be introduced to avoid re-computating certain expressions; only up-to-three
indices expressions are currently handled.

- Avoid using variables prefixed by an underscore, in particular `_T`, `_S` and
`_N`; using them may lead to name collisions with some internal variables.
- Avoid using variables prefixed by an underscore, in particular `_T`, `_S`,
`_N` and `__idx`, as well as `ord`; using them may lead to name collisions
with some internal variables used in the constructed expressions.

- Broadcasting is not recognized by `@taylorize`.

Expand All @@ -226,33 +236,35 @@ list some limitations and provide some advice.
rather than comparing against numeric literals.

- Input and output lengths should be determined at the time of `@taylorize`
application, not at runtime. Do not use the length of the input as an
application (parse time), not at runtime. Avoid using the length of the input as an
implicit indicator of whether to write all elements of the output. If
conditional output of auxiliary equations is desired, use explicit methods,
such as through parameters or by setting auxiliary `t0` vector elements
conditional output of auxiliary equations is desired use explicit methods,
such as through parameters or by setting auxiliary vector elements
to zero, and assigning unwanted auxiliary outputs zero.

- Expressions which correspond to function calls (so the `head` field is
`:call`) which are not recognized by the parser are simply copied. The
heuristics used, especially for vectors, may not work for all cases.

- Use `local` for internal parameters (simple constant values); this improves
performance. Do not use it if the variable is Taylor expanded during the integration.

- Use `local` for internal parameters, e.g., simple constant values; this improves
performance. Do not use it if the variable is needed to be Taylor expanded
during the integration step.

- To examine the code generated for `jetcoeffs!` and `_allocate_jetcoeffs!`
for a specific ODE function, follow the pendulum example above; create an expression
by wrapping the ODE function (without `@taylorize` prefix) in `:()` and
by wrapping the ODE function (without `@taylorize` prefix) in a `:()`-block, and
supply the expression to `TaylorIntegration._make_parsed_jetcoeffs`. This
can help in debugging issues with either function generated by `@taylorize`.

- `@taylorize` supports multi-threading via `Threads.@threads`. **WARNING**:
this feature is experimental. Since thread-safety depends on the definition
of each ODE, we cannot guarantee the resulting code to be thread-safe in
advance. The user should check the resulting code to ensure that it is indeed
thread-safe. For more information about multi-threading, the reader is
referred to the [Julia documentation](https://docs.julialang.org/en/v1/manual/multi-threading/#man-multithreading).

It is recommended to skim `test/taylorize.jl`, which implements different
cases and highlights cases where the macro doesn't work and how to solve the problem.
We recommend to skim `test/taylorize.jl`, which implements different
cases and highlights examples where the macro does not work, and how to solve the problem;
read the information that is in the comments.

Please report any problems you may encounter.
2 changes: 1 addition & 1 deletion src/TaylorIntegration.jl
Original file line number Diff line number Diff line change
Expand Up @@ -12,10 +12,10 @@ using InteractiveUtils: methodswith

export taylorinteg, lyap_taylorinteg, @taylorize

include("parse_eqs.jl")
include("explicitode.jl")
include("lyapunovspectrum.jl")
include("rootfinding.jl")
include("parse_eqs.jl")

function __init__()
@require DiffEqBase = "2b5f629d-d688-5b77-993f-72d75c75574e" include("common.jl")
Expand Down
Loading

2 comments on commit da195d4

@lbenet
Copy link
Collaborator Author

@lbenet lbenet commented on da195d4 Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JuliaRegistrator
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error while trying to register: "Tag with name v0.10.0 already exists and points to a different commit"

Please sign in to comment.