put timers around time integrator steps again
aperijake committed Jan 12, 2025
1 parent c1398ae commit 850ef5a
Showing 1 changed file with 20 additions and 5 deletions.
25 changes: 20 additions & 5 deletions src/Solver.cpp
@@ -203,7 +203,10 @@ double ExplicitSolver::Solve() {
 CommunicateForce(aperi::SolverTimerType::CommunicateForce);

 // Compute initial accelerations, done at state np1 as states will be swapped at the start of the time loop
-explicit_time_integrator->ComputeAcceleration();
+{
+    auto timer = m_timer_manager->CreateScopedTimer(SolverTimerType::TimeIntegrationNodalUpdates);
+    explicit_time_integrator->ComputeAcceleration();
+}

 // Initialize total runtime, average runtime, for benchmarking
 double total_runtime = 0.0;
@@ -261,7 +264,10 @@ double ExplicitSolver::Solve() {
 UpdateFieldStates();

 // Compute the first partial update nodal velocities: v^{n+½} = v^n + (t^{n+½} − t^n)a^n
-explicit_time_integrator->ComputeFirstPartialUpdate();
+{
+    auto timer = m_timer_manager->CreateScopedTimer(SolverTimerType::TimeIntegrationNodalUpdates);
+    explicit_time_integrator->ComputeFirstPartialUpdate();
+}

 // Enforce essential boundary conditions: node I on \gamma_v_i : v_{iI}^{n+½} = \overbar{v}_I(x_I,t^{n+½})
 {
@@ -272,7 +278,10 @@ double ExplicitSolver::Solve() {
 }

 // Update nodal displacements: d^{n+1} = d^n+ Δt^{n+½}v^{n+½}
-explicit_time_integrator->UpdateDisplacements();
+{
+    auto timer = m_timer_manager->CreateScopedTimer(SolverTimerType::TimeIntegrationNodalUpdates);
+    explicit_time_integrator->UpdateDisplacements();
+}

 // Compute the force, f^{n+1}
 ComputeForce(aperi::SolverTimerType::ComputeForce);
@@ -281,7 +290,10 @@ double ExplicitSolver::Solve() {
 CommunicateForce(aperi::SolverTimerType::CommunicateForce);

 // Compute acceleration: a^{n+1} = M^{–1}(f^{n+1})
-explicit_time_integrator->ComputeAcceleration();
+{
+    auto timer = m_timer_manager->CreateScopedTimer(SolverTimerType::TimeIntegrationNodalUpdates);
+    explicit_time_integrator->ComputeAcceleration();
+}

 // Set acceleration on essential boundary conditions. Overwrites acceleration from ComputeAcceleration above so that the acceleration is consistent with the velocity boundary condition.
 {
@@ -292,7 +304,10 @@ double ExplicitSolver::Solve() {
 }

 // Compute the second partial update nodal velocities: v^{n+1} = v^{n+½} + (t^{n+1} − t^{n+½})a^{n+1}
-explicit_time_integrator->ComputeSecondPartialUpdate();
+{
+    auto timer = m_timer_manager->CreateScopedTimer(SolverTimerType::TimeIntegrationNodalUpdates);
+    explicit_time_integrator->ComputeSecondPartialUpdate();
+}

 // Compute the energy balance
 // TODO(jake): Compute energy balance
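
Each added block wraps one integrator call in its own brace scope, so the timer object is destroyed as soon as the call returns. The diff does not show how `CreateScopedTimer` is implemented. The following is only a minimal RAII sketch of how such a timer could accumulate elapsed time per category. The `ScopedTimer` and `TimerManager` classes, the `GetTotalSeconds` accessor, and the `NONE` sentinel are hypothetical names for illustration, not the repository's actual API.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Timer categories. The three named ones appear in the diff; NONE is a
// hypothetical sentinel used here only to size the storage.
enum class SolverTimerType : std::size_t {
    ComputeForce,
    CommunicateForce,
    TimeIntegrationNodalUpdates,
    NONE
};

// Hypothetical RAII timer: starts timing on construction and adds the
// elapsed seconds to an accumulator when it goes out of scope.
class ScopedTimer {
   public:
    using Clock = std::chrono::steady_clock;

    explicit ScopedTimer(double& total_seconds)
        : m_total_seconds(&total_seconds), m_start(Clock::now()) {}

    // Moving transfers ownership so that only one object records the time.
    ScopedTimer(ScopedTimer&& other) noexcept
        : m_total_seconds(other.m_total_seconds), m_start(other.m_start) {
        other.m_total_seconds = nullptr;
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

    ~ScopedTimer() {
        if (m_total_seconds != nullptr) {
            *m_total_seconds += std::chrono::duration<double>(Clock::now() - m_start).count();
        }
    }

   private:
    double* m_total_seconds;
    Clock::time_point m_start;
};

// Hypothetical manager holding one accumulated total per category.
class TimerManager {
   public:
    TimerManager() : m_totals(static_cast<std::size_t>(SolverTimerType::NONE), 0.0) {}

    ScopedTimer CreateScopedTimer(SolverTimerType type) {
        return ScopedTimer(m_totals[static_cast<std::size_t>(type)]);
    }

    double GetTotalSeconds(SolverTimerType type) const {
        return m_totals[static_cast<std::size_t>(type)];
    }

   private:
    std::vector<double> m_totals;
};
```

Under this design, several scopes that use the same category add up to one total, which would match the single `TimeIntegrationNodalUpdates` entry reported per benchmark case.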
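
The comments in the hunks describe a central-difference update split into two half-step velocity updates. As a rough illustration of that sequence only, the sketch below applies the same steps to a single degree of freedom with a constant time step, so that t^{n+½} − t^n = t^{n+1} − t^{n+½} = Δt/2. The linear spring force, the `NodalState` struct, and all function names are assumptions made for the example. Boundary conditions and force communication are left out.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical single-degree-of-freedom state for illustration.
struct NodalState {
    double displacement;
    double velocity;
    double acceleration;
};

// Assumed force model for the example: linear spring, f = -k d.
inline double SpringForce(double stiffness, double displacement) {
    return -stiffness * displacement;
}

// One time step, following the order of operations in the diff comments.
inline void Step(NodalState& s, double mass, double stiffness, double dt) {
    const double half_dt = 0.5 * dt;
    s.velocity += half_dt * s.acceleration;                        // v^{n+1/2} = v^n + (dt/2) a^n
    s.displacement += dt * s.velocity;                             // d^{n+1} = d^n + dt v^{n+1/2}
    const double force = SpringForce(stiffness, s.displacement);   // f^{n+1}
    s.acceleration = force / mass;                                 // a^{n+1} = M^{-1} f^{n+1}
    s.velocity += half_dt * s.acceleration;                        // v^{n+1} = v^{n+1/2} + (dt/2) a^{n+1}
}

// Computes the initial acceleration, then advances num_steps steps.
inline NodalState Integrate(NodalState s, double mass, double stiffness, double dt, int num_steps) {
    s.acceleration = SpringForce(stiffness, s.displacement) / mass;
    for (int i = 0; i < num_steps; ++i) {
        Step(s, mass, stiffness, dt);
    }
    return s;
}

// Kinetic plus spring potential energy, useful as a sanity check.
inline double TotalEnergy(const NodalState& s, double mass, double stiffness) {
    return 0.5 * mass * s.velocity * s.velocity + 0.5 * stiffness * s.displacement * s.displacement;
}
```

With unit mass and stiffness, a small time step should track d(t) = cos(t) closely and keep the total energy near its initial value.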

11 comments on commit 850ef5a

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 850ef5a | Previous: c1398ae | Ratio |
|---|---|---|---|
| Application_Setup_CreateInternalForceContribution_fem_taylor_bar_cpu_np_1 | 0.000061068 seconds | 0.000037122 seconds | 1.65 |
| Application_Setup_CreateExternalForceContribution_fem_taylor_bar_cpu_np_1 | 0.000006994 seconds | 0.00000501 seconds | 1.40 |
| Application_Setup_CreateTimeStepper_fem_taylor_bar_cpu_np_4 | 0.000024758 seconds | 0.000012163 seconds | 2.04 |
| Application_Setup_CreateInternalForceContribution_fem_taylor_bar_cpu_np_4 | 0.000048414 seconds | 0.000028946 seconds | 1.67 |
| Application_Setup_CreateTimeStepper_fem_taylor_bar_gpu_np_1 | 0.000019989 seconds | 0.000016081 seconds | 1.24 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @aperijake

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 850ef5a | Previous: c1398ae | Ratio |
|---|---|---|---|
| Explicit_Solver_TimeIntegrationNodalUpdates_fem_taylor_bar_cpu_np_1 | 0.753861 seconds | 0 seconds | +∞ |
| Explicit_Solver_TimeIntegrationNodalUpdates_fem_taylor_bar_cpu_np_4 | 0.452284 seconds | 0 seconds | +∞ |
| Explicit_Solver_CommunicateForce_fem_taylor_bar_cpu_np_4 | 2.11946 seconds | 1.57314 seconds | 1.35 |
| Explicit_Solver_TimeStepCompute_fem_taylor_bar_cpu_np_4 | 0.000108299 seconds | 0.000084964 seconds | 1.27 |
| Explicit_Solver_TimeIntegrationNodalUpdates_fem_taylor_bar_gpu_np_1 | 0.0802404 seconds | 0 seconds | +∞ |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @aperijake
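
The Ratio column is consistent with dividing the current time by the previous one and flagging anything above the 1.20 threshold. A previous value of 0 seconds then gives +∞. For the `TimeIntegrationNodalUpdates` rows this presumably reflects that the previous commit recorded no time under that timer, rather than an actual slowdown. The sketch below reproduces that arithmetic. It is not taken from github-action-benchmark: the function names are hypothetical, and returning 1.0 when both values are zero is an assumption of this example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Ratio of current to previous time; values above 1 mean the current run is slower.
// Assumption of this sketch: 0 / 0 is reported as 1.0 (no change).
inline double RegressionRatio(double current_seconds, double previous_seconds) {
    if (previous_seconds == 0.0) {
        return current_seconds > 0.0 ? std::numeric_limits<double>::infinity() : 1.0;
    }
    return current_seconds / previous_seconds;
}

// True when the ratio exceeds the alert threshold (1.20 in the comments on this commit).
inline bool ExceedsThreshold(double current_seconds, double previous_seconds, double threshold = 1.20) {
    return RegressionRatio(current_seconds, previous_seconds) > threshold;
}
```

For example, 2.11946 / 1.57314 is about 1.35, which exceeds 1.20 and matches the `CommunicateForce` row above.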

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 850ef5a | Previous: c1398ae | Ratio |
|---|---|---|---|
| Application_Setup_CreateTimeStepper_sfem_taylor_bar_cpu_np_4 | 0.000027182 seconds | 0.000019968 seconds | 1.36 |
| Application_Setup_CreateBoundaryConditions_sfem_taylor_bar_cpu_np_4 | 0.000190468 seconds | 0.000156963 seconds | 1.21 |
| Application_Setup_CreateOutputScheduler_sfem_taylor_bar_cpu_np_4 | 0.000093099 seconds | 0.000071889 seconds | 1.30 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @aperijake

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 850ef5a | Previous: c1398ae | Ratio |
|---|---|---|---|
| Smoothed_Cell_Data_Instantiate_sfem_taylor_bar_cpu_np_4 | 0.0261884 seconds | 0.0191637 seconds | 1.37 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @aperijake

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 850ef5a | Previous: c1398ae | Ratio |
|---|---|---|---|
| Explicit_Solver_TimeIntegrationNodalUpdates_sfem_taylor_bar_cpu_np_1 | 0.768579 seconds | 0 seconds | +∞ |
| Explicit_Solver_TimeIntegrationNodalUpdates_sfem_taylor_bar_cpu_np_4 | 0.457862 seconds | 0 seconds | +∞ |
| Explicit_Solver_TimeIntegrationNodalUpdates_sfem_taylor_bar_gpu_np_1 | 0.080885 seconds | 0 seconds | +∞ |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @aperijake

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 850ef5a | Previous: c1398ae | Ratio |
|---|---|---|---|
| Application_Setup_CreateInternalForceContribution_rkpm_taylor_bar_cpu_np_1 | 0.000072229 seconds | 0.000047402 seconds | 1.52 |
| Application_Setup_CreateExternalForceContribution_rkpm_taylor_bar_cpu_np_1 | 0.000007334 seconds | 0.000005009 seconds | 1.46 |
| Application_Setup_CreateFieldResultsFile_rkpm_taylor_bar_cpu_np_4 | 0.0129833 seconds | 0.00948483 seconds | 1.37 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @aperijake

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 850ef5a | Previous: c1398ae | Ratio |
|---|---|---|---|
| Neighbor_Search_Processor_Instantiate_rkpm_taylor_bar_cpu_np_4 | 0.000160451 seconds | 0.000128758 seconds | 1.25 |
| Neighbor_Search_Processor_CreateNodeSpheres_rkpm_taylor_bar_cpu_np_4 | 0.0030804 seconds | 0.00240087 seconds | 1.28 |
| Neighbor_Search_Processor_CreateNodePoints_rkpm_taylor_bar_cpu_np_4 | 0.00263443 seconds | 0.00202314 seconds | 1.30 |
| Neighbor_Search_Processor_ComputeKernelRadius_rkpm_taylor_bar_cpu_np_4 | 0.000261857 seconds | 0.000211308 seconds | 1.24 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @aperijake

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 850ef5a | Previous: c1398ae | Ratio |
|---|---|---|---|
| Explicit_Solver_TimeIntegrationNodalUpdates_rkpm_taylor_bar_cpu_np_1 | 0.194236 seconds | 0 seconds | +∞ |
| Explicit_Solver_TimeIntegrationNodalUpdates_rkpm_taylor_bar_cpu_np_4 | 0.108884 seconds | 0 seconds | +∞ |
| Explicit_Solver_TimeIntegrationNodalUpdates_rkpm_taylor_bar_gpu_np_1 | 0.0208519 seconds | 0 seconds | +∞ |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @aperijake

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 850ef5a | Previous: c1398ae | Ratio |
|---|---|---|---|
| Application_Setup_ReadInputMesh_rkpm_nodal_taylor_bar_cpu_np_1 | 0.250305 seconds | 0.133703 seconds | 1.87 |
| Application_Setup_CreateTimeStepper_rkpm_nodal_taylor_bar_cpu_np_4 | 0.000087439 seconds | 0.000011673 seconds | 7.49 |
| Application_Setup_CreateInternalForceContribution_rkpm_nodal_taylor_bar_cpu_np_4 | 0.000161131 seconds | 0.000030058 seconds | 5.36 |
| Application_Setup_ReadInputMesh_rkpm_nodal_taylor_bar_gpu_np_1 | 0.196482 seconds | 0.0945789 seconds | 2.08 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @aperijake

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 850ef5a | Previous: c1398ae | Ratio |
|---|---|---|---|
| Neighbor_Search_Processor_KokkosDeepCopy_rkpm_nodal_taylor_bar_cpu_np_1 | 0.000009098 seconds | 0.000004398 seconds | 2.07 |
| Element_CreateElementForceProcessor_rkpm_nodal_taylor_bar_cpu_np_4 | 0.000147615 seconds | 0.000110815 seconds | 1.33 |
| Neighbor_Search_Processor_KokkosDeepCopy_rkpm_nodal_taylor_bar_cpu_np_4 | 0.000016662 seconds | 0.000005371 seconds | 3.10 |
| Neighbor_Search_Processor_ComputeKernelRadius_rkpm_nodal_taylor_bar_cpu_np_4 | 0.00076574 seconds | 0.000102388 seconds | 7.48 |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @aperijake

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.20.

| Benchmark suite | Current: 850ef5a | Previous: c1398ae | Ratio |
|---|---|---|---|
| Explicit_Solver_TimeIntegrationNodalUpdates_rkpm_nodal_taylor_bar_cpu_np_1 | 0.159518 seconds | 0 seconds | +∞ |
| Explicit_Solver_TimeIntegrationNodalUpdates_rkpm_nodal_taylor_bar_cpu_np_4 | 0.0596976 seconds | 0 seconds | +∞ |
| Explicit_Solver_TimeIntegrationNodalUpdates_rkpm_nodal_taylor_bar_gpu_np_1 | 0.04073 seconds | 0 seconds | +∞ |

This comment was automatically generated by workflow using github-action-benchmark.

CC: @aperijake
