Macros for OSACA (and other *CA) analysis of the latency of specific paths of functions #4151

Merged 26 commits on Dec 31, 2024

@eggrobin (Member) commented Dec 31, 2024

Example for Cos(0.1):

Other analysers (uiCA, IACA 3.0 and 2.3, LLVM-MCA) on uops.info, Skylake:

OSACA, Cascade Lake loop carried through memory:

Cropped dependency graph:
(image: osaca_dg-cropped)

Output:

Open Source Architecture Code Analyzer (OSACA) - 0.6.1
Analyzed file:      ..\Principia\functions\Release\x64\sin_cos_test.asm
Architecture:       CSX
Timestamp:          2024-12-31 00:51:00


 P - Throughput of LOAD operation can be hidden behind a past or future STORE instruction
 * - Instruction micro-ops not bound to a port
 X - No throughput/latency information for this instruction in data file


Combined Analysis Report
------------------------
                                       Port pressure in cycles
     |  0   - 0DV  |   1   |  2   -  2D  |  3   -  3D  |  4   |   5   |  6   |  7   ||  CP  | LCD  |
----------------------------------------------------------------------------------------------------
4410 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||  4.0 |  4.0 |   movsd xmm9, QWORD PTR OSACA_input$27[rsp]
4411 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   movzx edx, BYTE PTR ?UseHardwareFMA@internal@_fma@numerics@principia@@3_NB ; principia::numerics::_fma::internal::UseHardwareFMA
4412 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   movzx ecx, BYTE PTR ?OSACA_loop_terminator@@3_NA
4413 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   movaps xmm13, XMMWORD PTR ?sign_bit@masks@internal@_sin_cos@numerics@principia@@3U__m128d@@B
4414 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   movaps xmm14, XMMWORD PTR ?mantissa_index_bits@masks@internal@_sin_cos@numerics@principia@@3U__m128d@@B
4415 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   movaps xmm15, XMMWORD PTR ?exponent_bits@masks@internal@_sin_cos@numerics@principia@@3U__m128d@@B
4416 |             |       |             |             |      |       |      |      ||      |      |   $OSACA_loop$215:
4417 |             |       |             |             |      |       |      |      ||      |      |   ; Line 196
4418 | 1.00        |       |             |             |      |       |      |      ||      |      |   comisd xmm12, xmm9
4419 |             |       |             |             |      |       |      |      ||      |      | * jbe SHORT $LN19@TestBody
4420 | 1.00        |       |             |             |      |       |      |      ||      |      |   comisd xmm9, xmm10
4421 |             |       |             |             |      |       |      |      ||      |      | * jbe SHORT $LN19@TestBody
4422 | 0.00        | 0.000 |             |             |      | 0.000 | 1.00 |      ||      |      |   mov al, 1
4423 | 0.00        |       | 0.50   0.50 | 0.50   0.50 |      |       | 1.00 |      ||      |      |   jmp SHORT $LN20@TestBody
4424 |             |       |             |             |      |       |      |      ||      |      |   $LN19@TestBody:
4425 | 0.00        | 0.000 |             |             |      | 0.000 | 1.00 |      ||      |      |   xor eax, eax
4426 |             |       |             |             |      |       |      |      ||      |      |   $LN20@TestBody:
4427 |             |       | 0.00        | 0.00        | 1.00 |       |      | 1.00 ||      |      |   mov BYTE PTR OSACA_computed_condition$21[rsp], al
4428 |             |       |             |             |      |       |      |      ||      |      |   ; Line 371
4429 |             |       | 0.00        | 0.00        | 1.00 |       |      | 1.00 ||      |      |   mov BYTE PTR OSACA_computed_condition$25[rsp], dl
4430 |             |       |             |             |      |       |      |      ||      |      |   ; Line 372
4431 |             |       | 0.00        | 0.00        | 1.00 |       |      | 1.00 ||      |      |   mov BYTE PTR OSACA_computed_condition$24[rsp], 0
4432 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.38.33130\include\cstdlib
4433 |             |       |             |             |      |       |      |      ||      |      |   ; Line 23
4434 |             |       |             |             |      |       |      |      ||  0.0 |  0.0 | * movaps xmm2, xmm9
4435 | 0.33        | 0.333 |             |             |      | 0.333 |      |      ||  1.0 |  1.0 |   andps xmm2, xmm1
4436 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\sin_cos.cpp
4437 |             |       |             |             |      |       |      |      ||      |      |   ; Line 293
4438 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm0, xmm9
4439 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm3, xmm0
4440 | 0.33        | 0.333 |             |             |      | 0.333 |      |      ||      |      |   andps xmm3, xmm13
4441 |             |       |             |             |      |       |      |      ||      |      |   ; Line 138
4442 | 0.33        | 0.333 | 0.50   0.50 | 0.50   0.50 |      | 0.333 |      |      ||  4.0 |  4.0 |   addsd xmm2, QWORD PTR __real@42a0000000000000
4443 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm0, xmm14
4444 | 0.67        | 0.163 |             |             |      | 0.163 |      |      ||  1.0 |  1.0 |   andps xmm0, xmm2
4445 | 1.00        |       |             |             |      |       |      |      ||  1.0 |  1.0 |   movq rax, xmm0
4446 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.38.33130\include\array
4447 |             |       |             |             |      |       |      |      ||      |      |   ; Line 545
4448 | 0.00        |       |             |             |      |       | 1.00 |      ||  1.0 |  1.0 |   shl rax, 5
4449 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\sin_cos.cpp
4450 |             |       |             |             |      |       |      |      ||      |      |   ; Line 296
4451 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||  4.0 |  4.0 |   movsd xmm1, QWORD PTR [rax+rsi]
4452 | 0.33        | 0.333 |             |             |      | 0.333 |      |      ||  1.0 |  1.0 |   xorps xmm1, xmm3
4453 |             |       |             |             |      |       |      |      ||      |      |   ; Line 298
4454 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   movsd xmm0, QWORD PTR [rax+rsi+8]
4455 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm2, xmm0
4456 | 0.33        | 0.333 |             |             |      | 0.333 |      |      ||      |      |   xorps xmm2, xmm3
4457 |             |       |             |             |      |       |      |      ||      |      |   ; Line 306
4458 | 0.33        | 0.333 |             |             |      | 0.333 |      |      ||  4.0 |  4.0 |   subsd xmm9, xmm1
4459 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\quantities\elementary_functions_body.hpp
4460 |             |       |             |             |      |       |      |      ||      |      |   ; Line 60
4461 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   movsd xmm5, QWORD PTR [rax+rsi+16]
4462 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\fma_body.hpp
4463 |             |       |             |             |      |       |      |      ||      |      |   ; Line 35
4464 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm0, xmm5
4465 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm1, xmm0
4466 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm0, xmm9
4467 | 0.23        | 0.243 |             |             |      | 0.523 |      |      ||      |      |   xorps xmm8, xmm8
4468 |             |       |             |             |      | 1.000 |      |      ||      |      |   movsd xmm8, xmm2
4469 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm7, xmm8
4470 | 0.50        | 0.500 |             |             |      |       |      |      ||      |      |   vfnmadd213sd xmm7, xmm0, xmm1
4471 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\double_precision_body.hpp
4472 |             |       |             |             |      |       |      |      ||      |      |   ; Line 310
4473 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm1, xmm7
4474 | 0.00        | 0.000 |             |             |      | 1.000 |      |      ||      |      |   subsd xmm1, xmm5
4475 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\fma_body.hpp
4476 |             |       |             |             |      |       |      |      ||      |      |   ; Line 46
4477 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm6, xmm8
4478 | 0.50        | 0.500 |             |             |      |       |      |      ||      |      |   vfnmsub213sd xmm6, xmm0, xmm1
4479 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\sin_cos.cpp
4480 |             |       |             |             |      |       |      |      ||      |      |   ; Line 310
4481 |             |       |             |             |      |       |      |      ||  0.0 |  0.0 | * movaps xmm4, xmm9
4482 | 0.00        | 0.000 |             |             |      | 1.000 |      |      ||      |      |   xorps xmm0, xmm0
4483 | 0.50        | 0.500 |             |             |      |       |      |      ||  4.0 |  4.0 |   addsd xmm4, xmm0
4484 | 0.50        | 0.500 |             |             |      |       |      |      ||  4.0 |  4.0 |   mulsd xmm4, xmm9
4485 |             |       |             |             |      |       |      |      ||      |      |   ; Line 311
4486 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm2, xmm4
4487 | 0.50        | 0.500 |             |             |      |       |      |      ||      |      |   mulsd xmm2, xmm9
4488 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\fma_body.hpp
4489 |             |       |             |             |      |       |      |      ||      |      |   ; Line 14
4490 |             |       |             |             |      |       |      |      ||  0.0 |  0.0 | * movaps xmm0, xmm4
4491 |             |       |             |             |      |       |      |      ||  0.0 |  0.0 | * movaps xmm1, xmm0
4492 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   movaps xmm0, XMMWORD PTR tv1368[rsp]
4493 | 0.50        | 0.500 | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   vfmadd213sd xmm0, xmm1, XMMWORD PTR tv1360[rsp]
4494 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\sin_cos.cpp
4495 |             |       |             |             |      |       |      |      ||      |      |   ; Line 312
4496 | 0.21        | 0.790 |             |             |      |       |      |      ||      |      |   mulsd xmm4, xmm5
4497 | 0.00        | 1.000 |             |             |      |       |      |      ||      |      |   mulsd xmm4, xmm0
4498 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\fma_body.hpp
4499 |             |       |             |             |      |       |      |      ||      |      |   ; Line 14
4500 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   movaps xmm0, XMMWORD PTR tv1367[rsp]
4501 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   movaps xmm5, XMMWORD PTR tv1356[rsp]
4502 | 0.00        | 1.000 |             |             |      |       |      |      ||  4.0 |  4.0 |   vfmadd213sd xmm0, xmm1, xmm5
4503 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\sin_cos.cpp
4504 |             |       |             |             |      |       |      |      ||      |      |   ; Line 146
4505 |             |       | 0.00        | 0.00        | 1.00 |       |      | 1.00 ||      |      |   mov BYTE PTR OSACA_computed_condition$20[rsp], 1
4506 | 0.00        | 0.000 |             |             |      | 1.000 |      |      ||      |      |   xorps xmm1, xmm1
4507 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\fma_body.hpp
4508 |             |       |             |             |      |       |      |      ||      |      |   ; Line 14
4509 |             |       |             |             |      | 1.000 |      |      ||  1.0 |  1.0 |   movsd xmm1, xmm0
4510 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm0, xmm2
4511 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm3, xmm0
4512 | 0.00        | 1.000 | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||  4.0 |  4.0 |   vfmadd213sd xmm3, xmm1, XMMWORD PTR tv1353[rsp]
4513 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\sin_cos.cpp
4514 |             |       |             |             |      |       |      |      ||      |      |   ; Line 157
4515 |             |       | 0.00        | 0.00        | 1.00 |       |      | 1.00 ||      |      |   mov BYTE PTR OSACA_computed_condition$19[rsp], 1
4516 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\fma_body.hpp
4517 |             |       |             |             |      |       |      |      ||      |      |   ; Line 35
4518 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm0, xmm4
4519 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm2, xmm0
4520 |             |       |             |             |      |       |      |      ||  0.0 |  0.0 | * movaps xmm0, xmm3
4521 | 0.00        | 1.000 |             |             |      |       |      |      ||  4.0 |  4.0 |   vfnmadd213sd xmm8, xmm0, xmm2
4522 | 0.00        | 1.000 |             |             |      |       |      |      ||  4.0 |  4.0 |   addsd xmm8, xmm6
4523 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\double_precision_body.hpp
4524 |             |       |             |             |      |       |      |      ||      |      |   ; Line 352
4525 |             |       |             |             |      |       |      |      ||  0.0 |  0.0 | * movaps xmm9, xmm8
4526 | 0.00        | 1.000 |             |             |      |       |      |      ||  4.0 |  4.0 |   addsd xmm9, xmm7
4527 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\sin_cos.cpp
4528 |             |       |             |             |      |       |      |      ||      |      |   ; Line 173
4529 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm0, xmm9
4530 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm2, xmm0
4531 | 0.22        | 0.207 |             |             |      | 0.573 |      |      ||      |      |   andps xmm2, xmm15
4532 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\double_precision_body.hpp
4533 |             |       |             |             |      |       |      |      ||      |      |   ; Line 353
4534 |             |       |             |             |      |       |      |      ||  0.0 |      | * movaps xmm0, xmm9
4535 | 0.00        | -0.01 |             |             |      | 1.000 |      |      ||  4.0 |      |   subsd xmm0, xmm7
4536 | 0.00        | -0.01 |             |             |      | 1.000 |      |      ||  4.0 |      |   subsd xmm8, xmm0
4537 |             |       |             |             |      |       |      |      ||      |      |   ; File C:\Users\robin\Projects\mockingbirdnest\Principia\numerics\sin_cos.cpp
4538 |             |       |             |             |      |       |      |      ||      |      |   ; Line 175
4539 |             |       |             |             |      |       |      |      ||      |      | * movaps xmm0, xmm13
4540 | 0.00        | -0.01 |             |             |      | 1.000 |      |      ||  1.0 |      |   andnps xmm0, xmm8
4541 |             |       |             |             |      |       |      |      ||      |      |   ; Line 177
4542 | 0.00        | -0.01 |             |             |      | 1.000 |      |      ||  1.0 |      |   psubq xmm0, xmm2
4543 |             |       |             |             |      |       |      |      ||      |      |   ; Line 181
4544 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   movsd xmm8, QWORD PTR __real@fcaffff000000000
4545 | 1.00        |       |             |             |      |       |      |      ||  3.0 |      |   comisd xmm8, xmm0
4546 |             |       |             |             |      |       |      |      ||      |      | * jbe SHORT $LN125@TestBody
4547 | 1.00        |       |             |             |      |       |      |      ||      |      |   comisd xmm0, xmm11
4548 |             |       |             |             |      |       |      |      ||      |      | * jbe SHORT $LN125@TestBody
4549 | 0.00        | -0.01 |             |             |      | 0.090 | 0.91 |      ||      |      |   mov al, 1
4550 | 0.00        |       | 0.50   0.50 | 0.50   0.50 |      |       | 1.00 |      ||      |      |   jmp SHORT $LN126@TestBody
4551 |             |       |             |             |      |       |      |      ||      |      |   $LN125@TestBody:
4552 | 0.00        | 0.000 |             |             |      | -0.01 | 1.00 |      ||      |      |   xor eax, eax
4553 |             |       |             |             |      |       |      |      ||      |      |   $LN126@TestBody:
4554 |             |       | 0.00        | 0.00        | 1.00 |       |      | 1.00 ||      |      |   mov BYTE PTR OSACA_computed_condition$18[rsp], al
4555 |             |       |             |             |      |       |      |      ||      |      |   ; Line 384
4556 | 1.00        |       |             |             |      |       |      |      ||      |      |   ucomisd xmm9, xmm9
4557 |             |       |             |             |      |       |      |      ||      |      | * jnp SHORT $LN11@TestBody
4558 | 0.00        | -0.01 |             |             |      | 0.000 | 1.00 |      ||      |      |   mov al, 1
4559 | 0.00        |       | 0.50   0.50 | 0.50   0.50 |      |       | 1.00 |      ||      |      |   jmp SHORT $LN12@TestBody
4560 |             |       |             |             |      |       |      |      ||      |      |   $LN11@TestBody:
4561 | 0.00        | 0.000 |             |             |      | -0.01 | 1.00 |      ||      |      |   xor eax, eax
4562 |             |       |             |             |      |       |      |      ||      |      |   $LN12@TestBody:
4563 |             |       | 0.00        | 0.00        | 1.00 |       |      | 1.00 ||      |      |   mov BYTE PTR OSACA_computed_condition$23[rsp], al
4564 |             |       |             |             |      |       |      |      ||      |      |   ; Line 386
4565 |             |       | 0.00        | 0.00        | 1.00 |       |      | 1.00 ||      |      |   mov BYTE PTR OSACA_computed_condition$22[rsp], 0
4566 |             |       |             |             |      |       |      |      ||      |      |   ; Line 389
4567 | 0.00        | -0.01 |             |             |      | 0.000 | 1.00 |      ||      |      |   test cl, cl
4568 |             |       | 0.50   0.50 | 0.50   0.50 |      |       |      |      ||      |      |   movsd xmm1, QWORD PTR __xmm@7fffffffffffffff7fffffffffffffff
4569 |             |       |             |             |      |       |      |      ||      |      | * je $OSACA_loop$215
4570 |             |       | 0.00        | 0.00        | 1.00 |       |      | 1.00 ||      |  0.0 |   movsd QWORD PTR OSACA_result$26[rsp], xmm9

       12.3          12.33   10.0   10.0   10.0   10.0   9.00   12.33   10.9   9.00    63.0   50.0
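
The loop-carried dependency measured by the LCD column above (50.0 cycles in total) can be illustrated with a minimal C++ sketch. This is not the set of macros from this pull request: `ChainedCos` and its parameters are hypothetical names chosen for illustration, and `std::cos` stands in for the function under analysis. The idea shown is only the general technique of feeding each result back in as the next input, so that successive calls form one dependency chain and the cost per iteration reflects latency rather than throughput.

```cpp
#include <cmath>

// Hypothetical helper for illustration; not part of Principia or OSACA.
// Each iteration consumes the value produced by the previous one, so the
// calls cannot overlap: the loop body is bound by the latency of the
// function being called, not by its throughput.
double ChainedCos(double const input, int const iterations) {
  double x = input;
  for (int i = 0; i < iterations; ++i) {
    // The result is carried into the next iteration as its argument.
    x = std::cos(x);
  }
  return x;
}
```

For instance, `ChainedCos(0.1, 200)` starts from the input used in the example above. Timing such a loop and dividing by the iteration count would give a rough estimate of the latency of one call, under the assumption that loop overhead is negligible.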

@eggrobin changed the title from "Macros for OSACA analysis the latency of specific paths of functions" to "Macros for OSACA analysis of the latency of specific paths of functions" Dec 31, 2024
@eggrobin changed the title from "Macros for OSACA analysis of the latency of specific paths of functions" to "Macros for OSACA (and other *CA) analysis of the latency of specific paths of functions" Dec 31, 2024
@pleroy added the LGTM label Dec 31, 2024
@eggrobin merged commit dd286a1 into mockingbirdnest:master Dec 31, 2024
7 checks passed