Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Trace Callstack processing out of order #113

Open
briancoutinho opened this issue Mar 14, 2024 · 0 comments
Open

New Trace Callstack processing out of order #113

briancoutinho opened this issue Mar 14, 2024 · 0 comments
Assignees
Labels
bug Something isn't working needs triage

Comments

@briancoutinho
Copy link
Contributor

🐛 Describe the bug

While doing critical path analysis- that builds a DAG of operations in the trace - I am ending up with cycles in the DAG.
This is how CPA works-

  1. Parse trace call stack.
  2. Create a node for start and node for end of every CPU operator
  3. Connect the nodes as seen in stack order

See https://hta.readthedocs.io/en/latest/source/features/lightweight_critical_path_analysis.html#cpu-operator-nesting-and-dependencies

I saw that the call stack group was providing stacks out of order. See how index 102534 comes in but it is actually older node

2024-03-14 01:11:14,541 - hta - critical_path_analysis.py:L401 - INFO - ===Exiting node TBackward0, id = 127107.0
2024-03-14 01:11:14,541 - hta - critical_path_analysis.py:L167 - INFO - Adding an edge between nodes 156119 -> 156117 type = CPEdgeType.OPERATOR_KERNEL
2024-03-14 01:11:14,542 - hta - critical_path_analysis.py:L401 - INFO - ==Exiting node autograd::engine::evaluate_function: TBackward0, id = 127106
2024-03-14 01:11:14,542 - hta - critical_path_analysis.py:L167 - INFO - Adding an edge between nodes 156117 -> 156115 type = CPEdgeType.OPERATOR_KERNEL
2024-03-14 01:11:14,542 - hta - critical_path_analysis.py:L370 - INFO - ==Entering node autograd::engine::evaluate_function: TBackward0, id = 102534
2024-03-14 01:11:14,542 - hta - critical_path_analysis.py:L167 - INFO - Adding an edge between nodes 156115 -> 156122 type = CPEdgeType.DEPENDENCY

Screenshot 2024-03-14 at 10 17 10 AM

When we run critical path analysis it errors out due to cycles in the DAG

2024-03-13 01:20:39,328 - hta - critical_path_analysis.py:L867 - ERROR - Critical path algorithm failed due to Graph contains a cycle or graph changed during iteration

This is related to new callstack implementation from
#86

When i turned back to older call stack this does not happen, and it passes :)

For more context T182236796

Steps to reproduce

Download trace

manifold getr amoghavs/tree/overlaid_traces/fb_fm_330x/

We can run this script

from hta.configs.config import setup_logger
setup_logger()

from hta.trace_analysis import TraceAnalysis

trace_dir = "/Users/bcoutinho/Work/hta/critical_path/amogha_debug/fb_fm_330x"
analyzer = TraceAnalysis(trace_dir=trace_dir)

annotation = "ProfilerStep"
instance_id = 2
rank = 35


cp_graph, success = analyzer.critical_path_analysis(
    rank = rank, annotation=annotation, instance_id=instance_id)
print("success = ", success)

# dump overlaid trace
analyzer.overlay_critical_path_analysis(
    rank, cp_graph, only_show_critical_events=False, output_dir=trace_dir + '/overlaid', show_all_edges=True)

Expected behavior

Critical path analysis should pass.
For timebeing I am working around this by using older call stack implmentation.

Environment

OS Mac
Python 3.18
HTA version 4222b7b

Additional Info

No response

@briancoutinho briancoutinho added bug Something isn't working needs triage labels Mar 14, 2024
briancoutinho added a commit to briancoutinho/HolisticTraceAnalysis that referenced this issue Mar 15, 2024
Summary:
This is a quick fix for the issue where were seeing cycles in the critical path
facebookresearch#113

Differential Revision: D54908634
briancoutinho added a commit to briancoutinho/HolisticTraceAnalysis that referenced this issue Mar 16, 2024
…#114)

Summary:

This is a quick fix for the issue where were seeing cycles in the critical path
facebookresearch#113

Reviewed By: amoghavs

Differential Revision: D54908634
facebook-github-bot pushed a commit that referenced this issue Mar 16, 2024
Summary:
Pull Request resolved: #114

This is a quick fix for the issue where were seeing cycles in the critical path
#113

Reviewed By: amoghavs

Differential Revision: D54908634

fbshipit-source-id: 8cbeffaebbe12ad576c9f4b14df6d5080abe1f5e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage
Projects
None yet
Development

No branches or pull requests

2 participants