UCP/PROTO: UCX_PROTO_INFO=used option #10395

iyastreb · 2024-12-19T15:03:45Z

What?

Currently when UCX_PROTO_INFO is enabled, it shows tons of output with details of all the selected protocols, most of which is never used by the application. The idea is to introduce a new option to the config variable ("used"), to limit display to only actually used protocols.
There is already a wildcard option may improve the output, but still it's not enough, because you must know in advance what to look for. On the contrary, displaying used protocols is showing the exact choice made by UCX without guesses, and it also shows the amount of selections per threshold

Why?

Easy of debugging and diagnostics

How?

Sets progress callback with auto revertible progress_wrapper, which means wrapper is deactivated after N invocations (N=255 for now)
Removes one extra branch from the fast path
Uses counter for tracking number of selections. When counter reaches maximum (1 for short, 255 for others) - adds "+" in the output
Works in debug and release modes

   +---------------------------------+--------------------------------------------------------------------------------------------+
   | ucp_context_46 intra-node cfg#0 | active message by ucp_am_send* from host memory                                            |
   +---------------------------------+-------------------------------------------+------------------------------------------------+
   | 1+                      0..2038 | short                                     | rc_mlx5/mlx5_0:1/path0                         |
   | 0                    2039..8246 | copy-in                                   | rc_mlx5/mlx5_0:1/path0                         |
   | 10                   8247..9109 | multi-frag copy-in                        | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
   | 0                   9110..22553 | multi-frag zero-copy                      | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
   | 255+                 22554..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
   +---------------------------------+-------------------------------------------+------------------------------------------------+

src/ucp/proto/proto_select.h

rakhmets · 2024-12-19T17:27:06Z

src/ucp/proto/proto_debug.c

@@ -241,7 +278,8 @@ ucp_proto_select_elem_info(ucp_worker_h worker,
    /* Print contents */
    ucp_proto_table_row_separator(strb, col_width, 3);
    ucs_array_for_each(row_elem, &table) {
-        ucs_string_buffer_appendf(strb, info_row_fmt, col_width[0],
+        ucs_string_buffer_appendf(strb, info_row_fmt, row_elem->counter_str,


I think the meaning of these numbers will not be obvious to users. Please add the value to a separate column with the name.

I followed the existing design - as you can see there are no really named columns, e.g. for message range. The column name is actually just the config name. So if I add a new named column just for selections count - that would look ridiculous IMO

Perhaps, we should add a row with column names: size range / protocol / resources / use count.

I'm not sure about the added value of this change.
I mean, to me it seems this is a debug feature meant to be used mostly by UCX developers who know what to look for. This probably explains why these column names were not introduced so far - there is no much benefit I guess.

On the other hand - it would also increase the output size - and this is exactly the issue I'm trying to solve with this commit.

src/ucp/proto/proto.h

brminich · 2025-01-07T10:12:35Z

src/ucp/proto/proto.h

@@ -141,6 +141,9 @@ typedef struct {

    /* Map of used lanes */
    ucp_lane_map_t lane_map;
+
+    /* Selections count */
+    size_t         selections;


maybe unsigned is enough?

brminich · 2025-01-07T10:20:06Z

src/ucp/proto/proto_select.inl

@@ -108,7 +109,11 @@ ucp_proto_select_lookup(ucp_worker_h worker, ucp_proto_select_t *proto_select,
        proto_select->cache.value = select_elem;
    }

-    return ucp_proto_select_thresholds_search(select_elem, msg_length);
+    thresh = ucp_proto_select_thresholds_search(select_elem, msg_length);
+#ifdef ENABLE_DEBUG_DATA


why do you need it? Does it mean that used option will not work in the release mode?

Yep, it won't work in release mode
The only reason to have ENABLE_DEBUG_DATA is to save memory and a bit of CPU cycles in the hot path (this increment).
I don't have strong opinion on whether it should work in release mode, I thought it's mostly used by UCX engineers, so can always be built in debug mode. Please share your thoughts

imo, we need it in release mode. We ask customers to provide this output quite often. Initial protocol selection is not a fast-path, or am i missing something?

This feature is not about initial protocol selection, it counts all the protocol selections during send execution, so in fast-path. So we add an increment operation to the fast-path, which is not a big deal

i see would it be possible to to increase only on slow path? Then you would not need a counter, but rather a boolean indicating whether the protocol was selected at least once

Sure, it is possible, but it would reduce the added value IMO.
One of the ideas that we discussed on the meeting was the ability to show protocols whose selection count is above some limit. This would be useful to not display stats for auxilliary / init messages.

But look, maybe we can use a combined approach:

We use counter, not boolean, memory overhead is the same

When ENABLE_DEBUG_DATA is set, we collect precise stats (so does not harm release)

When in release mode, we resort to storing 1 in this counter in the slow path (so we have some stats in release)

hmm, unfortunately it's not that easy with Release mode..
The problem is that short protocol initialization triggers protocol selection with message size 0:
https://github.com/openucx/ucx/pull/10395/files#diff-b729ff9730727dbf69aa2e2f32eab876f0d1f68803a2b7714a911fe056909004R730
ucp_proto_select_short_init -> ucp_proto_select_lookup(..., 0)

This is a problem, because we don't want to take into account this "probing" selection. If we count it, then we basically display all the protocols, not actually selected ones. If we don't count it - then we can't detect selection of short protocols, because ucp_proto_select_lookup_slow will not be called anymore..
In DEBUG mode I just revert this probing selection:

#ifdef ENABLE_DEBUG_DATA /* Revert stats counter for probe operation */ --((ucp_proto_threshold_elem_t *)thresh)->proto_config.selections; #endif

But I don't see a good solution for Release mode...
So now I'm thinking that we should probably just enable this feature for all modes (without checking ENABLE_DEBUG_DATA)

yosefe · 2025-01-12T12:43:01Z

src/ucp/core/ucp_worker.c

@@ -2878,6 +2893,7 @@ static void ucp_worker_destroy_eps(ucp_worker_h worker,
 void ucp_worker_destroy(ucp_worker_h worker)
 {
    ucs_debug("destroy worker %p", worker);
+    ucp_worker_trace_configs(worker);


seems we display the data only during working destroy, but what if we have a long running application and we want to show the info without waiting for app exit?

This scenario is not really covered by the initial impl, as it does not seem to be a mainstream use case.
Here are 2 ideas what we can do:

Display all the protocols as before with UCX_PRORO_INFO=y option

Implement periodic timer (with configurable interval, 10s default) in worker using timerfd_create and timerfd_settime, that would trigger ucp_worker_trace_configs.

Personally I believe it's not really worth the effort. But if you think it's useful, can be done maybe in the next PR.

can we print the info after the count reaches a certain threshold?

In short - I think that may not be always reliable, and requires operations on fast-path
Currently the idea is that we support this feature in 2 modes:

Debug mode (ENABLE_DEBUG_DATA) - when we get precise statistics. Requires adding increment operation to the fast path.

Release mode - do not use counter, but only detect used protocols in the slow path. This way we don't introduce delays to the fast-path.
Btw, what's your opinion on that? Should have have these 2 modes of operation, or we just enable this feature always?

Basically in release mode the counter is never incremented above 1, so we cannot rely on this.
And in debug mode this would require to introduce one more if branch to the fast-path.

IMO we need to try making the feature available also in release mode without adding overhead when it is not enabled

Ok, I implemented it with absolutely zero overhead when the feature is disabled.
It required some additions to the logger, hopefully you like it.
If not, we can remove logger event listener and have one extra if branch in the fast-path.
Please let me know

Thanks to @brminich , I removed this logger stuff, and found the right place for the increment.
Now it's really zero overhead when feature is disabled

iyastreb · 2025-01-17T07:22:13Z

/azp run UCX PR

azure-pipelines · 2025-01-17T07:22:27Z

Azure Pipelines successfully started running 1 pipeline(s).

brminich · 2025-01-17T17:20:56Z

src/ucp/core/ucp_request.c

@@ -753,6 +754,10 @@ ucs_status_t ucp_request_progress_wrapper(uct_pending_req_t *self)
        ucp_trace_req(req, "progress protocol %s returned: %s lane %d",
                      proto->name, ucs_status_string(status), req->send.lane);
    } else {
+        if (req->send.proto_stage == UCP_PROTO_STAGE_START) {
+            ++ conf->selections;


Suggested change

++ conf->selections;

++conf->selections;

does it mean asking for PROTO_INFO=used would mean always using wrapper progress? (i.e. hurting perf). Also you are adding extra branch for progress invocation with tracker usage enabled

Yes. As for the tracker - that is a just a matter of single if condition.
Here we just piggyback on existing branch in hot-path, so that this little feature does not introduce any delay to the fast path (when PROTO_INFO=used is disabled). When it's enabled I think we can tolerate this tiny overhead - as this is not a fast-path anymore. It would be hard to add new functionality without any extra CPU cycles

brminich · 2025-01-17T17:26:51Z

src/ucp/proto/proto_debug.c

    if (ucs_config_sscanf_bool(proto_info_config, &bool_value, NULL)) {
        return bool_value;
    }

    return fnmatch(proto_info_config, select_param_str, FNM_CASEFOLD) == 0;
 }

+static inline unsigned


maybe make it boolean and breal immideately once some selection is found?

brminich · 2025-01-17T17:28:29Z

src/ucp/proto/proto_debug.c

+                              sizeof(row_elem->counter_str), "%u  ",
+                              proto_attr.selections);
+        } else {
+            row_elem->counter_str[0] = '\0';


maybe print 0 counter (if it is even possible ho hit this code in this case)

It's easily doable, I thought about this, but IMO having zeroes in the output does not improve readability:

+---------------------------------+----------------------------------------------------------------------------------+ | ucp_context_47 intra-node cfg#0 | rendezvous data fetch into host memory from host | +---------------------------------+---------------------------------+------------------------------------------------+ | 0 0 | no data fetch | | | 1000 1..9109 | (?) fragmented copy-in copy-out | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 | | 0 9110..inf | (?) zero-copy | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 | +---------------------------------+---------------------------------+------------------------------------------------+

imo it is better, than showing the counter only for some rows, but not others

ok, will be done

iyastreb · 2025-01-21T08:08:39Z

@yosefe Answering your question about selection count for short protocol.
I confirm that with this progress interception we do see properly the stats for short messages:

   +---------------------------------+---------------------------------------------------------------------------------------------+
   | ucp_context_46 intra-node cfg#0 | active message by ucp_am_send* from host memory                                             |
   +---------------------------------+--------------------------------------------+------------------------------------------------+
   | 10                       0..127 | short                                      | rc_mlx5/mlx5_0:1/path0                         |
   | 0                     128..9109 | (?) rendezvous fragmented copy-in copy-out | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
   | 0                     9110..inf | (?) rendezvous zero-copy                   | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
   +---------------------------------+--------------------------------------------+------------------------------------------------+

yosefe · 2025-01-21T08:28:11Z

@yosefe Answering your question about selection count for short protocol. I confirm that with this progress interception we do see properly the stats for short messages:

   +---------------------------------+---------------------------------------------------------------------------------------------+
   | ucp_context_46 intra-node cfg#0 | active message by ucp_am_send* from host memory                                             |
   +---------------------------------+--------------------------------------------+------------------------------------------------+
   | 10                       0..127 | short                                      | rc_mlx5/mlx5_0:1/path0                         |
   | 0                     128..9109 | (?) rendezvous fragmented copy-in copy-out | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
   | 0                     9110..inf | (?) rendezvous zero-copy                   | rc_mlx5/mlx5_0:1 50% on path0 and 50% on path1 |
   +---------------------------------+--------------------------------------------+------------------------------------------------+

Do we still see it of only fast-path short messages are sent?

iyastreb · 2025-01-21T09:39:27Z

Do we still see it of only fast-path short messages are sent?

Ok, it seems fast-path short messages = those being sent with ucp_am_try_send_short?
First, I see that this mode is never being selected, because am_u.max_eager_short is always {-1, -1}.
I see that initially it's properly configured:

ucp_ep_config_set_memtype_thresh (max_short=2038)
ucp_ep_config_init ucp_ep.c:2882
ucp_worker_get_ep_config /ucp_worker.c:2115
ucp_wireup_init_lanes wireup.c:1719

But then immediately gets reset to {-1, -1}:

max_eager_short = {-1, -1}
ucp_ep_select_short_init ucp_ep.c:3885
ucp_ep_config_proto_init ucp_ep.c:3919
ucp_ep_set_cfg_index ucp_ep.c:3938
ucp_wireup_init_lanes wireup.c:1757

I believe this is a bug introduced in my PR #9863, where we changed the initialization order. Unfortunately existing unit tests (e.g. test_short_thresh) do not catch this error.

If I revert init order, then I see short messages are being properly sent over ucp_am_try_send_short, and in this case the counter is always 1, no matter how much messages are transmitted. This is better than nothing, at least we see that protocol is used

yosefe · 2025-01-21T09:51:56Z

I believe this is a bug introduced in my PR #9863, where we changed the initialization order. Unfortunately existing unit tests (e.g. test_short_thresh) do not catch this error.

If I revert init order, then I see short messages are being properly sent over ucp_am_try_send_short, and in this case the counter is always 1, no matter how much messages are transmitted. This is better than nothing, at least we see that protocol is used

First let's fix this in a separate PR since it affects performance (and probably need to backport to v1.18.x)
Also, can we disable the progress wrapper after the information was printed?

brminich · 2025-01-21T09:59:52Z

I believe this is a bug introduced in my PR #9863, where we changed the initialization order. Unfortunately existing unit tests (e.g. test_short_thresh) do not catch this error.
If I revert init order, then I see short messages are being properly sent over ucp_am_try_send_short, and in this case the counter is always 1, no matter how much messages are transmitted. This is better than nothing, at least we see that protocol is used

First let's fix this in a separate PR since it affects performance (and probably need to backport to v1.18.x) Also, can we disable the progress wrapper after the information was printed?

but how can we know that there will be no more selections? Seems like it is correct thing to do to print info during worker destroy when we count selection numbers

iyastreb · 2025-01-21T10:35:19Z

First let's fix this in a separate PR since it affects performance (and probably need to backport to v1.18.x) Also, can we disable the progress wrapper after the information was printed?

I'm sorry for confusion, there is no bug actually)

This feature piggybacks on existing feature for EP usage tracker, and it comes with a side effect I was not aware of:
https://github.com/openucx/ucx/blob/master/src/ucp/proto/proto_select.c#L699

Basically it disables fast-path short protocol, it seems for the same purpose - to properly count stats.
For this particular use case it's actually a desired side-effect, because due to this we are able to keep track short protocol selections.

iyastreb · 2025-01-23T06:34:40Z

/azp run UCX PR

azure-pipelines · 2025-01-23T06:34:52Z

Azure Pipelines successfully started running 1 pipeline(s).

UCP/PROTO: UCX_PROTO_INFO=used option

2723acc

iyastreb requested review from yosefe, brminich and rakhmets December 19, 2024 15:04

rakhmets reviewed Dec 19, 2024

View reviewed changes

UCP/PROTO: Addressed PR comments

a1ead26

brminich reviewed Jan 7, 2025

View reviewed changes

UCP/PROTO: selections size_t -> unsigned

69a81e0

yosefe reviewed Jan 12, 2025

View reviewed changes

iyastreb added 2 commits January 14, 2025 16:00

UCP/PROTO: Zero overhead when feature is disabled

c22e3fb

UCP/PROTO: Thread-safe log event handler

3ea1f16

iyastreb added 3 commits January 17, 2025 09:56

UCP/PROTO: Reverted logger changes as non-efficient

4212b9d

UCP/PROTO: Increment selections in ucp_request_progress_wrapper

d7ecd6a

UCP/PROTO: Cosmetic changes

a296bd3

brminich reviewed Jan 17, 2025

View reviewed changes

UCP/PROTO: Addressed PR comments

807fe74

iyastreb added 5 commits January 22, 2025 10:22

UCP/PROTO: Auto-removable progress_wrapper

f90bea9

UCP/PROTO: Minor change

ef0e249

UCP/PROTO: Fixed tests with progress_wrapper

3ef05f8

UCP/PROTO: Fixed rndv tests

d1c33ca

UCP/PROTO: Fixed coverity warning

778101f

UCP/PROTO: UCX_PROTO_INFO=used option #10395

Are you sure you want to change the base?

UCP/PROTO: UCX_PROTO_INFO=used option #10395

Conversation

iyastreb commented Dec 19, 2024 • edited Loading

What?

Why?

How?

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iyastreb commented Jan 17, 2025

azure-pipelines bot commented Jan 17, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iyastreb commented Jan 21, 2025

yosefe commented Jan 21, 2025

iyastreb commented Jan 21, 2025

yosefe commented Jan 21, 2025

brminich commented Jan 21, 2025

iyastreb commented Jan 21, 2025

iyastreb commented Jan 23, 2025

azure-pipelines bot commented Jan 23, 2025

iyastreb commented Dec 19, 2024 •

edited

Loading