Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT: Memory window #10332

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

UCT: Memory window #10332

wants to merge 11 commits into from

Conversation

iyastreb
Copy link
Contributor

@iyastreb iyastreb commented Nov 26, 2024

What?

This PR is a composite part of the new memory invalidation design based on the concept of memory windows.
Here we introduce changes on the UCT side.
On UCT side memory window represents a shallow copy of some existing UCT memory handle, which can be used to access the same memory region. When created, the memory window inherits the original memh access flags and state, and takes ownership of the indirect keys of the original memory handle. The lifetime of the memory window is bound to the original memh, and the original memh cannot be destroyed until all its memory windows are destroyed.

Design doc

Why?

The overall idea is to employ a lightweight invalidation: when failure happens we invalidate only the exposed indirect rkeys, but don't deregister the entire memory region, which is an expensive operation. Lightweight invalidation enables invalidation workflow for RNDV pipeline protocol.

How?

  1. Introduced a new option for uct_md_mem_reg_params_t: UCT_MD_MEM_WINDOW. Existing mem_reg call will create a memory window based on existing UCT memh
  2. When memory window is requested, create a shallow copy of base UCT memh and take the ownership of its indirect memory keys.

* original memh cannot be destroyed until all its memory windows are
* destroyed.
*/
UCT_MD_MEM_WINDOW = UCS_BIT(12),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call it

Suggested change
UCT_MD_MEM_WINDOW = UCS_BIT(12),
UCT_MD_MEM_DERIVED = UCS_BIT(12),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* Register a memory window: a shallow copy of some existing UCT memory
* handle, which can be used to access the same memory region. When created,
* the memory window inherits the original memh access flags and state, and
* takes ownership of the indirect keys of the original memory handle. The
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indirect key is IB specific we should not mention it in UCT API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


/**
* Represents a pointer to the existing memory handle.
* Used to create a memory window is (with flag @ref UCT_MD_MEM_WINDOW)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Used to create a memory window is (with flag @ref UCT_MD_MEM_WINDOW)
* Used to create a derived memory handle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

src/uct/api/v2/uct_v2.h Show resolved Hide resolved
Comment on lines 874 to 875
ucs_assertv(!(base->super.flags & UCT_IB_MEM_FLAG_MEM_WINDOW),
"memh=%p is already a memory window", base);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should mention this limitation in the API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment on lines 886 to 889
base->atomic_dvmr = NULL;
base->atomic_rkey = UCT_IB_INVALID_MKEY;
base->indirect_dvmr = NULL;
base->indirect_rkey = UCT_IB_INVALID_MKEY;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to invalidate basic memh keys? I'd expect that this will break (at least performance) of some (at least atomics) operations on the base key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should not hinder performance, because base memh rkeys will be recreated on next mkey_pack, like is done in my test.
But you are right, strictly speaking it's NOT necessary to transfer rkeys ownership to the derived memh. It will work even without this transfer. I will remove this ownership transfer logic

The reason why I added this transfer is the following:
Currently it's UCT layer that decides whether to create rkeys (atomic + indirect), and conditions for them are different:

  • atomic rkey is created if user requested it (with UCT_MD_MEM_ACCESS_REMOTE_ATOMIC), and hardware supports it. So it might be created no matter if invalidation is needed
  • indirect rkey is created only when invalidation is needed
    Now from UCP layer we assume that we will create a memory window when invalidation if needed from EP config. By this time atomic rkeys on base memh might be already created and I thought it's a good idea to move it to the memory window, to avoid re-creating this key per MW.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed ownership transfer for now, let's consider this later


/* Test case 4: base memh can still be used to pack mkeys */
std::vector<uint8_t> base_rkey2 = mkey_pack(base, flags);
EXPECT_NE(base_rkey1, base_rkey2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i do not think it is correct. base memh should not be modified once derived memh is created

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, fixed this

uct_mem_h mw1 = reg_mem_window(base);

/* Test case 2: creating MW from memh after mkey_pack */
std::vector<uint8_t> base_rkey1 = mkey_pack(base, flags);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need invalidation on first mkey pack of base?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need test with packing without invalidation first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of UCT invalidation would mean that we destroy derived memh. Currently I don't have tests with invalidation, actually it's a good point to add them.
Or I misunderstood your comment

@iyastreb iyastreb mentioned this pull request Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants