Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrapper type for memory address #241

Open
nazar-pc opened this issue Dec 19, 2024 · 6 comments
Open

Wrapper type for memory address #241

nazar-pc opened this issue Dec 19, 2024 · 6 comments

Comments

@nazar-pc
Copy link

Right now memory addresses are represented by u32 (though maybe they shouldn't be), but not all addresses are actually usable.

It might be beneficial to create a new type that wraps the address value and enforces some invariants on it, like those mentioned in description of some functions:

/// Fills the given memory region with zeros.
///
/// `address` must be greater or equal to 0x10000 and `address + length` cannot be greater than 0x100000000.
/// If `length` is zero then this call has no effect and will always succeed.
///
/// When dynamic paging is enabled calling this can be used to resolve a segfault. It can also
/// be used to preemptively initialize pages for which no segfault is currently triggered.
pub fn zero_memory(&mut self, address: u32, length: u32) -> Result<(), MemoryAccessError> {

For example sbrk can return a proper address instance to begin with. Not entirely sure about all sources, but maybe it'll be possible to forbid direct casting of u32 to address, helping to make invalid addresses impossible to construct.

Similarly new type can be used to attach constants, again, like those shown above.

Essentially an type similar to usize but for VM's memory. Similarly length in many cases has similar constraints and could use a new type like isize.

If machine supports 64-bit addresses then the instance can be parametrized by const BITS: u8 and return/accept different address types.

@koute
Copy link
Collaborator

koute commented Jan 1, 2025

Almost any address can be valid - essentially the only completely forbidden addresses are in the lower 64KB of address space; probably not much point in introducing extra friction and making a newtype just to enforce that.

If machine supports 64-bit addresses then the instance can be parametrized by const BITS: u8 and return/accept different address types.

The address space is always 32-bit, even if the VM instance is 64-bit (in which case the upper 32-bits are simply ignored).

(This is done for efficiency, as we have little need for huge amounts of memory, and capping the address space to 32-bit makes memory accesses from within the guest program more efficient.)

@nazar-pc
Copy link
Author

nazar-pc commented Jan 1, 2025

But the pointers are still 64-bit inside VM itself, aren't they?

@koute
Copy link
Collaborator

koute commented Jan 1, 2025

Yes, technically they are because the registers are 64-bit, but the upper 32-bits are completely ignored when making memory accesses.

@nazar-pc
Copy link
Author

nazar-pc commented Jan 1, 2025

That is still a bit confusing because when I allocate a few data structures in VM's memory before calling a function, I still have to convert those addresses VM gives me from u32 to u64before calling the function, so it'd be less confusing if API was consistent and producedu64` to begin with.

And it would make code base more type safe if there was a new type instead of anonymous u32 even if it stays the way it is right now.

@athei
Copy link
Member

athei commented Jan 2, 2025

Yes, technically they are because the registers are 64-bit, but the upper 32-bits are completely ignored when making memory accesses.

Funny anecdote regarding this. The RISC-V ABI demands that even unsigned integers are sign extended in order to fit into a register for calling a function (if they are of smaller size than a register). So if declare you a host function argument as uint32 and expect that it is zero extended to fit into the register you are in for a surprise. @aman4150 and @xermicus had a nice little debugging session to find this out.

So tl;dr: It is important to actually ignore the higher bits of a pointer and don't assume they are zero.

@nazar-pc
Copy link
Author

nazar-pc commented Jan 2, 2025

My plan was for host to allocate a data in a VM that looks roughly like this:

struct Args {
    ptr_1: *mut u8,
    size_1: u32,
    ptr_2: *mut u8,
    size_2: u32,
}

And then call a function that looks like this:

#[no_mangle]
pub unsafe extern "C" fn foo(args: *mut Args) {
    //
}

And after return to inspect values behind those prt_*. Since pointers need to be 64-bit in this case, looks like I'll need to do sign extension from u32 returned by sbrk() manually?
That would be a substantial pitfall, I'd rather get u64 (or better yet #[transparent] struct Address64(u64)) out of it that I can use as an address directly without thinking about it too much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants