extern types #1861

Merged on Jul 25, 2017 (19 commits).

`text/0000-extern-types.md`: 128 additions, 0 deletions

- Feature Name: extern_types
- Start Date: 2017-01-18
- RFC PR:
- Rust Issue:

# Summary
[summary]: #summary

Add an `extern type` syntax for declaring types which are opaque to Rust's type
system.

# Motivation
[motivation]: #motivation

When interacting with external libraries we often need to be able to handle pointers to data that we don't know the size or layout of.

In C it's possible to declare a type but not define it.
These incomplete types can only be used behind pointers; a compilation error results if the user tries to use them in a way that would require the compiler to know their layout.

In Rust, we don't have this feature. Instead, a couple of problematic hacks are used in its place.

One is to define the type as an uninhabited type, e.g.:

```rust
enum MyFfiType {}
```

Another is to define the type as a struct with a private field and no way to construct it:

```rust
struct MyFfiType {
    _priv: (),
}
```
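
A minimal sketch of what the compiler believes about the second hack, assuming stable Rust and two hypothetical helpers, `read_by_value` and `claimed_layout`: `MyFfiType` is an ordinary sized, zero-sized type, so its layout queries succeed and a by-value read through a pointer type-checks, even though the real foreign object is neither zero-sized nor safe to duplicate.

```rust
use std::mem;
use std::ptr;

/// The "private field" stand-in for a foreign type.
pub struct MyFfiType {
    _priv: (),
}

/// Hypothetical helper. This compiles without complaint: to Rust,
/// `MyFfiType` is a zero-sized value, so "reading" one copies zero bytes
/// and yields a value unrelated to whatever foreign object `p` points at.
///
/// # Safety
/// `p` must be non-null and aligned. Because the read is zero-sized, those
/// are the only requirements, which is exactly the problem: nothing stops
/// Rust code from "copying" the foreign object.
pub unsafe fn read_by_value(p: *const MyFfiType) -> MyFfiType {
    unsafe { ptr::read(p) }
}

/// Hypothetical helper: the (size, alignment) the compiler assigns to the stand-in.
pub fn claimed_layout() -> (usize, usize) {
    (mem::size_of::<MyFfiType>(), mem::align_of::<MyFfiType>())
}
```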

The point of both these constructions is to prevent the user from being able to create or deal directly with instances of the type.
Neither of these types accurately reflects the reality of the situation.
The first definition is logically problematic, as it defines a type which can never exist.
This means that references to the type can also, logically, never exist, and raw pointers to the type are guaranteed to be invalid.
The second definition says that the type is a ZST, that we can store it on the stack, and that we can call `ptr::read`, `mem::size_of` etc. on it.
Of course, none of this is valid.

Review thread:

Member: Why is this a concrete problem?

Contributor Author: It won't break anything so long as you keep your uninhabited type behind a raw pointer, but as soon as you put it behind a reference you've got a situation which is statically impossible unless you're lying to the type system with `unsafe`.

This caused a breakage in the standard library implementation when I was implementing some of the uninhabitedness stuff. `&Void` was being used somewhere as a `void *` with a lifetime and (I think; my memory's a little vague) the compiler started assuming that a function returning `&Void` could never return, and so segfaults resulted.

Contributor: Empty types have no value, so any code that manipulates such a value is "impossible". As I understand it, the optimizer is allowed to assume that such code is unreachable, and that assumption can be "propagated" to branches that lead there. This is also why this code type-checks:

```rust
enum Void {}

fn out_of_thin_air(x: Void) -> Box<u32> {
    match x {}
}
```

The controversy over how to represent foreign types even extends to the standard library; see the discussion in the [libc_types RFC PR](https://github.com/rust-lang/rfcs/pull/1783).

This RFC instead proposes a way to directly express that a type exists but is unknown to Rust.

Finally, the 2017 roadmap lists [integration with other languages](https://github.com/rust-lang/rfcs/blob/master/text/1774-roadmap-2017.md#integration-with-other-languages) as a priority.
Just like unions, this is an unsafe feature necessary for dealing with legacy code in a correct and understandable manner.

# Detailed design
[design]: #detailed-design

Add a new kind of type declaration, an extern type:

```rust
extern {
    type Foo;
}
```

These types are FFI-safe. They are also DSTs, meaning that they do not implement `Sized`. Being DSTs, they cannot be kept on the stack and can only be accessed through pointers.
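
The constraint is the same one that applies to Rust's existing DSTs. A minimal sketch using `str`, `[u8]` and `dyn Debug` (chosen because `extern type` itself is not available on stable Rust; the helpers `describe` and `pointer_widths` are hypothetical): such values can only be handled behind pointers like `&`, `Box` or `*const`, and those pointers are two words wide because they carry metadata alongside the address.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Width of a thin pointer on the current target.
pub const WORD: usize = size_of::<usize>();

/// Existing DSTs can be passed by reference but never by value: a parameter
/// of type `str` or a local of type `[u8]` would be rejected because neither
/// type is `Sized`.
pub fn describe(s: &str, bytes: &[u8], obj: &dyn Debug) -> String {
    format!("{s} / {} bytes / {obj:?}", bytes.len())
}

/// Pointers to sized types are thin; pointers to DSTs are fat.
pub fn pointer_widths() -> [usize; 4] {
    [
        size_of::<*const u8>(),      // address only
        size_of::<*const [u8]>(),    // address + length
        size_of::<&str>(),           // address + length
        size_of::<Box<dyn Debug>>(), // address + vtable pointer
    ]
}
```

Under this RFC the metadata for an extern type is `()`, so a pointer to one would be expected to have the one-word width of `*const u8` rather than the two-word width of the other entries.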

In Rust, pointers to DSTs carry metadata about the object being pointed to.
For strings and slices this is the length of the buffer; for trait objects it is the object's vtable.
For extern types the metadata is simply `()`.
This means that a pointer to an extern type has the same representation as a thin raw pointer (and hence as a C pointer).
It also means that if we store an extern type at the end of a container (such as a struct or tuple), pointers to that container will also be identical to thin raw pointers (despite the container as a whole being unsized).
This is useful to support a pattern found in some C APIs where structs are passed around which have arbitrary data appended to the end of them, e.g.:

```rust
extern {
    type OpaqueTail;
}

#[repr(C)]
struct FfiStruct {
    data: u8,
    more_data: u32,
    tail: OpaqueTail,
}
```

Review thread:

Reviewer: It's not clear whether you are proposing that this should be accepted by the compiler or not. I think that this is not as essential as the rest of the proposal, and I'd suggest that you remove this example and specify that such types can only be used as pointer and reference types.

Contributor Author: I'm saying it should be accepted. This isn't really hard to implement: just make extern types affect the struct pointer the same way that slices and trait objects do. It's also consistent with allowing other DSTs at the end of structs. I think it should be accepted because it's a pattern I've seen a lot in C code.

Member (@nagisa, Feb 14, 2017): This structure seems useless to me. Consider three cases where `FfiStruct` could be used:

```rust
fn by_return() -> FfiStruct;
fn by_sret(retptr: *mut FfiStruct);
fn by_argument(arg: FfiStruct);
```

Now, all of these are invalid or can't be made to work:

1. For `by_return`, it may under the covers be either a return by value or by sret pointer (analysed in the next point). If it's returned by value, it's all right. However, you cannot really know whether it's by value or by sret without knowing the full definition of the structure.
2. For `by_sret`, the compiler simply cannot know how much stack space to allocate for the `retptr` slot.
3. For `by_argument`, the compiler cannot know how to correctly pass such a structure to the C side (i.e. if `FfiStruct` was supposed to be passed to C via registers, how many registers does the C side expect to be used?).

EDIT: the only case where this could work is

```rust
fn double_indirection(retptr: *mut *mut FfiStruct);
```

Reviewer: What about:

```rust
fn return_reference(&T) -> &FfiStruct {}
fn consume_reference(&FfiStruct) {}
```

One of the main uses of this type would be to define opaque types with headers of known layouts, such that they can be handled and have their lifetimes managed as though they are Rust types, despite being defined in C++ land.

As an example, in the Gecko codebase there is a type `nsAString`, which represents an abstract string. This type is an abstract base class which has (approximately) the following layout:

```cpp
struct nsAString {
  const uint16_t* data;
  uint32_t length;
  uint32_t flags;
};
```

It has multiple subclasses, which may or may not add extra data after the above data, such as `nsFixedString`, which has a layout like:

```cpp
struct nsFixedString : public nsAString {
  uint32_t capacity;
  uint16_t* buffer;
};
```

In Rust we can then define `nsAString` as:

```rust
#[repr(C)]
struct nsAString {
    data: *const u16,
    length: u32,
    flags: u32,
    _rest: OpaqueTail,
}
```

And then we could take `*mut nsAString`, `*const nsAString`, `*const nsFixedString` etc. and cast them (through unsafe code) into `&'a nsAString`, working with them as though they were a Rust object, able to directly access members like `length` and `flags` from Rust, without having to worry about accidentally moving the data inside and breaking C++-defined invariants.

Currently in our Rust bindings we're working around this limitation by defining `#[repr(C)] struct nsAString([u8;0]);` and doing casts to extract the fields from the header. You can see this here: http://searchfox.org/mozilla-central/rev/d3307f19d5dac31d7d36fc206b00b686de82eee4/xpcom/rust/nsstring/src/lib.rs#160-163

Member: Regarding `fn return_reference(&T) -> &FfiStruct {}`: this can work, but you'll need either to know the full type of `T` or to get it on the Rust side with `*mut *mut T` in the first place, basically moving the responsibility from `FfiStruct` directly to `T`.

What's the point of having a reference to this definition:

```rust
#[repr(C)]
struct nsAString {
    data: *const u16,
    length: u32,
    flags: u32,
    _rest: OpaqueTail,
}
```

rather than a reference to this one?

```rust
#[repr(C)]
struct nsAString {
    data: *const u16,
    length: u32,
    flags: u32,
}
```

So still, I'm not seeing the point of allowing such a thing, given how many cases there are in which this cannot reasonably work in an FFI context.

Contributor: Yeah, the current hacks look weird, and poorly approximate the true semantics to boot, as @mystor shows.

@nagisa, have you looked at the custom DST RFC? IMO thinking about them together helps if this seems too narrow on its own.

Contributor Author: @nagisa `by_sret` is intended to work. Extern type pointers are "fat" pointers in the sense that they point to a DST, but they're still pointer-sized (the extra metadata on the pointer is just a `()`). So `*mut FfiStruct` is pointer-sized and FFI-safe.

Member: @canndrew How? Namely, please describe how much memory the caller would need to allocate on the stack so that it could produce a valid pointer to pass to the function.

@Ericson2314 yes, I've seen it.

@eternaleye (Feb 16, 2017): @nagisa: The entire point of this RFC is that no Rust code can possibly create such a value on the stack; it can only obtain pointers to such values, and only from foreign (or unsafe pointer-casting) code.

It is meant almost exactly to mimic the C/C++ notion of an "incomplete type", which cannot be allocated on the stack in C/C++ either, but can be pointed/referred to.

Contributor Author (@canndrew, Feb 16, 2017): @nagisa 32 bits on a 32-bit system, 64 bits on a 64-bit system? I'm not sure I understand the question.

Edit: Oh, I get it. Yes, what @eternaleye said.
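
Until something like `OpaqueTail` exists, the header-plus-tail pattern can be approximated on stable Rust with a zero-length array marking where the unknown data begins, the same `[u8; 0]` trick mentioned in the review discussion. A minimal sketch, with hypothetical names `FfiHeader`, `CObjectWithTail` and `header_fields`: Rust reads only the known header fields through a pointer and never touches the tail. Unlike the proposal, this stand-in remains `Sized`, so it still carries the ZST-hack problems from the motivation (Rust code can create and move it by value).

```rust
/// Stand-in for the RFC's `FfiStruct`: the known header, then a
/// zero-length array marking where the foreign tail starts.
#[repr(C)]
pub struct FfiHeader {
    pub data: u8,
    pub more_data: u32,
    pub tail: [u8; 0],
}

/// Hypothetical foreign object: the same header followed by 16 bytes
/// that Rust code is not supposed to know about.
#[repr(C)]
pub struct CObjectWithTail {
    pub data: u8,
    pub more_data: u32,
    pub extra: [u8; 16],
}

/// Reads the header fields through a raw pointer without touching the tail.
///
/// # Safety
/// `p` must point to a live object whose prefix has `FfiHeader`'s layout.
pub unsafe fn header_fields(p: *const FfiHeader) -> (u8, u32) {
    unsafe { ((*p).data, (*p).more_data) }
}
```

With `#[repr(C)]`, `data` sits at offset 0, `more_data` at offset 4 after padding, and `tail` at offset 8, which is where the foreign object's extra bytes begin.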

Because extern types are DSTs, `size_of` and `align_of` do not work on them.
We must also be careful that `size_of_val` and `align_of_val` do not work either, as there is not necessarily a way to get the size of an extern type at run time.
For an initial implementation, those functions can simply panic, but before this feature is stabilized there should be some trait bound or similar mechanism that statically prevents their use.
The exact mechanism is more the domain of the custom DST RFC, [RFC 1524](https://github.com/rust-lang/rfcs/pull/1524), so figuring it out will be delegated to that RFC.
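
For existing DSTs, `size_of_val` and `align_of_val` work because pointer metadata is enough to compute the answer: a slice's length times its element size, a string's byte length, or the size and alignment recorded in a trait object's vtable. A minimal sketch (the helpers `dst_sizes` and `dyn_align` are hypothetical) of what an extern type lacks: with metadata of `()`, there is nothing to compute a size from.

```rust
use std::fmt::Debug;
use std::mem::{align_of_val, size_of_val};

/// Sizes of three DST values, each derived from pointer metadata:
/// slice length, string length, and the trait object's vtable.
pub fn dst_sizes() -> (usize, usize, usize) {
    let slice: &[u32] = &[1, 2, 3];
    let text: &str = "hello";
    let obj: &dyn Debug = &7u64;
    (size_of_val(slice), size_of_val(text), size_of_val(obj))
}

/// Alignment of a trait object's concrete type, read from its vtable.
pub fn dyn_align() -> usize {
    let obj: &dyn Debug = &7u64;
    align_of_val(obj)
}
```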

C's "pointer `void`" (not `()`, but the `void` used in `void*` and similar) is currently defined in two official places: [`std::os::raw::c_void`](https://doc.rust-lang.org/stable/std/os/raw/enum.c_void.html) and [`libc::c_void`](https://doc.rust-lang.org/libc/x86_64-unknown-linux-gnu/libc/enum.c_void.html).
Unifying these is out of scope for this RFC, but this feature should be used in their definition instead of the current tricks.
Strictly speaking, this is a breaking change, but the `std` docs explicitly say that `void` shouldn't be used without indirection.
And `libc` can, in the worst case, make a breaking change.
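
Both `c_void`s are meant to be used only behind a pointer, most often as the type-erased "user data" argument of a C callback. A minimal sketch of that pattern on stable Rust, with hypothetical names `Callback`, `bump` and `invoke` standing in for a C API:

```rust
use std::os::raw::c_void;

/// Shape of a typical C callback: an opaque user-data pointer.
pub type Callback = extern "C" fn(user_data: *mut c_void);

/// A callback that knows its user data is really a `u32` counter.
/// Only sound when `user_data` points to a live `u32`; the `c_void`
/// signature cannot express that requirement.
pub extern "C" fn bump(user_data: *mut c_void) {
    // Recover the concrete type that was erased at the call site.
    let counter = unsafe { &mut *(user_data as *mut u32) };
    *counter += 1;
}

/// Stand-in for C code that stores `user_data` and later calls `cb` with it.
pub fn invoke(cb: Callback, user_data: *mut c_void) {
    cb(user_data);
}
```

The `c_void` never appears by value: it only names "something at the other end of this pointer", which is exactly the role an extern type would formalize.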

# How We Teach This
[how-we-teach-this]: #how-we-teach-this

Really, the question is "how do we teach *without* this".
As described above, the current tricks for doing this are wrong.
Furthermore, they are quite advanced, touching upon many obscure corners of the language: zero-sized and uninhabited types are phenomena few programmers coming from mainstream languages have considered.
From reading around other RFCs, issues, and internal threads, one gets a sense of two issues:
First, even among the group of Rust programmers enthusiastic enough to participate in these fora, the semantics of foreign types are not widely understood.
Second, there is annoyance that none of the current tricks, since each is flawed in a different way, would become standard.

Review thread:

Contributor: Did you mean "Second" instead of "Send"?

Contributor: Yes :)

Contributor Author: Fixed.

By contrast, `extern type` does exactly what one wants, with an obvious and guessable syntax, without forcing the user to immediately understand all the nuance about why *these* semantics are indeed the right ones.
As they see various operations fail (moves, stack variables), they can discover these semantics incrementally.
The benefits are such that this would soon displace the current hacks, making code in the wild more readable through consistent use of a pattern.

This should be taught in the foreign function interface chapter of the Rust book, in place of where it currently tells people to use uninhabited enums (ack!).

# Drawbacks
[drawbacks]: #drawbacks

Very slight addition of complexity to the language.

# Alternatives
[alternatives]: #alternatives

Do not do this.

# Unresolved questions
[unresolved]: #unresolved-questions

- Should we allow generic lifetime and type parameters on extern types?
If so, how do they affect the type in terms of variance?

- [In std's source](https://github.com/rust-lang/rust/blob/164619a8cfe6d376d25bd3a6a9a5f2856c8de64d/src/libstd/os/raw.rs#L59-L64), it is mentioned that LLVM expects `i8*` for C's `void*`.
We'd need to continue to hack this for the two `c_void`s in std and libc.
But perhaps this should be done across-the-board for all extern types?
Somebody should check what Clang does.
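
On the first question: today's stand-in opaque types can already take lifetime parameters, with variance chosen through `PhantomData`. A minimal sketch for comparison (the names `Handle` and `shorten` are hypothetical, and `[u8; 0]` plus `PhantomData` is the existing workaround, not the proposed `extern type`): the marker makes `Handle<'ctx>` covariant in `'ctx`, which is what lets `shorten` compile.

```rust
use std::marker::PhantomData;

/// Opaque stand-in tied to the lifetime of the context that produced it.
#[repr(C)]
pub struct Handle<'ctx> {
    _data: [u8; 0],
    // `PhantomData<&'ctx ()>` makes `Handle` covariant in `'ctx`; using
    // `PhantomData<fn(&'ctx ()) -> &'ctx ()>` instead would make it invariant.
    _marker: PhantomData<&'ctx ()>,
}

/// Only compiles because `Handle` is covariant: a handle valid for
/// `'static` may be viewed as one valid for any shorter `'a`.
pub fn shorten<'a>(h: &'a Handle<'static>) -> &'a Handle<'a> {
    h
}
```

Whatever rule is chosen for extern types would presumably need to offer a comparable way to pick covariance or invariance.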