wasi: use File.Poll for all blocking FDs in poll_oneoff #1606

Closed
wants to merge 6 commits into from

evacchi
Contributor

@evacchi evacchi commented Jul 31, 2023

This includes stdin and pipes, but also sockets.

  • This also updates RATIONALE.md.
  • It also fixes some issues in the fsapi.File wrappers for sockets on POSIX and Windows (especially making sure the nonblocking flag is set correctly at all times)
  • I think with this we could also close #1500 (support non-blocking files).

Further refinements are possible (e.g. supporting poll for reading), but most of the work is done now.

Rationale

Instead of special-casing stdin, we can now use File.Poll() on basically every file. I am limiting this to files that report blocking = true, because otherwise Read() is expected not to block (i.e. it is allowed to return EAGAIN).

Notice that, previously, we knew we only handled stdin, so we always invoked Poll only on that one File.

  • This uncovered another interesting issue: in some cases, the event struct could be written at the wrong offset, because we precomputed the offset. If for some reason one of the events was not written back, we would leave an empty gap; this was hard to notice because of the special treatment reserved for blocking stdin, and because in many cases we did not test more than one fd at a time (we did not simulate an unreported event).
  • The other issue is that, with a nonzero timeout, the poll syscall blocks an OS thread until it returns.

Now, we iterate over each File that reports as "blocking" and invoke its Poll() method. However:

  • we no longer issue a blocking call to sysfs.poll() with the given timeout.
  • instead, we emulate time.After() and time.Tick(), but using sys.Nanosleep()
  • every time the tick goes off, we call sysfs.poll() with a zero timeout, until the timeout is reached, similarly to how WSAPoll is handled on the Windows side
  • This allows us to honor the configured sys.Nanosleep() instead of relying on poll regardless of the config settings (in fact, some test cases were broken because they did not configure sys.Nanosleep() properly, if at all).

Alternative ways to do this would be:

  • scatter/gather by spawning a goroutine for each File.Poll() with a given timeout; however, if the timeout is implemented with sysfs.poll(), this in turn issues a blocking call that takes over the underlying OS thread until the timeout is reached (which may eventually consume all resources).
  • call poll({fd_1, ..., fd_n}) which however, as already discussed in RATIONALE.md, is not an abstraction at the right level.
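For comparison, the rejected scatter/gather alternative might look like the sketch below (hypothetical names, simplified `Poll` signature). Each goroutine passes the full timeout to `Poll`; if that is backed by a blocking `poll(2)`, every pending goroutine occupies an OS thread until its timeout expires.

```go
package main

import (
	"sort"
	"sync"
)

// pollable is a hypothetical, simplified stand-in for fsapi.File's Poll.
type pollable interface {
	Poll(timeoutMillis int32) (ready bool, err error)
}

// scatterGather polls every file concurrently, each with the full
// timeout, then gathers the indexes of the files that reported ready.
func scatterGather(files []pollable, timeoutMillis int32) []int {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		ready []int
	)
	for i, f := range files {
		wg.Add(1)
		go func(i int, f pollable) {
			defer wg.Done()
			// If Poll is backed by a blocking poll(2), this goroutine
			// occupies an OS thread for up to timeoutMillis.
			if ok, err := f.Poll(timeoutMillis); err == nil && ok {
				mu.Lock()
				ready = append(ready, i)
				mu.Unlock()
			}
		}(i, f)
	}
	wg.Wait()
	sort.Ints(ready) // goroutines finish in arbitrary order
	return ready
}

// constFile always reports the same readiness (for illustration only).
type constFile bool

func (c constFile) Poll(int32) (bool, error) { return bool(c), nil }
```

This sketch also waits for every goroutine before returning, so one slow file delays the whole result; returning on the first ready file would need extra cancellation plumbing, which a goroutine stuck in a blocking syscall cannot honor anyway.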

In fact, for our purposes, we might also consider exporting File.Poll(Flag) with no timeout, defaulting to 0ms, which avoids the risk of blocking altogether.

Finally, because emulating poll(2) on all platforms is not a goal, it might be possible to refine this further by replacing the Windows implementation of sysfs.poll() with ad-hoc versions, basically removing the wrapper (e.g. without handling an arbitrary timeout, since this would now be handled in poll_oneoff, and/or handling specific file types directly instead of detecting them automatically in sysfs.poll(); in other words, possibly going straight from WinTcp*File.Poll() to WSAPoll(), etc.).

Signed-off-by: Edoardo Vacchi <[email protected]>
@evacchi
Contributor Author

evacchi commented Aug 1, 2023

I added more test cases, and I found some issues with the socket files on both Windows and POSIX regarding how the nonblocking flag is set. I also added a WASI test case built with zig-cc. Tomorrow I'll try to figure out whether I can write another implementation in Rust and/or Go that actually invokes poll with multiple FDs (I am not sure the higher-level APIs actually do it).

EDIT: heh, I tried to write a simple example with gotip but I couldn't figure out how to make it call poll_oneoff 😬

package main

import (
	"fmt"
	"io"
	"net"
	"os"
)

// main is a minimal, hypothetical entry point so the snippet builds on its own.
func main() {
	if err := mainMixed(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

// mainMixed is an explicit test of a blocking socket + stdin pipe.
func mainMixed() error {
	// Get a listener from the pre-opened file descriptor.
	// The listener is the first pre-open, with a file-descriptor of 3.
	f := os.NewFile(3, "")
	l, err := net.FileListener(f)
	// net.FileListener duplicates the descriptor, so f can be closed independently.
	defer f.Close()
	if err != nil {
		return err
	}
	defer l.Close()

	ch1 := make(chan error)
	ch2 := make(chan error)

	go func() {
		// Accept a connection
		conn, err := l.Accept()
		if err != nil {
			ch1 <- err
			return
		}
		defer conn.Close()

		// Do a blocking read of up to 32 bytes.
		// Note: the test should write: "wazero", so that's all we should read.
		var buf [32]byte
		n, err := conn.Read(buf[:])
		if err != nil {
			ch1 <- err
			return
		}
		fmt.Println(string(buf[:n]))
		close(ch1)
	}()

	go func() {
		// Concurrently copy all of stdin to stdout.
		b, err := io.ReadAll(os.Stdin)
		if err != nil {
			ch2 <- err
			return
		}
		os.Stdout.Write(b)
		close(ch2)
	}()
	// A closed channel yields nil, so these receive either an error or nil.
	err1 := <-ch1
	err2 := <-ch2
	if err1 != nil {
		return err1
	}
	if err2 != nil {
		return err2
	}
	return nil
}

@codefromthecrypt
Contributor

Thanks for digging into the integration tests. This type of code/behavior is hard to pin down, and that is exactly where the extra tests come in: they establish an "implementation quorum". Please ping back when you feel things are settled or need a hand from someone else (even if it's technically over my head ;))

evacchi added 2 commits August 2, 2023 11:07
@evacchi evacchi force-pushed the poll-all-fds branch 2 times, most recently from bbb7b33 to 65d5b3c August 2, 2023 19:02
@evacchi
Contributor Author

evacchi commented Aug 2, 2023

Ok, I added a test for gotip, and I have also figured out something for Rust (using tokio-rs/mio).

  • I verified (with --hostlogging poll and also via debugger) they actually exercise poll_oneoff.
  • The tests are not all identical, but they are roughly similar.
  • The one that differs the most is the gotip one, because I can't tell whether there is a straightforward way to make sure that all FDs are checked at once (spoiler: they aren't!). At least it checks more than one subscription at a time, but they will all be in nonblocking mode, so it doesn't really exercise the new code path, except for the timers.

I think at this point this is ready for review. It may still lack a bit of polish but your feedback is welcome.

EDIT: I have updated the top post.

evacchi added 2 commits August 2, 2023 21:16
@evacchi evacchi marked this pull request as ready for review August 2, 2023 19:29
@evacchi evacchi requested a review from mathetake as a code owner August 2, 2023 19:29
// if the fd is Stdin, and it is in non-blocking mode,
// do not ack yet, append to a slice for delayed evaluation.
blockingStdinSubs = append(blockingStdinSubs, evt)
writeEvent(outBuf[outOffset:], evt)
Contributor Author

writeEvent has been simplified: we already pass the buffer at the right offset.

// and we don't need to wait for the timeout: clear it.
if readySubs != 0 {
timeout = 0
sysCtx := mod.(*wasm.ModuleInstance).Sys
Contributor Author

the section below has been reordered for clarity

Comment on lines +142 to +149
} else if file.File.IsNonblock() {
writeEvent(outBuf[outOffset:], evt)
nevents++
} else {
writeEvent(outBuf, evt)
readySubs++
// If the fd is blocking, do not ack yet,
// append to a slice for delayed evaluation.
fe := &filePollEvent{f: file, e: evt}
blockingSubs = append(blockingSubs, fe)
Contributor Author

these have been reordered for clarity. First, we avoid the double negation (!IsNonblock()). Second, the two immediate writes are now in the if and the else if, while the else handles the special case of "delayed" processing.
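The reordered branch structure can be sketched like this, assuming the leading `if` branch (not shown in the excerpt) reports an error such as EBADF; `sub` and `classify` are hypothetical stand-ins for the subscription handling in poll_oneoff.

```go
package main

// sub is a hypothetical, simplified fd_read/fd_write subscription.
type sub struct {
	fd          int32
	badFd       bool // assumed: the fd lookup failed
	nonblocking bool // the file reports IsNonblock()
}

// classify mirrors the if / else if / else ordering: the first two
// branches are acknowledged immediately, while blocking files are
// deferred so they can be polled before an event is written.
func classify(subs []sub) (immediate, deferred []sub) {
	for _, s := range subs {
		if s.badFd {
			// Reported right away with an error (e.g. EBADF).
			immediate = append(immediate, s)
		} else if s.nonblocking {
			// Read/Write may return EAGAIN instead of blocking, so
			// the subscription can be acknowledged right away.
			immediate = append(immediate, s)
		} else {
			// Blocking file: delay evaluation and poll it later.
			deferred = append(deferred, s)
		}
	}
	return immediate, deferred
}
```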

0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0,
Contributor Author

fdReadSubFd is now used in other poll tests (see above), e.g. to create multiple records. For such records to be valid, we zero-pad the byte slice to the right size (32 bytes).

Contributor

Suggested change
0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, // pad to record size (32 bytes)

@@ -47,7 +47,7 @@ func Test_sockAccept(t *testing.T) {
t.Run(tc.name, func(t *testing.T) {
ctx := experimentalsock.WithConfig(testCtx, experimentalsock.NewConfig().WithTCPListener("127.0.0.1", 0))

mod, r, log := requireProxyModuleWithContext(ctx, t, wazero.NewModuleConfig())
mod, r, log := requireProxyModuleWithContext(ctx, t, wazero.NewModuleConfig().WithSysNanosleep())
Contributor Author

poll_oneoff now respects SysNanosleep, so it has to be configured properly.

@@ -100,7 +100,6 @@ func syscallConnControl(conn syscall.Conn, fn func(fd uintptr) (int, sys.Errno))
// because they are sensibly different from Unix's.
func newTCPListenerFile(tl *net.TCPListener) socketapi.TCPSock {
w := &winTcpListenerFile{tl: tl}
_ = w.SetNonblock(true)
Contributor Author

we no longer default to nonblock on sockets (it's not necessary)

Comment on lines +136 to +138
func (f *winTcpListenerFile) Poll(flag sys.Pflag, timeoutMillis int32) (ready bool, errno sys.Errno) {
return _pollSock(f.tl, flag, timeoutMillis)
}
Contributor Author

implement Poll properly for sockets

Comment on lines +120 to +122
if ready, errno := f.Poll(sys.POLLIN, 0); !ready || errno != 0 {
return nil, sys.EAGAIN
}
Contributor Author

we can rewrite this in terms of f.Poll()
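Rewriting accept in terms of f.Poll() amounts to a non-blocking guard: poll with a zero timeout and return EAGAIN unless a connection is already pending. A sketch with hypothetical types (`listener`, and `errAgain` in place of `sys.EAGAIN`); like the excerpt, it also maps any Poll error to EAGAIN.

```go
package main

import "errors"

// errAgain is a hypothetical stand-in for sys.EAGAIN.
var errAgain = errors.New("EAGAIN")

// listener is a hypothetical, simplified view of a TCP listener file.
type listener interface {
	Poll(timeoutMillis int32) (ready bool, err error)
	acceptPending() (conn string, err error)
}

// accept never blocks: it polls with a zero timeout and only calls the
// underlying accept when a connection is already pending.
func accept(l listener) (string, error) {
	if ready, err := l.Poll(0); !ready || err != nil {
		return "", errAgain
	}
	return l.acceptPending()
}

// fakeListener holds a queue of pending connections (for illustration).
type fakeListener struct{ pending []string }

func (f *fakeListener) Poll(int32) (bool, error) { return len(f.pending) > 0, nil }

func (f *fakeListener) acceptPending() (string, error) {
	c := f.pending[0]
	f.pending = f.pending[1:]
	return c, nil
}
```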

@evacchi
Contributor Author

evacchi commented Aug 3, 2023

oh, I almost forgot: while running make build.examples.zig-cc, I noticed that the DWARF example was mistakenly being overwritten (I think the new build might have stripped the DWARF symbols), so we'll need to check that.

Contributor

@codefromthecrypt codefromthecrypt left a comment

This looks good, but I might be missing a decision point around the poll interval. Right now, we seem to be polling via an immediate poll followed by a 100ms sleep.

100ms is a very long time, so I suspect this should be a lot shorter (even 100us could be too long). It is worse because the external sleep approach guarantees it will take that long.

So I'm mainly wondering why poll with a short timeout isn't used: is it a defect, or are we trying to make the native side use the fake clock?

IMHO, since timeout is a parameter of poll in the Poll API, how the timeout is implemented is up to the backend, which may choose to use a real or fake clock to sleep, or a native poll. If we cannot trust the implementation of the poll timeout and are avoiding it for that reason, I would try to make it very clear why, because in the worst case, external orchestration of a feature defined in the poll documentation (timeout) can feel like "spaghetti around a problem".

go closeChAfter(sysCtx, timeout, timeoutCh)

pollInterval := 100 * time.Millisecond
ticker := time.NewTicker(pollInterval)
Contributor

FYI: in closeChAfter we are intentionally using the context clock, but this ticker will use a real one.

Contributor Author

whoops!

block the carrier thread of the goroutine, preventing any means
to support context cancellation directly.

We obviate this by invoking `poll` with a 0 timeout repeatedly,
Contributor

I don't fully understand this. Why poll zero + sleep instead of polling with a 100ms timeout? Are you suggesting that the poll implementation blocks too long even with a 100ms timeout? If so, maybe the paragraph above needs to clarify this.

Contributor Author

yes: if we pass a 100ms timeout, the syscall will block for 100ms, which means it will also block the underlying OS thread. I will add a clarification.

@evacchi
Contributor Author

evacchi commented Aug 4, 2023

> 100ms is a very long time, [...]
> So I'm mainly wondering why poll with a short timeout isn't used: is it a defect, or are we trying to make the native side use the fake clock?

100ms is completely arbitrary; I picked it because that's what I used in the Windows implementation (which is now irrelevant, since it is invoked here with 0ms). We could certainly use a smaller delay; I am not an expert here at all.

> IMHO, since timeout is a parameter of poll in the Poll API, how the timeout is implemented is up to the backend, which may choose to use a real or fake clock to sleep, or a native poll. If we cannot trust the implementation of the poll timeout and are avoiding it for that reason, I would try to make it very clear why, because in the worst case, external orchestration of a feature defined in the poll documentation (timeout) can feel like "spaghetti around a problem".

Yeah, the main issue is that the real poll actually hogs a real OS thread: e.g. if you invoke poll_oneoff with a long delay, it won't return control to the Go runtime until the underlying poll has returned. This way we avoid relying on the syscall native delay, and we give the Go runtime more chances to take over if necessary (e.g. to schedule other goroutines).

So this has more to do with the interaction between Go and the underlying syscall than with how the syscall is actually implemented at the OS level. I will add some notes to the RATIONALE.

The "bonus" is that by using the ctx clock, we also respect that clock.

@codefromthecrypt
Contributor

codefromthecrypt commented Aug 4, 2023

> This way we avoid relying on the syscall native delay

What I'm probably missing here is that this is a timeout. I could be wrong, but a timeout is the worst case, whereas a delay means it is blocked regardless.

So say the interval is 100us and the "file is ready to write" event happens at +10us. With a clock sleep approach, you have to wait an extra 90us anyway. If this is correct, it seems we are trying too hard not to use poll's timeout, even though Go itself uses it, and in the process we force a longer delay than necessary (always paying the worst case, even when an event occurs earlier). What am I missing?

@ncruces
Collaborator

ncruces commented Aug 4, 2023

My understanding of the problem/solution is this:

If you poll with timeout:

  • the guest unblocks as soon as an event happens;
  • but while waiting…
      • the host has an OS thread blocked from doing anything else;
      • the host can only cancel the guest at timeout intervals.

If you poll zero and sleep the timeout:

  • the guest can only unblock at timeout intervals;
  • but while sleeping…
      • the host OS thread is free to do other things;
      • the host can cancel the guest at any time.

Basically this looks like a choice/balance between giving priority to host or guest resources.

For a single guest (like browser) environment, the first one would be the right call, hands down.
For scaling in the backend, I'd be inclined to go with the second, if we can't do better.

@evacchi
Contributor Author

evacchi commented Aug 4, 2023

Since this introduces a significant change in how we handle poll_oneoff, I will close this PR for now and instead contribute the tests, small cleanups, and fixes that were part of it, without modifying how poll_oneoff handles FDs. We can always revisit this :)

@codefromthecrypt
Contributor

codefromthecrypt commented Aug 5, 2023

#1606 (comment) from @ncruces captures the concern at the crux of this issue.

Both the abandoned code and the description seem to say that blocking via the syscall timeout is bad, yet I believe this is actually what Go does in its netpoller.

It makes me wonder how anyone would be able to choose a good value for how long to block the client, and why anyone would do this if there were only one blocking event. The guessed interval would always be wrong, even if only slightly: unless there's a constant stream of data, you end up waiting either not long enough or too long.

If the project as a whole wants to prevent use of the timeout parameter in the syscall (basically always blocking for zero time and guessing an interval to sleep for, with a syscall per guess), I feel the API should change to remove the ability to give a timeout (i.e. remove the parameter from File.Poll). As it stands, it makes even less sense to have a timeout parameter and not use it.

Personally, I think I've said enough on this, and I'll leave any decision up to you all. Maybe just decide once and for all in a couple of months? We are exposing the filesystem API, and it should be clear why poll would be exposed yet never use the timeout value, here or in a potential multi-poll scenario.

@codefromthecrypt
Contributor

Whatever action we take, I believe that if wazero should not support integrated syscall pollers, we should be very loud about it. It is a different direction than I expected, considering that the syscall layer otherwise behaves like syscalls. I was expecting a comment like the one Go has on an emulated thing, rather than emulating poll behavior and intentionally disallowing native polling by doing so above the layer where someone controls the implementation (above the fs API).

Basically, I would suggest studying the topic in Go, like here, making a decision, and then rationalizing, both in the API and in the RATIONALE, why only poll specifically works like this, while other places such as a blocking Read will still block the calling thread, etc. I expect folks can figure out a justification; I just don't want to moderate it.

@mathetake mathetake deleted the poll-all-fds branch August 8, 2023 02:15
Development

Successfully merging this pull request may close these issues.

support non-blocking files
3 participants