Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(stdio): allow usage of file writer on shared non-blocking file descriptors #8725

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ljmf00
Copy link
Member

@ljmf00 ljmf00 commented Mar 20, 2023

When dealing with shared file descriptors like stdout, stderr and stdin, we should be ready to handle, whatever is their state. In the case of set it as a non-blocking file descriptor, we should retry on EAGAIN, as per POSIX documentation. We should also check for the stream error before assuming it's an error.

Since our API assume everything is written (no non-blocking behaviour), we should retry and not just write whatever is possible.

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @ljmf00!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8725"

Copy link
Contributor

@pbackus pbackus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just need to fix the failed style check.

std/stdio.d Outdated Show resolved Hide resolved
@CyberShadow
Copy link
Member

I'm not sure that looping is the best way to handle this situation. It will put the program into a busyloop which keeps asking the OS to write data and the OS refusing immediately, until the situation is resolved (which may take any amount of time). Though the program will work, the situation will now manifest as a performance problem.

Waiting for the socket to be writable with select would be less wrong, but I'm not sure if this warrants the complexity.

I think we should instead ensure that it's easy to create an object which can be manipulated with the appropriate APIs, whether that's converting a File to a Socket or duplicating the file descriptor to a blocking one so that it can be used with File.

Copy link
Member

@CyberShadow CyberShadow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right.

@ljmf00
Copy link
Member Author

ljmf00 commented Apr 10, 2023

I think we should instead ensure that it's easy to create an object which can be manipulated with the appropriate APIs, whether that's converting a File to a Socket or duplicating the file descriptor to a blocking one so that it can be used with File.

Duplicating the file descriptor will share the blocking properties, so that's not an option. Despite the added complexity, wouldn't select also interfere with users event pollers?

This won't change the behavior if the file descriptor is blocking, and our write API doesn't have a way to know how many bytes we wrote, so the API is not really prepared for non-blocking operations, therefore, we should probably act as if it was blocking.

I couldn't find any better solution for this problem, but this fails if the user is interacting with some program that sets the stdout/stderr as non-blocking, so we should definitely handle this in some way.

@CyberShadow
Copy link
Member

Duplicating the file descriptor will share the blocking properties, so that's not an option.

Hm, yeah.

Despite the added complexity, wouldn't select also interfere with users event pollers?

How so?

This won't change the behavior if the file descriptor is blocking, and our write API doesn't have a way to know how many bytes we wrote, so the API is not really prepared for non-blocking operations, therefore, we should probably act as if it was blocking.

No, I think throwing is appropriate. We should most definitely not try to smooth over exceptional situations with DWIM hacks which guess what the user wants.

I couldn't find any better solution for this problem, but this fails if the user is interacting with some program that sets the stdout/stderr as non-blocking, so we should definitely handle this in some way.

I think the situation where standard streams are non-blocking falls well beyond what a standard library should reasonably support. It is in the same area as the standard streams being completed closed (so the first file opened gets fd 0 etc.).

@ljmf00
Copy link
Member Author

ljmf00 commented Apr 10, 2023

Despite the added complexity, wouldn't select also interfere with users event pollers?

How so?

I think I misinterpreted select. I thought that listening to the same descriptor it wont notify both. Well, could be an option. How would you suggest to use it?

This won't change the behavior if the file descriptor is blocking, and our write API doesn't have a way to know how many bytes we wrote, so the API is not really prepared for non-blocking operations, therefore, we should probably act as if it was blocking.

No, I think throwing is appropriate. We should most definitely not try to smooth over exceptional situations with DWIM hacks which guess what the user wants.

The problem is, when the user doesn't want it to throw? Should they try/catch? EAGAIN is meant to be run again, so, on those situations the user has to try/catch, which is not the workflow to be used with exceptions.

I couldn't find any better solution for this problem, but this fails if the user is interacting with some program that sets the stdout/stderr as non-blocking, so we should definitely handle this in some way.

I think the situation where standard streams are non-blocking falls well beyond what a standard library should reasonably support. It is in the same area as the standard streams being completed closed (so the first file opened gets fd 0 etc.).

I think it should. After-all its a shared file descriptor, we should be ready for side effects applied by other programs, unrelated to ours.

@CyberShadow
Copy link
Member

CyberShadow commented Apr 10, 2023

I think I misinterpreted select. I thought that listening to the same descriptor it wont notify both. Well, could be an option. How would you suggest to use it?

Hypothetically, on an EAGAIN, check if the fd is non-blocking. If it is, call select with an infinite timeout with just that fd on the read set. If it returns with success, retry the loop.

But, I don't think we should do this.

The problem is, when the user doesn't want it to throw?

Then they should use the std.socket API, or another library, which is better suited for working with non-blocking file descriptors.

I think it should. After-all its a shared file descriptor, we should be ready for side effects applied by other programs, unrelated to ours.

No, and we should not take that stance with anything. DWIM in a general-purpose library, much more in a language's standard library, is a recipe in disaster. The contracts of our APIs should be simple and as short as possible; if you start adding circumstances which our functions "helpfully" try to handle, and then you document all the situations and the possible outcomes of such multiplied with all interactions of each other, the result is a mess that no one can sensibly design a system with predictable behavior out of.

Let's take this to an extreme. Consider the possibility that a root process (or even non-root with ptrace_scope=0) may exist which uses /proc/PID/mem to randomly corrupt bytes in our D program's memory. This is an entirely valid way for one process to interact with another, and it uses a documented feature of the Linux kernel. So, should Druntime keep multiple redundant copies of all data and run threads which detect errors and repair bytes corrupted by such a hostile process?

No. If so, why yes to this? Scaling it back, where do we draw the line?

How exactly do you define, in unambiguous technical terms, where the line is drawn?

Another way to look at this would be: what other languages' standard libraries do the same thing as what this change is proposing?

For std.stdio specifically, one easy argument to make against this proposal is that, because it wraps the C FILE* API, it should generally follow its behavior. Since C programs will fail in the same way as D programs currently, the behavior is as expected, and should not be changed. (However, I think DWIM-isms should be kept out of the standard library in general.)

@ljmf00
Copy link
Member Author

ljmf00 commented Apr 20, 2023

Then they should use the std.socket API, or another library, which is better suited for working with non-blocking file descriptors.

Since the file descriptors are shared you can't control them, so, in theory, every program should handle these cases, it just happens to be an edge case, in most situations.

Let's take this to an extreme. Consider the possibility that a root process (or even non-root with ptrace_scope=0) may exist which uses /proc/PID/mem to randomly corrupt bytes in our D program's memory. This is an entirely valid way for one process to interact with another, and it uses a documented feature of the Linux kernel. So, should Druntime keep multiple redundant copies of all data and run threads which detect errors and repair bytes corrupted by such a hostile process?

Yeah, but this is clearly an impractical situation. The presented solution is not.

I understand your points there, but I do think that our API is not designed to defer errors to the user to handle, specifically, EAGAIN, because we don't give the exact size written to the buffer.

So if we shift from the proposed implementation and have some other alternative: Should we then, like, create an exception that has the written size so the user can catch and proceed? Otherwise, to cover these cases, we can't use the stdio API, which is a bit unfortunate.

My only problem, then, is that exceptions for control flow is fundamentally wrong and slow. Therefore, I don't have another solution for it other than changing the API to be able to defer EAGAIN to the user, which we probably dont want to.

Hypothetically, would adding the size as a return value be as breaking change? ABI signature will change but this already happens across versions.

@CyberShadow
Copy link
Member

CyberShadow commented Apr 20, 2023

Since the file descriptors are shared you can't control them, so, in theory, every program should handle these cases, it just happens to be an edge case, in most situations.

The correct way to handle this situation is to signal an error.

Yeah, but this is clearly an impractical situation. The presented solution is not.

OK, let's extrapolate to a less extreme situation. What if file descriptor 0 is a socket, should we detect that and switch to using send/receive instead of read/write?

Should we then, like, create an exception that has the written size so the user can catch and proceed?

If we were to reach the conclusion that std.stdio should support this situation, then I agree that this would make sense. Otherwise, I don't think that's necessary, because you can query if the file descriptor is non-blocking, and switch to using appropriate (or the low-level C) APIs if desired.

Otherwise, to cover these cases, we can't use the stdio API, which is a bit unfortunate.

We also can't use the stdio API to work with sockets, even though on POSIX they are also file descriptors.

I just don't think this is something that std.stdio needs to be burdened with.

My only problem, then, is that exceptions for control flow is fundamentally wrong and slow. Therefore, I don't have another solution for it other than changing the API to be able to defer EAGAIN to the user, which we probably dont want to.

If we were to reach the conclusion that std.stdio should support these, then one way to achieve that would be to add a rawPartialWrite function OSLT.

Hypothetically, would adding the size as a return value be as breaking change? ABI signature will change but this already happens across versions.

On one hand, adding a return value would make it more symmetric with rawRead. However, this approach is not great for the same reasons that APIs which return errors as regular values (without some kind of @mustuse-like mechanism) are not great - it's much too easy to ignore the error (and adding @mustuse here would be a breaking change). rawRead is, unfortunately, already in that unpleasant territory.

@CyberShadow
Copy link
Member

Philosophy aside, here are some blockers which I think would make sense to address before we decide that we should support this situation in std.stdio:

  • Major programming languages which support I/O to standard input/output streams with non-blocking file descriptions
  • Some authoritative documentation that the underlying C FILE API supports working with non-blocking file descriptions

@RazvanN7
Copy link
Collaborator

RazvanN7 commented May 1, 2023

@ljmf00 what are your plans with this PR?

@LightBender LightBender added the Review:Phantom Zone Has value/information for future work, but closed for now label Oct 27, 2024
Co-authored-by: Paul Backus <[email protected]>
@LightBender
Copy link
Contributor

@rikkimax You're responsible for this one now.

@LightBender LightBender added Review:Salvage This PR needs work, but we want to keep it and it can be salvaged. and removed Review:Phantom Zone Has value/information for future work, but closed for now labels Oct 27, 2024
@rikkimax
Copy link
Contributor

We'll probably need a way to disable this behaviour and possibly a max retry count, but otherwise this guarantees synchronous behaviors.

Please add this configurability and once CI is green can be merged.

@CyberShadow
Copy link
Member

I strongly recommend against inventing ad-hoc workarounds in our standard I/O module which cause us to spin the CPU, with no evidence to support that this what the rest of the world (that D programs live in and interact with) expects us to do or does this as well.

@rikkimax
Copy link
Contributor

A reasonable argument, to not make it on by default. Opt-in only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Review:Salvage This PR needs work, but we want to keep it and it can be salvaged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants