Proper way to handle the return value of push() in the _transform() implementation of a transform stream #1791
Comments
I guess that, because transformers are always intermediary streams (connected directly to neither the source nor the sink), backpressure management does not come under their purview; it is instead a consideration for the connected streams at either end of the transformer. cc @nodejs/streams to get an opinion. |
You should wait for a See this comment for a good example. |
I recommend to just call |
A transform stream sits between two other streams. Essentially you are moving data: I don't know how much there is to elaborate here. FYI, I'm one of the maintainers of streams themselves. |
@mcollina Because a transform does not just move data, it can also inflate it. A decompressor can increase the amount of data by orders of magnitude, so by having the writing side control the flow, backpressure control goes out the window. See for example this bug: regular/unbzip2-stream#17. The transformer buffers ~1MB of input to be sure to get at least one full compressed block. But when the stream is ending, that last megabyte may turn out to be hundreds of MB worth of decompressed data, all of which has to be pushed. |
Or as another example: decompression bomb detection. The internal zlib transformer gained the ability to limit output size (nodejs/node@27253, nodejs/node#33516). If backpressure on the Readable side was properly bounded, the consumer of any type of Transform stream could protect itself against decompression bombs. In zlib the maximum compression ratio is ~1:1000; it is way higher in more effective compression formats. |
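A minimal sketch of the output-size limit mentioned above. It assumes the option added in nodejs/node#33516 is `maxOutputLength` and that it applies to zlib's convenience methods; the 1 MiB payload and the 64 KiB limit are arbitrary values picked for the demo.

```javascript
const zlib = require('zlib');

// 1 MiB of zeros compresses extremely well, so it stands in for a tiny "bomb".
const original = Buffer.alloc(1024 * 1024);
const bomb = zlib.gzipSync(original);

// Assumption: maxOutputLength caps the decompressed size for the
// convenience methods (gunzipSync here), not for piped streams.
let threw = false;
try {
  zlib.gunzipSync(bomb, { maxOutputLength: 64 * 1024 });
} catch (err) {
  threw = true; // decompression was aborted once the limit was exceeded
}

console.log(`compressed: ${bomb.length} bytes, aborted: ${threw}`);
```

This only protects callers of the one-shot helpers; it does not give a generic Transform consumer the bounded Readable-side backpressure the comment asks for.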
@sfriesel I stand by my recommendation in the generic case. Unless you have some very specific problem you want to solve, stay away from adding another level of buffering because you are likely to implement it wrongly. In the few cases this is desperately needed, there is .unshift() to push back data upstream and avoid implementing yet another level of buffering. Given your expertise on the subject, would you like to work on a few PRs to help with this? |
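For readers unfamiliar with `.unshift()`, here is a small sketch of what it does on a plain Readable: data that was read but not consumed is put back at the front of the internal buffer. The `headerbody` payload and the 6-byte split are made up for illustration, and the thread does not spell out how this would be wired into a Transform.

```javascript
const { Readable } = require('stream');

// A source whose data is pushed manually; read() is a no-op on purpose.
const source = new Readable({ read() {} });
source.push(Buffer.from('headerbody'));

// Take everything that is buffered, keep only the part we want...
const chunk = source.read();
const header = chunk.subarray(0, 6);

// ...and hand the remainder back so the next reader sees it first.
source.unshift(chunk.subarray(6));
const rest = source.read();

console.log(header.toString(), rest.toString());
```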
Another case where this seems to be a problem: adaltas/node-csv#408 It doesn't seem like there's actually any way to implement the back pressure yourself without reimplementing Transform completely. I wonder if transform could somehow be augmented to call |
The document for transform._transform() says that "The transform.push() method may be called zero or more times" in it.
The backpressure document says that you must respect the return value of .push() and should stop calling it when it returns false.
However, it is not clear what I should do if .push() returns false when I have more data to push in the transform._transform() implementation.
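The situation in the question can be reproduced with a small sketch. The inflating transform below is hypothetical: it emits three output chunks per input chunk, and the 8-byte `readableHighWaterMark` is chosen only to make `push()` return `false` quickly. Nothing reads from the stream, so every pushed chunk stays in the readable buffer.

```javascript
const { Transform } = require('stream');

const pushResults = [];

// Hypothetical transform that triples its input and records what push() returns.
const inflate = new Transform({
  readableHighWaterMark: 8, // bytes; deliberately tiny for the demo
  transform(chunk, encoding, callback) {
    for (let i = 0; i < 3; i++) {
      // The return value is recorded but ignored, as in most real transforms.
      pushResults.push(this.push(chunk));
    }
    callback();
  },
});

// No consumer is attached, so output accumulates in the readable buffer.
inflate.write(Buffer.from('abcd'));

console.log(pushResults, inflate.readableLength);
```

With 4-byte chunks, the buffer reaches the 8-byte limit on the second `push()`, which returns `false`, yet the third call still succeeds and takes the buffer to 12 bytes. This is the gap the question is about: `push()` signals backpressure, but nothing in `_transform()` forces the implementation to stop.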