-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BE] Unify buffer_size
across datapipes
#335
Comments
@NivekT does the 👍 mean you agree with my suggestions and I can send a PR? |
@pmeier Sorry I missed this issue before. We are currently exploring a plan to add a |
Is there a design document / issue for this? If not, what would a an instance of this class contain? |
Not for now. This is a requirement from internal usage that they want to provide some special buffer instance because they would be able to replace a certain instance of DataPipe in the graph but using the class BufferSpec:
buffer_size: int If users use an integer as an argument, we should automatically change it into an instance of |
I can't comment on the internal use case, but if the public API still accepts |
Yeah. And, for this issue, we should definitely unify the buffer_size. |
Let's not have two ways to do the same thing. If we decide on a sentinel, everything else should error |
Makes sense to me |
buffer_size
across datapipesbuffer_size
across datapipes
The
buffer_size
parameter is currently fairly inconsistent across datapipes:buffer_size
buffer_size
Here are my suggestion on how to unify this:
buffer_size
everywhere. It makes little difference whether we use1e3
or1e4
given that it is tightly coupled with the data we know nothing about. Given today's hardware / datasets, I would go with 1e4, but no strong opinion.torchvision
simply usesINFINITE_BUFFER_SIZE = 1_000_000_000
, which for all intents and purposes lives up to its name. Which sentinel we use, i.e.-1
orNone
, again makes little difference. I personally would useNone
to have a clear separation, but again no strong opinion other than being consistent.buffer_size=None
. I'm all for having a warning like this in the documentation, but I'm strongly against a runtime warning. For example,torchvision
datasets need to use an infinite buffer everywhere. Thus, by using the infinite buffer sentinel, users would always get runtime warnings although neither them nor we did anything wrong.The text was updated successfully, but these errors were encountered: