Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update torchdata.nodes docs, use sphinx for API #1396

Merged
merged 6 commits into from
Dec 11, 2024

Conversation

andrewkho
Copy link
Contributor

@andrewkho andrewkho commented Dec 11, 2024

Use sphinx for torchdata.nodes api reference
split existing torchdata.nodes into "What is torchdata.nodes?" and "getting started"

This looks like a lot of line changes but it's mostly docstring updates, and splitting the existing docfile into two

Tested locally:

API page:
image

Getting started:
image

Migrating from torch.utils.data
image

What is torchdata.nodes?
image

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 11, 2024
Copy link

pytorch-bot bot commented Dec 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/data/1396

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 054e6a6 with merge base e316c5c (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Copy link
Contributor

@divyanshk divyanshk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets goo!!!!

@divyanshk
Copy link
Contributor

@andrewkho actually since you have this PR already open, do you want to add a quick section about perf in migrate_to_nodes_from_utils.rst ?

@andrewkho
Copy link
Contributor Author

There's a section in "why torchdata nodes", do you think we should add a link to it? Also is your WP down?

@divyanshk
Copy link
Contributor

divyanshk commented Dec 11, 2024

@andrewkho yeah, there seems to be an outage.

@divyanshk
Copy link
Contributor

@andrewkho we discussed yesterday to include some perf related info in the docs, so that all the "informational pieces" are visible together in the docs. yeah we can include a pointer to the readme, or include a brief blurb here as well. up to you?

Must call super().reset(initial_state)
* next() -> T - logic for returning the next value in the sequence, or throw StopIteration
* get_state(self) -> dict: returns a dictionary representing state that may be passed to reset()
"""BaseNodes are the base class for creating composable dataloading dags in ``torchdata.nodes``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, should it be DAGs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's pretty un-ambiguous but you're correct, i'll update it


We'll demonstrate how to achieve the most common DataLoader features, re-use existing samplers and datasets,
and load/save dataloader state. It performs at least as well as ``DataLoader`` and ``StatefulDataLoader``,
see :ref:`how-does-nodes-perform`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@divyanshk added xref here

IterableDatasets, each worker needs to figure out (through
``torch.utils.data.get_worker_info``) what data it should be returning.

.. _how-does-nodes-perform:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@divyanshk it links here

# also provides state_dict and load_state_dict methods.
return tn.Loader(node)

Now let's test this out with a useless dataset, and demonstrate how state management works.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: replace useless with simple or something else :D ?

@ramanishsingh ramanishsingh self-requested a review December 11, 2024 20:23
@andrewkho andrewkho merged commit 1608fcb into main Dec 11, 2024
39 checks passed
@andrewkho andrewkho deleted the andrewkh/update-nodes-docs branch December 11, 2024 20:35
andrewkho added a commit that referenced this pull request Dec 12, 2024
* update docstrings for sphinx

* update docstrings for sphinx

* add migration from torch.utils.data

* add performance section

* add xref to performance in migrate guide

* minor pr comments

---------

Co-authored-by: Andrew Ho <[email protected]>
andrewkho added a commit that referenced this pull request Dec 12, 2024
* update docstrings for sphinx

* update docstrings for sphinx

* add migration from torch.utils.data

* add performance section

* add xref to performance in migrate guide

* minor pr comments

---------

Co-authored-by: Andrew Ho <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants