> I based my work (first ever torchtune attempt) on the first example on this page, and had an error until I added `_component_: torchtune.datasets.instruct_dataset` which is in the later examples but not the first one.
You are correct: the first example is missing the `_component_: torchtune.datasets.instruct_dataset` field. Good catch. We welcome a PR to patch that quickly; otherwise we can flag it for now and have it patched soon.
> to start answering the (IMO important!) question of: "How do I specify what data to train on?!?"
I agree this is the most important question when fine-tuning an LLM. I've been meaning to make it more visible in the documentation by adding a custom data page under "Basics" or "Tutorials"; this is still on our to-do list.
https://pytorch.org/torchtune/stable/basics/instruct_datasets.html
I based my work (my first-ever torchtune attempt) on the first example on this page and got an error until I added `_component_: torchtune.datasets.instruct_dataset`, which appears in the later examples but not in the first one.

Also, as a general note, it took a surprising amount of digging, starting from the "first fine-tune" tutorial (https://pytorch.org/torchtune/stable/tutorials/first_finetune_tutorial.html#) and the how-to guide (https://www.llama.com/docs/how-to-guides/fine-tuning/), to start answering the (IMO important!) question: "How do I specify what data to train on?!?"
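Why the missing field matters: torchtune configs build objects from a dotted `_component_` path, so a dataset block with only `source`, `data_files`, etc. gives the config system nothing to construct. The sketch below is a simplified, hypothetical `instantiate` helper written for illustration only; it is not torchtune's actual implementation, and it uses the stdlib `datetime.timedelta` as a stand-in component so it runs without torchtune installed. The exact error torchtune raises may differ.

```python
import importlib
from typing import Any, Dict


def instantiate(node: Dict[str, Any]) -> Any:
    """Toy `_component_`-style resolver (hypothetical; not torchtune's real code)."""
    if "_component_" not in node:
        # Nothing tells us what to build: analogous to a dataset block
        # that omits `_component_: torchtune.datasets.instruct_dataset`.
        raise ValueError("config node is missing the '_component_' field")
    path = node["_component_"]
    module_path, _, attr_name = path.rpartition(".")
    if not module_path:
        raise ValueError(f"'_component_' must be a dotted path, got {path!r}")
    target = getattr(importlib.import_module(module_path), attr_name)
    # Every other key in the node is forwarded as a keyword argument.
    kwargs = {k: v for k, v in node.items() if k != "_component_"}
    return target(**kwargs)


if __name__ == "__main__":
    # With `_component_` present, the dotted path is imported and called.
    built = instantiate({"_component_": "datetime.timedelta", "days": 1, "hours": 2})
    print(built)  # 1 day, 2:00:00

    # Without it (like the first docs example), construction fails.
    try:
        instantiate({"source": "json", "data_files": "data/my_data.json"})
    except ValueError as err:
        print(err)  # config node is missing the '_component_' field
```

In a real torchtune config, the equivalent fix is simply adding the `_component_: torchtune.datasets.instruct_dataset` line to the `dataset:` block, next to the fields the first example already has.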