Restart trainer between stages #45
Conversation
Let's see whether this does better or worse. It moves the logic of "hey you're ready to move on" to the trainer program for a bit.
…he trainer stalled

# Conflicts:
#	src/opustrainer/trainer.py
#	test.py

(9957bd4 to 7fde43e)
It looks great! A couple of thoughts, both things are optional:
```yaml
arguments:
  early-stopping: 20
  learning-rate: 0.03
```

which would be converted to `--early-stopping 20 --learning-rate 0.03`.
I initially chose not to use a YAML list for the arguments because they will eventually be converted to a string anyway. But Marian has a specific way of translating config between the config YAML and the command-line options, which we could play into. I don't want to tie OpusTrainer 100% to Marian though, so it would then accept either
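To illustrate the YAML-to-flags translation discussed above, here is a minimal sketch. The function name and the list handling are assumptions for illustration, not Marian's or OpusTrainer's actual code:

```python
def yaml_args_to_cli(args: dict) -> list:
    """Convert a YAML mapping of trainer arguments into command-line flags,
    e.g. {"early-stopping": 20} -> ["--early-stopping", "20"].
    Hypothetical helper, following Marian's flag-naming convention."""
    cli = []
    for key, value in args.items():
        cli.append("--" + str(key))
        if isinstance(value, list):
            # YAML lists become repeated positional values after the flag.
            cli.extend(str(v) for v in value)
        else:
            cli.append(str(value))
    return cli

print(yaml_args_to_cli({"early-stopping": 20, "learning-rate": 0.03}))
# ['--early-stopping', '20', '--learning-rate', '0.03']
```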
Yes, I'm tempted to do that for consistency. You would lose something by restarting, probably, but it would be consistent. The alternative is that, when switching stages, OpusTrainer compares the
Yep, but you'd have the clear checkpoint to fall back on. To be fair, you should have this in all scenarios since it will fail at the start directly after having switched onto the next stage.
… And I find that pretty convincing. A third implementation would be to restart between stages if any stage has

For testing, maybe a different feature could work. Something where you specify "run only X% of the data", so you can see whether it moves correctly through all the stages; by severely underfeeding it you could make Marian stall very quickly, and you'd get it to move on quickly.
Abandoned for now in favour of re-launching OpusTrainer with a different configuration (and different launch arguments for the trainer) altogether.
This changes the way the trainer (Marian) is wrapped. Previously, the trainer was started once and fed until we ran out of data to feed it, or until we stopped it by closing its stdin.
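The new behaviour restarts the wrapped trainer for every stage instead. A minimal sketch of that pattern, assuming a hypothetical `run_stage` helper (not OpusTrainer's actual implementation):

```python
import subprocess

def run_stage(command, lines):
    """Start the trainer process for one stage, feed it training lines on
    stdin, then close stdin and wait for it to exit. Restarting between
    stages means calling this once per stage."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE,
                            stdout=subprocess.DEVNULL, text=True)
    for line in lines:
        proc.stdin.write(line + "\n")
    proc.stdin.close()
    return proc.wait()

# for stage in stages:
#     run_stage(trainer_command, stage_data(stage))
```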
This change contains:
Example of 1:
I'm a bit confused about whether this is the way to go, since stage-specific `modifiers` override the global modifier config, while stage-specific `arguments` are appended to the end of the trainer command. On top of that, the `trainer` key in the config is a string that's split with `shlex.split()`, while `arguments` is a list of arguments.

Edit: changed `arguments` to a string that's split using `shlex.split()`, like the `trainer` config option.
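For reference, `shlex.split()` from the Python standard library turns a single command string into an argv list, respecting shell-style quoting, which is why both config options can share it:

```python
import shlex

# The `trainer` config value is one string; shlex.split produces
# the argument list, stripping shell-style quotes.
trainer = 'marian --model model.npz --log "train.log"'
print(shlex.split(trainer))
# ['marian', '--model', 'model.npz', '--log', 'train.log']
```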