Need help with splitting the Runspec based on source type #76
Comments
What we mean by splitting RunSpecs by source type is to create a separate RunSpec for each source type. The rest of the RunSpec can be identical--same time spans, same input database, same output database--but for a run with 13 source types, you'd end up with 13 RunSpecs. Your motorcycle RunSpec would contain only the motorcycle vehicle selections.
Your passenger car RunSpec would contain only the passenger car vehicle selections.
And so on and so forth. With these smaller RunSpecs, the intermediate MariaDB joins will be smaller. Smaller joins typically perform better than larger ones, which is why some users may see a performance improvement when running 13 RunSpecs sequentially instead of one single RunSpec.
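The splitting described above could be scripted rather than done by hand. The following is a minimal sketch, not an official MOVES tool. It assumes the RunSpec is XML with an `<onroadvehicleselections>` element whose `<onroadvehicleselection>` children carry a `sourcetypeid` attribute; check your own RunSpec before relying on those names. The function name `split_runspec_by_source_type` and the `SAMPLE` document are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed RunSpec used only for illustration.
# Element and attribute names are assumptions about the RunSpec layout.
SAMPLE = """<runspec version="example">
  <onroadvehicleselections>
    <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="11" sourcetypename="Motorcycle"/>
    <onroadvehicleselection fueltypeid="1" fueltypedesc="Gasoline" sourcetypeid="21" sourcetypename="Passenger Car"/>
    <onroadvehicleselection fueltypeid="2" fueltypedesc="Diesel Fuel" sourcetypeid="21" sourcetypename="Passenger Car"/>
  </onroadvehicleselections>
  <outputdatabase servername="" databasename="example_out" description=""/>
</runspec>"""


def split_runspec_by_source_type(runspec_xml: str) -> dict:
    """Return {sourcetypeid: RunSpec XML text}, one entry per source type."""
    root = ET.fromstring(runspec_xml)
    selections = root.find(".//onroadvehicleselections")
    if selections is None:
        raise ValueError("no <onroadvehicleselections> element found")

    # Collect the distinct source type IDs in document order.
    source_type_ids = []
    for sel in selections.findall("onroadvehicleselection"):
        sid = sel.get("sourcetypeid")
        if sid is not None and sid not in source_type_ids:
            source_type_ids.append(sid)

    splits = {}
    for sid in source_type_ids:
        # Copy the whole RunSpec so everything else stays identical,
        # then drop the selections that belong to other source types.
        new_root = copy.deepcopy(root)
        new_selections = new_root.find(".//onroadvehicleselections")
        for sel in list(new_selections):
            if sel.get("sourcetypeid") != sid:
                new_selections.remove(sel)
        splits[sid] = ET.tostring(new_root, encoding="unicode")
    return splits


if __name__ == "__main__":
    for sid, xml_text in split_runspec_by_source_type(SAMPLE).items():
        print(sid, len(xml_text))
```

To use it on a real file, read the RunSpec text, call the function, and write each returned string to its own file (for example, one named after the source type ID). Note that ElementTree writes CDATA sections back out as escaped text and drops XML comments, so diff one of the generated files against the original before running it.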
@danielbizercox thanks for describing the strategy. Is it advisable to have separate output databases for each split of the RunSpec file? Can it improve performance if I run all the splits in parallel with worker partitioning?
Whether or not to use the same output database depends on your post-processing preferences, but we'd typically recommend using the same one. When doing so, the only difference in your output between doing 1 run vs. doing 13 runs is that each source type will also have a different MOVESRunID value. Regarding "worker partitioning", I'm not sure what you mean by that. You can only start one main MOVES process per computer. You can launch additional MOVES workers, which may speed up each individual run, but this typically has a minor impact and we generally do not see much improvement beyond 3 workers. If you have multiple computers with MOVES installed (e.g., a cluster of VMs), however, you can launch each RunSpec in parallel, and that will produce output significantly faster. This will result in separate output databases on each computer. To facilitate post-processing in this use case, we have a MOVES Output Grouper tool that can stitch together multiple output databases into a single one: https://github.com/USEPA/EPA_MOVES_Model/blob/master/tools/MOVESOutputGrouper.md
@danielbizercox by "worker partitioning" I mean that I start multiple sets of workers on multiple command lines, each with a different shared folder configuration. It's a lot of manual work but I wanted to test this out. So, essentially:
I store the output of the splits in different output databases. Will this give me accurate results, or is it not possible to do this? I believe I can do this as long as I have enough logical processors on one computer to run 13 RunSpec files in parallel, where each one has a dedicated set of workers helping it.
It is not possible to run 13 RunSpec files in parallel; each call to
Thanks for the information.
I was reading the document that describes strategies for making a MOVES run faster: https://github.com/USEPA/EPA_MOVES_Model/blob/master/docs/TipsForFasterMOVESRuns.md
One of the strategies there talks about splitting the RunSpecs by source type. How can we do this? I am attaching a sample RunSpec file for reference. How do we split it on specific source types?
@danielbizercox can you help me out here? Thanks