-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SUMMA interacts with more forcing files than needed for specified simulation length #501
Comments
I addressed this in SUMMA-Actors and the way I did it was I modified the file-manager to have two more options. This allows Summa to know how many files it should read. The extra options were: If this is acceptable I can write out a broader plan for changes. |
Hi Kyle,
A few thoughts on this. I'm not sure there's a strong motivation to make
this change. There's value in avoiding extra entries in the file manager
and other input files, mainly to limit clutter (in the file manager and
code).
It's also probably best to avoid having the user provide information that
SUMMA can decipher internally if needed (eg forcingStart or frequency,
which if needed could be determined as SUMMA reads the forcing file list).
Questions:
* does the interaction with more forcing files than needed cause any
significant run time impact or impact another aspect of performance? (you
can test this by giving it a forcing file list that is long versus limited
to just the forcing file needed). Is the run time impact > 0.01%? I would
guess not.
* Can the reduction in this interaction be achieved by the user just
editing the forcingFile list to include those of interest? I don't see why
not.
Perhaps a halfway solution is to add a line or two of code so that SUMMA
stops checking forcing files once it has confirmed it has the desired run
period covered. This might be there now - I forget and it's been a while
since I looked at that part of the code.
But I'm not sure I see the current behavior of SUMMA as significantly
problematic to add new entries to the fileManager. If they are added, they
should also be made backward compatible so that anyone wishing to ignore
the entries can do so. Handling backward compatibility also can lead to
code bloat.
Cheers,
Andy
…On Fri, Oct 28, 2022 at 2:32 PM Kyle Klenk ***@***.***> wrote:
I addressed this in SUMMA-Actors and the way I did it was I modified the
file-manager to have two more options. This allows Summa to know how many
files it should read.
The extra options were:
forcingFreq 'month' ! the frequeny of forcing files (month, year)
forcingStart '1979-01-01' ! starting date of the forcing file list
If this is acceptable I can write out a broader plan for changes.
—
Reply to this email directly, view it on GitHub
<#501 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABIKARKRU4OVMY5T5SHD3NTWFQZ4HANCNFSM5MNQJZBA>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Hi Andy, Thanks for the feedback, I have been experimenting with SUMMA for some time and it is good to get some feedback on places in the code where changes should be avoided like the fileManager. I ran some tests that are hopefully helpful to this open issue. Dataset: North America - 492 Forcing Files - 517,315 GRUs - Chunked at 1000 GRU per chunk Running 1 GRU for one month with 1 forcing file in the forcingFileList.txt
Running 1 GRU for one month with loading 492 forcing files in the forcingFileList.txt
This I think would be the worst case scenario. When running more than one GRU at a time the running of the GRUs starts to take up the majority of the runtime. Hopefully this information is helpful. I did some extra testing on another machine that is closer to a typical consumer PC with 4-CPU, 16GB of RAM and 1 GBit connection. It was getting the benefits of caching from the NAS which made the results of loading in one file vs multiple basically the same. I can share them if needed once the cache is cleared but I expect they will be similar to the above with the difference coming from the network connection speed to our NAS. Thanks again for your comments, Some extra information on why I made changes in my experimental code that should have been addressed on my side before making changes to the code.
|
Hi Kyle,
Thanks, this is excellent testing information. It does appear that there
is an edge case for which loading all the files can be costly. That said,
in that particular testing case (a test running 1 gru for a massive domain
for 1 month), a user could probably be expected to take the additional step
of just using an alternate forcingFile list. Even so, I'm actually
surprised it's that costly and think it's worth taking a look at the code
around loading those files to see if there is anything repetitive going on.
There may be inefficiencies -- we have been rooting these out in the SUMMA
i/o since the start of trying to apply it to full-scale applications, but
I'm sure we're not done.
Based on this, however, I'd still lean against adding extra fileManager
entries. It's a key input file, and is used in a lot of different ways in
different applications, so adding anything extra there means it is
something that cannot be done in any other way. I don't think this rises
to that level, but others are of course free to weigh in here. If it is
added, it should be backward compatible so that current usage is not
required to change.
Thanks for doing the benchmarking on this ... that was really informative.
Best,
Andy
…On Sat, Oct 29, 2022 at 12:36 PM Kyle Klenk ***@***.***> wrote:
Hi Andy,
Thanks for the feedback, I have been experimenting with SUMMA for some
time and it is good to get some feedback on places in the code where
changes should be avoided like the fileManager.
I ran some tests that are hopefully helpful to this open issue.
Dataset: North America - 492 Forcing Files - 517,315 GRUs - Chunked at
1000 GRU per chunk
Machine configuration: 64 CPU - 2TB RAM - Connected to Network Area
Storage with 10 Gbit connection
netCDF version: 4.5.2
Running 1 GRU for one month with 1 forcing file in the forcingFileList.txt
- Run Time: 1 Second
Running 1 GRU for one month with loading 492 forcing files in the
forcingFileList.txt
- Run Time: 30 Seconds for first run
- Run Time: 1-3 Seconds for subsequent runs (looks like caching is
taking effect)
This I think would be the worst case scenario. When running more than one
GRU at a time the running of the GRUs starts to take up the majority of the
runtime.
Hopefully this information is helpful. I did some extra testing on another
machine that is closer to a typical consumer PC with 4-CPU, 16GB of RAM and
1 GBit connection. It was getting the benefits of caching from the NAS
which made the results of loading in one file vs multiple basically the
same. I can share them if needed once the cache is cleared but I expect
they will be similar to the above with the difference coming from the
network connection speed to our NAS.
Thanks again for your comments,
Kyle
Some extra information on why I made changes in my experimental code that
should have been addressed on my side before making changes to the code.
- When I was doing testing for my experimental version of SUMMA I was
using a cluster with slurm and requesting 1-CPU and 1-GB of RAM in an
interactive session. This configuration may have made the problem look
worse than it actually was as there may have been little benefit from
caching. During this testing I was usually testing with one GRU at a time
and not multiple but different configurations of forcing files, again
making the problem look worse than it may be for the typical user.
—
Reply to this email directly, view it on GitHub
<#501 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABIKARLNPD64FZX56EXVWCTWFVVB3ANCNFSM5MNQJZBA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Hi Kyle, Just dropping a few thoughts here.
Cheers, |
Thanks for the comments, I can take a look at the source code and can report what I find and propose changes here. The reasoning for not changing the fileManager makes sense to me. I applied this change naively in Summa-Actors a while ago and looks like it needs to be reverted. Because I need to do the revert for our experimental version looking into other solutions and proposing them here should be no problem. That is a good point about the caching on HPC systems too. I remember you sharing that conversation a while back but it was a good refresher and I agree not something to rely on. Thanks, |
Hi Andy and Wouter, I went through the source code and this is what I found inside ffile_info.f90. The forcingFileList.txt is read in and then how ever many lines are there is how the length is determined for the array of forcing files. Then inside a do loop, each file in that array is opened and the descriptive information such as variable names and the number of time steps within that netCDF forcing file are used to populate the forcFileInfo data structure. When the user gives too many files in the forcingFileList.txt the loop will not exit until it has opened and populated information for every netcdf file that was listed in forcingFileList.txt. It seems to me a check could be implemented to break the do loop here so once we have passed the forcing files that are not needed we do not continue. This is I think what you had in mind. The scenario of how to handle the situation where the user has more forcing files in the beginning of the list, ie. start date starts at a forcing file that is not the first in the forcingFileList.txt could be covered by the same check but instead of breaking the loop we move to the next iteration. The catch here is that read_force.f90 will still open the beginning files unless iFile is set to the file that corresponds with the start of the simulation and not the start of the forcingFileList. I can clarify as needed if something doesn't make sense. I am still learning how best to communicate these issues. In the end this doesn't happen if the user is responsible for ensuring the number of forcing files is correct for their simulation. Best, |
Hi Kyle,
Thanks for looking into this. I'd guess it could probably be more
tailored. We can't assume that the forcing files are named in a way that
conveys their temporal range, so there's avoiding opening them and
checking the time variable. However, the time variable should be the only
one read before deciding if it's worth reading other variables. If the end
time is before the simulation start time, the process can be escaped, and
once the simulation end time is found in a file, the forcing read can be
escaped. In the end, this is actually quite a common practice for models.
I don't think we need to control it more through additional user input, but
if you see inefficiencies such as reading all the variables before checking
the time limits, we could refine that behavior. And then move on ... it
does seem to be a rare case where this would significantly impact run time,
and there are a lot of big SUMMA fish to catch and fry. (see the issues
list)
In general, I find that as you're working on models, you come across
numerous things that you can improve or fix. The challenge is not finding
imperfections but rather figuring out which to spend time on, and in what
order :)
Cheers,
Andy
…On Wed, Nov 2, 2022 at 9:41 AM Kyle Klenk ***@***.***> wrote:
Hi Andy and Wouter,
I went through the source code and this is what I found inside
ffile_info.f90.
The forcingFileList.txt is read in and then how ever many lines are there
is how the length is determined for the array of forcing files.
Then inside a do loop, each file in that array is opened and the
descriptive information such as variable names and the number of time steps
within that netCDF forcing file are used to populate the forcFileInfo data
structure. When the user gives too many files in the forcingFileList.txt
the loop will not exit until it has opened and populated information for
every netcdf file that was listed in forcingFileList.txt.
It seems to me a check could be implemented to break the do loop here so
once we have passed the forcing files that are not needed we do not
continue. This is I think what you had in mind.
The scenario of how to handle the situation where the user has more
forcing files in the beginning of the list, ie. start date starts at a
forcing file that is not the first in the forcingFileList.txt could be
covered by the same check but instead of breaking the loop we move to the
next iteration. The catch here is that read_force.f90 will still open the
beginning files unless iFile is set to the file that corresponds with the
start of the simulation and not the start of the forcingFileList.
I can clarify as needed if something doesn't make sense. I am still
learning how best to communicate these issues. In the end this doesn't
happen if the user is responsible for ensuring the number of forcing files
is correct for their simulation.
Best,
Kyle
—
Reply to this email directly, view it on GitHub
<#501 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABIKARIGGQPWPKYSZFUUCTTWGKDTHANCNFSM5MNQJZBA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
btw, 'avoiding' should be 'no avoiding'!
…On Wed, Nov 2, 2022 at 7:46 PM Andrew Wood ***@***.***> wrote:
Hi Kyle,
Thanks for looking into this. I'd guess it could probably be more
tailored. We can't assume that the forcing files are named in a way that
conveys their temporal range, so there's avoiding opening them and
checking the time variable. However, the time variable should be the only
one read before deciding if it's worth reading other variables. If the end
time is before the simulation start time, the process can be escaped, and
once the simulation end time is found in a file, the forcing read can be
escaped. In the end, this is actually quite a common practice for models.
I don't think we need to control it more through additional user input, but
if you see inefficiencies such as reading all the variables before checking
the time limits, we could refine that behavior. And then move on ... it
does seem to be a rare case where this would significantly impact run time,
and there are a lot of big SUMMA fish to catch and fry. (see the issues
list)
In general, I find that as you're working on models, you come across
numerous things that you can improve or fix. The challenge is not finding
imperfections but rather figuring out which to spend time on, and in what
order :)
Cheers,
Andy
On Wed, Nov 2, 2022 at 9:41 AM Kyle Klenk ***@***.***>
wrote:
> Hi Andy and Wouter,
>
> I went through the source code and this is what I found inside
> ffile_info.f90.
>
> The forcingFileList.txt is read in and then how ever many lines are there
> is how the length is determined for the array of forcing files.
>
> Then inside a do loop, each file in that array is opened and the
> descriptive information such as variable names and the number of time steps
> within that netCDF forcing file are used to populate the forcFileInfo data
> structure. When the user gives too many files in the forcingFileList.txt
> the loop will not exit until it has opened and populated information for
> every netcdf file that was listed in forcingFileList.txt.
>
> It seems to me a check could be implemented to break the do loop here so
> once we have passed the forcing files that are not needed we do not
> continue. This is I think what you had in mind.
>
> The scenario of how to handle the situation where the user has more
> forcing files in the beginning of the list, ie. start date starts at a
> forcing file that is not the first in the forcingFileList.txt could be
> covered by the same check but instead of breaking the loop we move to the
> next iteration. The catch here is that read_force.f90 will still open the
> beginning files unless iFile is set to the file that corresponds with the
> start of the simulation and not the start of the forcingFileList.
>
> I can clarify as needed if something doesn't make sense. I am still
> learning how best to communicate these issues. In the end this doesn't
> happen if the user is responsible for ensuring the number of forcing files
> is correct for their simulation.
>
> Best,
> Kyle
>
> —
> Reply to this email directly, view it on GitHub
> <#501 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/ABIKARIGGQPWPKYSZFUUCTTWGKDTHANCNFSM5MNQJZBA>
> .
> You are receiving this because you commented.Message ID:
> ***@***.***>
>
|
Reading Andy's comment, it seems that we can agree that implementing these procedures has the potential to make debugging large-domain, long-term runs a bit more convenient. I would add that making the model's behaviour a bit more intuitive has an extra benefit for new users, who might need a while to realize that reducing the forcing file list size can speed up debugging runs. Seeing how this seems like a fairly low time investment, it might be worth it to implement these suggestions. |
Bug Reports
The text was updated successfully, but these errors were encountered: