RAM Requirement Reduction / Periodic Result File Writing #71
Comments
@AnsArn this issue is linked to the fix for output variable selection.
Hi Marcus. I have also found that COSIPY needs a huge amount of RAM when writing the output. What change did you make to the code? I cannot see the screenshot you attached.
Hi Sarah. The basic idea is to move the result-writing section inside the loop starting on line 215 of COSIPY.py, so that the results from nodes are written to the file immediately and not withheld in the worker's memory. I could email you the screenshot which shows the change, if you are unable to view the image I posted above. It is only a rough proposal at present and I haven't thoroughly tested it yet.
I've also been looking at ways to reduce the size of the output dataset (and therefore the RAM requirement) by allowing the user to select the temporal frequency at which values are reported (e.g. hourly --> monthly or yearly). I have had this working successfully for some time, but aggregating the hourly values (e.g. melt) into a yearly cumulative sum often added some simulation-time overhead. However, I think I may have overcome this issue now. It only requires the user to provide a CSV with their desired dates, such as: Dates [yyyy-mm-dd hh:mm:ss] The output would be reduced to a size of 4 years rather than 34,764 hours, with an annual average or sum (variable-dependent) for each year. Hopefully I can share this soon, if there is interest and this has not already been implemented by the main model developers.
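To make the date-based reporting idea concrete, here is a minimal standard-library sketch, not the implementation described above. It assumes a CSV with a single `Dates` column holding timestamps in `yyyy-mm-dd hh:mm:ss` form, and it assumes each timestep is assigned to the first reporting date at or after it. The names `read_report_dates` and `aggregate` are hypothetical. Values are accumulated as they arrive, so only one running sum and count per reporting date needs to be kept.

```python
import csv
import io
from bisect import bisect_left
from datetime import datetime, timedelta


def read_report_dates(csv_text):
    """Parse a CSV with a single 'Dates' column into sorted datetimes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(
        datetime.strptime(row["Dates"], "%Y-%m-%d %H:%M:%S") for row in reader
    )


def aggregate(times, values, report_dates, how="sum"):
    """Reduce hourly values to one value per reporting date.

    Assumption: a timestep belongs to the first reporting date that is
    at or after its timestamp; timesteps after the last date are dropped.
    """
    sums = [0.0] * len(report_dates)
    counts = [0] * len(report_dates)
    for t, v in zip(times, values):
        i = bisect_left(report_dates, t)
        if i < len(report_dates):
            sums[i] += v
            counts[i] += 1
    if how == "sum":
        return sums
    # Mean; periods without any timestep yield NaN
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Two days of hourly timesteps and two reporting dates (illustrative values)
dates_csv = "Dates\n2000-01-01 23:00:00\n2000-01-02 23:00:00\n"
report_dates = read_report_dates(dates_csv)
times = [datetime(2000, 1, 1) + timedelta(hours=h) for h in range(48)]

melt_totals = aggregate(times, [1.0] * 48, report_dates, how="sum")
temp_means = aggregate(times, [float(h) for h in range(48)], report_dates, how="mean")
```

Whether a variable is summed or averaged would be chosen per variable, for example a sum for melt and a mean for temperature.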
Thanks a lot for your suggestions and screenshot. Yes, that would be a great enhancement of the model. I do not have the resources to implement this at the moment, but it is an urgent point on our list. In the meantime you can also select the variables which you want to save to the output.
Hi Anselm, this is certainly a change I could draft on the new version 2.0 code in the new year. Then you could determine whether it is suitable for implementation if you think it is beneficial for the modelling community.
I have found that the COSIPY model tends to need a very large amount of memory, particularly when simulating over large spatial and temporal domains. I believe this is because the current code structure doesn't allow the threads/workers to release memory until the 'future.result()' method is called, and the results are only written at the very end of the simulation.
For context, I am using a computing cluster with 28 threads/workers, each with an allocation of 4 GB of RAM, to simulate a spatial domain of 2,043 over 745,104 timesteps (hourly between 1939 - 2024). The model input 'DATA' xarray dataset is 48 GB, and the run quickly crashes due to insufficient memory upon reaching line 246 of COSIPY.py:
futures.append(client.submit(cosipy_core, DATA.isel(lat=y, lon=x), y, x, ....
Even using fewer workers with an allocation of 12 GB of RAM each still caused this issue. I am not entirely sure, but I think this is because the model tries to distribute a huge portion of the DATA file to each thread/worker in this line, and that portion must be held in RAM.
I found that by restructuring the code to simulate and write the output results in groups of 28 (the number of threads/workers that can simulate simultaneously for me), I could reduce the RAM required to less than 500 MB. The drawback is a slight increase in writing time, but this wasn't particularly significant for me.
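The batching idea can be sketched roughly as follows. This is not the code from the screenshot: it uses the standard library's `concurrent.futures` in place of the `client.submit` call COSIPY uses, `simulate_point` is a hypothetical stand-in for `cosipy_core`, and a CSV buffer stands in for the real output file. The point is only that each group's futures are resolved, written, and released before the next group is submitted, so at most `batch_size` results are held at once.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def simulate_point(y, x, n_steps):
    """Hypothetical stand-in for cosipy_core: one value per timestep."""
    return [float(y + x + t) for t in range(n_steps)]


def run_in_batches(points, n_steps, batch_size, out_file, max_workers=4):
    """Simulate `points` in groups of `batch_size`, writing after each group."""
    writer = csv.writer(out_file)
    peak_held = 0  # largest number of results held in memory at once
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(points), batch_size):
            batch = points[start:start + batch_size]
            futures = [
                pool.submit(simulate_point, y, x, n_steps) for y, x in batch
            ]
            # Collect only this group's results
            results = [f.result() for f in futures]
            peak_held = max(peak_held, len(results))
            # Write the group to the output before submitting the next one
            for (y, x), values in zip(batch, results):
                writer.writerow([y, x, *values])
            # Drop references so this group's results can be freed
            del futures, results
    return peak_held


points = [(y, x) for y in range(5) for x in range(4)]  # 20 grid points
buffer = io.StringIO()
peak = run_in_batches(points, n_steps=3, batch_size=6, out_file=buffer)
rows = buffer.getvalue().splitlines()
```

Here `batch_size` plays the role of a user-adjustable write frequency: a larger value means fewer writes but more results held in memory at the same time.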
Attached is a screenshot of the rough solution I made to my heavily modified version of the COSIPY code.
I think if something like this could be incorporated into COSIPY it would greatly benefit the community - especially those with access to limited computational resources. Perhaps there could be a user-adjustable parameter that lets the user choose how frequently results are written to the output file in order to conserve memory.