Differing statistical results between AWS and VM instances #2481
-
Having an issue that I can't seem to solve. Running MET 11.0/METplus 5.0 on a Rocky Linux 8 VM and MET 11.1/METplus 5.1 on a Rocky Linux 9 AWS box and trying to replicate grid_stat results for a 3-km WRF against URMA analysis. When I run METplus using the same model data and analysis files on both machines, the VM provides plausible statistical results whereas they are completely out to lunch on AWS - FBAR is the same value throughout all of the verification regions and MSESS is highly negative across the board, among other issues. Config files and input data match between machines outside of their respective directory structures. I've attached the grid_stat log from the AWS machine, and config files and a resulting forecast hour CNT output from each machine for your review. Let me know if there's anything else I can provide that might be of use.
-
Chris, I see that you're getting unexpected differences in the CNT output computed by Grid-Stat version 11.0.0 on your VM versus 11.1.0 on AWS. Thanks for passing along sample data to illustrate the problem. I definitely agree that the AWS v11.1.0 output looks out to lunch. I just picked off the first CNT line of output (for TMP/Z2), compared the VM 11.0.0 output to the AWS 11.1.0 output, and note the following:
Looking in the METplus log file, I see that this output was generated with the following command:
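The actual command from the log isn't reproduced here; for context, a METplus-generated Grid-Stat call generally has the shape below. All paths and file names in this sketch are illustrative, not the ones from the log.

```bash
# Illustrative only -- substitute the actual forecast, analysis, and config
# paths recorded in the METplus log on your system.
/path/to/met-11.1.0/bin/grid_stat \
  /data/wrf/wrfprs_d01.2024020300_f024.grib2 \
  /data/urma/urma2p5.t00z.2dvaranl_ndfd.grb2 \
  /path/to/GridStatConfig_wrapped \
  -outdir /path/to/output/grid_stat \
  -v 2
```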
I suspect that Grid-Stat is handling the URMA input just fine, but we're seeing differing behavior with the forecast data. So I'd like to start by checking how MET reads the GRIB2 forecast file. If possible, please try running the following to plot the TMP/Z2 data:
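The exact command didn't carry over into this thread; a plot_data_plane invocation along these lines matches what's being asked for (the input and output file names are placeholders, so point them at the same GRIB2 file that was passed to Grid-Stat):

```bash
# Plot 2-m temperature from the GRIB2 forecast file on both machines and
# compare the resulting PostScript plots. Bumping the verbosity shows how
# MET reads the record.
plot_data_plane \
  /data/wrf/wrfprs_d01.2024020300_f024.grib2 \
  wrf_tmp_z2.ps \
  'name="TMP"; level="Z2";' \
  -title "WRF 2-m Temperature" \
  -v 4
```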
My guess is that you'll see very different values plotted (and listed in the log output). If that is indeed the case, please send me...
You can post it to our anonymous FTP site following these instructions.
-
Hey John, as expected, the VM ps looks normal while the AWS one is a constant temp of 266 K. Files have been uploaded to the FTP server; I also included the model file from both machines just in case.
-
@cwmac thanks for passing along the data. Indeed, this definitely looks like a problem in the AWS build of MET. I'll note that, checking the GRIB2 files you posted, the input files themselves don't appear to be the problem.
I'm happy to report that my local installation of MET version 11.1.0 plots the full range of data just fine, and at verbosity level 4 (-v 4) the log output looks as expected.
So I'm pretty confident that there's just an issue in the way MET 11.1.0 was compiled on the AWS instance. For GRIB2 issues, I always start by looking at how the GRIB2C library was compiled. Please see the 4th item in this list and note, in particular, the guidance about the 64BIT compilation flag.
@jprestop is the person who knows the most about installing MET, and she uses/maintains this compile_MET_all.sh shell script to do so. The commands for compiling the G2C library start on line 490. Can you please take a look on the AWS instance to see how the G2C library was compiled? What version is being compiled? Do you see "64BIT" used anywhere? Hopefully, once we get the compilation of the G2C library sorted out and recompile MET, it will behave properly on AWS.
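As a concrete starting point, a check along these lines on the AWS instance should show whether the 64BIT flag shows up in the g2c build (the paths and log names below are placeholders for wherever wgrib2 / compile_MET_all.sh unpacked and built things):

```bash
# All paths below are placeholders -- point them at your actual build area.
grep -in "64BIT"  /path/to/grib2/g2clib-*/makefile
grep -in "64BIT"  /path/to/g2c_make.log

# Also confirm the compiler flags that were actually used for g2c:
grep -in "CFLAGS" /path/to/grib2/g2clib-*/makefile
```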
-
I'm compiling the dependencies via the wgrib2 package. The g2clib version is 1.4.0. I put the make log on the FTP server for your review.
-
Thanks @jprestop. I have gone through the compile_MET_all.sh script and compiled the dependencies that way; however, once I get to compiling MET, I'm getting a ton of Python errors on the ensemble_stat step. I sent the make log (make-20240203.log) to the server for your review.
Thank you @cwmac! Could you please update your values for MET_PYTHON_CC and MET_PYTHON_LD as shown below? Then, please try recompiling MET, and let us know how it goes.
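The specific values recommended here weren't captured in this thread; as an assumption, a common pattern is to derive them from the python3-config that matches the Python MET should embed, e.g.:

```bash
# Assumed example only -- run the python3-config for the Python installation
# you want MET to embed (e.g. ${MET_PYTHON}/bin/python3-config).
export MET_PYTHON_CC="$(python3-config --cflags)"
# For Python 3.8+, --embed is needed so the link flags include -lpython3.x.
export MET_PYTHON_LD="$(python3-config --ldflags --embed)"
```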