Enhance METviewer to aggregate and plot the HSS_EC statistic from the MCTS line type. #285

Closed
8 of 21 tasks
JohnHalleyGotway opened this issue Jun 8, 2021 · 14 comments · Fixed by #298 or #318
Labels
METviewer: User Interface priority: high High Priority requestor: NOAA/CPC NOAA Climate Prediction Center required: FOR DEVELOPMENT RELEASE Required to be completed in the development release for the assigned project type: enhancement Improve something that it is currently doing

Comments

@JohnHalleyGotway
Contributor

JohnHalleyGotway commented Jun 8, 2021

Describe the Enhancement

Heidke Skill Score is being updated to accommodate CPC needs. The MCTS line type in MET version 10.1.0 will include two new columns, HSS_EC and EC_VALUE. The EC_VALUE column may also be added to the MCTC line type to avoid making the EC_VALUE a new required configuration option.

MET issue to generate new statistics: dtcenter/MET#1749
METdatadb loader issue: dtcenter/METdataio#54

METviewer should be enhanced to plot the HSS_EC statistic in exactly the same way that the existing MCTS:HSS statistic is handled.

We will also need to enhance METcalcpy to update the MCTC-to-MCTS aggregation logic:

  • When aggregating multiple MCTC lines, the dimensions and the EC_VALUE would need to remain the same.
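The consistency requirement above could be checked before any counts are summed. The following is a minimal Python sketch, not the METcalcpy implementation: it assumes each parsed MCTC line is a dict with hypothetical keys `n_cat`, `ec_value`, and `counts` (an N_CAT x N_CAT nested list).

```python
import math
from typing import Dict, List


def check_mctc_consistency(lines: List[Dict]) -> None:
    """Raise ValueError unless every MCTC line shares N_CAT and EC_VALUE.

    Hypothetical input format: each line is a dict with keys
    'n_cat', 'ec_value', and 'counts' (an N_CAT x N_CAT nested list).
    """
    if not lines:
        raise ValueError("no MCTC lines to aggregate")
    n_cat = lines[0]["n_cat"]
    ec_value = lines[0]["ec_value"]
    for idx, line in enumerate(lines):
        # The table dimension must match across all input lines.
        if line["n_cat"] != n_cat:
            raise ValueError(f"line {idx}: N_CAT {line['n_cat']} != {n_cat}")
        # EC_VALUE is parsed from text, so compare with a float tolerance.
        if not math.isclose(line["ec_value"], ec_value):
            raise ValueError(f"line {idx}: EC_VALUE {line['ec_value']} != {ec_value}")
        # The counts table itself must really be N_CAT x N_CAT.
        counts = line["counts"]
        if len(counts) != n_cat or any(len(row) != n_cat for row in counts):
            raise ValueError(f"line {idx}: counts table is not {n_cat}x{n_cat}")
```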

Corresponding issue in METcalcpy:
dtcenter/METcalcpy#107

Time Estimate

Estimate the amount of work required here.
Issues should represent approximately 3 days of work.

Sub-Issues

Consider breaking the enhancement down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Define the source of funding and account keys here or state NONE.

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required: Tatiana
  • Select scientist(s) or no scientist required: John O

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@TatianaBurek
Collaborator

Test data on kiowa
/d1/projects/MET/MET_pull_requests/met-10.1.0/met-10.1.0_beta1/feature_1749/MET-feature_1749_hss/test_output/grid_stat

@TatianaBurek
Collaborator

Four new columns were added to the end of the MCTS line type:
HSS_EC HSS_EC_BCL HSS_EC_BCU EC_VALUE

grid_stat_GRIB1_NAM_STAGE4_120000L_20120409_120000V_mcts.txt

@TatianaBurek
Collaborator

@JohnHalleyGotway
Could you explain, or point me to, how to calculate the MCTS statistics using the MCTC aggregation logic?

@JohnHalleyGotway
Contributor Author

@TatianaBurek HSS_EC is computed by this function:
ContingencyTable::gheidke_ec(double ec_value)

To compute it from the MCTC line type, you'll need the counts of the NxN table from that line plus the new EC_VALUE from the last column. The EC_VALUE is a number between 0 and 1.

And compute HSS_EC = ( DIAG_COUNT - (EC_VALUE * N) ) / (N - (EC_VALUE * N)) where DIAG_COUNT is the sum of the counts on the diagonal and N is the sum of the counts across the whole MCTC table.
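As a sketch only, the formula above can be transcribed into Python for a square table of counts. This is not the MET `ContingencyTable::gheidke_ec` code; the function name `hss_ec` and the choice to return NaN when the denominator is zero (N = 0 or EC_VALUE = 1) are assumptions of this illustration.

```python
from typing import Sequence


def hss_ec(table: Sequence[Sequence[float]], ec_value: float) -> float:
    """HSS_EC = (DIAG_COUNT - EC_VALUE*N) / (N - EC_VALUE*N) for an NxN table."""
    if not 0.0 <= ec_value <= 1.0:
        raise ValueError("EC_VALUE must be between 0 and 1")
    n_cat = len(table)
    if any(len(row) != n_cat for row in table):
        raise ValueError("contingency table must be square (NxN)")
    diag_count = sum(table[i][i] for i in range(n_cat))  # counts where i == j
    n = sum(sum(row) for row in table)                   # all counts in the table
    expected = ec_value * n                              # expected correct by chance
    denom = n - expected
    if denom == 0:
        return float("nan")  # assumption: undefined score reported as NaN
    return (diag_count - expected) / denom
```

For example, `hss_ec([[3, 1], [1, 5]], 0.5)` evaluates (8 - 5) / (10 - 5) = 0.6.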

@TatianaBurek
Collaborator

@JohnHalleyGotway
For the given data how do I calculate DIAG_COUNT and N:
i_value j_value fi_oj
1 1 49519
1 2 225
1 3 294
1 4 184
2 1 69
2 2 19
2 3 68
2 4 67
3 1 9
3 2 14
3 3 32
3 4 41
4 1 0
4 2 0
4 3 0
4 4 0

DIAG_COUNT = (i_value=1, j_value=1) + (i_value=2, j_value=2) + (i_value=3, j_value=3) + (i_value=4, j_value=4) = 49519 + 19 + 32 + 0?
N = 49519 + 225 + 294 + 184 .....?

@JohnHalleyGotway
Contributor Author

@TatianaBurek yes, you've got it exactly right. DIAG_COUNT is the sum of counts where i == j and N is the sum of all counts. Assuming EC_VALUE = 0.25 for this line, we'd have:

DIAG_COUNT = 49519 + 19 + 32 + 0 = 49570
N = 50541
EC_VALUE = 0.25
HSS_EC = ( 49570 - (0.25 * 50541) ) / (50541 - (0.25 * 50541)) = 36934.75 / 37905.75 = 0.97438
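The worked example can be reproduced directly from the (i_value, j_value, fi_oj) rows listed earlier. This Python sketch uses the same assumed EC_VALUE = 0.25 and shows that DIAG_COUNT keeps only the cells where i == j while N sums every cell.

```python
# (i_value, j_value, fi_oj) rows from the MCTC table in this thread.
rows = [
    (1, 1, 49519), (1, 2, 225), (1, 3, 294), (1, 4, 184),
    (2, 1, 69), (2, 2, 19), (2, 3, 68), (2, 4, 67),
    (3, 1, 9), (3, 2, 14), (3, 3, 32), (3, 4, 41),
    (4, 1, 0), (4, 2, 0), (4, 3, 0), (4, 4, 0),
]
ec_value = 0.25  # assumed EC_VALUE, as in the example above

# DIAG_COUNT: counts where the forecast and observed categories match (i == j).
diag_count = sum(count for i, j, count in rows if i == j)
# N: total of every cell in the table.
n = sum(count for _, _, count in rows)

hss_ec = (diag_count - ec_value * n) / (n - ec_value * n)
print(diag_count, n, round(hss_ec, 5))  # prints: 49570 50541 0.97438
```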

@TatianaBurek
Collaborator

@JohnHalleyGotway
Does the N_CAT value (the dimension of the contingency table) have to be the same for all of the individual series, or can it differ, similar to the PCT thresholds?

@JohnHalleyGotway
Contributor Author

JohnHalleyGotway commented Jul 20, 2021

@TatianaBurek, yes, when aggregating multiple MCTC lines together, I do think it makes sense to require that N_CAT remain constant across the input lines. In fact, I expect that typically, the actual thresholds listed in the FCST_THRESH and OBS_THRESH columns would also remain constant. However, if we're not enforcing that requirement in the aggregation of PCT lines, let's not enforce it when aggregating MCTC lines either.

If the script already has the FCST_THRESH and OBS_THRESH columns available to it, you could add a check of whether those strings remain constant (separately for FCST_THRESH and OBS_THRESH). If they do not, print a warning message about it to the plot log file.

When N_CAT changes while aggregating MCTC lines, Stat-Analysis prints the following type of error message:

ERROR  : aggr_mctc_lines() -> when aggregating MCTC lines the size of the contingency table must remain the same for all lines.  Try setting "-column_eq N_CAT n", 4 != 3

But if the list of thresholds change while N_CAT remains constant, you only get warning messages, like this:

WARNING: For case "", found 2 unique FCST_THRESH values: >0.0,>4.0,>10.0,>0.0,>5.0,>10.0
WARNING: For case "", found 2 unique OBS_THRESH values: >0.0,>4.0,>10.0,>0.0,>5.0,>10.0
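The suggested threshold check could look roughly like the Python sketch below: it warns, but does not fail, when the FCST_THRESH or OBS_THRESH strings vary across the input lines, checking each column separately. The logger name and the dict keys `fcst_thresh` and `obs_thresh` are hypothetical; this is not METviewer or METcalcpy code.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("mctc_aggregation")  # hypothetical logger name


def warn_on_threshold_changes(lines: List[Dict]) -> List[str]:
    """Log a warning when threshold strings vary across MCTC lines.

    Checks FCST_THRESH and OBS_THRESH separately and returns the messages.
    """
    messages = []
    for column in ("fcst_thresh", "obs_thresh"):
        # dict.fromkeys keeps first-seen order, so messages are deterministic.
        unique = list(dict.fromkeys(line[column] for line in lines))
        if len(unique) > 1:
            # Thresholds contain commas, so join them with "; " for readability.
            msg = f"found {len(unique)} unique {column.upper()} values: {'; '.join(unique)}"
            logger.warning(msg)
            messages.append(msg)
    return messages
```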

@TatianaBurek
Collaborator

@JohnHalleyGotway
What if I have more than one row of data? Each row has its own counts table, so I have n tables. How do I calculate HSS_EC across all rows?
Should I calculate HSS_EC for each row and then take the mean? But that is like the summary statistics...

@JohnHalleyGotway
Contributor Author

The exact same aggregation vs summary logic that we use for CTC counts/CTS stats, and for SL1L2 sums/CNT stats, and for PCT counts/PSTD stats applies here.

The "summary" method for HSS_EC would be to compute the stat separately for each input line and then report the mean or median of those scores, based on the user configuration.

The "aggregation" method would be to first aggregate those multiple MCTC lines into one large one, and then derive a single aggregated HSS_EC statistic.

The MCTC aggregation logic is exactly the same as what we do for CTC and PCT... just sum up each cell of the table as well as the TOTAL count.
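The difference between the two methods can be sketched in Python. This is an illustration under assumptions, not METviewer or METcalcpy code: tables are plain nested lists, all inputs share N_CAT and EC_VALUE, and the function names are hypothetical.

```python
from statistics import mean, median
from typing import List

Table = List[List[int]]


def hss_ec(table: Table, ec_value: float) -> float:
    """HSS_EC = (DIAG_COUNT - EC_VALUE*N) / (N - EC_VALUE*N)."""
    diag_count = sum(table[i][i] for i in range(len(table)))
    n = sum(map(sum, table))
    return (diag_count - ec_value * n) / (n - ec_value * n)


def summary_hss_ec(tables: List[Table], ec_value: float, use_median: bool = False) -> float:
    """Summary method: score each line separately, then take the mean or median."""
    scores = [hss_ec(t, ec_value) for t in tables]
    return median(scores) if use_median else mean(scores)


def aggregate_mctc(tables: List[Table]) -> Table:
    """Aggregation step 1: sum each cell across lines (so TOTAL is summed too)."""
    n_cat = len(tables[0])
    return [[sum(t[i][j] for t in tables) for j in range(n_cat)] for i in range(n_cat)]


def aggregated_hss_ec(tables: List[Table], ec_value: float) -> float:
    """Aggregation step 2: derive a single HSS_EC from the summed table."""
    return hss_ec(aggregate_mctc(tables), ec_value)
```

With the tables `[[8, 2], [0, 0]]` and `[[1, 1], [1, 1]]` and EC_VALUE = 0.5, the per-line scores are 0.6 and 0.0, so the summary (mean) is 0.3, while the aggregated table `[[9, 3], [1, 1]]` gives 3/7 ≈ 0.4286. Aggregation effectively weights each line by its number of matched pairs.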

The same logic should apply to any of the other MCTS statistics that METviewer is serving up.

@TatianaBurek
Collaborator

I only have a formula for HSS_EC.
What are the other stats, and how are they calculated?

@JohnHalleyGotway
Contributor Author

JohnHalleyGotway commented Jul 22, 2021 via email

@TatianaBurek
Collaborator

I created a separate issue related to adding the other stats:
#314

@TatianaBurek
Collaborator

We're not enforcing the requirement that the FCST_THRESH and OBS_THRESH columns remain constant when aggregating PCT lines, so I am not enforcing it when aggregating MCTC lines either.

TatianaBurek added a commit that referenced this issue Aug 12, 2021
@TatianaBurek TatianaBurek linked a pull request Aug 12, 2021 that will close this issue
Projects
None yet
5 participants